Preprocessing Module

Image preparation and analysis tools

Overview

The preprocessing module provides tools for analyzing and adjusting archaeological drawings before conversion. It measures each drawing's image statistics, compares them against reference statistics built from a dataset, and applies targeted adjustments where they differ.

Dataset Analyzer

class DatasetAnalyzer:
    def __init__(self):
        self.metrics = {}
        self.distributions = {}

A tool for analyzing collections of archaeological drawings and establishing statistical baselines for quality control.

Key Methods

analyze_image

def analyze_image(self, image: Union[str, Image.Image]) -> dict

Extracts key metrics from a single drawing.

Returns

mean: Average brightness
std: Standard deviation of pixel intensity
contrast_ratio: Dynamic range measure
median: Middle intensity value
dynamic_range: Total intensity range
entropy: Image information content
iqr: Inter-quartile range
non_empty_ratio: Drawing density measure
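
As an illustration of what these metrics could look like in code, the sketch below computes them with NumPy from an 8-bit grayscale array. The function name `sketch_metrics`, the `background` threshold, and the exact formulas (a percentile-based `contrast_ratio`, histogram entropy in bits, `non_empty_ratio` as the share of pixels darker than the background) are assumptions made for this example and may differ from what `analyze_image` actually computes.

```python
import numpy as np

def sketch_metrics(gray: np.ndarray, background: int = 250) -> dict:
    """Compute illustrative metrics for an 8-bit grayscale array.

    All definitions here are assumptions, not the module's own formulas.
    """
    px = gray.astype(np.float64).ravel()
    p_lo, q1, q3, p_hi = np.percentile(px, [1, 25, 75, 99])

    # Shannon entropy (in bits) of the 256-bin intensity histogram
    hist = np.bincount(gray.ravel().astype(np.uint8), minlength=256)
    prob = hist[hist > 0] / px.size
    entropy = float(-(prob * np.log2(prob)).sum())

    return {
        "mean": float(px.mean()),
        "std": float(px.std()),
        "median": float(np.median(px)),
        "dynamic_range": float(px.max() - px.min()),
        # assumed: spread between the 1st and 99th percentiles, scaled to 0..1
        "contrast_ratio": float((p_hi - p_lo) / 255.0),
        "entropy": entropy,
        "iqr": float(q3 - q1),
        # assumed: pixels darker than `background` count as ink
        "non_empty_ratio": float((px < background).mean()),
    }

# Tiny synthetic drawing: left half black ink, right half white paper
demo = np.zeros((4, 4), dtype=np.uint8)
demo[:, 2:] = 255
metrics = sketch_metrics(demo)
```

A drawing stored on disk could be turned into such an array with Pillow, for example `np.asarray(Image.open(path).convert("L"))`.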

analyze_dataset

def analyze_dataset(
    self, 
    dataset_path: str,
    file_pattern: tuple = ('.png', '.jpg', '.jpeg')
) -> dict

Builds statistical distributions from a collection of drawings.
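
One plausible shape for this step is sketched below: select files by extension, then aggregate per-image metric dictionaries into per-metric summaries. The helper names `list_drawings` and `summarize`, and the layout of the returned dictionary, are hypothetical; the source does not specify the structure that `analyze_dataset` returns.

```python
import os
import numpy as np

def list_drawings(dataset_path, file_pattern=('.png', '.jpg', '.jpeg')):
    """Return sorted paths whose extension matches file_pattern (case-insensitive)."""
    return sorted(
        os.path.join(dataset_path, name)
        for name in os.listdir(dataset_path)
        if name.lower().endswith(file_pattern)  # str.endswith accepts a tuple
    )

def summarize(per_image_metrics):
    """Aggregate a list of per-image metric dicts into per-metric statistics."""
    summary = {}
    for key in per_image_metrics[0]:
        values = np.array([m[key] for m in per_image_metrics], dtype=float)
        summary[key] = {
            "values": values,  # raw samples, kept for later density plots
            "mean": float(values.mean()),
            "std": float(values.std()),
            "percentiles": np.percentile(values, [5, 25, 50, 75, 95]),
        }
    return summary

# Two made-up per-image results, for illustration only
example = summarize([{"mean": 1.0, "std": 4.0}, {"mean": 3.0, "std": 4.0}])
```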

visualize_distributions_kde

def visualize_distributions_kde(
    self,
    metrics_to_plot: Optional[List[str]] = None,
    save: bool = False
)

Creates KDE plots of metric distributions with statistical annotations.
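
For readers unfamiliar with kernel density estimation, the sketch below evaluates a one-dimensional Gaussian KDE with NumPy only. The function name `gaussian_kde_1d` and the choice of Silverman's rule-of-thumb bandwidth are assumptions for this example; the module may rely on a plotting library's built-in estimator instead. The sample values are made up.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`.

    Requires at least two distinct sample values.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size

    # Silverman's rule of thumb: h = 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)
    sigma = samples.std(ddof=1)
    iqr = np.subtract(*np.percentile(samples, [75, 25]))
    spread = min(sigma, iqr / 1.34) if iqr > 0 else sigma
    h = 0.9 * spread * n ** (-1 / 5)

    # Sum one Gaussian bump per sample, then normalize to unit area
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# Hypothetical 'mean' brightness values from five drawings
samples = [100.0, 110.0, 120.0, 130.0, 140.0]
grid = np.linspace(0.0, 255.0, 511)
density = gaussian_kde_1d(samples, grid)
area = density.sum() * (grid[1] - grid[0])  # Riemann-sum check of the total area
```

The resulting `grid` and `density` arrays can be passed to any plotting library to draw the curve.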

save_analysis

def save_analysis(self, path: str) -> None

Saves the current analysis results to a file for later use. This is particularly useful when establishing reference metrics for a specific archaeological context or drawing style.

Parameters

path: File path to save the analysis results

Examples

analyzer = DatasetAnalyzer()
stats = analyzer.analyze_dataset("reference_drawings/")
analyzer.save_analysis("reference_metrics.npy")

load_analysis

@classmethod
def load_analysis(cls, path: str) -> 'DatasetAnalyzer'

Class method that loads previously saved analysis results. This allows reuse of established reference metrics without reanalyzing the dataset.

Parameters

path: Path to previously saved analysis file

Returns

A new DatasetAnalyzer instance populated with the loaded analysis results

Examples

# Load previously computed statistics
analyzer = DatasetAnalyzer.load_analysis("reference_metrics.npy")

# Use loaded stats for quality checks
check = check_image_quality("new_drawing.jpg", analyzer.distributions)

These methods enable efficient reuse of analysis results across multiple processing sessions, particularly valuable when working with established archaeological documentation standards or specific site collections.
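
The `.npy` extension in the examples suggests NumPy serialization, though the source does not confirm it. Under that assumption, a save/load round trip for a statistics dictionary could look like the sketch below. The helper names `save_stats` and `load_stats` and the sample values are hypothetical.

```python
import os
import tempfile
import numpy as np

def save_stats(path, distributions):
    # np.save wraps the dict in a 0-d object array and pickles it
    np.save(path, distributions, allow_pickle=True)

def load_stats(path):
    # allow_pickle=True is required for object arrays; only load trusted files
    return np.load(path, allow_pickle=True).item()

# Round trip through a temporary directory with made-up statistics
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "reference_metrics.npy")
    save_stats(target, {"mean": {"mean": 231.4, "std": 6.2}})
    restored = load_stats(target)
```

Because loading pickled data can execute arbitrary code, a file saved this way should only be loaded from a trusted source.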

Process Folder Metrics

def process_folder_metrics(
    input_folder: str,
    model_stats: dict,
    file_extensions: tuple = ('.jpg', '.jpeg', '.png')
) -> None

Batch processes a folder of drawings to align their metrics with reference statistics.

Parameters

input_folder: Directory containing drawings to process
model_stats: Reference statistics from DatasetAnalyzer
file_extensions: Supported file types

Apply Recommended Adjustments

def apply_recommended_adjustments(
    image: Union[str, Image.Image],
    model_stats: dict,
    verbose: bool = True
) -> Image.Image

Automatically adjusts a drawing based on statistical analysis.

Parameters

image: Drawing to adjust
model_stats: Reference statistics
verbose: Print adjustment details

Adjustments Applied

Contrast normalization
Brightness alignment
Standard deviation correction
Dynamic range optimization

Examples

# reference_stats: reference statistics from a DatasetAnalyzer, e.g. analyzer.distributions
adjusted = apply_recommended_adjustments(
    "drawing.jpg",
    reference_stats,
    verbose=True
)
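
To make brightness alignment and standard deviation correction concrete, the sketch below applies a single linear transform that moves an image's mean and standard deviation to target values. The function name `match_mean_std` and the linear approach are assumptions for illustration; the source does not describe how `apply_recommended_adjustments` performs its corrections.

```python
import numpy as np

def match_mean_std(gray, target_mean, target_std):
    """Linearly rescale an 8-bit grayscale array toward a target mean and std."""
    px = gray.astype(np.float64)
    std = px.std()
    gain = target_std / std if std > 0 else 1.0  # flat images keep their contrast
    out = (px - px.mean()) * gain + target_mean
    # Round and clip back into the valid 8-bit range
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Small synthetic example: double the contrast and centre brightness on 128
demo = np.array([[100, 120], [140, 160]], dtype=np.uint8)
adjusted_demo = match_mean_std(demo, target_mean=128.0, target_std=2 * demo.std())
```

Clipping at 0 and 255 can pull the final statistics slightly away from the targets when the requested contrast is large.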

Check Image Quality

def check_image_quality(
    image: Union[str, Image.Image],
    model_stats: dict
) -> dict

Evaluates a drawing against reference metrics to identify needed adjustments.

Returns

Returns a dictionary containing:

metrics: Current image measurements
recommendations: List of suggested adjustments
is_compatible: True if the drawing already matches the reference statistics; False if adjustments are needed

Examples

# reference_stats: reference statistics from a DatasetAnalyzer, e.g. analyzer.distributions
check = check_image_quality("new_drawing.jpg", reference_stats)
if not check['is_compatible']:
    print("Adjustments needed:", check['recommendations'])
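
One simple way such a check could work is a z-score test of each metric against the reference mean and standard deviation, sketched below. The function name `sketch_quality_check`, the `z_limit` threshold, and the assumed layout of `model_stats` (a `mean` and `std` per metric) are hypothetical; the actual compatibility criteria are not given in the source.

```python
def sketch_quality_check(metrics, model_stats, z_limit=2.0):
    """Flag metrics lying more than z_limit reference std devs from the reference mean."""
    recommendations = []
    for name, value in metrics.items():
        ref = model_stats.get(name)
        if ref is None or ref["std"] == 0:
            continue  # no usable reference for this metric
        z = (value - ref["mean"]) / ref["std"]
        if abs(z) > z_limit:
            direction = "decrease" if z > 0 else "increase"
            recommendations.append(f"{direction} {name} (z = {z:+.2f})")
    return {
        "metrics": metrics,
        "recommendations": recommendations,
        "is_compatible": not recommendations,  # compatible when nothing is flagged
    }

# Made-up numbers: the drawing is darker than the reference collection
result = sketch_quality_check(
    {"mean": 200.0, "std": 30.0},
    {"mean": {"mean": 230.0, "std": 10.0}, "std": {"mean": 28.0, "std": 5.0}},
)
```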

Visualize Metrics Change

def visualize_metrics_change(
    original_metrics: dict,
    adjusted_metrics: dict,
    model_stats: dict,
    metrics_to_plot: Optional[List[str]] = None,
    save: bool = False
) -> None

Creates detailed visualizations comparing original and adjusted metrics against reference distributions.

Parameters

original_metrics: Metrics before adjustment
adjusted_metrics: Metrics after adjustment
model_stats: Reference statistics
metrics_to_plot: Specific metrics to visualize
save: Save plot to file

Examples

visualize_metrics_change(
    original_metrics,
    adjusted_metrics,
    reference_stats,
    metrics_to_plot=['contrast_ratio', 'mean', 'std']
)
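
Alongside the plots, the same comparison can be expressed numerically. The sketch below reports how far each metric sits from the reference mean, in reference standard deviations, before and after adjustment. The function name `summarize_change` and the assumed `model_stats` layout (a `mean` and nonzero `std` per metric) are hypothetical, and the numbers are made up.

```python
def summarize_change(original_metrics, adjusted_metrics, model_stats, metrics_to_plot=None):
    """Distance from the reference mean (in reference std devs) before and after."""
    names = metrics_to_plot or sorted(
        set(original_metrics) & set(adjusted_metrics) & set(model_stats)
    )
    rows = {}
    for name in names:
        ref = model_stats[name]
        before = abs(original_metrics[name] - ref["mean"]) / ref["std"]
        after = abs(adjusted_metrics[name] - ref["mean"]) / ref["std"]
        rows[name] = {"before": before, "after": after, "improved": after < before}
    return rows

change = summarize_change(
    {"mean": 200.0, "std": 30.0},
    {"mean": 225.0, "std": 29.0},
    {"mean": {"mean": 230.0, "std": 10.0}, "std": {"mean": 28.0, "std": 5.0}},
    metrics_to_plot=["mean"],
)
```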