Preprocessing Module

Image preparation and analysis tools

Overview

The preprocessing module provides tools for analyzing and adjusting archaeological drawings before conversion. It keeps input quality consistent by comparing drawings against statistical baselines and applying targeted adjustments where they deviate.

Dataset Analyzer

class DatasetAnalyzer:
    def __init__(self):
        self.metrics = {}
        self.distributions = {}

A tool for analyzing collections of archaeological drawings and establishing statistical baselines for quality control.

Key Methods

analyze_image

def analyze_image(self, image: Union[str, Image.Image]) -> dict

Extracts key metrics from a single drawing.

Returns

  • mean: Average brightness (mean pixel intensity)
  • std: Standard deviation of pixel intensities
  • contrast_ratio: Dynamic range measure
  • median: Median pixel intensity
  • dynamic_range: Total intensity range
  • entropy: Image information content
  • iqr: Interquartile range of pixel intensities
  • non_empty_ratio: Drawing density (proportion of non-empty pixels)
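To make the metric list concrete, here is a minimal sketch that computes comparable values from a drawing that has already been converted to 8-bit grayscale and flattened into a sequence of ints (for a PIL image, for example, `list(img.convert("L").tobytes())`). The helper name `compute_metrics` and the `background_threshold` parameter are hypothetical. The formulas for contrast_ratio (Michelson contrast) and non_empty_ratio (pixels darker than a near-white threshold, assuming dark lines on a light background) are assumptions, not the module's documented definitions.

```python
import math
import statistics
from collections import Counter
from typing import Dict, Sequence


def compute_metrics(pixels: Sequence[int], background_threshold: int = 250) -> Dict[str, float]:
    """Compute summary metrics for a flattened 8-bit grayscale image.

    Hypothetical helper: contrast_ratio and non_empty_ratio use assumed formulas.
    """
    if len(pixels) < 2:
        raise ValueError("need at least two pixels")
    n = len(pixels)
    lo, hi = min(pixels), max(pixels)
    # Quartiles using the 'inclusive' method (the pixels are the whole population)
    q1, _, q3 = statistics.quantiles(pixels, n=4, method="inclusive")
    # Shannon entropy, in bits, of the intensity histogram
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())
    return {
        "mean": statistics.fmean(pixels),
        "std": statistics.pstdev(pixels),
        # Assumption: Michelson contrast (max - min) / (max + min)
        "contrast_ratio": (hi - lo) / (hi + lo) if hi + lo else 0.0,
        "median": statistics.median(pixels),
        "dynamic_range": hi - lo,
        "entropy": entropy,
        "iqr": q3 - q1,
        # Assumption: any pixel darker than the threshold counts as "drawn"
        "non_empty_ratio": sum(p < background_threshold for p in pixels) / n,
    }
```

Pure-Python loops are slow on full-size scans; in practice the same statistics would normally be computed with an array library, but the definitions stay the same.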

analyze_dataset

def analyze_dataset(
    self, 
    dataset_path: str,
    file_pattern: tuple = ('.png', '.jpg', '.jpeg')
) -> dict

Analyzes every file in dataset_path whose extension matches file_pattern and builds per-metric statistical distributions from the results.
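The dataset pass can be pictured as two steps: find the matching files, then pool each image's metrics into one distribution per metric. The sketch below assumes each image has already been reduced to a metrics dict (for example by analyze_image). The helpers `collect_image_paths` and `summarize_metrics` are hypothetical, and whether the real method searches subfolders or which summary fields it stores is not documented here; this version only looks at the top-level folder.

```python
import statistics
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def collect_image_paths(dataset_path: str,
                        file_pattern: Tuple[str, ...] = (".png", ".jpg", ".jpeg")) -> List[Path]:
    """Return top-level files whose extension matches file_pattern (case-insensitive)."""
    patterns = tuple(ext.lower() for ext in file_pattern)
    return sorted(p for p in Path(dataset_path).iterdir()
                  if p.is_file() and p.suffix.lower() in patterns)


def summarize_metrics(per_image: Iterable[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Group per-image metric dicts by metric name and summarize each distribution."""
    values: Dict[str, List[float]] = {}
    for metrics in per_image:
        for name, value in metrics.items():
            values.setdefault(name, []).append(value)
    summary = {}
    for name, vals in values.items():
        summary[name] = {
            "mean": statistics.fmean(vals),
            "std": statistics.pstdev(vals),
            "min": min(vals),
            "max": max(vals),
            "values": vals,  # raw values kept for later KDE plots
        }
    return summary
```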

visualize_distributions_kde

def visualize_distributions_kde(
    self,
    metrics_to_plot: Optional[List[str]] = None,
    save: bool = False
)

Creates kernel density estimate (KDE) plots of the metric distributions, with statistical annotations.
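Each KDE curve is a smoothed histogram: one Gaussian bump is placed on every sample and the bumps are averaged. The sketch below computes only the density values that such a plot would draw, in pure Python; the function name `gaussian_kde` here is hypothetical, the bandwidth uses Silverman's rule of thumb as an assumption, and the library and bandwidth the module actually uses are not documented here.

```python
import math
import statistics
from typing import List, Sequence


def gaussian_kde(samples: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point.

    Bandwidth follows Silverman's rule of thumb: h = 1.06 * sigma * n ** (-1/5).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    sigma = statistics.stdev(samples)
    # Assumption: fall back to h = 1.0 when all samples are identical
    h = 1.06 * sigma * n ** (-1 / 5) if sigma > 0 else 1.0
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    # Sum of Gaussian kernels centred on each sample, scaled to integrate to 1
    return [norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
            for x in grid]
```

A smaller bandwidth shows more detail but more noise; with only a few dozen reference drawings per metric, a rule-of-thumb bandwidth is a reasonable starting point.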

save_analysis

def save_analysis(self, path: str) -> None

Saves the current analysis results to a file for later use. This is particularly useful when establishing reference metrics for a specific archaeological context or drawing style.

Parameters

path

File path to save the analysis results

Examples

analyzer = DatasetAnalyzer()
stats = analyzer.analyze_dataset("reference_drawings/")
analyzer.save_analysis("reference_metrics.npy")

load_analysis

@classmethod
def load_analysis(cls, path: str) -> 'DatasetAnalyzer'

Class method that loads previously saved analysis results. This allows reuse of established reference metrics without reanalyzing the dataset.

Parameters

path

Path to a previously saved analysis file

Returns

A new DatasetAnalyzer instance populated with the loaded analysis results

Examples

# Load previously computed statistics
analyzer = DatasetAnalyzer.load_analysis("reference_metrics.npy")

# Use loaded stats for quality checks
check = check_image_quality("new_drawing.jpg", analyzer.distributions)

Together, save_analysis and load_analysis let you reuse analysis results across processing sessions. This is particularly valuable when working with established archaeological documentation standards or with collections from a specific site.
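The save/load pair follows a common pattern: serialize the analyzer's state, then rebuild an instance from it in a class method. The sketch below uses a hypothetical stand-in class, `AnalyzerSketch`, that holds the same `metrics` and `distributions` attributes as DatasetAnalyzer. It swaps in JSON for illustration; the documented examples write a .npy file, and the real on-disk format is not specified here.

```python
import json
from pathlib import Path
from typing import Any, Dict


class AnalyzerSketch:
    """Hypothetical stand-in for DatasetAnalyzer that only shows save/load."""

    def __init__(self) -> None:
        self.metrics: Dict[str, Any] = {}
        self.distributions: Dict[str, Any] = {}

    def save_analysis(self, path: str) -> None:
        # JSON is used for illustration; the documented examples use .npy files
        payload = {"metrics": self.metrics, "distributions": self.distributions}
        Path(path).write_text(json.dumps(payload), encoding="utf-8")

    @classmethod
    def load_analysis(cls, path: str) -> "AnalyzerSketch":
        # Build a fresh instance and restore the saved state onto it
        payload = json.loads(Path(path).read_text(encoding="utf-8"))
        instance = cls()
        instance.metrics = payload["metrics"]
        instance.distributions = payload["distributions"]
        return instance
```

Calling `cls()` rather than naming the class directly means a subclass that inherits `load_analysis` gets back an instance of its own type.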

Process Folder Metrics

def process_folder_metrics(
    input_folder: str,
    model_stats: dict,
    file_extensions: tuple = ('.jpg', '.jpeg', '.png')
) -> None

Batch-processes a folder of drawings, adjusting each one so that its metrics align with the reference statistics.

Parameters

input_folder

Directory containing drawings to process

model_stats

Reference statistics from DatasetAnalyzer

file_extensions

File extensions to include when scanning input_folder
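The exact adjustments process_folder_metrics applies are not documented here. As one illustration of what "aligning metrics" can mean, the sketch below linearly rescales a drawing's pixel intensities so that its mean and standard deviation move to target values taken from reference statistics. The function `match_mean_std` is hypothetical, works on a flattened 8-bit grayscale sequence, and covers only two of the listed metrics.

```python
import statistics
from typing import List, Sequence


def match_mean_std(pixels: Sequence[int], target_mean: float, target_std: float) -> List[int]:
    """Linearly rescale intensities toward a target mean and standard deviation.

    Results are rounded and clipped to [0, 255], so the targets are only
    approximated when clipping or rounding occurs.
    """
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    # A flat image has no spread to rescale; it is simply shifted to target_mean
    scale = target_std / std if std > 0 else 0.0
    out = []
    for p in pixels:
        value = (p - mean) * scale + target_mean
        out.append(min(255, max(0, round(value))))
    return out
```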

Check Image Quality

def check_image_quality(
    image: Union[str, Image.Image],
    model_stats: dict
) -> dict

Evaluates a drawing against reference metrics to identify needed adjustments.

Returns

Returns a dictionary containing:

  • metrics: Current image measurements
  • recommendations: List of suggested adjustments
  • is_compatible: True if the image already matches the reference metrics, False if adjustments are needed

Examples

check = check_image_quality("new_drawing.jpg", reference_stats)
if not check['is_compatible']:
    print("Adjustments needed:", check['recommendations'])
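A quality check of this kind can be sketched as a range test per metric. The version below assumes the reference statistics have been reduced to a hypothetical `(low, high)` range per metric (for instance, 5th to 95th percentiles of the reference set); the real layout of model_stats is not documented here, and `check_against_ranges` is an illustrative name. The returned dict mirrors the documented keys.

```python
from typing import Dict, List, Tuple


def check_against_ranges(metrics: Dict[str, float],
                         ranges: Dict[str, Tuple[float, float]]) -> dict:
    """Compare image metrics with reference (low, high) ranges.

    Returns a dict with the same keys as check_image_quality's documented output.
    """
    recommendations: List[str] = []
    for name, (low, high) in ranges.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not measured for this image
        if value < low:
            recommendations.append(f"increase {name}: {value:.3g} < {low:.3g}")
        elif value > high:
            recommendations.append(f"decrease {name}: {value:.3g} > {high:.3g}")
    return {
        "metrics": metrics,
        "recommendations": recommendations,
        # Compatible only when no metric falls outside its reference range
        "is_compatible": not recommendations,
    }
```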

Visualize Metrics Change

def visualize_metrics_change(
    original_metrics: dict,
    adjusted_metrics: dict,
    model_stats: dict,
    metrics_to_plot: Optional[List[str]] = None,
    save: bool = False
) -> None

Creates detailed visualizations comparing original and adjusted metrics against reference distributions.

Parameters

original_metrics

Metrics before adjustment

adjusted_metrics

Metrics after adjustment

model_stats

Reference statistics

metrics_to_plot

Specific metrics to visualize

save

If True, save the plot to a file

Examples

visualize_metrics_change(
    original_metrics,
    adjusted_metrics,
    reference_stats,
    metrics_to_plot=['contrast_ratio', 'mean', 'std']
)
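A numeric companion to this plot is to express each value as a z-score against the reference distribution: an adjustment that worked should move the score closer to 0. The sketch assumes a hypothetical reference layout of `{metric: {"mean": ..., "std": ...}}`, and `metric_shift_table` is an illustrative name, not part of the module.

```python
from typing import Dict, List, Optional


def metric_shift_table(original: Dict[str, float],
                       adjusted: Dict[str, float],
                       reference: Dict[str, Dict[str, float]],
                       metrics_to_plot: Optional[List[str]] = None) -> Dict[str, Dict[str, float]]:
    """Express original and adjusted values as z-scores against the reference."""
    # Default to every metric present in all three inputs
    names = metrics_to_plot or [n for n in original if n in adjusted and n in reference]
    table = {}
    for name in names:
        ref = reference[name]
        std = ref["std"] or 1.0  # guard against a zero-variance reference
        table[name] = {
            "original_z": (original[name] - ref["mean"]) / std,
            "adjusted_z": (adjusted[name] - ref["mean"]) / std,
        }
    return table
```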