Guide to Fine-tuning PyPotteryInk Models
This guide walks you through the process of fine-tuning a PyPotteryInk model for your specific archaeological context.
Prerequisites
Before starting the fine-tuning process, ensure you have:
- A GPU with at least 20GB VRAM for training
- Python 3.10 or higher
- A paired dataset of pencil drawings and their inked versions
- Storage space for model checkpoints and training data
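Before setting anything up, you can confirm the GPU and its total VRAM with `nvidia-smi`, which ships with the NVIDIA driver:

```bash
# Prints the GPU model and total memory in CSV form
nvidia-smi --query-gpu=name,memory.total --format=csv
```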
Environment Setup
First, clone the repository:

```bash
git clone https://github.com/GaParmar/img2img-turbo.git
```

Then install the required dependencies:

```bash
pip install -r img2img-turbo/requirements.txt
pip install git+https://github.com/openai/CLIP.git
pip install wandb vision_aided_loss huggingface-hub==0.25.0
```
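Once the dependencies are installed, a one-line sanity check (assuming the requirements pulled in a CUDA build of PyTorch) confirms that training will be able to see your GPU:

```bash
# Should print True on a correctly configured CUDA machine
python -c "import torch; print(torch.cuda.is_available())"
```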
Dataset Preparation
To create a training dataset, follow the original img2img-turbo documentation: https://github.com/GaParmar/img2img-turbo/blob/main/docs/training_pix2pix_turbo.md
Important considerations for dataset preparation:
- Images should be paired (same filename in both folders; a quick check is sketched after these lists)
- Standard image formats (jpg, jpeg, png) are supported
- Both pencil and inked versions should be aligned
- Recommended resolution: at least 512x512 pixels
Data requirements:
- Minimum recommended: 10-20 pairs for fine-tuning
- Each drawing should be clean and well-scanned
- Include variety in pottery types and decorations
- Consistent drawing style across the dataset
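A quick way to catch pairing mistakes (as referenced above) is to diff the filenames in the two image folders. The `train_A`/`train_B` names below follow the paired-dataset layout from the img2img-turbo docs; substitute your own folder names if your dataset is organised differently:

```bash
# Lists any file present in one folder but not the other.
# train_A = pencil drawings, train_B = inked versions (assumed layout;
# adjust the paths to match your dataset).
diff <(ls YOUR_INPUT_DATA/train_A | sort) <(ls YOUR_INPUT_DATA/train_B | sort)
```

No output means every filename is matched.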
Setting Up Fine-tuning
To fine-tune a pre-trained model (such as “6h-MCG”), you’ll need to modify the base img2img-turbo repository; these changes let you use a pre-trained model as the starting point for your specialized training instead of training from scratch.
Prepare the repository by navigating to your cloned img2img-turbo directory:

```bash
cd img2img-turbo
```

Then replace two key files: copy the following from the PyPotteryInk repository’s “fine-tuning” folder into the `src` folder:

- `pix2pix_turbo.py`
- `train_pix2pix_turbo.py`
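For example, if PyPotteryInk is cloned alongside img2img-turbo (a hypothetical layout; adjust the path to your own checkout), the copy amounts to:

```bash
# Overwrite the stock training scripts with the PyPotteryInk variants
cp ../PyPotteryInk/fine-tuning/pix2pix_turbo.py src/
cp ../PyPotteryInk/fine-tuning/train_pix2pix_turbo.py src/
```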
Running Fine-tuning
Initialize the Accelerate environment:

```bash
accelerate config
```

This will guide you through setting up your training environment; follow the prompts to configure for your GPU setup.
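If you would rather skip the interactive prompts on a straightforward single-machine setup, accelerate can also write out a default configuration:

```bash
# Writes a default accelerate config file without the questionnaire
accelerate config default
```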
Start Training:
```bash
accelerate launch src/train_pix2pix_turbo.py \
    --pretrained_model_name_or_path="6h-MCG.pkl" \
    --output_dir="YOUR_OUTPUT_DIR" \
    --dataset_folder="YOUR_INPUT_DATA" \
    --resolution=512 \
    --train_batch_size=2 \
    --enable_xformers_memory_efficient_attention \
    --viz_freq 25 \
    --track_val_fid \
    --report_to "wandb" \
    --tracker_project_name "YOUR_PROJECT_NAME"
```
Key Parameters:
- `pretrained_model_name_or_path`: Path to your pre-trained model (e.g., “6h-MCG.pkl”)
- `output_dir`: Where to save training outputs and checkpoints
- `dataset_folder`: Location of your training dataset
- `resolution`: Image resolution (512 recommended)
- `train_batch_size`: Number of images per training batch
- `viz_freq`: How often to generate visualization samples
- `track_val_fid`: Enable FID score tracking
- `tracker_project_name`: Your Weights & Biases project name
Note: Adjust the batch size based on your GPU memory. Start with 2 and increase if your GPU can handle it.
Important Considerations
- Ensure your pre-trained model file (e.g., “6h-MCG.pkl”) is at the path you pass to `--pretrained_model_name_or_path`
- Monitor GPU memory usage during training (see the snippet after this list)
- Use Weights & Biases (wandb) to track training progress
- Check the output directory periodically for sample outputs
- Training time will vary based on your GPU and dataset size
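For the GPU memory monitoring mentioned above, the simplest approach is to keep `nvidia-smi` refreshing in a second terminal while training runs:

```bash
# Refresh GPU utilisation and memory figures every 5 seconds
watch -n 5 nvidia-smi
```

If memory usage sits close to the limit, reduce `--train_batch_size` as noted above.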