Guide to Fine-tuning PyPotteryInk Models
This guide walks you through the process of fine-tuning a PyPotteryInk model for your specific archaeological context.
Prerequisites
Before starting the fine-tuning process, ensure you have:
- A GPU with at least 20GB VRAM for training
- Python 3.10 or higher
- A paired dataset of pencil drawings and their inked versions
- Storage space for model checkpoints and training data
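Before setting anything up, you can confirm the GPU and its total VRAM with `nvidia-smi`, which ships with the NVIDIA driver:

```bash
# Prints the GPU model and total memory in CSV form
nvidia-smi --query-gpu=name,memory.total --format=csv
```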
Environment Setup
First, clone the repository:

```bash
git clone https://github.com/GaParmar/img2img-turbo.git
```

Then install the required dependencies:

```bash
pip install -r img2img-turbo/requirements.txt
pip install git+https://github.com/openai/CLIP.git
pip install wandb vision_aided_loss huggingface-hub==0.25.0
```
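Once the dependencies are installed, a one-line sanity check (assuming the requirements pulled in a CUDA build of PyTorch) confirms that training will be able to see your GPU:

```bash
# Should print True on a correctly configured CUDA machine
python -c "import torch; print(torch.cuda.is_available())"
```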
Dataset Preparation
To create a training dataset, follow the original img2img-turbo documentation: https://github.com/GaParmar/img2img-turbo/blob/main/docs/training_pix2pix_turbo.md
Important considerations for dataset preparation:
- Images should be paired (same filename in both folders; a quick check is sketched after these lists)
- Standard image formats (jpg, jpeg, png) are supported
- Both pencil and inked versions should be aligned
- Recommended resolution: at least 512x512 pixels
Data requirements:
- Minimum recommended: 10-20 pairs for fine-tuning
- Each drawing should be clean and well-scanned
- Include variety in pottery types and decorations
- Consistent drawing style across the dataset
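A quick way to catch pairing mistakes (as referenced above) is to diff the filenames in the two image folders. The `train_A`/`train_B` names below follow the paired-dataset layout from the img2img-turbo docs; substitute your own folder names if your dataset is organised differently:

```bash
# Lists any file present in one folder but not the other.
# train_A = pencil drawings, train_B = inked versions (assumed layout;
# adjust the paths to match your dataset).
diff <(ls YOUR_INPUT_DATA/train_A | sort) <(ls YOUR_INPUT_DATA/train_B | sort)
```

No output means every filename is matched.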
Setting Up Fine-tuning
To fine-tune a pre-trained model (such as “6h-MCG”), you’ll need to modify the base img2img-turbo repository; these changes let you use a pre-trained model as the starting point for your specialized training instead of training from scratch.
Prepare the repository by navigating to your cloned img2img-turbo directory:

```bash
cd img2img-turbo
```

Then replace two key files: copy the following from the PyPotteryInk repository’s “fine-tuning” folder into the `src` folder:

- `pix2pix_turbo.py`
- `train_pix2pix_turbo.py`
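For example, if PyPotteryInk is cloned alongside img2img-turbo (a hypothetical layout; adjust the path to your own checkout), the copy amounts to:

```bash
# Overwrite the stock training scripts with the PyPotteryInk variants
cp ../PyPotteryInk/fine-tuning/pix2pix_turbo.py src/
cp ../PyPotteryInk/fine-tuning/train_pix2pix_turbo.py src/
```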
Running Fine-tuning
Initialize the Accelerate environment:

```bash
accelerate config
```

This will guide you through setting up your training environment; follow the prompts to configure for your GPU setup.
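If you would rather skip the interactive prompts on a straightforward single-machine setup, accelerate can also write out a default configuration:

```bash
# Writes a default accelerate config file without the questionnaire
accelerate config default
```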
Start Training:
```bash
accelerate launch src/train_pix2pix_turbo.py \
    --pretrained_model_name_or_path="6h-MCG.pkl" \
    --output_dir="YOUR_OUTPUT_DIR" \
    --dataset_folder="YOUR_INPUT_DATA" \
    --resolution=512 \
    --train_batch_size=2 \
    --enable_xformers_memory_efficient_attention \
    --viz_freq 25 \
    --track_val_fid \
    --report_to "wandb" \
    --tracker_project_name "YOUR_PROJECT_NAME"
```
Key Parameters:
- `pretrained_model_name_or_path`: Path to your pre-trained model (e.g., “6h-MCG.pkl”)
- `output_dir`: Where to save training outputs and checkpoints
- `dataset_folder`: Location of your training dataset
- `resolution`: Image resolution (512 recommended)
- `train_batch_size`: Number of images per training batch
- `viz_freq`: How often to generate visualization samples
- `track_val_fid`: Enable FID score tracking
- `tracker_project_name`: Your Weights & Biases project name
Note: Adjust the batch size based on your GPU memory. Start with 2 and increase if your GPU can handle it.
Important Considerations
- Ensure your pre-trained model file (e.g., “6h-MCG.pkl”) is at the path you pass to `--pretrained_model_name_or_path`
- Monitor GPU memory usage during training (see the snippet after this list)
- Use Weights & Biases (wandb) to track training progress
- Check the output directory periodically for sample outputs
- Training time will vary based on your GPU and dataset size
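For the GPU memory monitoring mentioned above, the simplest approach is to keep `nvidia-smi` refreshing in a second terminal while training runs:

```bash
# Refresh GPU utilisation and memory figures every 5 seconds
watch -n 5 nvidia-smi
```

If memory usage sits close to the limit, reduce `--train_batch_size` as noted above.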