This repository implements a complete homography estimation pipeline, covering data generation, deep learning models, training, evaluation, and comparison with classical computer vision methods.
The project is inspired by HomographyNet-style approaches and is structured as a research-grade, reproducible experiment.
- 🧪 Synthetic dataset generation from real images (COCO-style)
- 🧠 Deep CNN-based HomographyNet
  - Regression formulation (continuous corner offsets)
  - Classification formulation (quantized displacement bins)
- 📉 Training & validation with loss tracking
- 📊 Quantitative evaluation using RMSE
- 🖼️ Extensive qualitative visualizations
- 📐 Classical baseline comparison (SIFT / ORB)
```
.
├── data/
│   ├── mini_coco/               # Source grayscale images
│   ├── generated_train/         # Training dataset (X, y_reg, y_cls)
│   └── generated_test/          # Test dataset
│
├── checkpoints/
│   ├── best_regression.pth
│   ├── best_classification.pth
│   ├── loss_regression.png
│   └── loss_classification.png
│
├── results/
│   ├── triple_regression_00.png
│   ├── triple_classification_00.png
│   ├── histograms_individual.png
│   ├── histogram_combined.png
│   ├── boxplot_comparison.png
│   └── violin_comparison.png
│
├── visualizations/
│   └── visualization_000.png
│
├── generate_dataset.py          # Synthetic data generation
├── model.py                     # HomographyNet architecture
├── train.py                     # Training script
├── evaluate.py                  # Quantitative evaluation
├── evaluate_classical.py        # Classical CV baseline (SIFT / ORB)
├── visualize_dataset.py         # Dataset visualization
├── visualize_prediction.py      # Model prediction visualization
└── README.md
```
The training and test datasets are generated automatically by applying random homographic perturbations to real grayscale images.
Each sample consists of:
- Two 64×64 grayscale patches (original + warped)
- Ground-truth corner displacements (4 corners × 2 coordinates)

Each sample is generated as follows (see the sketch after this list):
- Sample a random square patch from an image
- Randomly perturb its four corners
- Compute the corresponding homography
- Warp the image and extract the warped patch
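
A minimal sketch of this generation step, assuming OpenCV and NumPy. `make_sample` and its parameters are illustrative (`rho` here stands for the maximum corner perturbation); the actual implementation, including the `y_cls` quantization, lives in `generate_dataset.py`:

```python
import cv2
import numpy as np

def make_sample(image, patch_size=64, rho=16, rng=None):
    """Return stacked (original, warped) patches and the 8 corner offsets."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    # Sample a patch location that leaves rho pixels of margin for perturbations.
    x = int(rng.integers(rho, w - patch_size - rho))
    y = int(rng.integers(rho, h - patch_size - rho))
    corners = np.float32([[x, y], [x + patch_size, y],
                          [x + patch_size, y + patch_size], [x, y + patch_size]])

    # Randomly perturb the four corners by up to +/- rho pixels.
    offsets = rng.integers(-rho, rho + 1, size=(4, 2)).astype(np.float32)
    perturbed = corners + offsets

    # Homography that maps the perturbed quadrilateral onto the square patch;
    # warping the image with it places the perturbed content at the patch location.
    H = cv2.getPerspectiveTransform(perturbed, corners)
    warped = cv2.warpPerspective(image, H, (w, h))

    patch_a = image[y:y + patch_size, x:x + patch_size]
    patch_b = warped[y:y + patch_size, x:x + patch_size]
    return np.stack([patch_a, patch_b]), offsets.flatten()  # X (2x64x64), y_reg (8)
```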
Example dataset figures are generated by visualize_dataset.py.
Explanation:
- 🟦 Blue quadrilateral — original patch
- 🟥 Red quadrilateral — displaced patch
- The right image shows the warped image with the displaced patch rectified
The network predicts the 8 displacement values corresponding to the four patch corners.
- Residual CNN with shared encoder
- Input: 2 × 64 × 64 (original + warped patch)
- Output embedding: 512-dimensional feature vector
- Regression head: directly predicts 8 continuous values
- Classification head: predicts 8 × K logits (K = number of displacement bins)
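
An illustrative PyTorch skeleton of this two-head design. Layer sizes, the class name, and `num_bins=21` are assumptions, and the residual blocks of the actual model.py are omitted for brevity:

```python
import torch
import torch.nn as nn

class TwoHeadHomographyNet(nn.Module):
    """Sketch of a shared encoder with a regression or classification head."""
    def __init__(self, num_bins=21, classification=False):
        super().__init__()
        self.num_bins = num_bins
        self.classification = classification
        self.encoder = nn.Sequential(        # shared encoder over stacked patches
            nn.Conv2d(2, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, 512), nn.ReLU(),  # 512-d embedding
        )
        self.head = nn.Linear(512, 8 * num_bins if classification else 8)

    def forward(self, x):                    # x: (B, 2, 64, 64)
        out = self.head(self.encoder(x))
        if self.classification:
            return out.view(-1, 8, self.num_bins)  # (B, 8, K) logits
        return out                                  # (B, 8) continuous offsets
```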
Loss functions:
- 📉 Regression → RMSE
- 📊 Classification → Cross-Entropy (per displacement component)
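
The two losses might look as follows (assumed forms consistent with the description above; the exact implementations are in train.py):

```python
import torch
import torch.nn.functional as F

def regression_loss(pred, target):
    # RMSE over the 8 corner-offset components.
    return torch.sqrt(F.mse_loss(pred, target))

def classification_loss(logits, target_bins):
    # logits: (B, 8, K); target_bins: (B, 8) integer bin indices (long).
    # Cross-entropy applied independently to each displacement component.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           target_bins.reshape(-1))
```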
Training is performed separately for the regression and classification models.
- Optimizer: Adam
- Mixed precision (AMP) supported
- Best model saved based on validation loss
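
A hedged sketch of the AMP training step (the real loop in train.py may differ in details such as learning rate and scheduling):

```python
import torch

# Assumed setup: Adam optimizer and a CUDA grad scaler, e.g.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# scaler = torch.cuda.amp.GradScaler()

def train_one_epoch(model, loader, optimizer, scaler, loss_fn, device="cuda"):
    model.train()
    for patches, targets in loader:
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():      # mixed-precision forward pass
            loss = loss_fn(model(patches.to(device)), targets.to(device))
        scaler.scale(loss).backward()        # backward on the scaled loss
        scaler.step(optimizer)               # unscales gradients, then steps
        scaler.update()                      # adjust the loss scale
```

Under AMP the forward pass runs in float16 where it is numerically safe, while the scaler guards against gradient underflow in the backward pass.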
Visualizations are generated by visualize_prediction.py.
Legend:
- ⬜ White — original patch
- 🟦 Blue — ground-truth displacement
- 🟥 Red — predicted displacement
The bottom row shows:
- Original patch
- Warped patch
- Rectified patch (using predicted homography)
For the classification model, displacements are predicted as quantized bins and converted back to continuous values (see the sketch below).
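
A sketch of that bin-to-offset conversion, assuming K bins uniformly tiling [-rho, rho] (the bin layout is an assumption; see the scripts for the real mapping):

```python
import torch

def bins_to_offsets(logits, num_bins=21, rho=16.0):
    """logits: (B, 8, K) -> continuous per-component offsets in pixels."""
    bin_idx = logits.argmax(dim=-1).float()   # most likely bin per component
    bin_width = 2.0 * rho / num_bins          # bins uniformly cover [-rho, rho]
    return -rho + (bin_idx + 0.5) * bin_width # map each index to its bin center
```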
Evaluation is performed on a held-out synthetic test set using RMSE.
A traditional feature-based homography estimation baseline is implemented using:
- SIFT
- ORB
The same RMSE metric is used to ensure fair comparison with the neural models.
Results are saved to results/ and plotted alongside the deep learning methods.
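
A rough sketch of the feature-based baseline (SIFT shown; ORB is analogous with `cv2.ORB_create` and a Hamming-distance matcher). The matcher and RANSAC settings are assumptions; see evaluate_classical.py for the actual pipeline:

```python
import cv2
import numpy as np

def estimate_homography_sift(patch_a, patch_b):
    """Estimate H between two grayscale patches, or None on failure."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(patch_a, None)
    kp_b, des_b = sift.detectAndCompute(patch_b, None)
    if des_a is None or des_b is None:
        return None  # not enough texture for feature matching

    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des_a, des_b)
    if len(matches) < 4:
        return None  # findHomography needs at least 4 correspondences

    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```

The corner displacements implied by the recovered H can then be scored with the same RMSE metric as the learned models.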
```bash
# 1. Generate datasets
python generate_dataset.py

# 2. Train models
python train.py

# 3. Evaluate models
python evaluate.py

# 4. Evaluate classical baseline
python evaluate_classical.py

# 5. Visualize dataset and predictions
python visualize_dataset.py
python visualize_prediction.py
```

- All images are grayscale and normalized to [0, 1]
- Patch size and displacement range are configurable
- The pipeline is fully deterministic with fixed random seeds
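
An example of the kind of seeding that makes the pipeline reproducible (the exact seed handling lives in the individual scripts):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42):
    """Fix all relevant RNGs for deterministic runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True  # trade speed for reproducibility
    torch.backends.cudnn.benchmark = False
```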
⭐ If you find this project useful, feel free to star the repository.








