This repository contains a complete optical flow pipeline implemented in PyTorch, covering:
- 📦 Dataset loading & preprocessing (FlyingChairs, MPI-Sintel)
- 🧠 Two FlowNet-style CNN architectures (single-scale & multi-scale)
- 🏋️ Training with AMP (mixed precision)
- 📊 Quantitative evaluation using Endpoint Error (EPE)
- 🎨 Extensive visualizations (HSV flow maps, training curves, comparisons)
- 🎥 Optical flow estimation on real videos (GIF & frame outputs)
- 🔍 Classical baseline using OpenCV Farnebäck for comparison
The project is structured as a research-style pipeline, suitable for coursework, experiments, and extensions.
.
├── src/ # Core source code
│ ├── dataset_preparation.py # FlyingChairs + Sintel loaders, .flo parsing, HSV visualization
│ ├── model.py # FlowNetSingleOutput & FlowNetMultiOutput
│ ├── train.py # Training loop (AMP, checkpoints, curves)
│ ├── eval_single.py # Sintel evaluation – single-output model
│ ├── eval_multi.py # Sintel evaluation – multi-output model
│ ├── farneback_eval.py # Farnebäck multi-configuration evaluation
│ └── video_flow.py # Optical flow on real videos + GIF generation
│
├── eval_single_clean/ # Single-output Sintel visualizations & stats
├── eval_multi_clean/ # Multi-output Sintel visualizations & stats
├── farneback_multi_test/ # Farnebäck visual results (multiple configs)
├── vis_output/ # FlyingChairs flow visualizations
├── video_flow_output/ # Video optical flow frames & GIFs
│
├── curve_single.png # Training curve (single-output)
├── curve_multi.png # Training curve (multi-output)
├── video.mp4 # Example input video
└── README.md
The FlyingChairs dataset is used for training the FlowNet models.
Expected structure:
data/FlyingChairs/FlyingChairs_release/data/
├── 00001_img1.ppm
├── 00001_img2.ppm
├── 00001_flow.flo
├── ...
Features:
- Optional resizing with correct flow vector scaling
- PyTorch `Dataset` + `DataLoader` wrappers
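The `.flo` files above follow the Middlebury format: a float32 magic number (202021.25), two int32 dimensions, then interleaved float32 (u, v) pairs. A minimal reader sketch, independent of the repository's own parser in `dataset_preparation.py`:

```python
import numpy as np

TAG_FLOAT = 202021.25  # Middlebury .flo sanity-check magic number

def read_flo(path):
    """Read a Middlebury .flo file into an (H, W, 2) float32 array."""
    with open(path, "rb") as f:
        tag = np.frombuffer(f.read(4), dtype=np.float32)[0]
        if tag != TAG_FLOAT:
            raise ValueError(f"{path}: invalid .flo magic number {tag}")
        w, h = np.frombuffer(f.read(8), dtype=np.int32)
        # Interleaved (u, v) pairs, stored row-major
        data = np.frombuffer(f.read(4 * 2 * w * h), dtype=np.float32)
    return data.reshape(h, w, 2)
```

Note that resizing a flow field also requires multiplying the u/v components by the horizontal/vertical scale factors, which is the "correct flow vector scaling" mentioned above.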
The MPI-Sintel dataset is used for evaluation.
Expected structure:
data/MPI-Sintel-complete/training/
├── clean/
│ ├── alley_1/
│ └── ...
├── flow/
│ ├── alley_1/
│ └── ...
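Given this layout, (image1, image2, flow) triplets can be paired per sequence. A sketch assuming the standard Sintel naming (`frame_XXXX.png` with a matching `frame_XXXX.flo` describing motion from frame t to t+1) — not necessarily identical to the repository's loader:

```python
from pathlib import Path

def sintel_triplets(root):
    """Yield (img1, img2, flow) paths for every consecutive frame pair."""
    root = Path(root)
    for seq in sorted((root / "clean").iterdir()):
        frames = sorted(seq.glob("frame_*.png"))
        for img1, img2 in zip(frames, frames[1:]):
            # Flow file shares the first frame's index
            flow = root / "flow" / seq.name / img1.with_suffix(".flo").name
            yield img1, img2, flow
```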
- Encoder–decoder (U-Net style)
- Predicts one full-resolution flow map
- Loss: EPE (Endpoint Error)
Output:
[B, 2, H, W]
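EPE is the mean Euclidean distance between predicted and ground-truth flow vectors; in NumPy terms:

```python
import numpy as np

def endpoint_error(pred, gt):
    """Mean endpoint error between two (H, W, 2) flow fields."""
    return np.sqrt(((pred - gt) ** 2).sum(axis=-1)).mean()
```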
- Multi-scale FlowNet variant
- Predicts flow at 4 resolutions: H, H/2, H/4, H/8
- Training with multi-scale supervision
Outputs:
(flow0, flow1, flow2, flow3)
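Multi-scale supervision amounts to downsampling the ground truth to each prediction's resolution (rescaling the vectors by the same factor) and summing weighted EPE terms. A hedged sketch — the weights here are illustrative, not the repository's exact values:

```python
import torch
import torch.nn.functional as F

def multiscale_epe(preds, gt, weights=(1.0, 0.5, 0.25, 0.125)):
    """preds: (flow0..flow3) at H, H/2, H/4, H/8; gt: [B, 2, H, W]."""
    total = 0.0
    for pred, w in zip(preds, weights):
        scale = pred.shape[-1] / gt.shape[-1]
        # Downsample GT and rescale vectors to match the coarser grid
        gt_s = F.interpolate(gt, size=pred.shape[-2:], mode="bilinear",
                             align_corners=False) * scale
        epe = torch.norm(pred - gt_s, p=2, dim=1).mean()
        total = total + w * epe
    return total
```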
Training uses:
- ✅ Adam optimizer
- ✅ AMP (Automatic Mixed Precision)
- ✅ Multi-scale loss (for multi-output model)
- ✅ Learning rate decay
- ✅ Periodic validation + checkpointing
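The features above combine into the standard `torch.amp` training-step pattern; this sketch uses placeholder model/batch shapes, not the repository's exact loop:

```python
import torch

def train_step(model, batch, optimizer, scaler, device="cuda"):
    """One AMP training step: forward in autocast, scaled backward pass."""
    img1, img2, gt_flow = (t.to(device) for t in batch)
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        # Concatenate the frame pair along channels, as FlowNet-style nets do
        pred = model(torch.cat([img1, img2], dim=1))
        loss = torch.norm(pred - gt_flow, p=2, dim=1).mean()  # EPE
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

The scaler is constructed once, e.g. `scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())`, so the same loop runs unchanged on CPU.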
Example:
`python src/train.py`
Outputs:
- `checkpoints/flownet_*_best.pt`
- `checkpoints/flownet_*_last.pt`
- Training curves saved as PNG
These curves are generated automatically at the end of training and are already present in the repository root.
- Left: FlowNetSingleOutput training & validation EPE
- Right: FlowNetMultiOutput (multi-scale supervision)
Optical flow is visualized using HSV mapping:
- Hue → direction
- Saturation → magnitude
- Value → constant
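This HSV encoding can be sketched in a few lines of NumPy (using Matplotlib's `hsv_to_rgb`; the normalization choice is illustrative):

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

def flow_to_rgb(flow):
    """Encode an (H, W, 2) flow field as RGB via HSV.

    Hue encodes direction, saturation encodes normalized magnitude,
    value is held constant - the convention described above.
    """
    u, v = flow[..., 0], flow[..., 1]
    angle = np.arctan2(v, u)                     # direction in [-pi, pi]
    mag = np.sqrt(u ** 2 + v ** 2)
    hsv = np.zeros(flow.shape[:2] + (3,))
    hsv[..., 0] = (angle + np.pi) / (2 * np.pi)  # hue in [0, 1]
    hsv[..., 1] = np.clip(mag / (mag.max() + 1e-8), 0, 1)
    hsv[..., 2] = 1.0                            # constant value
    return hsv_to_rgb(hsv)
```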
The following visualizations are generated directly from the FlyingChairs dataset using HSV flow encoding and are saved in the repository.
Each figure shows:
- Image 1
- Image 2
- Ground-truth optical flow (HSV)
Each visualization (already stored in the repository) contains:
- Input image
- Ground-truth flow (HSV)
- Predicted flow (HSV)
- Per-sample Endpoint Error (EPE) shown in the title
Summary statistics are stored in:
eval_single_clean/clean_summary.txt
Each figure visualizes four resolution levels:
- Full resolution (H)
- H / 2
- H / 4
- H / 8
For every scale, ground truth vs prediction is shown using HSV encoding.
Summary statistics are stored in:
eval_multi_clean/clean_summary.txt
The repository includes real Farnebäck results generated with multiple parameter configurations.
Each image shows:
- Input frame
- Ground-truth flow
- Farnebäck-predicted flow
All configurations and quantitative results are summarized in:
farneback_multi_test/summary_all_configs.txt
The following results are generated from the provided video.mp4 and are already stored in the repository.
- FlowNet GIF (LEFT): deep-learning-based optical flow
- Farnebäck GIF (RIGHT): classical OpenCV baseline
EPE (Endpoint Error) is used for:
- Training loss
- Validation
- Sintel benchmarking
- Python ≥ 3.9
- PyTorch ≥ 2.0
- CUDA (optional, recommended)
- OpenCV
- NumPy
- Matplotlib
- ImageIO
- All resizing operations correctly rescale flow vectors
- Models expect spatial dimensions divisible by 8
- Code is written to be readable and modular for experimentation
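Because spatial dimensions must be divisible by 8, inputs can be zero-padded up front. A small helper sketch (padding on the bottom/right is an assumption, not necessarily the repository's exact strategy):

```python
import numpy as np

def pad_to_multiple(img, m=8):
    """Zero-pad an (H, W, C) image on the bottom/right so H and W are multiples of m."""
    h, w = img.shape[:2]
    ph, pw = (-h) % m, (-w) % m  # smallest non-negative padding amounts
    return np.pad(img, ((0, ph), (0, pw), (0, 0)))
```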
Developed as an academic / research project focused on optical flow estimation, combining:
- Deep learning (FlowNet-style CNNs)
- Classical vision (Farnebäck)
- Quantitative evaluation (EPE)
- Rich visual analysis
If you use this repository for coursework or research, feel free to adapt and extend it 🚀