A deep learning project implementing a Convolutional Neural Network for classifying 120 dog breeds with 83.16% accuracy using transfer learning with ResNet-50.
Quick Links:
- Getting Started Guide - 5-minute setup
- FAQ - Common questions answered
- Project Structure - File organization explained
- View Presentation - Academic presentation
- Overview
- Quick Start
- Features
- Architecture
- Results
- Installation
- Usage
- Dataset
- Training Details
- Future Improvements
- Real-World Applications
- Project Presentation
- License
- Contact
This project addresses fine-grained visual classification of 120 dog breeds from the Stanford Dogs dataset. The model uses transfer learning with a pre-trained ResNet-50 backbone and reaches 83.16% test accuracy after roughly 15 minutes of training on a Tesla T4 GPU.
Key Achievements:
- 83.16% test accuracy on 120-class classification
- Training time: ~15 minutes on Tesla T4 GPU
- 1.1M trainable parameters (frozen base model)
- Comprehensive evaluation with confusion matrices and classification reports
# Clone the repository
git clone https://github.com/BenFricker/dog-breed-cnn-classifier.git
cd dog-breed-cnn-classifier
# Install dependencies
pip install -r requirements.txt
# Download dataset (see Dataset section below)
# Update data_dir in Dog-Breed-CNN.py (line 99)
# Train the model
python Dog-Breed-CNN.py

That's it! The script trains the model, generates visualizations, and saves the best model checkpoint.
- Transfer Learning: Utilizes pre-trained ResNet-50 (ImageNet weights)
- Custom Classification Head: Multi-layer Sequential classifier with dropout regularization
- Data Augmentation Pipeline: Random rotation, horizontal flip, and color jittering
- Model Checkpointing: Automatic saving of best performing model
- Comprehensive Metrics: Classification reports, confusion matrices, training curves
- Production Ready: Clean, documented code with proper error handling
Complete pipeline: Model Architecture → Data Pipeline → Training Strategy
- ResNet-50 (pre-trained on ImageNet)
- 23.5M frozen parameters for feature extraction
Linear(2048 → 512)
ReLU Activation
Dropout(p=0.3)
Linear(512 → 120)
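
As a rough illustration, the head listed above could be assembled in PyTorch as follows. This is a minimal sketch rather than the code from `Dog-Breed-CNN.py`: the variable names are assumptions, and the frozen ResNet-50 backbone is left out, so the head is fed a dummy batch of 2048-dimensional features instead. The parameter count it reports agrees with the roughly 1.1M trainable parameters quoted earlier.

```python
import torch
from torch import nn

# Classification head as listed above: 2048 -> 512 -> 120.
# `head` and `dummy_features` are illustrative names, not taken from the script.
head = nn.Sequential(
    nn.Linear(2048, 512),
    nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(512, 120),
)

# Trainable parameters: (2048*512 + 512) + (512*120 + 120).
n_params = sum(p.numel() for p in head.parameters() if p.requires_grad)

# Stand-in for the pooled ResNet-50 features of a batch of 4 images.
dummy_features = torch.randn(4, 2048)
head.eval()  # disable dropout for a deterministic forward pass
with torch.no_grad():
    logits = head(dummy_features)

print(n_params)      # 1110648
print(logits.shape)  # torch.Size([4, 120])
```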
- Optimizer: Adam (lr=0.001)
- Loss Function: CrossEntropyLoss
- Scheduler: StepLR (step_size=5, gamma=0.1)
- Batch Size: 32
- Epochs: 10
- Data Split: 70% Train / 15% Validation / 15% Test
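
To see what the scheduler settings mean in practice, the sketch below reproduces the StepLR rule in plain Python: the learning rate is multiplied by `gamma` once every `step_size` epochs. The function name and the zero-based epoch argument are assumptions made for illustration; the training script itself presumably relies on `torch.optim.lr_scheduler.StepLR`.

```python
def step_lr(epoch, base_lr=0.001, step_size=5, gamma=0.1):
    """Learning rate used during `epoch` (0-based) under a StepLR-style schedule."""
    return base_lr * gamma ** (epoch // step_size)

# Print the schedule for the 10 training epochs (shown 1-based).
for epoch in range(10):
    print(f"epoch {epoch + 1:2d}: lr = {step_lr(epoch):.4f}")
```

With these settings, epochs 1 to 5 run at 0.001 and epochs 6 to 10 at 0.0001, which lines up with the accuracy jump reported at epoch 6.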
| Metric | Value |
|---|---|
| Test Accuracy | 83.16% |
| Best Validation Accuracy | 83.67% (Epoch 9) |
| Macro-Precision | 83.1% |
| Macro-Recall | 82.8% |
| Macro-F1 Score | 82.5% |
| Training Time | 15 minutes 23 seconds |
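
For readers unfamiliar with macro averaging, the sketch below computes macro precision, recall, and F1 from a confusion matrix in plain Python. The 3-class matrix is made up purely for illustration and is unrelated to the project's results, and the function name is an assumption. Every class contributes equally to a macro average, and macro-F1 is the mean of the per-class F1 scores, so it does not have to lie between macro-precision and macro-recall.

```python
def macro_scores(cm):
    """Return (macro_precision, macro_recall, macro_f1) for a square
    confusion matrix where cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy 3-class example (hypothetical counts, not project data).
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
macro_p, macro_r, macro_f1 = macro_scores(toy_cm)
print(f"precision={macro_p:.3f} recall={macro_r:.3f} f1={macro_f1:.3f}")
```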
Key Observations:
- Rapid initial learning: 47.82% → 75.74% validation accuracy in epoch 1
- Learning rate drop impact: Accuracy jumped from 80.62% → 83.32% at epoch 6
- Best model: Epoch 9 with 83.67% validation accuracy
- Steady convergence: Final epochs show minimal fluctuation (±0.5%)
Top Performing Breeds (F1 > 0.97):
- Afghan Hound, Keeshond, Saint Bernard achieved perfect classification (F1 = 1.000)
- These breeds have distinctive features: unique coat patterns, clear size differences, distinctive physical characteristics
Challenging Breeds (F1 < 0.60):
- Poodle varieties (Miniature, Toy) and similar fluffy white breeds
- Difficulty due to: inter-breed similarity, morphological overlap, size similarities
Apart from the challenging breeds noted above, the model performs well across the 120 classes and shows minimal overfitting: test accuracy (83.16%) closely matches the best validation accuracy (83.67%).
- Python 3.8+
- CUDA-capable GPU (recommended)
# Clone the repository
git clone https://github.com/BenFricker/dog-breed-cnn-classifier.git
cd dog-breed-cnn-classifier
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

Option A: Download from Kaggle (Recommended - Easier)
- Create a free account at Kaggle.com
- Go to Stanford Dogs Dataset
- Click "Download" button
- Extract the downloaded ZIP file to a location on your computer
Option B: Download from Stanford (Original Source)
- Visit: http://vision.stanford.edu/aditya86/ImageNetDogs/
- Download `Images.tar` and extract it to your preferred location
Expected folder structure after extraction:
your-chosen-location/
└── Images/
    ├── n02085620-Chihuahua/
    ├── n02085782-Japanese_spaniel/
    ├── n02085936-Maltese_dog/
    └── ... (117 more breed folders)
Open Dog-Breed-CNN.py in any text editor and find line 99:
# BEFORE (line 99):
data_dir = r'C:\Users\benwf\OneDrive\Desktop\UOW\UOW\Foundations of Artificial Intelligence\Assessments\Group Project\images'
# AFTER (update to YOUR path):
data_dir = r'C:/Users/YourName/Downloads/Images' # Windows
# OR
data_dir = '/Users/YourName/Downloads/Images' # Mac/Linux

Tip: Use forward slashes `/` or raw strings `r'...'` to avoid path issues.
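
Before launching a long training run, it can help to confirm that `data_dir` really points at the extracted `Images` folder. The helper below is a hypothetical addition, not part of `Dog-Breed-CNN.py`; it only counts subfolders and assumes the layout shown above, with one subfolder per breed.

```python
from pathlib import Path

def count_breed_folders(data_dir):
    """Return the number of subfolders in data_dir, raising a clear
    error if the directory does not exist."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"data_dir does not exist: {root}")
    return sum(1 for entry in root.iterdir() if entry.is_dir())

# Example (placeholder path; replace it with your own):
# n = count_breed_folders('/Users/YourName/Downloads/Images')
# The full Stanford Dogs dataset should give 120 breed folders.
```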
python Dog-Breed-CNN.py

What happens during training:
- Loads and preprocesses 20,580 images
- Splits data (70% train, 15% validation, 15% test)
- Trains for 10 epochs (~70 seconds per epoch on GPU)
- Saves best model as `best_dog_breed_model.pth`
- Generates training curves and confusion matrices
After training completes, you'll find these files in your directory:
| File | Description |
|---|---|
| `best_dog_breed_model.pth` | Trained model checkpoint (best validation accuracy) |
| `training_curves.png` | Loss and accuracy plots over epochs |
| `confusion_matrix_part_1.png` | Confusion matrix (classes 0-29) |
| `confusion_matrix_part_2.png` | Confusion matrix (classes 30-59) |
| `confusion_matrix_part_3.png` | Confusion matrix (classes 60-89) |
| `confusion_matrix_part_4.png` | Confusion matrix (classes 90-119) |
Console output will show:
- Training/validation accuracy and loss per epoch
- Final test accuracy (~83%)
- Detailed classification report (precision, recall, F1-score per breed)
Problem: FileNotFoundError: [Errno 2] No such file or directory
- Solution: Check that the `data_dir` path is correct and points to the `Images` folder
Problem: CUDA out of memory
- Solution: Reduce `batch_size` from 32 to 16 (line 163 in the code)
Problem: Training is very slow
- Solution: Ensure you have a CUDA-capable GPU. CPU training will take 10-20x longer.
Problem: ModuleNotFoundError: No module named 'torch'
- Solution: Run `pip install -r requirements.txt`
Stanford Dogs Dataset
- Total Images: 20,580
- Classes: 120 dog breeds
- Split:
- Training: 14,405 images (70%)
- Validation: 3,087 images (15%)
- Testing: 3,088 images (15%)
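
The split sizes above can be reproduced with a seeded shuffle of image indices. This is a plain-Python sketch under stated assumptions: the seed, the function name, and the shuffle-then-slice approach are illustrative, and the actual script may split differently (for example with `torch.utils.data.random_split`). The sizes are copied from the list above.

```python
import random

def split_indices(n_total, n_train, n_val, seed=42):
    """Shuffle indices 0..n_total-1 and cut them into train/val/test lists.
    The test split receives whatever remains after train and val."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(20580, 14405, 3087)
print(len(train_idx), len(val_idx), len(test_idx))  # 14405 3087 3088
```

Fixing the seed keeps the three subsets disjoint and identical between runs, so test results stay comparable.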
- Resize: 224×224
- Random Horizontal Flip
- Random Rotation: ±15°
- Color Jitter: ±20% brightness/contrast
- ImageNet Normalization
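
ImageNet normalization rescales each RGB channel with fixed statistics so that inputs resemble what the pre-trained backbone saw during its original training. The sketch below applies it to a single pixel in plain Python. The mean and standard deviation are the standard ImageNet values used with torchvision's pre-trained models, and the function name is an assumption; in the script this step is presumably handled by `torchvision.transforms.Normalize`.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to ImageNet-normalized floats:
    scale to [0, 1], subtract the channel mean, divide by the channel std."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

print(normalize_pixel((255, 255, 255)))  # brightest possible pixel
print(normalize_pixel((0, 0, 0)))        # darkest possible pixel
```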
- Parameter Freezing: Reduces training time by 60% while maintaining transfer learning benefits
- Dropout (0.3): Prevents overfitting without sacrificing learning capacity
- Learning Rate Scheduling: Enables fine-grained convergence in later epochs
- Model Checkpointing: Saves best model based on validation accuracy
- Rapid initial learning due to pre-trained features
- Learning rate drop at epoch 6 provided significant boost (80.62% → 83.32%)
- Minimal overfitting - test accuracy closely matches validation
- Strong performance across most of the 120 classes, with the weakest results on visually similar breeds such as Miniature and Toy Poodles
- Ensemble models (Expected: +2-3%)
- Fine-tuning deeper layers (Expected: +2-3%)
- Attention mechanisms (Expected: +3-5%)
- Increase epochs to 20-30
- Class-weighted loss for imbalanced breeds
- Advanced augmentation (Mixup, CutMix)
- Focal loss for hard examples
Target Accuracy: 86-90%
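
Of the ideas above, focal loss is probably the least self-explanatory. It multiplies the cross-entropy term by (1 - p)^gamma, where p is the predicted probability of the correct class, so confidently classified examples contribute less and hard examples dominate the gradient. The sketch below evaluates it for a single example in plain Python; the function name and the default `gamma=2.0` (a commonly used value) are illustrative choices, not settings from this project.

```python
import math

def focal_loss(p_true, gamma=2.0):
    """Focal loss for one example, where p_true is the predicted
    probability of the correct class. gamma=0 gives plain cross-entropy."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

# Compare cross-entropy (gamma=0) with focal loss for easy and hard examples.
for p in (0.9, 0.5, 0.1):
    ce = focal_loss(p, gamma=0.0)
    fl = focal_loss(p)
    print(f"p={p:.1f}  cross-entropy={ce:.4f}  focal={fl:.4f}")
```

The easy example (p = 0.9) is down-weighted far more strongly than the hard one (p = 0.1), which is the behaviour that could help with look-alike breeds.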
- Veterinary Services: Automated breed identification for health screening
- Pet Adoption Platforms: Intelligent breed tagging and matching
- Lost Pet Recovery: Identification systems for reunification
- Mobile Applications: Consumer-facing breed recognition apps
A comprehensive presentation covering the project's methodology, results, and analysis is included in this repository:
The presentation includes:
- Literature review of current approaches
- Detailed architecture and training strategy
- Complete results analysis with visualizations
- Real-world applications and future directions
This project was developed as part of a Foundations of Artificial Intelligence course. Despite being assigned as a group project, I completed all aspects independently:
- โ Data acquisition and preprocessing
- โ Model architecture design and implementation
- โ Training pipeline and optimization
- โ Comprehensive evaluation and visualization
- โ 15-minute technical presentation
- โ Complete documentation
Academic Note: I received 93%+ for this project. I was the only contributing member out of a 6-person team. Non-contributing members received 0% as documented by the course coordinator.
- Deep Learning: CNN architectures, transfer learning, fine-tuning
- PyTorch: Model implementation, training loops, data pipelines
- Computer Vision: Image classification, data augmentation
- Data Science: Performance metrics, visualization, statistical analysis
- Software Engineering: Clean code, documentation, version control
- Communication: Technical presentation and reporting
While this is an academic project, I welcome feedback and suggestions! Feel free to:
- Open an issue for bugs or questions
- Suggest improvements or optimizations
- Share your results if you use this code
This project is licensed under the MIT License - see the LICENSE file for details.
Benjamin Fricker
- GitHub: @BenFricker
- LinkedIn: Connect with me on LinkedIn
Current Focus: Double Major in Artificial Intelligence & Cybersecurity (Computer Science)
Open to opportunities in AI/ML Engineering, Computer Vision, and Cybersecurity roles.
- Stanford Dogs Dataset creators
- PyTorch and torchvision teams
- ResNet architecture authors (He et al., 2015)
If you find this project useful or interesting, please consider giving it a star!
Last Updated: October 2025



