A CNN-based image classification system for identifying chicken diseases using deep learning and computer vision techniques. This project uses DVC (Data Version Control) for pipeline management and reproducibility.
- Deep Learning: Uses a CNN (VGG16 based) for image classification.
- DVC Pipeline: Automated pipeline for data ingestion, model definition, training, and evaluation.
- Web Interface: User-friendly web app built with FastAPI and Bootstrap for easy interaction.
- Reproducibility: Global random seed (42) for consistent results, plus GPU memory growth configuration to prevent allocation errors.
- Experiment Tracking: Tracks parameters (epochs, batch size, etc.) and metrics (accuracy, loss).
- AWS S3 Storage: Hybrid local/cloud artifact storage with automatic model uploads.
- CloudWatch Monitoring: Real-time training & prediction logging with custom metrics.
This project supports AWS S3 for artifact storage and AWS CloudWatch for monitoring and observability.
graph LR
subgraph Local["🖥️ Local Environment"]
TP["Training Pipeline"]
FA["FastAPI App /predict"]
CWC["CloudWatch Callback"]
end
subgraph AWS["☁️ AWS Cloud"]
S3["S3 Bucket\n(Model, Data, Logs)"]
CWL["CloudWatch Logs\n(Predictions, Errors)"]
CWM["CloudWatch Metrics\n(Loss, Accuracy)"]
end
TP -->|upload| S3
FA -->|logs| CWL
CWC -->|metrics| CWM
style AWS fill:#FF9900,color:#fff
style Local fill:#232F3E,color:#fff
- Copy `.env.example` to `.env` and fill in your AWS credentials: `cp .env.example .env`
- Set `STORAGE_MODE=s3` and `ENABLE_CLOUDWATCH=true` in `.env`.
- Create your S3 bucket and CloudWatch log group in the AWS Console.
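As a rough illustration of how the two switches above might be consumed, the sketch below parses `STORAGE_MODE` and `ENABLE_CLOUDWATCH` into a small settings object. It assumes the values from `.env` have already been loaded into the process environment. The `CloudSettings` class, the `load_cloud_settings` function, the `local` fallback, and the accepted spellings are all assumptions made for illustration, not the project's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CloudSettings:
    """Hypothetical container for the two switches set in .env."""
    storage_mode: str        # "local" or "s3"
    enable_cloudwatch: bool  # whether logs/metrics should go to CloudWatch


def load_cloud_settings(env: Optional[Mapping[str, str]] = None) -> CloudSettings:
    """Read the switches from os.environ, or from a supplied mapping."""
    env = os.environ if env is None else env
    # Assumption: fall back to local storage when STORAGE_MODE is unset.
    mode = env.get("STORAGE_MODE", "local").strip().lower()
    if mode not in {"local", "s3"}:
        raise ValueError(f"Unsupported STORAGE_MODE: {mode!r}")
    # Assumption: only the string "true" (any case) enables CloudWatch.
    cloudwatch = env.get("ENABLE_CLOUDWATCH", "false").strip().lower() == "true"
    return CloudSettings(storage_mode=mode, enable_cloudwatch=cloudwatch)
```

Calling `load_cloud_settings()` with no argument reads from the real environment. Passing a plain dictionary makes the parsing easy to check in isolation.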
| Metric | Description |
|---|---|
| `TrainingLoss` | Loss per epoch during training |
| `TrainingAccuracy` | Accuracy per epoch during training |
| `PredictionCount` | Count per prediction (with `Class` dimension) |
| `PredictionLatency` | Time taken per prediction (seconds) |
| `DiseaseDetected` | Binary flag for disease detection |
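To make the prediction metrics above concrete, here is a minimal sketch of how they could be shaped for the CloudWatch `put_metric_data` call in `boto3`. The namespace, the function names, and the class label in the usage note are hypothetical. The metric names and the `Class` dimension come from the table. The project's own callback and logging code may be organised differently.

```python
from typing import Any, Dict, List

# Hypothetical namespace; the project's real namespace is not documented here.
NAMESPACE = "ChickenDiseaseClassifier"


def build_prediction_metrics(
    predicted_class: str, latency_seconds: float, disease_detected: bool
) -> List[Dict[str, Any]]:
    """Build the MetricData entries for a single prediction."""
    return [
        {
            "MetricName": "PredictionCount",
            # One count per prediction, split by predicted class.
            "Dimensions": [{"Name": "Class", "Value": predicted_class}],
            "Value": 1.0,
            "Unit": "Count",
        },
        {
            "MetricName": "PredictionLatency",
            "Value": float(latency_seconds),
            "Unit": "Seconds",
        },
        {
            "MetricName": "DiseaseDetected",
            # Binary flag encoded as 1.0 or 0.0.
            "Value": 1.0 if disease_detected else 0.0,
            "Unit": "None",
        },
    ]


def publish(client: Any, metric_data: List[Dict[str, Any]]) -> None:
    """Send the entries using a boto3 CloudWatch client (or a test double)."""
    client.put_metric_data(Namespace=NAMESPACE, MetricData=metric_data)
```

With AWS credentials configured, usage would look like `publish(boto3.client("cloudwatch"), build_prediction_metrics("Coccidiosis", 0.12, True))`, where the label `"Coccidiosis"` is only an example value. Keeping payload construction separate from the network call lets the payload be checked without AWS access.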
Training Logs in CloudWatch:
Prediction Logs in CloudWatch:
The pipeline has been tested on the following configuration:
- OS: macOS 26.1
- Model: MacBook Pro
- Chip: Apple M3 Pro
- Cores: 11 (5 performance and 6 efficiency)
- Memory: 18 GB
- Clone the repository: `git clone https://github.com/neehanthreddym/chicken_disease_clf.git`, then `cd chicken_disease_clf`
- Install dependencies: `pip install -r requirements.txt`
To run the entire machine learning pipeline (Data Ingestion -> Model Definition -> Training -> Evaluation), run `dvc repro`.
This will check for changes in dependencies and only run the necessary stages.
- Data Ingestion (`stage01_data_ingestion.py`): Downloads and extracts the dataset.
- Model Definition (`stage02_model_definition.py`): Prepares the VGG16 base model.
- Model Training (`stage03_training.py`): Trains the model with augmented data.
- Model Evaluation (`stage04_evaluation.py`): Evaluates the trained model and saves scores.
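The "only run the necessary stages" behavior of `dvc repro` can be pictured as a comparison between the current hash of each stage dependency and a previously recorded hash. The sketch below illustrates that idea only. It is not DVC's implementation, and the function names and the use of SHA-256 are assumptions chosen for the example. DVC keeps its own records in `dvc.lock`.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def file_hash(path: Path) -> str:
    """Content hash of a single dependency file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stage_is_stale(deps: Iterable[Path], recorded: Dict[str, str]) -> bool:
    """Return True if any dependency is unrecorded or its content has changed."""
    for dep in deps:
        if recorded.get(str(dep)) != file_hash(dep):
            return True
    return False


def record_hashes(deps: Iterable[Path]) -> Dict[str, str]:
    """Snapshot the current hashes, as would be done after a successful run."""
    return {str(dep): file_hash(dep) for dep in deps}
```

In this picture a stage is skipped when `stage_is_stale` returns `False`. Editing a dependency such as a stage script or a parameter file changes its hash and marks the stage for a rerun.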
The project includes a FastAPI-based web interface to easily classify images.
- Start the application: `python app.py`
- Open your browser and navigate to `http://localhost:8000`.
- Random Seed: A global seed of `42` is set for Python, NumPy, and TensorFlow to ensure reproducible training runs.
- GPU Config: TensorFlow GPU memory growth is enabled to prevent allocation errors.
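The two settings above can be sketched as follows. This is a minimal illustration that assumes NumPy and TensorFlow are installed. The function names are hypothetical, and the project's own setup code may differ. The imports are wrapped so the snippet still runs for Python's `random` module alone when the other libraries are missing.

```python
import random


def set_global_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and TensorFlow random number generators."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass  # TensorFlow not installed; nothing to seed


def enable_gpu_memory_growth() -> int:
    """Turn on memory growth for every visible GPU; return how many were found.

    Call this at startup, before TensorFlow has initialized the GPUs.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Allocate GPU memory on demand instead of reserving it all up front.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Both functions would typically be called once at the start of the training entry point, before any model or dataset is created.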
Follow these steps when making changes to the project:
- Update `config.yaml`: Modify system configuration settings (paths, URLs).
- Update `secrets.yaml` (Optional): Add sensitive credentials like API keys.
- Update `params.yaml`: Adjust parameters for model training/testing (Epochs, Batch Size, etc.).
- Update the Entity: Modify data entities (dataclasses) in `src/entity` for accurate input/output.
- Update the Configuration Manager: Adjust `src/config/configuration.py` to handle new configs.
- Update the Components: Modify or add components in `src/components`.
- Update the Pipeline: Update the processing steps in `src/pipeline`.
- Update `main.py`: Modify the main script if necessary.
- Update `dvc.yaml`: Update stage dependencies and outputs if the workflow changes.
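To show how the entity and configuration manager steps fit together, here is a small sketch in which values from `config.yaml` and `params.yaml` are combined into a typed dataclass that a component can consume. The class names, field names, and dictionary keys are hypothetical. Plain dictionaries stand in for the parsed YAML files so the example has no external dependencies.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical entity: the typed inputs a training component needs."""
    root_dir: Path
    epochs: int
    batch_size: int


class ConfigurationManager:
    """Hypothetical manager that turns raw config and params into entities."""

    def __init__(self, config: Dict[str, Any], params: Dict[str, Any]) -> None:
        self.config = config  # stands in for parsed config.yaml
        self.params = params  # stands in for parsed params.yaml

    def get_training_config(self) -> TrainingConfig:
        # Paths come from the system config, hyperparameters from params.
        return TrainingConfig(
            root_dir=Path(self.config["training"]["root_dir"]),
            epochs=int(self.params["EPOCHS"]),
            batch_size=int(self.params["BATCH_SIZE"]),
        )
```

For example, `ConfigurationManager({"training": {"root_dir": "artifacts/training"}}, {"EPOCHS": 5, "BATCH_SIZE": 16}).get_training_config()` yields a `TrainingConfig` that a component in `src/components` could take as its only argument. Adding a new setting then means touching the YAML file, the entity, and the manager method, in that order.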

