Built for Adobe Machine Learning Engineer (2026 Batch) Application
A production-grade MLOps pipeline that trains, deploys, monitors, and auto-retrains an image classification model, modeled on the kind of infrastructure that powers AI features such as Adobe Firefly and Creative Cloud.
A complete end-to-end MLOps system with:
- Image Classification API — EfficientNet-B0 model trained on 14,000 real images achieving 86.85% validation accuracy
- A/B Testing — Deterministic per-user routing (80% stable / 20% canary) with instant rollback
- MLflow Tracking — Full experiment tracking, model registry, and versioning
- Drift Detection — Automatic data drift monitoring with Evidently AI
- CI/CD Pipeline — GitHub Actions with lint, test, and Docker build stages ✅ Passing
Try the live API docs: [project website](https://ml-ops-image-classification-production.up.railway.app/docs#/default/predict_predict_post). The demo is currently deployed on railway.app and serves a model trained for 5 epochs. Training for around 10 epochs is expected to raise prediction confidence from roughly 20% to 80-90%.
┌─────────────────────────────────────────────────────────────┐
│                    GitHub Actions CI/CD                     │
│         push → lint → test → Docker build → deploy          │
└───────────────────────┬─────────────────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
┌─────────────┐ ┌──────────────┐ ┌──────────────────┐
│   FastAPI   │ │    MLflow    │ │    PostgreSQL    │
│  A/B Test   │ │   Registry   │ │     Database     │
│  port 8000  │ │  port 5001   │ │    port 5432     │
└─────────────┘ └──────────────┘ └──────────────────┘
| Category | Tools |
|---|---|
| ML Framework | PyTorch 2.3, EfficientNet-B0, Scikit-learn |
| MLOps | MLflow 2.13 (experiment tracking, model registry) |
| Serving | FastAPI, Uvicorn |
| Containerisation | Docker, Docker Compose |
| CI/CD | GitHub Actions ✅ |
| Drift Detection | Evidently AI |
| Database | PostgreSQL |
| Cloud Ready | Terraform (AWS ECR, S3, SageMaker) |

| Metric | Value |
|---|---|
| Model | EfficientNet-B0 (ImageNet pretrained) |
| Dataset | Intel Image Classification (14,000 real images) |
| Classes | buildings, forest, glacier, mountain, sea, street |
| Training Epochs | 10 |
| Best Val Accuracy | 86.85% |
| Model Version | v2 (registered in MLflow) |
- Docker Desktop installed
- 4GB free disk space
git clone https://github.com/sumith25-dev/ML-ops-Image-Classification.git
cd ML-ops-Image-Classification
docker-compose up -d api mlflow postgres

| Service | URL |
|---|---|
| FastAPI docs | http://localhost:8000/docs |
| MLflow UI | http://localhost:5001 |
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://images.pexels.com/photos/1547813/pexels-photo-1547813.jpeg",
    "user_id": "user_001"
  }'

Response:
{
  "label": "forest",
  "confidence": 0.87,
  "model_version": "stable",
  "latency_ms": 245.32
}

User ID → MD5 Hash → Bucket (0-9999)
  0-7999    → stable model (80% traffic)
  8000-9999 → canary model (20% traffic)
Same user always hits the same model — consistent experience.
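
The routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `app/model_manager.py`: the function names `bucket_for` and `route`, and the choice to reduce the full MD5 hex digest modulo 10,000, are assumptions made for the example.

```python
import hashlib

NUM_BUCKETS = 10_000   # buckets 0-9999, as in the routing table above
CANARY_START = 8_000   # buckets 8000-9999 are served by the canary model


def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, NUM_BUCKETS).

    Assumption: the bucket is the MD5 digest read as an integer, modulo
    NUM_BUCKETS. The real service may derive it differently.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def route(user_id: str, canary_start: int = CANARY_START) -> str:
    """Return 'canary' or 'stable' for this user (hypothetical helper)."""
    return "canary" if bucket_for(user_id) >= canary_start else "stable"


if __name__ == "__main__":
    for uid in ("user_001", "user_002", "user_003"):
        print(uid, bucket_for(uid), route(uid))
```

Because the bucket depends only on the user ID, repeated requests from one user always resolve to the same model. Changing `canary_start` changes the split without touching the hash, which is one plausible way to make the split configurable at runtime.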
Train → MLflow Registry → Assign 'stable' alias → API loads automatically
→ Assign 'canary' alias → 20% traffic routed
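
To make the alias flow concrete, here is a toy in-memory stand-in for an alias-based registry. It is not MLflow and not this project's code; the class `AliasRegistry` and its methods are hypothetical. In MLflow itself, aliases are set with `MlflowClient.set_registered_model_alias` and resolved through `models:/<name>@<alias>` URIs.

```python
from dataclasses import dataclass, field


@dataclass
class AliasRegistry:
    """Toy registry: aliases such as 'stable' and 'canary' point at versions."""

    aliases: dict[str, int] = field(default_factory=dict)
    history: dict[str, list[int]] = field(default_factory=dict)

    def set_alias(self, alias: str, version: int) -> None:
        """Point an alias at a version and remember the previous target."""
        if alias in self.aliases:
            self.history.setdefault(alias, []).append(self.aliases[alias])
        self.aliases[alias] = version

    def resolve(self, alias: str) -> int:
        """Return the version an alias currently points at."""
        return self.aliases[alias]

    def rollback(self, alias: str) -> int:
        """Restore the previous version for an alias, if there is one."""
        previous_versions = self.history.get(alias)
        if not previous_versions:
            raise ValueError(f"no earlier version recorded for {alias!r}")
        self.aliases[alias] = previous_versions.pop()
        return self.aliases[alias]


if __name__ == "__main__":
    registry = AliasRegistry()
    registry.set_alias("stable", 1)
    registry.set_alias("canary", 2)
    registry.set_alias("stable", 2)  # promote the canary version to stable
    print(registry.resolve("stable"))
    print(registry.rollback("stable"))  # one call restores the old version
```

The point of the sketch is that the serving layer only ever asks for an alias, so promotion and rollback are pointer moves rather than redeployments.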
Run drift_detector.py
→ If drift > 30% OR accuracy < 80%
→ Flag for retraining
→ Promote new model to canary
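
The retraining rule can be written as a small decision function. This is a sketch under stated assumptions, not the contents of `monitoring/drift_detector.py`: the names `should_retrain` and `RetrainDecision` are hypothetical, and the drift threshold is treated as inclusive (`>=`), which is one reading of the 30% rule.

```python
from dataclasses import dataclass

DRIFT_SHARE_THRESHOLD = 0.30  # share of features flagged as drifted
MIN_ACCURACY = 0.80           # retrain when accuracy falls below this


@dataclass
class RetrainDecision:
    retrain: bool
    reasons: list[str]


def should_retrain(drift_share: float, accuracy: float) -> RetrainDecision:
    """Flag retraining when either condition holds (hypothetical helper)."""
    reasons = []
    if drift_share >= DRIFT_SHARE_THRESHOLD:
        reasons.append(
            f"drift_share {drift_share:.2f} >= {DRIFT_SHARE_THRESHOLD:.2f}"
        )
    if accuracy < MIN_ACCURACY:
        reasons.append(f"accuracy {accuracy:.2f} < {MIN_ACCURACY:.2f}")
    return RetrainDecision(retrain=bool(reasons), reasons=reasons)


if __name__ == "__main__":
    print(should_retrain(drift_share=0.10, accuracy=0.87))
    print(should_retrain(drift_share=0.45, accuracy=0.87))
```

Returning the reasons alongside the boolean keeps the decision auditable, which helps when a retraining run is triggered unattended.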
git push → GitHub Actions
→ Lint (ruff) ✅
→ Unit Tests ✅
→ Docker Build ✅
ML-ops-Image-Classification/
├── app/
│ ├── main.py # FastAPI — predict, health, metrics, rollback
│ └── model_manager.py # MLflow loading, A/B routing, inference
├── model/
│ └── train.py # EfficientNet training with MLflow tracking
├── monitoring/
│ └── drift_detector.py # Evidently drift analysis
├── pipeline/
│ └── mlops_dag.py # Airflow DAG — drift → retrain → promote
├── tests/
│ └── test_api.py # Unit tests
├── infra/
│ ├── main.tf # Terraform — AWS ECR, S3, SageMaker
│ └── prometheus.yml # Prometheus config (future deployment)
├── notebooks/
│ └── exploration.ipynb # EDA and drift visualisation
├── .github/workflows/
│ └── ci_cd.yml # GitHub Actions CI/CD ✅ passing
├── docker-compose.yml # Full local stack
├── Dockerfile # API container
└── requirements.txt
# Run manually or schedule via Airflow
python monitoring/drift_detector.py
# Auto-retraining triggers when:
# - drift_share >= 30% of features drifted
# - model accuracy < 80%

cd infra
terraform init
terraform apply -var="environment=prod"

Provisions:
- ECR repository for Docker images
- S3 bucket for MLflow artifacts
- SageMaker endpoint with blue/green deployment
- IAM roles and policies

- 86.85% accuracy on 14,000 real images using EfficientNet-B0
- A/B testing: deterministic per-user hash routing for a consistent user experience, with a split that is configurable at runtime
- CI/CD: every commit is automatically linted, tested, and built into a Docker image via GitHub Actions ✅
- Drift detection: Evidently AI detects when production data drifts from the training distribution, triggering automatic retraining
- MLflow registry: complete model versioning with aliases, metrics tracking, and artifact storage
- Zero-downtime deployment: blue/green deployment with instant rollback via a single API call

Sumith B R — Applying for Adobe Machine Learning Engineer (2026 Batch)
GitHub: @sumith25-dev
MIT License