This project implements a production-grade machine learning system that predicts credit card default risk, continuously monitors incoming data for drift, and automatically retrains itself when real-world data changes.
Unlike notebook-based ML projects, this system mimics how financial institutions deploy, monitor, and maintain ML models in production.
- Trains a credit risk model using historical customer data
- Deploys the model using a FastAPI inference service
- Scores new customer data in real time and in batch
- Monitors incoming production data for data drift
- Automatically retrains the model when drift is detected
- Replaces the deployed model only if the new one performs better
High-level Flow
Historical Data → Model Training → Live API → Production Data → Drift Detection → Retraining → Model Update
Architecture Overview
| Layer | Responsibility | Tools |
| --- | --- | --- |
| Data Layer | Stores historical and live customer data | Pandas, CSV |
| Training Layer | Trains credit default model | Scikit-learn |
| Serving Layer | Provides real-time predictions | FastAPI |
| Monitoring Layer | Detects data drift | Evidently |
| Retraining Layer | Rebuilds model when drift occurs | Scikit-learn |
| Evaluation Layer | Compares old vs new models | ROC-AUC |
| Batch Layer | Scores thousands of customers | Python + Requests |
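To make the Monitoring Layer's job concrete, here is a minimal sketch of a drift check on a single numeric feature. It does not use Evidently's API; it swaps in the Population Stability Index (PSI) as a stand-in statistic so the example stays dependency-free. The function names, the 10-bin layout, and the 0.2 threshold (a common rule of thumb for PSI) are assumptions for illustration, not values taken from this project.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 when the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Clamp so out-of-range production values land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps log() defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def drift_detected(reference: Sequence[float], current: Sequence[float],
                   threshold: float = 0.2) -> bool:
    """Flag drift when PSI exceeds the (assumed) threshold."""
    return psi(reference, current) > threshold
```

In this sketch, `reference` would hold one column of the historical data and `current` the same column from production; a real monitor would repeat the check per feature.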
End-to-End ML Pipeline
Historical data is used to train the initial credit risk model.
The trained model is deployed using a FastAPI service.
New customer data flows into the system as production data.
Evidently continuously compares historical vs production data.
If data drift is detected, the model is automatically retrained.
The new model is evaluated against the deployed model.
The new model is promoted to production only if it outperforms the deployed model.
The API is updated and starts serving improved predictions.
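The evaluation and promotion steps above can be sketched as a small gate that compares ROC-AUC on a shared holdout set. This is an illustration under stated assumptions rather than the project's actual code: `roc_auc`, `should_promote`, and the `min_gain` margin are hypothetical names, and ROC-AUC is computed directly from its pairwise definition instead of through Scikit-learn so the block runs on the standard library alone.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    # Pairwise comparison is O(n^2); fine for a sketch, slow for large holdout sets.
    # Ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def should_promote(y_true: Sequence[int],
                   deployed_scores: Sequence[float],
                   candidate_scores: Sequence[float],
                   min_gain: float = 0.0) -> bool:
    """Promote only when the candidate beats the deployed model by more than min_gain."""
    deployed_auc = roc_auc(y_true, deployed_scores)
    candidate_auc = roc_auc(y_true, candidate_scores)
    return candidate_auc > deployed_auc + min_gain
```

Both models are scored on the same labels here, so the comparison is like for like; a nonzero `min_gain` would guard against promoting a model on a negligible improvement.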
Real-Time Inference
Client → FastAPI (/predict) → ML Model → Credit Risk Probability
The API returns both:
- Binary prediction (default / no default)
- Probability score (risk level)
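As a rough illustration of that two-part response, the sketch below turns a model probability into a payload carrying both values. The field names `prediction` and `probability`, the `PredictionResponse` class, and the 0.5 decision threshold are assumptions; the actual `/predict` schema is not specified here.

```python
from dataclasses import asdict, dataclass


@dataclass
class PredictionResponse:
    prediction: int      # 1 = default, 0 = no default (assumed encoding)
    probability: float   # estimated probability of default


def build_response(probability: float, threshold: float = 0.5) -> dict:
    """Package a default probability with the binary decision derived from it."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    response = PredictionResponse(
        prediction=int(probability >= threshold),
        probability=probability,
    )
    return asdict(response)
```

Returning the probability alongside the binary label lets downstream consumers apply their own risk cutoffs instead of relying on the service's threshold.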
Batch Scoring
production_clean.csv → batch_predict.py → FastAPI → production_with_predictions.csv
This allows thousands of customers to be scored automatically, just like real financial institutions do during nightly risk evaluations.
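A minimal sketch of that batch flow is shown below. It is not the contents of `batch_predict.py`: the `score_batch` function and the output column names are hypothetical, and the scoring call is passed in as `predict_fn` so the example runs without a live service. In the real script that callable would presumably POST each row to `/predict` with Requests.

```python
import csv
from typing import Callable, Dict, TextIO


def score_batch(src: TextIO, dst: TextIO,
                predict_fn: Callable[[Dict[str, str]], Dict[str, float]]) -> int:
    """Copy every input row to dst with prediction columns appended; return the row count."""
    reader = csv.DictReader(src)
    # Keep the original columns and append the two (assumed) output columns.
    fieldnames = list(reader.fieldnames or []) + ["prediction", "probability"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()

    scored = 0
    for row in reader:
        result = predict_fn(row)  # stands in for the call to the inference API
        writer.writerow({
            **row,
            "prediction": result["prediction"],
            "probability": result["probability"],
        })
        scored += 1
    return scored
```

With real files, `src` and `dst` would be handles opened on `production_clean.csv` and `production_with_predictions.csv` (passing `newline=""` to `open`, as the `csv` module recommends).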
Why this Architecture Matters
This is not a notebook-based ML project. It is a production-style MLOps system with:
- Data drift monitoring
- Automated retraining
- Safe model replacement
- Live API deployment
- Batch customer scoring
This mirrors how ML is deployed in banks, fintech companies, and SaaS platforms.