Automated-ml-retraining-engine

This project implements a production-style machine learning system that predicts credit card default risk, continuously monitors incoming data for drift, and automatically retrains the model when the production data shifts.

Unlike notebook-based ML projects, this system mimics how financial institutions deploy, monitor, and maintain ML models in production.

🚀 What this system does

Trains a credit risk model using historical customer data
Deploys the model using a FastAPI inference service
Scores new customer data in real time and in batch
Monitors incoming production data for data drift
Automatically retrains the model when drift is detected
Replaces the deployed model only if the new one performs better

🧠 Architecture

High-level Flow

Historical Data → Model Training → Live API → Production Data → Drift Detection → Retraining → Model Update

Architecture Overview

| Layer | Responsibility | Tools |
|---|---|---|
| Data Layer | Stores historical and live customer data | Pandas, CSV |
| Training Layer | Trains credit default model | Scikit-learn |
| Serving Layer | Provides real-time predictions | FastAPI |
| Monitoring Layer | Detects data drift | Evidently |
| Retraining Layer | Rebuilds model when drift occurs | Scikit-learn |
| Evaluation Layer | Compares old vs new models | ROC-AUC |
| Batch Layer | Scores thousands of customers | Python + Requests |

End-to-End ML Pipeline

Historical data is used to train the initial credit risk model.

The trained model is deployed using a FastAPI service.

New customer data flows into the system as production data.

Evidently continuously compares historical vs production data.

If data drift is detected, the model is automatically retrained.

The new model is evaluated against the deployed model.

The new model is promoted to production only if it outperforms the deployed model on ROC-AUC.

The API is updated and starts serving improved predictions.
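The decision logic behind the drift, retraining, and promotion steps can be sketched in a few functions. The project itself uses Evidently for drift detection and ROC-AUC for evaluation; to stay dependency-free, this sketch swaps in a hand-rolled Population Stability Index (PSI) for the drift check and a pairwise ROC-AUC. The function names, the 10-bin layout, and the 0.2 PSI threshold are illustrative assumptions, not taken from the repository.

```python
from __future__ import annotations

import math
from typing import Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index for one numeric feature (stand-in for Evidently)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant features

    def shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Small floor so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half. Quadratic in the sample size, so only for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute ROC-AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def should_retrain(psi_value: float, threshold: float = 0.2) -> bool:
    """Trigger retraining once drift reaches the (assumed) threshold."""
    return psi_value >= threshold


def should_promote(auc_deployed: float, auc_candidate: float) -> bool:
    """Promote only when the retrained model is strictly better."""
    return auc_candidate > auc_deployed
```

A caller would compute `psi` per feature on historical versus production data, retrain when `should_retrain` fires, score both models on the same hold-out set, and swap the deployed model only when `should_promote` returns `True`.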

Real-Time Inference

Client → FastAPI (/predict) → ML Model → Credit Risk Probability

The API returns both:

Binary prediction (default / no default)

Probability score (risk level)
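As a minimal sketch, the two values could be derived from a single positive-class probability as shown here. The `build_response` helper, the field names, and the 0.5 threshold are assumptions for illustration; the repository's actual `/predict` response schema may differ.

```python
def build_response(probability: float, threshold: float = 0.5) -> dict:
    """Map a default probability to a binary prediction plus the score itself."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return {
        "prediction": int(probability >= threshold),  # 1 = default, 0 = no default
        "probability": round(probability, 4),         # risk level
    }
```

With a scikit-learn classifier, the probability would typically come from `model.predict_proba(X)[:, 1]`.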

Batch Scoring

production_clean.csv → batch_predict.py → FastAPI → production_with_predictions.csv

This lets thousands of customers be scored in a single automated run, much as financial institutions do during nightly risk evaluations.
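The batch flow can be sketched as a loop that reads each customer row, asks a prediction function for a score, and writes the row back out with two extra columns. This is not the repository's `batch_predict.py`: the `score_batch` name, the output column names, and the injected `predict` callable are assumptions. In the project, `predict` would presumably wrap a POST to the `/predict` endpoint via Requests.

```python
import csv
from typing import Callable, Dict, TextIO


def score_batch(
    src: TextIO,
    dst: TextIO,
    predict: Callable[[Dict[str, str]], Dict],
) -> int:
    """Copy rows from src to dst, appending prediction columns; return row count."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + ["prediction", "probability"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()

    count = 0
    for row in reader:
        result = predict(row)  # hypothetical: one API call per customer
        row["prediction"] = result["prediction"]
        row["probability"] = result["probability"]
        writer.writerow(row)
        count += 1
    return count
```

Passing the prediction function in as an argument keeps the file handling testable without a running API; opening `production_clean.csv` and `production_with_predictions.csv` with `newline=""` and handing them to `score_batch` would reproduce the flow above.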

Why this Architecture Matters

This is not a notebook-based ML project. It is a production-style MLOps system with:

Data drift monitoring
Automated retraining
Safe model replacement
Live API deployment
Batch customer scoring

This mirrors how ML is deployed in banks, fintech companies, and SaaS platforms.
