Credit Risk Prediction System is an end-to-end machine learning project that predicts loan default risk using customer financial data. It applies EDA, feature encoding, and advanced models like Random Forest and XGBoost, and is deployed via Streamlit for real-time credit risk assessment.

Sparkydev007/Credit-Risk-Prediction-System

💳 Credit Risk Prediction System

A Machine Learning project that predicts whether a loan applicant is Low Risk or High Risk using structured financial and demographic data.


🚀 Live Streamlit App: https://credit-risk-prediction-system-1.streamlit.app/


📊 At-a-Glance Impact

| Metric | Value |
|---|---|
| Dataset Size | 1,000+ Credit Applications |
| Models Trained | Decision Tree, Random Forest, Extra Trees, XGBoost |
| Best Accuracy | 85%+ (Balanced Classes) |
| Target Variable | Credit Risk (Good / Bad) |
| Business Goal | Reduce Default Risk & Improve Loan Decisions |

🚀 Why This Project Matters

Banks and financial institutions lose billions of dollars annually due to loan defaults.

🔴 Even a 1% improvement in credit risk prediction can save millions in bad loans.

This project simulates a real banking credit approval system, applying machine learning to:

  • Reduce default risk
  • Improve approval accuracy
  • Enable data‑driven lending decisions

📊 Dataset Overview

  • Customers: ~1,000 loan applicants
  • Features: Demographic + Financial attributes
  • Target: Credit Risk (Good / Bad)

Key Features Used

| Feature | Description |
|---|---|
| Age | Applicant age |
| Sex | Gender |
| Job | Employment category |
| Housing | Own / Rent / Free |
| Saving Accounts | Savings status |
| Checking Account | Checking balance |
| Credit Amount | Loan amount requested |
| Duration | Loan duration (months) |
| Purpose | Loan purpose |

🧠 Machine Learning Models

Multiple models were trained and evaluated:

| Model | Strength |
|---|---|
| Decision Tree | Interpretability |
| Random Forest | Stability & performance |
| Extra Trees | Reduced variance |
| XGBoost | High predictive power |

XGBoost delivered the best overall performance.
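
A side-by-side comparison like the one described above can be sketched with cross-validation. This is a minimal illustration, not the project's training script: the data is a synthetic stand-in generated with `make_classification`, the `models` and `scores` names are hypothetical, and XGBoost is left out so the sketch needs only scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit data (assumption): 1,000 rows,
# 9 features, roughly 30% of rows in the positive ("bad" risk) class.
X, y = make_classification(
    n_samples=1000,
    n_features=9,
    n_informative=5,
    weights=[0.7, 0.3],
    random_state=42,
)

# Candidate models; class_weight="balanced" reweights the minority class.
models = {
    "Decision Tree": DecisionTreeClassifier(class_weight="balanced", random_state=42),
    "Random Forest": RandomForestClassifier(class_weight="balanced", random_state=42),
    "Extra Trees": ExtraTreesClassifier(class_weight="balanced", random_state=42),
}

# Mean 5-fold recall on the positive class for each model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="recall").mean()

for name, score in scores.items():
    print(f"{name}: mean recall = {score:.3f}")
```

Scoring on recall rather than accuracy keeps the comparison focused on how many high-risk applicants each model catches.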


📈 Results & Impact

| Metric | Score |
|---|---|
| Accuracy | ~80%+ |
| Recall (High Risk) | Improved via class balancing |
| Precision | Optimized using GridSearch |

Business Interpretation

  • Correctly flags high‑risk borrowers
  • Minimizes false approvals
  • Improves lender profitability

🔧 Feature Engineering

  • Label Encoding for categorical variables
  • Label encoding of the target variable, Risk (Good / Bad)
  • Class imbalance handled using class_weight="balanced"
  • Hyperparameter tuning with GridSearchCV
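
The steps above can be sketched end to end as follows. This is a hedged illustration under stated assumptions, not the code in `src/preprocessing.py` or `src/train.py`: the toy DataFrame, its column names and category values, the 0/1 mapping for Risk, and the parameter grid are all made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder

# Toy data (assumption): random values standing in for the real CSV.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Housing": rng.choice(["own", "rent", "free"], size=n),
    "Purpose": rng.choice(["car", "education", "business"], size=n),
    "Credit amount": rng.integers(500, 10000, size=n),
    "Duration": rng.integers(6, 60, size=n),
    "Risk": rng.choice(["good", "bad"], size=n, p=[0.7, 0.3]),
})

# Label-encode each categorical feature and keep the fitted encoder,
# so the same mapping can be reused at prediction time.
encoders = {}
for col in ["Housing", "Purpose"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Encode the target: 1 marks the high-risk ("bad") class.
y = df["Risk"].map({"good": 0, "bad": 1})
X = df.drop(columns="Risk")

# Grid search over a small, illustrative grid with balanced class weights.
grid = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5]},
    scoring="recall",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

Keeping the fitted encoders in a dictionary mirrors the `models/label_encoders/` folder in the project layout, where saved encoders would let an app encode new inputs consistently.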

📦 Project Structure

credit-risk-prediction/
│
├── data/
│   └── german_credit.csv
│
├── models/
│   ├── decision_tree.pkl
│   ├── random_forest.pkl
│   └── label_encoders/
│
├── notebooks/
│   └── credit_risk_analysis.ipynb
│
├── src/
│   ├── preprocessing.py
│   ├── train.py
│   └── evaluate.py
│
├── README.md
└── requirements.txt

🛠 Tech Stack

  • Python 🐍
  • Pandas / NumPy
  • Scikit‑Learn
  • XGBoost
  • Seaborn & Matplotlib
  • Joblib

⚙️ How to Run

pip install -r requirements.txt
jupyter notebook

or train directly:

python src/train.py

📌 Key Learnings

  • Real‑world data is messy & imbalanced
  • Model accuracy alone is not enough; recall on the high‑risk class matters
  • Explainability is crucial in finance ML
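
The point about accuracy versus recall can be made concrete with a small worked example. The numbers are hypothetical: 100 applicants, 30 of them high risk, and a trivial model that approves everyone.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive cases that the model flagged as positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos


# Hypothetical data: 1 = high risk, 0 = low risk.
y_true = [1] * 30 + [0] * 70
# A model that labels every applicant as low risk.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.7
print(recall(y_true, y_pred))    # 0.0
```

The model looks acceptable at 70% accuracy yet catches none of the high-risk borrowers, which is why recall on the high-risk class is tracked separately.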

🎯 Who Should Look at This?

✔ Recruiters (Data Science / ML / FinTech)
✔ Banks & NBFC analysts
✔ ML engineers learning applied finance
✔ Students preparing for placements


🌟 Future Improvements

  • SHAP explainability
  • Model deployment with Streamlit
  • Real‑time credit scoring API
  • ROC‑AUC optimization

🙌 Author

Prathmesh Bunde
CSE | FinTech | Machine Learning

📌 If this project helped you, don’t forget to ⭐ the repo!
