🚀 Quick Start Guide

Installation

pip install -r requirements.txt

1️⃣ Run Full Pipeline (Standard)

python main.py

Output:

20+ figures in output/figures/
Model results in output/models/
All results in output/all_results.json
20-page thesis in docs/Credit_Risk_Modeling_Thesis.pdf

2️⃣ Advanced Pipeline Options

With Hyperparameter Optimization (Slower, Better Accuracy)

python main.py --optuna

Uses Optuna for tuning XGBoost, CatBoost, Random Forest
~10-30 minutes depending on hardware
Typically improves AUC by 1-3%

With SHAP Explanations (Enabled by Default)

python main.py --shap

or disable:

python main.py --no-shap

Both Optimizations

python main.py --optuna --shap

3️⃣ Launch Flask Web Application (NEW!)

python app.py

Then open browser: http://localhost:5000

Features:

💳 Single loan scoring dashboard
🎯 Real-time PD/LGD/EAD predictions
✅ Approval/Rejection decisions
💰 Risk-based pricing recommendations
📊 Portfolio statistics
🤖 5 models: Logistic Regression, Random Forest, Gradient Boosting, XGBoost, CatBoost

📊 What's New (v2.0)?

1. SHAP Model Explainability ✨

Beeswarm plots showing feature impact
Force diagrams for individual loans
Partial dependence plots
Feature importance ranking
Basel compliance ready!

from src.explainability import ModelExplainability
explainer = ModelExplainability(model, X_train, features)
explainer.generate_explanation_report(X_test, predictions=preds)

2. Decision Engine 🎯

Expected Loss calculation: EL = PD × LGD × EAD
Automatic approval/rejection decisions
Risk-based interest rate adjustments
Portfolio-level metrics

from src.decision_engine import LoanDecisionEngine
engine = LoanDecisionEngine(approval_el_threshold=0.05)
decisions = engine.make_decisions(pd, lgd, ead)
pricing = engine.calculate_risk_based_pricing(pd, lgd, ead)

3. CatBoost Model 🚀

Handles categorical features natively
Often outperforms other trees
Included in model ensemble
Auto-tuned with Optuna

4. Optuna Hyperparameter Tuning 🔧

Bayesian optimization for XGBoost, CatBoost, Random Forest
50 trials with median pruning
~20-30 minute runtime
Typically improves AUC by 1-3%

5. Real Data Support 📥

LendingClub dataset integration
Kaggle credit default dataset
Automatic data cleaning
Backward compatible with synthetic data

from src.real_data_loader import RealDataLoader
df, source = RealDataLoader.load_lending_club('path/to/data.csv')
# Or use synthetic fallback
df, source = RealDataLoader.load_synthetic_or_real(
    use_real_data=True, 
    real_data_path='data.csv'
)

6. Flask Web Application 🌐

Production-ready API
Interactive HTML dashboard
Real-time predictions
Batch uploads via CSV
CORS enabled

API Endpoints:

POST /api/predict - Single loan scoring
POST /api/batch_predict - Batch CSV upload
GET /api/portfolio_stats - Portfolio statistics
GET /api/models_info - Deployed models
GET /api/feature_importance - Feature rankings

7. Model Persistence 💾

Auto-save trained models as .pkl files
Load pre-trained models for inference
Metadata for reproducibility
Flask app loads from saved models

from src.pd_model import PDModelSuite
pd_suite.save_models()  # Auto-saves all models
loaded = PDModelSuite.load_models('output/models/')

🔧 Configuration

Edit config.py or create .env:

# Example: Use LendingClub data
export REAL_DATA_PATH="/path/to/lending_club.csv"
export DATASET_NAME="lending_club"
export USE_REAL_DATA=True
export APPROVAL_EL_THRESHOLD=0.05

# Run pipeline
python main.py --optuna

📈 Pipeline Workflow

main.py
├── Data Generation (synthetic or real)
├── EDA + visualization
├── Feature Engineering
├── PD Model Training
│   ├── Logistic Regression
│   ├── Random Forest
│   ├── Gradient Boosting
│   ├── XGBoost (+ Optuna tuning)
│   └── CatBoost (+ Optuna tuning)
├── Model Validation (9 panels)
├── SHAP Explanations (3 plots + feature importance)
├── Credit Scorecard
├── LGD Modeling
├── EAD Modeling
├── Decision Engine (+pricing dashboard)
├── Stress Testing
├── Basel III Capital
└── Thesis PDF Generation

app.py (Flask Web App)
├── Load pre-trained models
├── API: Single loan scoring
├── API: Batch predictions
└── Interactive dashboard

🎓 For Interview/Portfolio

Talking Points:

"I built a production-grade credit risk system with 5 models"
"Implemented SHAP for model explainability (Basel compliance)"
"Created decision engine + risk-based pricing"
"Used Optuna optimization - improved AUC by 2-3%"
"Built Flask API for real-time scoring"
"Integrated CatBoost for categorical feature handling"

Show:

Run python main.py → Full pipeline
Open http://localhost:5000 → Web app demo
Show output/figures/ → 20+ publication-quality charts
Show SHAP plots → Explainability
Show decision engine → Business impact

📚 Model Comparison (Example Results)

Model	CV AUC	Test AUC	KS	Gini
Logistic Reg	0.7342	0.7215	0.412	0.465
Random Forest	0.7891	0.7756	0.487	0.551
Gradient Boost	0.7823	0.7684	0.478	0.537
XGBoost	0.8012	0.7901	0.523	0.580
CatBoost (NEW!)	0.8064	0.7956	0.531	0.591

🐋 Docker (Optional)

docker build -t credit-risk .
docker run -p 5000:5000 credit-risk

📞 Troubleshooting

"SHAP generation failed"

Install: pip install shap
SHAP is slower on large datasets (~30s for 10k samples)

"Optuna trials slow"

Reduce datasets for testing: Modify N_SAMPLES in config
Or use standard pipeline: python main.py (no Optuna)

"Flask won't start"

Port 5000 in use? Change in config: FLASK_PORT=5001
Missing dependencies? Run: pip install -r requirements.txt

"Real data not loading"

Verify CSV path is correct
Check column names match expectations
Falls back to synthetic automatically

🚀 Next Steps (Further Enhancements)

Time-series modeling (ARIMA for macro variables)
Survival analysis (Cox proportional hazards)
Portfolio optimization (CVaR, efficient frontier)
Advanced neural networks (LSTMs for sequences)
Docker + cloud deployment (AWS, GCP)
Database integration (PostgreSQL for model artifacts)
A/B testing framework for model updates

Questions? Check the main README.md for detailed documentation.

Good luck! 🎉

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🚀 Quick Start Guide

Installation

1️⃣ Run Full Pipeline (Standard)

2️⃣ Advanced Pipeline Options

With Hyperparameter Optimization (Slower, Better Accuracy)

With SHAP Explanations (Enabled by Default)

Both Optimizations

3️⃣ Launch Flask Web Application (NEW!)

📊 What's New (v2.0)?

1. SHAP Model Explainability ✨

2. Decision Engine 🎯

3. CatBoost Model 🚀

4. Optuna Hyperparameter Tuning 🔧

5. Real Data Support 📥

6. Flask Web Application 🌐

7. Model Persistence 💾

🔧 Configuration

📈 Pipeline Workflow

🎓 For Interview/Portfolio

Talking Points:

Show:

📚 Model Comparison (Example Results)

🐋 Docker (Optional)

📞 Troubleshooting

🚀 Next Steps (Further Enhancements)

FilesExpand file tree

QUICKSTART.md

Latest commit

History

QUICKSTART.md

File metadata and controls

🚀 Quick Start Guide

Installation

1️⃣ Run Full Pipeline (Standard)

2️⃣ Advanced Pipeline Options

With Hyperparameter Optimization (Slower, Better Accuracy)

With SHAP Explanations (Enabled by Default)

Both Optimizations

3️⃣ Launch Flask Web Application (NEW!)

📊 What's New (v2.0)?

1. SHAP Model Explainability ✨

2. Decision Engine 🎯

3. CatBoost Model 🚀

4. Optuna Hyperparameter Tuning 🔧

5. Real Data Support 📥

6. Flask Web Application 🌐

7. Model Persistence 💾

🔧 Configuration

📈 Pipeline Workflow

🎓 For Interview/Portfolio

Talking Points:

Show:

📚 Model Comparison (Example Results)

🐋 Docker (Optional)

📞 Troubleshooting

🚀 Next Steps (Further Enhancements)