BrandyBrandt/finanseML

Credit Risk Analysis - Machine Learning Models

Advanced machine learning project for credit risk assessment that pairs interpretable scorecards with high-performance ensemble models, balancing predictive accuracy, explainability, and business value.

Project Overview

This project implements a credit risk modeling system with three main components:

  • Interpretable Credit Scorecards with Weight of Evidence (WoE) transformations for regulatory compliance
  • High-Performance Models (LightGBM, XGBoost, SVM) and ensembles built from them, with SHAP explainability
  • Business-Driven Optimization including profit maximization and threshold analysis

Project Structure

finanseML/
├── finalBlackBox.ipynb          # Ensemble models with hyperparameter optimization
├── scoreCard.ipynb              # Interpretable credit scorecards
├── data/                        # Train/validation/test datasets
├── grid_search_results/         # Model optimization results
├── MODEL_CARD.md                # Comprehensive model documentation
├── TECHNICAL_REPORT.md          # Detailed methodology and findings
└── requirements.txt             # Python dependencies

Notebooks

Credit Scorecard (scoreCard.ipynb)

Interpretable credit risk models designed for regulatory compliance and business stakeholder communication.

Key Features:

  • Data preprocessing pipeline: missing value imputation, outlier treatment (winsorization), multicollinearity reduction (VIF)
  • Weight of Evidence (WoE) transformations for interpretable feature binning
  • Multiple scorecard implementations: baseline and advanced models with feature interactions
  • Model calibration: Calibration-in-the-Large, Isotonic Regression
  • Performance and stability evaluation: ROC-AUC, Kolmogorov-Smirnov (KS) statistic, Gini coefficient, Population Stability Index (PSI)
  • SQL code generation for production deployment
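As a rough illustration of the WoE and SQL-generation steps, the sketch below computes per-bin Weight of Evidence and Information Value from pre-binned good/bad counts, then renders the bins as a SQL `CASE` expression. The bin edges, counts, the column name `income`, the +0.5 smoothing, and the sign convention WoE = ln(share of goods / share of bads) are all assumptions for illustration; the notebook's actual binning and generated SQL may differ.

```python
import math

def woe_iv(bins):
    """Return per-bin WoE and the total Information Value.

    bins: list of (label, n_good, n_bad) tuples, assumed already binned.
    Convention assumed here: WoE = ln(share_of_goods / share_of_bads),
    so a positive WoE marks a lower-risk bin.
    """
    total_good = sum(g for _, g, _ in bins)
    total_bad = sum(b for _, _, b in bins)
    k = len(bins)
    rows, iv = [], 0.0
    for label, g, b in bins:
        # +0.5 smoothing keeps log() finite when a bin has no goods or no bads.
        pg = (g + 0.5) / (total_good + 0.5 * k)
        pb = (b + 0.5) / (total_bad + 0.5 * k)
        woe = math.log(pg / pb)
        iv += (pg - pb) * woe
        rows.append((label, woe))
    return rows, iv

def woe_to_sql(column, edges, woes):
    """Render bins (-inf, e1], (e1, e2], ..., (ek, +inf) as a SQL CASE expression.

    NULLs fail every WHEN and fall through to ELSE; a production version
    would likely give missing values their own bin.
    """
    if len(woes) != len(edges) + 1:
        raise ValueError("need exactly one WoE value per bin")
    lines = ["CASE"]
    for edge, woe in zip(edges, woes):
        lines.append(f"  WHEN {column} <= {edge} THEN {woe:.4f}")
    lines.append(f"  ELSE {woes[-1]:.4f}")
    lines.append(f"END AS {column}_woe")
    return "\n".join(lines)

# Hypothetical monthly income bins with (goods, bads) counts.
bins = [("<=2000", 100, 40), ("(2000, 5000]", 300, 30), (">5000", 400, 10)]
rows, iv = woe_iv(bins)
sql = woe_to_sql("income", [2000, 5000], [w for _, w in rows])
print(f"IV = {iv:.3f}")
print(sql)
```

With these toy counts the WoE rises monotonically from the lowest-income bin to the highest, which is the kind of monotone pattern scorecard binning usually aims for.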

Technical Approach:

  • Logistic regression with L1/L2 regularization
  • Systematic feature engineering and interaction terms
  • Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE) for interpretability
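The PDP/ICE idea can be sketched in a few lines: an ICE curve overwrites one feature with each grid value for a single applicant and records the model output, and the PDP is the pointwise average of those curves. `predict` below is a hypothetical stand-in for a fitted scorecard's probability function, and the features and coefficients are invented for illustration; with a fitted scikit-learn estimator, `sklearn.inspection.partial_dependence` computes the same quantities.

```python
import math

def predict(row):
    # Hypothetical logistic model: PD rises with utilization, falls with income.
    z = -2.0 + 3.0 * row["utilization"] - 0.02 * row["income"]
    return 1.0 / (1.0 + math.exp(-z))

def ice_curves(rows, feature, grid, model):
    """One curve per row: model output with `feature` overwritten by each grid value."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)  # copy so the original row is untouched
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves

def partial_dependence(curves):
    """PDP = pointwise mean of the ICE curves."""
    n = len(curves)
    return [sum(c[i] for c in curves) / n for i in range(len(curves[0]))]

rows = [
    {"utilization": 0.2, "income": 40},
    {"utilization": 0.7, "income": 25},
    {"utilization": 0.9, "income": 60},
]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
ice = ice_curves(rows, "utilization", grid, predict)
pdp = partial_dependence(ice)
print([round(p, 3) for p in pdp])
```

Plotting the individual ICE curves alongside the PDP shows whether the average hides heterogeneous effects, which matters most once interaction terms are in the model.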

Black-Box Models (finalBlackBox.ipynb)

High-performance machine learning models optimized for predictive accuracy with post-hoc explainability.

Key Features:

  • Advanced gradient boosting: LightGBM and XGBoost with custom objective functions
  • Support Vector Machines with kernel optimization
  • Automated hyperparameter tuning via Optuna
  • Ensemble strategies: seed averaging, blending, stacking
  • Model explainability: SHAP values for global and local interpretations
  • Business analytics: profit matrix optimization, ROI-driven threshold selection, cost-sensitive learning
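A minimal sketch of profit-driven threshold selection, assuming a simple two-outcome profit matrix: an approved good loan earns `gain_good`, an approved bad loan loses `loss_bad`, and rejected applications contribute zero. The payoff values, the candidate grid, and the toy scores are illustrative assumptions, not figures from the notebook.

```python
def expected_profit(pds, labels, threshold, gain_good=100.0, loss_bad=500.0):
    """Total profit when applicants with predicted PD below `threshold` are approved.

    labels: 1 = default (bad), 0 = repaid (good).
    """
    profit = 0.0
    for pd_, y in zip(pds, labels):
        if pd_ < threshold:  # approved
            profit += -loss_bad if y == 1 else gain_good
    return profit

def best_threshold(pds, labels, candidates, **payoffs):
    """Grid-search the cut-off that maximizes total profit."""
    return max(candidates, key=lambda t: expected_profit(pds, labels, t, **payoffs))

pds = [0.02, 0.05, 0.10, 0.15, 0.30, 0.45, 0.60, 0.80]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
candidates = [i / 20 for i in range(1, 21)]  # 0.05, 0.10, ..., 1.00
t_star = best_threshold(pds, labels, candidates)
print(t_star, expected_profit(pds, labels, t_star))
```

Because a default here costs five times what a good loan earns, the optimal cut-off stops just before the first bad applicant; changing the `loss_bad / gain_good` ratio is the simplest lever for cost-sensitive tuning.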

Models Implemented:

  • LightGBM with custom loss functions
  • XGBoost with early stopping and cross-validation
  • SVM with RBF and polynomial kernels
  • Multi-level stacking ensemble architecture
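Seed averaging and blending both come down to combining predicted probabilities. The sketch below assumes each model's validation predictions are already available as lists. It shows seed averaging (a plain mean across seeds) and a rank-average blend with hypothetical weights; the model names and numbers are illustrative. Stacking would instead fit a meta-model on out-of-fold predictions, which is not shown here.

```python
def seed_average(preds_per_seed):
    """Mean probability across models trained with different random seeds."""
    n = len(preds_per_seed)
    return [sum(p[i] for p in preds_per_seed) / n for i in range(len(preds_per_seed[0]))]

def to_ranks(scores):
    """Map scores to normalized average ranks in (0, 1]; ties share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg / len(scores)
        i = j + 1
    return ranks

def rank_blend(model_preds, weights):
    """Weighted average of per-model normalized ranks."""
    total = sum(weights)
    ranked = [to_ranks(p) for p in model_preds]
    return [sum(w * r[i] for w, r in zip(weights, ranked)) / total
            for i in range(len(model_preds[0]))]

lgbm_seeds = [[0.10, 0.40, 0.80], [0.12, 0.38, 0.76]]  # hypothetical LightGBM runs
lgbm = seed_average(lgbm_seeds)
xgb = [0.05, 0.50, 0.90]                               # hypothetical XGBoost output
blend = rank_blend([lgbm, xgb], weights=[0.6, 0.4])
print(lgbm, blend)
```

Rank averaging makes models with differently scaled outputs comparable, but the blended score is no longer a probability, so it would need recalibration before feeding a profit calculation.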

Getting Started

Prerequisites

  • Python 3.8 or higher
  • Jupyter Notebook/Lab
  • Minimum 4GB RAM

Installation

  1. Clone the repository
     git clone https://github.com/BrandyBrandt/finanseML.git
     cd finanseML
  2. Create and activate a virtual environment
     python -m venv venv
     source venv/bin/activate      # Linux/macOS
     venv\Scripts\activate         # Windows
  3. Install dependencies
     pip install -r requirements.txt
  4. Launch Jupyter
     jupyter notebook

Results

  • Scorecard Model: ROC-AUC 0.85, optimized for interpretability and regulatory compliance
  • Ensemble Model: ROC-AUC 0.92, optimized for predictive performance
  • Business Impact: Demonstrated profit optimization through threshold tuning and cost-sensitive modeling
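For reference, the Gini coefficient used alongside ROC-AUC is a linear transform of it, Gini = 2·AUC − 1, so ROC-AUC values of 0.85 and 0.92 correspond to Gini values of 0.70 and 0.84. The sketch below computes AUC (pairwise Mann-Whitney formulation), Gini, and the KS statistic from scores and labels using only the standard library. The toy data are illustrative, and the O(n²) AUC loop is for clarity only; `sklearn.metrics.roc_auc_score` is the practical choice on real data.

```python
def roc_auc(scores, labels):
    """P(random bad scores higher than random good); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def gini(auc):
    return 2.0 * auc - 1.0

def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions of goods and bads."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= t) / n_pos
        cdf_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t) / n_neg
        best = max(best, abs(cdf_neg - cdf_pos))
    return best

scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]  # higher = riskier
labels = [0, 0, 1, 0, 0, 1, 1, 1]
auc = roc_auc(scores, labels)
print(auc, gini(auc), ks_statistic(scores, labels))  # 0.875 0.75 0.75
```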

Technology Stack

Machine Learning: scikit-learn, LightGBM, XGBoost
Optimization: Optuna
Explainability: SHAP, LIME, Partial Dependence Plots
Data Analysis: pandas, numpy
Visualization: matplotlib, seaborn

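Documentation

  • MODEL_CARD.md: comprehensive model documentation
  • TECHNICAL_REPORT.md: detailed methodology and findings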

Authors

Aleksander Brandt - GitHub
Wojciech Baraniak

License

This project is licensed under the MIT License.

Future Development

  • Deep learning architectures (TabNet and other neural networks)
  • RESTful API for real-time scoring
  • Feature stability analysis across time periods
  • A/B testing framework for production deployment
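Stability across time periods is commonly tracked with the Population Stability Index, which is already among the scorecard's evaluation metrics. The sketch below is minimal. It assumes bins are fixed on the development sample and compares each later period's bin shares against them using PSI = Σ (aᵢ − eᵢ)·ln(aᵢ / eᵢ). The counts are hypothetical. The 0.1 / 0.25 cut-offs in the comment are a widely quoted industry rule of thumb, not thresholds from this project.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index over pre-defined bins.

    PSI = sum((a_i - e_i) * ln(a_i / e_i)), where e_i and a_i are bin shares.
    eps floors the shares so empty bins do not produce log(0).
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share = max(e / e_total, eps)
        a_share = max(a / a_total, eps)
        value += (a_share - e_share) * math.log(a_share / e_share)
    return value

# Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
dev = [250, 250, 250, 250]  # development-sample counts per bin
q1 = [240, 260, 255, 245]   # hypothetical later period, little drift
q4 = [100, 200, 300, 400]   # hypothetical later period, clear drift
print(round(psi(dev, q1), 4), round(psi(dev, q4), 4))
```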

About

Advanced ML project for credit risk assessment: interpretable scorecards + black-box models (LightGBM, XGBoost) with SHAP explainability and profit optimization
