Credit Risk Analysis - Machine Learning Models

Machine learning project for credit risk assessment that pairs interpretable scorecards with gradient-boosted and stacked ensemble models, balancing transparency for regulators against predictive performance and business value.

Project Overview

This project implements a comprehensive credit risk modeling system featuring:

Interpretable Credit Scorecards with Weight of Evidence (WoE) transformations for regulatory compliance
High-Performance Models (LightGBM, XGBoost, SVM) combined into ensembles, with SHAP explainability
Business-Driven Optimization including profit maximization and threshold analysis

Project Structure

finanseML/
├── finalBlackBox.ipynb          # Ensemble models with hyperparameter optimization
├── scoreCard.ipynb              # Interpretable credit scorecards
├── data/                        # Train/validation/test datasets
├── grid_search_results/         # Model optimization results
├── MODEL_CARD.md                # Comprehensive model documentation
├── TECHNICAL_REPORT.md          # Detailed methodology and findings
└── requirements.txt             # Python dependencies

Notebooks

Credit Scorecard (`scoreCard.ipynb`)

Interpretable credit risk models designed for regulatory compliance and business stakeholder communication.

Key Features:

Data preprocessing pipeline: missing value imputation, outlier treatment (winsorization), multicollinearity reduction (VIF)
Weight of Evidence (WoE) transformations for interpretable feature binning
Multiple scorecard implementations: baseline and advanced models with feature interactions
Model calibration: Calibration-in-the-Large, Isotonic Regression
Performance evaluation: ROC-AUC, Kolmogorov-Smirnov (KS) statistic, Gini coefficient, Population Stability Index (PSI)
SQL code generation for production deployment
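
As an illustration of the WoE step listed above, here is a minimal sketch of how Weight of Evidence and Information Value can be computed for an already-binned feature. It is not taken from `scoreCard.ipynb`: the function name `woe_iv`, the smoothing constant `eps`, and the sign convention (WoE = ln(share of goods / share of bads), so positive values mean lower risk) are assumptions made for the example.

```python
import numpy as np

def woe_iv(bins, target, eps=0.5):
    """Return ({bin: WoE}, IV) for a binned feature and a 0/1 target (1 = default).

    Hypothetical helper; eps is an assumed smoothing term that avoids
    log(0) when a bin contains no goods or no bads.
    """
    bins = np.asarray(bins)
    target = np.asarray(target, dtype=float)
    labels, idx = np.unique(bins, return_inverse=True)
    bad = np.bincount(idx, weights=target, minlength=len(labels))
    good = np.bincount(idx, weights=1.0 - target, minlength=len(labels))
    # Share of all goods / all bads that falls into each bin (smoothed)
    dist_good = (good + eps) / (good.sum() + eps * len(labels))
    dist_bad = (bad + eps) / (bad.sum() + eps * len(labels))
    woe = np.log(dist_good / dist_bad)
    iv = float(np.sum((dist_good - dist_bad) * woe))
    return dict(zip(labels.tolist(), woe.tolist())), iv

# Toy data: the "low" bin is mostly good payers, the "high" bin mostly defaulters
bins = ["low"] * 4 + ["high"] * 4
target = [0, 0, 0, 1, 1, 1, 1, 0]
woe, iv = woe_iv(bins, target)
print(woe, iv)
```

Each raw feature value is then replaced by the WoE of its bin before fitting the logistic regression, which keeps the model linear in quantities that a reviewer can read off a table.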

Technical Approach:

Logistic regression with L1/L2 regularization
Systematic feature engineering and interaction terms
Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE) for interpretability
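
The evaluation metrics named under Key Features (ROC-AUC, Gini, KS, population stability index) can be sketched with NumPy alone. The code below is an illustrative assumption, not the notebook's implementation: the function names, the convention that a higher score means higher risk, and the decile binning used for PSI are all choices made for this example.

```python
import numpy as np

def auc_gini_ks(y_true, score):
    """AUC, Gini and KS for a score where higher means riskier (1 = default)."""
    y = np.asarray(y_true, dtype=float)
    s = np.asarray(score, dtype=float)
    order = np.argsort(-s, kind="mergesort")
    y, s = y[order], s[order]
    # Evaluate only where the score changes, so tied scores share one cutoff
    cut = np.r_[np.nonzero(np.diff(s))[0], len(s) - 1]
    tpr = np.r_[0.0, np.cumsum(y)[cut] / y.sum()]            # share of bads captured
    fpr = np.r_[0.0, np.cumsum(1 - y)[cut] / (1 - y).sum()]  # share of goods captured
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))  # trapezoid rule
    ks = float(np.max(np.abs(tpr - fpr)))
    return auc, 2 * auc - 1, ks

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index with bins set by quantiles of `expected`."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]
    e = np.bincount(np.searchsorted(edges, expected), minlength=n_bins) / len(expected)
    a = np.bincount(np.searchsorted(edges, actual), minlength=n_bins) / len(actual)
    e, a = np.clip(e, eps, None), np.clip(a, eps, None)  # guard against empty bins
    return float(np.sum((a - e) * np.log(a / e)))

auc, gini, ks = auc_gini_ks([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc, gini, ks)
```

Gini is derived from AUC as 2·AUC − 1, so the two metrics always rank models identically; KS adds the largest gap between the cumulative bad and good distributions.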

Black-Box Models (`finalBlackBox.ipynb`)

High-performance machine learning models optimized for predictive accuracy with post-hoc explainability.

Key Features:

Advanced gradient boosting: LightGBM and XGBoost with custom objective functions
Support Vector Machines with kernel optimization
Automated hyperparameter tuning via Optuna
Ensemble strategies: seed averaging, blending, stacking
Model explainability: SHAP values for global and local interpretations
Business analytics: profit matrix optimization, ROI-driven threshold selection, cost-sensitive learning
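
To make the profit-driven threshold selection above concrete, the following sketch scans candidate cutoffs and keeps the one with the highest total profit. The payoff values (`gain_good`, `loss_bad`), the function name, and the rule that rejected applicants contribute zero profit are hypothetical; the actual profit matrix used in `finalBlackBox.ipynb` is not reproduced here.

```python
import numpy as np

def best_threshold(y_true, p_default, gain_good=100.0, loss_bad=-500.0):
    """Return (threshold, profit) maximizing total profit on labeled data.

    Applicants with p_default < threshold are approved; rejected ones earn 0.
    gain_good and loss_bad are assumed per-loan payoffs, not project values.
    """
    y = np.asarray(y_true)
    p = np.asarray(p_default, dtype=float)
    per_loan = np.where(y == 1, loss_bad, gain_good)
    thresholds = np.linspace(0.0, 1.0, 101)
    profits = np.array([per_loan[p < t].sum() for t in thresholds])
    best = int(np.argmax(profits))  # first threshold reaching the maximum
    return float(thresholds[best]), float(profits[best])

y = [0, 0, 0, 1, 0, 1, 1]
p = [0.05, 0.10, 0.20, 0.30, 0.40, 0.70, 0.90]
t, profit = best_threshold(y, p)
print(t, profit)
```

Because one default costs far more than one good loan earns in this toy setup, the chosen cutoff sits well below 0.5, which is the usual reason a profit-based threshold differs from an accuracy-based one.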

Models Implemented:

LightGBM with custom loss functions
XGBoost with early stopping and cross-validation
SVM with RBF and polynomial kernels
Multi-level stacking ensemble architecture
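
A stacking ensemble of this kind can be sketched with scikit-learn's `StackingClassifier`. This is a single-level example on synthetic data, not the project's multi-level architecture: scikit-learn's `GradientBoostingClassifier` stands in for LightGBM/XGBoost so the snippet depends on one library only, and all hyperparameters are placeholder assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced data standing in for the credit dataset
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        # Stand-in for the gradient boosting models named above
        ("gbm", GradientBoostingClassifier(n_estimators=100, random_state=0)),
        # SVMs are scale-sensitive, so the RBF SVM gets its own scaler
        ("svm_rbf", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
    ],
    final_estimator=LogisticRegression(),  # meta-learner on out-of-fold outputs
    cv=5,
)
stack.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(f"Stacked ROC-AUC on held-out data: {auc:.3f}")
```

The meta-learner is trained on out-of-fold predictions from the base models (`cv=5`), which keeps it from simply memorizing base-model outputs on the training rows.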

Getting Started

Prerequisites

Python 3.8 or higher
Jupyter Notebook/Lab
Minimum 4GB RAM

Installation

Clone the repository

git clone https://github.com/BrandyBrandt/finanseML.git
cd finanseML

Create and activate virtual environment

python -m venv venv
source venv/bin/activate      # Linux/macOS
venv\Scripts\activate         # Windows

Install dependencies

pip install -r requirements.txt

Launch Jupyter

jupyter notebook

Results

Scorecard Model: ROC-AUC 0.85 - optimized for interpretability and regulatory compliance
Ensemble Model: ROC-AUC 0.92 - maximized predictive performance
Business Impact: Demonstrated profit optimization through threshold tuning and cost-sensitive modeling

Technology Stack

Machine Learning: scikit-learn, LightGBM, XGBoost
Optimization: Optuna
Explainability: SHAP, LIME, Partial Dependence Plots
Data Analysis: pandas, numpy
Visualization: matplotlib, seaborn

Documentation

MODEL_CARD.md - Comprehensive model specifications and performance metrics
TECHNICAL_REPORT.md - Detailed methodology, experiments, and results

Authors

Aleksander Brandt - GitHub
Wojciech Baraniak

License

This project is licensed under the MIT License.

Future Development

Deep learning architectures (TabNet and other neural networks)
RESTful API for real-time scoring
Feature stability analysis across time periods
A/B testing framework for production deployment
