Advanced machine learning project for credit risk assessment combining interpretable scorecards with state-of-the-art ensemble models for optimal predictive performance and business value.
This project implements a comprehensive credit risk modeling system featuring:
- Interpretable Credit Scorecards with Weight of Evidence (WoE) transformations for regulatory compliance
- High-Performance Ensemble Models (LightGBM, XGBoost, SVM) with SHAP explainability
- Business-Driven Optimization including profit maximization and threshold analysis
finanseML/
├── finalBlackBox.ipynb # Ensemble models with hyperparameter optimization
├── scoreCard.ipynb # Interpretable credit scorecards
├── data/ # Train/validation/test datasets
├── grid_search_results/ # Model optimization results
├── MODEL_CARD.md # Comprehensive model documentation
├── TECHNICAL_REPORT.md # Detailed methodology and findings
└── requirements.txt # Python dependencies
Interpretable credit risk models designed for regulatory compliance and business stakeholder communication.
Key Features:
- Data preprocessing pipeline: missing value imputation, outlier treatment (winsorization), multicollinearity reduction (VIF)
- Weight of Evidence (WoE) transformations for interpretable feature binning
- Multiple scorecard implementations: baseline and advanced models with feature interactions
- Model calibration: Calibration-in-the-Large, Isotonic Regression
- Performance evaluation: ROC-AUC, Kolmogorov-Smirnov (KS) statistic, Gini coefficient, Population Stability Index (PSI)
- SQL code generation for production deployment
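As a rough illustration of the Weight of Evidence step listed above, the sketch below computes WoE per bin and the total Information Value for one pre-binned feature. It is not taken from the notebooks: the function name `woe_iv`, the smoothing parameter `eps`, and the sign convention (WoE = ln(share of goods / share of bads), with target 1 meaning default) are assumptions made for this example.

```python
import numpy as np

def woe_iv(bins, target, eps=0.5):
    """Return ({bin: WoE}, IV) for a binned feature and a 0/1 target (1 = default).

    Assumed convention: WoE = ln(share of goods in bin / share of bads in bin).
    `eps` is additive smoothing so that empty cells do not produce log(0).
    """
    bins = np.asarray(bins)
    target = np.asarray(target)
    labels = np.unique(bins).tolist()
    n_bins = len(labels)
    total_bad = target.sum() + eps * n_bins
    total_good = (1 - target).sum() + eps * n_bins

    woe, iv = {}, 0.0
    for label in labels:
        in_bin = bins == label
        bad = target[in_bin].sum() + eps
        good = (1 - target[in_bin]).sum() + eps
        dist_good = good / total_good
        dist_bad = bad / total_bad
        woe[label] = float(np.log(dist_good / dist_bad))
        iv += float((dist_good - dist_bad) * woe[label])
    return woe, iv

# Toy data: the "low" bin holds mostly goods, the "high" bin mostly bads.
example_bins = ["low"] * 4 + ["high"] * 4
example_target = [0, 0, 0, 1, 0, 1, 1, 1]
example_woe, example_iv = woe_iv(example_bins, example_target, eps=0.0)
```

With this convention a positive WoE marks a bin that is safer than the portfolio average. In a scorecard, each raw value would be replaced by the WoE of its bin before fitting the logistic regression.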
Technical Approach:
- Logistic regression with L1/L2 regularization
- Systematic feature engineering and interaction terms
- Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE) for interpretability
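The evaluation metrics named in the scorecard feature list (ROC-AUC, Gini, KS, PSI) can be sketched with plain NumPy as follows. This is a minimal illustration under stated assumptions, not the notebooks' code: the function names, the quantile-based PSI binning, and the clipping constant `eps` are all choices made for this example, and the pairwise AUC is quadratic in sample size, so it only suits small data.

```python
import numpy as np

def auc_gini(y_true, score):
    """ROC-AUC via pairwise comparison of positives vs negatives; Gini = 2*AUC - 1."""
    y_true = np.asarray(y_true)
    score = np.asarray(score, dtype=float)
    pos = score[y_true == 1]
    neg = score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size
    return float(auc), float(2 * auc - 1)

def ks_statistic(y_true, score):
    """Maximum gap between the empirical score CDFs of the two classes."""
    y_true = np.asarray(y_true)
    score = np.asarray(score, dtype=float)
    grid = np.unique(score)
    pos = np.sort(score[y_true == 1])
    neg = np.sort(score[y_true == 0])
    cdf_pos = np.searchsorted(pos, grid, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, grid, side="right") / len(neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index using quantile bins of the expected sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    cuts = np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1])
    e_cnt = np.bincount(np.searchsorted(cuts, expected, side="right"), minlength=n_bins)
    a_cnt = np.bincount(np.searchsorted(cuts, actual, side="right"), minlength=n_bins)
    e_frac = np.clip(e_cnt / len(expected), eps, None)
    a_frac = np.clip(a_cnt / len(actual), eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# Toy check on four observations.
y_demo = [0, 0, 1, 1]
s_demo = [0.1, 0.4, 0.35, 0.8]
auc_demo, gini_demo = auc_gini(y_demo, s_demo)
ks_demo = ks_statistic(y_demo, s_demo)
```

For real datasets, `sklearn.metrics.roc_auc_score` is the more practical way to obtain the AUC; the pairwise version is shown only because it makes the definition explicit.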
High-performance machine learning models optimized for predictive accuracy with post-hoc explainability.
Key Features:
- Advanced gradient boosting: LightGBM and XGBoost with custom objective functions
- Support Vector Machines with kernel optimization
- Automated hyperparameter tuning via Optuna
- Ensemble strategies: seed averaging, blending, stacking
- Model explainability: SHAP values for global and local interpretations
- Business analytics: profit matrix optimization, ROI-driven threshold selection, cost-sensitive learning
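The README does not say which custom objective the gradient boosting models use, so the following is only one plausible shape: a cost-weighted logistic loss that penalizes missed defaults more heavily. The function name and the `pos_weight` value are hypothetical. Both LightGBM and XGBoost expect a custom objective to return the gradient and Hessian of the loss with respect to the raw (pre-sigmoid) score, although the exact argument order differs between their APIs and should be checked against the library documentation.

```python
import numpy as np

def weighted_logloss_objective(y_true, raw_score, pos_weight=5.0):
    """Gradient and Hessian of a class-weighted log loss w.r.t. raw scores.

    Loss per row: -w * [y*log(p) + (1-y)*log(1-p)], with p = sigmoid(raw_score)
    and w = pos_weight for defaults (y = 1), 1 otherwise (hypothetical weighting).
    """
    y_true = np.asarray(y_true, dtype=float)
    raw_score = np.asarray(raw_score, dtype=float)
    p = 1.0 / (1.0 + np.exp(-raw_score))
    w = np.where(y_true == 1, pos_weight, 1.0)
    grad = w * (p - y_true)       # first derivative of the loss
    hess = w * p * (1.0 - p)      # second derivative of the loss
    return grad, hess

# At raw score 0 the predicted probability is 0.5 for both rows.
demo_grad, demo_hess = weighted_logloss_objective([1, 0], [0.0, 0.0], pos_weight=5.0)
```

Weighting the positive class this way is one route to the cost-sensitive learning mentioned above; it shifts the model toward catching defaults at the expense of more false alarms.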
Models Implemented:
- LightGBM with custom loss functions
- XGBoost with early stopping and cross-validation
- SVM with RBF and polynomial kernels
- Multi-level stacking ensemble architecture
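Profit-driven threshold selection, listed under business analytics above, can be sketched as a simple grid search over approval cut-offs. The payoff values (`gain_good`, `loss_bad`), the function name, and the rule "approve when the predicted default probability is below the threshold" are assumptions for illustration; the project's actual profit matrix is not given in this README.

```python
import numpy as np

def best_profit_threshold(y_true, prob_default, gain_good=100.0, loss_bad=500.0,
                          thresholds=None):
    """Return (threshold, profit) maximizing total profit on the given sample.

    Hypothetical profit matrix: an approved good loan earns `gain_good`,
    an approved bad loan costs `loss_bad`, and a rejected applicant yields 0.
    """
    y_true = np.asarray(y_true)
    prob_default = np.asarray(prob_default, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)

    profits = []
    for t in thresholds:
        approved = prob_default < t
        n_good = np.sum(approved & (y_true == 0))
        n_bad = np.sum(approved & (y_true == 1))
        profits.append(n_good * gain_good - n_bad * loss_bad)

    best = int(np.argmax(profits))  # first threshold reaching the maximum
    return float(thresholds[best]), float(profits[best])

# Toy sample: three low-risk goods, then a mix of goods and bads.
y_demo = [0, 0, 0, 1, 0, 1, 1]
p_demo = [0.055, 0.125, 0.185, 0.335, 0.415, 0.725, 0.915]
t_best, profit_best = best_profit_threshold(y_demo, p_demo)
```

Because a bad loan is assumed to cost five times what a good loan earns, the chosen cut-off sits well below 0.5. In practice the threshold would be picked on a validation set and confirmed on held-out data.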
- Python 3.8 or higher
- Jupyter Notebook/Lab
- Minimum 4GB RAM
- Clone the repository
git clone https://github.com/BrandyBrandt/finanseML.git
cd finanseML
- Create and activate virtual environment
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
- Install dependencies
pip install -r requirements.txt
- Launch Jupyter
jupyter notebook
- Scorecard Model: ROC-AUC 0.85 - optimized for interpretability and regulatory compliance
- Ensemble Model: ROC-AUC 0.92 - maximized predictive performance
- Business Impact: Demonstrated profit optimization through threshold tuning and cost-sensitive modeling
Machine Learning: scikit-learn, LightGBM, XGBoost
Optimization: Optuna
Explainability: SHAP, LIME, Partial Dependence Plots
Data Analysis: pandas, numpy
Visualization: matplotlib, seaborn
- MODEL_CARD.md - Comprehensive model specifications and performance metrics
- TECHNICAL_REPORT.md - Detailed methodology, experiments, and results
Aleksander Brandt - GitHub
Wojciech Baraniak
This project is licensed under the MIT License.
- Deep learning architectures (TabNet, Neural Networks)
- RESTful API for real-time scoring
- Feature stability analysis across time periods
- A/B testing framework for production deployment