- Kaggle.com: rosaaestrada - Predicting Factors Influencing Loan Default in U.S
- Kaggle.com: rosaaestrada - Natural Disasters as Predictor for Loan Default
This project was completed as my thesis for my Master's degree at National University.
In this project, I used Logistic Regression, Random Forest, and Decision Tree algorithms to identify which factors influence loan default.
🔸 Using advanced machine learning techniques, which borrower details, financial attributes, and potential interactions predict default for loans issued between 2012 and 2019?
🔸 Could the occurrence of natural disasters, as witnessed in the United States from 2012 to 2019, be used as a predictor of loan defaults?
This project follows a structured methodology with four preparatory stages: data cleaning, Exploratory Data Analysis (EDA), feature engineering, and feature selection. Predictive modeling is then carried out with Logistic Regression, Random Forest, and Decision Tree algorithms. Finally, the project concludes with a feature importance analysis and an evaluation of the models using confusion matrices and ROC curves.
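The modeling and evaluation stages described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: it uses a synthetic dataset from `make_classification` in place of the loan data, and the function name `evaluate_models`, the hyperparameters, and the 80/20 class split are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(random_state=42):
    """Fit the three model types on synthetic data and collect metrics.

    The data is a stand-in for the loan dataset (assumption): 2,000 rows,
    10 features, with roughly 20% of rows in the positive ("default") class.
    """
    X, y = make_classification(
        n_samples=2000,
        n_features=10,
        n_informative=5,
        weights=[0.8, 0.2],
        random_state=random_state,
    )
    # Stratify so the class ratio is preserved in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # Hypothetical settings; the project's actual hyperparameters may differ.
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
        "Decision Tree": DecisionTreeClassifier(
            max_depth=5, random_state=random_state
        ),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Probability of the positive class, needed for the ROC AUC.
        y_score = model.predict_proba(X_test)[:, 1]
        results[name] = {
            "confusion_matrix": confusion_matrix(y_test, y_pred),
            "roc_auc": roc_auc_score(y_test, y_score),
        }
        # Tree-based models expose impurity-based feature importances.
        if hasattr(model, "feature_importances_"):
            results[name]["feature_importances"] = model.feature_importances_
    return results


results = evaluate_models()
for name, metrics in results.items():
    print(f"{name}: ROC AUC = {metrics['roc_auc']:.3f}")
    print(metrics["confusion_matrix"])
```

Each entry in `results` holds a 2x2 confusion matrix and a ROC AUC score for one model, so the three algorithms can be compared side by side on the same held-out split.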
- Python= 3.9.12
- Pandas= 1.4.2
- NumPy= 1.22.3
- Seaborn= 0.11.2
- Matplotlib= 3.5.1
- Imbalanced-learn= 0.9.0
- Scikit-learn= 1.0.2
- Data - Contains raw data, preprocessed data, and the location where the data was collected
- Jupyter Notebook - The full source code along with explanations as a .ipynb file
- Python Code - The full source code along with explanations as a .py file
- Results - Summary Statistics, Visualizations, and Final Evaluation of the project
