rosaaestrada/Loan-Default-Feature-Analysis


Loan Default Feature Analysis and Classification

[Image: ROC Curve]

Purpose

This project was completed as my thesis for my master's degree at National University.

In this project, I identified which factors influence loan default using Logistic Regression, Random Forest, and Decision Tree algorithms.

1️⃣ First Research Question

🔸 What borrower details, financial attributes, and potential interactions predict loan default for loans issued between 2012 and 2019, using advanced machine learning techniques?

2️⃣ Second Research Question

🔸 Could the occurrence of natural disasters, as witnessed in the United States from 2012 to 2019, be used as a predictor of loan defaults?

Methodology

This project follows a structured methodology with several key stages: data cleaning, exploratory data analysis (EDA), feature engineering, and feature selection. Predictive modeling is then conducted with Logistic Regression, Random Forest, and Decision Tree algorithms. Finally, the project concludes with a feature importance analysis and an evaluation of the models using confusion matrices and ROC curves.
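To make the evaluation step concrete, here is a minimal sketch of what a confusion matrix and the area under a ROC curve compute. It is not the code from this repository: the function names and the toy labels and scores are assumptions for illustration, and it uses only the standard library. In practice, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.roc_auc_score` provide the same measures.

```python
from typing import List, Tuple


def confusion_matrix(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels, where 1 means 'default'."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return ROC points (false positive rate, true positive rate).

    One point is produced per distinct score, sweeping the decision
    threshold from the highest score down to the lowest.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example that shares this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: 1 = loan defaulted, scores = predicted default probability.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # classify at a 0.5 threshold

print(confusion_matrix(y_true, y_pred))
print(auc(roc_points(y_true, scores)))
```

The confusion matrix depends on the chosen threshold (0.5 here), while the ROC curve summarizes performance across all thresholds, which is one reason the two are often reported together.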

Built with:

  • Python 3.9.12
  • Pandas 1.4.2
  • NumPy 1.22.3
  • Seaborn 0.11.2
  • Matplotlib 3.5.1
  • Imbalanced-learn 0.9.0
  • Scikit-learn 1.0.2

Files:

  • Data - Contains raw data, preprocessed data, and the location where the data was collected
  • Jupyter Notebook - The full source code along with explanations as a .ipynb file
  • Python Code - The full source code along with explanations as a .py file
  • Results - Summary Statistics, Visualizations, and Final Evaluation of the project
