Credit History & Utilization EDA #3

@RigelAlgebar

Issue 3 – Credit History & Utilization EDA with OOP + Functional Report

Goal
Explore how credit history, balances, utilization, and inquiries relate to loan_status.

You must use:

  • OOP: create a class that encapsulates credit-history EDA.
  • Functional programming: build a report function that stores and runs multiple EDA steps via functions/lambdas.

Columns in scope

You are responsible for the following columns:

Ratios

  • dti
  • dti_joint
  • revol_util
  • il_util
  • all_util

Delinquency / history

  • delinq_2yrs
  • earliest_cr_line
  • mths_since_last_delinq
  • mths_since_last_record
  • mths_since_last_major_derog

Accounts & public records

  • open_acc
  • total_acc
  • pub_rec
  • acc_now_delinq

Balances & limits

  • revol_bal
  • total_rev_hi_lim
  • tot_coll_amt
  • tot_cur_bal
  • total_bal_il
  • max_bal_bc

Recent account activity

  • open_acc_6m
  • open_il_6m
  • open_il_12m
  • open_il_24m
  • mths_since_rcnt_il
  • open_rv_12m
  • open_rv_24m

Inquiries / pulls

  • inq_last_6mths
  • inq_last_12m
  • inq_fi
  • last_credit_pull_d

Collections

  • collections_12_mths_ex_med

Shared target

  • loan_status (used for default-rate and correlation analyses)
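The correlation step further down assumes `loan_status` is already encoded as 0/1, and two in-scope columns (`earliest_cr_line`, `last_credit_pull_d`) are dates rather than numbers. If your sample arrives with text labels and date strings, a small preparation step along these lines may help. This is only a sketch: the helper names `encode_target` and `months_since`, the set of labels treated as default, and the `"%b-%Y"` date format are all assumptions to adapt to your data.

```python
import pandas as pd

# Assumption: these text labels count as "default". Adjust to your data.
DEFAULT_LABELS = {"Charged Off", "Default"}


def encode_target(status: pd.Series) -> pd.Series:
    """Map text loan statuses to 1 (default) or 0 (anything else)."""
    if pd.api.types.is_numeric_dtype(status):
        return status  # already numeric, leave untouched
    # Note: missing statuses also end up as 0 here.
    return status.isin(DEFAULT_LABELS).astype(int)


def months_since(dates: pd.Series, fmt: str = "%b-%Y") -> pd.Series:
    """Months between each date and the latest date in the column."""
    parsed = pd.to_datetime(dates, format=fmt, errors="coerce")
    ref = parsed.max()
    return (ref.year - parsed.dt.year) * 12 + (ref.month - parsed.dt.month)


# Tiny illustrative inputs, not real loan data.
status_demo = encode_target(pd.Series(["Fully Paid", "Charged Off", "Current"]))
age_demo = months_since(pd.Series(["Jan-2020", "Mar-2020", None]))
```

Converting the date columns to a month count is one way to make them usable in bucketed or correlation analyses; they are deliberately left out of the numeric column list otherwise.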

Files to edit

  • src/eda_credit_history.py
  • notebooks/eda_credit_history_demo.ipynb

1. Implement the CreditHistoryEDA class

In src/eda_credit_history.py, create and implement:

```python
import pandas as pd
from typing import Dict, Any, Callable

CREDIT_NUMERIC_COLS = [
    "dti", "dti_joint",
    "delinq_2yrs",
    "mths_since_last_delinq", "mths_since_last_record", "mths_since_last_major_derog",
    "open_acc", "total_acc", "pub_rec", "acc_now_delinq",
    "revol_bal", "revol_util", "total_rev_hi_lim",
    "tot_coll_amt", "tot_cur_bal", "total_bal_il",
    "open_acc_6m", "open_il_6m", "open_il_12m", "open_il_24m",
    "mths_since_rcnt_il",
    "open_rv_12m", "open_rv_24m",
    "max_bal_bc",
    "il_util", "all_util",
    "inq_last_6mths", "inq_last_12m", "inq_fi",
    "collections_12_mths_ex_med",
]


class CreditHistoryEDA:
    def __init__(self, df: pd.DataFrame, target_col: str = "loan_status"):
        """
        Store the full DataFrame and the name of the target column.
        """
        self.df = df
        self.target_col = target_col

    def credit_structure_summary(self) -> pd.DataFrame:
        """
        One row per CREDIT_NUMERIC_COLS column with:
        - column
        - dtype
        - n_missing
        - missing_pct
        - mean (if numeric)
        - std (if numeric)
        """
        ...

    def default_rate_by_bucket(self, col: str, bins: int = 4) -> pd.DataFrame:
        """
        For a numeric credit column (e.g., dti, revol_util),
        create `bins` buckets and compute default rate per bucket.

        Return a DataFrame with columns:
        - bucket (interval)
        - n_loans
        - default_rate
        """
        ...

    def correlation_with_default(self) -> pd.Series:
        """
        Compute correlation of each numeric credit column with the target
        (assuming loan_status is encoded as 0/1).
        Return a Series indexed by column name.
        """
        ...
```
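To make the three docstrings more concrete, here is a minimal sketch of the underlying pandas operations, written as standalone functions rather than as the class methods themselves. The function names are hypothetical, and two choices are assumptions not fixed by this issue: buckets are built with quantiles (`pd.qcut`) rather than equal-width bins (`pd.cut`), and columns missing from the DataFrame are skipped silently.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """One row per requested column that exists in df."""
    rows = []
    for col in cols:
        if col not in df.columns:
            continue  # assumption: skip columns absent from the sample
        s = df[col]
        numeric = pd.api.types.is_numeric_dtype(s)
        rows.append({
            "column": col,
            "dtype": str(s.dtype),
            "n_missing": int(s.isna().sum()),
            "missing_pct": float(s.isna().mean() * 100),
            "mean": s.mean() if numeric else float("nan"),
            "std": s.std() if numeric else float("nan"),
        })
    names = ["column", "dtype", "n_missing", "missing_pct", "mean", "std"]
    return pd.DataFrame(rows, columns=names)


def bucket_default_rate(df: pd.DataFrame, col: str,
                        target_col: str = "loan_status",
                        bins: int = 4) -> pd.DataFrame:
    """Default rate per quantile bucket of col (target must be 0/1)."""
    data = df[[col, target_col]].dropna()
    # duplicates="drop" merges repeated bin edges, so heavily tied
    # columns can yield fewer than `bins` buckets.
    buckets = pd.qcut(data[col], q=bins, duplicates="drop")
    return (
        data.groupby(buckets, observed=True)[target_col]
        .agg(n_loans="size", default_rate="mean")
        .reset_index()
        .rename(columns={col: "bucket"})
    )


def correlate_with_target(df: pd.DataFrame, cols: list,
                          target_col: str = "loan_status") -> pd.Series:
    """Pearson correlation of each present column with the 0/1 target."""
    present = [c for c in cols if c in df.columns]
    return df[present].corrwith(df[target_col])


# Tiny synthetic frame for illustration only.
demo = pd.DataFrame({
    "dti": [1, 2, 3, 4, 5, 6, 7, 8],
    "loan_status": [0, 0, 0, 0, 1, 1, 1, 1],
})
summary = summarize_columns(demo, ["dti", "not_in_sample"])
buckets = bucket_default_rate(demo, "dti", bins=2)
corr = correlate_with_target(demo, ["dti"])
```

Inside `CreditHistoryEDA`, the same logic would read from `self.df` and `self.target_col` instead of taking them as arguments.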


2. Functional credit-history report

Add a functional report generator that coordinates several EDA steps:

```python
def credit_history_report(eda: CreditHistoryEDA) -> Dict[str, Any]:
    """
    Build a dict of step_name -> callable and run them to produce
    a combined report.

    Example steps:
      - "structure_summary": eda.credit_structure_summary
      - "dti_buckets": lambda: eda.default_rate_by_bucket("dti", bins=5)
      - "revol_util_buckets": lambda: eda.default_rate_by_bucket("revol_util", bins=5)
      - "correlation_with_default": eda.correlation_with_default

    Iterate over this dict, call each function, and return
    a result dict: step_name -> output.
    """
    ...
```

Example idea:

```python
def credit_history_report(eda: CreditHistoryEDA) -> Dict[str, Any]:
    steps: Dict[str, Callable[[], Any]] = {
        "structure_summary": eda.credit_structure_summary,
        "dti_buckets": lambda: eda.default_rate_by_bucket("dti", bins=5),
        "revol_util_buckets": lambda: eda.default_rate_by_bucket("revol_util", bins=5),
        "correlation_with_default": eda.correlation_with_default,
    }

    report: Dict[str, Any] = {}
    for name, func in steps.items():
        report[name] = func()
    return report
```

This should clearly show higher-order functions (functions stored and called later).
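The "stored now, called later" idea can be seen in isolation with a small sketch that needs no DataFrame at all. Here `run_steps` and `bucket_stub` are hypothetical names: `bucket_stub` merely stands in for `eda.default_rate_by_bucket`, and `functools.partial` is shown as one alternative to a lambda for fixing arguments ahead of time.

```python
from functools import partial
from typing import Any, Callable, Dict


def run_steps(steps: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Call every zero-argument step and collect its output by name."""
    return {name: step() for name, step in steps.items()}


def bucket_stub(col: str, bins: int) -> str:
    # Hypothetical stand-in for eda.default_rate_by_bucket.
    return f"{col} in {bins} buckets"


steps = {
    # partial freezes the arguments and returns a new zero-argument callable.
    "dti_buckets": partial(bucket_stub, "dti", bins=5),
    # A lambda achieves the same deferral.
    "revol_util_buckets": lambda: bucket_stub("revol_util", bins=5),
}

# Nothing has run yet: the dict holds callables, not results.
report = run_steps(steps)
```

Either form satisfies the requirement, since in both cases the function is passed around as a value and only invoked inside the loop.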


3. Create the demo notebook

In notebooks/eda_credit_history_demo.ipynb:

  1. Load the dataset:

```python
import pandas as pd
from src.eda_credit_history import CreditHistoryEDA, credit_history_report

df = pd.read_csv("data/loan_sample.csv")  # or the correct path for your setup
```

  2. Instantiate the EDA class:

```python
eda = CreditHistoryEDA(df, target_col="loan_status")
```

  3. Run the report:

```python
report = credit_history_report(eda)
```

  4. Display at least:

```python
report["structure_summary"]         # structure of all credit-history columns
report["dti_buckets"]               # default rate by DTI bucket
report["revol_util_buckets"]        # default rate by revol_util bucket
report["correlation_with_default"]  # correlation of each credit feature with default
```

Optionally add one bar plot (e.g., default rate by DTI bucket), but stay within EDA (no modeling).
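For the optional plot, one possible shape is a small helper that takes a bucket table and draws one bar per bucket. This sketch assumes matplotlib is installed; `plot_default_rate` is a hypothetical name, and the numbers in `example` are made up purely to show the expected columns, standing in for `report["dti_buckets"]`.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_default_rate(buckets: pd.DataFrame, title: str):
    """Bar chart of default_rate per bucket; returns the Axes."""
    fig, ax = plt.subplots(figsize=(6, 3))
    # Interval labels are converted to strings so they plot as categories.
    ax.bar(buckets["bucket"].astype(str), buckets["default_rate"])
    ax.set_xlabel("bucket")
    ax.set_ylabel("default rate")
    ax.set_title(title)
    ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    return ax


# Made-up values for illustration only, not results from the dataset.
example = pd.DataFrame({
    "bucket": ["(0, 10]", "(10, 20]", "(20, 30]"],
    "n_loans": [120, 200, 80],
    "default_rate": [0.08, 0.12, 0.19],
})
ax = plot_default_rate(example, "Default rate by DTI bucket")
```

In the notebook, the call would be `plot_default_rate(report["dti_buckets"], "Default rate by DTI bucket")`.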


Acceptance Criteria ✅

  • CreditHistoryEDA:

    • Initializes correctly with a DataFrame.
    • credit_structure_summary() returns a DataFrame with one row per credit-history numeric column and basic stats.
    • default_rate_by_bucket(col, bins) returns a DataFrame with bucket, n_loans, and default_rate.
    • correlation_with_default() returns a Series of correlations.
  • Functional report:

    • credit_history_report(eda) uses a dict of callables or similar FP construct.
    • It iterates over steps, calls each function, and returns a dict of results.
  • Notebook:

    • Runs top-to-bottom without errors.
    • Shows structure summary, bucketed default-rate tables, and correlations.
    • Contains only EDA (no model training).
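Several of these criteria can be checked mechanically before opening a pull request. The sketch below is one way to do that; `check_report` is a hypothetical helper, and the check that `default_rate` lies in [0, 1] assumes the target is encoded as 0/1. The `fake_report` values are placeholders that only mimic the expected shapes.

```python
import pandas as pd


def check_report(report: dict) -> None:
    """Assert that a report dict has the shapes the criteria describe."""
    assert isinstance(report["structure_summary"], pd.DataFrame)
    for key in ("dti_buckets", "revol_util_buckets"):
        table = report[key]
        assert isinstance(table, pd.DataFrame)
        assert {"bucket", "n_loans", "default_rate"} <= set(table.columns)
        # Holds only when the target is 0/1, so the mean is a rate.
        assert table["default_rate"].dropna().between(0, 1).all()
    assert isinstance(report["correlation_with_default"], pd.Series)


# Placeholder report with the right shapes (values are not real results).
bucket_table = pd.DataFrame(
    {"bucket": ["low", "high"], "n_loans": [10, 10], "default_rate": [0.1, 0.3]}
)
fake_report = {
    "structure_summary": pd.DataFrame({"column": ["dti"]}),
    "dti_buckets": bucket_table,
    "revol_util_buckets": bucket_table,
    "correlation_with_default": pd.Series({"dti": 0.2}),
}
check_report(fake_report)  # passes silently when the shapes are right
```

Running `check_report(report)` as the last notebook cell gives a quick signal that the report still has the expected structure.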
