Issue 3 – Credit History & Utilization EDA with OOP + Functional Report
Goal
Explore how credit history, balances, utilization, and inquiries relate to loan_status.
You must use:
- OOP: create a class that encapsulates credit-history EDA.
- Functional programming: build a report function that stores and runs multiple EDA steps via functions/lambdas.
Columns in scope
You are responsible for the following columns:
Ratios
dti
dti_joint
revol_util
il_util
all_util
Delinquency / history
delinq_2yrs
earliest_cr_line
mths_since_last_delinq
mths_since_last_record
mths_since_last_major_derog
Accounts & public records
open_acc
total_acc
pub_rec
acc_now_delinq
Balances & limits
revol_bal
total_rev_hi_lim
tot_coll_amt
tot_cur_bal
total_bal_il
max_bal_bc
Recent account activity
open_acc_6m
open_il_6m
open_il_12m
open_il_24m
mths_since_rcnt_il
open_rv_12m
open_rv_24m
Inquiries / pulls
inq_last_6mths
inq_last_12m
inq_fi
last_credit_pull_d
Collections
collections_12_mths_ex_med
Shared target
loan_status (used for default-rate and correlation analyses)
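The default-rate and correlation analyses both need a numeric target. Below is a minimal sketch of one way to derive it, assuming `loan_status` arrives as text labels. The helper name `encode_default` and the label set `DEFAULT_STATUSES` are hypothetical and should be adjusted to the values actually present in the dataset.

```python
import pandas as pd

# Assumption: these labels count as "default"; adjust to the values in your data.
DEFAULT_STATUSES = {"Charged Off", "Default"}


def encode_default(status: pd.Series) -> pd.Series:
    """Map text loan statuses to 1 (default) / 0 (not default).

    Missing statuses are mapped to 0 because isin() returns False for NaN.
    """
    return status.isin(DEFAULT_STATUSES).astype(int)


sample = pd.Series(["Fully Paid", "Charged Off", "Current", "Default"])
encoded = encode_default(sample)
print(encoded.tolist())  # [0, 1, 0, 1]
```

If the column is already encoded as 0/1, this step can be skipped.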
Files to edit
src/eda_credit_history.py
notebooks/eda_credit_history_demo.ipynb
1. Implement the CreditHistoryEDA class
In src/eda_credit_history.py, create and implement:
"""
import pandas as pd
from typing import Dict, Any, Callable

CREDIT_NUMERIC_COLS = [
    "dti", "dti_joint",
    "delinq_2yrs",
    "mths_since_last_delinq", "mths_since_last_record", "mths_since_last_major_derog",
    "open_acc", "total_acc", "pub_rec", "acc_now_delinq",
    "revol_bal", "revol_util", "total_rev_hi_lim",
    "tot_coll_amt", "tot_cur_bal", "total_bal_il",
    "open_acc_6m", "open_il_6m", "open_il_12m", "open_il_24m",
    "mths_since_rcnt_il",
    "open_rv_12m", "open_rv_24m",
    "max_bal_bc",
    "all_util",
    "inq_last_6mths", "inq_last_12m", "inq_fi",
    "collections_12_mths_ex_med",
]


class CreditHistoryEDA:
    def __init__(self, df: pd.DataFrame, target_col: str = "loan_status"):
        """
        Store the full DataFrame and the name of the target column.
        """
        self.df = df
        self.target_col = target_col

    def credit_structure_summary(self) -> pd.DataFrame:
        """
        One row per CREDIT_NUMERIC_COLS column with:
        - column
        - dtype
        - n_missing
        - missing_pct
        - mean (if numeric)
        - std (if numeric)
        """
        ...

    def default_rate_by_bucket(self, col: str, bins: int = 4) -> pd.DataFrame:
        """
        For a numeric credit column (e.g., dti, revol_util),
        create `bins` buckets and compute default rate per bucket.
        Return a DataFrame with columns:
        - bucket (interval)
        - n_loans
        - default_rate
        """
        ...

    def correlation_with_default(self) -> pd.Series:
        """
        Compute correlation of each numeric credit column with the target
        (assuming loan_status is encoded as 0/1).
        Return a Series indexed by column name.
        """
        ...
"""
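As a starting point for two of the methods above, here is a hedged sketch written as free functions so it runs on its own. Inside the class, `df` and `target_col` would come from `self.df` and `self.target_col`. The choice of quantile buckets via `pd.qcut` is an assumption; the spec only asks for `bins` buckets, so equal-width buckets via `pd.cut` would fit just as well. The toy data is for illustration only.

```python
import pandas as pd


def default_rate_by_bucket(
    df: pd.DataFrame, col: str, target_col: str = "loan_status", bins: int = 4
) -> pd.DataFrame:
    """Bucket `col` into quantile bins and compute the default rate per bucket."""
    data = df[[col, target_col]].dropna()
    # Assumption: quantile buckets. duplicates="drop" merges repeated bin
    # edges, so heavily tied columns may produce fewer than `bins` buckets.
    buckets = pd.qcut(data[col], q=bins, duplicates="drop")
    return (
        data.groupby(buckets, observed=True)[target_col]
        .agg(n_loans="size", default_rate="mean")
        .reset_index()
        .rename(columns={col: "bucket"})
    )


def correlation_with_default(
    df: pd.DataFrame, cols: list, target_col: str = "loan_status"
) -> pd.Series:
    """Pearson correlation of each available column in `cols` with the 0/1 target."""
    # Skip columns that are absent from this particular DataFrame.
    present = [c for c in cols if c in df.columns]
    return df[present].corrwith(df[target_col])


# Toy data for illustration only.
toy = pd.DataFrame({
    "dti": [1, 2, 3, 4, 5, 6, 7, 8],
    "loan_status": [0, 0, 0, 0, 0, 1, 1, 1],
})
bucket_table = default_rate_by_bucket(toy, "dti", bins=2)
correlations = correlation_with_default(toy, ["dti", "revol_util"])
print(bucket_table)
print(correlations)
```

Rows with a missing value in either the feature or the target are dropped before bucketing, so `n_loans` counts only loans where both are present.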
2. Functional credit-history report
Add a functional report generator that coordinates several EDA steps:
"""
def credit_history_report(eda: CreditHistoryEDA) -> Dict[str, Any]:
    """
    Build a dict of step_name -> callable and run them to produce
    a combined report.

    Example steps:
    - "structure_summary": eda.credit_structure_summary
    - "dti_buckets": lambda: eda.default_rate_by_bucket("dti", bins=5)
    - "revol_util_buckets": lambda: eda.default_rate_by_bucket("revol_util", bins=5)
    - "correlation_with_default": eda.correlation_with_default

    Iterate over this dict, call each function, and return
    a result dict: step_name -> output.
    """
    ...
"""
Example idea:
"""
def credit_history_report(eda: CreditHistoryEDA) -> Dict[str, Any]:
    # Each step is a zero-argument callable: a bound method or a lambda.
    steps: Dict[str, Callable[[], Any]] = {
        "structure_summary": eda.credit_structure_summary,
        "dti_buckets": lambda: eda.default_rate_by_bucket("dti", bins=5),
        "revol_util_buckets": lambda: eda.default_rate_by_bucket("revol_util", bins=5),
        "correlation_with_default": eda.correlation_with_default,
    }
    report: Dict[str, Any] = {}
    for name, func in steps.items():
        report[name] = func()
    return report
"""
This should clearly show higher-order functions (functions stored and called later).
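To make the higher-order aspect more explicit, one optional variation is a generic runner that receives the step dict as an argument. This is only a sketch: `run_steps` is a hypothetical helper name, and the toy steps stand in for the EDA methods.

```python
from typing import Any, Callable, Dict


def run_steps(steps: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Call each zero-argument step and collect its output by step name."""
    return {name: func() for name, func in steps.items()}


# Toy steps standing in for the EDA methods.
values = [10, 20, 30]
demo_report = run_steps({
    "count": lambda: len(values),
    "total": lambda: sum(values),
    "maximum": lambda: max(values),
})
print(demo_report)  # {'count': 3, 'total': 60, 'maximum': 30}
```

Because `run_steps` takes functions as data, adding a new EDA step means adding one dict entry rather than changing the runner.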
3. Create the demo notebook
In notebooks/eda_credit_history_demo.ipynb:
- Load the dataset:
"""
import pandas as pd
from src.eda_credit_history import CreditHistoryEDA, credit_history_report
df = pd.read_csv("data/loan_sample.csv")  # adjust the path to match your dataset location
"""
- Instantiate the EDA class:
"""
eda = CreditHistoryEDA(df, target_col="loan_status")
"""
- Run the report:
"""
report = credit_history_report(eda)
"""
- Display at least:
"""
report["structure_summary"] # structure of all credit-history columns
report["dti_buckets"] # default rate by DTI bucket
report["revol_util_buckets"] # default rate by revol_util bucket
report["correlation_with_default"] # correlation of each credit feature with default
"""
Optionally add one bar plot (e.g., default rate by DTI bucket), but stay within EDA (no modeling).
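A minimal sketch of such a bar plot, assuming matplotlib is available in the notebook environment. The `dti_buckets` frame below is a toy stand-in with illustrative numbers; in the notebook it would be `report["dti_buckets"]`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for report["dti_buckets"]; the numbers are illustrative only.
dti_buckets = pd.DataFrame({
    "bucket": ["low", "mid", "high"],
    "n_loans": [100, 100, 100],
    "default_rate": [0.05, 0.10, 0.20],
})

fig, ax = plt.subplots(figsize=(6, 3))
# Interval buckets are converted to strings so they can serve as bar labels.
ax.bar(dti_buckets["bucket"].astype(str), dti_buckets["default_rate"])
ax.set_xlabel("DTI bucket")
ax.set_ylabel("Default rate")
ax.set_title("Default rate by DTI bucket")
fig.tight_layout()
```

In a Jupyter notebook the figure typically renders inline once the cell finishes.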
Acceptance Criteria ✅
- CreditHistoryEDA:
  - Initializes correctly with a DataFrame.
  - credit_structure_summary() returns a DataFrame with one row per credit-history numeric column and basic stats.
  - default_rate_by_bucket(col, bins) returns a DataFrame with bucket, n_loans, and default_rate.
  - correlation_with_default() returns a Series of correlations.
- Functional report:
  - credit_history_report(eda) uses a dict of callables or a similar FP construct.
  - It iterates over the steps, calls each function, and returns a dict of results.
- Notebook:
  - Runs top-to-bottom without errors.
  - Shows structure summary, bucketed default-rate tables, and correlations.
  - Contains only EDA (no model training).
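The return-type criteria above can be turned into a quick smoke check. The sketch below uses a hypothetical helper, `check_report`, and a toy report with the expected shapes; in practice the report would come from `credit_history_report(eda)`.

```python
from typing import Any, Dict

import pandas as pd


def check_report(report: Dict[str, Any]) -> None:
    """Raise AssertionError if the report misses a return-type criterion."""
    assert isinstance(report["structure_summary"], pd.DataFrame)
    for key in ("dti_buckets", "revol_util_buckets"):
        table = report[key]
        assert isinstance(table, pd.DataFrame)
        assert {"bucket", "n_loans", "default_rate"} <= set(table.columns)
    assert isinstance(report["correlation_with_default"], pd.Series)


# Toy report with the expected shapes; values are illustrative only.
buckets = pd.DataFrame({"bucket": ["a"], "n_loans": [1], "default_rate": [0.0]})
toy_report = {
    "structure_summary": pd.DataFrame({"column": ["dti"]}),
    "dti_buckets": buckets,
    "revol_util_buckets": buckets,
    "correlation_with_default": pd.Series({"dti": 0.1}),
}
check_report(toy_report)  # passes silently when the shapes match
```

This checks types and column names only; it does not validate the numbers themselves.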