🧑💻 Subset 1 Numeric & Borrower Preprocessor
**Build a functional + OOP numeric/borrower cleaning & transformation block (subset 1)**
You will work with these feature columns:
SUBSET1_COLS = [
    "id",
    "member_id",
    # numeric – amounts / risk basics
    "loan_amnt",
    "funded_amnt",
    "funded_amnt_inv",
    "int_rate",
    "installment",
    "annual_inc",
    "dti",
    "delinq_2yrs",
    "inq_last_6mths",
    "mths_since_last_delinq",
    "mths_since_last_record",
    "open_acc",
    "pub_rec",
    "revol_bal",
    "revol_util",
    "total_acc",
    "out_prncp",
    # categorical / string
    "term",
    "grade",
    "sub_grade",
    "emp_title",
    "emp_length",
    "home_ownership",
    "verification_status",
    "issue_d",
]
Treat loan_status as the target (y), not as an input feature here.
🎯 Goal
Create a numeric + basic borrower transformer that:
- Cleans numeric values (invalid, missing, outliers) for subset 1 numeric columns
- Converts some string fields (term, emp_length, grade) into useful numeric form
- Is implemented as a sklearn-compatible transformer (OOP)
- Uses pure functions internally for the cleaning logic (FP)
✅ Tasks
def clean_invalid_numeric(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    # Replace impossible numeric values (e.g. annual_inc <= 0, dti < 0 or > 200) with NaN.
    ...

def handle_missing_numeric(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    # Fill NaNs using median per column.
    ...

def clip_numeric_outliers(
    df: pd.DataFrame,
    cols: list[str],
    upper_quantile: float = 0.99,
) -> pd.DataFrame:
    # Clip numeric features at the given quantile.
    ...
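One possible way to fill in the three numeric stubs is sketched below. It is a minimal sketch, not a reference solution: the only validity rules applied are the two named in the stub comment (annual_inc <= 0, dti outside 0 to 200), and every function returns a copy rather than mutating its input so that it stays pure. Rules for the other numeric columns are left out because the task does not specify them.

```python
import pandas as pd


def clean_invalid_numeric(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Replace impossible values with NaN (only the two rules from the task)."""
    out = df.copy()
    if "annual_inc" in cols:
        # Keep strictly positive incomes; everything else becomes NaN.
        out["annual_inc"] = out["annual_inc"].where(out["annual_inc"] > 0)
    if "dti" in cols:
        # between() is inclusive on both ends, so 0 and 200 are kept.
        out["dti"] = out["dti"].where(out["dti"].between(0, 200))
    return out


def handle_missing_numeric(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Fill NaNs with the per-column median of the frame that is passed in."""
    out = df.copy()
    out[cols] = out[cols].fillna(out[cols].median())
    return out


def clip_numeric_outliers(
    df: pd.DataFrame,
    cols: list[str],
    upper_quantile: float = 0.99,
) -> pd.DataFrame:
    """Cap each column at its own upper quantile."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].clip(upper=out[col].quantile(upper_quantile))
    return out
```

Note that these standalone versions compute medians and quantiles from whatever frame they receive. Inside the transformer, the statistics should come from the training data only, which is why the class stores them in fit.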
def normalize_term(df: pd.DataFrame) -> pd.DataFrame:
    # Convert 'term' like '36 months' -> numeric 36.
    ...

def normalize_emp_length(df: pd.DataFrame) -> pd.DataFrame:
    # Map '10+ years', '< 1 year', etc. to numeric years (float).
    ...

def encode_grade_ordinal(df: pd.DataFrame) -> pd.DataFrame:
    # Map grade A–G to 1–7; optionally create sub_grade_ord too.
    ...
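The string-to-numeric stubs could be sketched as follows. Several choices here are assumptions rather than requirements from the task: '< 1 year' is mapped to 0.0, '10+ years' to 10.0, the grade encoding is written to a new grade_ord column instead of overwriting grade, and sub_grade_ord assumes five sub-grades per grade (A1 to G5 mapped to 1 to 35).

```python
import pandas as pd

# A -> 1, B -> 2, ..., G -> 7
GRADE_ORDER = {g: i for i, g in enumerate("ABCDEFG", start=1)}


def normalize_term(df: pd.DataFrame) -> pd.DataFrame:
    """' 36 months' -> 36.0; values without digits become NaN."""
    out = df.copy()
    digits = out["term"].astype(str).str.extract(r"(\d+)", expand=False)
    out["term"] = pd.to_numeric(digits, errors="coerce").astype(float)
    return out


def normalize_emp_length(df: pd.DataFrame) -> pd.DataFrame:
    """'10+ years' -> 10.0, '3 years' -> 3.0, '< 1 year' -> 0.0 (assumed)."""
    out = df.copy()
    raw = out["emp_length"].astype(str).str.strip()
    digits = raw.str.extract(r"(\d+)", expand=False)
    years = pd.to_numeric(digits, errors="coerce").astype(float)
    # '< 1 year' contains the digit 1, so override it explicitly.
    out["emp_length"] = years.mask(raw.str.startswith("<", na=False), 0.0)
    return out


def encode_grade_ordinal(df: pd.DataFrame) -> pd.DataFrame:
    """Add grade_ord (1-7) and, when sub_grade exists, sub_grade_ord (1-35)."""
    out = df.copy()
    out["grade_ord"] = out["grade"].map(GRADE_ORDER)
    if "sub_grade" in out.columns:
        letter = out["sub_grade"].str[0].map(GRADE_ORDER)
        digit = pd.to_numeric(out["sub_grade"].str[1:], errors="coerce")
        # Example: 'C4' -> (3 - 1) * 5 + 4 = 14
        out["sub_grade_ord"] = (letter - 1) * 5 + digit
    return out
```

Missing or unparseable strings come out as NaN, so they can be handled by the same missing-value step as the other numeric columns.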
def apply_numeric_steps(
    df: pd.DataFrame,
    cols: list[str],
    steps: list,
) -> pd.DataFrame:
    # Apply a list of functions(df, cols) -> df in sequence.
    for step in steps:
        df = step(df, cols)
    return df
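Because apply_numeric_steps calls every step as step(df, cols), a step that needs an extra argument (such as upper_quantile) has to have that argument bound beforehand. The sketch below shows one way to do that with functools.partial. The two step functions, negatives_to_nan and clip_upper, are hypothetical toy steps written only for this illustration.

```python
from functools import partial

import pandas as pd


def apply_numeric_steps(df: pd.DataFrame, cols: list[str], steps: list) -> pd.DataFrame:
    # Apply a list of functions(df, cols) -> df in sequence.
    for step in steps:
        df = step(df, cols)
    return df


def negatives_to_nan(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    # Toy step: treat negative values as invalid.
    out = df.copy()
    out[cols] = out[cols].where(out[cols] >= 0)
    return out


def clip_upper(df: pd.DataFrame, cols: list[str], upper_quantile: float = 0.99) -> pd.DataFrame:
    # Toy step: cap each column at its own upper quantile.
    out = df.copy()
    for col in cols:
        out[col] = out[col].clip(upper=out[col].quantile(upper_quantile))
    return out


# partial() fixes upper_quantile so the step still matches step(df, cols).
steps = [negatives_to_nan, partial(clip_upper, upper_quantile=0.5)]

raw = pd.DataFrame({"dti": [5.0, -3.0, 15.0, 100.0]})
result = apply_numeric_steps(raw, ["dti"], steps)
```

Since each step copies its input, raw is left unchanged, and the order of the list decides the order of the cleaning stages.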
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

# clean_invalid_numeric is the pure function defined in the tasks above.

class Subset1NumericBorrowerTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, numeric_cols: list[str], upper_quantile: float = 0.99):
        # Store only constructor parameters here; learned state
        # (attributes ending in "_") is created in fit().
        self.numeric_cols = numeric_cols
        self.upper_quantile = upper_quantile

    def fit(self, X, y=None):
        df = pd.DataFrame(X, columns=self.numeric_cols).copy()
        # Clean invalid values first so the learned statistics are computed
        # on the same kind of data that transform() fills and clips.
        df = clean_invalid_numeric(df, self.numeric_cols)
        # Learn per-column medians and upper quantiles from the training data.
        self.medians_ = {col: df[col].median() for col in self.numeric_cols}
        self.quantiles_ = {
            col: df[col].quantile(self.upper_quantile) for col in self.numeric_cols
        }
        return self

    def transform(self, X):
        df = pd.DataFrame(X, columns=self.numeric_cols).copy()
        # 1) clean invalid values
        df = clean_invalid_numeric(df, self.numeric_cols)
        for col in self.numeric_cols:
            # 2) fill NaNs using the stored median
            df[col] = df[col].fillna(self.medians_[col])
            # 3) clip using the stored quantile
            df[col] = df[col].clip(upper=self.quantiles_[col])
        return df.to_numpy()
subset1_numeric_cols = [
    "loan_amnt", "funded_amnt", "funded_amnt_inv", "int_rate", "installment",
    "annual_inc", "dti", "delinq_2yrs", "inq_last_6mths", "mths_since_last_delinq",
    "mths_since_last_record", "open_acc", "pub_rec", "revol_bal", "revol_util",
    "total_acc", "out_prncp",
]

transformer = Subset1NumericBorrowerTransformer(numeric_cols=subset1_numeric_cols)
transformer.fit(X_train[subset1_numeric_cols])
X_train_num = transformer.transform(X_train[subset1_numeric_cols])
✅ Acceptance Criteria
- Subset-1 numeric & borrower-related features are cleaned and transformed
- Implementation uses pure functions + the Subset1NumericBorrowerTransformer class (OOP)
- Tests cover at least:
  - 1 invalid value scenario
  - 1 NaN scenario
  - 1 string→numeric conversion
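The three required test scenarios could look roughly like the pytest-style sketch below. The small function bodies at the top are stand-ins so the example runs on its own. In the repository they would be imported from the cleaning modules instead, and the exact expected values depend on how those functions are finally implemented.

```python
import numpy as np
import pandas as pd


# Stand-ins so the sketch is self-contained (replace with real imports).
def clean_invalid_numeric(df, cols):
    out = df.copy()
    out["dti"] = out["dti"].where(out["dti"].between(0, 200))
    return out


def handle_missing_numeric(df, cols):
    out = df.copy()
    out[cols] = out[cols].fillna(out[cols].median())
    return out


def normalize_term(df):
    out = df.copy()
    out["term"] = out["term"].str.extract(r"(\d+)", expand=False).astype(float)
    return out


def test_invalid_dti_becomes_nan():
    # Invalid value scenario: dti above 200 is impossible.
    df = pd.DataFrame({"dti": [12.0, 250.0]})
    out = clean_invalid_numeric(df, ["dti"])
    assert out["dti"].isna().tolist() == [False, True]


def test_nan_is_filled_with_median():
    # NaN scenario: the gap is filled with the median of 10 and 30.
    df = pd.DataFrame({"dti": [10.0, np.nan, 30.0]})
    out = handle_missing_numeric(df, ["dti"])
    assert out["dti"].tolist() == [10.0, 20.0, 30.0]


def test_term_becomes_numeric():
    # String -> numeric scenario.
    df = pd.DataFrame({"term": [" 36 months", " 60 months"]})
    out = normalize_term(df)
    assert out["term"].tolist() == [36.0, 60.0]
```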
Where each piece goes:
- src/cleaning_numeric.py: implement the pure numeric cleaning functions
- src/cleaning_borrower.py: implement the pure "string → numeric" functions
- src/transformers.py: implement Subset1NumericBorrowerTransformer
- tests/test_subset1_transformer.py: check that
  - after transform, there are no NaNs in subset-1 numeric columns
  - annual_inc and dti are within a reasonable range
  - term / emp_length are numeric