Skip to content

jerrycyng/Customer-Behaviour-Prediction-with-Machine-Learning-and-SHAP-Interpretation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Customer-Behaviour-Prediction-with-Machine-Learning-and-SHAP-Interpretation

SHAP Introduction

For the file'Jerry_SHAP_Introduction', I am trying to explain SHAP in a simple way to boost up your basic understandings. The detailed implementation will be demonstrated inside the banking project.

Project Context

The data is related with direct marketing campaigns of a Portuguese banking institution. The classification goal is to predict if the client will subscribe a term deposit.

This dataset is downloaded from https://archive.ics.uci.edu/ml/datasets/Bank+Marketing and you can download it here as well.

Attribute Information

Input variables:

bank client data:

1 - age (numeric)

2 - job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')

3 - marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)

4 - education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')

5 - default: has credit in default? (categorical: 'no','yes','unknown')

6 - balance: int

7 - housing: has housing loan? (categorical: 'no','yes','unknown')

8 - loan: has personal loan? (categorical: 'no','yes','unknown')

related with the last contact of the current campaign:

9 - contact: contact communication type (categorical: 'cellular','telephone')

10 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')

11 - day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')

12 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

other attributes:

13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)

14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)

15 - previous: number of contacts performed before this campaign and for this client (numeric)

16 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')

Output variable (desired target):

17 - y - has the client subscribed a term deposit? (binary: 'yes','no')

In this project, we will build a machine learning model to predict dependent variable from the independent variables. Also, the objective of this project is to interpret the classification model by SHAP(SHapley Additive exPlanations), which is a game theoretic approach to explain the output of any machine learning model. Through this SHAP Interpretation, we can understand how and why the model predicts in a certain way.

Creator: Jerry Ng (CityU)

About

Banking Deposit Prediction with Python Machine Learning (XGBoost) and model interpretation with SHapley Additive exPlanations (SHAP)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors