Classification is a supervised learning technique used to predict discrete target variables from a set of features (attributes).
The aim of the project is to use client data to predict whether a client will subscribe to a term deposit.
- Data Visualization
- Data Transformation - Encoding
- Heatmaps for correlation
- Feature Engineering
- Model Building
- Predictive Modelling
- Logistic Regression
- Decision Tree Classifier
- Python
- Pandas
- matplotlib and seaborn
- scikit-learn
A term deposit is a cash investment held at a financial institution, and term deposits are a major source of income for a bank. The bank has various outreach plans to sell term deposits to its customers, such as email marketing, advertisements, telephonic marketing and digital marketing.
Telephonic marketing campaigns remain one of the most effective ways to reach people. However, they require a large investment, because call centers must be hired to execute them. Hence, it is crucial to identify in advance the customers most likely to convert, so that they can be specifically targeted by phone.
Client personal data, such as the client's age, job type and marital status, along with call information, such as the duration of the call and the day and month of the call, are used to predict whether the client will subscribe to a term deposit.
We use classification to make this prediction.
- Import the required Python modules.
- Import the training data as a DataFrame.
- Print the head of the data.
- The basic `info` is printed.
- The column names of the attributes are also printed.
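A minimal sketch of these loading and inspection steps, assuming pandas. The real project reads its training file from disk; here a small inline CSV with hypothetical rows (column names assumed to resemble the bank data) stands in so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the training file; in the project this
# would be a pd.read_csv call on the real training CSV.
CSV_TEXT = """ID,age,job,marital,duration,subscribed
1,56,admin.,married,261,no
2,31,technician,single,452,yes
3,42,services,married,151,no
4,29,student,single,610,yes
5,50,admin.,divorced,540,yes
6,38,technician,married,120,no
"""

train = pd.read_csv(io.StringIO(CSV_TEXT))

print(train.head())   # first five rows
train.info()          # column dtypes and non-null counts
print(train.columns)  # attribute names
```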
- The `ID` column is dropped.
- `subscribed` is identified as the target variable.
- A countplot of `subscribed` is plotted.
- A stacked bar plot of job vs. frequency is plotted, showing how many clients in each job category subscribed and how many did not.
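The steps above can be sketched as follows. The project lists seaborn, whose `countplot` draws class counts directly; to keep dependencies small, this sketch instead plots `value_counts()` and a `pd.crosstab` table with pandas' built-in matplotlib plotting. The rows and column names are hypothetical.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import pandas as pd

CSV_TEXT = """ID,age,job,marital,duration,subscribed
1,56,admin.,married,261,no
2,31,technician,single,452,yes
3,42,services,married,151,no
4,29,student,single,610,yes
5,50,admin.,divorced,540,yes
6,38,technician,married,120,no
"""
train = pd.read_csv(io.StringIO(CSV_TEXT))

# The identifier carries no predictive information, so drop it.
train = train.drop(columns=["ID"])
target = "subscribed"

# Class counts of the target: the bars a countplot would show.
counts = train[target].value_counts()
ax_counts = counts.plot(kind="bar", title="Count of subscribed")

# Frequency of each job, stacked by whether the client subscribed.
job_vs_target = pd.crosstab(train["job"], train[target])
ax_jobs = job_vs_target.plot(kind="bar", stacked=True, title="Job vs. subscribed")
```

In a notebook the `matplotlib.use("Agg")` line can be dropped so the figures render inline.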
- `LabelEncoder` from `sklearn.preprocessing` is used to convert all categorical variables to numeric variables.
- A heatmap is plotted to check the correlation among the variables.
- A correlation table is also created.
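A sketch of the encoding and correlation steps, using the same hypothetical columns. Each non-numeric column gets its own `LabelEncoder`, and the fitted encoders are kept so the identical mapping can later be applied to the test data.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows standing in for the training data (ID already dropped).
CSV_TEXT = """age,job,marital,duration,subscribed
56,admin.,married,261,no
31,technician,single,452,yes
42,services,married,151,no
29,student,single,610,yes
50,admin.,divorced,540,yes
38,technician,married,120,no
"""
train = pd.read_csv(io.StringIO(CSV_TEXT))

# Encode every non-numeric column (features and target) as integers,
# keeping each fitted encoder so the same mapping can be reused later.
encoders = {}
for col in train.select_dtypes(exclude="number").columns:
    enc = LabelEncoder()
    train[col] = enc.fit_transform(train[col])
    encoders[col] = enc

# Correlation table: the matrix a heatmap would colour.
corr = train.corr()
print(corr.round(2))
```

Passing `corr` to `seaborn.heatmap(corr, annot=True)` draws the heatmap. Note that scikit-learn documents `LabelEncoder` for encoding target labels; for feature columns, `OrdinalEncoder` or one-hot encoding is the more conventional choice, and one-hot encoding avoids imposing an arbitrary order on categories in a linear model such as logistic regression.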
- The dependent and independent variables are separated.
- `train_test_split` from `sklearn.model_selection` is used to split the dependent and independent variables into training and validation sets.
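A sketch of separating the variables and splitting them, on a small synthetic, already-encoded DataFrame (hypothetical data). The 80/20 split, `random_state` and `stratify=y` are assumed choices, not necessarily the project's; stratifying keeps the class ratio similar in both sets, which helps when one class is much rarer than the other.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical, already-encoded data: 100 clients, three numeric features.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "age": rng.integers(18, 70, size=100),
    "job": rng.integers(0, 5, size=100),
    "duration": rng.integers(0, 1000, size=100),
})
data["subscribed"] = (data["duration"] > 500).astype(int)

# Independent variables (features) and dependent variable (target).
X = data.drop(columns=["subscribed"])
y = data["subscribed"]

# Hold out 20% of the rows for validation, preserving the class ratio.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print(X_train.shape, X_val.shape)
```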
- `LogisticRegression` from `sklearn.linear_model` is initialized as `logi`.
- `logi` is fit on `X_train` and `y_train`.
- `y_val` is predicted by applying `predict` to `X_val`.
- The model scores are calculated.
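A minimal sketch of the logistic regression step, using synthetic features in place of the encoded bank data and assuming accuracy (the metric reported in the results) is the score of interest. `max_iter=1000` is an assumed setting that gives the solver room to converge.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded bank data: 200 rows, 3 numeric features,
# and a binary target driven by the first two features (hypothetical).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Initialise the model as `logi`, fit it on the training split,
# then predict the validation labels.
logi = LogisticRegression(max_iter=1000)
logi.fit(X_train, y_train)
y_pred = logi.predict(X_val)

# Score the predictions against the held-out labels.
acc = accuracy_score(y_val, y_pred)
print(f"Logistic Regression accuracy: {acc:.4f}")
```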
- `DecisionTreeClassifier` from `sklearn.tree` is initialized as `dtc`.
- `dtc` is fit on `X_train` and `y_train`.
- `y_val` is predicted by applying `predict` to `X_val`.
- The model scores are calculated.
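The decision tree step follows the same fit/predict/score pattern. This sketch reuses the same kind of synthetic data; `max_depth=5` is an assumed cap (an unrestricted tree can memorise the training set), and `random_state` only makes tie-breaking reproducible.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded bank data (hypothetical).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Initialise the tree as `dtc`, fit it, and predict the validation labels.
dtc = DecisionTreeClassifier(max_depth=5, random_state=0)
dtc.fit(X_train, y_train)
y_pred = dtc.predict(X_val)

acc = accuracy_score(y_val, y_pred)
print(f"Decision Tree accuracy: {acc:.4f}")
```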
- Test data is imported as a DataFrame.
- Feature engineering and data transformation are applied to the test data.
- `predict` is used to obtain predictions.
- A CSV file of the predicted values is created as `submission.csv`.
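A sketch of the test-prediction flow. The key detail is reusing the encoders fitted on the training data (`transform`, not `fit_transform`) so each category maps to the same integer in both sets. The file contents, the `ID`/`subscribed` submission layout and the output location are assumptions; the sketch writes to the system temp directory rather than the working directory.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training and test rows.
TRAIN_CSV = """ID,age,job,duration,subscribed
1,56,admin.,261,no
2,31,technician,452,yes
3,42,services,151,no
4,29,student,610,yes
5,50,admin.,540,yes
6,38,technician,120,no
"""
TEST_CSV = """ID,age,job,duration
7,45,services,580
8,33,student,90
"""
train = pd.read_csv(io.StringIO(TRAIN_CSV))
test = pd.read_csv(io.StringIO(TEST_CSV))

# Fit encoders on the training data only, then reuse them on the test data.
job_enc = LabelEncoder().fit(train["job"])
target_enc = LabelEncoder().fit(train["subscribed"])

X_train = train.drop(columns=["ID", "subscribed"]).assign(job=job_enc.transform(train["job"]))
y_train = target_enc.transform(train["subscribed"])
X_test = test.drop(columns=["ID"]).assign(job=job_enc.transform(test["job"]))

# Fit the model, predict, and map the integer labels back to "yes"/"no".
dtc = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pred = target_enc.inverse_transform(dtc.predict(X_test))

# Write the predictions alongside the client IDs.
submission = pd.DataFrame({"ID": test["ID"], "subscribed": pred})
out_path = Path(tempfile.gettempdir()) / "submission.csv"
submission.to_csv(out_path, index=False)
```

`LabelEncoder.transform` raises a `ValueError` if the test data contains a category not seen during training, so such values need handling before this step.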
- Logistic Regression
  - validation accuracy = 0.8829383886255924
- Decision Tree Classifier
  - validation accuracy = 0.8924170616113745
- Some optional data exploration is done on the test data to understand it better.
https://www.linkedin.com/in/naveen-a-902a671b3/
Internshala Data Science Course.


