Predict the Success of Bank telemarketing

MLP Project T32024

Overview

MLP Project T32024
After joining the competition go to the code tab and create a New Notebook.
Keep notebook name as YourRollNo-notebook- t32024 . (i.e 21f1001234-notebook-t32024)
Share the notebook with iitmbscs2008p collaborator. View access is sufficient and make/keep it private.
Build your first ML model (dummy or any other)
predict target of the test data as store as given in sample_submission.csv
store predictions in a DataFrame with "id" and "target columns.
save the DataFrame in a csv and make a submission using that csv
Change your team name to Your Roll No (i.e 21f1001234)

Description

The data is related with direct marketing campaigns of a banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to access if the product (bank term deposit) would be ('yes') or not ('no') subscribed.

Evaluation

The evaluation section describes how submissions will be scored and how participants should format their submissions.

Submissions are evaluated on f1_score(average='macro') between the predicted classes and the True target.

Submission File

For each id in the test set, you must predict a class for the target variable. The file should contain a header and have the following format:

id	target
0	"yes"
1	"no"
etc.	"yes/no"

Dataset Description

The data is related with direct marketing campaigns of a banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to access if the product (bank term deposit) would be ('yes') or not ('no') subscribed.

Files

train.csv - the training set test.csv - the test set sample_submission.csv - a sample submission file in the correct format

Input variables:

last contact date: last contact date
age (numeric)
job : type of job
marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
education (categorical: "unknown","secondary","primary","tertiary")
default: has credit in default? (binary: "yes","no")
balance: average yearly balance, in euros (numeric)
housing: has housing loan? (binary: "yes","no")
loan: has personal loan? (binary: "yes","no")
contact: contact communication type (categorical: "unknown","telephone","cellular")
duration: last contact duration, in seconds (numeric)
campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
previous: number of contacts performed before this campaign and for this client (numeric)
poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")

Output variable (desired target):

16 target: has the client subscribed a term deposit? (binary: "yes","no")

Notebook and Working

The notebook for predicting the success of bank telemarketing follows a series of steps and methodologies to achieve the final prediction. Below are the details:

Data Preprocessing

Loading Data: The dataset is loaded into a pandas DataFrame.
Handling Missing Values: Missing values are handled by either filling them with appropriate values or dropping the rows/columns.
Encoding Categorical Variables: Categorical variables are encoded using techniques such as one-hot encoding or label encoding.
Feature Scaling: Numerical features are scaled using techniques like StandardScaler or MinMaxScaler.

Feature Engineering

Creating New Features: New features are created based on existing features to provide more information to the model.
Feature Selection: Important features are selected based on their correlation with the target variable or using feature importance from models.
PCA: Implemented Principal component Analysis for dimensionality reduction.

Model Training

Splitting Data: The data is split into training and validation sets.
Choosing Model: Machine learning models such as Logistic Regression, Decision Trees, Random Forest, or Gradient Boosting are chosen.
Training Model: The chosen model is trained on the training data.
Hyperparameter Tuning: Hyperparameters of the model are tuned using techniques like GridSearchCV or RandomizedSearchCV.

Evaluation Metrics

Accuracy: The accuracy of the model is calculated on the validation set.
F1 Score: The F1 score (macro) is calculated to evaluate the model's performance.
Confusion Matrix: A confusion matrix is plotted to visualize the performance of the model.

Libraries and Tools

Pandas: For data manipulation and analysis.
NumPy: For numerical computations.
Scikit-learn: For machine learning algorithms and evaluation metrics.
Matplotlib/Seaborn: For data visualization.

Links

kaggle competitions link.
kaggle notebook link.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
Dataset		Dataset
notebook		notebook
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Predict the Success of Bank telemarketing

Overview

Description

Evaluation

Submission File

Dataset Description

Files

Input variables:

Output variable (desired target):

Notebook and Working

Data Preprocessing

Feature Engineering

Model Training

Evaluation Metrics

Libraries and Tools

Links

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Predict the Success of Bank telemarketing

Overview

Description

Evaluation

Submission File

Dataset Description

Files

Input variables:

Output variable (desired target):

Notebook and Working

Data Preprocessing

Feature Engineering

Model Training

Evaluation Metrics

Libraries and Tools

Links

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages