GitHub - sharmajee499/Customer_Churn_Analysis: Customer Churn Prediction for Telemarketing Campaign

Problem

Churn analysis aims to reduce the churn rate, defined as the rate at which customers stop doing business with an entity. The same definition applies to employees of a company, students of a university, and so on. The purpose of the analysis is to understand customers through their buying behavior so that the business can act before they leave. With data-driven methods, machine learning (ML) techniques can automatically identify customers at high risk of churning, so that managers can take the necessary steps to retain them. Automating this identification speeds up decision-making about churn.

Dataset

The dataset is the open-source dataset published by IBM for a hypothetical telecommunications company, which can be found via this link. The dataset consists of 7045 instances with 20 features. The target variable is 'Churn', a binary variable that is 'Yes' if the customer churned (left) and 'No' if the customer stayed. The dataset is imbalanced, with 73.463% 'No' and 26.537% 'Yes' values, so SMOTE (Synthetic Minority Over-sampling Technique) was used to oversample the minority class.
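To illustrate what SMOTE does, here is a minimal pure-Python sketch of its core idea: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, the toy feature values, and the class sizes are assumptions made for illustration; the README does not say which SMOTE implementation the project used. The sketch handles numeric features only.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical toy data: two numeric features per churned ('Yes') customer.
minority = [[1.0, 9.0], [2.0, 8.5], [1.5, 9.5], [3.0, 8.0], [2.5, 9.0],
            [2.0, 9.8], [1.2, 8.2], [2.8, 9.3], [1.8, 8.8], [2.2, 9.1]]
n_majority = 30  # hypothetical number of 'No' customers

# Generate enough synthetic 'Yes' rows to match the majority class size.
new_points = smote(minority, n_new=n_majority - len(minority), k=3)
balanced_minority = minority + new_points
print(len(minority), "->", len(balanced_minority))
```

Oversampling is normally applied to the training split only, so that the evaluation data still reflects the original class balance.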

Modeling & Result

PyCaret, an open-source AutoML library, was used for modeling. In the analysis, Random Forest (RF) outperformed the other models. A grid search on the RF was then performed to further improve model performance. We achieved an accuracy of 76.61% and an F1-score of 77%.
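For reference, the two reported metrics can be computed from predictions as in the sketch below. The labels are made-up toy values, not the project's results, and treating 'Yes' as the positive class is an assumption: the README does not state which class or averaging scheme its F1-score refers to.

```python
def confusion_counts(y_true, y_pred, positive="Yes"):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy labels, for illustration only.
y_true = ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No"]
y_pred = ["Yes", "No", "Yes", "No", "No", "Yes", "No", "No"]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"accuracy={acc:.4f}, f1={f1:.4f}")
```

On imbalanced data, accuracy alone can look good for a model that rarely predicts 'Yes', which is why F1 is reported next to it.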

Explainability and Interpretability

For model explanation, LIME (Local Interpretable Model-agnostic Explanations) is used on top of the random forest model. LIME provides an explanation for each individual observation, showing which features support or contradict the predicted value of the target variable Churn.
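The idea behind LIME can be sketched in a few steps: perturb the observation, query the model on the perturbed points, weight each point by its closeness to the original, and fit a weighted linear model whose coefficients act as the local explanation. The sketch below is a simplified pure-Python illustration of that procedure, not the LIME library's implementation and not the project's code. The `black_box` function, the feature names, the scales, and the kernel width are all hypothetical stand-ins.

```python
import math
import random

def black_box(x):
    """Hypothetical stand-in for the trained model: returns P(churn)."""
    tenure_months, monthly_charge = x
    z = -0.08 * tenure_months + 0.04 * monthly_charge - 1.0
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def explain(predict, instance, scales, n_samples=500, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in instance]  # standardized perturbation
        sample = [v + zi * s for v, zi, s in zip(instance, z, scales)]
        rows.append([1.0] + z)  # leading 1.0 is the intercept column
        targets.append(predict(sample))
        # exponential kernel: nearby samples get larger weights
        weights.append(math.exp(-sum(zi * zi for zi in z) / kernel_width ** 2))
    p = len(instance) + 1
    # weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets))
            for i in range(p)]
    return solve(xtwx, xtwy)[1:]  # drop the intercept

customer = [6.0, 90.0]  # hypothetical: 6 months tenure, 90 monthly charge
contributions = explain(black_box, customer, scales=[12.0, 20.0])
for name, weight in zip(["tenure_months", "monthly_charge"], contributions):
    direction = "supports" if weight > 0 else "contradicts"
    print(f"{name}: {weight:+.3f} ({direction} churn)")
```

A positive coefficient means the feature pushes this particular prediction toward churn, and a negative one pushes it away, which corresponds to the supporting and contradicting features described above.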

Deployment

The trained model and its LIME explanations are deployed as a web app built with Streamlit. The app takes the customer's details as input and outputs the probability of churn alongside the LIME explanation, which shows the features supporting as well as contradicting the model's result. The app can be accessed through this link.

Libraries Used: NumPy, pandas, scikit-learn, PyCaret, LIME, Streamlit

Libraries Installation

Install the required libraries by running pip install -r requirements.txt in your Anaconda console. Make sure you use a separate conda environment for this project.

Repository files (30 commits):

Data_Analysis_Modeling.ipynb
README.md
lime_exp_info.png
requirements.txt
rf.pkl
streamlite.py
train_data.csv
