🚀 Live App:
👉 https://churnxgboost.streamlit.app/
Customer churn is a critical business problem where companies lose customers to competitors.
This project builds an end-to-end machine learning pipeline that predicts whether a customer will churn based on demographic and financial features.
The final deployed model uses XGBoost, selected after comparing multiple models.
Business Objectives:
- Identify customers likely to churn
- Enable targeted retention strategies
- Reduce customer acquisition cost
- Improve long-term revenue
Dataset:
- Total Records: 10,000
- Target Variable: Churn (0 = No, 1 = Yes)
- Class Distribution: ~80% Non-Churn, ~20% Churn
- Features:
  - Credit Score
  - Geography
  - Gender
  - Age
  - Tenure
  - Balance
  - Number of Products
  - Has Credit Card
  - Is Active Member
  - Estimated Salary
Exploratory Data Analysis:
- Identified class imbalance
- Analyzed feature distributions
- Detected and handled outliers using the IQR (interquartile range) rule
- Performed correlation analysis
- Key insights:
  - Customers in Germany showed a higher churn rate
  - Inactive members churn more often
  - Age is positively correlated with churn
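The IQR outlier step can be illustrated in plain Python. This is a minimal, dependency-free sketch, not the project's code: the README does not say which columns were treated or whether outliers were removed or capped, so capping values at the usual Tukey fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR) is an assumption, and the function names are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the Tukey fences (Q1 - k*IQR, Q3 + k*IQR) for a numeric column."""
    # method="inclusive" uses the same linear interpolation as the default
    # quantile methods in NumPy and pandas.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Detection step: indices of values lying outside the fences."""
    lower, upper = iqr_bounds(values, k)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def cap_outliers(values, k=1.5):
    """Handling step: clip every value into [lower, upper].

    Assumption: outliers are capped (winsorized) rather than dropped;
    the README does not say which approach was used.
    """
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]
```

For example, for [1, 2, 3, 4, 100] the quartiles are 2 and 4, the fences are -1 and 7, find_outliers returns [4], and cap_outliers returns [1, 2, 3, 4, 7.0].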
Data Preprocessing:
- Removed duplicates
- One-hot encoded categorical features:
  - Geography
  - Gender
- Applied StandardScaler (for distance-based models)
- Used a stratified train-test split (80-20)
- Handled class imbalance using:
  - class_weight
  - scale_pos_weight in XGBoost
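The preprocessing steps above can be illustrated without any ML library. In scikit-learn the split is typically train_test_split(X, y, test_size=0.2, stratify=y), and the XGBoost documentation suggests sum(negative instances) / sum(positive instances) as a starting value for scale_pos_weight. The pure-Python helpers below (all names hypothetical) only show what those settings compute; fitting the scaler on the training rows alone, so that no test-set statistics leak into preprocessing, is an assumption about how the project did it.

```python
import random
from collections import defaultdict
from statistics import fmean, pstdev


def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns (categories sorted)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(train_column):
    """Learn mean and population std (ddof=0, as StandardScaler does) on training data."""
    mean, std = fmean(train_column), pstdev(train_column)
    # A constant column would divide by zero; StandardScaler also uses a scale of 1 there.
    return mean, (std if std > 0 else 1.0)


def scale(column, mean, std):
    """Apply an already-fitted scaler to any split (train or test)."""
    return [(x - mean) / std for x in column]


def scale_pos_weight(train_labels):
    """XGBoost docs heuristic: count of negatives / count of positives."""
    positives = sum(1 for y in train_labels if y == 1)
    return (len(train_labels) - positives) / positives
```

With the ~80% / ~20% class distribution described above, scale_pos_weight comes out to roughly 80 / 20 = 4, which tells XGBoost to weight each churn example about four times as heavily as a non-churn example.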
The following models were trained and compared:
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Support Vector Classifier (SVC)
- Decision Tree
- Random Forest
- XGBoost
Evaluation Metrics Used:
- Accuracy
- Precision
- Recall
- F1 Score
- Confusion Matrix
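Because only ~20% of customers churn, accuracy alone can flatter a model that never predicts churn, which is why recall and F1 matter here. The dependency-free sketch below (the function name is hypothetical; scikit-learn's accuracy_score, precision_score, recall_score, f1_score and confusion_matrix compute the same quantities) derives every listed metric from the confusion-matrix counts, treating churn (1) as the positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1 and a 2x2 confusion matrix for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    # Undefined ratios (no predicted or no actual positives) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted churners that churned
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual churners that were caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows = actual class, columns = predicted class, negative class first.
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }
```

For example, classification_metrics([0] * 8 + [1] * 2, [0] * 10), a model that predicts "no churn" for everyone, returns an accuracy of 0.8 but a recall and F1 of 0.0.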
Selected Model: XGBoost
Reasons:
- Highest Recall among the compared models, i.e. it detected the largest share of actual churners
- Strong F1 Score
- Balanced performance across the evaluation metrics
- Controlled overfitting
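The README does not state exactly how these criteria were combined. One simple way to encode "highest Recall, then strong F1" is to rank models by the tuple (recall, F1). The function and the shape of its input below are a hypothetical sketch of that rule, not the project's selection code.

```python
def select_model(results):
    """Return the model name with the highest recall, using F1 to break ties.

    results: dict mapping model name -> {"recall": float, "f1": float, ...}
    (hypothetical structure, e.g. scores computed on the test split).
    """
    return max(results, key=lambda name: (results[name]["recall"], results[name]["f1"]))
```

Checking overfitting, for instance by comparing training and test scores, would be a separate step and is not modeled here.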
The final model was deployed using:
- Streamlit (Frontend + Backend)
- XGBoost model serialized using Pickle
- Hosted on Streamlit Cloud
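Serializing with pickle amounts to a dump at training time and a load inside the Streamlit app. The helpers below are a minimal sketch; the function names and any file path you pass them are hypothetical, and any picklable object (such as a fitted XGBoost classifier) can be saved this way.

```python
import pickle


def save_model(model, path):
    """Serialize a fitted model (any picklable object) to `path`."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model written by save_model.

    Unpickling can execute arbitrary code, so only load files you created.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

In a Streamlit script, the loader is commonly wrapped with st.cache_resource so the model is read once instead of on every rerun. XGBoost's own save_model/load_model (JSON or UBJSON format) is also more portable across library versions than a pickle.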
🔗 Live App:
https://churnxgboost.streamlit.app/