This project aims to predict whether a hotel booking will be canceled based on various customer and reservation-related features. By leveraging machine learning techniques and modern deployment practices, the solution enables proactive management of hotel reservations to improve operational efficiency.
Hotel booking cancellations lead to significant revenue losses and poor resource planning. This project helps hotels predict the likelihood of a booking being canceled based on customer behavior, booking patterns, and preferences.
Features:
- `type_of_meal_plan`, `required_car_parking_space`, `room_type_reserved`
- `no_of_adults`, `no_of_children`, `no_of_weekend_nights`, `no_of_week_nights`, `lead_time`, `arrival_date`, `avg_price_per_room`, `no_of_special_requests`
- Data stored and maintained in a Google Cloud Storage (GCS) bucket.
- Codebase built using VS Code with Python.
- Versioning via GitHub.
- Data pulled directly from the GCS bucket into the local development environment.
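
The data pull from the bucket could look roughly like the sketch below. It assumes the `google-cloud-storage` client library and Application Default Credentials; the function name `download_blob` and any bucket or object names passed to it are hypothetical, not taken from the project.

```python
from pathlib import Path


def download_blob(bucket_name, blob_name, destination, client=None):
    """Download one object from a GCS bucket to a local file and return its path."""
    if client is None:
        # Imported lazily so this module loads even without the SDK installed.
        from google.cloud import storage

        client = storage.Client()  # assumes Application Default Credentials

    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)

    # bucket() and blob() only build references; the download does the network call.
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.download_to_filename(str(destination))
    return destination
```

Passing `client` explicitly keeps the function easy to exercise with a stand-in object, so the ingestion step can be checked without touching the real bucket.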
- Univariate & Bivariate Analysis to explore data trends.
- Label Encoding for categorical features.
- Variance Inflation Factor (VIF) to detect multicollinearity.
- Handling skewness and class imbalance.
- Feature selection using Random Forest feature importance scores.
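
To make the multicollinearity check concrete: the VIF of a feature is 1 / (1 - R²), where R² comes from regressing that feature on all the others. The sketch below computes it with plain NumPy on synthetic data; it is an illustration under those assumptions, not the project's own code (which may well use the `variance_inflation_factor` helper from statsmodels instead).

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of a 2-D numeric array X.

    Assumes no column is constant (a constant column has zero variance).
    """
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Add an intercept so R^2 is measured around the column mean.
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residual = y - design @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(float("inf") if r_squared >= 1.0 else 1.0 / (1.0 - r_squared))
    return vifs


# Synthetic example: column c is nearly a copy of column a, column b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
print(variance_inflation_factors(np.column_stack([a, b, c])))
```

Columns `a` and `c` should show a large VIF while `b` stays close to 1; a common rule of thumb treats values above 5 or 10 as a sign of problematic collinearity.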
Models compared based on key metrics:
- ✅ Accuracy
- 📌 Precision
- 🔁 Recall
- 🎯 F1 Score
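
For reference, the four metrics above can be derived from confusion-matrix counts as in this small sketch. It assumes the positive class (label `1`) means "canceled", which is a labeling choice made here for illustration; in practice the equivalents in `sklearn.metrics` would normally be used.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 1 = canceled (assumed), 0 = not canceled.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

For cancellation prediction, recall on the canceled class matters when a missed cancellation is costly, while precision matters when acting on false alarms is costly; F1 balances the two.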
Models Tested:
- Random Forest (RF)
- Gradient Boosting
- AdaBoost
- Logistic Regression
- Support Vector Classifier (SVC)
- K-Nearest Neighbors (KNN)
- Gaussian Naive Bayes (NB)
- LightGBM (LGBM)
- XGBoost (XGB)
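
A comparison like this can be run with one loop over candidate estimators. The sketch below uses scikit-learn's `cross_validate` on a synthetic dataset and only three of the listed models to stay short; the dataset, fold count, and hyperparameters are assumptions made for illustration, not the project's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the booking data, with a minority positive class.
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.67, 0.33], random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
scoring = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, model in models.items():
    # cross_validate returns one array of fold scores per metric, keyed "test_<metric>".
    cv = cross_validate(model, X, y, cv=3, scoring=scoring)
    results[name] = {metric: cv[f"test_{metric}"].mean() for metric in scoring}

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Adding another candidate, such as an `LGBMClassifier` or `XGBClassifier`, only requires a new entry in `models`, since both expose the scikit-learn estimator interface.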
✅ Best Model: Random Forest Classifier
- Using RandomizedSearchCV to optimize `bootstrap`, `max_depth`, `min_samples_leaf`, `min_samples_split`, and `n_estimators`.
- Final model trained and serialized using joblib.
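
The tuning and serialization steps might be sketched as follows. The five hyperparameter names come from the list above, but the search ranges, the number of iterations, the synthetic data, and the file name `random_forest.pkl` are all assumptions chosen to keep the example quick to run.

```python
import os
import tempfile

import joblib
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the preprocessed booking data.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Illustrative ranges only; randint(low, high) samples integers in [low, high).
param_distributions = {
    "bootstrap": [True, False],
    "max_depth": randint(3, 20),
    "min_samples_leaf": randint(1, 5),
    "min_samples_split": randint(2, 10),
    "n_estimators": randint(50, 150),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)

# Persist the refitted best estimator and load it back, as a serving app would.
model_path = os.path.join(tempfile.mkdtemp(), "random_forest.pkl")
joblib.dump(search.best_estimator_, model_path)
restored = joblib.load(model_path)
```

Randomized search samples a fixed number of combinations instead of trying every one, which keeps tuning time bounded when the grid over five parameters would be large.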
- Integrated MLflow for tracking experiments, metrics, and models.
- Modular training pipeline created for scalability and reusability.
- Code Versioning: Git + GitHub
- Data Versioning: GitHub + GCP bucket logging
- Built using Flask to interact with the trained model.
- Accepts user input and predicts if the booking will be canceled.
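
A minimal version of such a Flask app is sketched below. It is not the project's actual application: the `/predict` JSON route, the feature subset in `FEATURES`, the `StubModel` class, and the convention that label `1` means "canceled" are all assumptions for illustration. A real deployment would load the serialized model with `joblib.load` and would expect categorical inputs to be encoded the same way as during training.

```python
from flask import Flask, jsonify, request

# Illustrative subset of the input features, in the order the model expects them.
FEATURES = [
    "lead_time",
    "no_of_special_requests",
    "avg_price_per_room",
    "no_of_adults",
    "no_of_week_nights",
]


class StubModel:
    """Stand-in for the trained classifier: flags long lead times as cancellations."""

    def predict(self, rows):
        return [1 if row[0] > 100 else 0 for row in rows]


def create_app(model):
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        missing = [name for name in FEATURES if name not in payload]
        if missing:
            return jsonify({"error": f"missing fields: {missing}"}), 400
        try:
            row = [float(payload[name]) for name in FEATURES]
        except (TypeError, ValueError):
            return jsonify({"error": "all fields must be numeric"}), 400
        # Assumed label convention: 1 = canceled, 0 = not canceled.
        prediction = int(model.predict([row])[0])
        return jsonify({"canceled": bool(prediction)})

    return app


app = create_app(StubModel())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Building the app through `create_app(model)` keeps the model injectable, so the route can be exercised with Flask's test client and a stub before the real model file exists.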
- Create a virtual environment in the Jenkins pipeline.
- Dockerize the project.
- Build the Docker image and push it to Google Container Registry (GCR).
- Deploy the image to Google Cloud Run for seamless scalability.
- Python 🐍
- Scikit-learn, Pandas, NumPy, Matplotlib, Seaborn
- MLflow for experiment tracking
- Flask for UI/API
- Docker, Jenkins for CI/CD
- Google Cloud Platform (GCS, GCR, Cloud Run)
The dataset used for this project is available on Kaggle.
This project is part of a course taught by Sudhanshu Gusain on udemy.com.
Thanks to the instructor for a practical, hands-on approach to solving real-world ML problems!


