An end-to-end Data Science & Machine Learning project that analyzes tourism datasets, predicts hotel prices, performs sentiment analysis on tourist reviews, and provides an interactive AI-powered dashboard using Streamlit.
Tourism generates large amounts of data about destinations, hotels, reviews, and visitor preferences. This project turns raw tourism data into actionable insights using:
- Data Cleaning & Processing
- Exploratory Data Analysis (EDA)
- Machine Learning Models
- Natural Language Processing (NLP)
- Interactive Web Dashboard
The system helps users explore tourism trends, analyze hotel pricing, gauge tourist sentiment, and browse monument images.

Project objectives:
- Analyze tourism destinations data
- Predict hotel prices using Machine Learning
- Perform sentiment analysis on tourist reviews
- Build an interactive analytics dashboard
- Integrate tourism image datasets
- Demonstrate end-to-end Data Science workflow
Key features:
- ✅ Tourism Destination Analytics
- ✅ Hotel Price Prediction Model
- ✅ Sentiment Analysis using NLP
- ✅ Interactive Streamlit Dashboard
- ✅ Monument Image Gallery
- ✅ Data Cleaning & Feature Engineering
- ✅ Visualization & Insights
Project structure:

```
Tourism-Data-Analysis/
│
├── data/
│   ├── raw/
│   │   ├── destinations.csv
│   │   ├── Hotels.csv
│   │   ├── Review_db.csv
│   │   └── Indian-monuments/
│   │       └── images/
│   │
│   └── processed/
│       ├── clean_destinations.csv
│       ├── clean_hotels.csv
│       └── clean_reviews.csv
│
├── notebooks/
│   ├── 01_data_cleaning.ipynb
│   ├── 02_eda_analysis.ipynb
│   └── 03_model_training.ipynb
│
├── models/
│   ├── price_model.pkl
│   └── sentiment_model.pkl
│
├── app.py
├── train_model.py
├── requirements.txt
└── README.md
```
Destinations dataset (destinations.csv):
- City
- State
- Tourist Type
- Establishment Year
- Google Rating
- Weekly Off
- Entrance Fee
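
The cleaning notebook (01_data_cleaning.ipynb) is not reproduced in this README. As a minimal sketch, assuming the CSV headers match the field names above exactly, a pandas pass over the destinations table might trim text fields, coerce numeric fields, and drop duplicates. The helper name clean_destinations and the rule that a missing entrance fee means free entry are illustrative assumptions, not the project's actual logic.

```python
import pandas as pd

# Assumed headers; adjust if destinations.csv uses different column names.
NUMERIC_COLUMNS = ["Establishment Year", "Google Rating", "Entrance Fee"]
TEXT_COLUMNS = ["City", "State", "Tourist Type", "Weekly Off"]


def clean_destinations(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a cleaned copy of the raw destinations table."""
    df = df.copy()
    # Trim stray whitespace so "Delhi " and "Delhi" are treated as the same city.
    for col in TEXT_COLUMNS:
        df[col] = df[col].astype("string").str.strip()
    # Coerce numeric fields; unparseable values become NaN instead of raising.
    for col in NUMERIC_COLUMNS:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Assumption: a missing entrance fee means free entry.
    df["Entrance Fee"] = df["Entrance Fee"].fillna(0)
    # Drop exact duplicates, then rows with no usable rating.
    df = df.drop_duplicates().dropna(subset=["Google Rating"])
    return df.reset_index(drop=True)
```

For example, clean_destinations(pd.read_csv("data/raw/destinations.csv")).to_csv("data/processed/clean_destinations.csv", index=False) would write the processed file into the folder layout listed in the project structure.
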
Hotels dataset (Hotels.csv):
- Hotel Name
- Location
- Price
- Rating
- Facilities
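
A similar hedged sketch for the hotels table covers the feature-engineering step. It assumes prices may be stored as text such as "₹3,500" and facilities as a comma-separated string; neither format is confirmed by this README. The clean_hotels name and the Facility Count column are hypothetical, included only to illustrate deriving a feature from raw text.

```python
import pandas as pd


def clean_hotels(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a cleaned copy of the raw hotels table."""
    df = df.copy()
    # Assumed raw format like "₹3,500": keep only digits and the decimal point.
    df["Price"] = pd.to_numeric(
        df["Price"].astype(str).str.replace(r"[^\d.]", "", regex=True),
        errors="coerce",
    )
    df["Rating"] = pd.to_numeric(df["Rating"], errors="coerce")
    # Hypothetical engineered feature: number of non-empty comma-separated facilities.
    df["Facility Count"] = (
        df["Facilities"]
        .fillna("")
        .str.split(",")
        .apply(lambda items: sum(1 for item in items if item.strip()))
    )
    # Price is the regression target, so rows missing price or rating are dropped.
    df = df.dropna(subset=["Price", "Rating"])
    # Treat the same name at the same location as a duplicate listing.
    df = df.drop_duplicates(subset=["Hotel Name", "Location"])
    return df.reset_index(drop=True)
```
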
Reviews dataset (Review_db.csv):
- Tourist Reviews
- Sentiment Labels
Monument image dataset (Indian-monuments/):
- Images of Indian monuments, organized into category folders
- Used for visual exploration in the dashboard's image gallery
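
For the gallery, a sketch assuming each monument has its own subfolder of images under data/raw/Indian-monuments/images/ and that images use .jpg, .jpeg, or .png extensions. The helper index_monument_images is hypothetical; it maps folder names to image paths so a dashboard can offer one entry per monument.

```python
from pathlib import Path

# Assumed image extensions; extend if the dataset uses other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_monument_images(root: Path) -> dict[str, list[Path]]:
    """Hypothetical helper: map each monument folder name to its sorted image files."""
    gallery: dict[str, list[Path]] = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        images = sorted(
            f
            for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        if images:  # skip folders that contain no images
            gallery[folder.name] = images
    return gallery
```

In a Streamlit app, the dictionary keys could populate st.selectbox and the selected list of paths could be passed to st.image.
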
| Category | Tools |
|---|---|
| Programming | Python |
| Data Analysis | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Machine Learning | Scikit-learn |
| NLP | TF-IDF, Logistic Regression |
| Dashboard | Streamlit |
| Version Control | Git & GitHub |
Hotel price prediction model:
- Algorithm: Linear Regression
- Input: Hotel Rating
- Output: Estimated Price
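
With a single input, the model amounts to fitting Price ≈ b0 + b1 × Rating by least squares. A minimal scikit-learn sketch, assuming a cleaned hotels table with numeric Rating and Price columns. The function names are hypothetical, and pickling to models/price_model.pkl mirrors the file in the project tree rather than a confirmed detail of train_model.py.

```python
import pickle

import pandas as pd
from sklearn.linear_model import LinearRegression


def train_price_model(hotels: pd.DataFrame) -> LinearRegression:
    """Hypothetical helper: fit Price = b0 + b1 * Rating by ordinary least squares."""
    X = hotels[["Rating"]]  # double brackets keep a 2-D feature matrix
    y = hotels["Price"]
    return LinearRegression().fit(X, y)


def save_model(model: LinearRegression, path: str) -> None:
    """Serialize the fitted model, e.g. to models/price_model.pkl."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
```

The fitted coef_[0] is the estimated price change per rating point, which keeps the estimate easy to explain; the trade-off is that location and facilities are ignored.
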
Sentiment analysis model:
- Features: TF-IDF vectorization
- Algorithm: Logistic Regression classifier
- Output: each review classified as Positive or Negative
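
TF-IDF vectorization followed by logistic regression can be wrapped in one scikit-learn pipeline, so the same vocabulary and weights are applied at training and prediction time. This is a sketch; the build_sentiment_model name, the unigram-plus-bigram setting, and max_iter are assumptions rather than the project's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline


def build_sentiment_model() -> Pipeline:
    """Hypothetical helper: TF-IDF features feeding a logistic regression classifier."""
    return make_pipeline(
        # Lowercases text (the default) and weights unigrams and bigrams by TF-IDF.
        TfidfVectorizer(ngram_range=(1, 2)),
        # Learns a linear decision boundary between the review labels.
        LogisticRegression(max_iter=1000),
    )
```

A fitted pipeline pickles as a single object, which matches the single sentiment_model.pkl file under models/.
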
Dashboard features:
- Explore tourist destinations with visual insights and statistics
- View the hotel price distribution and the top luxury hotels
- Predict the sentiment of a tourist review
- Estimate a hotel's price with the ML model
- Select a monument and view its categorized images
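
Two of the dashboard views, price distribution and top luxury hotels, reduce to short pandas queries. A sketch assuming the cleaned hotels columns listed earlier; treating "luxury" as "highest price" and the example price bands are illustrative assumptions, and both function names are hypothetical.

```python
import pandas as pd


def top_luxury_hotels(hotels: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Hypothetical helper: the n most expensive hotels, highest price first."""
    return hotels.nlargest(n, "Price")[["Hotel Name", "Location", "Price", "Rating"]]


def price_distribution(
    hotels: pd.DataFrame, bins: list[float], labels: list[str]
) -> pd.Series:
    """Hypothetical helper: count hotels per price band (right-closed intervals)."""
    bands = pd.cut(hotels["Price"], bins=bins, labels=labels)
    # sort=False keeps the bands in their defined order instead of by count.
    return bands.value_counts(sort=False)
```

In app.py, the returned objects could be rendered with st.dataframe and st.bar_chart.
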
Key outcomes:
- Generated tourism insights from raw data
- Built predictive ML models for hotel prices and review sentiment
- Developed an interactive analytics platform
- Integrated NLP and a computer-vision-ready image dataset
Future enhancements:
- AI Trip Planner
- Budget Optimization System
- Tourism Recommendation Engine
- AI Travel Chatbot
- Real-time Travel Alerts
- Weather & Event Integration
- Personalized Travel Suggestions
Author: Raj, B.Tech Computer Science Engineering student and aspiring Data Scientist & AI Developer.
This project was developed as part of academic learning to demonstrate practical applications