FlyTextMetrics & Explainable Machine Learning for Airline Ticket Price Prediction
🚀 Project Overview
Extracted customer reviews of British Airways flights from a travel review website using web scraping.
Classified reviews into positive, negative, and neutral categories using NLP.
Visualized sentiment trends using charts and word clouds.
Analyzed flight booking data to discover customer ticket buying behavior.
Applied machine learning to predict which days of the week typically have the most flight bookings.
Applied model explainability: developed an XGBoost-based pricing model and interpreted it with SHAP to understand both global pricing drivers and individual ticket-level predictions.
🛠️ Tech Stack
Area | Tools & Libraries
Web Scraping | BeautifulSoup, requests
Data Processing | pandas, numpy
NLP & Sentiment Analysis | nltk, spaCy, TextBlob
Visualization | matplotlib, seaborn, wordcloud
Machine Learning | scikit-learn (RandomForest, DecisionTree, etc.)
📥 Data Collection
Scraped British Airways customer reviews using BeautifulSoup and requests.
Parsed HTML pages to extract review text, rating, review date, and reviewer details.
Loaded a separate flight booking dataset containing booking date, flight details, and ticket class.
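The parsing step can be sketched as follows. This is a minimal illustration, not the project's actual scraper: the HTML snippet, the CSS class names (`review`, `rating`, `body`, `author`), and the `parse_reviews` helper are all assumptions made for the example, since the real review site's markup is not shown here. In practice the HTML string would come from something like `requests.get(url).text`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one page of a review site.
SAMPLE_HTML = """
<article class="review">
  <span class="rating">8</span>
  <time datetime="2023-05-01">1st May 2023</time>
  <span class="author">A. Traveller</span>
  <div class="body">Friendly staff and an on-time departure.</div>
</article>
<article class="review">
  <span class="rating">3</span>
  <time datetime="2023-05-03">3rd May 2023</time>
  <span class="author">B. Flyer</span>
  <div class="body">Two hour delay and an uncomfortable seat.</div>
</article>
"""


def parse_reviews(html):
    """Extract review text, rating, date, and reviewer from one HTML page."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for item in soup.select("article.review"):
        reviews.append({
            "text": item.select_one(".body").get_text(strip=True),
            "rating": int(item.select_one(".rating").get_text(strip=True)),
            "date": item.select_one("time")["datetime"],
            "reviewer": item.select_one(".author").get_text(strip=True),
        })
    return reviews


reviews = parse_reviews(SAMPLE_HTML)
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame(reviews)` for further processing.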
🧠 Sentiment Analysis
Cleaned review texts (stopword and punctuation removal, followed by lemmatization).
Labeled reviews into:
Positive
Negative
Neutral
Applied polarity scoring using TextBlob and keyword-based tagging for validation.
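A rough sketch of the labeling step is shown below. The polarity threshold of 0.1, the keyword sets, and the helper names are assumptions chosen for illustration; the project's actual cutoffs and keyword lists are not documented here. `TextBlob(text).sentiment.polarity` returns a score in the range -1 to 1, and it is imported lazily so that the labeling logic can be exercised without TextBlob installed.

```python
import re

# Illustrative keyword lists (assumed, not taken from the project).
POSITIVE_KEYWORDS = {"clean", "friendly", "comfortable", "punctual"}
NEGATIVE_KEYWORDS = {"delay", "delayed", "rude", "uncomfortable", "lost"}


def textblob_polarity(text):
    """Return TextBlob's polarity score for a review, in [-1, 1]."""
    from textblob import TextBlob  # imported lazily; requires textblob
    return TextBlob(text).sentiment.polarity


def label_from_polarity(polarity, threshold=0.1):
    """Map a polarity score to Positive, Negative, or Neutral."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


def keyword_tag(text):
    """Keyword-based label used as a sanity check on the polarity label."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    positives = len(words & POSITIVE_KEYWORDS)
    negatives = len(words & NEGATIVE_KEYWORDS)
    if positives > negatives:
        return "Positive"
    if negatives > positives:
        return "Negative"
    return "Neutral"


def labels_agree(text, polarity):
    """True when the polarity label and the keyword tag match."""
    return label_from_polarity(polarity) == keyword_tag(text)
```

Reviews where the two labels disagree are natural candidates for manual inspection, for example by calling `labels_agree(text, textblob_polarity(text))` on each review.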
🌐 Text Visualization
Word Clouds: Generated for each sentiment group to highlight frequent terms.
Bar Charts & Pie Charts: Show the proportion of each sentiment type and the frequency of key terms.
Explored trends based on routes, review dates, and classes.
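The term counts behind a per-sentiment word cloud can be sketched as below. The small stopword set and the function names are assumptions for the example; the project may instead rely on the nltk or spaCy stopword lists. The `render_word_cloud` helper uses `WordCloud.generate_from_frequencies` from the wordcloud package and is imported lazily so the counting logic runs on its own.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumed); a full list would be larger.
STOPWORDS = {"the", "a", "an", "and", "was", "were", "to", "of", "in", "is", "it"}


def term_frequencies(texts):
    """Count non-stopword tokens across a collection of review texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


def frequencies_by_sentiment(labeled_reviews):
    """Group (label, text) pairs by label and count terms per group."""
    groups = {}
    for label, text in labeled_reviews:
        groups.setdefault(label, []).append(text)
    return {label: term_frequencies(texts) for label, texts in groups.items()}


def render_word_cloud(frequencies, path):
    """Write a word cloud image for one sentiment group to `path`."""
    from wordcloud import WordCloud  # imported lazily; requires wordcloud
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies).to_file(path)


sample = [
    ("Negative", "The flight was delayed and the seat was uncomfortable"),
    ("Negative", "Delayed again"),
    ("Positive", "Clean cabin and friendly crew"),
]
freqs = frequencies_by_sentiment(sample)
```

The same counters can feed a bar chart of the top terms, for instance via `freqs["Negative"].most_common(10)`.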
📊 Booking Pattern Analysis
Preprocessed flight booking data and extracted relevant features (e.g., day of the week, seasonality).
Explored booking frequency trends visually.
Trained machine learning models to predict the most common days for ticket purchases.
Evaluated model performance using accuracy and cross-validation.
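The feature extraction and evaluation steps above might look roughly like this. The data is synthetic and the column names (`booking_date`, `length_of_stay`, `flight_duration`, `ticket_class`) are assumptions, so the scores say nothing about the real dataset; because the synthetic features are random, accuracy should hover near chance. The sketch only shows the shape of the workflow with a scikit-learn `RandomForestClassifier` and 5-fold cross-validation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 600

# Synthetic stand-in for the booking dataset (all columns are assumed).
bookings = pd.DataFrame({
    "booking_date": pd.Timestamp("2023-01-01")
    + pd.to_timedelta(rng.integers(0, 365, n), unit="D"),
    "length_of_stay": rng.integers(1, 30, n),
    "flight_duration": rng.uniform(2, 10, n).round(1),
    "ticket_class": rng.choice(["Economy", "Business"], n),
})

# Feature extraction: day of week (0 = Monday) and month as a seasonality proxy.
bookings["day_of_week"] = bookings["booking_date"].dt.dayofweek
bookings["month"] = bookings["booking_date"].dt.month
bookings["is_business"] = (bookings["ticket_class"] == "Business").astype(int)

features = bookings[["month", "length_of_stay", "flight_duration", "is_business"]]
target = bookings["day_of_week"]

# Accuracy estimated with 5-fold cross-validation.
model = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(model, features, target, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()

# Observed booking frequency per weekday, useful as a baseline.
bookings_per_day = bookings["day_of_week"].value_counts().sort_index()
```

Comparing `mean_accuracy` against the share of the most frequent weekday in `bookings_per_day` gives a quick check on whether the model beats a majority-class baseline.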
Model Explainability (SHAP)
Global explainability identifies key drivers of ticket pricing such as booking origin, length of stay, and flight duration.
Local explainability explains individual ticket predictions by showing how specific features increase or decrease the predicted price.
💡 Key Insights
Common issues found in negative reviews included delays, customer service, and seat comfort.
Positive reviews focused on cleanliness, flight staff behavior, and on-time performance.
Most bookings occurred mid-week, suggesting strategic pricing/marketing opportunities.
Shorter trips tend to be priced higher, which may reflect business travel patterns.
Longer flight durations are associated with higher ticket prices, plausibly reflecting operational and fuel costs.
Ancillary selections such as extra baggage and preferred seating are associated with slightly higher prices, which may signal a higher willingness to pay.
About
This data science project focuses on analyzing customer sentiment and booking patterns for British Airways flights. It combines web scraping, Natural Language Processing (NLP), data visualization, and machine learning to extract actionable insights.