Case Study 3 Grace Robinson
The data that I looked at was information collected on customer visits to restaurants, with a focus on the factors that drive customer satisfaction. I chose this data set because I was interested in seeing what affects how satisfied people are when they visit a restaurant. I had always believed that age and income had a big effect on satisfaction levels, because I thought that the older people got, the more critical they became, and that the more money people had, the harsher they would be because they are used to nicer food. My results showed that some of my assumptions were true and some were not.
My problem statement is: how can restaurants improve overall customer satisfaction by identifying its key drivers? My initial hypothesis was that customer satisfaction is influenced primarily by service quality, food quality, and ambiance. I also thought that age plays a role, because I believed that different demographics have varying expectations.
I used a random sample of restaurant satisfaction data to build logistic regression models in order to predict satisfaction levels and identify the factors with the highest impact. I chose logistic regression because it is effective for binary classification problems like predicting 'High Satisfaction' (1) vs. 'Low Satisfaction' (0). Logistic regression models the log odds of an event, here high satisfaction, as a linear combination of predictor variables, and a sigmoid function maps these log odds to probabilities. I used metrics like the Area Under the Curve (AUC) to evaluate model performance. The logistic regression results highlight service rating as the most significant predictor of satisfaction, so adjustments to service processes may be the way to improve satisfaction.
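To make the log odds and sigmoid idea described above concrete, here is a minimal Python sketch. The intercept, the coefficient values, and the example customer are hypothetical numbers chosen only for illustration; they are not the fitted values from my models. The column names follow the binary columns used in the analysis.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(intercept, coefficients, features):
    """Linear combination: intercept plus coefficient * value for each predictor."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)

# Hypothetical intercept and coefficients (assumptions, NOT fitted values).
intercept = -2.0
coefficients = {
    "ServiceRatingBinary": 1.2,
    "FoodRatingBinary": 0.6,
    "AmbianceRatingBinary": 0.3,
    "Age": 0.01,
}

# Hypothetical customer: good service and food, poor ambiance, 40 years old.
customer = {
    "ServiceRatingBinary": 1,
    "FoodRatingBinary": 1,
    "AmbianceRatingBinary": 0,
    "Age": 40,
}

z = log_odds(intercept, coefficients, customer)   # log odds of high satisfaction
p = sigmoid(z)                                    # probability of high satisfaction
predicted = 1 if p >= 0.5 else 0                  # 1 = 'High Satisfaction'

print(f"log odds = {z:.3f}, probability = {p:.3f}, predicted class = {predicted}")
```

In this sketch a larger coefficient moves the log odds, and therefore the predicted probability, more for the same change in the predictor, which is why the size of a coefficient is read as the strength of that driver.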
Insights from this analysis can help restaurants prioritize investments, such as training staff or upgrading menu offerings, to target the areas with the most potential for boosting customer satisfaction. I started my analysis by exploring the relationships between customer satisfaction and the factors that might influence it, such as age, service rating, food quality, and ambiance. I also looked into whether income affected satisfaction. My results showed that income does not have a significant impact on overall satisfaction, so I continued looking into other factors.
An exploratory plot of Age vs. High Satisfaction revealed that younger customers generally report lower satisfaction. This suggested that specific factors, such as age, were affecting overall satisfaction. The plots of age against other variables, like Service Rating, highlight areas where improvements could meaningfully increase the share of high satisfaction ratings. In the plot of age and service rating, I saw that service ratings declined as age increased. Another important finding came from the Mean Satisfaction vs. Age Group plot, which showed a slight increase in satisfaction as age increases. These results were not what I was expecting: I expected older customers to be more critical, but overall the younger customers were. The results are still helpful, as they suggest that specific demographics may require tailored services to boost satisfaction. When I plotted only the raw data in the exploratory plot, I was not able to get much information from it because there was so much going on.
So, I made other plots, like the Conditional Density Plot and the Estimated Category Probability plot, which show how a customer's satisfaction changed with age. These plots helped me identify the specific factors that were affecting overall customer satisfaction and which areas could be improved. I also employed logistic regression to predict customer satisfaction ('High Satisfaction' vs. 'Low Satisfaction'). This approach models the probability of a positive outcome as a function of predictor variables like Age, Service Rating, Food Rating, and Ambiance Rating. To do this, I created binary columns for key variables: ServiceRatingBinary, FoodRatingBinary, and AmbianceRatingBinary.
I then created a Receiver Operating Characteristic (ROC) curve, which evaluates model performance by plotting the true positive rate against the false positive rate. The Area Under the Curve (AUC) quantifies how well the model distinguishes between highly satisfied and less satisfied customers. To validate the model's performance, I compared it against a random baseline: I created a copy of the data in which satisfaction was randomly assigned, changing only the satisfaction variable, and built a model on it. As shown in the Model Evaluation (ROC Curve) plot, my model achieves an AUC of 0.5303, which is only slightly above the 0.5 expected from random guessing. There is no significant gap between the AUC of the trained model and that of the random model, which suggests that the predictors I used are less meaningful than I expected. Still, the random model did not do as well as my model on AUC, which suggests that my model captured at least a small amount of real signal.
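The comparison between a model's AUC and a random baseline can be sketched in Python as follows. This is a simplified illustration, not my actual analysis: the labels and scores are made-up toy values, and instead of refitting a model on randomized satisfaction, the sketch only shuffles the labels and scores them against the same predictions. The AUC is computed as the fraction of (satisfied, unsatisfied) customer pairs in which the satisfied customer receives the higher predicted score.

```python
import random

def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data (assumption, NOT the restaurant data): 1 = 'High Satisfaction'.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]  # predicted probabilities

model_auc = auc(labels, scores)

# Random baseline: shuffle the satisfaction labels many times and average the AUC.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
baseline_aucs = []
for _ in range(1000):
    shuffled = labels[:]
    rng.shuffle(shuffled)
    baseline_aucs.append(auc(shuffled, scores))
baseline_auc = sum(baseline_aucs) / len(baseline_aucs)

print(f"model AUC = {model_auc:.4f}, shuffled-label baseline AUC = {baseline_auc:.4f}")
```

A shuffled-label baseline should land close to 0.5, so the useful number is the gap between the model's AUC and the baseline. A model whose AUC sits near 0.5 is separating the two groups only a little better than chance.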
Also, when going back to the Logistic Regression Result plots, I found that service rating has the highest coefficient, which suggests it is the strongest driver of satisfaction. By visualizing relationships and building a logistic regression model, my analysis uncovers actionable insights to help restaurants focus their efforts on the factors that matter most to customers.