GitHub - mollyincali/twittersentiment: Sentiment analysis and NLP exploration of customer service tweets.

Sentiment Analysis on Customer Support Tweets

There are so many ways to find out how customers feel about your brand and products, and sentiment analysis is extremely useful in monitoring your customer’s perception. The dataset used for this exploratory analysis project was obtained from Kaggle, it consists of 2.8 million tweets between customers and companies. The goal is to analyze the tone in the conversation on Twitter between customers and companies by looking at a variety of different features.

Question

"Does the initial outreach message from a company result in an increase in positive sentiment from the customer?"

Overview

Data Wrangling

To best be able to answer my question above I decided to split my dataset into customer tweets and company tweets. An assumption was made that any author with a numeric value in 'author_id' is a customer and any author with text such as 'AmazonHelp' is a company.

Unique Companies	Unique Customers	Total Unique Twitter Accounts
108	702,669	702,777

Once I had the customer and company dataframes I needed to work on finding the flow of conversation:

Customer Message1 The very first customer message to the company
Company Message1 The companies response to the message above
Customer Message2 The customer's response to the message above

I started by looking at the customer dataset to find Customer Message1 by looking for all tweets that have NaN in in_response_to_tweet_id telling me that this is the very first message in the conversation (visual below in red). Then from there, we can look at the corresponding response_tweet_id then match those in the Company dataframe under tweet_id and that will give us Company Message1 (visual below in blue). Next, we match the response_tweet_id to the tweet_id back in the customer dataframe to find Customer Message2 (visual below in green).

Dates

Looking at the dates of these tweets our dataset has a range of dates from 2008 to 2018, when looking at the scatter plot of tweets we can see the majority of our dataset is at the end of the 2017 year.

When we take a closer at just dates from September 2017 and on we can see the spread much better, and better understand that the majority of conversations are happening in this timeframe.

Sentiment Analysis

With many different choices available to use for sentiment analysis I needed to find something that could use our unlabeled text and classify each Tweet as positive, neutral, or negative. I decided to use the VADER library. VADER stands for Valence Aware Dictionary for sEntiment Reasoning (they really forced the E to work there) and is specifically tuned for social media expressions.

What makes VADER special is its ability to use text sentiment and be sensitive to both polarity and intensity of emotion in text. It is able to recognize that “happy” has a different meaning than when it is paired with “not” and it can also understand the emphasis of capitalization in words and differentiate “LOVE” from “love” in its final score of that word. For that purpose I choose not to lowercase my text, remove punctuation, or use any other other preprocessing techniques.

About the scoring system in VADER:

Each tweet is given a percentage of being “negative” “neutral” or “positive” and also a "compound score." The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate. *CREDIT

We can view the compound score of each tweet with Customer Message1 in blue and Customer Message2 (after the company responded) in green.

Text Analysis

We can look at a visual of the top words and bi-grams in the Customer Message1.

We can see a shift in the words used when we take a look at how the customer responds to the company.

Hypothesis Testing

"Does the initial outreach message from a company result in an increase in positive sentiment from the customer?"

Ho: After the company responds to message 1 from the customer, the second message from the customer will have an increase in positive sentiment.

Ha: After the company responds to message 1 from the customer, the second message from the customer will not have an increase in positive sentiment.

We will set an alpha level of 0.05 to determine if we will accept or reject the null hypothesis.

Recap:

Tweets in Customer Message1	Tweets in Customer Message2
261,052	309,236

Avg Compound Score Message1	Average Compound Score Message2
-0.0172	0.0933

p-value results = 0.0

Conclusions

With our p-value being 0.0 and lower than our established alpha level of 0.05 we will reject the null hypothesis that stated "After the company responds to message 1 from the customer, the second message from the customer will have an increase in positive sentiment." I will acknowledge that yes, there is a different in the mean compound scores between message 1 and message 2; however, the difference is not significant.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
images		images
src		src
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sentiment Analysis on Customer Support Tweets

Question

Overview

Data Wrangling

Dates

Sentiment Analysis

About the scoring system in VADER:

Text Analysis

Hypothesis Testing

"Does the initial outreach message from a company result in an increase in positive sentiment from the customer?"

Recap:

p-value results = 0.0

Conclusions

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Sentiment Analysis on Customer Support Tweets

Question

Overview

Data Wrangling

Dates

Sentiment Analysis

About the scoring system in VADER:

Text Analysis

Hypothesis Testing

"Does the initial outreach message from a company result in an increase in positive sentiment from the customer?"

Recap:

p-value results = 0.0

Conclusions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages