Twitter and US Election Results

by Dillon Hamilton, Kevin Connolly, Sebastian Lopez and Greg Bhola

A study using Natural Language Processing (NLP)

Executive Overview

Based on the results, there is no concrete evidence that Twitter is a good proxy for predicting US election outcomes. This project used Sentiment/Emotion Analysis and Twitter Engagement metrics to test whether this form of social media is effective at predicting US election results.

Sentiment Analysis

Click above image to view video of Dashboard.

For Power BI users, view dashboard.

Sentiment Score => The Average Sentiment Score for each candidate, where 1 is the most Positive Sentiment, -1 is the most Negative Sentiment and 0 is Neutral Sentiment.

Sentiment Analysis => The percentage of each candidate's tweets that carry Positive, Negative and Neutral Sentiment.

Emotion Analysis => Matches the words used in the tweets against a list of words, each linked to a particular emotion. Refer to emotions.txt for the list of words and their linked emotions.
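
The two sentiment measures defined above can be sketched as follows. This is a minimal illustration, not the project's notebook code: the cut-offs (above 0 is Positive, below 0 is Negative, exactly 0 is Neutral) are an assumption, and the function names and sample scores are hypothetical. TextBlob is imported lazily inside `polarity_of` so the summary helpers run without it.

```python
from collections import Counter
from statistics import mean


def label_sentiment(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a label (assumed cut-off at 0)."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


def summarize(polarities: list[float]) -> dict:
    """Return the average score and the percentage of tweets per label."""
    labels = Counter(label_sentiment(p) for p in polarities)
    total = len(polarities)
    return {
        "average_score": mean(polarities),
        "percent": {
            name: 100 * labels[name] / total
            for name in ("Positive", "Negative", "Neutral")
        },
    }


def polarity_of(text: str) -> float:
    """Score one tweet with TextBlob (requires the textblob package)."""
    from textblob import TextBlob

    return TextBlob(text).sentiment.polarity


# Hypothetical polarity scores for four tweets about one candidate.
example = summarize([0.5, -0.25, 0.0, 0.75])
print(example)
```

In practice the list passed to `summarize` would come from calling `polarity_of` on each cleaned tweet for a candidate.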

Twitter Engagement

This analysis gauges whether popular Twitter metrics such as Likes, Replies and Retweets are reliable indicators of who wins the race to the White House. The Tweet Length feature was created during the Transformation process.
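
A minimal sketch of the comparison this section describes: average each engagement metric per candidate, then check which candidate is higher on each one. The field names follow the features kept during preprocessing; the function names and the sample tweets are hypothetical, not project data.

```python
from statistics import mean

METRICS = ("likes_count", "replies_count", "retweets_count", "tweet_length")


def average_engagement(tweets: list[dict]) -> dict:
    """Average each engagement metric over one candidate's tweets."""
    return {metric: mean(t[metric] for t in tweets) for metric in METRICS}


def winner_is_higher(winner_tweets: list[dict], loser_tweets: list[dict]) -> dict:
    """For each metric, report whether the winner's average exceeds the loser's."""
    winner = average_engagement(winner_tweets)
    loser = average_engagement(loser_tweets)
    return {metric: winner[metric] > loser[metric] for metric in METRICS}


# Hypothetical rows, for illustration only.
winner_tweets = [
    {"likes_count": 10, "replies_count": 2, "retweets_count": 4, "tweet_length": 12},
    {"likes_count": 20, "replies_count": 4, "retweets_count": 6, "tweet_length": 18},
]
loser_tweets = [
    {"likes_count": 30, "replies_count": 1, "retweets_count": 3, "tweet_length": 20},
]

comparison = winner_is_higher(winner_tweets, loser_tweets)
print(comparison)
```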

Key Insights

All winners recorded higher Average Sentiment scores than their opponents
Winners registered a larger percentage of positive tweets than their rivals in every year except 2012
Each winning candidate had a lower percentage of tweets with negative sentiment than their opponents
Emotion Analysis showed no clear patterns
Twitter Engagement results were inconclusive:
- In 2008 and 2016, the losers recorded higher Average Likes scores; in 2012 and 2020, the winners did
- In all years, the winners registered higher Average Replies scores
- Average Retweets scores were higher for all winners, except in 2016
- Average Tweet Length was higher for the winners in 2016 and 2020 but not in the other two years

Resources

Twint
Python
MongoDB
Power BI

Data Acquisition

Scraped Twitter using Twint in an Anaconda environment
Used the first and last names of each US Presidential candidate as the key search words, e.g.
    twint -s "Joe Biden" --since "2020-10-15 17:00:00" --until "2020-10-16 17:00:00" --lang en -o biden3_2020.csv --csv
    twint -s "Donald Trump" --since "2020-10-15 17:00:00" --until "2020-10-16 17:00:00" --lang en -o trump3_2020.csv --csv
Scraped data on the day following each of the three mandatory Presidential debates for consistency
Scraped for tweets in the English language
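
Since every scrape uses the same flags and a 24-hour window, the commands can be generated rather than typed by hand. The sketch below only builds the argument list using the flags shown in the example commands; the helper name `twint_command` is hypothetical, and nothing is executed.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def twint_command(candidate: str, since: datetime, output_csv: str) -> list[str]:
    """Build the Twint CLI arguments for a 24-hour English-language search."""
    until = since + timedelta(days=1)
    return [
        "twint",
        "-s", candidate,
        "--since", since.strftime(TIME_FORMAT),
        "--until", until.strftime(TIME_FORMAT),
        "--lang", "en",
        "-o", output_csv,
        "--csv",
    ]


# Window taken from the example command above.
start = datetime(2020, 10, 15, 17, 0, 0)
command = twint_command("Joe Biden", start, "biden3_2020.csv")
print(command)
# To launch the scrape (requires Twint to be installed):
# subprocess.run(command, check=True)
```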

Data Storage

Datasets were stored in a MongoDB database
Created a MongoDB connection using Python to call on each dataset.
See the following Jupyter notebook for more details.
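
As a rough illustration of the connection step, the sketch below reads one dataset with PyMongo. The connection URI, database name and collection name are assumptions (the README does not state them), and restricting the query to selected fields at load time is an optional convenience, not necessarily how the notebook does it. PyMongo is imported inside the function so the projection helper can be used on its own.

```python
def build_projection(fields: list[str]) -> dict:
    """Build a MongoDB projection that keeps only the given fields."""
    projection = {name: 1 for name in fields}
    projection["_id"] = 0  # drop MongoDB's internal id from the results
    return projection


def load_tweets(uri: str, database: str, collection: str, fields: list[str]) -> list[dict]:
    """Fetch every document in a collection, limited to the requested fields."""
    from pymongo import MongoClient  # requires the pymongo package

    client = MongoClient(uri)
    try:
        cursor = client[database][collection].find({}, build_projection(fields))
        return list(cursor)
    finally:
        client.close()


projection = build_projection(["tweet", "likes_count"])
print(projection)
# Hypothetical call; the URI, database and collection names are placeholders:
# rows = load_tweets("mongodb://localhost:27017", "twitter_elections",
#                    "biden3_2020", ["tweet", "likes_count"])
```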

Data Preprocessing

Kept the following features
- tweet
- likes_count
- retweets_count
- replies_count
Cleaned tweets using Regular Expressions (Regex)
Created tweet_length feature (which measures the number of words in each tweet) in the following Jupyter notebook
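
The cleaning and tweet_length steps above might look like the sketch below. The specific patterns (dropping URLs, @mentions and punctuation) are assumptions about what "cleaned" means here; the project's notebook may use different rules. The sample tweet is made up.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_TEXT_RE = re.compile(r"[^A-Za-z0-9\s']")  # punctuation, '#', emoji, etc.
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs, @mentions and punctuation, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)  # keeps the hashtag word, drops the '#'
    return SPACE_RE.sub(" ", text).strip()


def tweet_length(text: str) -> int:
    """Number of whitespace-separated words in a tweet."""
    return len(text.split())


raw = "@user Great debate tonight! #Debates2020 https://t.co/abc123"
cleaned = clean_tweet(raw)
print(cleaned, tweet_length(cleaned))
```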

Machine Learning - Natural Language Processing (NLP)

Used the TextBlob library to perform Sentiment Analysis
- Sentiment and Subjectivity scores were obtained for each tweet
- Each tweet was ranked as Positive, Negative or Neutral Sentiment based on Sentiment scores
Obtained the key words (minus stop words) to construct word cloud
Used the NLTK library to perform Emotion Analysis
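
The emotion step amounts to a word-to-emotion lookup followed by a count. The sketch below assumes each line of emotions.txt has the form `'word': 'emotion',`; that format, the three sample entries and the small stop-word set are assumptions made for illustration. A plain `split()` stands in for NLTK's tokenizer and stop-word list so the example runs without downloads.

```python
from collections import Counter

# Small stand-in for NLTK's English stop-word list (assumption).
STOP_WORDS = {"the", "a", "is", "and", "i", "am", "so", "of", "to", "about"}


def parse_emotion_lines(lines) -> dict:
    """Parse lines shaped like "'word': 'emotion'," into a lookup dict."""
    lexicon = {}
    for line in lines:
        cleaned = line.strip().rstrip(",").replace("'", "")
        if ":" not in cleaned:
            continue  # skip blank or malformed lines
        word, emotion = cleaned.split(":", 1)
        lexicon[word.strip().lower()] = emotion.strip().lower()
    return lexicon


def count_emotions(tweets, lexicon: dict) -> Counter:
    """Count how often each emotion is triggered across cleaned tweets."""
    counts = Counter()
    for tweet in tweets:
        for word in tweet.lower().split():
            if word in STOP_WORDS:
                continue
            emotion = lexicon.get(word)
            if emotion:
                counts[emotion] += 1
    return counts


# Hypothetical lexicon entries and tweets, for illustration only.
sample_lines = ["'angry': 'angry',", "'hopeful': 'happy',", "'proud': 'happy',"]
lexicon = parse_emotion_lines(sample_lines)
emotion_counts = count_emotions(
    ["I am so proud and hopeful", "angry about the debate"], lexicon
)
print(emotion_counts)
```

With the real file, `parse_emotion_lines(open("emotions.txt"))` would build the lexicon, provided the file follows the assumed format.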

Limitations

NLP is not 100% accurate in measuring sentiment, as it cannot detect sarcasm or wit.
The location of the tweets was not revealed, so we cannot tell whether American citizens made these comments.
Twitter launched in 2006, so there was a dearth of Twitter data in 2008 and 2012. Back then, it was not the social media monster it is today.

References

Twint
Sentiment/Emotion Analysis - Attreya Bhatt
Sentiment Analysis Stackhouse
NLP and NER
