LIVE Sentiment Analysis on Twitter Data using Tweepy, Keras, and Django



Photo by Marten Bjork on Unsplash

Welcome to this tutorial on performing live sentiment analysis on tweets. I am sure you have come across complex dashboards, packed with graphs and crunching numbers, that look straight out of a sci-fi movie and left you staring in awe. Well, that is basically our aim here.

Some complex-looking dashboard

Our final result will not be as extensive as this, but it will teach you how to make the necessary data connections so that you can make it as sophisticated as you want. You could even continue it as a SaaS business or a mobile app and earn some dough $$$. Our result will look something like this:

Our Result

It will perform live analysis for any hashtag and its related context, showing you new tweets as they come in, each with a sentiment attached.

Excited enough? Good, let’s get into it. This article is divided into three parts:

  1. Making the Model
  2. Making the UI Interface(Front-end)
  3. Making the Backend, getting live data and connecting everything

1. Model Stuff

Although sentiment analysis is a pretty common topic in Natural Language Processing, I will only briefly go over the model architecture here; I will write a separate post on it later.

I used the Sentiment140 dataset for training, which contains approx. 1.6 million tweets. After cleaning the text by normalizing it and removing user tags that begin with ‘@’, I used the gensim package’s Word2Vec function to train word embeddings on the entire corpus. Since the corpus is pretty huge, I had enough data to train fairly accurate embeddings; otherwise, I would have used a pre-trained vectorizer.
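For reference, training the embeddings with gensim looks roughly like this (a sketch: tokenized_tweets is a placeholder for the list of token lists from the cleaned corpus, and the hyperparameter values are illustrative, not the exact ones I used):

from gensim.models import Word2Vec

# tokenized_tweets: hypothetical list of token lists, one per cleaned tweet
# (gensim 3.x API; newer gensim versions use vector_size instead of size)
w2v_model = Word2Vec(tokenized_tweets, size=300, window=7, min_count=10, workers=4)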

#Test word embeddings
w2v_model.most_similar("hate")

[('suck', 0.5254894495010376),
('stupid', 0.509635865688324),
('hat', 0.479534387588501),
('ugh', 0.4475134015083313),
('dislike', 0.44565698504447937),
('despise', 0.43604105710983276),
('fuck', 0.4104633331298828),
('annoy', 0.4004197418689728),
('ughh', 0.3961945176124573),
('fml', 0.39270931482315063)]

Next, I used the Keras Tokenizer to convert the input data into tokens and added padding so all inputs have the same length. This is a standard data-preparation procedure in NLP. Finally, I passed the prepared data into an LSTM network.
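A rough sketch of this preparation step and the network (hyperparameters are illustrative; train_texts and embedding_matrix are placeholders for the cleaned tweets and the weight matrix built from the Word2Vec vectors):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

MAX_SEQUENCE_LENGTH = 300  # illustrative padding length

tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_texts)  # train_texts: cleaned tweet strings
x_train = pad_sequences(tokenizer.texts_to_sequences(train_texts), maxlen=MAX_SEQUENCE_LENGTH)

model = Sequential([
    Embedding(len(tokenizer.word_index) + 1, 300,
              weights=[embedding_matrix],        # Word2Vec vectors, frozen during training
              input_length=MAX_SEQUENCE_LENGTH,
              trainable=False),
    LSTM(100, dropout=0.2, recurrent_dropout=0.2),
    Dense(1, activation='sigmoid'),              # outputs a score between 0 and 1
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])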

predict("@Nintendo I love your games!")
{'score': 0.820274293422699}

The final accuracy turned out to be around 78.4%, which is good enough for now. The entire implementation is here.

ACCURACY: 0.784396875
LOSS: 0.45383153524398806

Finally, I saved the model (as an .h5 file) and the trained Keras Tokenizer (as a .pkl file) so that I can use them later for inference in the server script. You can download the trained files here.
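Saving both is straightforward (the file names here match the ones that show up in the Django app folder below):

import pickle

model.save('Sentiment_LSTM_model.h5')  # Keras model: architecture + weights
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)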

Note: I implemented another model using 1D Convolutions instead of an LSTM network for comparison, which turned out to provide almost similar results. For the curious learner, you can find this implementation here.


2. UI Frontend Stuff

I used ReactJS for building the interface. It’s a Javascript framework that promotes modular design by letting you create components and reuse them like Lego pieces. Each component has its own lifecycle, so if the data for a component changes, only that component re-renders. This puts less load on the browser and reduces lag when information updates.

I am not going to go into the details of how I made the website, because it is just basic CSS and Javascript; you can study the code directly in the repository. However, if you have any doubts, leave a response below and I’ll gladly clear them up.

All you need to know

We have a variable called state belonging to the component; any change to it re-renders the affected components.

this.state = {
  hashtag: "",
  options: {
    colors: ['#F7464A', '#46BFBD', '#FDB45C'],
    labels: ['Negative', 'Positive', 'Neutral'],
    plotOptions: {
      pie: {
        donut: {
          labels: {
            show: true
          }
        }
      }
    }
  },
  series: [44, 55, 41],
  tweets: [],
  hashtag_desc: ""
}

hashtag contains the value of the input field, and options holds the configuration for the pie chart. We only have one function of interest:

  • The function, when called, sends a GET request to our server at ‘http://localhost:8000/analyzehashtag’ along with the hashtag value. It expects a JSON response of the form:
{
  ...
  data: {
    positive: 43,
    negative: 23,
    neutral: 12
  }
  ...
}
  • The function also sends a GET request to the public Wikipedia API, along with the hashtag value to get some short information regarding it.
  • Finally, the function sends another GET request to our server at ‘http://localhost:8000/gettweets’ along with the hashtag value. It expects a JSON response of the form:
{
  "results": [
    {
      "text": "Is it possible to wirelessly project my laptop to my #Xbox? #XboxOne https://t.co/KMuSoD2C5j",
      "username": "Xbox_One_Reddit",
      "label": "Neutral",
      "score": 0.5679275393486023
    },
    {
      "text": "This year's #E3 had some big #XBOX news for the gaming industry. A glimpse at the future with Scarlet its Next Gen console, promising 4K & 8K gaming, and of course the franchise that started it all... #Halo Infinite announced!\n\nWhich was you favorite?? #E32019 #XboxE3 #Gaming https://t.co/tykdIYezmr",
      "username": "NrdRnx",
      "label": "Positive",
      "score": 0.9130105972290039
    },
    {
      "text": "DOMED 💀 #apex #apexlegends #apexlegendsxbox #apexlegendsclips #apexlegendscommunity #apexlegendsplays #playapex #controllergang #xbox #mixer #twitch https://t.co/itERG2vpaD",
      "username": "gle_oh",
      "label": "Negative",
      "score": 0.26629960536956787
    },
    ...
  ]
}

This data is used to populate the component handling the live tweets.


3. Backend Stuff

Finally, we get to the crux of this article. We are going to use Django to create the backend.

Note: If you have no experience in Backend development, I recommend using Flask instead of Django. Flask is very user friendly and you can create the same thing I’m doing here, in minutes. I use Django because I find the deployment a bit easier, and it’s easily scalable to more complex applications.

You can google how to create a Django project, or follow a tutorial given in their documentation. Once you are done, it should have the following folder structure:

│ .gitattributes
│ db.sqlite3
│ manage.py

├───main_app
│ │ admin.py
│ │ apps.py
│ │ config.py
│ │ models.py
│ │ Sentiment_LSTM_model.h5
│ │ tests.py
│ │ tokenizer.pickle
│ │ twitter_query.py
│ │ views.py
│ │ __init__.py
│ │
│ ├───migrations
│ │
│ └───__pycache__
│ admin.cpython-36.pyc
│ config.cpython-36.pyc
│ models.cpython-36.pyc
│ views.cpython-36.pyc
│ __init__.cpython-36.pyc

└───twitter_django
│ settings.py
│ urls.py
│ wsgi.py
│ __init__.py

└───__pycache__
settings.cpython-36.pyc
urls.cpython-36.pyc
wsgi.cpython-36.pyc
__init__.cpython-36.pyc

(Instead of main_app and twitter_django, you will see whatever names you chose for your app and project.)

Django has the concept of “views” to encapsulate the logic responsible for processing a user’s request and returning the response. So any request the server receives is processed here. We connect the URLs to the views using urls.py:
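A minimal sketch of what that urls.py looks like, wiring each endpoint to its view (the actual file is in the repository):

# twitter_django/urls.py
from django.contrib import admin
from django.urls import path
from main_app import views

urlpatterns = [
    path('admin/', admin.site.urls),
    path('analyzehashtag', views.analyzehashtag),
    path('gettweets', views.gettweets),
]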

When we receive a request at a particular URL endpoint, say ‘/gettweets’, it triggers the function specified — ‘views.gettweets’ in this case. The logic for the functions is written in views.py.

Note the lines:

global graph
graph = tf.get_default_graph()
model = load_model('main_app/Sentiment_LSTM_model.h5')

You cannot run your model to get predictions if there is no graph (because of how TensorFlow 1.x manages graphs and sessions). If you try to run model.predict(..) without specifying a graph, you’ll get an error. So whenever you are trying to use your model, do not forget to add:

with graph.as_default():
    prediction = model.predict(...)
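For reference, the predict() helper that the views below rely on might look roughly like this (a sketch: the thresholds are illustrative, and it assumes the tokenizer was loaded from tokenizer.pickle; the exact code is in the repository):

SEQUENCE_LENGTH = 300  # assumed to match the padding length used during training

def decode_sentiment(score):
    # Illustrative cut-offs for mapping the sigmoid output to a label
    if score <= 0.4:
        return "Negative"
    elif score >= 0.7:
        return "Positive"
    return "Neutral"

def predict(text):
    # The caller wraps this in `with graph.as_default():`, so no graph handling here
    x = pad_sequences(tokenizer.texts_to_sequences([text]), maxlen=SEQUENCE_LENGTH)
    score = float(model.predict(x)[0][0])
    return {"label": decode_sentiment(score), "score": score}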

Tweepy

Tweepy is the go-to package for getting data from Twitter. You can install it using pip. All you need to use it are some unique keys, which can be obtained by registering an application on the Twitter Developer Site.

Once that is done, we can initialize tweepy as:

# Twitter
auth = tweepy.OAuthHandler(consumer_key,consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)

To get the tweets, we call the Cursor function. We search for a custom hashtag (which we receive from the frontend) and gather tweets matching it.

Note: I used “ -filter:retweets” in order to get only unique tweets; otherwise, the sentiment distribution could be skewed by heavily retweeted posts.

tweepy.Cursor(api.search,q="#" + request.GET.get("text") + " -filter:retweets",rpp=5,lang="en", tweet_mode='extended').items(100)

Making a REST API

We have two functions of interest in our server:

  1. analyzehashtag() — Takes in the hashtag value, fetches a batch of tweets for that hashtag using tweepy, and performs sentiment analysis on each of them. Finally, it calculates the distribution of positive, negative, and neutral tweets for that hashtag by simply counting observations.
def analyzehashtag(request):
    positive = 0
    neutral = 0
    negative = 0
    for tweet in tweepy.Cursor(api.search, q="#" + request.GET.get("text") + " -filter:retweets", rpp=5, lang="en", tweet_mode='extended').items(100):
        with graph.as_default():
            prediction = predict(tweet.full_text)
        if prediction["label"] == "Positive":
            positive += 1
        if prediction["label"] == "Neutral":
            neutral += 1
        if prediction["label"] == "Negative":
            negative += 1
    return JsonResponse({"positive": positive, "neutral": neutral, "negative": negative})

2. gettweets() — This is similar to the first function, but instead of calculating a distribution, it gathers a smaller number of tweets and returns the result for each one. This way we can showcase the model on real-time data and check whether its predictions are in line with our common sense.

def gettweets(request):
    tweets = []
    for tweet in tweepy.Cursor(api.search, q="#" + request.GET.get("text") + " -filter:retweets", rpp=5, lang="en", tweet_mode='extended').items(50):
        temp = {}
        temp["text"] = tweet.full_text
        temp["username"] = tweet.user.screen_name
        with graph.as_default():
            prediction = predict(tweet.full_text)
        temp["label"] = prediction["label"]
        temp["score"] = prediction["score"]
        tweets.append(temp)
    return JsonResponse({"results": tweets})

Now, for our frontend to be able to access these functions, we’ll expose them as APIs. This can be done easily using the Django REST Framework.

Simply install it with pip install djangorestframework and add @api_view([“GET”]) before every function (since we only need GET requests here).
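For example (a sketch showing only where the decorator goes):

from rest_framework.decorators import api_view

@api_view(["GET"])
def analyzehashtag(request):
    ...  # body as shown above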

Do not forget to make the following additions to the settings.py file.

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'corsheaders',
    'main_app',
]
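The 'corsheaders' entry comes from the django-cors-headers package; since the React dev server runs on a different port than Django, the browser will block the requests unless CORS is allowed. A minimal sketch of the extra settings (check the package docs for the exact names in your version):

MIDDLEWARE = [
    'corsheaders.middleware.CorsMiddleware',   # should sit near the top of the list
    'django.middleware.common.CommonMiddleware',
    # ... the rest of the default middleware ...
]

CORS_ORIGIN_ALLOW_ALL = True  # fine for local development only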

Conclusion

Run the server using python manage.py runserver and enjoy getting insights on “How are people reacting to the new elections?” or “Did people like the Keanu Reeves cameo in the Cyberpunk 2077 trailer?” Go crazy!

The entire code, along with installation instructions, is available on my GitHub. Check it out if you’d like. Ciao