Emotion Classification from tweets with LSTM, NLTK, Keras.




Hi all, today I’m going to walk through how to use an LSTM neural network to classify and analyse text data.

Human Emotions

Full code available at: https://github.com/saitejdandge/Sentimental_Analysis_LSTM_Conv1D

Problem Statement

We have to train a model that outputs an emotion for a given input text. The output we are trying to predict is a label, not a continuous number, so we can frame this as a classification problem.

We’ll break this problem into three modules:

1. Data Preparation

1.0 Understanding Data

1.1 Removing punctuation, words that start with ‘@’, and stop words

1.2 Tokenising words / Converting words to indices

1.3 Padding Words

1.4 Building Word Embeddings

1.5 One-hot encoding labels

2. Building Model

2.1 Understanding Embedding Layer

2.2 Understanding LSTM Layer

2.3 Understanding Dense Layer

2.4 Adding Activations at each Layer

2.5 Model Architecture with input and output shapes

3. Training the Model

3.1 Splitting data into training and testing datasets

3.2 Training the network

3.3 Plotting training and testing accuracies


Imports

Let’s start by importing the modules we need.
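A minimal sketch of the imports used throughout this walkthrough; the exact set in the repo may differ.

import re

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import nltk
from nltk.corpus import stopwords

from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Flatten, Dense
from keras.callbacks import ModelCheckpoint

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split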

1. Data Preparation

1.0 Understanding Data

//todo add csv here

We have 4000 tweets, each labelled with one of the emotions (labels) below:

{ anger, boredom, empty, enthusiasm, fun, happiness, hate, love, neutral, relief, sadness, surprise, worry}

Which features should we consider as input? (Feature selection)

We call this step feature selection: we only keep the columns that we think affect the output. We can ignore the tweet_id and author columns, as the emotional outcome doesn’t depend on them.
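As a minimal sketch, dropping the unused columns might look like this. The file name and column names (text_emotion.csv, tweet_id, author, sentiment, content) are assumptions about the dataset layout; adjust them to the actual CSV.

import pandas as pd

# File and column names are assumptions about the dataset layout.
data = pd.read_csv('text_emotion.csv')
data = data.drop(columns=['tweet_id', 'author'])  # keep only the label and the tweet text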

1.1 Removing punctuation, words that start with ‘@’, and stop words

  • Word vectors are sensitive to punctuation and are case sensitive.
  • Words that start with “@” are user and page references and don’t add value to the output, as they’re just usernames and page names.
  • Stop words like “a”, “an”, “the”, etc. are removed because they might bias our model’s output; we want to concentrate on the key words that we think have an impact on the output. (A small cleaning sketch follows this list.)
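A minimal cleaning sketch using NLTK stop words and a couple of regular expressions; the 'content' column name is an assumption about the dataset.

import re
from nltk.corpus import stopwords

# nltk.download('stopwords')  # required once before first use
stop_words = set(stopwords.words('english'))

def clean_tweet(text):
    text = text.lower()
    text = re.sub(r'@\w+', '', text)        # drop @user / @page references
    text = re.sub(r'[^a-z\s]', ' ', text)   # drop punctuation, digits and other symbols
    words = [w for w in text.split() if w not in stop_words]
    return ' '.join(words)

data['content'] = data['content'].apply(clean_tweet)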

1.2 Tokenising words / Converting words to indices

Now that we have preprocessed the text, we go forward and convert each word into an index. We get the indices by sorting all the words in alphabetical order and numbering them starting at 1 (index 0 is reserved for unknown words).
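A small sketch of this word-to-index mapping, following the alphabetical ordering described above; it builds on the cleaned 'content' column from the previous step.

# Sort the vocabulary alphabetically and assign indices starting from 1.
all_words = sorted({w for tweet in data['content'] for w in tweet.split()})
word_index = {w: i + 1 for i, w in enumerate(all_words)}   # index 0 = unknown word
vocab_size = len(word_index)

def tweet_to_indices(tweet):
    return [word_index.get(w, 0) for w in tweet.split()]

sequences = [tweet_to_indices(t) for t in data['content']]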

1.3 Padding Words

We’ll assume that each input entry has 20 words: if a tweet is shorter, we pad it with unknown words (index 0), and if it is longer, we truncate it.
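Keras provides pad_sequences for this step. Padding at the end of the sequence is an assumption here; the Keras default is to pad at the front.

from keras.preprocessing.sequence import pad_sequences

MAX_LEN = 20   # fixed number of words per tweet
padded_sequences = pad_sequences(sequences, maxlen=MAX_LEN, padding='post', value=0)
print(padded_sequences.shape)   # (number_of_tweets, 20)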

1.4 Building Word Embeddings

Word embeddings are vectorised representations of words. Assume we have an n-dimensional space; each word in our dictionary gets an n-dimensional vector and fits into this word space. This preserves relative distances among words and gives our neural network some semantic understanding.

Word Embeddings in 2D

Example: the distance between the words “kitchen” and “battery” should be high compared to the distance between “kitchen” and “bathroom”. The words “kitchen” and “bathroom” are closely related, as both are rooms.

We’ll use the GloVe model for word embeddings; instead of retraining it, we’ll simply transfer its weights. On the whole, this gives our neural network a kickstart.

The Embedding layer internally holds an embedding matrix of shape (vocab_size + 1, embedding_dimension).

In our case this is (vocab_size + 1, 50), since we are using GloVe 50D vectors: each word is represented as a 50-dimensional vector, and the rows are ordered by the indices we obtained after labelling the words alphabetically.
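A minimal sketch of building this matrix from the standard GloVe 6B download; glove.6B.50d.txt is the file name shipped with that download.

import numpy as np

EMBEDDING_DIM = 50

# Load the pre-trained GloVe vectors into a word -> vector dictionary.
glove = {}
with open('glove.6B.50d.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.split()
        glove[parts[0]] = np.asarray(parts[1:], dtype='float32')

# Row i holds the vector of the word with index i; row 0 (unknown word) stays all zeros,
# as do rows for words that are missing from GloVe.
embedding_matrix = np.zeros((vocab_size + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    vector = glove.get(word)
    if vector is not None:
        embedding_matrix[i] = vector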

1.5 One-hot encoding labels

In machine learning, one-hot is a group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low (0).

Labels : { anger, boredom, empty, enthusiasm, fun, happiness, hate, love, neutral, relief, sadness, surprise, worry}

Each unique emotion is assigned to an integer value (Label Encoding).

For example, “anger” is 0, “boredom” is 1, “empty” is 2, and so on, in alphabetical order.

After this, a one-hot encoding can be applied to the integer representation. This is where the integer encoded variable is removed and a new binary variable is added for each unique integer value.

There are 13 unique labels/emotions and therefore 13 binary variables are needed. A “1” value is placed in the binary variable for the emotion and “0” values for the other emotions.

Example :

  • anger : 0000000000001
  • boredom : 0000000000010
  • empty: 0000000000100

..etc
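A minimal sketch of the label encoding plus one-hot encoding described above; the 'sentiment' column name is an assumption about the dataset.

from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

label_encoder = LabelEncoder()                                   # sorts labels alphabetically
integer_labels = label_encoder.fit_transform(data['sentiment'])  # anger -> 0, boredom -> 1, ...
one_hot_labels = to_categorical(integer_labels, num_classes=13)
print(one_hot_labels.shape)   # (number_of_tweets, 13)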

2. Building Model

2.1 Understanding Embedding Layers (First Layer)

This layer acts as a lookup table for vectors: given a word index, it returns the corresponding embedded word vector.

In Keras, the Embedding layer can only be used as the first layer of a model.

Our input will be of shape (None, 20); None stands for a variable batch size.

Since we padded each input to 20 words in the data-preparation stage, each row contains 20 word indices.

The Embedding layer converts each index to its corresponding vector with the help of the embedding matrix (see above if you are confused). The output dimension is 50 because we used GloVe 50D vectors in the word-embedding step.

input (None, 20) => Embedding layer => (None, 20, 50)
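In Keras this looks roughly like the following; freezing the GloVe weights with trainable=False is an assumption, and the repo may fine-tune them instead.

from keras.layers import Embedding

embedding_layer = Embedding(input_dim=vocab_size + 1,
                            output_dim=50,                  # GloVe 50D vectors
                            weights=[embedding_matrix],     # transferred GloVe weights
                            input_length=20,                # 20 padded words per tweet
                            trainable=False)                # keep pre-trained vectors fixed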

2.2 Understanding LSTM / GRU layers (Hidden Layers)

They fall under the category of recurrent neural networks (RNNs). A recurrent neural network feeds the output from the previous timestep back in as input at the current timestep.

This gives them an internal memory, which makes them well suited for machine learning problems that involve sequential data.

Recurrent Neural Networks

The output of the embedding layer is fed to this LSTM layer.

We’ll use an LSTM layer with 100 units.

The input to the LSTM is interpreted as (batch_size, timesteps, features).

We’ll create 100 units in the LSTM layer and set the return_sequences argument of the LSTM constructor based on our requirement.

There are two different scenarios based on return_sequences:

return_sequences = True

The output of the RNN layer includes the outputs from all timesteps, which makes the output shape:

(None, 20, 50) => LSTM(100, return_sequences=True) => (None, 20, 100)

In the next step, we flatten this output:

(None, 20, 100) => Flatten => (None, 2000)

return_sequences = False

The output of the RNN layer includes only the output of the last timestep, which makes the output shape:

(None, 20, 50) => LSTM(100, return_sequences=False) => (None, 100)

We can go with either scenario, based on our requirement. Either way, we end up with a 2-dimensional output shape:

(None, 2000) (return_sequences=True)

or

(None, 100) (return_sequences=False)
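Both scenarios, sketched in Keras:

from keras.layers import LSTM, Flatten

# Scenario 1: return the output at every timestep, then flatten.
lstm_all_steps = LSTM(100, return_sequences=True)    # (None, 20, 50) -> (None, 20, 100)
flatten = Flatten()                                   # (None, 20, 100) -> (None, 2000)

# Scenario 2: return only the output of the last timestep.
lstm_last_step = LSTM(100, return_sequences=False)    # (None, 20, 50) -> (None, 100)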

2.3 Understanding Dense Layer (Last Layers)

We’ll use these as the last layers of our model.

A Dense layer connects all the features coming from the previous layers. By stacking Dense layers, we keep reducing the output until it reaches (None, labels_count).

input (None, 2000) or (None, 100) => Dense(300) => (None, 300)

Adding another Dense layer:

(None, 300) => Dense(13) => (None, 13)

13 is the number of labels, i.e. the total number of emotions.
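In Keras these two layers are simply (activations are added in the next step):

from keras.layers import Dense

dense_hidden = Dense(300)   # (None, 2000) or (None, 100) -> (None, 300)
dense_output = Dense(13)    # (None, 300) -> (None, 13), one unit per emotion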

2.4 Adding Activation at each layer

We’ll add an activation at each layer to introduce non-linearity into our model. We’ll use ReLU on the hidden Dense layer and softmax on the last layer.

Softmax converts the last layer’s outputs into a probability distribution over the labels, which helps in achieving better results.
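Putting it all together, a minimal sketch of the full model. The adam optimizer is an assumption (the repo may use a different one), and the LSTM keeps its default internal activations here.

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Flatten, Dense

model = Sequential()
model.add(Embedding(vocab_size + 1, 50,
                    weights=[embedding_matrix],
                    input_length=20,
                    trainable=False))             # (None, 20)      -> (None, 20, 50)
model.add(LSTM(100, return_sequences=True))       # (None, 20, 50)  -> (None, 20, 100)
model.add(Flatten())                              # (None, 20, 100) -> (None, 2000)
model.add(Dense(300, activation='relu'))          # (None, 2000)    -> (None, 300)
model.add(Dense(13, activation='softmax'))        # (None, 300)     -> (None, 13)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()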

2.5 Model Architecture with input and output shapes

Additional info: we can add 1D convolution (Conv1D) layers as hidden layers for better results.

Model Architecture with input and output shapes

3. Training the Model

3.1 Splitting data into training and testing dataset

We have to split our data into two parts: a training set and a testing set.

We use the training set to train our neural network, and the test set to provide an unbiased evaluation of the final model fit on the training data.

This helps us find the sweet spot between underfitting and overfitting of our model.
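With scikit-learn, this split might look like the following; the 80/20 ratio is an assumption.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    padded_sequences, one_hot_labels, test_size=0.2, random_state=42)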

3.2 Training the network

Now we define the number of epochs and the checkpoint conditions; these checkpoints save our model locally whenever there is an improvement.

Let’s now start training our model.
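A minimal training sketch with a checkpoint callback. The epoch count, batch size, and file name are illustrative assumptions; older Keras versions log 'val_acc' while newer ones use 'val_accuracy'.

from keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint('emotion_lstm.h5',
                             monitor='val_acc',      # 'val_accuracy' in newer Keras versions
                             save_best_only=True,    # save only when validation accuracy improves
                             verbose=1)

history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=20,
                    batch_size=64,
                    callbacks=[checkpoint])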

3.3 Plotting training and testing accuracies

This will start training; now we can monitor the accuracies and plot them as a graph to understand the results.
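Plotting the accuracies from the fit history might look like this; the history keys are 'acc'/'val_acc' in older Keras versions and 'accuracy'/'val_accuracy' in newer ones.

import matplotlib.pyplot as plt

plt.plot(history.history['acc'], label='training accuracy')
plt.plot(history.history['val_acc'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()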

As you can see, our training accuracy reaches around 40%, while the validation (testing) accuracy fluctuates.

From these results, we can say that our model still needs more data to learn the patterns behind each emotion. Our model ended up with training and testing accuracies of around 50%.

Full code available at: https://github.com/saitejdandge/Sentimental_Analysis_LSTM_Conv1D

Thank you all for reading. Please clap if you found this useful.