Source: Deep Learning on Medium
Finding Positivity on Twitter (The nerdy way)
Understanding Recurrent Neural Networks and LSTMs to do sentiment analysis and much more
Positivity seems out of reach nowadays, with so many Twitter accounts spreading negativity. However, life doesn’t have to be so bad, and you can decide to find more positivity on Twitter (the nerdy way). We can use deep learning to detect whether an account is mainly positive or negative. This article is going to be a long one, so buckle up and get ready for a thrilling ride!
Basics of Deep Learning and Neural Networks
If you do not know what Neural Networks are and how they work, you are in for a treat. Check out this video to learn about the vast and exciting world of neural networks. I will not be diving into the depths of neural networks in this piece to avoid repetitiveness.
Intro to Recurrent Neural Networks
When you read a sentence, you understand each word based on the words or sentences that came before it. It would be impossible to understand each standalone word in the context of a thought or an idea without being given any other words. It would be like forgetting the last word as soon as you start reading the next word. It sounds like a terrible way to read, doesn’t it?
Well, it is terribly inefficient. But do you know of anyone that actually reads like that? Probably not. But I do: traditional neural networks. They can be thought of as having an exceptionally terrible memory. Since human thoughts have persistence and these networks do not, conventional neural networks find tasks like translation, language modelling, and speech recognition inherently hard. What’s the answer? Recurrent Neural Networks, otherwise known as RNNs (nerdy people love their acronyms).
What makes RNNs so good?
RNNs have shown great success when dealing with sequential data. Sequential data refers to any type of data in which the order of the elements matters. To predict the next word in a sentence, it is crucial to know what the preceding text was about (for context) and what type of word came just before it (e.g., verb, noun, pronoun, adjective) to follow grammatical rules.
At the most basic level, RNNs have loops that transfer information from one step of the network to the next. If we examine RNNs further, we can see that they form a chain-like structure where the hidden state contains information from all the steps leading up to the current one. At each step, this hidden state is passed into the network along with the current input, so that previous data is taken into consideration. Since RNNs deal with a series of inputs, they do not have a fixed constraint on the length of the input, the length of the output, or the number of computational steps to go from one to the other.
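To make the loop a bit more concrete, here is a minimal sketch of a vanilla RNN in plain Python. The function names (rnn_step, run_rnn) and the toy weights are made up for illustration and are not trained on anything. The step shown is the usual textbook form, where the new hidden state is tanh of a weighted mix of the current input and the previous hidden state.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One vanilla RNN step: h = tanh(W_xh @ x + W_hh @ h_prev + b)."""
    h = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(total))
    return h

def run_rnn(inputs, W_xh, W_hh, b):
    """Feed a sequence of any length through the same step, carrying h along."""
    h = [0.0] * len(b)              # the hidden state starts out empty
    for x in inputs:                # this is the "loop" in the network
        h = rnn_step(x, h, W_xh, W_hh, b)
    return h

# Hypothetical toy weights: 2 input features, 3 hidden units.
W_xh = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
W_hh = [[0.2, 0.0, -0.1], [0.4, 0.3, 0.0], [0.0, -0.5, 0.1]]
b = [0.0, 0.1, -0.1]

seq = [[1.0, 0.0], [0.0, 1.0]]
h_forward = run_rnn(seq, W_xh, W_hh, b)
h_reversed = run_rnn(list(reversed(seq)), W_xh, W_hh, b)
h_longer = run_rnn(seq + [[1.0, 1.0], [0.5, -0.5]], W_xh, W_hh, b)
```

The same weights handle a sequence of two inputs or four, and feeding the same inputs in a different order leaves a different hidden state behind, which is exactly why RNNs suit sequential data.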
Character Level Models
One of the most basic and fundamental uses of RNNs is to predict what comes next in a piece of text, given a large chunk of text as training data. The RNNs deployed in this case often use character-level models rather than word-level models. This makes the list of possible outputs far smaller (a few dozen characters, such as the 26 letters, vs. the vast number of words in the English language and its subcultures). Given some text as input, the network calculates a probability distribution over what the next letter will be. It then chooses the letter that is most likely to be correct and passes its hidden state along to the next step in the network to predict the following letter, and so on.
After first hearing about RNNs, it may seem that they are absolutely perfect. But just like everything else in this world, RNNs have their own shortcomings. One of the biggest problems with vanilla or plain RNNs is their inability to utilize long-term memory. The next predicted word or character will be almost entirely influenced by the past 10 words or so. This isn’t ideal when dealing with long passages of text, speech, or any data that requires a network to remember information across long gaps between inputs.
As always, the computer science gods (otherwise known as computer science academics) answered the prayers of die-hard nerds and addressed these shortcomings. This time, they produced Long Short Term Memory (LSTM) networks, which are capable of learning long-term dependencies. How do they achieve this? While the repeating module of a traditional RNN has one single neural network layer, the repeating module of an LSTM has four: three gates plus a layer that proposes new candidate values for the memory. It also carries two pieces of state, the cell state and the hidden state. These four layers interact with each other in such a way as to preserve long-term memory and improve overall performance over a traditional RNN.
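Here is a small sketch of that last step: turning the network's raw scores into a probability distribution and picking the most likely letter. The scores below are hypothetical numbers standing in for what a trained network might produce after reading "hell"; no real model is involved, and the softmax function is the standard way of converting scores into probabilities rather than something specific to this article.

```python
import math
import string

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    biggest = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = list(string.ascii_lowercase)             # only 26 possible outputs

# Hypothetical raw scores (made up for illustration).
scores = [0.0] * len(vocab)
scores[vocab.index("o")] = 4.0
scores[vocab.index("s")] = 2.0
scores[vocab.index("a")] = 1.0

probs = softmax(scores)
best = max(range(len(vocab)), key=lambda i: probs[i])
next_char = vocab[best]                          # greedy choice: the most likely letter
```

Always taking the single most likely letter is the simplest strategy. Sampling from probs instead is a common alternative when you want more varied text.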
The cell state holds the long-term memory. It is passed from module to module and gets slightly updated each time.
The hidden state is responsible for storing and updating the working memory. This memory is often referred to as short-term memory, and it is the same kind of state found in vanilla RNNs.
Forget Gate
While long-term memory gives LSTM networks enhanced capability, it is impractical to store all of the long-term memory all the time. So, who decides what information is retained and what is discarded? This problem is solved by introducing the forget gate. Technically, this layer outputs a vector of numbers that assigns each element of the long-term memory a value between 0 (forget it completely) and 1 (keep it).
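A tiny sketch of the forget gate in action may help. In a real LSTM the gate's raw scores come from a learned layer that looks at the previous hidden state and the current input; here they are hard-coded, hypothetical numbers so the effect is easy to see. The function name apply_forget_gate is also made up for this example.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def apply_forget_gate(cell_state, gate_scores):
    """Scale each element of the long-term memory by its keep-value."""
    gate = [sigmoid(z) for z in gate_scores]
    kept = [f * c for f, c in zip(gate, cell_state)]
    return gate, kept

cell_state = [2.0, -1.5, 0.8]        # toy long-term memory
gate_scores = [10.0, -10.0, 0.0]     # hypothetical: keep, forget, keep half

gate, kept = apply_forget_gate(cell_state, gate_scores)
```

The first element survives almost untouched, the second is wiped to nearly zero, and the third is halved.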
Input (Save) Gate
The primary function of the input gate is to decide what new information should be added to the long-term memory (cell state). It saves a selected portion of the input and adds it to the cell state for future use.
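As a rough sketch, the "save" step can be written as the cell state plus gate times candidate. This follows the standard LSTM formulation, where a sigmoid layer decides how much to save and a tanh layer proposes the new values. The raw scores below are hypothetical and hard-coded rather than produced by learned weights, and add_new_memory is an illustrative name.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def add_new_memory(cell_state, input_scores, candidate_scores):
    """Add a gated portion of the candidate values to the long-term memory."""
    input_gate = [sigmoid(z) for z in input_scores]        # how much to save, 0..1
    candidate = [math.tanh(z) for z in candidate_scores]   # proposed values, -1..1
    return [c + i * g for c, i, g in zip(cell_state, input_gate, candidate)]

cell_state = [0.5, 0.5]              # toy long-term memory
input_scores = [10.0, -10.0]         # hypothetical: save the first, ignore the second
candidate_scores = [2.0, 2.0]        # the same new information offered to both slots

updated = add_new_memory(cell_state, input_scores, candidate_scores)
```

Both slots are offered the same new information, but only the first one actually stores it.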
Output (Focus) Gate
You may be wondering how the short-term memory gets updated. Output gate to the rescue! The output gate transfers, or focuses on, the parts of the long-term memory that will be useful immediately and places them in the short-term memory (hidden state).
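In the standard LSTM formulation, this focusing step is the output gate multiplied by tanh of the cell state. Here is a minimal sketch, again with hypothetical hard-coded gate scores in place of a learned layer and an illustrative function name (focus).

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def focus(cell_state, output_scores):
    """Build the new short-term memory from the long-term memory."""
    output_gate = [sigmoid(z) for z in output_scores]       # what to expose, 0..1
    return [o * math.tanh(c) for o, c in zip(output_gate, cell_state)]

cell_state = [1.5, -2.0, 0.3]        # toy long-term memory
output_scores = [10.0, -10.0, 10.0]  # hypothetical: expose slots 1 and 3, hide slot 2

hidden_state = focus(cell_state, output_scores)
```

The second slot stays in long-term memory but contributes almost nothing to the hidden state, so it is remembered without being "in focus" right now.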