RNN (Recurrent Neural Networks) & LSTM

Source: Deep Learning on Medium

Recurrent neural networks are a special kind of neural network designed to deal effectively with sequential data. This kind of data includes time series (a list of values of some parameter over a certain period of time), text documents, which can be seen as sequences of words, or audio, which can be seen as a sequence of sound frequencies over time.

Recurrent neural networks are a special type of neural network where the outputs from previous time steps are fed as input to the current time step.

Basic Recurrent neural network with three input nodes

The way RNNs do this is by taking the output of each neuron (input nodes are fed into a hidden layer with sigmoid activations) and feeding it back to that neuron as an input. By doing this, the network not only receives a new piece of information at every time step, but also adds to it a weighted version of the previous output. As you can see, the hidden layer outputs are passed through a conceptual delay block to allow the previous hidden state hᵗ⁻¹ to enter the hidden layer at time t. What is the point of this? Simply that we can now model time- or sequence-dependent data.
This gives these neurons a kind of “memory” of the previous inputs they have received, since those inputs are, in a sense, quantified by the output being fed back into the neuron.

A recurrent neuron, where the output data is multiplied by a weight and fed back into the input

Cells that are a function of inputs from previous time steps are also known as memory cells.
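The recurrent step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trainable model: the weight matrices are random, the dimensions (3 inputs, 4 hidden units) are arbitrary, and the names `W_x`, `W_h`, and `rnn_step` are our own.

```python
import numpy as np

# Hypothetical dimensions for illustration: 3 input features, 4 hidden units.
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)

# W_x maps the new input; W_h maps the fed-back (previous) output.
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x_t, h_prev):
    """One time step: combine the new input with a weighted version
    of the previous output, then squash with the sigmoid activation."""
    return sigmoid(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_size)           # hidden state before the first input
x = rng.normal(size=input_size)     # one new piece of information
h = rnn_step(x, h)                  # h now carries a trace of this input
```

Feeding `h` back into the next call to `rnn_step` is exactly the memory-cell behaviour: the state is a function of every input seen so far.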

A particularly good example of this is in predicting text sequences. Consider the following text string: “A girl walked into a bar, and she said ‘Can I have a drink please?’. The bartender said ‘Certainly { }”. There are many options for what could fill in the { } symbol in the above string, for instance, “miss”, “ma’am” and so on. However, other words could also fit, such as “sir”, “Mister” etc. In order to get the correct gender of the noun, the neural network needs to “recall” that two previous words designating the likely gender (i.e. “girl” and “she”) were used. This type of flow of information through time (or sequence) in a recurrent neural network is shown in the diagram below, which unrolls the sequence:

Unrolled recurrent neural network

This unrolled network shows how we can supply a stream of data to the recurrent neural network. For instance, first we supply the word vector for “A” to the network F — the output of the nodes in F are fed into the “next” network and also act as a stand-alone output ( h₀ ). The next network (though it’s the same network) F at time t=1 takes the next word vector for “girl” and the previous output h₀ into its hidden nodes, producing the next output h₁ and so on.
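The unrolling can also be written as a plain loop: the same function F, with the same shared weights, is applied at every position, and each output hₜ is passed along to the next step. The word vectors below are random stand-ins for real embeddings, and the sequence length and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
embed_dim, hidden_size = 5, 4

# Shared weights: the "same network" F applied at every time step.
W_x = rng.normal(scale=0.1, size=(hidden_size, embed_dim))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

def F(x_t, h_prev):
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Stand-in word vectors for e.g. "A", "girl", "walked", "into".
sequence = [rng.normal(size=embed_dim) for _ in range(4)]

outputs = []
h = np.zeros(hidden_size)   # there is no previous output before t=0
for x_t in sequence:
    h = F(x_t, h)           # h_0, h_1, ... each feeds the next step
    outputs.append(h)
```

The list `outputs` corresponds to the stand-alone outputs h₀, h₁, … in the unrolled diagram.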

NOTE: Although the words themselves, i.e. “A”, “girl” etc., are shown in the diagram for ease of explanation, they aren’t input directly into the neural network. Neither are their one-hot vector representations; rather, an embedding word vector (see Word2Vec) is used for each word.
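Concretely, an embedding is just a lookup into a matrix of dense vectors, one row per vocabulary word. The toy vocabulary, the 5-dimensional size, and the random values below are illustrative; in practice the matrix is learned (e.g. by Word2Vec) rather than random.

```python
import numpy as np

# Toy vocabulary mapping words to row indices in the embedding matrix.
vocab = {"a": 0, "girl": 1, "walked": 2, "bar": 3}
embed_dim = 5
rng = np.random.default_rng(2)
embeddings = rng.normal(size=(len(vocab), embed_dim))

def embed(word):
    # Look up the dense word vector; this, not the raw word or a
    # one-hot vector, is what the recurrent network actually receives.
    return embeddings[vocab[word.lower()]]

x_0 = embed("A")   # dense 5-dimensional input to F at t=0
```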