Original article can be found here (source): Deep Learning on Medium

# Recurrent Neural Networks

## Understand the intuition behind RNN!

# Introduction

The goal of this article is to explore Recurrent Neural Networks in-depth, which are a kind of Neural Networks with a different architecture than the ones seen in previous articles (Link).

Concretely, the article is segmented in the following parts:

- What RNNs are
- Long Short-Term Memory (LSTM) networks
- Implementation of RNNs on time series

# What are RNNs?

As we have seen here, CNNs do not have any kind of memory. RNNs can go beyond this limitation of ‘starting to think from scratch’ each time, because they have some kind of memory.

Let’s see how they work with a very visual example:

## Example

Let’s say that we live in an apartment and we have the perfect roommate: she cooks a different meal depending on the weather, sunny or rainy.

So, if we codify these meals with vectors:

And our Neural Network does the following:

If we recall, neural networks learn weights that can be expressed as matrices, and those weights are used to make predictions. Ours will be as follows:

If it is a sunny day:

If it is a rainy day:

And if we take a look at our weight matrix, this time seen as a graph:
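Since the article’s figures are not reproduced here, a minimal numerical sketch may help. The dishes and encodings below are made up for illustration: the weather is a one-hot vector, and the network’s “prediction” is just a matrix-vector product.

```python
import numpy as np

# Hypothetical one-hot encodings of the weather.
sunny = np.array([1, 0])
rainy = np.array([0, 1])
meals = ["pizza", "soup"]  # made-up dishes

# Weight matrix: each column maps a weather type to a meal.
# Column 0 (sunny) -> pizza, column 1 (rainy) -> soup.
W = np.array([[1, 0],
              [0, 1]])

def predict_meal(weather):
    # The network's prediction is a matrix-vector product,
    # and we read off the meal with the highest score.
    scores = W @ weather
    return meals[int(np.argmax(scores))]

print(predict_meal(sunny))  # pizza
print(predict_meal(rainy))  # soup
```

With a one-hot input, the matrix product simply selects one column of the weight matrix, which is exactly what the graph view of the matrix shows.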

Let’s see now what RNNs add, following this example:

## Recurrent Neural Networks

Let’s say that now our dear roommate no longer bases the decision of what to cook on the weather, but simply looks at what she cooked yesterday.

The network in charge of predicting what the roommate will cook tomorrow, based on what she cooked today, is a Recurrent Neural Network (RNN).

This RNN can be expressed as the following matrix:

So what we have is a:
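Numerically, this recurrence can be sketched as a matrix that maps today’s one-hot meal vector to tomorrow’s. The three dishes and the cooking cycle below are made up for illustration:

```python
import numpy as np

# Hypothetical three-meal cycle: each day's dish determines the next
# (pizza -> soup -> salad -> pizza).
meals = ["pizza", "soup", "salad"]

# Recurrent weight matrix: a permutation that maps today's one-hot
# meal vector to tomorrow's.
W = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])

today = np.array([1, 0, 0])  # pizza today
tomorrow = W @ today
print(meals[int(np.argmax(tomorrow))])  # soup
```

Feeding each day’s output back in as the next day’s input is exactly the “recurrence” the network’s name refers to.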

## Let’s Make it a Little Bit More Complex

Imagine now that your roommate decides what to cook based on what she cooked yesterday and the weather.

- If the day is sunny, she spends it on the terrace with a good beer in her hand, so she does not cook and we eat the same thing as yesterday. But
- If it rains, she stays home and cooks.

It would be something like this:

So we end up having one model that tells us what we are going to eat depending on what we ate yesterday and another model that tells us whether our roommate will cook or not.

And the add and merge operations are the following:

And here you can see the graph:

And that is how it works!
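Putting the two models together can be sketched in code. The dishes and the cycle below are made up; the point is that the weather acts as a switch between the “keep yesterday’s meal” branch and the “cook the next dish” branch:

```python
import numpy as np

meals = ["pizza", "soup", "salad"]

# "Food model": what the roommate cooks next when she does cook
# (hypothetical cycle: pizza -> soup -> salad -> pizza).
W_food = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0]])

def tomorrows_meal(today_vec, weather):
    cooked = W_food @ today_vec  # the "she cooks" branch
    kept = today_vec             # the "same as yesterday" branch
    # "Weather model" acting as a switch: sunny keeps, rainy cooks.
    if weather == "sunny":
        return kept
    return cooked

today = np.array([0, 1, 0])  # soup today
print(meals[int(np.argmax(tomorrows_meal(today, "sunny")))])  # soup
print(meals[int(np.argmax(tomorrows_meal(today, "rainy")))])  # salad
```

This hard on/off switch is a crude version of the soft, learned gates we will meet in LSTMs below.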

This example is from a great video, which I recommend you watch as many times as you need to internalize and understand the previous explanation. You can find the video here: https://www.youtube.com/watch?v=UNmqTiOnRfg

## And what are RNNs used for?

There are several use cases. RNNs are very good at making predictions, especially when our data is sequential:

**Stock market forecasts**

The values of a share depend largely on the values it had previously.

**Sequence generation**

As long as the data form a sequence and the data at an instant *t* depend on the data at the instant *t-1*.

**Text generation**

For example, when your cell phone suggests words: it looks at the last word you have written, and at the letters you are typing at that moment, to suggest the next letters or even words.

**Voice recognition**

In this case, we have the previously recognized word and the audio that reaches us at that moment.

# Long Short-Term Memory Networks

Let’s study now how the most popular RNNs work: the LSTM networks. Their structure is as follows:

But first: **Why are they the most popular ones?**

It turns out that conventional RNNs have memory problems: in practice, they are incapable of retaining long-term memory. And why is this a problem?

Well, going back to the problem of our roommate: for this example we just need to know what we ate yesterday, so nothing would go wrong.

But imagine if, instead of a three-course menu, there were 60 courses.

Conventional RNNs wouldn’t be able to remember things that happened a long time ago. However, the LSTM would!

And why?

Let’s take a look at the architecture of the RNN and the LSTM:

## RNN

## LSTM

It turns out that where RNNs have a single layer, LSTMs have a combination of layers that interact with each other in a very special way.

Let’s try to understand this, but first, let me explain the nomenclature:

In the diagrams above:

- A vector travels along each line, from the output of one node to the inputs of others.
- The pink circles indicate element-wise operations, such as vector sums, while the yellow boxes are neural layers whose parameters are learned during training.
- Lines that join indicate concatenation, and lines that separate indicate that the same line content travels to two different destinations.

## The key idea of LSTMs

The key is the state of the cell, which is indicated in the diagram as the line that travels across the top:

The state of the cell is like a conveyor belt that runs along the whole architecture of the network with very few interactions (and linear ones): this means the information simply flows without being modified.

The ingenious part is that the layers of the LSTM can (or cannot) contribute information to this conveyor belt, and that decision is made by the “gates”:

The gates are nothing more than a way of carefully regulating the information that arrives at the conveyor belt. They are composed of a neural layer with a sigmoid activation followed by an element-wise multiplication.

Thus, the sigmoid layer outputs a number between 0 and 1, which indicates how much of that information should pass to the conveyor belt: 0 means “let nothing through”, and 1 means “it is very important, let everything through”.

As you can see in the diagram, an LSTM has three such gates to protect and control the conveyor belt.
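A minimal sketch of a single gate, with made-up numbers: the sigmoid squashes learned pre-activations into the range 0 to 1, and an element-wise product then decides how much of each candidate value passes through.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Candidate information and hypothetical learned gate pre-activations.
candidate = np.array([0.9, -0.4, 0.7])
gate_logits = np.array([5.0, -5.0, 0.0])

gate = sigmoid(gate_logits)  # roughly [0.99, 0.01, 0.5]
passed = gate * candidate    # element-wise multiplication
print(passed)
```

The first component passes almost untouched (gate near 1), the second is almost entirely blocked (gate near 0), and the third is let through at half strength.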

The specific details about this operation are greatly explained here: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

And this blog is also very interesting: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

With this in mind, let’s see what Recurrent Networks can do!

# LSTM Implementation

## Image Classification with LSTM

We’ll follow an example that can be found here:

https://medium.com/the-artificial-impostor/notes-understanding-tensorflow-part-2-f7e5ece849f5

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.datasets import mnist
from keras.utils import np_utils
from keras import initializers

# Hyper parameters
batch_size = 128
nb_epoch = 10

# Parameters for MNIST dataset
img_rows, img_cols = 28, 28
nb_classes = 10

# Parameters for LSTM network
nb_lstm_outputs = 30
nb_time_steps = img_rows
dim_input_vector = img_cols

# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print('X_train original shape:', X_train.shape)
input_shape = (nb_time_steps, dim_input_vector)

# Normalize pixel values and one-hot encode the labels
X_train = X_train.astype('float32') / 255.
X_test = X_test.astype('float32') / 255.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
```
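The linked post continues by defining and training the model. A minimal sketch consistent with the parameters above could be the following, where each 28×28 image is read by the LSTM as 28 time steps of 28 pixels each (the parameters are repeated here so the snippet is self-contained):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Parameters repeated from the listing above.
nb_lstm_outputs = 30
nb_time_steps, dim_input_vector = 28, 28
nb_classes = 10
input_shape = (nb_time_steps, dim_input_vector)

# An LSTM layer over the image rows, followed by a softmax classifier.
model = Sequential()
model.add(LSTM(nb_lstm_outputs, input_shape=input_shape))
model.add(Dense(nb_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
```

Training would then run with `model.fit(X_train, Y_train, epochs=nb_epoch, batch_size=batch_size)`, and evaluation with `model.evaluate(X_test, Y_test)`.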