Original article can be found here (source): Deep Learning on Medium
Recurrent Neural Networks
Understand the intuition behind RNN!
The goal of this article is to explore Recurrent Neural Networks in depth. They are a kind of neural network with a different architecture from the ones seen in previous articles.
Concretely, the article is divided into the following parts:
- What RNNs are
- Long Short-Term Memory (LSTM) networks
- Applying RNNs to time series
What are RNNs?
As we have seen in previous articles, CNNs do not have any kind of memory. RNNs can go beyond this limitation of ‘starting to think from scratch’ each time because they do have some kind of memory.
Let’s see how they work with a very visual example:
Let’s say that we live in an apartment and we have the perfect roommate: she cooks a different meal depending on the weather, sunny or rainy.
So, if we codify these meals with vectors:
And our Neural Network does the following:
If we recall, neural networks learn weights that can be expressed as matrices, and those weights are used to make predictions. Ours will be as follows:
If it is a sunny day:
If it is a rainy day:
And if we take a look at our weight matrix, this time seen as a graph:
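To make the idea concrete, here is a minimal sketch of such a network: the weather and the meals are one-hot vectors and the whole "network" is a single weight matrix. The meal names, the ordering of the vectors, and which meal goes with which weather are illustrative assumptions, since the original figures are not reproduced here.

```python
import numpy as np

# Assumed one-hot encodings (illustrative, not taken from the original figures).
WEATHER = {"sunny": np.array([1, 0]), "rainy": np.array([0, 1])}
MEALS = ["apple pie", "burger", "chicken"]  # hypothetical meal names

# Weight matrix: column j is the one-hot meal cooked for weather j.
# Assumed pairing: sunny -> apple pie, rainy -> burger.
W = np.array([[1, 0],
              [0, 1],
              [0, 0]])

def predict_meal(weather):
    """Multiply the weight matrix by the weather vector and decode the result."""
    meal_vector = W @ WEATHER[weather]
    return MEALS[int(np.argmax(meal_vector))]

print(predict_meal("sunny"))  # apple pie
print(predict_meal("rainy"))  # burger
```

Note that nothing here depends on previous days: the same weather always gives the same meal.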
Let’s now see what RNNs add to this example:
Recurrent Neural Networks
Let’s say that now our dear roommate no longer bases the decision of what to cook on the weather, but simply looks at what she cooked yesterday.
The network in charge of predicting what the roommate will cook tomorrow based on what she cooked today is a Recurrent Neural Network (RNN).
This RNN can be expressed as the following matrix:
So what we have is a:
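A minimal sketch of this recurrence, assuming the roommate cycles through three meals in a fixed order (the meal names and the order are hypothetical). The matrix maps today's one-hot meal to tomorrow's, and feeding each output back in as the next input is what makes the network recurrent.

```python
import numpy as np

MEALS = ["apple pie", "burger", "chicken"]  # hypothetical names and order

# Assumed rule: the meals follow a fixed cycle, so column j holds the
# one-hot vector of the meal that comes after meal j.
W_FOOD = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0]])

def one_hot(meal):
    vec = np.zeros(len(MEALS), dtype=int)
    vec[MEALS.index(meal)] = 1
    return vec

def next_meal(today):
    return MEALS[int(np.argmax(W_FOOD @ one_hot(today)))]

# The output of one step becomes the input of the next step.
meal = "apple pie"
sequence = [meal]
for _ in range(4):
    meal = next_meal(meal)
    sequence.append(meal)

print(sequence)
# ['apple pie', 'burger', 'chicken', 'apple pie', 'burger']
```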
Let’s Make it a Little Bit More Complex
Imagine now that your roommate decides what to cook based on what she cooked yesterday and the weather.
- If the day is sunny, she spends the day on the terrace with a good beer in her hand and does not cook, so we eat the same thing as yesterday.
- But if it rains, she stays home and cooks.
It would be something like this:
So we end up having one model that tells us what we are going to eat depending on what we ate yesterday and another model that tells us whether our roommate will cook or not.
And the add and merge operations are the following:
And here you can see the graph:
And that is how it works!
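For readers who prefer code, the two models and the add and merge steps can be sketched as follows. This is one possible way to write it, not necessarily the exact matrices from the figures: the meal names, the cycling order, and the "keep only the entries equal to 2" non-linearity are assumptions made for the illustration.

```python
import numpy as np

MEALS = ["apple pie", "burger", "chicken"]  # hypothetical names and order
SUNNY, RAINY = np.array([1, 0]), np.array([0, 1])

SAME = np.eye(3, dtype=int)        # "we eat the same thing as yesterday"
NEXT = np.array([[0, 0, 1],
                 [1, 0, 0],
                 [0, 1, 0]])       # "she cooks the next meal" (assumed cycle)

# Food model: stacks both candidates, [same meal ; next meal], into a 6-vector.
W_FOOD = np.vstack([SAME, NEXT])

# Weather model: sunny switches on the top half, rainy the bottom half.
W_WEATHER = np.array([[1, 0]] * 3 + [[0, 1]] * 3)

def predict(yesterday, weather):
    """Return the one-hot vector of today's meal."""
    added = W_FOOD @ yesterday + W_WEATHER @ weather  # add
    kept = (added == 2).astype(int)                   # keep entries chosen by both models
    return kept[:3] + kept[3:]                        # merge the two halves

burger = np.array([0, 1, 0])
print(MEALS[int(np.argmax(predict(burger, SUNNY)))])  # burger (she did not cook)
print(MEALS[int(np.argmax(predict(burger, RAINY)))])  # chicken (she cooked)
```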
This example is from a great video, which I recommend you watch as many times as you need to internalize and understand the previous explanation. You can find the video here: https://www.youtube.com/watch?v=UNmqTiOnRfg
And what are RNNs used for?
There are several types:
They are very good at making predictions, especially when our data is sequential:
Stock market forecasts
The values of a share depend largely on the values it had previously
In general, they fit any problem where the data form a sequence and the data at instant t depend on the data at instant t-1.
For example, when your cell phone suggests words, it looks at the last word you have written and at the letters you are writing at that moment to suggest the next letters or even words.
In speech recognition, we have the previously recognized word and the audio that reaches us at that moment.
Long Short-Term Memory Networks
Let’s now study how the most popular RNNs work. They are the LSTM networks, and their structure is as follows:
But first: Why are they the most popular ones?
It turns out that conventional RNNs have memory problems. Although they are specifically designed to have memory, they are incapable of long-term memory. And why is this a problem?
Well, going back to our roommate, in that example we just need to know what we ate yesterday, so nothing would go wrong.
But imagine that, instead of a three-course menu, we had 60 courses.
Conventional RNNs wouldn’t be able to remember things that happened a long time ago. However, the LSTM would!
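One commonly cited reason for this forgetfulness is the vanishing gradient: during training, the influence of an early step on a much later one is a product of many factors that are usually smaller than one. The toy sketch below uses a scalar hidden state; the recurrent weight 0.5 and the constant input 1.0 are arbitrary values chosen only for illustration.

```python
import math

def influence_of_first_state(steps, w=0.5, x=1.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x)."""
    h = 0.0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t ** 2)
        grad *= w * (1.0 - h * h)
    return grad

# Compare a 3-step sequence with a 60-step one.
for steps in (3, 60):
    print(steps, influence_of_first_state(steps))
```

After 60 steps the factor is vanishingly small, so the network receives almost no training signal about what happened at the start of the sequence.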
Let’s take a look at the architecture of the RNN and the LSTM:
It turns out that where RNNs have a single layer, LSTMs have a combination of layers that interact with each other in a very special way.
Let’s try to understand this, but first, let me explain the nomenclature:
In the diagrams above:
- A vector travels along each line, from the output of one node to the inputs of others.
- The pink circles indicate element-wise operations, such as vector sums, while the yellow boxes are neural layers that are learned by training.
- Lines that join indicate concatenation, and lines that separate indicate that the same line content travels to two different destinations.
The key idea of LSTMs
The key is the state of the cell, which is indicated in the diagram as the line that travels across the top:
The cell state is like a conveyor belt that runs along the whole architecture of the network with very few interactions (and they are linear), so information can flow along it largely unchanged.
The ingenious part is that the layers of the LSTM can contribute information to this conveyor belt (or not), and that decision is made by the “gates”:
The gates are nothing more than a way of carefully regulating the information that reaches the conveyor belt. Each one is composed of a neural network layer with sigmoid activation and an element-wise multiplication.
Thus, the sigmoid layer outputs a number between 0 and 1, which indicates how important that information is and how much of it should pass to the conveyor belt. A 0 means “I don’t care”, and a 1 means “it’s very important”.
As you can see in the diagram, an LSTM has three such gates to protect and control the conveyor belt.
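The conveyor belt and its three gates can be sketched in a few lines of NumPy. This follows the standard LSTM formulation (forget, input, and output gates), with small random untrained weights; the sizes, the parameter names in `params`, and the random inputs are assumptions made for the illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = np.concatenate([h_prev, x])                        # lines that join: concatenation
    f = sigmoid(params["W_f"] @ z + params["b_f"])         # forget gate
    i = sigmoid(params["W_i"] @ z + params["b_i"])         # input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate values
    c = f * c_prev + i * c_tilde                           # conveyor belt update
    o = sigmoid(params["W_o"] @ z + params["b_o"])         # output gate
    h = o * np.tanh(c)                                     # what leaves the cell
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4                                      # arbitrary sizes
params = {}
for name in ("f", "i", "c", "o"):
    params["W_" + name] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))
    params["b_" + name] = np.zeros(n_hidden)

# Run the cell over a sequence of 5 random input vectors.
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, params)

print(h.shape, c.shape)
```

The only operations applied to the cell state `c` are an element-wise multiplication and a sum, which is why information can travel along it for many steps.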
The specific details of this operation are explained very well here: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
And this blog is also very interesting: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
With this in mind, let’s see what Recurrent Networks can do!
Image Classification with LSTM
We’ll follow an example that can be found here:
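from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.datasets import mnist
from keras.utils import to_categorical
from keras import initializers

# Hyperparameters
batch_size = 128
nb_epoch = 10

# Parameters for MNIST dataset
img_rows, img_cols = 28, 28
nb_classes = 10

# Parameters for LSTM network: each image row is one time step
nb_lstm_outputs = 30
nb_time_steps = img_rows
dim_input_vector = img_cols

# Load MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print('X_train original shape:', X_train.shape)
input_shape = (nb_time_steps, dim_input_vector)

# Scale pixel values to [0, 1] and one-hot encode the labels
X_train = X_train.astype('float32') / 255.
X_test = X_test.astype('float32') / 255.
Y_train = to_categorical(y_train, nb_classes)
Y_test = to_categorical(y_test, nb_classes)

print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# LSTM Building
# NOTE: the layer definitions are missing from the original listing; the two
# layers below are an assumed reconstruction from the variables defined above.
model = Sequential()
model.add(LSTM(nb_lstm_outputs, input_shape=input_shape))
model.add(Dense(nb_classes, activation='softmax',
                kernel_initializer=initializers.RandomNormal(stddev=0.01)))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Training the model
# NOTE: the original call is truncated after X_train; the other arguments are assumed.
history = model.fit(X_train, Y_train,
                    epochs=nb_epoch, batch_size=batch_size,
                    validation_data=(X_test, Y_test), verbose=1)

# evaluate() returns [loss, accuracy] for the metrics compiled above
evaluation = model.evaluate(X_test, Y_test, batch_size=batch_size, verbose=1)
print('Summary: Loss over the test dataset: %.2f, Accuracy: %.2f' % (evaluation[0], evaluation[1]))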
Time Series Prediction with LSTM
# LSTM for international airline passengers problem with regression framing
# Notebook shell command: download the dataset
!wget https://raw.githubusercontent.com/lazyprogrammer/machine_learning_examples/master/airline/international-airline-passengers.csv

import math
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# fix random seed for reproducibility
numpy.random.seed(7)

# load the dataset (assumed: column 1 holds the passenger counts)
dataframe = read_csv('international-airline-passengers.csv', usecols=[1], engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]

# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# create and fit the LSTM network
# NOTE: the output layer and the compile step are missing from the original
# listing; the two lines marked "assumed" are a reconstruction.
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))  # assumed
model.compile(loss='mean_squared_error', optimizer='adam')  # assumed
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)

# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# invert predictions back to the original scale
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])

# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))

# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict

# plot baseline and predictions (plot calls assumed; missing from the original listing)
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
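To see what the "X=t and Y=t+1" framing does to the data, here is a small standalone sketch on a toy series. `make_windows` is a hypothetical helper written for this illustration, not the `create_dataset` function from the listing, and unlike the listing (one time step with `look_back` features) it treats each past value as its own time step.

```python
import numpy as np

def make_windows(series, look_back=1):
    """Pair each window of `look_back` values with the value that follows it."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    return np.array(X), np.array(y)

series = np.array([10, 20, 30, 40, 50])
X, y = make_windows(series, look_back=2)
print(X.tolist())  # [[10, 20], [20, 30], [30, 40]]
print(y.tolist())  # [30, 40, 50]

# Recurrent layers expect input shaped [samples, time steps, features].
X_lstm = X.reshape(X.shape[0], X.shape[1], 1)
print(X_lstm.shape)  # (3, 2, 1)
```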
As always, I hope you enjoyed the post, and that you gained an intuition about RNNs and how to implement them!