Source: Deep Learning on Medium
This project involves the use of Recurrent Neural Networks in the form of Long Short-Term Memory (LSTM) to predict the stock prices of Google.
Note: This project is part of the Deep Learning A-Z course that I have been pursuing, along with some extra information that I gathered from different sources to make the topic more understandable. Please check out the course on Udemy for more insight into different deep learning algorithms.
Recurrent Neural Networks (RNNs) were designed, like other Neural Networks, to mimic the function of a specific part of the human brain. When looking at our brain’s cerebrum, we can divide it into the Temporal, Parietal, Occipital, and Frontal lobes.
Out of these, the Frontal lobe is the part which deals with short-term memory: it remembers what happened in the immediate present and uses it for decision making in the near future. It is this Frontal lobe that the RNN tries to replicate.
The Long Short-Term Memory (LSTM) network is a variation of the RNN designed to mitigate the Vanishing Gradient Problem. In a regular RNN, the gradient is multiplied by the recurrent weight at every time step it is propagated back through, so when that weight is small (close to 0) the gradient shrinks with each step. This makes it difficult to update the weights for layers farther away from the “present” layer, and predictions that depend on older inputs become inaccurate. We will cover the details of this phenomenon in another blog post.
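To get a feel for how quickly the gradient can shrink, here is a toy sketch. It assumes a drastically simplified RNN in which the gradient is scaled by a single recurrent weight at every step back in time; the weight value 0.5 and the helper name `gradient_scale` are made up for illustration and are not part of the project.

```python
# Toy illustration only: treat the gradient reaching a layer k steps back
# as being scaled by the recurrent weight multiplied by itself k times.
def gradient_scale(recurrent_weight, steps_back):
    """Return the factor by which the gradient is scaled after `steps_back` steps."""
    return recurrent_weight ** steps_back

w_rec = 0.5                  # assumed small recurrent weight (illustrative value)
steps = (1, 5, 10, 20)
scales = [gradient_scale(w_rec, k) for k in steps]

for k, s in zip(steps, scales):
    print(f"{k:2d} steps back -> gradient scaled by {s:.2e}")
```

The further back we go, the smaller the factor becomes, which is why the weights of distant layers barely change during training.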
About the Dataset
We are using five years of Google’s stock prices, taken from a financial website (Yahoo Finance). The idea of this project was based on a project by students from Stanford (Financial Market Time Series Prediction with Recurrent Neural Networks, by Bernal, Fok, and Pidaparthi). The team used an Echo State Network instead of an LSTM. We will use their findings as a comparison for how our LSTM performs. The team trained their model on data from late 2004 to early 2009, also from Yahoo Finance. They created a visualization comparing their predictions with the actual data, shown below.
We will also train our LSTM on 5 years of data. We can see that their predictions are quite close to the actual Stock Price. We can try to get the same accuracy from our model as well.
We assume that the present day is January 01, 2017. We will then get the Google Stock price for the previous 5 years. Once we train our LSTM, we will try to predict the stock price for the month of January 2017.
We have a training set of 5 years of Google Stock price. The test set contains the stock price for January 2017. We will first import it. As we can see in the data-frame, the dates range from 2012 to 2016.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
training_set = pd.read_csv('Google_Stock_Price_Train.csv')
We can see that there are 4 price columns: Open, High, Low and Close. There is also a Volume column which contains the volume of Google shares traded. We will focus on the Open stock price and will predict this price for January 2017.
The input for the RNN will NOT be the Date and the Open Stock Price; it will just be the Open Stock Price at different times. We will get this by selecting just the Open Stock Price column from our DataFrame using ‘iloc’.
training_set = training_set.iloc[:,1:2].values
For feature scaling, we have 2 options: Standardisation and Normalisation. Check this StackExchange discussion to learn the difference between Standardisation and Normalisation. Since LSTMs use several sigmoid functions, whose outputs lie between 0 and 1, it makes sense to use Normalisation, which rescales the data to the range 0 to 1. But you can check the results of both methods and choose for yourself.
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler()
training_set = sc.fit_transform(training_set)
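To make the scaling step less of a black box, here is a minimal NumPy sketch of the min-max formula that MinMaxScaler applies with its default range of 0 to 1, together with the inverse operation. The helper names and the three prices are made up for illustration.

```python
import numpy as np

def min_max_scale(values):
    """Rescale values to [0, 1]; also return the min and max needed to undo it."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi

def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original price range."""
    return scaled * (hi - lo) + lo

prices = np.array([[300.0], [450.0], [600.0]])   # made-up prices, one column
scaled, lo, hi = min_max_scale(prices)
restored = min_max_inverse(scaled, lo, hi)

print(scaled.ravel())     # smallest price maps to 0, largest to 1
print(restored.ravel())   # back to the original prices
```

Keeping the min and max around is the reason we reuse the same fitted scaler object later on: new data has to be scaled with the training set's range, not its own.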
We will next determine what the input (X_train) and output (y_train) will be. The input is the value that changes with time, i.e. the current open stock price (time t). The output is the future value of the same quantity, i.e. the open stock price on the next day (time t+1). The training set contains 1258 values. Since each prediction is for the day after the current value, the input is restricted to the first 1257 values (indices 0 to 1256), because the last value has no next day to predict. The output, on the other hand, cannot contain a prediction for day 0, so it runs from index 1 to index 1257, which is the slice 1:1258.
X_train = training_set[0:1257]
y_train = training_set[1:1258]
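The same shift-by-one idea is easier to see on a tiny array. The sketch below uses a made-up series of 10 values in place of the scaled prices, and slices relative to the array length instead of hard-coding 1257 and 1258.

```python
import numpy as np

# Stand-in for the scaled Open price column: 10 made-up values, one column.
series = np.arange(10.0).reshape(-1, 1)

X = series[:-1]   # values at time t     (drops the last day)
y = series[1:]    # values at time t + 1 (drops the first day)

print(X.shape, y.shape)    # both have one row fewer than the series
print(X[0, 0], y[0, 0])    # the first input is paired with the following value
```

Each row of X is paired with the value that comes right after it, which is exactly what the two slices on the real training set do.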
The use of LSTM (and RNN) involves predicting a particular value over time. Our input is currently 2-dimensional: we have 1257 rows and 1 column. We need to add another dimension to the input to account for time. This process is called reshaping. Keras expects recurrent layers to receive a 3-dimensional input whose dimensions are, in order, the number of samples (the number of rows, 1257 here), timesteps (how many consecutive time steps each sample contains, in this case 1) and features (the number of columns, 1 here; older Keras versions call this input_dim). These 3 values are passed together as a tuple, after the original data, as the second argument of NumPy’s reshape function.
X_train = np.reshape(X_train, (1257, 1, 1))
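Here is a small sketch of what the reshape does, using a made-up 5-row array in place of X_train. The data is not changed; only an extra axis of length 1 is added for the time steps.

```python
import numpy as np

X = np.arange(5.0).reshape(-1, 1)        # 2D: (samples, features) = (5, 1)

# Add a middle axis so the layout becomes (samples, timesteps, features).
X_3d = X.reshape(X.shape[0], 1, 1)

print(X.shape)        # shape before reshaping
print(X_3d.shape)     # shape after reshaping
print(X_3d[3, 0, 0])  # sample 3, its single time step, its single feature
```

Using X.shape[0] instead of a literal row count keeps the call valid if the number of rows in the dataset changes.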
Building the RNN with LSTM
First, we will import 3 classes: the Sequential class, which will initialize our RNN; the Dense class, which will create the output layer of our RNN; and finally the LSTM class, which will make our RNN have “Long Memory”.
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
We are predicting a continuous variable, so we are using a regression model instead of a classification model, and we call our object ‘regressor’. To this regressor object, we will add the LSTM layer, which takes the input layer as its input. The arguments we add to the LSTM layer are:
- units: the number of memory units,
- activation: the activation function, which can be tanh or sigmoid.
Other arguments will be left at their defaults. There is one additional argument, input_shape, which specifies the format of our input layer. It is set to (None, 1): None means the model can accept any number of time steps, and 1 because we have just 1 column of input. We will use 4 memory units and the sigmoid activation function.
regressor = Sequential()
regressor.add(LSTM(units = 4, activation = 'sigmoid', input_shape = (None, 1)))
The next layer that we will add is the Output layer. We will use the Dense class, with the argument being units, everything else being default. The units are the number of neurons that should be present in the output layer, which is dependent on the dimensions of the output. So, our units argument for Dense class will have value 1.
regressor.add(Dense(units = 1))
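To get a sense of how small this network is, we can count its trainable weights by hand. The sketch below assumes the standard LSTM formulation, with four weight sets (three gates plus the candidate cell state), each having input weights, recurrent weights and a bias; the helper names are made up for illustration.

```python
def lstm_param_count(units, input_dim):
    """Weights in a standard LSTM layer: 4 weight sets, each with
    input weights, recurrent weights and one bias per unit."""
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(inputs, units):
    """Weights in a fully connected layer: one per connection plus biases."""
    return inputs * units + units

lstm_params = lstm_param_count(units=4, input_dim=1)
dense_params = dense_param_count(inputs=4, units=1)

print(lstm_params, dense_params, lstm_params + dense_params)
```

With 4 memory units and a single input column, the whole model has only about a hundred weights to learn. Calling regressor.summary() in Keras should let you compare against these numbers.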
To compile all the layers into a single model, we will use the compile method along with its arguments. The optimizer can be RMSprop or Adam. Both optimizers gave similar results here, so I went forward with Adam, although the Keras documentation has usually recommended RMSprop for recurrent networks. The loss argument sets the function that the optimizer minimizes when it updates the weights; since we are predicting a continuous variable, we use Mean Squared Error. For the test set, we might use Root Mean Square Error in its place. Other arguments will be default.
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
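For reference, here is a minimal NumPy sketch of the two error measures mentioned above. The arrays are made-up values, and the helper names `mse` and `rmse` are ours, not Keras functions.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared differences."""
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root Mean Square Error: the square root of the MSE,
    expressed in the same units as the prices themselves."""
    return float(np.sqrt(mse(y_true, y_pred)))

y_true = np.array([1.0, 2.0, 3.0])   # made-up actual values
y_pred = np.array([1.0, 2.0, 5.0])   # made-up predictions

print(mse(y_true, y_pred))
print(rmse(y_true, y_pred))
```

RMSE is often easier to interpret on the test set because, after inverse scaling, it reads directly as an error in price units.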
Now we will fit this regressor to the training dataset. We will use the fit method for this. The important arguments include the input, output, batch_size, and epochs. We will keep the default batch size of 32 but will change epochs to 200 for better convergence.
regressor.fit(X_train, y_train, batch_size = 32, epochs = 200)
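As a quick sanity check on what these settings mean, the arithmetic below works out how many weight updates the fit call performs, assuming one update per batch and a smaller final batch for the leftover samples.

```python
import math

n_samples = 1257    # rows in X_train
batch_size = 32
epochs = 200

# The last batch of each epoch holds the leftover samples, hence the ceiling.
steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```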
With each passing epoch, we see the loss decrease. But the results are only trustworthy if the loss on the test set is similarly low.
Making Predictions and Visualizing the Result
The methods of getting and transforming the Test Dataset are the same as for the Training Dataset. We will just rename it to real_stock_price so that we can distinguish between prediction and actual values.
The model that we have created was trained on scaled values. If we feed it raw prices, it will give incorrect predictions. So we will scale the input using the same “sc” scaling object that was fitted on the training data. We will also have to reshape the data into the 3D format expected by the predict method.
test_set = pd.read_csv('Google_Stock_Price_Test.csv')
real_stock_price = test_set.iloc[:,1:2].values
inputs = real_stock_price
inputs = sc.transform(inputs)
inputs = np.reshape(inputs, (20, 1, 1))
Next, we use our model to make predictions on the test dataset. But we should keep in mind that every prediction is for the next day and not the present. We will use the regressor model to make predictions on the input and store it in the predicted_stock_price. The argument would obviously be the input. We now have the predicted stock price for the month of January 2017.
But this output will be scaled. We will have to use the inverse transform method of the same “sc” object we had used to scale the data to get the proper predicted values. This is the final prediction.
predicted_stock_price = regressor.predict(inputs)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)
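Because every prediction is for the next day, a fair numeric comparison has to line up the prediction made from day i with the real price of day i+1. The sketch below shows that alignment on made-up numbers; the array names and values are hypothetical stand-ins for real_stock_price and predicted_stock_price.

```python
import numpy as np

# Made-up values: predicted[i] is the model's guess for day i + 1,
# produced from the real price of day i.
real = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([101.5, 101.0, 104.0, 106.0])

aligned_real = real[1:]        # days 1..3, the days we have predictions for
aligned_pred = predicted[:-1]  # predictions for days 1..3 (made on days 0..2)

rmse = float(np.sqrt(np.mean((aligned_real - aligned_pred) ** 2)))
print(rmse)
```

The last prediction has no real value to compare against (it refers to the day after the test set ends), so it is dropped from the comparison.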
Now we visualize our predictions with the actual stock prices of Google. For this, we use the pyplot module. We have some arguments with pyplot, like the use of color. For real stock prices, we will use red. We will also include a label mentioning the real stock price. We will keep blue the color for the Predicted Stock price and change the label as well. We will also add the axis labels and title and display it.
plt.plot(real_stock_price, color = 'red', label = 'Real Google Stock Price')
plt.plot(predicted_stock_price, color = 'blue', label = 'Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()
What’s important to note is that this is a 1 time-step prediction, i.e. the input is at time t and the prediction is for time t+1. We were able to make these predictions for 20 days only because we had the actual stock price for each of those 20 days. This would not have been possible if we had the stock price for just 1 day. It would be wonderful to make predictions far into the future, but we would hardly get such close predictions. In finance, stock prices are often modeled as Brownian Motion, under which future price changes are independent of the past, so long-term predictions of a stock price are considered practically impossible.
Although the LSTM we have designed appears to predict the stock prices of Google quite accurately, the reason it looks so accurate is that it is learning with a time-step of 1. With a single time step per sample, the hidden state is reset after every sample, so the model never carries memory from one day to the next and is not learning anything useful about the sequence. The output is therefore not meaningful.
To make our model useful in the real world, we need to increase the time-step.
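One common way to increase the time-step is to feed the model a sliding window of past prices for each prediction. The sketch below is one possible way to build such windows with NumPy; the function name `make_windows`, the window length of 3 and the toy series are assumptions for illustration, not the setup used in this project.

```python
import numpy as np

def make_windows(series, timesteps):
    """Turn a (n, 1) series into inputs of shape (samples, timesteps, 1)
    and targets holding the value that follows each window."""
    X, y = [], []
    for i in range(timesteps, len(series)):
        X.append(series[i - timesteps:i, 0])   # the previous `timesteps` values
        y.append(series[i, 0])                 # the value right after the window
    X = np.array(X).reshape(-1, timesteps, 1)
    return X, np.array(y)

series = np.arange(10.0).reshape(-1, 1)   # made-up stand-in for scaled prices
X, y = make_windows(series, timesteps=3)

print(X.shape, y.shape)
print(X[0, :, 0], y[0])   # first window and the value it should predict
```

With windows like these, the LSTM sees several consecutive days per sample, so its hidden state has something to carry across time steps.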
The project discussed at the beginning of the post is indeed from Stanford, but it was created by undergraduate students of the CS229 course. The students of this course submitted it as a final project for evaluation; it was put online, but it is not published anywhere. There is a big difference between reading a Stanford paper published online and reading an assignment written by a group of undergrads.
For sample code and additional materials related to this project, please visit my Github Repo for this project.