A real-world application on the Azure Deep Learning Virtual Machine
This week I attended The Data Science Conference in Chicago: an industry-focused yet academically grounded conference centered on real-world solutions. Every presentation is an example of cutting-edge work, and a great opportunity to get an in-depth update on the state of the industry and learn best practices from other data scientists.
In my presentation, I shared a few insights from my latest research on “Neural Networks for Forecasting Financial and Economic Time Series”. Neural networks are a very broad family of machine learning models and, in recent years, their applications in finance and economics have increased dramatically. However, this class of algorithms does not natively incorporate the concept of a time axis.
The goal of this article is to provide a practical introductory guide to neural networks for forecasting financial time series data using the Azure Deep Learning Virtual Machine. I will explain a multi-step approach to designing a neural network forecasting model, including an application to stock market prediction with LSTMs in Python.
Introduction to time series forecasting
One of the most important elements of today’s decision-making world, in both the public and the private sectors, is the forecasting of macroeconomic and financial variables. During the past few decades, econometric model-based forecasting has become very popular in private and public decision-making processes. To better understand the meaning of “time series forecast”, let’s split the term into two parts:
- Time series is a sequence of observations taken sequentially in time.
- Forecast means making predictions about a future event.
When forecasting is applied to time series data, such as events observed over a time interval, it is called time series forecasting: the process of predicting future events based on historical data.
Time series forecasting has been in use across multiple industries for quite some time, guiding future decisions. In retail, for example, sales forecasts are very important so that raw materials can be procured accordingly. The most famous example is weather forecasting, where the future is predicted based on past patterns and recent changes. These predictions are very important and are usually the first step in solving other problems, such as planning power generation to avoid unnecessary power disruptions or overproduction.
In any forecast scenario, there are three questions that you always want to ask yourself before building the forecast model:
- What is the time horizon of predictions that is required?
- At what temporal frequency are forecasts required?
- Can forecasts be updated frequently over time or must they be made once and remain static?
The answers to these three questions will help you define the most critical components of a time series, which are:
- Trend: Long-term, gradual increase or decrease of the series.
- Cyclical: Long-term, gradual up-and-down potentially irregular swings of the series.
- Seasonality: Regular, relatively short-term repetitive up-and-down fluctuations.
- Error: Random variability in the observations that cannot be explained by the model.
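As a quick illustration, a synthetic series combining these components can be generated with NumPy (all parameters below are made up for the sake of the example):

```python
import numpy as np

# Synthetic monthly series: trend + seasonality + error (illustrative values)
n = 120                                        # 10 years of monthly observations
t = np.arange(n)

trend = 0.5 * t                                # long-term, gradual increase
seasonality = 10 * np.sin(2 * np.pi * t / 12)  # regular 12-month fluctuation
rng = np.random.default_rng(0)
error = rng.normal(0, 2, n)                    # random, unexplained variability

series = trend + seasonality + error
```

A cyclical component would look like the seasonal one but with a longer, potentially irregular period.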
Introduction to Neural Networks
Simple Exponential Smoothing and Autoregressive Integrated Moving Average (ARIMA) models are considered the more traditional time series forecasting techniques. However, in recent years, neural networks have become one of the most popular trends in machine learning, with applications in many areas, including driverless cars, robotics, speech and image recognition, and financial forecasting.
Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. Deep learning is the name we use for “stacked neural networks”, that is, networks composed of several layers. The layers are made of nodes. A node is a place where computation happens: it combines input from the data with a set of coefficients, or weights, that either amplify or reduce that input.
These input-weight products are summed, and the sum is passed through the node’s so-called activation function, which determines whether and to what extent that signal progresses further through the network to affect the final outcome. This is forward propagation: we move from the input layer, through the hidden layers, and finally to the output layer!
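A single node’s computation can be sketched in a few lines of NumPy (the inputs, weights, and the choice of a sigmoid activation are illustrative):

```python
import numpy as np

def node_output(inputs, weights, bias):
    """One node: sum the input-weight products, add a bias, apply an activation."""
    z = np.dot(inputs, weights) + bias   # weighted sum of the inputs
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid activation

x = np.array([0.5, -1.2, 3.0])           # input from the data
w = np.array([0.4, 0.1, -0.2])           # weights amplify or reduce each input
out = node_output(x, w, bias=0.1)        # signal passed toward the next layer
```

Stacking many such nodes into layers, and feeding each layer’s outputs into the next, gives the forward pass described above.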
What are RNNs and LSTMs? Let’s Unroll!
The idea behind Recurrent Neural Networks is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that is a very bad idea: if you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations.
Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps (more on this later).
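A minimal sketch of this recurrence, assuming small hypothetical dimensions: the same weights are reused at every time step, and the hidden state `h` carries the “memory” of everything seen so far.

```python
import numpy as np

rng = np.random.default_rng(1)
W_xh = rng.normal(0, 0.1, (4, 8))   # input -> hidden weights
W_hh = rng.normal(0, 0.1, (8, 8))   # hidden -> hidden (recurrent) weights

def rnn_step(x_t, h_prev):
    # The same W_xh and W_hh are applied at every step: that is the recurrence.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh)

h = np.zeros(8)                          # initial "empty" memory
for x_t in rng.normal(0, 1, (5, 4)):     # a sequence of 5 inputs
    h = rnn_step(x_t, h)                 # h now depends on all previous inputs
```

In practice the repeated multiplications by `W_hh` make gradients vanish or explode over long sequences, which is exactly the limitation that motivates LSTMs.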
LSTM networks are quite popular these days. LSTMs don’t have a fundamentally different architecture from RNNs, but they use a different function to compute the hidden state. An LSTM module (or cell) has five essential components, which allow it to model both long-term and short-term data:
- Cell state: This represents the internal memory of the cell, which stores both short-term and long-term memories.
- Hidden state: This is the output state information, calculated from the current input, the previous hidden state, and the current cell input, which you eventually use to predict future stock prices. Additionally, the hidden state can decide to retrieve only the short-term memory, only the long-term memory, or both types of memory stored in the cell state to make the next prediction.
- Input gate: Decides how much information from the current input flows into the cell state.
- Forget gate: Decides how much information from the previous cell state is carried over into the current cell state.
- Output gate: Decides how much information from the current cell state flows into the hidden state, so that, if needed, the LSTM can pick only long-term memories, only short-term memories, or both.
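A minimal NumPy sketch of one LSTM step wiring these components together (the shapes and initialization are illustrative assumptions, not the article’s actual model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps the concatenated [h_prev, x] to the four gate
    pre-activations; c is the cell state (the memory), h the hidden state."""
    z = np.concatenate([h_prev, x]) @ W + b
    H = h_prev.size
    i = sigmoid(z[:H])          # input gate: how much new input enters c
    f = sigmoid(z[H:2*H])       # forget gate: how much of c_prev is kept
    o = sigmoid(z[2*H:3*H])     # output gate: how much of c reaches h
    g = np.tanh(z[3*H:])        # candidate cell update from the current input
    c = f * c_prev + i * g      # new cell state
    h = o * np.tanh(c)          # new hidden state
    return h, c

rng = np.random.default_rng(0)
H, D = 6, 3                                  # hypothetical hidden/input sizes
W = rng.normal(0, 0.1, (H + D, 4 * H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(0, 1, D), np.zeros(H), np.zeros(H), W, b)
```

Because the cell state `c` is updated additively (gated by `f` and `i`) rather than rewritten at every step, information can survive over many more time steps than in a vanilla RNN.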
One critical advantage of LSTMs is their ability to remember information from long sequences (large window sizes), which is hard to achieve with traditional feature engineering.
Stock market predictions with LSTMs: A real-world application on the Azure Deep Learning Virtual Machine
Good deep learning models can use the history of a sequence of data and correctly predict what the future elements of the sequence are going to be. This is very helpful in many different scenarios; for example, if you can model stock prices correctly, then as a stock buyer you can reasonably decide when to buy stocks and when to sell them to make a profit.
For this specific scenario, I built my model using a Deep Learning Virtual Machine: deep learning requires large amounts of computational power to train models on large datasets. With the cloud and the availability of Graphics Processing Units (GPUs), it is becoming possible to build sophisticated deep neural architectures and train them on large datasets using powerful computing infrastructure in the cloud.
The Deep Learning Virtual Machine is a specially configured variant of the Data Science Virtual Machine (DSVM) to make it more straightforward to use GPU-based VM instances for training deep learning models. Here you can find more information on how to get started and provision Data Science Virtual Machine on Azure.
For this real-world example I used a stock market dataset from Kaggle with the following information:
- Open: Opening stock price of the day
- Close: Closing stock price of the day
- High: Highest stock price of the day
- Low: Lowest stock price of the day
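To make the schema concrete, here is a tiny in-memory stand-in for such a dataset (the values below are invented for illustration; the real Kaggle file would be loaded with `pandas.read_csv`):

```python
import pandas as pd

# Invented sample rows matching the Open/High/Low/Close schema above.
df = pd.DataFrame({
    "Date":  ["2017-01-03", "2017-01-04", "2017-01-05"],
    "Open":  [115.80, 115.85, 115.92],
    "High":  [116.33, 116.51, 116.86],
    "Low":   [114.76, 115.75, 115.81],
    "Close": [116.15, 116.02, 116.61],
})

# A common choice of target series is the mid price between High and Low.
mid_prices = (df["High"] + df["Low"]) / 2.0
```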
In the rest of this article, I will describe a multi-step approach to designing a neural network forecasting model. This approach can be summarized as follows:
The first step is to define hyperparameters:
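As an illustration, a hyperparameter block might look like the following (all values are hypothetical choices, not the tuned settings from my experiments):

```python
# Hypothetical hyperparameters for the LSTM forecaster (illustrative values)
D = 1                        # dimensionality of the input (one price series)
num_unrollings = 50          # time steps the network looks into the past
batch_size = 500             # samples per training batch
num_nodes = [200, 200, 150]  # hidden units in each stacked LSTM layer
n_layers = len(num_nodes)    # number of LSTM layers
dropout = 0.2                # dropout rate for regularization
```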
Next you define placeholders for training inputs and labels:
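A TensorFlow 1.x implementation would reserve `tf.placeholder` tensors at this point; as a framework-agnostic sketch of the same idea, the hypothetical helper below slices rolling windows of inputs and one-step-ahead labels from a price series:

```python
import numpy as np

def make_inputs_and_labels(series, num_unrollings):
    """Hypothetical helper: for every window of `num_unrollings` past values
    (the training inputs), the label is the value one step ahead."""
    inputs, labels = [], []
    for i in range(len(series) - num_unrollings):
        inputs.append(series[i:i + num_unrollings])
        labels.append(series[i + num_unrollings])
    return np.array(inputs), np.array(labels)

series = np.linspace(100.0, 110.0, 60)     # stand-in price series
X, y = make_inputs_and_labels(series, 50)  # X: (10, 50) windows, y: (10,) labels
```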
Now you can define the parameters of the LSTM and regression layer:
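A minimal NumPy sketch of this parameter initialization, assuming a single LSTM layer with 200 hidden units and a linear regression readout (the shapes and the Xavier-style scale are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
D, H = 1, 200                                  # input dim, LSTM hidden units

# One weight matrix covering all four LSTM gates, applied to [h_prev, x].
scale = np.sqrt(2.0 / (D + H))                 # Xavier-style initialization
lstm_W = rng.normal(0, scale, (D + H, 4 * H))  # gate weights
lstm_b = np.zeros(4 * H)                       # gate biases

# Linear regression layer mapping the hidden state to one predicted price.
w = rng.normal(0, np.sqrt(2.0 / (H + 1)), (H, 1))
b = np.zeros(1)
```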
For each batch of predictions and true outputs, you calculate the Mean Squared Error. Finally, you define the optimizer you will use to train the neural network.
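A minimal sketch of these two pieces, using plain NumPy gradient descent on a linear model as a stand-in for a framework optimizer (all names and values below are hypothetical):

```python
import numpy as np

def mse(predictions, targets):
    """Mean Squared Error between a batch of predictions and true outputs."""
    return np.mean((predictions - targets) ** 2)

def sgd_step(w, X, y, learning_rate=0.05):
    """One gradient-descent update minimizing the MSE of a linear model."""
    preds = X @ w
    grad = 2.0 * X.T @ (preds - y) / len(y)  # d(MSE)/dw
    return w - learning_rate * grad

# Fit invented data whose true weights we know, to show the loss shrinking.
rng = np.random.default_rng(0)
X = rng.normal(0, 1, (32, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

w = np.zeros(4)
for _ in range(200):
    w = sgd_step(w, X, y)
loss = mse(X @ w, y)                         # close to zero after training
```

A real LSTM would instead backpropagate this loss through the recurrent layers, which is what the framework optimizer handles for you.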
Now you can train the model and predict stock price movements. The goal of this post was to provide a practical introductory guide to neural networks for forecasting financial time series data using the Azure Deep Learning Virtual Machine. I explained a multi-step approach to designing a neural network forecasting model, including an application to stock market prediction with LSTMs in Python.
- Azure Data Science Virtual Machine
- Azure Deep Learning Virtual Machine
- Stock Market Predictions with LSTM in Python: https://www.datacamp.com/community/tutorials/lstm-python-stock-market
Source: Deep Learning on Medium