Effectiveness of LSTM for Financial Time Series

Github link: https://github.com/stoictrader/LSTM_categorical

The reason for this post is an interesting discussion I had on Twitter regarding Convolutional Neural Networks (CNNs) for market prediction: https://twitter.com/BeingHorizontal/status/1205388149767077888?s=20

“Feeding stock pattern images to a CNN and predicting the market, just the way a CNN predicts cats & dogs. I just went through code where someone attempted this. Your thoughts about the approach?”

Now let’s look at a hypothetical scenario.

Imagine data that predicts ahead of time. It has no correlation with the target at the next time step (t+1), but it has a definite correlation with the target further out in the future (here, t+10). To put it another way, a data point starts predicting 1 period ahead only once it is 10 periods old. We call this context.
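To make this concrete, here is a minimal sketch of such a signal (my own toy construction, not the author's actual generator): the input is uncorrelated with the next-step target but strongly correlated with the target 10 steps ahead.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x = rng.standard_normal(n)       # observed input at each time step t
y = np.zeros(n)
y[10:] = np.sign(x[:-10])        # target is driven by the input 10 steps back

# x is ~uncorrelated with the target at the next step...
print(np.corrcoef(x[:-1], y[1:])[0, 1])    # close to 0
# ...but strongly correlated with the target 10 steps ahead
print(np.corrcoef(x[:-10], y[10:])[0, 1])  # ~0.8
```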

Here is what our custom time series looks like:

To make it even more interesting while staying in line with actual market conditions, let’s design our artificial time series so that we can frame the deep learning model as a multiclass problem. Markets will be slightly bullish, extremely bullish, or bullish outliers, and the same for the bearish side. In my experience, this works a lot better than framing it as a continuous-variable problem, i.e. predicting the return % instead of the bias, which in my humble opinion is a very difficult task due to the dynamic nature of the market.

This gives us 6 different classes: single-digit up, double-digit up, and positive outliers; then single-digit down, double-digit down, and negative outliers.

I kept the outliers at zero for simplicity, which means the model will only ever predict classes 0, 1 and 3, 4. This is what the series looks like.
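For illustration, here is one hypothetical way to bucket percentage moves into those six classes. The thresholds and the `label_move` helper are my own assumptions, not taken from the linked repo:

```python
import numpy as np

def label_move(pct):
    """Map a % move into the six classes described above.
    Thresholds are hypothetical, for illustration only."""
    if pct >= 0:
        if pct < 10:   return 0   # single-digit up
        if pct < 100:  return 1   # double-digit up
        return 2                  # positive outlier (kept empty here)
    pct = abs(pct)
    if pct < 10:   return 3       # single-digit down
    if pct < 100:  return 4       # double-digit down
    return 5                      # negative outlier (kept empty here)

moves = [3.2, 15.0, -4.1, -22.7]
print([label_move(m) for m in moves])   # [0, 1, 3, 4]
```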

It should be noted that the order of bullish and bearish patterns is random: a double-digit positive sequence may be followed by anything from a single-digit positive sequence to a double-digit negative sequence.

Now, imagine we had to use a convolutional neural network (CNN) for this. Obviously, it can’t distinguish between the bullish and extremely bullish scenarios, since the CNN is blind to scale: images of a single-digit positive sequence will look exactly the same as images of a double-digit positive sequence, and the same goes for the bearish patterns.

Even for a binary classification model using a CNN, the image output at each time step will look different. Keep in mind that sliding the window by 1 is essential for live predictions, since new data arrives at every time step.

Scroll down and check the image where I demonstrate what a one-step sliding window looks like in Excel; you can pretty much imagine how the image output would differ for each window frame.

In the past, I have done some experiments with Random Forests (RF) on a similar kind of time series. The first problem is that RF will split a big chunk of the training data into branches based on the target at t+1, which, as we know, is random with respect to the current input; the target is only linked to the input from 10 steps earlier. Theoretically, we could feed in the data from period t-10, but that approach is not robust: it only works because we know the exact time lapse in this example. In a real-time environment, we never know how much context is needed for accurate predictions; it could be t-10 or t-30.
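One way to see why the fixed-lag shortcut is fragile is to scan the lags and check where the input actually correlates with the target. The sketch below reuses the toy construction from earlier; in live data there is no guarantee a single clean lag like this exists:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = np.zeros_like(x)
y[10:] = np.sign(x[:-10])        # same toy construction as above

# scan candidate lags and report only the ones that actually correlate
for lag in range(1, 31):
    c = np.corrcoef(x[:-lag], y[lag:])[0, 1]
    if abs(c) > 0.3:
        print(f"lag {lag}: corr = {c:.2f}")   # only lag 10 shows up
```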

An LSTM-based deep learning model is an ideal choice for such a problem.

I used 15 time-steps (the sequence length) in the LSTM: 15 rows are used as context to predict the target at t+1. Out of those 15 rows, only a few have a useful relationship with the target variable. In this screenshot, cells in the ‘inputs’ and ‘targets’ columns that are directly correlated are marked with the same color. As we slide the window further down by 1, as shown in the picture, the number of rows with a useful relationship to the target shrinks by 1, and so on.
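As a sketch, windowing the data for the LSTM could look like this (`make_sequences` is a hypothetical helper of mine; the linked repo may organize this differently):

```python
import numpy as np

def make_sequences(values, targets, seq_len=15):
    """Slide a window of `seq_len` rows one step at a time; each window
    is the context used to predict the target at the next step (t+1)."""
    X, y = [], []
    for i in range(len(values) - seq_len):
        X.append(values[i:i + seq_len])
        y.append(targets[i + seq_len])
    return np.array(X), np.array(y)

# Shape expected by an LSTM: (samples, time-steps, features)
series = np.arange(100, dtype=float).reshape(-1, 1)
labels = np.zeros(100, dtype=int)
X, y = make_sequences(series, labels, seq_len=15)
print(X.shape, y.shape)   # (85, 15, 1) (85,)
```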

As we can see, the LSTM model has to rely on the historical sequence for the pattern to emerge so it can predict the next target. Knowing how many + and - values occur at each step wouldn’t help, as they keep changing; the model has to store in memory what happened before and what followed afterward. Hence the name: Long Short-Term Memory.

You can increase the sequence length from 15, but you shouldn’t reduce it below 10, since we know there is no pattern below a sequence of 10. In a real-time environment, it’s better to keep this sequence length large, since we never know how far back the model has to dig to establish the pattern.

For instance, if we run the code with a time sequence of 5, the model will fail to learn anything useful and won’t converge.

This is what the LSTM network looks like.
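For reference, here is a minimal Keras sketch consistent with the setup above (15 time-steps, one feature, six classes). The layer sizes and hyperparameters are my own assumptions, not necessarily what the linked repo uses:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(64, input_shape=(15, 1)),     # 15 time-steps, 1 feature per step
    Dense(6, activation="softmax"),    # one probability per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer class labels
              metrics=["accuracy"])
model.summary()
# model.fit(X, y, epochs=50, batch_size=32)  # X, y from the windowing step
```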

Results: Almost 98% accuracy, and near-perfect convergence.

You can increase the complexity further by adding more features to the CSV with different types of correlation with the target variable, or by adding noise, and see how it affects the model.

That’s it for now, adios amigos.