The Battle for Time Series Forecasting — Part One



Old VS New. Deep Learning VS Statistics. The future is at stake: will time series forecasting always be a statistical problem, or can neural networks take it over too? I wanted to find out.


DISCLAIMER: I won’t deep-dive into the statistical analysis of the time series; I’m not a statistics expert, nor do I pretend to be one. I will use the most common statistical approaches and tools with a bit of reasoning. There are probably millions of experts who could do much better with a lengthier statistical analysis; my point here is to show how deep neural networks can be very useful on this topic without much manual feature engineering. No offence meant to the people who have spent their lives studying time series from the statistical point of view.

Deep Learning is on the rise, the applications are countless and the results more promising than ever, but how will deep neural networks perform on an old problem where statistics is still queen of the realm, like time series forecasting? After reading countless articles on the topic, I decided to see for myself, using real-world data.

In this and the next article I will explore the most common approaches to this problem, compare the results and the training and inference performance, and evaluate which one brings the best value for production deployment today.

Let’s get cracking.

Python will be my language of choice for several reasons: it’s easier to use (for me, at least), it has more robust implementations of neural networks, and it levels the field when it comes to comparing the performance of the different approaches.

I’m going to use my own private dataset, taken from an undisclosed company’s revenue stream, so no usual airport flights data, stock market prices or bitcoin/cryptocurrency madness.

First of all, let’s start by having a look at the data; this is generally useful, or even mandatory, to choose what kind of statistical approach to use:
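The raw data cannot be shared, but to make the following snippets easy to follow, here is a minimal loading-and-plotting sketch; the file name and column names (revenue.csv, date, revenue) are hypothetical placeholders for the private dataset:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names: the real dataset is private
data = pd.read_csv('revenue.csv', index_col='date', parse_dates=True)['revenue']
data.plot(figsize=(12, 4), title='Daily revenue')
plt.show()

The rest of the snippets in this article assume data is this daily pandas Series.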

And we can decompose it into its Trend, Seasonality and Residual Noise:

from statsmodels.tsa.seasonal import seasonal_decompose
# daily data, yearly period; recent statsmodels versions use `period=` (older ones called it `freq=`)
result = seasonal_decompose(data, period=365)
result.plot()

We clearly have a trend, but the seasonality is not extremely obvious and it looks longer than a year; with such a fine resolution (we have mostly daily data), finding our variables for the statistical models won’t be a piece of cake.

In addition, the mean looks strictly dependent on time and the variance is high, which means that our series is not stationary (its statistical properties change ‘randomly’ over time), making it impossible to model as it is with statistical approaches.
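For reference, plotting rolling statistics is a common way to visualise this; a small sketch reusing the data series from above (the 30-day window is an arbitrary choice):

# Rolling mean and standard deviation over a 30-day window
data.rolling(window=30).mean().plot(label='rolling mean', legend=True)
data.rolling(window=30).std().plot(label='rolling std', legend=True)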

We can verify that just by looking at it, or we can use some statistical tests like ADF (Augmented Dickey-Fuller) or KPSS (Kwiatkowski-Phillips-Schmidt-Shin).

from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import kpss
# ADF null hypothesis: the series has a unit root (non-stationary)
ADF_test = adfuller(data, autolag='AIC')
# KPSS null hypothesis: the series is stationary around a constant
KPSS_test = kpss(data, regression='c')
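To turn the tuples returned by adfuller and kpss into the readable report below, a common pattern is to wrap them into labelled pandas Series (a sketch; the labels are purely for display):

import pandas as pd

adf_output = pd.Series(ADF_test[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
for key, value in ADF_test[4].items():
    adf_output[f'Critical Value ({key})'] = value
print('Results of ADF Test:')
print(adf_output)

kpss_output = pd.Series(KPSS_test[0:3], index=['Test Statistic', 'p-value', 'Lags Used'])
for key, value in KPSS_test[3].items():
    kpss_output[f'Critical Value ({key})'] = value
print('Results of KPSS Test:')
print(kpss_output)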

They give us these results:

Results of ADF Test:
Test Statistic -2.069042
p-value 0.257144
#Lags Used 7.000000
Number of Observations Used 722.000000
Critical Value (1%) -3.439440
Critical Value (5%) -2.865551
Critical Value (10%) -2.568906
Results of KPSS Test: 
Test Statistic 3.196637
p-value 0.0240274
Lags Used 20.000000
Critical Value (10%) 0.347000
Critical Value (5%) 0.463000
Critical Value (2.5%) 0.574000
Critical Value (1%) 0.739000

These tests check a so-called null hypothesis: for ADF the null is that the series has a unit root (i.e. it is non-stationary), while for KPSS the null is that the series is stationary. In short, if the ADF p-value is greater than 0.05, or the KPSS p-value is smaller than 0.05, the series is non-stationary; here both tests agree.
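Turning those numbers into a verdict programmatically is a one-liner per test; a small sketch reusing the ADF_test and KPSS_test results from above:

# ADF null hypothesis: unit root (non-stationary); KPSS null hypothesis: stationary
print('ADF says non-stationary:', ADF_test[1] > 0.05)    # we fail to reject the unit root
print('KPSS says non-stationary:', KPSS_test[1] < 0.05)  # we reject stationarity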

So, to give statistics a chance, we would have to look at the series in sub-samples, check whether they are locally stationary, and try to make local predictions. This, however, would limit our ability to make accurate long-term predictions and intrinsically increase the error the further we try to look into the future.
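For illustration, a rough sketch of such a sub-sample check, reusing adfuller on fixed-size windows of the series (the 90-day window is an arbitrary choice):

window = 90  # roughly one quarter of daily data
for start in range(0, len(data) - window + 1, window):
    p_value = adfuller(data.iloc[start:start + window], autolag='AIC')[1]
    verdict = 'stationary' if p_value < 0.05 else 'non-stationary'
    print(f'days {start}-{start + window}: p={p_value:.3f} ({verdict})')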

This process of analyzing the data to extract patterns, sub-series, classes, etc. is fundamental for any statistical approach; without it, any algorithm that is not designed to learn its parameters by itself, as deep neural networks do, will inevitably fail to make any precise prediction, as we will see later.

With deep neural networks, on the other hand, we don’t need to worry much about all of this: as long as we choose the right architecture, the model will learn the relevant features itself and, hopefully, find a good fit.

By the time we’re done with all of this, a deep learning practitioner would already be at the coffee machine, with a couple of models running predictions ready to be tested.

It must be said, however, that no matter which technique we use, looking further into the future will always introduce some degree of error. No prediction is perfect, particularly long-term ones; the entropy of the universe makes sure of that, and this is not magic.

So now, without further ado, let’s introduce our contestants.

On the left, the Statistical side of the ring:

  • AutoARIMA — Automatic AutoRegressive Integrated Moving Average
  • SARIMAX — Seasonal AutoRegressive Integrated Moving Average with eXogenous variables
  • Random Forest Regressor — An ensemble learning method that builds many decision trees and averages their predictions.
  • TBATS — Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components.
  • Prophet — Facebook’s automatic forecasting procedure, a Bayesian curve-fitting method (a minimal usage sketch for this side of the ring follows the list).
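To give a flavour of the APIs involved, here is a minimal, hypothetical sketch of calling two of these contestants on the data series from above, assuming the pmdarima and fbprophet packages; the configurations actually used in the comparison are left for Part Two:

import pandas as pd
import pmdarima as pm
from fbprophet import Prophet

# AutoARIMA searches the (p, d, q) orders automatically
arima_model = pm.auto_arima(data, seasonal=False, suppress_warnings=True)
arima_forecast = arima_model.predict(n_periods=30)

# Prophet expects a DataFrame with 'ds' (dates) and 'y' (values) columns
prophet_model = Prophet().fit(pd.DataFrame({'ds': data.index, 'y': data.values}))
future = prophet_model.make_future_dataframe(periods=30)
prophet_forecast = prophet_model.predict(future)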

On the right, the Deep Learning side of the ring:

  • LSTM — Long Short-Term Memory. Recurrent neural network cells stacked together.
  • Seq2Seq — Sequence-to-Sequence. An encoder-decoder model.

As you have started to see, the left side of the ring requires a lot of analysis and preparation, while the right side is already here waving at the crowd…

from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dropout, Dense

# Four stacked LSTM layers followed by a single-unit output layer.
# CuDNNLSTM requires a GPU; the plain LSTM layer is a drop-in replacement on CPU.
model = Sequential()
model.add(CuDNNLSTM(units=128, return_sequences=True, input_shape=(features_set.shape[1], 1)))
model.add(Dropout(0.1))
model.add(CuDNNLSTM(units=128, return_sequences=True))
model.add(Dropout(0.1))
model.add(CuDNNLSTM(units=128, return_sequences=True))
model.add(CuDNNLSTM(units=128))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(features_set, labels, epochs=200, batch_size=32)
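The snippet above assumes that features_set and labels already exist; a minimal sketch of how those sliding windows could be built from a min-max scaled version of the series, with a hypothetical 60-day look-back:

import numpy as np

window = 60  # hypothetical look-back length, in days
scaled = (data.values - data.values.min()) / (data.values.max() - data.values.min())

features, labels = [], []
for i in range(window, len(scaled)):
    features.append(scaled[i - window:i])  # the previous `window` days...
    labels.append(scaled[i])               # ...to predict the next one

features_set = np.array(features).reshape(-1, window, 1)  # (samples, timesteps, features)
labels = np.array(labels)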

We’ll find out more, and who will win, in the next article: The Battle for Time Series Forecasting — Part Two.

If you liked this article, click on the 👏 button and follow me here on Medium. If you have any questions, requests, complaints (or compliments), write them in the comments below and, if you want, check out my other articles/tutorials (in Italian) about Machine Learning and Artificial Intelligence!