Time Series Forecasting Using Empirical Mode Decomposition and (Dilated) Convolutional Networks (3)



So, as mentioned in previous posts, Empirical Mode Decomposition (EMD) is a form of additive spectral analysis that decomposes a signal or time series into frequency-modulated functions called Intrinsic Mode Functions (IMFs). The important thing, for us, is that these IMFs tell you a lot about how a signal behaves at different frequency resolutions, which makes them useful as features that deep learning methods can take and turn into predictions.

While this method has been augmented of late to include multiplicative decomposition, I’m sticking with the additive version in these simple experiments.
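For reference, "additive" here just means that the IMFs and the final residue sum back to the original series:

$$x(t) = \sum_{k=1}^{K} \mathrm{IMF}_k(t) + r(t),$$

where $K$ is the number of IMFs and $r(t)$ is the residue, i.e. the trend.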

IRX, the ticker for the 13-week Treasury Bill. These are opening prices.

When you do the standard EMD process on this series, you get the Intrinsic Mode Functions, or IMFs. Using a standard Python package with some standard parameter settings (which determine when the sifting process stops), you get 11 IMFs and a final residue, or the "trend" as it's more commonly called.
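The post doesn't name the package, so here is a minimal sketch of this step assuming the PyEMD package (pip install EMD-signal); the file name and default settings below are placeholders, not the ones actually used:

```python
# Minimal sketch of the EMD step, assuming PyEMD; not the author's exact code.
import numpy as np
from PyEMD import EMD

prices = np.loadtxt("irx_open.txt")  # hypothetical file: one opening price per line

emd = EMD()                          # default sifting / stopping parameters
emd(prices)                          # run the sifting process on the series
imfs, residue = emd.get_imfs_and_residue()

print(imfs.shape)                    # e.g. (11, len(prices)) if 11 IMFs are extracted
print(residue.shape)                 # the final residue, i.e. the trend
```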

IMFs 1–3 of the IRX time series. These are the highest frequency IMFs.

The three lowest-frequency IMFs (the last one actually being the residue) show the oscillations that happen on longer time scales. It is useful to see these longer cycles, which are often physically meaningful, as they reflect long-term processes in the natural world that affect the values of the time series.

The last 2 IMFs and the residue, showing the long-term trend.


Now here is the important stuff: the network I used. This is built with Keras running on a TensorFlow backend. In other experiments, which you will see in other posts, I use variations of this architecture; the changes mostly consist of adding or subtracting a layer or two and adjusting the filter count, kernel size, or dilation rate, usually within a fairly small range. The number of filters is usually 32, 64, 128, or 256; the kernel size is either 2 or 3; and the dilation rate starts from 1 or 2 and increases in powers of 2 until the (dilated) kernel covers most, if not all, of the sequence.

Keras/TensorFlow code for the network. The input layer and 4 conv layers are concatenated into a combined feature map.
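Since the code itself appears only as an image, here is a rough reconstruction of the kind of architecture described: parallel dilated Conv1D branches whose outputs are concatenated with the input. The filter count, kernel size, dilation rates, number of branches, and dense head are assumptions drawn from the ranges mentioned above, not the author's exact code.

```python
# Sketch of the architecture described in the post; hyperparameters are assumed.
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 10       # last 10 raw values as input
N_COMPONENTS = 11  # dimension of the decomposed target vector (assumed)

inputs = keras.Input(shape=(SEQ_LEN, 1))

# Dilated Conv1D branches covering progressively longer spans of the window.
conv_outputs = []
for dilation in (1, 2, 4):
    x = layers.Conv1D(filters=64, kernel_size=2,
                      dilation_rate=dilation,
                      padding="causal", activation="relu")(inputs)
    conv_outputs.append(x)

# Concatenate the input with the convolutional feature maps along the
# channel axis to form a combined feature map.
combined = layers.Concatenate(axis=-1)([inputs] + conv_outputs)

x = layers.Flatten()(combined)
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(N_COMPONENTS)(x)  # predicted IMF values for the next step

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.summary()
```

With padding="causal" each branch keeps the 10-step length, so the raw input and the feature maps can be stacked along the channel axis before the dense head.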

As you can see, there are 3 Conv1D layers, each with a different dilation rate, which I believe gives you coverage over different spans, from shorter to longer. There's no point in the filter (which is padded with zeros to "dilate" it) being too large, or larger than the sequence it's convolving over.
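To make the "coverage over different spans" point concrete, the span of a single dilated filter is easy to compute; the kernel size and dilation values below are just the ones mentioned in the text.

```python
# Span (receptive field) of one Conv1D filter: (kernel_size - 1) * dilation + 1.
def filter_span(kernel_size: int, dilation_rate: int) -> int:
    return (kernel_size - 1) * dilation_rate + 1

for d in (1, 2, 4, 8):
    print(f"kernel=2, dilation={d}: spans {filter_span(2, d)} time steps")
# With a 10-step input window, dilation 8 (span 9) already covers almost the
# whole sequence, which is why the dilations stop growing at that point.
```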

I used 1000 out-of-sample points, the next 1000 in the time series, to test the forecasting performance. In essence, I trained the network to take the last 10 values of the time series and predict the next value's decomposition as an 11-tuple vector. If you were collecting data with some instrument, you wouldn't observe the IMFs; they are constructs, after all. So the idea is that if, given the past data you have, you can learn the structure behind the decomposition so far, you will have a pretty good basis for seeing how the decomposition will work in the future. Here's what I got, in terms of the R2 score, as a rough indicator of how well the predictions match the actual out-of-sample values:
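As a sketch of how the (window, decomposed target) pairs and the out-of-sample R2 score could be put together; the window length, hold-out split, and variable names are assumptions, not the post's actual code:

```python
# Hypothetical construction of training pairs and out-of-sample R2 evaluation.
import numpy as np
from sklearn.metrics import r2_score

WINDOW = 10  # last 10 raw values as input

# prices: 1-D array of IRX opening prices
# components: (n_components, len(prices)) array of IMFs (and residue) from the EMD step
def make_dataset(prices, components, window=WINDOW):
    X, y = [], []
    for t in range(window, len(prices)):
        X.append(prices[t - window:t])   # past raw values
        y.append(components[:, t])       # decomposition of the next value
    return np.array(X)[..., None], np.array(y)

# X, y = make_dataset(prices, components)
# X_train, y_train = X[:-1000], y[:-1000]   # hold out the next 1000 points
# X_test,  y_test  = X[-1000:], y[-1000:]
# model.fit(X_train, y_train, epochs=50, batch_size=32)
# y_pred = model.predict(X_test)
# print("R2:", r2_score(y_test, y_pred))    # the post reports an R2 near 0.99
```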

Predicted vs. actual values over the next ~1000 points in the time series. The R2 score is almost 99%!

99%!!! Well, assuming I didn't make any huge methodological assumptions or mistakes, this seems pretty decent. However, the trick is seeing how this performs on other time series, and what adjustments to the architecture have to be made. The idea, of course, is that when it comes to prediction, we end up with a fairly "turnkey" solution that isn't expensive and can be optimized with a few well-placed hyperparameter tweaks.
