Original article was published by John Foxworthy on Deep Learning on Medium

If you want to know where you are going, then you need to know where you came from. Initial conditions matter, but a dataset whose patterns rhyme (repeat over time) makes future values easier to predict than one driven by a secular trend. This article works through a contemporary model for that task called **LSTM**.

In the 1990s, two German computer scientists, **Sepp Hochreiter** and **Jürgen Schmidhuber**, developed a **Recurrent Neural Network** architecture for Deep Learning called **Long Short-Term Memory**, or LSTM.

To elaborate, **Machine Learning** can be defined as software that makes decisions without being explicitly programmed to, and a version built from many stacked layers is called **Deep Learning**. Deep Learning models are often described as Neural Networks because they are loosely modeled on the human nervous system, which controls what the body does. Skipping forward, **François Chollet**, a French engineer at Google, created **Keras** in early 2015; it can run on top of **TensorFlow**, a symbolic math library used for many research and production tasks at **Google**. Today, LSTM is used for many **Sequence Modeling** tasks.
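To make the model a little more concrete, here is a minimal NumPy sketch of one LSTM cell stepping through a sequence, such as 12 monthly values. It assumes the widely used variant with a forget gate (added to LSTM after the original design) and packs the four gates into one weight matrix `W` in the order forget, input, candidate, output. That layout and the names `lstm_step` and `lstm_forward` are hypothetical choices for illustration; they are not the internal layout of Keras.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, squashing values into (0, 1) for use as gates."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, W, b):
    """Advance one LSTM cell by a single time step.

    x: input vector of size D; h, c: hidden and cell state of size H.
    W: weights of shape (4*H, H + D); b: biases of shape (4*H,).
    Gate order in W and b (a hypothetical layout): forget, input, candidate, output.
    """
    H = h.shape[0]
    z = W @ np.concatenate([h, x]) + b
    f = sigmoid(z[0:H])            # forget gate: how much old memory to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new information to write
    g = np.tanh(z[2 * H:3 * H])    # candidate values to write into memory
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much memory to expose
    c_new = f * c + i * g          # updated long-term cell state
    h_new = o * np.tanh(c_new)     # updated short-term hidden state
    return h_new, c_new


def lstm_forward(xs, W, b, hidden_size):
    """Run the cell over a sequence from a zero state; return the final hidden state."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in xs:
        h, c = lstm_step(np.atleast_1d(np.asarray(x, dtype=float)), h, c, W, b)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden, features = 4, 1
    W = 0.1 * rng.normal(size=(4 * hidden, hidden + features))  # small random weights
    b = np.zeros(4 * hidden)
    window = rng.normal(size=12)    # e.g. 12 lagged monthly values, one feature each
    print(lstm_forward(window, W, b, hidden))
```

The cell state `c` is the "long-term memory" the forget gate can preserve across many steps, while `h` is the short-term output passed to the next step; in practice a library such as Keras learns `W` and `b` by training.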

Our dataset is a single series of monthly sales that has the mathematical property of **Stationarity**, which suits **one version of LSTM** for our prediction. A series is **stationary** when its average, and the variation around that average, stay more or less constant over time. The visual impression from the plot of sales can be confirmed technically with a **Unit Root** test for a **Stochastic Trend**.
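As a sketch of what a unit root test computes, the NumPy code below builds the regression behind the Augmented Dickey-Fuller (ADF) test, a common unit root test; the article does not name its exact test, so ADF is an assumption here. It regresses the change in the series on its previous level, a constant, and `lags` lagged changes, then returns the t-statistic on the level term. A statistic more negative than the critical value (about -2.86 at the 5% level for a regression with a constant) rejects the unit root. An exact p-value needs MacKinnon's tables, which statsmodels provides through `statsmodels.tsa.stattools.adfuller`; the name `adf_statistic` is hypothetical.

```python
import numpy as np


def adf_statistic(series, lags=1):
    """t-statistic of the Augmented Dickey-Fuller regression with a constant:

        dy[t] = alpha + gamma * y[t-1] + sum_i beta_i * dy[t-i] + error

    The null hypothesis is gamma == 0: a unit root, i.e. non-stationary.
    """
    y = np.asarray(series, dtype=float)
    dy = np.diff(y)                      # dy[k] = y[k+1] - y[k]
    target = dy[lags:]                   # the changes we try to explain
    n = len(target)
    columns = [np.ones(n), y[lags:-1]]   # constant and lagged level y[t-1]
    for i in range(1, lags + 1):
        columns.append(dy[lags - i:-i])  # lagged change dy[t-i]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # OLS coefficient covariance
    return float(beta[1] / np.sqrt(cov[1, 1]))    # t-statistic on gamma


CRITICAL_5PCT = -2.86  # approximate asymptotic 5% critical value, constant only


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    noise = rng.normal(size=300)        # stationary by construction
    walk = np.cumsum(noise)             # random walk: has a unit root
    for name, s in [("white noise", noise), ("random walk", walk)]:
        stat = adf_statistic(s)
        verdict = "stationary" if stat < CRITICAL_5PCT else "unit root not rejected"
        print(name, round(stat, 2), verdict)
```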

The very small **p-value** lets us reject the **Null Hypothesis** of the unit root test, namely that the series has a unit root and so is not stationary . . . leaving us to accept the **Alternative Hypothesis** that the series is stationary.

Separately, what if our dataset is **Non-Stationary**? Here is an example from a separate dataset to illustrate the reasoning . . .

This dataset, plotted as a blue line, is the median residential housing price in my current hometown . . . and it fails the same technical test. For a series like this, there is a strong case for using **Feature Engineering** to build a separate, factor-based model to predict future prices.

The test statistics for the blue-colored housing series are all negative, which leans towards stationarity, and the start of the series in the late 1990s shows some stability . . . However, the p-value is far too large to reject the Null Hypothesis of a unit root, so the series is definitely non-stationary. The more recent half of the series, running up to 2020, destroyed the stationarity of this dataset.
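The point about the recent half can be checked crudely by comparing the mean and standard deviation of the first and second halves of a series, since a stationary series should keep both roughly constant. This hypothetical helper, `compare_halves`, is only for eyeballing; it is not a formal test like the unit root test.

```python
import numpy as np


def compare_halves(series):
    """Return (mean, std) for the first and second halves of a series.

    Large differences between the halves hint that the average, or the
    variation around it, drifts over time, i.e. non-stationarity.
    """
    s = np.asarray(series, dtype=float)
    mid = len(s) // 2
    first, second = s[:mid], s[mid:]
    return (first.mean(), first.std()), (second.mean(), second.std())


if __name__ == "__main__":
    seasonal = np.tile([10.0, 12.0, 15.0, 11.0], 30)   # repeating pattern
    trending = np.arange(120, dtype=float)              # steady upward trend
    print("seasonal:", compare_halves(seasonal))
    print("trending:", compare_halves(trending))
```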

Let’s begin LSTM on the first dataset, monthly sales, by differencing. Since no Feature Engineering has been done on this series, running LSTM requires a lot of **Data Preprocessing**. We can think of this preprocessing as **Parameterization** of the model's inputs, and we begin with **Differencing**: replacing each value with its change from an earlier period. We cannot feed the data to the model as it is . . . we have to fit it. Fitting does not change, modify, or transform the values of the original dataset itself; differencing builds a new series from them. How we prepare the data depends on one attribute of the dataset, **Linearity**. Visually, in the green line plot of monthly sales, each one-unit step from left to right along the date axis moves sales, vertically, in roughly straight-line segments . . . so a **Linear Regression** should help us.
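Differencing replaces each value with its change from `lag` periods earlier, and it can be undone exactly as long as we keep the first `lag` original values. A minimal pure-Python sketch follows; the names `difference` and `undifference` are hypothetical, and the sales figures are made up. `lag=1` gives month-over-month changes, while `lag=12` would compare each month with the same month a year earlier.

```python
def difference(values, lag=1):
    """Return [values[t] - values[t - lag] for every t >= lag]."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


def undifference(history, diffs, lag=1):
    """Rebuild levels from the first `lag` original values and the differences."""
    levels = list(history[:lag])
    for d in diffs:
        levels.append(levels[-lag] + d)   # add the change to the value `lag` steps back
    return levels


if __name__ == "__main__":
    sales = [100, 120, 130, 125, 140, 160]   # hypothetical monthly sales
    diffs = difference(sales)
    print(diffs)                             # [20, 10, -5, 15, 20]
    print(undifference(sales, diffs))        # [100, 120, 130, 125, 140, 160]
```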

After differencing, we lay the data out in 12 monthly lags and check for **Serial Correlation**, i.e. a relationship between past values and present values, using the **Adjusted R-Squared** of a linear regression of each month on its 12 lags. The purpose is to examine the original single dataset in small parts to build a case for LSTM.
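One way to lay out the 12 lags and gauge how much past values explain the present is to build a lag matrix and fit an ordinary least squares regression with NumPy. This is a sketch under assumptions: the helper names are hypothetical, and adjusted R-squared is computed with the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1) for n rows and p lag columns.

```python
import numpy as np


def make_lag_matrix(series, n_lags=12):
    """Column i-1 holds the value i steps back; y holds the current value."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack([s[n_lags - i:len(s) - i] for i in range(1, n_lags + 1)])
    y = s[n_lags:]
    return X, y


def adjusted_r2(X, y):
    """Adjusted R-squared of an OLS regression of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    n, p = X.shape
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    months = np.arange(120)
    seasonal = np.sin(2 * np.pi * months / 12)          # strong dependence on the past
    noise = np.random.default_rng(0).normal(size=120)   # no dependence on the past
    print("seasonal:", round(adjusted_r2(*make_lag_matrix(seasonal)), 3))
    print("noise:   ", round(adjusted_r2(*make_lag_matrix(noise)), 3))
```

A value near 1 means the lags explain the current month almost completely; a value near 0 means the past carries little information about the present.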

Arranging the data into these 12 lags and running the linear regression, on top of the differencing, leaves a more stationary dataset to work with. Fitting the data to the model to forecast future values raises the accuracy score to 0.837, improving the project in the third attempt above.
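When fitting a model to forecast future values, the held-out months should come after the training months rather than being shuffled in, so the model is never scored on a period it has effectively already seen. A minimal sketch, assuming the rows are already in date order; the helper name `time_split` and the default of holding out the last 6 months are hypothetical choices.

```python
def time_split(rows, test_size=6):
    """Split date-ordered rows into (train, test), holding out the last `test_size` rows."""
    if not 0 < test_size < len(rows):
        raise ValueError("test_size must be between 1 and len(rows) - 1")
    return rows[:-test_size], rows[-test_size:]


if __name__ == "__main__":
    months = [f"2019-{m:02d}" for m in range(1, 13)]
    train, test = time_split(months)
    print(train[-1], "->", test[0])   # 2019-06 -> 2019-07
```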

Lastly, our unit root test now shows the strongest stationarity of all the datasets, with a very small p-value and similarly negative test statistics. Let’s scale our generated features to a fixed range to train our model.
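Scaling maps each feature into a fixed range using the minimum and maximum seen in the training data, and those numbers are kept so predictions can be mapped back later. The NumPy sketch below assumes a range of -1 to 1, a common choice for LSTM inputs because it matches the output range of tanh; the article does not state its range. It mirrors what scikit-learn's `MinMaxScaler(feature_range=(-1, 1))` does, but the helper names here are hypothetical.

```python
import numpy as np


def fit_minmax(train, low=-1.0, high=1.0):
    """Learn per-column min and max from the training data only."""
    train = np.asarray(train, dtype=float)
    return {"min": train.min(axis=0), "max": train.max(axis=0), "low": low, "high": high}


def scale(data, params):
    """Map each column linearly so the training min/max land on low/high."""
    span = params["max"] - params["min"]
    span = np.where(span == 0, 1.0, span)   # avoid dividing by zero for constant columns
    unit = (np.asarray(data, dtype=float) - params["min"]) / span
    return params["low"] + unit * (params["high"] - params["low"])


def unscale(scaled, params):
    """Invert scale(), returning values in the original units."""
    span = params["max"] - params["min"]
    span = np.where(span == 0, 1.0, span)
    unit = (np.asarray(scaled, dtype=float) - params["low"]) / (params["high"] - params["low"])
    return params["min"] + unit * span


if __name__ == "__main__":
    train = np.array([[10.0, -5.0], [20.0, 0.0], [30.0, 5.0]])   # hypothetical features
    params = fit_minmax(train)
    scaled = scale(train, params)
    print(scaled)                      # each column now spans -1 to 1
    print(unscale(scaled, params))     # back to the original values
```

Fitting the scaler on the training rows only, then applying it to the test rows, keeps information about future months from leaking into training.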

After transforming the predicted values back, undoing the scaling and the differencing on the dataframe of predictions . . . we have our forecast.
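Turning one prediction back into a sales figure reverses the preprocessing in the opposite order: undo the scaling to get a predicted change, then add that change to the last known sales value to undo the differencing. A self-contained sketch, assuming month-over-month (lag 1) differencing and a -1 to 1 scaling range; the function name `to_sales_forecast` and the numbers are made up for illustration.

```python
def to_sales_forecast(scaled_pred, diff_min, diff_max, last_sales, low=-1.0, high=1.0):
    """Convert one scaled predicted difference into a sales-level forecast.

    diff_min / diff_max: min and max of the training differences used for scaling.
    last_sales: actual sales of the month before the forecast month.
    """
    unit = (scaled_pred - low) / (high - low)                # undo the range mapping
    predicted_diff = diff_min + unit * (diff_max - diff_min)
    return last_sales + predicted_diff                       # undo lag-1 differencing


if __name__ == "__main__":
    # Hypothetical numbers: training differences ranged from -4000 to 6000.
    print(to_sales_forecast(0.2, -4000.0, 6000.0, last_sales=50000.0))
```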

Altogether, the Long Short-Term Memory model, combined with Data Preprocessing, can provide systematic sales forecasting for just about any organization.