Time Series beginners guide!

Source: Deep Learning on Medium

Go to the profile of Musab Khan

In this blog, we will learn about, What is Time Series analysis, Stages in time series forecasting, Different kinds of forecasting methods, Different types of plots, Simple Moving Averages, cumulative moving Averages, Exponential Moving Averages.

Time series analysis is the collection of data at specific intervals over a period of time, with the purpose of identifying trends, cycles, and seasonal variances to aid in the forecasting of a future event. Data is any observed outcome that’s measurable. Unlike in statistical sampling, in time series analysis, data must be measured over time at consistent intervals to identify patterns that form trends, cycles, and seasonal variances. Measurements at random intervals lose the ability to predict future events.

There are two main goals of time series analysis: (a) identifying the nature of the phenomenon represented by the sequence of observations, and (b) forecasting (predicting future values of the time series variable). Both of these goals require that the pattern of observed time series data is identified and more or less formally described. Once the pattern is established, we can interpret and integrate it with other data (i.e., use it in our theory of the investigated phenomenon, e.g., seasonal commodity prices). Regardless of the depth of our understanding and the validity of our interpretation (theory) of the phenomenon, we can extrapolate the identified pattern to predict future events.

Problem Statement

There is a company X which has been keeping a record of monthly sales of Clothes for the past 3 years. Company X wants to forecast the sale of the Clothes for the next 4 months so that the demand and supply gap can be managed by the organization. Our main job here is to simply predict the sales of Clothes for the next 4 months.

Data set comprises of only two columns. One is the Date of the month and the other is the sale of the Clothes in that month.

Stages in Time Series Forecasting

Solving a time series problem is a little different as compared to a regular modeling task. A simple/basic journey of solving a time series problem can be demonstrated through the following processes. We will understand about tasks which one needs to perform in every stage. We will also look at the python implementation of each stage of our problem-solving journey.

Steps are –

Visualizing Time Series

In this step, we try to visualize the series. We try to identify all the underlying patterns related to the series like trend and seasonality. Do not worry about these terms right now, as we will discuss them during implementation. You can say that this is more a type of an exploratory analysis of time series data.

Stationarising Time Series

A stationary time series is one whose statistical properties such as mean, variance, auto-correlation, etc. are all constant over time. Most statistical forecasting methods are based on the assumption that the time series can be rendered approximately stationary (i.e., “stationarised”) through the use of mathematical transformations. A stationarised series is relatively easy to predict: you simply predict that its statistical properties will be the same in the future as they have been in the past! Another reason for trying to stationarise a time series is to be able to obtain meaningful sample statistics such as means, variances, and correlations with other variables. Such statistics are useful as descriptors of future behaviour only if the series is stationary. For example, if the series is consistently increasing over time, the sample mean and variance will grow with the size of the sample, and they will always underestimate the mean and variance in future periods. And if the mean and variance of a series are not well-defined, then neither are its correlations with other variables

Finding the best parameters for our model

We need to find optimal parameters for forecasting models one’s we have a stationary series. These parameters come from the ACF and PACF plots. Hence, this stage is more about plotting above two graphs and extracting optimal model parameters based on them. Do not worry, we will cover on how to determine these parameters during the implementation part below!

Fitting model

Once we have our optimal model parameters, we can fit an ARIMA model to learn the pattern of the series. Always remember that time series algorithms work on stationary data only hence making a series stationary is an important aspect

Moving Averages

We already talked about how there is randomness or noise in our data, and how to separate it with decomposition, but sometimes we simply want to reduce the noise in our data and smooth out the fluctuations from the noise in order to better forecast future data points.

A moving average model leverages the average of the data points that exist in a specific overlapping subsection of the series. An average is taken from the first subset of the data, and then it is moved forward to the next data point while dropping out the initial data point. A moving average can give you information about the current trends, and reduce the amount of noise in your data. Often, it is a preprocessing step for forecasting.

While there are many different moving average models, we’ll cover the simple moving average (SMA), autoregressive integrated moving average (ARIMA), and exponential smoothing.

Simple Moving Average:

The way an SMA is calculated is that it takes the subset of the data mentioned in the moving average model description, adds together the data points, and then takes the average over the subset of data. It can help identify the direction of trends in your data and identify levels of resistance wherein business or trading data, there is a price ceiling that can’t be broken through. For instance, if you’re trying to identify the point where you can’t charge past a certain amount for a product, or in investing why a stock can’t move past a certain price point, you can identify that ceiling with a moving average.

Here’s a snippet of our loan rate data (this data is grouped by day):

Date Daily Average Rates

1 2011–06–01 4.896667

2 2011–06–02 4.857500

3 2011–06–03 4.947500

4 2011–06–06 4.908571

5 2011–06–07 4.946250

6 2011–06–08 4.861250

If we wanted the subset of the data to be, say, n = 6, then the first SMA of that subset would be 4.902956 for our mortgage rate for jumbo loans. It’s up to you to choose the period of time (n data points) for the average rates that you want to analyze. Here is the SMA graphed of the mortgage rate data, where you can see how the noise is smoothed out to better detect the trend:

Cumulative Moving Average:

In a cumulative moving average, the mean is calculated from a data stream up to the current data point, which gets updated with each new data point that comes in from the stream. This is often used to smooth data when analyzing stock market data, network data, and even social media data that has some numeric value such as sentiment scores. With smoothing, it becomes easier to detect trends. The key difference between the cumulative moving average and a simple moving average is that because the cumulative moving average is applied to a data stream versus static data, the average is updated as each new data point comes in through the stream.

Our Zillow mortgage rate data isn’t streaming data, but here is an example of a cumulative average plotted:

In the above graph you can see the actual data points, along with the blue line that represents the cumulative moving average and recall that the average gets updated with each new data point.

Exponential Moving Average:

Unlike the simple moving average that moves forward one data point and drops the oldest data point to take the average from, an exponential moving average (EMA) uses all data points before the current data point. Weights are associated with each data point and those further away from the current point are given less weight, in an exponentially decreasing fashion, than the data points closest to the current point. When the most current data has the most influence, you can better determine trends that matter most in real time which is why EMA is used in evaluating stock prices. Exponential moving average is also used in trading to identify support and resistance levels of stock prices as mentioned earlier in our discussion of Moving Averages. But, this model is not only used in trading.

Below is a graph of our mortgage series data that shows the daily rates for mortgage rates between 2011 and 2017 and includes the EMA for 50 and 100 days which can be used to detect long-term trends. If you wanted to detect short-term trends you can use shorter periods.


Forecasting is one of the most relevant tasks when working with time series data, but it’s hard to know where to get started. Although you can forecast with SMA or EMA, another moving average model called Autoregressive Integrated Moving Average is popular for fairly accurate and quick forecasting of time series.

Autoregressive Integrated Moving Average:

The Autoregressive Integrated Moving Average, or ARIMA model, is a univariate linear function that is used for predicting future data points based on past data. ARIMA combines the models own past data points to determine future points versus a linear regression model that would rely on an independent variable to predict the dependent variable such as using treasury note rates to predict mortgage rates. Because of ARIMA’s reliance on it’s own past data, a longer series is preferable to get more accurate results.

ARIMA relies on a stationary model, meaning that the mean, variance and other measures of central tendency must remain the same over the course of the series. To achieve a stationary model, you would need to remove trend, seasonality, and business cycle if that exists. The differencing method is used to fit the model by subtracting the current value from the previous data point, including the lag. Note that the lag is simply the difference between values from the same period.

For instance, if you have data that has monthly seasonality, you would take each data point, subtract it from the previous data point in the same month from the previous year, and so on in order to difference your data and create a stationary model.

In an ARIMA model you’ll find that you can adjust various parameters which are:

p (AR) which is the number of AutoRegressions,

d (I) which is the number of differences, and

q (MA) which is the forecast error coefficient, which together, make up (AR, I, MA).

Here is an example of an ARMA forecasted model using our mortgage data:

Many statistical packages allow you to either set the ARIMA parameters manually or with an automated function that best fits your model. If you choose to select your parameters manually, you can estimate the appropriate ones with Autocorrelation and partial Autocorrelation function (ACF, PCF). If you aren’t familiar with these functions check out this online resource PSU that has more information on ACF and PCF.


After fitting our model, we will be predicting the future in this stage. We will find out the sales of the shampoo for the next 4 months.

Time series in general, including those outsides of the financial world, often contain the following features:

  • Trends — A trend is a consistent directional movement in a time series. These trends will either be deterministic or stochastic. The former allows us to provide an underlying rationale for the trend, while the latter is a random feature of a series that we will be unlikely to explain. Trends often appear in financial series, particularly commodities prices, and many Commodity Trading Advisor (CTA) funds use sophisticated trend identification models in their trading algorithms.
  • Seasonal Variation — Many time series contain seasonal variation. This is particularly true in series representing business sales or climate levels. In quantitative finance we often see seasonal variation in commodities, particularly those related to growing seasons or annual temperature variation (such as natural gas).
  • Serial Dependence — One of the most important characteristics of time series, particularly financial series, is that of serial correlation. This occurs when time series observations that are close together in time tend to be correlated. Volatility clustering is one aspect of serial correlation that is particularly important in quantitative trading.

Since we are now familiar with a basic flow of solving a time series problem, let us get to the implementation. The language used is python in this case.

%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
from dateutil.relativedelta import relativedelta
import seaborn as sns
import statsmodels.api as sm
from statsmodels.tsa.stattools import acf
from statsmodels.tsa.stattools import pacf
from statsmodels.tsa.seasonal import seasonal_decompose

In [2]:

df = pd.read_csv('portland-oregon-average-monthly-.csv', index_col=0)
df.drop(df.index[114], inplace=True)

In [3]:

start = datetime.datetime.strptime("1973-01-01", "%Y-%m-%d")
date_list = [start + relativedelta(months=x) for x in range(0,114)]
df['index'] =date_list
df.set_index(['index'], inplace=True)

In [4]:

df.columns= ['riders']
df['riders'] = df.riders.apply(lambda x: int(x)*100)

In [5]:

df.riders.plot(figsize=(12,8), title= 'Monthly Ridership', fontsize=14)
plt.savefig('month_ridership.png', bbox_inches='tight')

In [6]:

decomposition = seasonal_decompose(df.riders, freq=12) 
fig = plt.figure()
fig = decomposition.plot()
fig.set_size_inches(15, 8)

In [7]:

from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):

#Determing rolling statistics
rolmean = pd.rolling_mean(timeseries, window=12)
rolstd = pd.rolling_std(timeseries, window=12)

#Plot rolling statistics:
fig = plt.figure(figsize=(12, 8))
orig = plt.plot(timeseries, color='blue',label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label = 'Rolling Std')
plt.title('Rolling Mean & Standard Deviation')

#Perform Dickey-Fuller test:
print 'Results of Dickey-Fuller Test:'
dftest = adfuller(timeseries, autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
dfoutput['Critical Value (%s)'%key] = value
print dfoutput

In [8]:

Results of Dickey-Fuller Test:
Test Statistic -1.536597
p-value 0.515336
#Lags Used 12.000000
Number of Observations Used 101.000000
Critical Value (5%) -2.890611
Critical Value (1%) -3.496818
Critical Value (10%) -2.582277
dtype: float64

In [9]:

df.riders_log= df.riders.apply(lambda x: np.log(x)) 

Results of Dickey-Fuller Test:
Test Statistic -1.677830
p-value 0.442570
#Lags Used 12.000000
Number of Observations Used 101.000000
Critical Value (5%) -2.890611
Critical Value (1%) -3.496818
Critical Value (10%) -2.582277
dtype: float64

In [10]:

df['first_difference'] = df.riders - df.riders.shift(1) 
Results of Dickey-Fuller Test:
Test Statistic -1.938696
p-value 0.314082
#Lags Used 11.000000
Number of Observations Used 101.000000
Critical Value (5%) -2.890611
Critical Value (1%) -3.496818
Critical Value (10%) -2.582277
dtype: float64

In [11]:

df['log_first_difference'] = df.riders_log - df.riders_log.shift(1) 
Results of Dickey-Fuller Test:
Test Statistic -2.047539
p-value 0.266126
#Lags Used 11.000000
Number of Observations Used 101.000000
Critical Value (5%) -2.890611
Critical Value (1%) -3.496818
Critical Value (10%) -2.582277
dtype: float64

In [12]:

df['seasonal_difference'] = df.riders - df.riders.shift(12) 
Results of Dickey-Fuller Test:
Test Statistic -2.469741
p-value 0.123011
#Lags Used 3.000000
Number of Observations Used 98.000000
Critical Value (5%) -2.891516
Critical Value (1%) -3.498910
Critical Value (10%) -2.582760
dtype: float64

In [13]:

df['log_seasonal_difference'] = df.riders_log - df.riders_log.shift(12) 
Results of Dickey-Fuller Test:
Test Statistic -1.919681
p-value 0.322860
#Lags Used 0.000000
Number of Observations Used 101.000000
Critical Value (5%) -2.890611
Critical Value (1%) -3.496818
Critical Value (10%) -2.582277
dtype: float64

In [14]:

df['seasonal_first_difference'] = df.first_difference - df.first_difference.shift(12) 
Results of Dickey-Fuller Test:
Test Statistic -9.258520e+00
p-value 1.427874e-15
#Lags Used 0.000000e+00
Number of Observations Used 1.000000e+02
Critical Value (5%) -2.890906e+00
Critical Value (1%) -3.497501e+00
Critical Value (10%) -2.582435e+00
dtype: float64

In [15]:

df['log_seasonal_first_difference'] = df.log_first_difference - df.log_first_difference.shift(12) 
Results of Dickey-Fuller Test:
Test Statistic -8.882112e+00
p-value 1.309452e-14
#Lags Used 0.000000e+00
Number of Observations Used 1.000000e+02
Critical Value (5%) -2.890906e+00
Critical Value (1%) -3.497501e+00
Critical Value (10%) -2.582435e+00
dtype: float64

In [16]:

fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(df.seasonal_first_difference.iloc[13:], lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(df.seasonal_first_difference.iloc[13:], lags=40, ax=ax2)

/Users/seanwilson/anaconda/lib/python2.7/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
if self._edgecolors == str('face'):

In [17]:

mod = sm.tsa.statespace.SARIMAX(df.riders, trend='n', order=(0,1,0), seasonal_order=(0,1,1,12))
results = mod.fit()
print results.summary()
Statespace Model Results 
Dep. Variable: riders No. Observations: 114
Model: SARIMAX(0, 1, 0)x(0, 1, 1, 12) Log Likelihood -976.135
Date: Wed, 23 Mar 2016 AIC 1956.271
Time: 13:16:18 BIC 1961.743
Sample: 01-01-1973 HQIC 1958.492
- 06-01-1982
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ma.S.L12 -0.1377 0.050 -2.757 0.006 -0.236 -0.040
sigma2 1.424e+07 2.62e-10 5.44e+16 0.000 1.42e+07 1.42e+07
Ljung-Box (Q): 44.28 Jarque-Bera (JB): 4.18
Prob(Q): 0.30 Prob(JB): 0.12
Heteroskedasticity (H): 1.44 Skew: 0.20
Prob(H) (two-sided): 0.29 Kurtosis: 3.91

[1] Covariance matrix calculated using the outer product of gradients.

In [27]:

mod = sm.tsa.statespace.SARIMAX(df.riders, trend='n', order=(0,1,0), seasonal_order=(1,1,1,12))
results = mod.fit()
print results.summary()
Statespace Model Results 
Dep. Variable: riders No. Observations: 126
Model: SARIMAX(0, 1, 0)x(1, 1, 1, 12) Log Likelihood -970.257
Date: Wed, 23 Mar 2016 AIC 1946.514
Time: 13:20:09 BIC 1955.023
Sample: 01-01-1973 HQIC 1949.971
- 06-01-1983
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ar.S.L12 0.5591 0.004 142.852 0.000 0.551 0.567
ma.S.L12 -0.9986 0.009 -116.113 0.000 -1.015 -0.982
sigma2 1.143e+07 7.45e-10 1.53e+16 0.000 1.14e+07 1.14e+07
Ljung-Box (Q): nan Jarque-Bera (JB): 8.68
Prob(Q): nan Prob(JB): 0.01
Heteroskedasticity (H): 0.73 Skew: 0.31
Prob(H) (two-sided): 0.40 Kurtosis: 4.29

[1] Covariance matrix calculated using the outer product of gradients.
[2] Covariance matrix is singular or near-singular, with condition number 8.33e+30. Standard errors may be unstable.
/Users/seanwilson/anaconda/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-macosx-10.5-x86_64.egg/statsmodels/tsa/statespace/sarimax.py:989: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 113 but corresponding boolean dimension is 101
trend_data = trend_data[~np.isnan(endog)]

In [19]:

df['forecast'] = results.predict(start = 102, end= 114, dynamic= True) 
df[['riders', 'forecast']].plot(figsize=(12, 8))
plt.savefig('ts_df_predict.png', bbox_inches='tight')
/Users/seanwilson/anaconda/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-macosx-10.5-x86_64.egg/statsmodels/base/data.py:551: FutureWarning: TimeSeries is deprecated. Please use Series
return TimeSeries(squeezed, index=self.predict_dates)

In [20]:

npredict =df.riders['1982'].shape[0]
fig, ax = plt.subplots(figsize=(12,6))
npre = 12
ax.set(title='Ridership', xlabel='Date', ylabel='Riders')
ax.plot(df.index[-npredict-npre+1:], df.ix[-npredict-npre+1:, 'riders'], 'o', label='Observed')
ax.plot(df.index[-npredict-npre+1:], df.ix[-npredict-npre+1:, 'forecast'], 'g', label='Dynamic forecast')
legend = ax.legend(loc='lower right')
plt.savefig('ts_predict_compare.png', bbox_inches='tight')

In [21]:

start = datetime.datetime.strptime("1982-07-01", "%Y-%m-%d")
date_list = [start + relativedelta(months=x) for x in range(0,12)]
future = pd.DataFrame(index=date_list, columns= df.columns)
df = pd.concat([df, future])

In [24]:

df['forecast'] = results.predict(start = 114, end = 125, dynamic= True) 
df[['riders', 'forecast']].ix[-24:].plot(figsize=(12, 8))
plt.savefig('ts_predict_future.png', bbox_inches='tight')

In [25]:

## A little extra I was doing to try and include an exogenous variable for number of weekdays in each month.
## Thinking that people take public transportation more during the week, a count of weekdays in each month could help explain some of the variance.

start = datetime.datetime.strptime("1973-01-01", "%Y-%m-%d")
moving = start
d = {}
year =0
month =0
while moving < datetime.datetime(1982,7,1):
# print moving
if moving.year == year:
if moving.month == month:
if moving.weekday() < 5:
d[str(moving.year)+"-"+ str(moving.month)] += 1
d[str(moving.year)+"-"+ str(moving.month)]=0
if moving.weekday() < 5:
d[str(moving.year)+"-"+ str(moving.month)] += 1
# d[moving.year] = {}
d[str(moving.year)+"-"+ str(moving.month)]=0
if moving.weekday() < 5:
d[str(moving.year)+"-"+ str(moving.month)] += 1

year = moving.year
month = moving.month
moving += datetime.timedelta(days=1)
df_dow = pd.DataFrame(d.items(), columns=['Month', 'DateValue'])
df_dow.Month = pd.to_datetime(df_dow.Month)
df_dow.sort('Month', inplace=True)

def holiday_adj(x):
if x['Month'].month==1:
x['DateValue'] -=1
return x['DateValue']
elif x['Month'].month==2:
x['DateValue'] -=1
return x['DateValue']
elif x['Month'].month==5:
x['DateValue'] -=1
return x['DateValue']
elif x['Month'].month==7:
x['DateValue'] -=1
return x['DateValue']
elif x['Month'].month==9:
x['DateValue'] -=1
return x['DateValue']
elif x['Month'].month==10:
x['DateValue'] -=1
return x['DateValue']
elif x['Month'].month==11:
x['DateValue'] -=3
return x['DateValue']
elif x['Month'].month==12:
x['DateValue'] -=2
return x['DateValue']

return x['DateValue']

df_dow['days'] = df_dow.apply(holiday_adj, axis=1)
df_dow.set_index('Month', inplace=True)
df_dow.index.name = None


Finally, we were able to build an ARIMA model and actually forecast for a future time period. Keep note that this is a basic implementation to get one started with time series forecasting. There are a lot of concepts like smoothening etc and models like ARIMAX, prophet etc to build your time series models.