Time Series Analysis: Basic Concepts

Original article was published on Artificial Intelligence on Medium

Patterns in Time Series

The time series in the figures we have seen so far follow a trend, which means an overall upward or downward direction. This is just one of the patterns we can observe in time series.

Another common pattern is seasonality, which refers to patterns that repeat at regular, predictable intervals. For instance, sales data of retail stores usually show seasonality: the sales amount increases and decreases at certain times of the year or around special dates. Let’s see an example:

We usually encounter time series with multiple patterns combined, such as a trend and seasonality. It is highly likely to have some noise as well:
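To make the combination concrete, here is a minimal sketch that builds a synthetic series by adding a linear trend, a sinusoidal seasonal component, and Gaussian noise. The additive form, the function name `make_series`, and all parameter values (slope, period, amplitude, noise level) are assumptions chosen for illustration, not values taken from a real dataset.

```python
import numpy as np

def make_series(n_steps=365 * 4, slope=0.05, period=365,
                amplitude=10.0, noise_std=2.0, seed=0):
    """Build a synthetic series as trend + seasonality + noise (illustrative)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    trend = slope * t                                    # overall upward direction
    seasonality = amplitude * np.sin(2 * np.pi * t / period)  # repeats every `period` steps
    noise = rng.normal(0.0, noise_std, size=n_steps)     # random fluctuations
    return t, trend, seasonality, noise, trend + seasonality + noise

t, trend, seasonality, noise, series = make_series()
```

Plotting `series` against `t` should show the seasonal wave riding on a rising line, blurred by the noise.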

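Up to now, we have seen time series that keep following the same pattern over time. However, life is full of surprises, so it is likely that some event breaks the pattern and changes how the series behaves, creating a non-stationary time series. For example, the coronavirus pandemic is such a big event that it disrupted many patterns. (Strictly speaking, a series with a trend or seasonality is already non-stationary in the statistical sense, because its mean changes over time; here the term is used loosely for a series whose pattern itself changes.)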

A time series does not always follow a pattern or include seasonality. Some processes produce purely random data. This kind of time series is called white noise, and its past values carry no information about its future values, which makes it almost impossible to predict individual values accurately.
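One way to see why white noise is hard to predict is to check how strongly the series correlates with a shifted copy of itself. The sketch below compares white noise with a purely seasonal series. The helper `autocorrelation`, the lags, and the series lengths are assumptions made for this illustration.

```python
import numpy as np

def autocorrelation(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(42)
white_noise = rng.normal(0.0, 1.0, size=5000)          # no structure at all
seasonal = np.sin(2 * np.pi * np.arange(5000) / 50)    # repeats every 50 steps

# Past values of white noise say almost nothing about later values.
noise_acf = [autocorrelation(white_noise, lag) for lag in (1, 5, 50)]
# A seasonal series is strongly correlated with itself one season later.
seasonal_acf = autocorrelation(seasonal, 50)
```

For the white noise, the values in `noise_acf` should stay close to zero, while `seasonal_acf` should be close to one.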

Applications of Time Series Analysis

One of the most common applications of time series analysis is forecasting, which means predicting future values based on the values in the past. It is very important for businesses to have accurate forecasts. For instance, a retail business can optimize its stock levels if the demand forecast is accurate. It also increases customer satisfaction because customer demand is more likely to be fulfilled. Being able to plan ahead is a valuable asset.
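As a rough illustration of forecasting from past values, the sketch below uses a seasonal naive baseline: it simply repeats the last full season of the history. This is a common reference baseline rather than a method described in this article, and the function name `seasonal_naive_forecast` and the weekly demand numbers are made up for the example.

```python
import numpy as np

def seasonal_naive_forecast(history, period, horizon):
    """Forecast by repeating the last full season of the history."""
    history = np.asarray(history, dtype=float)
    if len(history) < period:
        raise ValueError("history must contain at least one full season")
    last_season = history[-period:]
    repeats = -(-horizon // period)  # ceiling division
    return np.tile(last_season, repeats)[:horizon]

# Hypothetical daily demand with a weekly pattern (weekend peak).
weekly_pattern = np.array([10, 12, 11, 13, 20, 30, 25], dtype=float)
history = np.tile(weekly_pattern, 4)   # four weeks of past data
forecast = seasonal_naive_forecast(history, period=7, horizon=10)
```

A more capable model is usually judged by whether it beats a simple baseline like this one.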

In predictive analytics, we train a model with known data (training data) and expect it to predict new, previously unseen data (test data). This process of training and testing is known as supervised learning because we supervise the model with known (labeled) data. When working with non-sequential data, we can randomly split it into training and test subsets. However, when working with time series, we cannot split randomly, because a random split would let the model train on observations that come after the ones it is tested on. Instead, we define a point in time as the splitting point to divide the entire dataset into training and test subsets. Since time series data is sequential, the test set follows the training set.

Note: If a time series has seasonality, it is important that both the training and test sets contain a whole number of seasons. For example, in the figure above, the training set contains 3 seasons and the test set contains 1 season. If we split at a point inside a season, the model may not be able to capture the trend or seasonality well.
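A time-based split that respects season boundaries could be sketched as follows. The function name `split_by_seasons` is hypothetical, and dropping the oldest observations that do not fill a complete season is an assumption of this sketch, not a rule from the article.

```python
import numpy as np

def split_by_seasons(series, period, n_test_seasons=1):
    """Chronological split where both parts hold a whole number of seasons."""
    series = np.asarray(series)
    n_seasons = len(series) // period
    if n_seasons <= n_test_seasons:
        raise ValueError("not enough full seasons for both train and test")
    # Assumption: discard the oldest points that do not fill a complete season.
    usable = series[len(series) - n_seasons * period:]
    split_point = len(usable) - n_test_seasons * period
    # No shuffling: the test set is the most recent part of the series.
    return usable[:split_point], usable[split_point:]

series = np.arange(4 * 12 + 5)   # 4 full seasons of 12 steps plus 5 extra points
train, test = split_by_seasons(series, period=12, n_test_seasons=1)
```

Here `train` holds 3 seasons and `test` holds the final season, and every training observation precedes every test observation.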

We sometimes need to predict the past. Consider a time series from time t to (t+400). By analyzing this data, we can try to predict the series from time (t-40) to (t-1). In a sense, this means estimating the values that led up to the data we have. Sometimes data is missing at certain time points, and we may need to predict those missing points as well. This process of predicting past data or missing points is called imputation.
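For missing points inside a series, one of the simplest imputation strategies is linear interpolation between the nearest observed neighbors. The sketch below assumes missing values are marked as NaN; the function name `impute_linear` and the sample data are hypothetical. It does not cover predicting values before the start of the series, which needs an actual model.

```python
import numpy as np

def impute_linear(values):
    """Fill NaN entries by linear interpolation between observed neighbors."""
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    if missing.all():
        raise ValueError("cannot impute a series with no observed values")
    idx = np.arange(len(values))
    filled = values.copy()
    # np.interp holds the edge values constant for gaps at the start or end.
    filled[missing] = np.interp(idx[missing], idx[~missing], values[~missing])
    return filled

observed = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
filled = impute_linear(observed)
```

Linear interpolation ignores seasonality, so for strongly seasonal data a method that uses the value from the same point in neighboring seasons may be a better fit.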

Another application is anomaly detection, which means detecting abnormal activity in time series data. Consider a time series of the traffic to a website. Peaks that deviate from the normal trend might be an indication of malicious attacks. Another example would be unexpected peaks in the sales of a retail store, which might be due to an event like the coronavirus pandemic. Anomaly detection is often treated as an unsupervised learning task in which the model tries to find the structure in the data and detect outliers.

The following figure shows peaks as anomalies in time series data. Please keep in mind that anomalies may not always be that easy to spot.
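Isolated peaks can be flagged with a simple statistical rule. The sketch below uses a modified z-score based on the median and the median absolute deviation (MAD), which is less affected by the outliers themselves than a mean-based score. The function name `detect_anomalies`, the 3.5 threshold, and the simulated traffic data are assumptions for illustration. The rule ignores trend and seasonality, so it only suits series that fluctuate around a stable level.

```python
import numpy as np

def detect_anomalies(values, threshold=3.5):
    """Flag points whose modified z-score (median/MAD based) exceeds threshold."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # No spread to compare against; flag nothing in this sketch.
        return np.zeros(len(values), dtype=bool)
    # 0.6745 rescales the MAD to be comparable to a standard deviation
    # for normally distributed data.
    robust_z = 0.6745 * (values - median) / mad
    return np.abs(robust_z) > threshold

rng = np.random.default_rng(1)
traffic = rng.normal(100.0, 5.0, size=200)   # simulated "normal" traffic
traffic[[50, 120]] += 60.0                   # two injected spikes
anomaly_idx = np.flatnonzero(detect_anomalies(traffic))
```

The injected spikes at positions 50 and 120 should appear in `anomaly_idx`; subtler anomalies would need a model of the expected pattern to compare against.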

Speech recognition also works with time series data. Sound waves are analyzed to identify individual words, which are then combined to reconstruct the entire speech.