Algo Trading can change your future!


Math Techniques viz. ARIMA, Frequency Decomposition, Fourier Filtering, Linear Regression & Bi-directional LSTMs on Feature Engineered Stock Market Data.

“When the weather changed, nobody believed the laws of physics had changed.”

If AI can predict the future, then why not predict stock prices? Who doesn't like to be rich? But that's easier said than done. Market sentiment may depend not only on technical factors but also on the political situation, product breakthroughs, company M&As and even recent news bytes. How do we objectively quantify all these factors to predict the stock demand-supply curve? It might sound daunting, but it is within the realms of the ambitiously possible!

Here we introduce some mathematical techniques to analyze time series data from stock markets and predict the future curve. If you are among those who are curious to apply machine learning to trading but are not sure where to start, you have reached the right place.

The source code of all techniques and respective data-sets can be found here.

Quant Trading

Quant strategies follow a data-driven approach to pick stocks. This approach, which seeks to reduce the role of human bias, conceptually falls between active and passive trading. Stock data is a classic example of a “time series”, where prices are sampled at regular intervals.

Time Series Data

A time series is a series of data points indexed in time order. Two properties of time series are:

  • Periodic: when there is a repeating pattern in the data.
  • Trending: when the average value of the curve increases over time.
Time Series: Periodic & Trending

The Gaussian is the most loved distribution among mathematicians, as it has well-defined characteristics. Similarly, if a time series is stationary, then it is easy to forecast.

Stationary Time Series

A time series is stationary when it satisfies 3 conditions:

  1. Constant mean: the average of a window of values is constant over time (no upward or downward trend).
  2. Constant variance: the variance of values in a window is constant over time.
  3. Covariance is only a function of the gap: the inter-variability of values should not vary with time, only with the gap between them.

A. Data Preparation

In the real world, time series contain noise, trends and seasonality as well. Trends result in a varying mean over time, whereas seasonality results in a changing variance over time. Hence, in the data preparation stage, we need to induce stationarity and de-noise the signal prior to modeling.

i) Induce Stationarity

Akin to the Box-Cox transform or log-transform, which convert a non-Gaussian distribution to Gaussian, there are 2 ways to convert a non-stationary time series to stationary:

  1. Transforming: if the series is log-normal, then take log(values).
  2. Differencing: akin to differentiation, take the difference with the previous value.
First & Second order differences

A second-order difference is more likely to be stationary than a first-order one. Thus, you can take the d'th-order difference to try to make the values stationary. Just as only a few distributions, such as the log-normal, can be converted to Gaussian, not all time series can be made stationary; still, d'th-order differencing works well in practice. Here, d is a hyper-parameter.
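As a minimal sketch, differencing is a one-liner with pandas (the prices series below is a hypothetical example):

import numpy as np
import pandas as pd

# Hypothetical series of closing prices
prices = pd.Series([100.0, 102.0, 105.0, 104.0, 108.0, 111.0])

first_diff = prices.diff().dropna()           # first-order difference
second_diff = prices.diff().diff().dropna()   # second-order difference

# Log-transform followed by differencing gives approximate returns,
# which are often closer to stationary than the raw prices
log_returns = np.log(prices).diff().dropna()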

How to identify Stationary vs non-Stationary TS?

Just as we use a Q-Q plot to check whether a distribution is Gaussian, an easy check for stationarity is to visually verify the above 3 conditions.

Second curve is trending down. Hence, not stationary.

Similar to the KS test or Anderson-Darling (AD) test, which statistically check whether a distribution is Gaussian, you can run the Augmented Dickey-Fuller (ADF) test to check whether a time series is stationary.
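For reference, here is a minimal sketch of the ADF test with statsmodels (assuming series holds the values under study):

from statsmodels.tsa.stattools import adfuller

result = adfuller(series)
print("ADF statistic:", result[0])
print("p-value:", result[1])

# Null hypothesis: the series has a unit root (i.e., is non-stationary).
# A p-value below 0.05 lets us reject it and treat the series as stationary.
if result[1] < 0.05:
    print("Likely stationary")
else:
    print("Likely non-stationary; consider differencing")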

ii) Fourier Filtering

Fourier filtering helps de-noise the signal in order to uncover the significant underlying curve. This technique can be used before feeding data to the prediction model, or even to smooth the model output.

We can compute the FFT of a signal, plot its power spectrum, remove the high frequencies, and take the inverse FFT of the remaining peak frequencies to reconstruct a de-noised signal.
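A minimal sketch of this idea with NumPy (the cutoff frequency here is an illustrative assumption, not a recommendation):

import numpy as np

def fourier_filter(signal, cutoff=0.01):
    # FFT of the real-valued signal and its frequency bins (cycles/sample)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal))
    power = np.abs(spectrum) ** 2   # plot this to choose a sensible cutoff
    # Zero out the high-frequency components and reconstruct the signal
    spectrum[freqs > cutoff] = 0
    return np.fft.irfft(spectrum, n=len(signal))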

If we run the above code on noisy data, such as the daily price of the S&P 500 index from 1950 to 2015, we can find the significant price changes after removing the day-to-day fluctuations.

Blue: S&P500 Daily Price. Orange: Filtered Signal

B. Prediction Models

After the data preparation step, we can feed in the values to prediction models. These are the commonly used models for time series:

  • ARIMA (Auto Regressive Integrated Moving Average)
  • Frequency Decomposition
  • Linear Regression on Feature Engineered Data
  • LSTM & Bi-directional LSTM

i) ARIMA Model

Also known as the Box-Jenkins model, ARIMA was popularized by Box and Jenkins in the 1970s. It is conceptually similar to a linear regression model applied to time series data.

ARIMA has 3 parts:

  • Auto Regression
  • Integration (Differencing)
  • Moving Average

Auto Regression (AR)

AR(p) is like a linear regression model on the previous ‘p’ values in the series. We assume future values can be computed from the ‘p’ previous values, just like the “almost a line” assumption in linear regression. Here, p is a hyper-parameter.

Linear sum of last ‘p’ values is conceptually similar to LR

Moving Average

The MA(q) model is a linear regression over the previous ‘q’ error terms, similar to AR(p). Hence, the ARMA model can be represented as below.

ARMA = linear sum of previous ‘p’ values & ‘q’ errors
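Written out in standard notation (a sketch, with c an intercept and ε the error terms), the ARMA(p, q) model combining both parts is:

X_t = c + \sum_{i=1}^{p} \alpha_i X_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t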

Hence, the ‘p’ previous values and ‘q’ error terms are the features, and α and θ are the linear regression parameters. Integration is the same as the concept of “differencing” explained above.

In brief, ARIMA(p, d, q) can be defined as a linear regression model on the previous ‘p’ values & ‘q’ errors after differencing ‘d’ times. Here p, d & q are hyper-parameters.

Auto-Correlation

If a set of values of the curve correlates with itself after a gap of k, then the correlation value tends towards 1. But how do we know the gap?

The Autocorrelation Function (ACF) and Partial ACF (PACF) are plots which help determine the gap. The ACF plots the total correlation against the gap k, including indirect effects through shorter lags, while the partial ACF plots only the direct, incremental correlation at each k.

X = k (gap). Y = Correlation value (outside 95% suggests correlation)

The confidence interval is set to 95%, suggesting that correlation values outside of this band are very likely a real correlation and not a statistical fluke.

To make the plot easier to read, we limit the lag on the x-axis to 50.

From the above graph, k = 1 to 10 shows acceptable autocorrelation.

Partial autocorrelation plots the incremental value only, i.e., it tells you the benefit of using one more previous value.
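A minimal sketch of both plots with statsmodels (assuming series holds the price series under study):

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(series, lags=50, ax=axes[0])    # total correlation at each lag
plot_pacf(series, lags=50, ax=axes[1])   # direct (incremental) correlation
plt.show()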

Let's try to apply the ARIMA model to the TCS stock data-set depicted below.

TCS Stock Market Prices

When you run the autocorrelation, you get the graph below.

High correlation found on last 5~10 values
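A sketch of the fit with statsmodels; the order (10, 1, 2) is an illustrative guess based on the ACF above, not the exact configuration used:

from statsmodels.tsa.arima.model import ARIMA

train, test = prices[:-100], prices[-100:]    # hypothetical train/test split
model = ARIMA(train, order=(10, 1, 2))        # (p, d, q)
fitted = model.fit()
forecast = fitted.forecast(steps=len(test))   # predict over the test horizon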
Predicted values correlate with values in test data.

For a closer look, we plot only the predicted values alongside the original values.

Predicted vs Original: Zoomed in

The ARIMA model is extended to SARIMA (Seasonal Autoregressive Integrated Moving Average) to support the seasonal component of the series. It is used on univariate time series containing trends and seasonality.

Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) model is used when there are additional explanatory variables (multivariate) in categorical or numeric format.

ii) Frequency Decomposition

Autocorrelation is one way to compute periodicity, but a more principled way to find periodicity is the Fourier transform. This technique can be used to find the most significant periodic changes in historical data, which gives a dependable hint about the future.

Let's try to plot a composite wave.
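A sketch of a composite of a sine and a cosine (the 5 Hz and 20 Hz frequencies are arbitrary choices):

import numpy as np
import matplotlib.pyplot as plt

# Composite signal: a slow sine plus a faster cosine
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t) + np.cos(2 * np.pi * 20 * t)

plt.plot(t, signal)
plt.show()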

Sine and Cosine Additive

Plot the power spectral density (PSD) curve and find the significant peaks.

Take the inverse FFT of the frequencies corresponding to the significant peaks.

Top: Inverse FFT of 1st PSD peak. Bottom: Inverse FFT of 2nd PSD peak

Below is a sketch of code to do the frequency decomposition depicted above.
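This is a minimal reconstruction under stated assumptions: signal is the composite wave from the previous sketch, and SciPy's find_peaks stands in for the peak-picking step:

import numpy as np
from scipy.signal import find_peaks

spectrum = np.fft.rfft(signal)
power = np.abs(spectrum) ** 2
peaks, _ = find_peaks(power, height=np.max(power) * 0.1)  # significant peaks

components = []
for peak in peaks:
    isolated = np.zeros_like(spectrum)
    isolated[peak] = spectrum[peak]   # keep only this peak's frequency
    components.append(np.fft.irfft(isolated, n=len(signal)))

reconstruction = sum(components)      # approximates the input wave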

When you add the separated waves above, you get the input wave back. This shows that the above 2 waves represent the components of the input wave.

Sum of Inverse FFT curves

iii) Linear Regression

So far we were using only the ‘p’ previous values and ‘q’ errors to predict. But we can also use other features such as day of the week, time of day, holidays, etc. This technique, known as feature engineering, is more of an art than a science.

Here we take a CSV file containing the daily price of the S&P 500 index from 1950 to 2015. Let's try to predict the response variable, i.e., the closing price, one day ahead. We can use the features below (a code sketch follows the list):

1) Average price of the past 365 days.

2) Ratio of the average price of the past 5 days to that of the past 365 days.

3) Mean and standard deviation of the previous 365 days.
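A minimal sketch of these features with pandas and scikit-learn; the file name, column names, and split date are assumptions:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("sp500.csv", parse_dates=["Date"]).sort_values("Date")

# shift(1) ensures each feature uses only information available before that day
df["avg_365"] = df["Close"].rolling(365).mean().shift(1)
df["avg_5"] = df["Close"].rolling(5).mean().shift(1)
df["ratio_5_365"] = df["avg_5"] / df["avg_365"]
df["std_365"] = df["Close"].rolling(365).std().shift(1)
df = df.dropna()

features = ["avg_365", "ratio_5_365", "std_365"]
train = df[df["Date"] < "2013-01-01"]
test = df[df["Date"] >= "2013-01-01"]

model = LinearRegression().fit(train[features], train["Close"])
preds = model.predict(test[features])
print("MAE:", mean_absolute_error(test["Close"], preds))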

Mean Absolute Error (MAE) = 99.43097824659425
Median Absolute Percentage Error (MAPE) = 6.0%

An ideal way to approach any time series problem is to first use differencing or transformation to make the data more stationary, and then apply advanced regression models like GBDT or Random Forest.

However, the most popular deep-learning models for time-series data are RNNs. The state-of-the-art models for time series include LSTMs, GRUs, attention models & Transformers. Here we take a deeper look at LSTMs, a powerful and widely used choice.

iv) LSTM & Bi-directional LSTM

Long Short-Term Memory (LSTM) networks can be used to learn from a series of past observations to predict the next value in the sequence.

A vanilla LSTM model has a single hidden layer of LSTM units and an output layer used to make the prediction. Here we are working with a univariate series, so the number of features is one.

We apply the LSTM to the same S&P 500 data used for Fourier filtering. First, we draw the autocorrelation graph to estimate the lag.

Auto Correlation on Closing Price

Lags up to 500 are found to have positive correlation. Since the first 100 lags show more than 80% correlation, the window size is set to 100 and the number of epochs to 200.
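A minimal sketch of such a model with Keras; the layer width is an illustrative choice, and scaled_prices is assumed to be the closing prices scaled to [0, 1]:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(values, window=100):
    # Slide a window over the series: X holds past values, y the next value
    X, y = [], []
    for i in range(len(values) - window):
        X.append(values[i:i + window])
        y.append(values[i + window])
    return np.array(X)[..., np.newaxis], np.array(y)   # (samples, window, 1)

X, y = make_windows(scaled_prices, window=100)

model = Sequential([
    LSTM(50, input_shape=(100, 1)),   # single hidden LSTM layer
    Dense(1),                         # output layer: the next value
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=200, batch_size=32)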

LSTM Prediction of S&P500 curve

As mentioned before, we can use Fourier filtering to smooth the output as well.

Fourier Filtered Prediction Curve of S&P 500

Bi-directional LSTMs

In time series sequence prediction problems, it can be beneficial to allow the LSTM model to learn the input sequence both forwards and backwards and concatenate both interpretations.

A Bidirectional LSTM for univariate time series forecasting can be built by wrapping the first hidden layer, as sketched below.
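A sketch of the same model with the hidden layer wrapped in Keras' Bidirectional wrapper (layer width again illustrative):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, Dense, LSTM

model = Sequential([
    # The wrapper runs the LSTM forwards and backwards over the window
    # and concatenates both interpretations
    Bidirectional(LSTM(50), input_shape=(100, 1)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")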

How to start trading?

  • Open a demat account with Zerodha or Upstox.
  • Upload your documents and complete the account setup (about 15 minutes).
  • Install the respective app and monitor the market charts.
  • Go to the algo labs section from the menu and familiarise yourself with it.
  • To do algo trading, install Amibroker or NinjaTrader.
  • Buy data from an authorised data vendor.
  • Write code for the above algos and analyze the results.

If you are shaky on the above math models, you can instead feed live data from data vendors into third-party solutions like Amazon Forecast. You can also do virtual trading on paper-trading apps without investing real money.

Conclusion

In this blog, you have seen some basic mathematical techniques to model data indexed in time order. This should give you a sufficient technical base to start off with time series data. However, to beat the market, you may have to resort to advanced techniques.

We tried to model stock market behaviour using supervised learning approaches, viz. Linear Regression and LSTMs. But Reinforcement Learning is more robust at accounting for the various environmental factors that affect the stock market, as it aims to maximise reward in a given situation.

Fourier analysis works best with waves or wavelets that are regular and predictable, and the stock market is the very antithesis of that. Hence it is beneficial to look into spectral analysis and signal extraction as well. If you would like to know about such advanced techniques, hold on for the upcoming blog, or you can reach me here.

The source code of all the discussed techniques can be found here.
