Is it Actually Meaningful to Forecast the Stock Market?

Original article was published on Artificial Intelligence on Medium

Strategies & Perspectives in Trading

There are two general views on how to predict the stock market: fundamental analysis and technical analysis. Many successful investors claim to have made their fortunes with one or the other of these strategies.

Fundamental analysis postulates that every company’s stock has an intrinsic, or fundamental, value, and that a stock’s price simply fluctuates above and below that value, continually correcting itself. Trading is then a matter of buying when a stock is below its fundamental value and selling when it is sufficiently above it. Fundamental analysts weigh a wide range of macro- and microeconomic factors, from the state of the economy to the effectiveness of a company’s management, to determine what they believe a company’s stock is truly worth.

Technical analysis, on the other hand, assumes that all these factors are already priced into a stock’s price, and that the price can instead be predicted from past trends. Technical analysts believe that trends and patterns, not intrinsic value, dictate what the next prices should be. The market price is everything: the factors that might affect it are not considered separately because they have already been factored into the price.

Then, of course, there is the Random Walk Theory, which says that, apart from broad trends like ‘the tech industry is performing well’, the high-variance movement of stock prices is essentially random, driven by too many infinitesimally small factors to be reasonably predicted. One interesting exercise is to pick out which of the following four stock price sequences is real. All four are the same length (1000 values): one is from an iconic American S&P 500 company, and the other three were generated by repeatedly adding a random number between -3 and 3 (inclusive) to the previous value.

The answer: the top-left diagram is IBM’s stock from January 24th, 2000 to January 15th, 2004, which is 1000 trading days’ worth of data. It’s difficult to tell, though! Even randomly generated sequences can have very convincing cycles, peaks, and valleys. So many patently ridiculous investing theories end up appearing to ‘work’ (consider the theory that prices could be predicted from how high dress hemlines were, or another that ties the stock market to the outcome of the Super Bowl) that one can’t help but wonder whether the price of a stock really is just a general trend masked by countless layers of rambling randomness.
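The three synthetic sequences in this exercise are simple random walks. A minimal Python sketch, assuming a starting value of 100 and real-valued steps drawn uniformly from [-3, 3]; the article gives neither detail, so swap in `rng.randint(-3, 3)` if integer steps were meant. The function name `random_walk` is hypothetical.

```python
import random

def random_walk(n=1000, start=100.0, max_step=3.0, seed=None):
    """Generate n values, each equal to the previous value plus a
    uniform random step in [-max_step, max_step]."""
    rng = random.Random(seed)  # seeded generator for reproducible walks
    values = [start]
    for _ in range(n - 1):
        values.append(values[-1] + rng.uniform(-max_step, max_step))
    return values

# Three synthetic "stocks" like the ones in the exercise
walks = [random_walk(seed=s) for s in range(3)]
print([round(w[-1], 2) for w in walks])  # final value of each walk
```

Plotting a few of these next to a real price series is a quick way to reproduce the exercise.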

The Random Walk Theory would also explain why so many investing legends attribute their success to such a wide variety of investment strategies: the ones we always hear about are the lucky success stories, while we never hear about those who lost their entire investments on very similar strategies. It also explains why so many successful investors made so many mistakes over their trading careers.

Believers in the Random Walk Theory do not think that the future of a stock can be predicted from its past prices, and instead advocate a buy-and-hold strategy. It may not make as much money as risky (and lucky) investment strategies, but if you hold a stock for long enough, the odds are on your side that you will make a profit. For instance, if you had bought stock in any S&P 500 company at any time between its IPO and the present and held it for one year, it is almost guaranteed that your investment would have gained at least 5%.
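A one-year holding claim like this can be checked against any price history by counting how many one-year windows end with at least a 5% gain. A minimal sketch, assuming a list of daily closing prices and 252 trading days per year (both assumptions, not from the article); dividends and splits are ignored, and `fraction_of_gains` is a hypothetical helper.

```python
def fraction_of_gains(prices, holding_days=252, threshold=0.05):
    """Share of start dates whose price `holding_days` later is at least
    (1 + threshold) times the starting price."""
    n_windows = len(prices) - holding_days
    if n_windows <= 0:
        raise ValueError("need more prices than holding_days")
    # Count windows that end at or above the target gain
    wins = sum(prices[i + holding_days] >= (1 + threshold) * prices[i]
               for i in range(n_windows))
    return wins / n_windows

# Toy series rising 0.1% per trading day (about 29% per 252-day year)
prices = [100 * 1.001 ** t for t in range(1000)]
print(fraction_of_gains(prices))  # 1.0
```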

When most data scientists talk about predicting the stock market, they are usually talking about technical analysis — utilizing mathematical techniques like moving averages, support & resistance, trend lines, and momentum-based indicators to forecast the future based on previous patterns.

Stock Forecasting Methods

Perhaps the simplest stock forecasting method is the moving average. The simple moving average takes the average of a sliding window of values, which smooths out high-variance data and leaves behind a ‘trend line’ for technical analysts to study. The exponential moving average is a variant that weights recent values more heavily, giving analysts a curve that responds more quickly to the present.
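Both averages take only a few lines. A minimal Python sketch; the smoothing factor alpha = 2 / (span + 1) and seeding the EMA with the first price are common conventions, assumed here rather than taken from the article.

```python
def simple_moving_average(prices, window):
    """Mean of each full window of `window` consecutive prices."""
    if window < 1 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

def exponential_moving_average(prices, span):
    """EMA with smoothing factor alpha = 2 / (span + 1), seeded with the first price."""
    alpha = 2 / (span + 1)
    ema = [prices[0]]
    for price in prices[1:]:
        # New value = alpha * latest price + (1 - alpha) * previous EMA
        ema.append(alpha * price + (1 - alpha) * ema[-1])
    return ema

prices = [10, 11, 12, 13, 20]
print(simple_moving_average(prices, 3))       # [11.0, 12.0, 15.0]
print(exponential_moving_average(prices, 3))  # [10, 10.5, 11.25, 12.125, 16.0625]
```

On this toy series, the jump to 20 pulls the EMA's last value to about 16.06, while the 3-value SMA only reaches 15.0, which shows the EMA's bias toward recent values.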

The Autoregressive Integrated Moving Average, or ARIMA, is a very popular statistical forecasting model. Instead of modeling the raw values directly, ARIMA can model the differences between consecutive values, which avoids the bounded problem that arises when time series predictions need to go outside the range of values the model was trained on. Despite the name, the ‘moving average’ in ARIMA is not a sliding-window average of prices but a weighted combination of past forecast errors. There are three parameters in an ARIMA model:

  • p: the number of lag observations in the model (the order of the autoregressive part), which determines how far back the ARIMA model looks when predicting each data point.
  • d: the number of times that the raw observations are differenced, also known as the degree of differencing. Differencing takes the difference between each observation and the one before it; second-degree differencing means taking the difference of the difference.
  • q: the order of the moving-average part, i.e. the number of lagged forecast errors (residuals) the model uses to adjust each prediction. A larger q lets the model account for shocks further in the past.
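The d parameter is easy to see on a toy series. A minimal sketch of d-th order differencing in plain Python (the helper name `difference` is hypothetical): each pass shortens the series by one value, and on this quadratic sequence two passes leave a constant.

```python
def difference(series, d=1):
    """Apply first differencing d times: y[t] = x[t] - x[t-1]."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

prices = [1, 4, 9, 16, 25]    # a quadratic trend
print(difference(prices, 1))  # [3, 5, 7, 9]
print(difference(prices, 2))  # [2, 2, 2]
```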

These three parameters are usually found by a computational search over their combinations, keeping whichever combination yields the lowest error metric. The resulting model has reportedly performed reasonably well on so-called ‘common sense stocks’ that most people already believe will go up or down.
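A minimal sketch of such a search, under simplifying assumptions: q is fixed at 0, so each candidate is an ARIMA(p, d, 0) model whose autoregressive part is fitted by ordinary least squares with NumPy, and the error metric is one-step-ahead mean squared error on a held-out tail of the series. Estimating a nonzero q requires iterative maximum-likelihood fitting, which libraries such as statsmodels (`statsmodels.tsa.arima.model.ARIMA`) provide. The helper names and search ranges below are hypothetical.

```python
import itertools
import math
import numpy as np

def fit_ar(y, p):
    """Least-squares AR(p) with intercept: y[t] ~ c + sum_i a_i * y[t-i]."""
    if p == 0:
        return np.array([y.mean()])
    # Column i holds the lag-i values aligned with targets y[p:]
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - i:len(y) - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def one_step_mse(x, p, d, n_test):
    """Fit ARIMA(p, d, 0) on all but the last n_test points, then score
    one-step-ahead forecasts of the held-out points in price units."""
    x = np.asarray(x, dtype=float)
    coef = fit_ar(np.diff(x[:-n_test], n=d), p)
    errors = []
    for t in range(len(x) - n_test, len(x)):
        history = x[:t]                 # everything observed before t
        w = np.diff(history, n=d)       # differenced history
        pred_w = coef[0] + sum(coef[i] * w[-i] for i in range(1, p + 1))
        # Undo differencing: diff^d x_t = sum_k (-1)^k C(d, k) x_{t-k}
        pred_x = pred_w - sum((-1) ** k * math.comb(d, k) * history[-k]
                              for k in range(1, d + 1))
        errors.append((x[t] - pred_x) ** 2)
    return float(np.mean(errors))

def grid_search(x, ps=range(0, 4), ds=range(0, 3), n_test=50):
    """Score every (p, d, 0) combination and return the lowest-error one."""
    scores = {(p, d, 0): one_step_mse(x, p, d, n_test)
              for p, d in itertools.product(ps, ds)}
    best = min(scores, key=scores.get)
    return best, scores

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.uniform(-3, 3, size=500))  # synthetic random walk
best, scores = grid_search(prices)
print("best (p, d, q):", best)
```

Scoring every candidate on held-out prices, rather than comparing in-sample fit across different values of d, keeps all candidates on the same scale.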

But as purely statistical models fall out of favor, many have turned to deep learning models: specifically, the recurrent neural network (RNN), which was designed to handle sequential data by carrying a hidden state, a kind of ‘memory’ of previous steps, forward to inform its current prediction. Deep recurrent neural networks seem to show promise in predicting second-by-second stock prices, perhaps because they can model the general trading patterns of thousands of transactions. On the other hand, they have had little success making long-term predictions for stocks whose direction a human would also be unsure about.
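The ‘memory’ idea can be seen in a bare forward pass. A minimal NumPy sketch of a single-layer Elman RNN, assuming untrained random weights and a short sequence of scaled returns as input; all names and sizes are illustrative. Practical models stack layers, often use LSTM or GRU cells, and learn their weights with backpropagation through time in a framework such as PyTorch or Keras.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a single-layer Elman RNN over a sequence.

    inputs: array of shape (T, n_in). Returns outputs of shape (T, n_out)."""
    h = np.zeros(W_hh.shape[0])  # hidden state carried between steps
    outputs = []
    for x_t in inputs:
        # Combine the current input with the memory of earlier steps
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        outputs.append(W_hy @ h + b_y)
    return np.array(outputs)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, T = 1, 8, 1, 20
params = (rng.normal(scale=0.5, size=(n_hidden, n_in)),      # W_xh
          rng.normal(scale=0.5, size=(n_hidden, n_hidden)),  # W_hh
          rng.normal(scale=0.5, size=(n_out, n_hidden)),     # W_hy
          np.zeros(n_hidden), np.zeros(n_out))               # b_h, b_y
returns = rng.normal(scale=0.01, size=(T, n_in))  # stand-in for scaled returns
preds = rnn_forward(returns, *params)
print(preds.shape)  # (20, 1)
```

Changing only the first input also changes later outputs, because its effect flows through the hidden state.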

There are a variety of methods designed to make forecasting stock prices easier for models. Differencing is one: it replaces each value with the difference between the values at times t+1 and t. The model then only needs to predict the change, so it can represent a rise of 100 (say, from 900 to 1,000) by predicting +10 ten times, whereas a model trained on undifferenced data would not know to predict such a high value if no comparably high data points had been in the dataset. Another approach is to use percent change, which divides the change from t to t+1 by the price at time t.
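These transformations are one-liners. A minimal sketch in plain Python (the helper names are hypothetical); `rebuild_from_differences` shows how a model's predicted changes would be turned back into prices by cumulative summing.

```python
from itertools import accumulate

def first_difference(prices):
    """Change from each price to the next: p[t+1] - p[t]."""
    return [b - a for a, b in zip(prices, prices[1:])]

def percent_change(prices):
    """Relative change from each price to the next: (p[t+1] - p[t]) / p[t]."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def rebuild_from_differences(start, diffs):
    """Turn predicted changes back into a price path, starting from `start`."""
    return list(accumulate(diffs, initial=start))

print(first_difference([100, 110, 99]))  # [10, -11]
print(percent_change([100, 110, 99]))    # [0.1, -0.1]
# Ten predicted +10 steps carry the series from 900 to 1,000
print(rebuild_from_differences(900, [10] * 10)[-1])  # 1000
```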

Stationarity is the property of a time series whose statistical properties remain constant over time. This includes the mean (the series must not drift up or down over time) and the standard deviation (the variance must remain constant throughout the series). After procedures like differencing, which remove trends, seasonal cycles, and other non-stationary features, the result should look like a heart pulse: a constant center and a constant standard deviation.
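A crude way to see the effect of differencing is to compare the mean and standard deviation of a series' first and second halves. A minimal sketch, assuming a random walk like the ones in the earlier exercise; this split-half comparison is an informal heuristic, and formal checks such as the augmented Dickey-Fuller test (for example `adfuller` in statsmodels) are the more standard tool. The helper name `half_stats` is hypothetical.

```python
import random
import statistics

def half_stats(series):
    """(mean, standard deviation) of the first and second halves of a series."""
    mid = len(series) // 2
    return [(statistics.mean(h), statistics.stdev(h))
            for h in (series[:mid], series[mid:])]

# Random walk: each value is the previous one plus a uniform step in [-3, 3]
rng = random.Random(42)
walk = [100.0]
for _ in range(999):
    walk.append(walk[-1] + rng.uniform(-3, 3))
diffs = [b - a for a, b in zip(walk, walk[1:])]

print(half_stats(walk))   # the halves' means usually differ noticeably
print(half_stats(diffs))  # both halves: mean near 0, std dev near 1.73
```

The uniform steps have a standard deviation of 6 / sqrt(12), about 1.73, which is what both halves of the differenced series should hover around.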


It’s important to remember that technical analysis is based on the assumption that history repeats itself. The extent to which this is true is unclear: few could have predicted the housing bubble, 9/11, or the coronavirus pandemic from price history alone. There’s an argument to be made that technical analysis can work on a day-to-day basis and over predictable stretches of time, but it is difficult to forecast more than three months into the future because the prediction assumes that nothing unusual will happen.

Investing is a significant gamble, and predicting stock prices far into the future is like trying to predict the roll of a die: the expected value is 3.5, but any single roll could land anywhere from 1 to 6. Even as models and techniques become more sophisticated, the odds of repeatedly predicting the future with a single winning strategy remain stubbornly low. Models that show promise on a second-by-second scale may be able to reap consistent rewards if used en masse, but regardless, modeling a system that is arguably driven more by psychology and randomness than by the actual performance of companies remains a difficult task.