Using deep learning to predict not just what, but when

Businesses increasingly use deep learning, and machine learning and AI more broadly, to predict customer behavior such as purchasing. Beyond improving operational efficiency, for example in inventory management, accurately predicting customer purchasing behavior helps organizations improve brand engagement, cross-sell, upsell, optimize pricing, and prevent churn.

But current approaches focus solely on what customers will buy. They do not also consider when customers will buy it.

Predicting what a consumer will buy next is useful, especially for estimating customer lifetime value (CLV). Several approaches can be used to determine CLV, among them recency, frequency, monetary (RFM) analysis and the beta-geometric/negative binomial distribution (BG/NBD) model. However, none of the current methods captures the richness of customer transactions, because each reduces a transaction history to a handful of summary parameters, such as the mean and standard deviation of a purchase rate.
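To make that reduction concrete, here is a minimal Python sketch of an RFM summary. The transaction log, the customer names, and the `rfm` helper are hypothetical, and monetary value is taken here as total spend (some RFM variants use the average instead).

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("alice", date(2024, 1, 5), 20.0),
    ("alice", date(2024, 2, 9), 35.0),
    ("alice", date(2024, 3, 1), 15.0),
    ("bob", date(2024, 1, 20), 120.0),
]

def rfm(transactions, as_of):
    """Collapse each customer's history into (recency_days, frequency, monetary)."""
    summary = {}
    for customer, day, amount in transactions:
        last, count, total = summary.get(customer, (None, 0, 0.0))
        # Track the most recent purchase date, the purchase count, and total spend.
        last = day if last is None or day > last else last
        summary[customer] = (last, count + 1, total + amount)
    return {
        customer: ((as_of - last).days, count, total)
        for customer, (last, count, total) in summary.items()
    }

scores = rfm(transactions, as_of=date(2024, 3, 31))
print(scores["alice"])  # (30, 3, 70.0)
```

Whatever the order, timing, or content of the individual purchases, each customer ends up as three numbers, which is the loss of detail described above.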

Moreover, because they model average rather than individual customer behavior over a period of time, they cannot determine exactly when a customer will buy an item. To enable that level of hyper-personalization, we need deep learning.

At BCG Gamma, we’ve recently patented (US patent number 10,002,322) a next-gen forecasting and personalization model, which we call “Crystal,” that uses deep learning to predict both what transaction will be made and when, to within a time frame of just a few hours. (See Exhibit 1.)

Exhibit 1: Crystal, a next-gen forecasting and personalization model

Leveraging LSTM

To build this model, we took publicly available data and applied long short-term memory (LSTM), a recurrent neural network architecture whose key differentiator is that it learns which information is important enough to retain over the long term and which is irrelevant and can be forgotten in the short term. One application of LSTM is language modeling: predicting the next word based on the previous words used.
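The remember-or-forget behavior comes from the gates inside an LSTM cell. The sketch below is a textbook LSTM step reduced to scalar input and state, written in plain Python; it is not Crystal's implementation, and the weights are hand-picked for illustration rather than trained.

```python
import math

def sigmoid(x):
    """Squash a real number into (0, 1) so it can act as a gate."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a textbook LSTM cell with scalar input and state."""
    f = sigmoid(w["fx"] * x + w["fh"] * h_prev + w["fb"])    # forget gate
    i = sigmoid(w["ix"] * x + w["ih"] * h_prev + w["ib"])    # input gate
    o = sigmoid(w["ox"] * x + w["oh"] * h_prev + w["ob"])    # output gate
    g = math.tanh(w["gx"] * x + w["gh"] * h_prev + w["gb"])  # candidate memory
    c = f * c_prev + i * g      # keep part of the old memory, add part of the new
    h = o * math.tanh(c)        # expose part of the memory as the output
    return h, c

def run(sequence, w):
    """Feed a sequence through the cell and return the final memory value."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
    return c

# Hand-picked (not trained) weights, for illustration only.
base = {k: 0.0 for k in ("fx", "fh", "fb", "ix", "ih", "ib",
                         "ox", "oh", "ob", "gx", "gh", "gb")}
remember = dict(base, fb=10.0, ix=10.0, ib=-5.0, gx=5.0, ob=10.0)
forget = dict(remember, fb=-10.0)

signal = [1.0, 0.0, 0.0, 0.0]   # one event followed by three quiet steps
c_remember = run(signal, remember)
c_forget = run(signal, forget)
print(c_remember, c_forget)
```

With the forget gate held open, the memory of the first event survives the three quiet steps almost unchanged; with it held shut, the memory is wiped at the next step. In a trained network these gate values are learned from data rather than set by hand.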

We adapted LSTM-based language modeling work and applied it to transaction history by treating transactions as a sequence of words, with single words corresponding to single transactions and sentences to series of transactions. We then used LSTM to predict the next transaction based on previous transactions. (See Exhibit 2.) And because deep learning can find and “learn” non-linear and rare relationships in transaction history, Crystal captures non-linear trends and one-off behavior, which means it doesn’t just recommend the most commonly purchased items.
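The words-and-sentences analogy can be sketched in a few lines of Python. The purchase histories, item names, and helper functions below are hypothetical; the sketch only shows how histories become the (previous transactions, next transaction) pairs a sequence model would be trained on, not how Crystal itself prepares data.

```python
# Hypothetical purchase histories: one "sentence" per customer,
# one "word" per transaction.
histories = {
    "alice": ["coffee", "bagel", "coffee", "milk"],
    "bob": ["milk", "coffee"],
}

def build_vocab(histories):
    """Map each distinct transaction to an integer id, in first-seen order."""
    vocab = {}
    for items in histories.values():
        for item in items:
            vocab.setdefault(item, len(vocab))
    return vocab

def next_transaction_pairs(histories, vocab):
    """Build (previous transaction ids, next transaction id) training pairs."""
    pairs = []
    for items in histories.values():
        ids = [vocab[item] for item in items]
        for t in range(1, len(ids)):
            pairs.append((ids[:t], ids[t]))
    return pairs

vocab = build_vocab(histories)
pairs = next_transaction_pairs(histories, vocab)
print(vocab)     # {'coffee': 0, 'bagel': 1, 'milk': 2}
print(pairs[0])  # ([0], 1)
```

Each pair plays the role of a language modeling example: the context is the customer's history so far, and the target is the transaction that actually came next.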

Exhibit 2: Crystal uses the language modeling LSTM to predict transactions

More importantly, Crystal extends the language modeling framework to predict the when in addition to the what. It learns the idiosyncratic buying patterns of individual consumers to predict, to within a few hours, when they will next make a purchase.
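This article does not describe how Crystal represents time. One simple option, assumed here purely for illustration, is to attach to each transaction the number of hours elapsed since the previous one, so that a sequence model has a timing target alongside the item target. The history and the `with_gaps` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical timestamped history for one customer.
history = [
    ("coffee", datetime(2024, 3, 1, 8, 0)),
    ("bagel", datetime(2024, 3, 1, 12, 30)),
    ("coffee", datetime(2024, 3, 2, 8, 15)),
]

def with_gaps(history):
    """Pair each transaction after the first with hours since the previous one."""
    steps = []
    for (_, prev_time), (item, time) in zip(history, history[1:]):
        gap_hours = (time - prev_time).total_seconds() / 3600.0
        steps.append((item, gap_hours))
    return steps

steps = with_gaps(history)
print(steps)  # [('bagel', 4.5), ('coffee', 19.75)]
```

Under this framing, a prediction is "within a few hours" when the absolute difference between the predicted gap and the actual gap is a few hours or less.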

A customized model for certain industries

Companies that can benefit from predicting both the what and the when of future transactions aren’t limited to retailers; they can come from industries such as consumer packaged goods and industrial goods, as well as banking (to analyze spending patterns in credit card transactions) and energy (to track and predict consumption). It is not a solution for every industry, however; it won’t work in insurance, for example, where transactions are far too infrequent to form patterns.

Moreover, it is not a plug-and-play solution, but one that must be customized to the consumption or transaction history use case in question. Once customized, however, the ability to predict both the what and the when of future transactions enables multiple use cases: getting customers to purchase higher-value products (product mix) more often (frequency); decreasing churn by targeting customers with a high propensity to leave the customer base (churn prevention); reducing internal costs through better optimization (operational efficiency); and earning a better ROI on coupons, discounts, and rewards (promotional efficiency).

Crystal can predict what transactions will take place with more than 95% accuracy, and it can predict when those transactions will take place to within just a few hours. (See Exhibit 3.)

Exhibit 3: Crystal’s superior learning capabilities yield equally superior results

Algorithmically, Crystal is similar to CLV models like RFM or BG/NBD. But while they try to reduce a customer’s transaction history to just a handful of parameters, Crystal takes a more longitudinal approach, learning from the richness of long transaction histories and using that information to make precise predictions for each individual customer. Think of it as relying on the law of averages versus learning from the specifics.


When it comes to recommendation systems, the first question most companies, especially retailers, ask is: How well do we know our customers? The better a company knows its customers, the better positioned it is to offer them personalized recommendations about what to buy.

But knowing what a customer will buy is helpful only to a point. By using a customizable deep learning model like Crystal, companies can also know when that customer will buy it.

Source: Deep Learning on Medium