Variational Auto-Encoders for Customer Insight

Original article can be found here (source): Deep Learning on Medium

Github repository: VAEs-in-Economics

Neural networks are sometimes perceived as super complicated. They’re not. The most attractive application, in my opinion, of neural networks for small and medium-sized businesses, is in customer segmentation, and in my upcoming workshop at ODSC East 2020, “Variational Auto-Encoders for Customer Insight,” I will show you how to do it.

One-Dimensional VAE

The particular type of neural network we’re going to set up is a one-dimensional Variational Auto-Encoder, VAE for short. VAEs are simple and powerful, and you might have seen a demonstration of them encoding people’s faces, handwritten digits, or some other image data. In my workshop, you will learn how you can use VAEs to encode customer characteristics, or any other business data that isn’t an image.

Customer data is the basis of market segmentation, and the encoding produced by the VAE will help your business better understand and serve its customers. You may have experience using clustering for customer segmentation, but as you might know, clustering tends to work well with only a handful of variables. If you have more than 7–10 pieces of data about each customer, VAEs give you a way to use all of your data to its fullest potential.

Our task is dimension reduction. The idea is that you have data that has lots of variables in it, but with so many variables it’s hard to understand what’s going on. You want to think about your customers, not as this huge spreadsheet with tens or hundreds of pieces of information about each customer, but as something simple. You want to think of just a couple of customer types, with the idea that you know people are not very different from those ideal types.

Once you have someone’s type, you know almost everything you need to know from that huge dataset. The type gives you a good summary of the data that you’ve been collecting about the customer, but thanks to the VAE, it’s a simple description. For sure you’re going to lose some detail, but not much. And that simple description produced by the VAE is going to be super useful. Everyone in the business is going to be able to look at that type and immediately know the story of the customer. Instant insight.

https://odsc.com/boston/livestream/

I like to use the analogy of a lens. Before lenses were invented, we were severely limited in what we could observe with our eyes. Stars seemed indistinguishable from light fixtures on a celestial sphere. We couldn’t see that all life is made out of cells, or that all life shares a common origin. The invention of lenses literally, but more importantly figuratively, changed how we view the world around us. By using telescopes, we can now see galaxies and black holes across the universe. Using microscopes, we regularly examine ourselves on a molecular level. Similarly, by using the lens of a VAE, we will be able to start observing a hidden aspect of our organizations’ data. The promise of the VAE is that once we observe these hidden types, we will gain a profoundly deeper understanding of our data’s underlying structure.

Okay, so how does this kind of magic work?

A Very Good Bottleneck

A one-dimensional VAE is a type of neural network that aims to condense the data into one number in a way that preserves as much detail as possible. The objective function for training the network is for it to give us back the same data that we put in with little or no distortion. Why do we do this? The neural network we use for this task has a very particular structure. It has a bottleneck hiding in the middle. This is the one time a bottleneck is a good thing. The bottleneck forces the neural network to find a low-dimensional encoding of all the patterns that exist in the data. If you make the bottleneck just one neuron, as I do, it will arrange the patterns along a continuum. As I will demonstrate, this will allow us to visualize the data in a way that would not be possible otherwise.
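To make the bottleneck idea concrete, here is a minimal sketch of a one-dimensional VAE forward pass in NumPy. It is an illustration under stated assumptions, not the workshop’s model: the hidden width, the tanh activations, the random untrained weights, and the stand-in data are all made up. It also uses the standard VAE objective, which adds a KL term to the reconstruction error described above; that term keeps the encoded values close to a standard normal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 389  # activity categories in the ATUS data, per the article
n_hidden = 16     # hypothetical hidden-layer width
n_latent = 1      # the one-neuron bottleneck


def init(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out)


# Untrained, randomly initialised parameters (assumption: one hidden layer each side).
W_enc, b_enc = init(n_features, n_hidden)
W_mu, b_mu = init(n_hidden, n_latent)
W_logvar, b_logvar = init(n_hidden, n_latent)
W_dec, b_dec = init(n_latent, n_hidden)
W_out, b_out = init(n_hidden, n_features)


def encode(x):
    """Map inputs to the mean and log-variance of the 1-D code."""
    h = np.tanh(x @ W_enc + b_enc)
    return h @ W_mu + b_mu, h @ W_logvar + b_logvar


def decode(z):
    """Map 1-D codes back to the original feature space."""
    h = np.tanh(z @ W_dec + b_dec)
    return h @ W_out + b_out


def vae_loss(x):
    """Return (total, reconstruction, KL) averaged over the batch."""
    mu, logvar = encode(x)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
    x_hat = decode(z)
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    return np.mean(recon + kl), np.mean(recon), np.mean(kl)


x = rng.random((8, n_features))  # stand-in data, not real ATUS rows
total, recon, kl = vae_loss(x)
codes, _ = encode(x)             # one number per row
print(codes.shape)               # (8, 1)
```

In practice you would train these weights with a deep learning framework by minimizing the total loss; the sketch only shows how the data flows through the single-neuron bottleneck.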

Once we complete training, we’re actually not going to be using the whole network. The useful part is the portion from the input to the bottleneck, and that’s it. This bit of the network is typically referred to as an “encoder.” We call it that because it “encodes” the complexity of the data in a simpler form. And that’s a beautiful thing.

VAEs allow you to dramatically decomplexify the data and make it as simple as possible. Once you make things simple, and identify good types to describe your clients, you can figure out what each type needs. For example, maybe you have a group of clients with low sales numbers. If you really understood that they are a different type than your common clients, you could study their special needs and design a product that addresses those needs specifically.

Marketing professionals think this kind of micro-segmentation will change the world. How? In the future we will be tailoring products to narrower and narrower groups of customers, continuously finding needs that went unrecognized until real-time algorithms processing incoming data detected them. Coming out of this workshop, you will know how to do this.

In the workshop, we’re going to apply the Variational Auto-Encoder (VAE) to the American Time Use Survey (ATUS) data. For each 24-hour period in the survey, we have the number of minutes spent by the respondent in each of the 389 activity categories. Can you imagine trying to get a feel for what’s going on in a 389-dimensional data set? Our approach is that we’re going to use the VAE to identify simple types of American time use. We want to understand what types of days are out there and how many surveyed days belong to each type.

So how do you define types?

In the context of a VAE the most natural and sound way to define types is using equal-sized bins. Using equal-sized bins means that we divide the range of encoded data into a predetermined number of sections, each of equal length. These are bins in the sense that all data that gets encoded into each section is said to “fall” into that bin. We treat every piece of data that gets encoded to the same bin as being of the same type. The bins become our definition of types. Sure, bins might not be the most sophisticated way of defining types, but they’re simple and transparent.

And once you think about how a VAE actually works, you’ll realize bins aren’t such a bad idea after all. One of the truly powerful things about VAEs is that points that were close to each other in the original high-dimensional space tend to end up close to each other in the encoded space. That means they usually end up either in the same bin or in adjacent bins.
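The equal-sized bins described above can be sketched in a few lines of NumPy. The function name `assign_bins` and the sample encodings are hypothetical, chosen only for illustration; the idea is simply to split the range of encoded values into sections of equal length and record which section each value falls into.

```python
import numpy as np


def assign_bins(codes, n_bins):
    """Assign each 1-D code to one of n_bins equal-width bins (0-based)."""
    codes = np.asarray(codes, dtype=float)
    edges = np.linspace(codes.min(), codes.max(), n_bins + 1)
    # Use only the interior edges so the maximum value lands in the last bin.
    labels = np.digitize(codes, edges[1:-1])
    return labels, edges


codes = np.array([-2.0, -1.9, -0.1, 0.0, 0.1, 1.5, 2.0])  # made-up encodings
labels, edges = assign_bins(codes, n_bins=4)
counts = np.bincount(labels, minlength=4)

print(labels)  # [0 0 1 2 2 3 3]
print(counts)  # [2 1 2 2]
```

The `counts` array is what a histogram of types would plot: how many encoded observations fall into each bin.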

Here’s a preview of the distribution of types we get from computing a VAE on the 2013 ATUS data.

We’ll get into the details of how to read the following chart in the workshop. The top-left panel presents a histogram of how many respondents fell into each of sixty equal-sized bins of encoded space. Interpreting what the encoding means requires us to use the decoder to uncover the character of time use that is typical for each bin, and this is what the colorful, lower-left panel is about.
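One way to uncover what is typical for each bin is to pass the center of each bin through the decoder. The sketch below assumes exactly that; it may differ from how the chart was actually produced. `toy_decoder` is a hypothetical stand-in for a trained decoder, with three made-up activities instead of 389, and the encoded range of -3 to 3 is likewise an assumption.

```python
import numpy as np


def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder.

    Maps codes of shape (n, 1) to minutes spent on three made-up
    activities: work, leisure, and sleep.
    """
    z = np.asarray(z, dtype=float)
    work = 480.0 / (1.0 + np.exp(-2.0 * z))    # rises with the code
    leisure = 300.0 / (1.0 + np.exp(2.0 * z))  # falls with the code
    sleep = np.full_like(z, 500.0)             # constant across codes
    return np.hstack([work, leisure, sleep])


def bin_profiles(decoder, lo, hi, n_bins):
    """Decode the center of each equal-width bin on [lo, hi]."""
    edges = np.linspace(lo, hi, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, decoder(centers.reshape(-1, 1))


centers, profiles = bin_profiles(toy_decoder, lo=-3.0, hi=3.0, n_bins=60)
print(profiles.shape)  # (60, 3): one decoded time-use profile per bin
```

Each row of `profiles` is the decoded day for one bin, so reading down a column shows how time spent on one activity changes as you move along the encoded continuum.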

As you’ll see in the workshop, you do need a person to interpret what the types mean. Artificial intelligence can bring you up to this point, but from this point onward you will need a human. VAEs in that sense enable humans and AIs to collaborate. With each contributing from their area of strength, we get the most value for your business, its customers, and society as a whole.

To learn more about Variational Auto-Encoders, be sure to check out my ODSC East 2020 talk, “Variational Auto-Encoders for Customer Insight.”