# Solving Inventory Assortment Problem by Utilizing Product Embedding

The Inventory Assortment Problem

Imagine that you placed an online order for a coffee maker with a bag of coffee, and the coffee maker arrived the next day but your coffee arrived three days afterwards. Have you ever experienced such situations when different items ordered online at a same time ended up arrived in multiple boxes and at different times?

In this post, we will describe one of our best practices at JD.com to reduce such confusing situations for customers by carefully selecting the inventory assortment to distribute at each node in our fulfillment and warehousing network.

JD, as the online retailer who offers superior delivery speed over its competitors, delivers more than 90% in the same and next day.

To achieve faster delivery speed and better customer shopping experience, JD has built a multi-level distribution network (Figure 2) consists of Regional Distribution Centers (RDC), Front Distribution Centers (FDC), lower level distribution centers which we called TDC, and other local warehouses to cover 99% population of mainland China. JD uses lower level distribution centers such as FDCs and TDCs to meet the customer demand from medium or small sized cities as fast as possible. Orders shipped from the lower level distribution centers also provide additional cost savings in fulfillment.

However, the FDCs and TDCs cannot hold as many stock keeping units (SKUs) as the large distribution centers like the RDCs. The Inventory Assortment problem at FDC is to determine which SKUs to be held in the FDCs to maximize the number of orders that can be fulfilled entirely from the FDCs. If a customer places an order containing only one SKU, then the order can be fulfilled by the closest FDC if the SKU is kept as part of the inventory at the FDC. If multiple SKUs are contained in the order, then the order may be split. That is, some SKUs need to be fulfilled by a higher level distribution center like the RDC because the FDC does not hold these SKUs in its inventory, resulting in order split and potentially inconsistent delivery times (illustrated in Figure 3).

Mathematical Formulation

Let us take an illustrative example of how to make the inventory assortment decision for a single FDC.

Given the set of the orders placed during a time period, we would like to maximize the number of orders that can be satisfied solely by the FDC local inventory. If all SKUs in an order are present in the FDC, we get a reward of 1 for satisfying such an order; otherwise, we get 0 reward since the order will be split and fulfilled by multiple distribution centers. For any fixed inventory assortment at the FDC, we can compute the reward for each order, and the summation of the rewards is the total number of orders that need not to be split. Then the problem becomes to determine an inventory assortment which maximizes the rewards. However, such a problem is extremely difficult since the number of assortments can be very large. Selecting 100 SKUs out of a pool of 1000 candidate SKUs can result in 6.38×10­­139possibilities. JD.com has millions of items sold on the website to choose from to form an assortment.

Mathematically, the problem can be formulated as follows. We define I as the set of candidate SKUs, J as the set of (unique) order types.Each order type j J is associated with a weight v_j which is the number of times it appears in the order set.

We define the binary decision variables as X_j, i ∈ I being 1 if SKU i is selected in the FDC assortment; j J being 1 if order type j can be satisfied solely by the FDC assortment. We note that we assume we always have enough inventory at FDC for the SKUs belong to the assortment. The mathematical formulation of the problem is:

where K is the number of unique SKUs that can be stored in the FDC.

Product Embedding

Solving the above optimization problem exactly is not practical due to the combinatorial nature and the large scale of the problem. The problem can contain 10 million+ variables which makes it even unloadable into normal math solvers such as CPLEX.

An easy way to obtain a good quality solution is to use heuristic methods. The simplest heuristic one can think of is to rank SKUs by their popularities (we will refer the algorithm as Greedy Ranking through the article). The logic behind is that we can ship more orders from the FDC as the popular SKUs represent the majority of the orders. However, the Greedy Ranking does not provide good enough solution as it does not consider what SKUs are more likely to be purchased together.

In order to get a better solution, what we really need is the popularity on the order level, i.e., what are the most popular product bundles? Is a customer buying baby diapers more likely to buy beers at the same time? or some baby snacks of particular brands?

If we can identify what products in the popular orders are more likely to be purchased together and keep them as inventory at the FDC, then we will be confident that a large portion of the orders can be solely fulfilled by the local inventory. However, it is extremely hard to predict the popularity of an order pattern (or product bundles) compared to the product level popularity prediction, since the number of product combinations is almost infinitely large.

To conquer this challenge, we used a technique called SKU2Vec to compute a latent vector for each SKU. The idea is motivated by Google’s Word2Vec paper which proposes an unsupervised method to learn the representation of words by studying the sentences they appear in together. In our case, the SKUs are like words in a sentence, and an order containing multiple SKUs is an analogy of a sentence containing many words.

With SKU2Vec, the order context information is embedded in the SKU latent vectors. If the latent vectors of the two SKUs are close ‘in distance’, we know they are more likely to be purchased together, and thus should be considered being stored at the FDC together.

SKU2Vec methods follows a few steps. We first transfer an order containing N products into partial orders containing N-1 products where every product is removed from the original order in turns. Then the remaining partial orders serve as the input to a supervised model which tries to predict what is the missing product from the original order. Each product in the input partial order is represented by a low dimensional vector and averaged to get the vector representation of the partial order — called order intention vector. Then a predication is given based on the order intention vector. In this sense, products that appear frequently in the same type of orders shall have similar vector representations which indicate their closeness in the order contexts.

Here is a visual example of the vector representations of products projected onto 2D space using TSNE, trained using transactional information:

In Figure 5, the blue dots represent a bunch of baby diapers and red dots on on the bottom-right contains several snacks such as dates (大枣) products which are regarded as nutrition supplementals for new mothers who just gave birth. As diapers are among the most popular products which will definitely be stored in the FDC, the closeness between diapers and dates suggests that the dates products (not the beer:) should also be stored at the FDC although they are not among the top sellers.

End-to-End Neural Network Framework

We designed an End-to-End neural network framework to make inventory assortment decisions by directly capturing the co-purchase relationship between products. In the network, the novel techniques we used are:

– We used Embedding layers to map high dimensional categorical information associated with products such as category labels into latent space which can be used as inputs.

– We modeled the order split ratio directly as the loss function which closes the gap between prediction and optimization.

The architecture of the network is shown as below:

First, we fetch all product-level features such as categorical and quantitative information like recent sales, number of orders placed, page views etc. The categorical information is mapped to a vector by the Embedding layer and concatenated with the quantitative features. A product-level in-assortment probability is calculated based on the input feature vectors. Finally we multiple the in-assortment probabilities of each SKU in the original order to compute the order-level fulfilled-as-a-entire-package probability, i.e., the probability of having all SKUs in the FDC inventory assortment.

In this sense, whenever a pair of products always appear in a same type of order which is placed frequently, the in-assortment probabilities of the two products in the pair shall be either close to 1 or 0 in a synced way. Otherwise, the benefit of including any one of the product in the warehouse vanishes (think about the case where one product has probability 1 and the other 0, which leads to 0 fulfilled-as-a-entire-package probability of the order).

Performance Evaluation

We tested the SKU2Vec algorithm at three of our main regional warehouses. We evaluated the algorithm in a rolling horizon fashion as stated below, where 2 weeks of data are used as training sets, and the result is benchmarked using the following week’s orders.

Overall, we achieved around 2% order split ratio reduction compared to the benchmark algorithm which is an improved version of the Greedy Ranked algorithm. The reduction on the order split ratio implies 2 million less packages that need to be delivered each year. This further implies less work for our warehouse associates and delivery couriers and happier customers who will enjoy their new coffee machines with coffee in the same package.

Currently, we are working on piloting the algorithm on a few of our warehouses and implementing it in production system.

In this post, we showed how to use the state-of-art method such as product embedding through neural networks on solving a problem which shares the nature of predication and optimization. This post is the first one of a two-series blog post on the inventory assortment problem. In the second post, we will focus on a more Operations Research oriented way to achieve further improvements! (Deep Learning is not always the best method. Keep tune.)