The story behind an Instacart order, part 3: predicting the shop

Source: Deep Learning on Medium

In our last installment, we learned about the tech that makes it easy to find and buy the ingredients for your favorite weeknight lemony herbed salmon. In this third installment, we’ll concentrate on the predictive models that help us determine whether your Meyer Lemons, Fresh Dill, and Salmon Fillets are in stock. We’ll also look at the machine learning model that makes it easier for customers to choose and communicate their preferred item replacements.

Training Data

Our catalog data is the technical foundation of our four-sided marketplace.

We have about five million unique products in our catalog and about 950 million product listings in total. Each of these listings has a name, a product ID, and several sub-attributes like departments, aisle numbers, dietary/cuisine tags, and nutritional information. We also rely on historical data detailing how products behave in the marketplace. This data includes item availability history, the number of times an item has been chosen as a replacement, and more.
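To make the shape of a listing concrete, here is a minimal Python sketch of the attributes described above. The class name, field names, and the sample product ID are illustrative assumptions, not Instacart's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Listing:
    """One product listing. All field names here are hypothetical."""
    product_id: str
    name: str
    department: Optional[str] = None
    aisle_number: Optional[int] = None
    tags: List[str] = field(default_factory=list)            # dietary/cuisine tags
    nutrition: Dict[str, float] = field(default_factory=dict)
    # Historical marketplace signals
    availability_history: List[bool] = field(default_factory=list)  # True = item was found
    times_chosen_as_replacement: int = 0

    def found_rate(self) -> Optional[float]:
        """Share of past observations in which the item was found, or None if no history."""
        if not self.availability_history:
            return None
        return sum(self.availability_history) / len(self.availability_history)


meyer_lemons = Listing(
    product_id="hypothetical-123",  # made-up ID for illustration
    name="Meyer Lemons",
    department="Produce",
    availability_history=[True, True, False, True],
)
```

Keeping the static catalog attributes and the historical signals on the same record is only one possible layout; in practice the history would more likely live in a separate store keyed by product ID.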

Together, this data powers many of our machine learning models. Two models in particular work hand in hand to ensure a smooth customer and shopper experience: our Item Availability Model and our Replacement Recommendation Model.

Pinpointing item availability

We predict the availability of over 500 million listings every 30 minutes.

Our Item Availability Model relies on historical retailer availability data, store location, an item’s purchase history, and shopper inputs to predict the likelihood that a particular item in our catalog is or isn’t in stock at any one of nearly 25,000 physical stores.

This is really hard to do. We get a sense of an item’s availability about once a day from our retail partners, but as we all know, availability can be extremely variable throughout the day. One data drop a day doesn’t give us the hour-by-hour predictions we need to set expectations appropriately, especially for a harder-to-come-by fruit variety like the Meyer Lemon! Some locations may get new shipments from their growers seasonally. And when the fruit is in season, some store locations may only restock lemons in the mornings, while others may be a bit busier and stock the produce section multiple times per day.

To understand variability throughout the day, we’ve built a model that looks at time-centric data features, notably the time of day and the day of the week when shoppers have picked the item in-store, to give each listing an availability score.
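As a rough illustration of what time-centric features can capture, the Python sketch below buckets shopper pick events by day of week and hour, then computes a smoothed found rate per bucket. This is a simple stand-in, not the Item Availability Model itself: the function name, the event format, and the smoothing parameters (`prior`, `prior_weight`) are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# (when a shopper looked for the item, whether it was found); hypothetical format
PickEvent = Tuple[datetime, bool]
Bucket = Tuple[int, int]  # (weekday where Monday is 0, hour of day 0-23)


def availability_scores(
    events: Iterable[PickEvent],
    prior: float = 0.5,         # assumed score when a bucket has no evidence
    prior_weight: float = 2.0,  # how many pseudo-observations the prior counts for
) -> Dict[Bucket, float]:
    """Smoothed found rate for each (weekday, hour) bucket seen in the events."""
    found: Dict[Bucket, int] = defaultdict(int)
    total: Dict[Bucket, int] = defaultdict(int)
    for when, was_found in events:
        bucket = (when.weekday(), when.hour)
        total[bucket] += 1
        found[bucket] += int(was_found)
    # Additive smoothing keeps sparse buckets from jumping straight to 0.0 or 1.0.
    return {
        b: (found[b] + prior * prior_weight) / (total[b] + prior_weight)
        for b in total
    }


events = [
    (datetime(2019, 6, 3, 9, 15), True),     # Monday morning, found
    (datetime(2019, 6, 3, 9, 40), True),     # Monday morning, found
    (datetime(2019, 6, 3, 18, 5), False),    # Monday evening, not found
    (datetime(2019, 6, 10, 18, 30), False),  # following Monday evening, not found
]
scores = availability_scores(events)
```

With this toy data, the Monday 9 a.m. bucket scores 0.75 and the Monday 6 p.m. bucket scores 0.25, which matches the intuition of a store that restocks lemons only in the morning.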

Here’s a look at how the availability score for Meyer Lemons may change throughout the day: