Offer Optimization Using Machine Learning


For teenagers… X will be the best offer!
For adults… Y will be the best offer!
Likewise, for elderly people… Z will be the best offer.

But who decides this??? TBH, many people trust their gut here. However, trusting our gut doesn’t guarantee success every time. So instead, this post introduces you to a new way of making offers to your customers, illustrated with Starbucks’ data.

Have you seen my blog on Targeted Promotions? This post is similar to that one, but the complexity here is much higher. Click here to see that blog.

(Credits: Starbucks official site)

Offers like these are sent out through social media, email, SMS, and other channels.

But a company’s growth hacker would be interested in squeezing the highest possible profit out of these offers, right? To see how to do this with machine learning, read ahead!

Note: You can access the code accompanying this blog here.

Illustration: Starbucks Offers Optimization

Goal: Build a model that predicts how a customer will respond to an offer (i.e., whether or not they will complete it).

The whole example is divided into parts:
1) Introduction
2) Exploratory Data Analysis of Dataset
3) Data Pre-processing
4) Data Visualization (with a bunch of business-related questions)
5) Data Modeling
6) Results & Conclusion

1) Introduction

This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offers during certain weeks.

Not all users receive the same offer, and that is the challenge to solve with this data set.

Every offer has a validity period before the offer expires. As an example, a BOGO offer might be valid for only 5 days. You’ll see in the data set that informational offers have a validity period even though these ads are merely providing information about a product; for example, if an informational offer has 7 days of validity, you can assume the customer is feeling the influence of the offer for 7 days after receiving the advertisement.

To give an example, a user could receive a “buy 10 dollars, get 2 dollars off” discount offer on Monday. The offer is valid for 10 days from receipt. If the customer accumulates at least 10 dollars in purchases during the validity period, the customer completes the offer.

The data is contained in three files (loading them is sketched after this list):
• portfolio.json — containing offer ids and metadata about each offer (duration, type, etc.)
• profile.json — demographic data for each customer
• transcript.json — records for transactions, offers received, offers viewed, and offers completed
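
Before diving in, here is a minimal sketch of loading the three files with pandas. This assumes the files are line-oriented JSON, as in the original Starbucks dataset; the paths are placeholders:

```python
import pandas as pd

# The raw files are line-oriented JSON, hence lines=True.
# Paths are placeholders; adjust to wherever the files live.
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
profile = pd.read_json('data/profile.json', orient='records', lines=True)
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

print(portfolio.shape, profile.shape, transcript.shape)
```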

2) Exploratory Data Analysis

Well, EDA is a standard first task for any dataset and problem statement. Here, we will go through it step by step.

① Portfolio.json: (metadata of offers)
• The dataset has 6 columns and 10 rows.
• This dataset has no null values nor duplicates.
• There are three types of offers: ‘BOGO’(Buy One Get One free), ‘informational’, and ‘discount’.
• Of the 10 offers, 4 are classified as “BOGO”, 4 as “discount”, and 2 as “informational”.

First five rows of Portfolio data

② Profile.json: (metadata of customers)
• The dataset has 5 columns and 17,000 rows.
• The dataset has no duplicated rows.
• The dataset has 2175 missing values in each of the ‘gender’ and ‘income’ variables.
• Customers’ ages range from 18 to 101, except for 2175 customers registered at age 118. I treated this specific age as an outlier, because something is clearly wrong with these 2175 rows of the dataset.
• The missing values in the ‘gender’ and ‘income’ variables belong solely and specifically to these same 2175 customers registered at age 118. In other words, customers at age 118 have no registered ‘gender’ or ‘income’.
• Customers’ incomes range from 30,000 to 120,000, with most falling between 50,000 and 75,000.
• According to the available data, customers fall into three ‘gender’ categories (M, F, and O). Keeping in mind the 2175 missing values noted above, male customers (8484) outnumber female customers (6129): 57% of customers are male compared to 41% female. A further 212 customers chose “O” as their gender.

First five rows of Profile data

③ Transcript.json: (customer-offer interaction data)
• The dataset has 4 columns and 306,534 rows.
• The dataset has no duplicated rows nor missing values.
• The ‘value’ column holds a dictionary, to which we can apply some feature engineering to extract data that should be useful to our future model.
• There are four types of events in this dataset: ‘transaction’, ‘offer received’, ‘offer viewed’, and ‘offer completed’.
• All events classified as ‘transaction’ have no ‘offer_id’ within their ‘value’ column (these checks are sketched below).
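
These observations all come from quick pandas checks. A minimal sketch, using the frames loaded earlier:

```python
# Shapes and missing values for each dataset.
for name, df in [('portfolio', portfolio), ('profile', profile),
                 ('transcript', transcript)]:
    print(name, df.shape)
    print(df.isnull().sum())

print(profile['gender'].value_counts())       # M, F and O counts
print((profile['age'] == 118).sum())          # the 2175 placeholder-age rows
print(transcript['event'].value_counts())     # the four event types

# 'value' is a dict per row; transactions carry an 'amount' key,
# offer events carry an offer id.
print(transcript['value'].iloc[0])
```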

First five rows of Transcript data

What do you think: who will buy more coffee in response to offers, men or women? The answer is ahead… stay tuned!

3) Data Pre-processing:

Now that we know what the data looks like, we can apply our observations and process it into the form the model requires.

① Data Pre-processing on Portfolio.json:
• Rename ‘id’ column to ‘offer_id’.
• Change the unit of the ‘duration’ column from days to hours.
• Rename ‘duration’ column to ‘duration_h’ representing that the unit of measurement is ‘hours’.
• Normalize ‘difficulty’ and ‘reward’ features using the MinMaxScaler.
• Create dummy variables from the ‘channels’ column using one-hot encoding, then drop the ‘channels’ column.
• Replace each ‘offer_id’ with a simpler numerical id.
• Replace the ‘offer_type’ with an integer representing each offer type, as follows (these steps are sketched in code after this list):
1: bogo
2: discount
3: informational
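
Pulled together, these steps might look roughly like this (a sketch; the variable names are mine, and the channel list reflects the four channels in this dataset):

```python
from sklearn.preprocessing import MinMaxScaler

clean_portfolio = portfolio.rename(columns={'id': 'offer_id'})

# Days -> hours, with the unit made explicit in the column name.
clean_portfolio['duration_h'] = clean_portfolio['duration'] * 24
clean_portfolio = clean_portfolio.drop(columns='duration')

# Scale 'difficulty' and 'reward' into the [0, 1] range.
scaler = MinMaxScaler()
clean_portfolio[['difficulty', 'reward']] = scaler.fit_transform(
    clean_portfolio[['difficulty', 'reward']])

# One-hot encode the list-valued 'channels' column, then drop it.
for channel in ['web', 'email', 'mobile', 'social']:
    clean_portfolio[channel] = clean_portfolio['channels'].apply(
        lambda chs: int(channel in chs))
clean_portfolio = clean_portfolio.drop(columns='channels')

# Simple integer ids instead of long hash strings (reused later).
offer_id_map = {oid: i + 1 for i, oid in enumerate(clean_portfolio['offer_id'])}
clean_portfolio['offer_id'] = clean_portfolio['offer_id'].map(offer_id_map)

# 1 = bogo, 2 = discount, 3 = informational.
clean_portfolio['offer_type'] = clean_portfolio['offer_type'].map(
    {'bogo': 1, 'discount': 2, 'informational': 3})
```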

② Data Pre-processing on Profile.json:
• Rename the ‘id’ column to ‘customer_id’.
• Re-arrange the columns so that ‘customer_id’ is the first column in the dataset.
• Replace the ‘customer_id’ string values with simple numerical values.
• Replace age = 118 with NaN.
• Remove customers (drop rows) with no ‘age’, ‘gender’, or ‘income’.
• Change the data type of ‘age’ and ‘income’ columns to ‘int’.
• Create a new column ‘age_group’ that includes the age_group to which each customer belongs.
• Replace the ‘age_group’ categorical label with a corresponding numerical label, as follows:
1: teenager
2: young-adult
3: adult
4: elderly
• Create a new column ‘income_range’ that includes the income-range to which the customer’s income belongs.
• Replace the ‘income_range’ categorical label with a corresponding numerical label, as follows:
1: average (30,000–60,000)
2: above-average (60,001–90,000)
3: high (more than 90,000)
• Replace the ‘gender’ categorical labels with the corresponding numerical label, as follows:
1: F (Female)
2: M (Male)
3: O
• Change the data type of the ‘became_member_on’ column from int to date, putting it in a readable format that can be analyzed easily if required.
• Add a new column ‘start_year’, holding the year in which the customer became a member, to the existing dataset (for further analysis).
• Add a new column ‘membership_days’, holding the number of days since the customer became a member, to the existing dataset (for further analysis).
• Create a new column ‘member_type’ representing the type of the member: new, regular, or loyal, depending on their number of ‘membership_days’.
• Replace the ‘member_type’ categorical label with a corresponding numerical label, as follows:
1: new (member for 1200 days or less)
2: regular (1201–2000 days of membership)
3: loyal (more than 2000 days of membership)
• Drop the ‘age’, ‘income’, ‘became_member_on’, and ‘membership_days’ columns, since they are no longer needed (these steps are sketched in code after this list).
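
A condensed sketch of the profile cleaning (the age-group cut-offs and the “today” reference date for ‘membership_days’ are my assumptions; the income and member-type bins mirror the list above):

```python
import numpy as np
import pandas as pd

clean_profile = profile.rename(columns={'id': 'customer_id'})

# Simple numerical ids for customers (reused later for the transcript data).
customer_id_map = {cid: i + 1 for i, cid in enumerate(clean_profile['customer_id'])}
clean_profile['customer_id'] = clean_profile['customer_id'].map(customer_id_map)

# Treat the placeholder age 118 as missing, then drop those rows
# (they are exactly the rows lacking 'gender' and 'income').
clean_profile['age'] = clean_profile['age'].replace(118, np.nan)
clean_profile = clean_profile.dropna(subset=['age', 'gender', 'income'])
clean_profile[['age', 'income']] = clean_profile[['age', 'income']].astype(int)

# 1 = teenager, 2 = young-adult, 3 = adult, 4 = elderly (assumed cut-offs).
clean_profile['age_group'] = pd.cut(
    clean_profile['age'], bins=[0, 19, 35, 60, 120], labels=[1, 2, 3, 4]).astype(int)

# 1 = average, 2 = above-average, 3 = high.
clean_profile['income_range'] = pd.cut(
    clean_profile['income'], bins=[0, 60000, 90000, np.inf], labels=[1, 2, 3]).astype(int)

# 1 = F, 2 = M, 3 = O.
clean_profile['gender'] = clean_profile['gender'].map({'F': 1, 'M': 2, 'O': 3})

# Parse the membership date and derive tenure features.
clean_profile['became_member_on'] = pd.to_datetime(
    clean_profile['became_member_on'].astype(str), format='%Y%m%d')
clean_profile['start_year'] = clean_profile['became_member_on'].dt.year
clean_profile['membership_days'] = (
    pd.Timestamp('today') - clean_profile['became_member_on']).dt.days

# 1 = new (<= 1200 days), 2 = regular (1201-2000), 3 = loyal (> 2000).
clean_profile['member_type'] = pd.cut(
    clean_profile['membership_days'], bins=[0, 1200, 2000, np.inf],
    labels=[1, 2, 3]).astype(int)

clean_profile = clean_profile.drop(
    columns=['age', 'income', 'became_member_on', 'membership_days'])
```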

③ Data Pre-processing on Transcript.json:
• Rename ‘time’ column to ‘time_h’ representing that the unit of measurement is ‘hours’.
• Rename ‘person’ column to ‘customer_id’.
• Replace the string values in the ‘customer_id’ column with the numerical ids created while preprocessing the profile data.
• Extract each key that exists in the ‘value’ column into a separate column.
• Fill all the NaNs in the ‘offer_id’ column with ‘N/A’ values (i.e. Not Applicable).
• Drop the ‘value’ column since it is no longer needed.
• Exclude all ‘transaction’ and ‘offer received’ events from our clean_transcript dataset.
• Replace the ‘event’ categorical labels with the corresponding numerical label, as follows:
1: offer completed
2: offer viewed
• Replace the string values in the ‘offer_id’ column with the corresponding numerical ids created during the portfolio preprocessing (these steps are sketched in code after this list).
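
Sketched in code, this could look as follows. One detail worth hedging: in the raw dataset the ‘value’ dicts use the key ‘offer id’ for received/viewed events but ‘offer_id’ for completed events, which is why both columns are merged below; if your copy of the data differs, adjust accordingly:

```python
clean_transcript = transcript.rename(
    columns={'time': 'time_h', 'person': 'customer_id'})

# Reuse the numerical customer ids created during profile preprocessing.
clean_transcript['customer_id'] = clean_transcript['customer_id'].map(customer_id_map)

# Unpack the 'value' dicts into flat columns.
values = pd.DataFrame(clean_transcript['value'].tolist())
values['offer_id'] = values['offer id'].combine_first(values['offer_id'])
clean_transcript = pd.concat(
    [clean_transcript.drop(columns='value').reset_index(drop=True),
     values[['offer_id', 'amount', 'reward']]], axis=1)

# Transactions carry no offer id; mark them explicitly.
clean_transcript['offer_id'] = clean_transcript['offer_id'].fillna('N/A')

# Keep only the events the model will learn from.
clean_transcript = clean_transcript[
    clean_transcript['event'].isin(['offer viewed', 'offer completed'])]

# 1 = offer completed, 2 = offer viewed.
clean_transcript['event'] = clean_transcript['event'].map(
    {'offer completed': 1, 'offer viewed': 2})

# Map raw offer hashes to the simple ids created for the portfolio data.
clean_transcript['offer_id'] = clean_transcript['offer_id'].replace(offer_id_map)
```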

You can access the code accompanying this blog here.

Tired of reading so much? Don’t worry… now we will play with charts and graphs.

4) Data Visualization:

Visualization is immensely important in every data project. We want to know how the data is distributed, where the outliers are, and much more.

For this example we will discuss business-related questions:

  • What is the most common offer for each age group (teenagers, young-adults, adults, and elderly)?

The most common offer type among all age groups is BOGO, followed by discount offers, while the least commonly sent are informational offers. I believe BOGO offers are simply more attractive than the other offers Starbucks provides.
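
This chart can be reproduced with a simple grouped count plot. A sketch, assuming a merged_df built by joining clean_transcript with clean_portfolio (on ‘offer_id’) and clean_profile (on ‘customer_id’):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Offers per age group, split by offer type
# (1 = BOGO, 2 = discount, 3 = informational after preprocessing).
sns.countplot(data=merged_df, x='age_group', hue='offer_type')
plt.xlabel('Age group (1=teenager, 2=young-adult, 3=adult, 4=elderly)')
plt.ylabel('Number of offers')
plt.title('Offer types per age group')
plt.show()
```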

  • Based on the demographic data, who falls into the highest income range: males or females?

Customers with high incomes (above 90,000) are mostly female, whereas average-income (30,000–60,000) customers are mostly male.

  • How many new members did Starbucks get each year?

The year 2017 was the best year for Starbucks in terms of the number of new members.

  • Of all the offers the customers viewed, how many did they complete? (Spoiler ALERT!!)

You got the answer to the question you were waiting for… females seem to be convinced by the offers more easily than males.
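
The underlying number is just the share of viewed offers that ended up completed, split by gender. A one-line sketch on the same assumed merged_df:

```python
# Events are 1 = completed and 2 = viewed, so the mean of (event == 1)
# is the completion rate per gender (1 = F, 2 = M, 3 = O).
completion_rate = merged_df.groupby('gender')['event'].apply(lambda e: (e == 1).mean())
print(completion_rate)
```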

Let’s move forward and feed this (merged) data into our models…

5) Data Modeling:

Here it comes: the crucial part of the ML pipeline, where we have to choose models while tweaking hyperparameters and several other things in parallel.

Before feeding data to a model, we have to identify the important features of our dataset: time, offer_id, amount, reward, difficulty, duration, offer_type, gender, age_group, income_range, and member_type. The target variable will be the event (offer completed or not).

In this approach, I have tried six models: Logistic Regression, Support Vector Machines, K-Nearest Neighbors, Naive Bayes, Decision Trees, and Random Forest.
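
A sketch of the comparison (again assuming merged_df; the split parameters are my choices, and ‘amount’ is filled with 0 since offer events carry no transaction amount):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

features = ['time_h', 'offer_id', 'amount', 'reward', 'difficulty',
            'duration_h', 'offer_type', 'gender', 'age_group',
            'income_range', 'member_type']
X = merged_df[features].fillna(0)   # offer events have no 'amount'
y = merged_df['event']              # 1 = completed, 2 = viewed

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'SVM': SVC(),
    'KNN': KNeighborsClassifier(),
    'Naive Bayes': GaussianNB(),
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier(),
}

# Train/test accuracy side by side makes overfitting easy to spot.
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    print(f'{name}: train {train_acc:.3f}, test {test_acc:.3f}')
```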

6) Results & Conclusion:

Results summary

Some of these models overfitted, but I chose KNN, which reached ~98% accuracy.

As I believe in the saying that “there is always room for improvement”, I am planning to try this data with techniques such as GridSearchCV (in an sklearn pipeline) with XGBoost.
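
That future experiment might look something like this (a sketch assuming the xgboost package; the parameter grid is illustrative, and the 1/2 event labels are shifted to 0/1 as XGBoost expects):

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {
    'n_estimators': [100, 300],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.05, 0.1],
}

# Shift labels from 1/2 to 0/1 for XGBoost.
search = GridSearchCV(XGBClassifier(eval_metric='logloss'),
                      param_grid, cv=5, scoring='accuracy')
search.fit(X_train, y_train - 1)
print(search.best_params_, search.best_score_)
```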

Along the same lines, improving the data collection or fixing the issues around missing values and NaNs would be really helpful. In my opinion, we can get great insights from this dataset, and strong prediction models can be built to solve the problem statements associated with it.

Thank you for sparing your valuable time to read this blog; I hope you liked it. If you have any queries, respond below or contact me here on LinkedIn. (Reminder: applaud this blog 😉)

Only short illustrative sketches appear above; you can access the full code in this GitHub Repository.