Your Chances To Respond To Starbucks offer.

Original article was published on Deep Learning on Medium




The case we discuss here is a real-life marketing strategy study based on a simulated data set that mimics customer behavior on the Starbucks rewards mobile app.

The goal is to combine transaction, demographic, and offer data to analyze which demographic groups respond best to which offer type. We will also build a supervised learning model (specifically, a classifier) that predicts whether or not someone will respond to an offer.

Similar machine learning prediction problems include Finding Donors for CharityML and Boston House Price Prediction, both of which used supervised learning models.

Part I. Data Exploration and Visualization

We will first take a glimpse at the three JSON data sets given:

  1. For the portfolio data, the values in the “channels” column are lists containing one or more of “email”, “mobile”, “social” and “web”, so we expanded them into dummy variables using the code below:
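A minimal sketch of this transformation, using two made-up rows in place of loading the real portfolio file:

```python
import pandas as pd

# Two sample rows standing in for the real portfolio data (assumed shape)
portfolio = pd.DataFrame({
    'id': ['offer_a', 'offer_b'],
    'channels': [['email', 'web'], ['email', 'mobile', 'social']],
})

# One dummy column per channel: 1 if the offer uses that channel, else 0
for ch in ['email', 'mobile', 'social', 'web']:
    portfolio[ch] = portfolio['channels'].apply(lambda lst, c=ch: int(c in lst))
portfolio = portfolio.drop(columns='channels')
```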

and we got:

Portfolio Raw Data

which clearly shows that there are 10 offers, each with its difficulty (i.e., the spending threshold to earn the reward), duration, offer id, offer type, reward, and channels.

2. Next, we take a look at profile data:

We can see from the data set that each record lists the age, the date the customer became a member (in yyyymmdd format), gender (M, F, O, or NaN), person id, and income.

It is useful to visualize the age and income distributions of our customer base.

For age, it is clearer to group the customers into 5-year intervals using the code below:
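A sketch of the binning step, with a few made-up ages standing in for the profile data (to draw the bar chart, `counts.plot(kind='bar')` would follow):

```python
import pandas as pd

# Sample ages standing in for profile['age']; 118 marks a missing age
ages = pd.Series([25, 37, 52, 58, 118])

# 5-year bins from 15 up to 120, closed on the left
bins = range(15, 125, 5)
age_group = pd.cut(ages, bins=bins, right=False)
counts = age_group.value_counts().sort_index()
# counts.plot(kind='bar') would then draw the chart
```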

then we got the bar chart:

Number of Persons by Age Group

Since customers who did not enter their age are recorded as 118 years old, we can see from the chart that this group contains over 2,000 people. Apart from this group, most customers are between 50 and 60 years old.

For Income, we drew a pie chart below:

Number of Persons by Income Group

It appears that most customers have an income between 50,000 and 80,000.

3. We then explore the transcript data set:

For the events “offer received” and “offer viewed”, the value contains only the offer id:

Transcript of Offers Received and offers Viewed

For the event “offer completed”, the offer id as well as the reward amount are shown:

Transcript of Offers Completed

The transcript also includes transaction data, with the amount attached:

Transaction Data

However, it does not indicate which offer, if any, a transaction is associated with; therefore, we need a data preprocessing step to solve this problem and facilitate our analysis in the following sections.

Part II. Data Preprocessing

  1. We used the code below to extract the time, reward, and transaction amount (which can be NaN) for each record in the transcript data set:
Code to Preprocess the Transcript
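A sketch of that preprocessing on three made-up events; it assumes the raw “value” column holds a dict whose keys vary with the event type (‘offer id’/‘offer_id’, ‘reward’, ‘amount’):

```python
import pandas as pd

# Three sample events standing in for the raw transcript
transcript = pd.DataFrame({
    'person': ['p1', 'p1', 'p1'],
    'event': ['offer received', 'offer completed', 'transaction'],
    'time': [0, 132, 132],
    'value': [{'offer id': 'A'},
              {'offer_id': 'A', 'reward': 5},
              {'amount': 12.5}],
})

# Pull the offer id (either key spelling), reward, and amount into columns
transcript['offer_id'] = transcript['value'].apply(
    lambda v: v.get('offer id', v.get('offer_id')))
transcript['reward'] = transcript['value'].apply(lambda v: v.get('reward'))
transcript['amount'] = transcript['value'].apply(lambda v: v.get('amount'))
transcript = transcript.drop(columns='value')
```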


Processed Transcript

2. Based on the above data frame, we want a clearer view of the offer information as well as the related customer information, so we merged it with the portfolio and profile data:

Preprocessed Transcript with the person and other information.
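The merge can be sketched as follows, assuming id columns named as in the raw files (tiny made-up tables stand in for the real ones):

```python
import pandas as pd

# Tiny stand-ins for the processed transcript, portfolio and profile tables
transcript = pd.DataFrame({'person': ['p1'], 'offer_id': ['A'], 'time': [0]})
portfolio = pd.DataFrame({'id': ['A'], 'offer_type': ['bogo'], 'reward': [5]})
profile = pd.DataFrame({'id': ['p1'], 'age': [55], 'income': [72000.0]})

# Left-join offer details, then customer details, onto each transcript row
combined = (transcript
            .merge(portfolio, left_on='offer_id', right_on='id', how='left')
            .merge(profile, left_on='person', right_on='id',
                   how='left', suffixes=('_offer', '_person')))
```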

3. Note that it is tricky to figure out how many completed offers came from offers the person had viewed beforehand, because some offers were completed first and only viewed afterward. This distinction matters because even though a customer made the transaction and completed the offer, he or she may not have been aware of the offer. In other words, the action may not have been offer-driven, and hence should not be counted as a response to the offer.

Here is the pseudo-code for separating viewed-and-completed vs. not-viewed-and-completed offers:

Pseudo Code Separating Viewed&Completed vs. No viewed & Completed Offers
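The idea can be sketched as follows (column and event names are assumptions): a completion counts as viewed-and-completed only if a view of the same offer by the same person precedes it in time.

```python
import pandas as pd

# Sample events: p1 views offer A before completing it; p2 completes offer B
# first and only views it afterwards
events = pd.DataFrame({
    'person':   ['p1', 'p1', 'p2', 'p2'],
    'event':    ['offer viewed', 'offer completed',
                 'offer completed', 'offer viewed'],
    'offer_id': ['A', 'A', 'B', 'B'],
    'time':     [10, 50, 30, 60],
})

def split_completions(events):
    """Return (person, offer_id) lists for viewed-and-completed vs.
    not-viewed-and-completed offers."""
    first_view, viewed, not_viewed = {}, [], []
    for _, row in events.sort_values('time').iterrows():
        key = (row['person'], row['offer_id'])
        if row['event'] == 'offer viewed':
            first_view.setdefault(key, row['time'])
        elif row['event'] == 'offer completed':
            # viewed-and-completed only when a view precedes the completion
            if key in first_view and first_view[key] <= row['time']:
                viewed.append(key)
            else:
                not_viewed.append(key)
    return viewed, not_viewed

viewed, not_viewed = split_completions(events)
```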

We then got the data set below, showing the number of received offers as well as the number of viewed-and-completed offers, both in total and per offer type:

Separated Viewed & Completed vs. Not Viewed & Completed Offer Records

4. Similar to step 3, I attempted to separate “received and completed” vs. “completed before received” offers to see whether any offers were completed without being received. However, the result shows no such records, so I do not include that process in this article.

5. Next, as discussed above, the transaction records in the transcript give only the person, amount, and time, without telling us whether the transaction is related to any offer. Hence, we need to differentiate among total transactions, transactions related to viewed-and-completed offers, and transactions related to not-viewed-and-completed offers.

Here is the pseudo-code for such actions:

Pseudo Code for Separating Viewed & Completed vs. Not Viewed & Completed Transactions
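One way to sketch the attribution, assuming a completion event fires at the same person and timestamp as its triggering transaction (column names here are assumptions):

```python
import pandas as pd

# Sample rows: the t=132 transaction matches a completion for the same
# person at the same time; the t=200 transaction matches nothing
transactions = pd.DataFrame({
    'person': ['p1', 'p1'], 'time': [132, 200], 'amount': [12.5, 3.0]})
completions = pd.DataFrame({
    'person': ['p1'], 'time': [132], 'viewed_before': [True]})

# Tag each transaction with the completion (if any) at the same person/time
tagged = transactions.merge(completions, on=['person', 'time'], how='left')
matched = tagged['viewed_before'].fillna(False).astype(bool)
view_complete_tran = tagged.loc[matched, 'amount'].sum()
other_tran = tagged.loc[~matched, 'amount'].sum()
```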

and the result is as follows:

Separated Viewed & Completed vs. Not Viewed & Completed Transaction Records

Part III. Customer Behavior Analysis

We first combined the profile with the data sets generated in steps 3 and 5 above, obtaining customer information, offers, and transactions in one table:

Data Set Combining Customer Information, Offers, and Transactions

For visualization purposes, we applied the processes below to create a new variable “responded”, which is “T” when the viewed-and-completed offer amount is non-zero and “F” otherwise, and encoded the gender variable as 0, 1, 2 so it can be shown on an axis:
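A sketch of the encoding, assuming the viewed-and-completed amount lives in a column named `view_complete_tran` (as in the pair plot below):

```python
import pandas as pd

# Sample combined table; view_complete_tran is the viewed-and-completed
# offer amount per customer
df = pd.DataFrame({'gender': ['M', 'F', 'O'],
                   'view_complete_tran': [0.0, 25.0, 0.0]})

# "responded" is T when the viewed-and-completed amount is non-zero
df['responded'] = df['view_complete_tran'].apply(lambda a: 'T' if a > 0 else 'F')
# numeric gender codes so the variable can appear on a plot axis
df['gender'] = df['gender'].map({'M': 0, 'F': 1, 'O': 2})
```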

  1. We then drew a pair plot for the variables ‘age’, ‘gender’, ‘income’, ‘became_member_on’, ‘view_complete_tran’, and ‘responded’, filling any NaN with 0:
Pair-plot for Multiple Variables

From the above graph, we can clearly see that the response is positively related to age, income, and became_member_on (concluded from the last row of the graph).

Also, we can see orange lines in the three plots (age, income, became_member_on) of the last row. Those customers might be ones who newly registered on the mobile app and have low income.

From the gender-gender plot, we can see that females tend to respond to offers more than males and those who entered “Other” or did not answer.

Furthermore, those who didn’t fully provide their personal information tend not to respond to the offer.

2. Next, we make an analysis based on the offer type:

Calculation Based on the Offer type
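The per-offer-type calculation can be sketched as below; the counts are illustrative placeholders, not the real aggregates:

```python
import pandas as pd

# Placeholder per-offer-type counts standing in for the real aggregates
offers = pd.DataFrame({
    'offer_type': ['bogo', 'discount', 'informational'],
    'received':   [100, 100, 100],
    'viewed_completed': [40, 55, 0],
})

# Completion rate per offer type, plus the overall rate
offers['complete_rate'] = offers['viewed_completed'] / offers['received']
overall_rate = offers['viewed_completed'].sum() / offers['received'].sum()
```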

From the above, we can see that the overall completion rate is about 44%, and the discount offer appears more popular than BOGO and informational offers. In particular, the completion rate for the informational offer is 0, which suggests that this type of offer simply provides information and hardly inspires people to buy the products.

3. Further, we computed the same statistics based on offer channels:

Calculations Based on the offer channels

From the above, the web appears to be the most efficient channel, contributing the highest completion rate.

Part IV. Data Modeling

In this part, we are going to build a machine learning model that predicts whether or not someone will respond to an offer.

  1. From the above analysis, we know that age, gender, income, membership date, and offer type can affect whether a customer responds to an offer.

Since the offer records generated in Part II, step 3 are unique per customer and only contain counts of completed/received/viewed-and-completed offers of each type, while each customer may receive more than one offer, we need to preprocess the data again so that the train/test sets contain no person id: the features “X” are age, income, gender, membership date, offer type, etc., and the label “y” is “responded”.

Also, note that the membership date appears in yyyymmdd format; it should either be turned into an ordinal variable or have its year, month, and day extracted to capture the real sequence. Otherwise, using the raw yyyymmdd number leads to strange results: the pairs 20170831 vs. 20170901 and 20170901 vs. 20170902 are both one day apart, yet their numeric differences are not the same. Here, we chose to extract the year and month of the membership date.
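A sketch of the extraction with a few sample dates (the ordinal alternative would be `parsed.map(pd.Timestamp.toordinal)`):

```python
import pandas as pd

# became_member_on stored as an integer like 20170831 (yyyymmdd)
dates = pd.Series([20170831, 20170901, 20170902])
parsed = pd.to_datetime(dates, format='%Y%m%d')

# Year and month features that respect the real date sequence
member_year = parsed.dt.year
member_month = parsed.dt.month
```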

Furthermore, the “gender” values are M, F, O, and NaN, while the “offer_type” values are BOGO, discount, and informational; both are categorical. However, most machine learning algorithms require numeric input, so these two variables should be turned into dummy variables.

Due to limited time, I simply dropped all the NaNs, applied the steps above (including train_test_split), and obtained the train/test data sets we want:

Preprocessed Training Data
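Those steps can be sketched as below on a tiny made-up table (column names are assumptions):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny stand-in for the cleaned table, NaNs already dropped
df = pd.DataFrame({
    'age': [25, 55, 40, 61, 33, 48],
    'income': [40000, 80000, 60000, 90000, 50000, 70000],
    'gender': ['M', 'F', 'F', 'M', 'O', 'F'],
    'offer_type': ['bogo', 'discount', 'informational',
                   'discount', 'bogo', 'discount'],
    'member_year': [2017, 2016, 2018, 2015, 2017, 2016],
    'responded': [0, 1, 0, 1, 0, 1],
})

# Dummy-encode the categorical columns, then split features and labels
X = pd.get_dummies(df.drop(columns='responded'),
                   columns=['gender', 'offer_type'])
y = df['responded']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
```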

Observing the labels, we also found that they are imbalanced, at approximately 2:1 positive vs. negative:

Value counts for Train/Test Labels

Hence, we should use the F1 score instead of accuracy, since the F1 score balances recall and precision and copes better with imbalanced labels.

Recall the definition of the F1 score: F1 = 2 · precision · recall / (precision + recall).

We then scaled X_train and X_test to [0, 1], fit the training data with several classifiers (in their default settings), and obtained the F1 score (ranging from 0 to 1) for each model, as shown below:
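A sketch of the comparison on synthetic stand-in data; the model list and scores here are illustrative, not the article's actual results:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the preprocessed features and labels
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training data only, then scale both splits to [0, 1]
scaler = MinMaxScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Fit each classifier with default settings and record the test F1 score
scores = {}
for clf in (LogisticRegression(),
            RandomForestClassifier(random_state=0),
            AdaBoostClassifier(random_state=0)):
    clf.fit(X_train_s, y_train)
    scores[type(clf).__name__] = f1_score(y_test, clf.predict(X_test_s))
```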

We found that with default settings, AdaBoost performed best. Hence, we ran a grid search on it to find the best parameters:

GridSearch for AdaBoost
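A sketch of such a grid search; the parameter grid here is an illustrative assumption rather than the one actually used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed training data
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Small illustrative grid, scored by F1 to match the metric above
param_grid = {'n_estimators': [50, 100], 'learning_rate': [0.5, 1.0]}
grid = GridSearchCV(AdaBoostClassifier(random_state=0),
                    param_grid, scoring='f1', cv=3)
grid.fit(X, y)
best_model = grid.best_estimator_
```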

Saving the best parameters:

Best Model

2. Next, we built a prediction engine that takes in customer info, offer type, etc., transforms this information into the same format as the test data, and then predicts whether the customer will respond based on the raw profile and offer information, rather than simply reporting test-set performance without any real-life usage.

Prediction Engine Testing

After building the prediction engine, we tested it using a random sample of the raw profile together with one offer type (‘BOGO’, ‘discount’, or ‘informational’) and found that it works. The function not only tells you whether the customer responds, it also returns the customer info so you can check whether the response makes sense. At first glance of the data above, the first customer has a high income and is female (in the above analysis, high-income and female customers tend to respond to offers), while the second is a lower-income male who does not respond. These responses may also reflect the fact that the ‘discount’ offer is more popular than ‘BOGO’.


To conclude, in this project, we:

  • Cleansed the offer data so that we could separate the completed offers into A. viewed-and-completed and B. not-viewed-and-completed offers. The reason is that even though a customer made the transaction and completed the offer, he or she may not have been aware of the offer; in other words, the action may not have been offer-driven, and hence should not be counted as a response;
  • Cleansed the offer data so that we could separate the completed offers into A. received-and-completed and B. not-received-and-completed offers, to see whether anyone completed an offer without actually receiving it. However, there are no not-received-and-completed records in our data;
  • Compared the transaction time and the offer completion time to determine whether a transaction is associated with a completed offer (more specifically, a viewed-and-completed offer, a not-viewed-and-completed offer, or another offer), because the raw data give only the person, amount, and time, without telling us whether a transaction relates to any offer;
  • Analyzed the completion rate by demographics, offer type, and channel;
  • Finally, built a machine learning model to predict a customer's response given the customer's age, gender, income, the offer type, and other information.

Possible Enhancement in the Future:

  • For convenience, I dropped all the NaNs; in the future, other techniques can be used, such as imputing the mean or median, or simply keeping NaN as its own value, since some customers will always leave their information blank;
  • This is a classification model; alternatively, a regression model could be built to predict how much someone will spend (i.e., transaction amount) based on demographics and offer type;
  • A web app can be built so that when the customer information is input, the predicted response or transaction amount is output.
  • For the project code, click on the GitHub links given below.