Marketing Email Campaign

Source: Deep Learning on Medium


Recently, I went through a “Machine Learning challenge” for a big UAE company.

Email is a natural channel for marketing teams to reach customers. You receive so many emails each day, sometimes so many that you can’t open them all, let alone read them. So, as a marketing team, how do you tailor an email campaign to strike a balance and send the right message to the interested person?

In this article I will share how a marketing team can send the right marketing email to the right person using machine learning.

The data is confidential, which is why I will share only a description of the data and how I addressed the problem with a simple deep learning model, plus a bonus at the end 🙂

email_table — info about each email that was sent

email_text — the email’s text (short and long versions)

email_opened_table — the ids of the emails that were opened at least once

link_clicked_table — the ids of the emails whose inner link was clicked at least once


The “challenge” description:

The marketing team of an e-commerce site has launched an email campaign. This site has email addresses from all the users who created an account in the past.

They have chosen a random sample of users and emailed them. The email lets the user know about a new feature implemented on the site. From the marketing team perspective, success is if the user clicks on the link inside of the email. This link takes the user to the company site. You are in charge of figuring out how the email campaign performed and were asked the following questions:

• What percentage of users opened the email, and what percentage clicked on the link within the email?

• The VP of marketing thinks that it is stupid to send emails in a random way. Based on all the information you have about the emails that were sent, can you build a model to optimize, in the future, how to send emails to maximize the probability of users clicking on the link inside the email?

• By how much do you think your model would improve the click-through rate (defined as # of users who click on the link / total users who receive the email)? How would you test that?

• Did you find any interesting patterns in how the email campaign performed for different segments of users? Explain.
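The click-through rate in the question above is a simple ratio; a quick sketch, using hypothetical counts for illustration only (the real numbers are confidential):

```python
# Click-through rate (CTR) = users who clicked the link / users who received the email
# The counts below are made up, for illustration only.
emails_sent = 100_000
link_clicks = 2_100

ctr = link_clicks / emails_sent
print(f"CTR: {ctr:.2%}")  # prints "CTR: 2.10%"
```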

Let’s start now:

Import the right libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import warnings

After loading the data, let’s explore the data size: what percentage of users opened the email, and what percentage clicked on the link within the email?

print("emails {}, opened {}, clicked {}, email opened {:.2f}%, link clicked {:.2f}%".format(
    emails.shape[0], email_opened.shape[0], link_clicked.shape[0],
    100 * email_opened.shape[0] / emails.shape[0],
    100 * link_clicked.shape[0] / emails.shape[0]))

**Data exploration: how the email campaign performed for different segments of users**

click_vals = set(link_clicked['email_id'].values)  # set lookup is O(1) per email
def clicks_encoder(x):
    # 1 if this email's link was clicked at least once, else 0
    return 1 if x in click_vals else 0
emails['click'] = emails['email_id'].apply(clicks_encoder)
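As a side note, the same 0/1 label can be built in a single vectorized call with pandas `isin`. A self-contained sketch with toy ids, since the real tables are confidential:

```python
import pandas as pd

# toy stand-ins for the confidential tables
emails = pd.DataFrame({'email_id': [1, 2, 3, 4, 5]})
link_clicked = pd.DataFrame({'email_id': [2, 5]})

# vectorized equivalent of clicks_encoder: 1 if the email's link was clicked
emails['click'] = emails['email_id'].isin(link_clicked['email_id']).astype(int)
print(emails['click'].tolist())  # [0, 1, 0, 0, 1]
```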

> Insights on how the email campaign performed for different segments of users:

* The numbers are not encouraging: only 10% of the sent emails were opened, and only 2% had the link inside them clicked
* Overall, non-clicks vastly outnumber clicks (98% vs 2%): this shows that random sampling is not a good idea, as we need to target interested customers

From data analysis we can see that:

* Short, personalized emails attracted more clicks
* The highest click rates occur between 5 AM and 1 PM
* Monday, Tuesday, Wednesday, and Thursday seem to be favorable days for clicks
* US customers are more interested in the emails → more clicks
* In the next section you can find conversion rates: they confirm these insights
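Segment insights like these come from grouped click rates: average the 0/1 click label within each segment. A minimal sketch, with made-up data since the real tables are confidential:

```python
import pandas as pd

# toy email table with the click label already attached
emails = pd.DataFrame({
    'weekday': ['Monday', 'Monday', 'Saturday', 'Saturday'],
    'user_country': ['US', 'FR', 'US', 'FR'],
    'click': [1, 0, 0, 0],
})

# click rate per segment: the mean of a 0/1 label is the segment's click rate
print(emails.groupby('weekday')['click'].mean())
print(emails.groupby('user_country')['click'].mean())
```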

**Building a sequential dense model to predict the emails with a high chance of a click on the link**

#adding the opened/clicked labels to the email table
#(reconstructed join: the original argument was lost in formatting; opened_flags and
# clicked_flags are 0/1 indicator frames built from email_opened and link_clicked)
opened_flags = email_opened.set_index('email_id').assign(opened=1)
clicked_flags = link_clicked.set_index('email_id').assign(clicked=1)
emails_joined = emails.set_index('email_id').join(opened_flags).join(clicked_flags)
emails_joined = emails_joined.fillna(0)

### Label encoding

#this is the helper for label encoding
#the helper file is attached if the MAF team is used to calling helpers
from sklearn.preprocessing import LabelEncoder

def encode_label(values):
    encoder = LabelEncoder()
    return encoder.fit_transform(values), encoder

#encoding the categorical features, as one can only feed numbers to the neural nets
emails_joined["email_text"], email_text_encoder = encode_label(emails_joined["email_text"].values)
emails_joined["email_version"], email_version_encoder = encode_label(emails_joined["email_version"].values)
emails_joined["weekday"], weekday_encoder = encode_label(emails_joined["weekday"].values)
emails_joined["user_country"], user_country_encoder = encode_label(emails_joined["user_country"].values)

Train and test split

from sklearn.model_selection import train_test_split

x_cols = ["email_text", "email_version", "weekday", "user_country", "hour"]
y_cols = ["opened", "clicked"]
x_data = emails_joined[x_cols]
y_data = emails_joined[y_cols]
X_train, X_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.10, random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

As you can see, the data is not that big.
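Since only about 2% of emails get a click, it is worth keeping the class ratio identical in train and test; `train_test_split` supports this via its `stratify` parameter (shown here on a single label, with made-up arrays):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# imbalanced toy labels: 10 positives out of 100
y = np.array([1] * 10 + [0] * 90)
X = np.arange(100).reshape(-1, 1)

# stratify=y keeps the 10% positive rate in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y)
print(y_tr.mean(), y_te.mean())  # 0.1 0.1: class ratio preserved
```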

### Data scaling

#data normalizing method: fit on the train set, reuse the same scaler for test
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X_train_standard = ss.fit_transform(X_train)

### Model Implementation

from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(12, input_dim=5, activation='relu'))
model.add(Dense(12, activation='relu'))
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X_train_standard, y_train, epochs=10, batch_size=32)


X_test_standard = ss.transform(X_test)
model.evaluate(X_test_standard, y_test)

**This model predicts, for each email, how strong a response it will get. You can then send to the users with the highest predicted response: for new emails, we collect the features, feed them to the model, and get the result.** The validation accuracy is 0.936.
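“Send to the users with the highest predicted response” boils down to ranking candidates by predicted click probability and keeping the top k. A numpy-only sketch, with hypothetical probabilities standing in for the model’s output:

```python
import numpy as np

# hypothetical predicted click probabilities for 6 candidate users
click_probs = np.array([0.02, 0.31, 0.07, 0.55, 0.01, 0.18])

k = 3  # email budget: target only the top-3 users
top_k = np.argsort(click_probs)[::-1][:k]
print(top_k)  # indices of users to email: [3 1 5]
```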


#testing on the held-out set by calling the trained model
def get_label(confidence):
    return "yes" if confidence >= 0.5 else "no"

predictions = model.predict(X_test_standard)
predicted_clicked = []
actual_clicked = []
predicted_opened = []
actual_opened = []
for y_hat, y_true in zip(predictions, y_test.values):
    for y_h, y_t, y_col in zip(y_hat, y_true, y_cols):
        if y_col == "clicked":
            predicted_clicked.append(get_label(y_h))
            actual_clicked.append("yes" if y_t == 1 else "no")
        else:
            predicted_opened.append(get_label(y_h))
            actual_opened.append("yes" if y_t == 1 else "no")
result = pd.DataFrame()
result["actual_clicked"] = actual_clicked
result["actual_opened"] = actual_opened
result["predicted_clicked"] = predicted_clicked
result["predicted_opened"] = predicted_opened

**Comparison with a Random Forest Classifier**

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X_train_standard, y_train)
y_pred = rf.predict(X_test_standard)
print("Accuracy Metric")
print(accuracy_score(y_test, y_pred))

The dense model performed better at predicting clicks and can be trusted to optimize how emails are sent.
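One caveat when comparing the models: with ~98% “no click” labels, plain accuracy is misleading, since a model that always predicts 0 already scores ~0.98. The `precision_recall_fscore_support` imported at the top is the right tool for this. A sketch on toy labels:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# toy imbalanced labels: 2 clicks out of 10, and the model catches only one
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

print(accuracy_score(y_true, y_pred))  # 0.9: looks good despite a missed click
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='binary')
print(prec, rec)  # precision 1.0, recall 0.5 for the 'click' class
```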

Thank you for encouraging this work. I hope it will help the community!