Original article was published on Artificial Intelligence on Medium
How to Create Interactive Visualizations with Plotly Express
A practical guide with many examples
In this post, we will go through many examples while increasing the level of complexity step by step. We will explore the effect of each feature added to the visualizations.
If you don’t have plotly.py installed in your working environment, you can install it using pip or conda:
$ pip install plotly==4.8.0
$ conda install -c plotly plotly=4.8.0
Let’s start by importing plotly express:
import plotly.express as px
For the examples, we will use two different datasets. One is the “Telco Customer Churn” dataset available on Kaggle. The other is the gapminder dataset, which ships with the plotly library. These built-in datasets come in handy for practicing.
Churn prediction is a common use case in machine learning. If you are not familiar with the term, churn means “leaving the company”. It is critical for a business to understand why and when customers are likely to churn. A robust and accurate churn prediction model helps a business take action to prevent customers from leaving.
We will explore the dataset to understand its underlying structure. The original dataset contains 20 features (independent variables) and 1 target (dependent) variable for 7043 customers. We will only use 7 features and the target variable in this post.
import pandas as pd
churn = pd.read_csv("Telco-Customer-Churn.csv")
churn = churn[['gender', 'Partner', 'tenure', 'PhoneService', 'InternetService', 'Contract', 'MonthlyCharges', 'Churn']]
We start with a basic box plot to check the distribution of monthly charges according to contract types:
fig = px.box(churn, x="Contract", y="MonthlyCharges")
fig.show()
The taller the box, the more spread out the values are. This plot tells us that the range of monthly charges is wider for long-term contracts. Hovering over a box shows its key values: min, first quartile, median, third quartile, and max.
We can use different colors for different groups with the color parameter, and add another variable for comparison with the facet_col parameter.
fig = px.box(churn, x="Contract", y="MonthlyCharges",
It seems like having a partner does not change the contract type dramatically.
Scatter plots are also commonly used to understand the relationship between variables. For a clear demonstration, I will take the first 200 rows of the dataset.
churn_filtered = churn.iloc[:200,:]
We can check the relationship between tenure and monthly charges, and how this relationship changes according to contract type and having a partner. The tenure variable is the number of months that a customer has stayed with the company.
fig = px.scatter(churn_filtered,
The facet_col parameter creates subplots based on the specified variable. The facet_col_wrap parameter sets the maximum number of subplot columns, so additional subplots wrap onto new rows.
What this plot tells us is that customers without partners tend to have month-to-month contracts. Also, customers with partners stay with the company for a longer period (high tenure). This is a subset of the original dataset but, according to these 200 rows, the company sells more month-to-month contracts than one-year or two-year contracts. Each point in the plot represents a customer, and we can see the underlying data by hovering over a point.
We can also confirm our intuition by checking the averages with the pandas groupby function.
For each contract type, tenure is higher for customers with a partner. Also, in the month-to-month segment, customers without a partner outnumber those with one.
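The groupby check described above could look roughly like the following. The exact call is not shown in the text, so this is a sketch on made-up rows, with the grouping columns and the aggregations (mean and count of tenure) chosen as assumptions:

```python
import pandas as pd

# Made-up rows that only mimic the churn column names (illustrative values)
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 2 + ["Two year"] * 2,
    "Partner": ["No", "No", "Yes", "No", "No", "Yes", "No", "Yes"],
    "tenure": [1, 5, 12, 3, 20, 34, 48, 70],
})

# Average tenure and number of customers per (contract type, partner) group
summary = toy.groupby(["Contract", "Partner"])["tenure"].agg(["mean", "count"])
print(summary)
```

Running the same aggregation on churn_filtered or churn would give the group averages and sizes that the scatter plot only suggests visually.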
Let’s try to see the churn rate with respect to monthly charges, contract type, and tenure. We also add a title to the plot:
fig = px.scatter(churn_filtered, x="tenure", y="MonthlyCharges",
title="Churn Rate Analysis")
fig.show()
As we see on the plot above, it is highly unlikely that a customer with a long-term contract will churn (i.e. leave the company). If the company wants to retain its customers, the priority should be signing long-term contracts.
We can also add an indication of the distributions to a scatter plot using the marginal_x and marginal_y parameters. Let’s plot the entire dataset this time and check whether our 200-row sample is actually a good representation of the whole:
fig = px.scatter(churn,
Let’s first evaluate the x-axis. For tenures of less than 10 months, red points (churn=yes) dominate. As tenure increases, blue points (churn=no) become the dominant class. We also see this on the histogram above the scatter plot, which shows how the distribution of red and blue points changes along the x-axis. Most of the customers who churned have a tenure of less than 10 months.
The y-axis indicates monthly charges. The density of red points in the scatter plot increases as we go up the y-axis (i.e. as monthly charges increase). This can also be seen on the rug plot on the right side of the scatter plot, where the horizontal lines are denser in the upper part. The blue points are more uniformly distributed than the red points, with the exception of the bottom part.