Source: Deep Learning on Medium
Cool plots with Seaborn barplot with hue and proportions
Plot side-by-side bar charts to compare proportions across strata of different populations. Follow us for beginner-friendly, succinct, ready-to-use tutorials like this. Get premium interview and full course content at uniqtech.substack.com.
This article presumes that you are familiar with basic data visualizations such as bar plots and scatter plots. This tutorial uses an additional, third dimension, hue, to create a side-by-side plot. This visualization displays multiple vertical bars from two or more populations on the same plot.
On this Stack Overflow page, you can see the classic Titanic dataset visualized by class and gender. Source here
First, this is the proportion or ratio data we will use in this tutorial.
There are two populations with pre-calculated proportions. A real-world scenario could be the share of male vs. female students in computer science classes (group 1) versus economics classes (group 2). From the numbers alone, the reader needs a little mental math and one extra comparison step to notice the disparity. We can make that obvious with a Seaborn visualization.
Note that the table above is already a summary table. Often you will need to calculate your own proportions and summary table from thousands or even millions of rows. If you use countplot for this task, the proportions won’t turn out correctly unless you do some preprocessing, because countplot shows raw counts rather than proportions. Students taking the intro to machine learning nanodegree at Udacity may encounter this task in their final customer segmentation project, where this plot is very useful. For preprocessing, use Pandas.
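As a rough sketch of that preprocessing step, the snippet below computes within-group proportions from row-level data. The `raw` dataframe and its values are made up purely for illustration, and the column names `group`, `gender`, and `percent` are assumptions chosen to match the rest of this tutorial.

```python
import pandas as pd

# Hypothetical row-level data: one row per student (values invented for illustration)
raw = pd.DataFrame({
    "group":  [1, 1, 1, 2, 2, 2, 2],
    "gender": ["M", "M", "F", "M", "F", "F", "M"],
})

# Share of each gender within each group, reshaped into a flat summary table
summary = (
    raw.groupby("group")["gender"]
       .value_counts(normalize=True)  # proportions instead of raw counts
       .rename("percent")             # name the value column explicitly
       .reset_index()
)
print(summary)
```

The resulting `summary` table has one row per (group, gender) pair, which is the shape the barplot with hue expects.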
In this tutorial, we stay focused on the visualization part. Let’s translate this chart into a Pandas dataframe that will make plotting hue charts way easier.
We actually had to augment the table, making it slightly more complex and redundant, for the Seaborn barplot API with hue to work seamlessly. We prefer pre-calculating proportions because we know we can trust the numbers. Aggregating in code is great in theory but prone to mistakes. One can always use aggregate functions to double-check the work, though.
This redundancy allows us to use group as x-axis and gender as hue.
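import pandas as pd

# Convert the table to a Pandas DataFrame, one row per (group, gender) pair
df = pd.DataFrame()
# Group labels 1 and 2 follow the two populations described above
df['group'] = pd.Series([1, 1, 2, 2])
df['percent'] = pd.Series([0.64, 0.36, 0.49, 0.51])
df['gender'] = pd.Series(['M','F','M','F'])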
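One possible way to double-check pre-calculated proportions with an aggregate function is to confirm that they sum to 1 within each group. This is only a sketch: the `group` column with labels 1 and 2 is an assumption based on the two populations described above, and the tolerance is an arbitrary choice.

```python
import pandas as pd

# Same summary table as in this tutorial; the 'group' labels are assumed
df = pd.DataFrame({
    "group":   [1, 1, 2, 2],
    "gender":  ["M", "F", "M", "F"],
    "percent": [0.64, 0.36, 0.49, 0.51],
})

# Proportions within each group should add up to 1 (allowing for float rounding)
totals = df.groupby("group")["percent"].sum()
all_ok = bool(((totals - 1.0).abs() < 1e-9).all())
print(totals)
print("Proportions sum to 1 in every group:", all_ok)
```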
Plot with Seaborn barplot with gender as hue
The first two dimensions of our data are the x and y axes. X is the group and y is the percentage in this case. Hue, the third dimension, is the gender.
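A minimal sketch of the plotting call described above, using `seaborn.barplot` with `x`, `y`, and `hue`. The dataframe is rebuilt here so the snippet runs on its own; the `group` labels and the plot title are assumptions added for illustration.

```python
import pandas as pd
import seaborn as sns

# Summary table from this tutorial; the 'group' labels (1 and 2) are assumed
df = pd.DataFrame({
    "group":   [1, 1, 2, 2],
    "gender":  ["M", "F", "M", "F"],
    "percent": [0.64, 0.36, 0.49, 0.51],
})

# x and y are the first two dimensions; hue splits each group into side-by-side bars
ax = sns.barplot(x="group", y="percent", hue="gender", data=df)
ax.set_title("Proportion by gender within each group")  # illustrative title
```

In a script, calling `matplotlib.pyplot.show()` afterwards displays the figure; in a notebook the plot usually renders on its own.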
Final result: Seaborn graph