Shape Tables Like Jelly With Pandas `melt()` and `pivot()`

Original article was published by Bex T. on Artificial Intelligence on Medium


Pandas melt()

Let’s start with a very stupid example. I will create a 1×1 dataframe that holds a city name and a temperature for a single day. Then, I will call melt() on it to see what effect it has:

>>> df.melt()

So, without any parameters melt() takes a column and turns it into a row with two new columns (excluding the index). Let’s add two more cities as columns:

If you notice, this type of format for dataframes are not easy to work with and it is not clean. What would be ideal is to take the columns and turn them into rows with their temperature values on the right side:

df.melt()

Let’s add more temperatures for the cities:

What do you think will happen if we call melt() on this version of the dataframe? Watch:

df_larger.melt()

Just like expected, it converts each column value into a row. For example, let’s take a key-value pair. New York’s temperatures are [25, 27, 23, 25, 29]. This means there are 5 key-value pairs and when we use melt(), pandas takes each of those pairs and displays them as a single row with two columns. After pandas is done with New York, it moves on to other columns.

When melt() displays each key-value pair in two columns, it gives the columns default names which are variable and value. It is possible to change them to something that makes more sense:

var_name and value_name can be used to change the labels of the melted dataframe’s columns.

If we keep adding columns, melt() will always convert each value into a row with two columns that contain the previous column’s name and its value.

Now, let’s get a little serious. Say we have this dataframe:

This time, we already have the cities as a column. But still, this type of format for tables are not useful to work with. This dataset holds temperature information for 5 cities for 5 days. We can’t even perform simple computations like mean on this type of data. Let’s try melting the dataframe:

>>> temperatures.melt()

This is not what we want, melt() turned the city names into rows too. What would be ideal is if we kept the cities as columns and append the remaining columns as rows. melt() has a parameter called id_vars to do just that.

If we want to turn only some of the columns into rows, pass the columns to keep as a list (even if it is a single value) to id_vars . id_vars stands for identity variables.

temperatures.melt(id_vars=['city'])

After using id_vars, the city column stayed as a column. But it has become longer. The reason is that for each city there were 5 days of observations. When we take those observations from columns and display them as rows, pandas automatically adds new rows to fit the new values.

Even though we have the table in better shape, the column names are not exactly what we want. Instead of changing them manually after melting the table, we can directly do it with melt():

The same dataframe with different column labels.