Index Data in Pandas using Python

Original article was published by Luay Matalka on Artificial Intelligence on Medium

Boolean Array

Lastly, we can use an array of boolean values. This array must have the same length as the axis we are using it on. For example, our ufo dataframe has a shape of (18241, 5) according to the shape attribute we used above, meaning it has 18241 rows and 5 columns. So if we want to use a boolean array to specify our rows, it needs to have 18241 elements. If we want to use a boolean array to specify our columns, it needs to have 5 elements. The most common way of creating this boolean array is by using a condition.
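To make the length rule concrete, here is a minimal sketch on a tiny stand-in dataframe. The dataframe `df`, its column names, and its values are made up for illustration and are not the real ufo data. It assumes a recent pandas version, where a boolean list of the wrong length is rejected with an error.

```python
import pandas as pd

# Hypothetical toy dataframe standing in for ufo: 4 rows, 3 columns
df = pd.DataFrame({
    "City": ["Ithaca", "Abilene", "Holyoke", "Abilene"],
    "State": ["NY", "KS", "CO", "TX"],
    "Shape": ["TRIANGLE", "DISK", "OVAL", "LIGHT"],
})

row_mask = [True, False, False, True]  # one boolean per row (length 4)
col_mask = [True, True, False]         # one boolean per column (length 3)

rows = df.loc[row_mask, :]   # keeps the rows where row_mask is True
cols = df.loc[:, col_mask]   # keeps the columns where col_mask is True

# A boolean list whose length does not match the axis is rejected
try:
    df.loc[[True, False], :]
    wrong_length_accepted = True
except (IndexError, ValueError):
    wrong_length_accepted = False

print(rows)
print(cols)
print(wrong_length_accepted)
```

Writing the booleans out by hand like this only works for tiny dataframes, which is why a condition is the usual way to build the array.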

For example, let’s say we want to select only the rows for ufo sightings that took place in Abilene. We can start with the following condition:

ufo.City == 'Abilene'

Note how this returns a pandas series (an array-like object) that has a length of 18241 and is made up of boolean values (True or False). This is exactly the number of values we need in order to use it to specify our rows with loc. Imagine overlaying this series of True and False values on the index of our ufo dataframe. Wherever the series holds True, that row will be selected and will show up in the result. Here we can see that the label 3 (the 4th row) is True, which means that the first row we will see once we pass this boolean series to loc is the row with the label 3.
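The properties of such a condition can be checked directly. The sketch below uses a small made-up dataframe (`df` and its values are hypothetical, arranged so that label 3 is the first match) rather than the real ufo data.

```python
import pandas as pd

# Hypothetical toy data; label 3 is the first row whose City is Abilene
df = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "Ithaca", "Abilene"],
    "State": ["NY", "NJ", "CO", "KS", "NY", "TX"],
})

mask = df.City == "Abilene"

print(type(mask))               # a pandas Series
print(mask.dtype)               # its values are booleans
print(len(mask) == len(df))     # one value per row of the dataframe
print(mask.sum())               # how many rows are True
print(mask[mask].index.tolist())  # the row labels where the mask is True
```

Because the series shares its index with the dataframe, each True or False lines up with exactly one row label.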

ufo.loc[ufo.City == 'Abilene', :]
Figure: ufo sightings in Abilene

And that is exactly what we see! We have specified the rows we want using an array of boolean values with a length equal to the number of rows in our original dataframe.

Remember, we can combine these different ways of specifying rows and columns, meaning we can use one way of indexing on the rows and a different way on the columns. For example:

ufo.loc[ufo.City == 'Abilene', 'City':'State']

Note how we used a condition that returns an array of boolean values to specify the rows and a slice of labels to specify the columns.
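Here is a minimal sketch of this mixed style of indexing, again on a made-up dataframe (the name `df`, the column names, and the values are assumptions for illustration, not the real ufo columns). It also shows that a label slice in loc includes both endpoints, so every column from City through State is returned.

```python
import pandas as pd

# Hypothetical toy data with four columns in a fixed order
df = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "Ithaca", "Abilene"],
    "Shape": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT", "DISK"],
    "State": ["NY", "NJ", "CO", "KS", "NY", "TX"],
    "Year": [1930, 1930, 1931, 1931, 1933, 1934],
})

mask = df.City == "Abilene"

# Boolean series for the rows, label slice for the columns
subset = df.loc[mask, "City":"State"]

# The same selection written with a list of column labels instead of a slice
same = df.loc[mask, ["City", "Shape", "State"]]

print(subset)
print(subset.equals(same))
```

The slice is convenient when the columns we want sit next to each other, while the list form lets us pick columns in any order.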