This page is a quick reference for exploring data using Boolean indexing in Pandas. Remember that a boolean comparison evaluates to True or False; when it is applied to a Series (such as a single DataFrame column), it returns a new Series holding one True or False value per element.
Using booleans to filter data is usually a two-step process. First, evaluate the Series to find which elements satisfy the condition (True) and which do not (False). Second, use that True/False result to filter the data.
The & (and) operator combines two conditions and returns True only if both are True.
Filtering a dataframe
Consider the following simple DataFrame, which lists four people and the number of times each person has seen Star Wars:
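The original table is not reproduced here, so the following is a minimal sketch with hypothetical names and counts (assumptions, not the author's data). It keeps the column name "number" from the text and is arranged so that two people have seen the film fewer than 10 times.

```python
import pandas as pd

# Hypothetical sample data: the names and counts are illustrative assumptions.
# "number" holds how many times each person has seen Star Wars.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe", "Dev"],
    "number": [3, 12, 7, 25],
})

print(df)
```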
Let’s write a boolean condition that identifies which people have seen the film fewer than 10 times.
This checks every entry in the column 'number' against the condition (fewer than 10) and stores the result in the Series "evaluation". Now evaluation holds True for each person who has seen the film fewer than 10 times and False for everyone else.
The second step is to use this True/False evaluation as a filter, creating a new DataFrame that contains only the rows where the evaluation is True.
Now we have a new DataFrame called deprived_persons that contains two rows. If we like, we can select just one column:
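Step one can be sketched as a single comparison on the column. The sample rows below are hypothetical; only the column name "number" and the threshold of 10 come from the text.

```python
import pandas as pd

# Hypothetical sample data (names and counts are assumptions).
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe", "Dev"],
    "number": [3, 12, 7, 25],
})

# Step 1: compare every value in the "number" column with 10.
# The result is a boolean Series aligned to df's index.
evaluation = df["number"] < 10

print(evaluation)
```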
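Step two passes the boolean Series inside square brackets. This is a minimal sketch on hypothetical sample data; the result is a DataFrame that keeps the original index labels of the matching rows.

```python
import pandas as pd

# Hypothetical sample data (names and counts are assumptions).
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe", "Dev"],
    "number": [3, 12, 7, 25],
})
evaluation = df["number"] < 10

# Step 2: indexing a DataFrame with a boolean Series keeps only the rows
# where the Series is True.
deprived_persons = df[evaluation]

print(deprived_persons)
```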
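One way to keep a single column is to give .loc both the boolean row filter and a column label. This sketch uses hypothetical sample data and picks the "name" column as an example; the result is a Series rather than a DataFrame.

```python
import pandas as pd

# Hypothetical sample data (names and counts are assumptions).
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe", "Dev"],
    "number": [3, 12, 7, 25],
})
evaluation = df["number"] < 10

# .loc[rows, column]: the boolean Series selects the rows,
# and the label "name" selects a single column.
deprived_names = df.loc[evaluation, "name"]

print(deprived_names)
```

Selecting in one .loc call avoids chained indexing such as df[evaluation]["name"], which returns the same values here but is a common source of warnings when the result is later assigned to.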
Selecting specific data
Select all rows whose country value is either Brazil or Venezuela
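A minimal sketch, assuming a hypothetical companies DataFrame with "company", "sector" and "country" columns (the data is invented for illustration). Series.isin builds the boolean filter in one call; the same filter can also be written with the | (or) operator.

```python
import pandas as pd

# Hypothetical company data; column names and values are assumptions.
companies = pd.DataFrame({
    "company": ["Company A", "Company B", "Company C", "Company D", "Company E"],
    "sector": ["Energy", "Technology", "Finance", "Technology", "Retail"],
    "country": ["Brazil", "Germany", "Venezuela", "Chile", "Brazil"],
})

# isin() is True wherever the country appears in the given list.
mask = companies["country"].isin(["Brazil", "Venezuela"])
brazil_or_venezuela = companies[mask]

# Equivalent filter with | (or). The parentheses are required because
# | binds more tightly than == in Python.
mask_or = (companies["country"] == "Brazil") | (companies["country"] == "Venezuela")

print(brazil_or_venezuela)
```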
Select the first five companies in the Technology sector whose country is not Germany
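This combines two conditions with & (and), so a row is kept only when both are True, and then takes the first five matches. The companies DataFrame and its "sector" and "country" columns are hypothetical assumptions for illustration.

```python
import pandas as pd

# Hypothetical company data; names, sectors and countries are assumptions.
companies = pd.DataFrame({
    "company": [f"Company {i}" for i in range(1, 11)],
    "sector": ["Technology", "Technology", "Energy", "Technology", "Technology",
               "Technology", "Technology", "Finance", "Technology", "Technology"],
    "country": ["USA", "Germany", "Brazil", "Japan", "Germany",
                "India", "France", "USA", "Canada", "Korea"],
})

# Both conditions must be True for a row to be kept.
# Parentheses are required because & binds more tightly than == and !=.
mask = (companies["sector"] == "Technology") & (companies["country"] != "Germany")

# head(5) keeps the first five matching rows in their original order.
first_five = companies[mask].head(5)

print(first_five)
```

Six rows match here, so head(5) drops the last one.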