Is that a warbler? Bird classification with Keras CNN in Python

Source: Deep Learning on Medium


Ever wondered ‘What is that bird?’

I constantly wondered ‘What is that bird?’ when I walked my dog through a park in Boston that was full of birds all year round: baby ducks in the summer, migratory songbirds in the fall and spring, and waterfowl in the winter. My grandpa (a long-time bird watcher) sent me The Sibley Field Guide to Birds, and that sparked a hobby for me. Before you dismiss this as an old-person hobby, I highly recommend going bird watching, especially if you have a camera.

Since most small birds do not sit still long enough for you to flip through 400 pages of the field guide and compare 20+ markings, I started taking pictures of birds in the hope that I would have a clear enough picture to identify them later. I then discovered a website called eBird that lets you keep track of which bird species you have seen and where. You can even upload a photo as proof. For the nerds out there who love Pokémon, it is just like that, but with real live birds!


Occasionally, I upload a photo of the wrong bird, but luckily eBird volunteers monitor the bird photos and (kindly) email you to say you labeled the photo with the wrong species. Don’t do this too often, though, or they will lock your account (oops!). Usually, these volunteers will also tell you the correct species. This is a lot of work for those volunteers!

Not a Savannah Sparrow

As a data scientist, I was thinking: what if we could use deep learning to automatically check each bird photo that is uploaded? As a proof of principle for a weekend project, I built a predictive model that detects whether a bird image is a warbler (my grandpa’s favorite category of birds).

Project Definition

Given an image of a bird, predict if it is a warbler (see below for warbler species tags)

Data Set

The data set in this project comes from Caltech-UCSD Birds-200-2011 (CUB-200-2011). This data set has 200 bird species across 11,788 images. Since the number of images for any one species is quite small (roughly 60 on average), I decided, as a proof of principle, to group all the warbler species into a single class.

Set up: load metadata

Let’s start by aggregating all the metadata provided by Caltech-UCSD:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# path to dataset
data_path = '../CUB_200_2011/CUB_200_2011/'
# aggregate datasets (all files are space-separated with no header row)
df_images = pd.read_csv(data_path + 'images.txt',
                        sep = ' ', header = None,
                        names = ['img_num', 'img'])
df_labels = pd.read_csv(data_path + 'image_class_labels.txt',
                        sep = ' ', header = None,
                        names = ['img_num', 'class_id'])
df_classes = pd.read_csv(data_path + 'classes.txt',
                         sep = ' ', header = None,
                         names = ['class_id', 'bird_class'])
df_split = pd.read_csv(data_path + 'train_test_split.txt',
                       sep = ' ', header = None,
                       names = ['img_num', 'dataset'])
df = pd.merge(df_images, df_labels, on = 'img_num', how = 'inner')
df = pd.merge(df, df_classes, on = 'class_id', how = 'inner')
df = pd.merge(df, df_split, on = 'img_num', how = 'inner')

The merged data has the image number, the image name (with its relative path), the species id, the species name, and the train/test flag provided by Caltech-UCSD. Since we will do our own train/test split, we will ignore the provided split, which is the last column (dataset).
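To see what the three inner merges produce without downloading the data set, here is a minimal sketch that runs a few made-up lines (in the same space-separated layout as the CUB text files, but not real rows) through the same read_csv and merge calls. The `demo_*` names are hypothetical, chosen so the sketch doesn't overwrite the real dataframes. The `validate` argument is not in the code above; it simply makes pandas raise an error if a key is unexpectedly duplicated.

```python
import io
import pandas as pd

# Made-up sample lines mimicking the CUB-200-2011 metadata files (not real rows).
images_txt = "1 001.Black_footed_Albatross/a.jpg\n2 158.Bay_breasted_Warbler/b.jpg\n"
labels_txt = "1 1\n2 158\n"
classes_txt = "1 001.Black_footed_Albatross\n158 158.Bay_breasted_Warbler\n"
split_txt = "1 1\n2 0\n"

def read(text, names):
    # Same reader settings as above: space-separated, no header row.
    return pd.read_csv(io.StringIO(text), sep=' ', header=None, names=names)

demo_images = read(images_txt, ['img_num', 'img'])
demo_labels = read(labels_txt, ['img_num', 'class_id'])
demo_classes = read(classes_txt, ['class_id', 'bird_class'])
demo_split = read(split_txt, ['img_num', 'dataset'])

# validate= checks the join cardinality and raises MergeError if it is violated.
demo_df = pd.merge(demo_images, demo_labels, on='img_num', how='inner', validate='one_to_one')
demo_df = pd.merge(demo_df, demo_classes, on='class_id', how='inner', validate='many_to_one')
demo_df = pd.merge(demo_df, demo_split, on='img_num', how='inner', validate='one_to_one')
print(demo_df)
```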

Make Warbler Output Label

I went through the list of species and extracted all the warblers in a list:

warbler_class = ['020.Yellow_breasted_Chat','158.Bay_breasted_Warbler',
'159.Black_and_white_Warbler', '160.Black_throated_Blue_Warbler',
'161.Blue_winged_Warbler', '162.Canada_Warbler',
'163.Cape_May_Warbler', '164.Cerulean_Warbler',
'165.Chestnut_sided_Warbler', '166.Golden_winged_Warbler',
'167.Hooded_Warbler', '168.Kentucky_Warbler',
'169.Magnolia_Warbler', '170.Mourning_Warbler',
'171.Myrtle_Warbler', '172.Nashville_Warbler',
'173.Orange_crowned_Warbler', '174.Palm_Warbler',
'175.Pine_Warbler', '176.Prairie_Warbler',
'177.Prothonotary_Warbler', '178.Swainson_Warbler',
'179.Tennessee_Warbler', '180.Wilson_Warbler',
'181.Worm_eating_Warbler', '182.Yellow_Warbler',
'183.Northern_Waterthrush', '184.Louisiana_Waterthrush', '200.Common_Yellowthroat']

This allows us to make a binary output label:

df['OUTPUT_LABEL'] = (df.bird_class.isin(warbler_class)).astype('int')
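A quick way to sanity-check the label is to run the same isin expression on a few rows. The sketch below uses a short, illustrative subset of warbler_class and a hypothetical `demo_df`; it also shows that matching on the word ‘Warbler’ alone would miss species in the list such as the Northern Waterthrush and the Common Yellowthroat.

```python
import pandas as pd

# Illustrative subset of warbler_class (the full list is given above).
warbler_subset = ['162.Canada_Warbler', '183.Northern_Waterthrush', '200.Common_Yellowthroat']

demo_df = pd.DataFrame({'bird_class': ['162.Canada_Warbler', '047.American_Goldfinch',
                                       '183.Northern_Waterthrush', '200.Common_Yellowthroat']})

# 1 if the species is in the list, else 0
demo_df['OUTPUT_LABEL'] = demo_df.bird_class.isin(warbler_subset).astype('int')

# A substring match only catches species with 'Warbler' in their name.
demo_df['name_match'] = demo_df.bird_class.str.contains('Warbler').astype('int')
print(demo_df)
```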

Split data into train and validation

We can split our data into 70% train and 30% validation.

df = df.sample(n = len(df), random_state = 42)
df_train_all = df.sample(frac = 0.7, random_state = 42)
df_valid = df.drop(df_train_all.index)

And check that the prevalence is about the same in both groups:

def calc_prevalence(y):
    # fraction of positive (warbler) labels
    return sum(y) / len(y)
print('train all %.3f' % calc_prevalence(df_train_all.OUTPUT_LABEL))
print('valid %.3f' % calc_prevalence(df_valid.OUTPUT_LABEL))

This gives approximately 15% warblers in each group.
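The random 70/30 split gives similar prevalence because the data set is fairly large. If you want the two groups to match as closely as rounding allows, one option (not what this project does) is a stratified split: sample 70% separately within each label. Here is a minimal sketch on synthetic labels using DataFrame.groupby(...).sample, which needs pandas 1.1 or later; the `demo_*` names are hypothetical.

```python
import pandas as pd

# Synthetic labels: 20 warblers out of 100 images (illustrative only).
demo_df = pd.DataFrame({'img': [f'img_{i}.jpg' for i in range(100)],
                        'OUTPUT_LABEL': [1] * 20 + [0] * 80})

def calc_prevalence(y):
    return sum(y) / len(y)

# Sample 70% of the rows within each label group, so both splits keep the same ratio.
demo_train = demo_df.groupby('OUTPUT_LABEL').sample(frac=0.7, random_state=42)
demo_valid = demo_df.drop(demo_train.index)

print('train all %.3f' % calc_prevalence(demo_train.OUTPUT_LABEL))
print('valid %.3f' % calc_prevalence(demo_valid.OUTPUT_LABEL))
```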

Image Augmentation

At this point, we could just train a deep learning model, but because only about 15% of the images are warblers, the model may end up always predicting NOT A WARBLER. I tried it, and that is exactly what happened.

To counter this imbalance, we need to either get or make more warbler images, or sub-sample the non-warbler images. For this project, I’m going to use data augmentation (rotate, zoom, crop, flip, etc.) to increase the number of warbler images. For a great review of data augmentation, see this butterfly detector project.

Let’s grab all the warbler images from our dataframe:

warbler_imgs = df_train_all.loc[df_train_all.OUTPUT_LABEL == 1, 'img'].values

We can then use Keras’ ImageDataGenerator to make new augmented images. To keep things simple, I’m just going to save these new images in an augmented-warblers folder (images/aug_warblers/). It would probably also be a good idea to augment the non-warbler images, so that the model doesn’t learn that ‘augmented’ means warbler, but I’ll skip this for now. Other articles perform the augmentation on the fly during training instead, but I’ll skip that for now too.
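To show what these transforms actually do to the pixel array, here is a dependency-free NumPy sketch of three of them: a horizontal flip, a random shift, and a center zoom. This is not Keras’ implementation. The helper `augment` and its parameter values are hypothetical, the shift wraps pixels around the border for simplicity, and rotation and shear (which ImageDataGenerator also supports) are left out.

```python
import numpy as np

def augment(img, rng, max_shift=20, zoom=1.2):
    """Return a randomly flipped, shifted and zoomed copy of an H x W x C array.

    Hypothetical helper for illustration; parameter values are assumptions.
    """
    out = img
    # Horizontal flip with probability 0.5 (mirror the width axis).
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Random translation; np.roll wraps pixels around the border (a simplification).
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))
    # Center zoom: crop the middle 1/zoom of the image, then upsample it back
    # to the original size with nearest-neighbour indexing.
    h, w = out.shape[:2]
    ch, cw = int(h / zoom), int(w / zoom)
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = out[top:top + ch, left:left + cw, :]
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows][:, cols]

rng = np.random.default_rng(42)
img = rng.random((224, 224, 3)).astype(np.float32)  # stand-in for a real photo
aug_imgs = np.stack([augment(img, rng) for _ in range(5)])
print(aug_imgs.shape)
```

Each call gives a slightly different version of the same bird, which is what lets a few hundred warbler photos stand in for a larger set.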

We can then aggregate the augmented images:

from os import listdir
warbler_aug_files = ['aug_warblers/' + a for a in listdir(data_path + 'images/aug_warblers/') if a.endswith('.jpg')]
df_aug = pd.DataFrame({'img': warbler_aug_files, 'OUTPUT_LABEL': [1] * len(warbler_aug_files)})

And concatenate them with our existing training data:

df_c = pd.concat([df_train_all[['img','OUTPUT_LABEL']],df_aug],
axis = 0, ignore_index = True, sort = False)

Just to be safe, let’s balance the data with a 1:1 ratio between warbler and non-warbler:

rows_pos = df_c.OUTPUT_LABEL == 1
df_pos = df_c.loc[rows_pos]
df_neg = df_c.loc[~rows_pos]
n= min([len(df_pos), len(df_neg)])
df_train = pd.concat([df_pos.sample(n = n,random_state = 42),
df_neg.sample(n = n, random_state = 42)],
axis = 0)
df_train = df_train.sample(frac = 1, random_state = 42)
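Downsampling to 1:1 throws away some non-warbler images. An alternative, not used in this project, is to keep every image and weight the loss instead; Keras’ fit accepts a class_weight dictionary for this. Here is a minimal sketch of the common ‘balanced’ weighting, n_samples / (n_classes × n_samples_in_class), on synthetic labels. The helper `balanced_class_weights` is hypothetical.

```python
import numpy as np

def balanced_class_weights(y):
    """Map each class to n_samples / (n_classes * n_samples_in_class)."""
    classes, counts = np.unique(y, return_counts=True)
    n_samples, n_classes = len(y), len(classes)
    return {int(c): n_samples / (n_classes * n) for c, n in zip(classes, counts)}

# Synthetic labels with roughly this data set's 15% warbler prevalence.
y = np.array([1] * 15 + [0] * 85)
weights = balanced_class_weights(y)
print(weights)
```

With these weights, both classes contribute the same total weight to the loss, and the dictionary could be passed as model.fit(..., class_weight = weights).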

Build X and Y

We can now build our X and Y for machine learning. To do this, let’s write a function that loads all the images listed in a dataframe and

  • resizes each image to 224×224
  • converts to RGB (3 channels)
  • normalizes from 0 to 1 (i.e. divide by 255)
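
from keras.preprocessing.image import load_img, img_to_array
IMG_SIZE = 224
def load_imgs(df):
    # pre-allocate one float32 array: (n_images, height, width, channels)
    imgs = np.ndarray(shape = (len(df), IMG_SIZE, IMG_SIZE, 3), dtype = np.float32)
    for ii in range(len(df)):
        file = df.img.values[ii]
        # load_img resizes to IMG_SIZE x IMG_SIZE and forces 3 RGB channels
        img = load_img(data_path + 'images/' + file, target_size = (IMG_SIZE, IMG_SIZE), color_mode = 'rgb')
        # convert the PIL image to an array and scale pixel values to [0, 1]
        img = img_to_array(img) / 255
        imgs[ii] = img
    return imgs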

We can make our X and Y with:

X_train = load_imgs(df_train)
X_valid = load_imgs(df_valid)
y_train = df_train.OUTPUT_LABEL.values
y_valid = df_valid.OUTPUT_LABEL.values

Keras expects a 4-D array with shape (samples, height, width, channels). load_imgs already returns that shape, so the reshape below is just a safeguard:

# reshape
X_train = X_train.reshape(X_train.shape[0], IMG_SIZE,IMG_SIZE, 3)
X_valid = X_valid.reshape(X_valid.shape[0], IMG_SIZE,IMG_SIZE, 3)

My final X_train has shape (14104, 224, 224, 3), which means we have 14,104 images that are 224 × 224 pixels with 3 color channels.
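That array is large. Here is a quick back-of-the-envelope check of its memory footprint, given that load_imgs stores every pixel value as float32 (4 bytes):

```python
import numpy as np

n_images, img_size, channels = 14104, 224, 3
bytes_per_value = np.dtype(np.float32).itemsize  # 4 bytes per float32

n_bytes = n_images * img_size * img_size * channels * bytes_per_value
print(n_bytes, 'bytes =', round(n_bytes / 2**30, 2), 'GiB')
```

That is roughly 8.5 GB for the training images alone, so loading everything into memory needs a machine with plenty of RAM; streaming batches from disk during training avoids this.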

We can plot one of the images with:

ii = 3
plt.imshow(X_train[ii])
plt.axis('off')
plt.show()
Image of augmented yellow-rumped warbler

CNN Machine Learning Model

For simplicity, let’s create an architecture that has two CNN layers with dropout, a dense layer, and a final sigmoid for this binary classifier. Other more complicated architectures could be tried later.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Dense, Flatten, Dropout
model = Sequential()
model.add(Conv2D(filters = 64, kernel_size = (5,5),
activation = 'relu',
input_shape = X_train.shape[1:]))
model.add(MaxPool2D(pool_size = (3,3)))
model.add(Dropout(rate = 0.25))
model.add(Conv2D(filters = 64, kernel_size = (3,3),
activation = 'relu'))
model.add(MaxPool2D(pool_size = (3,3)))
model.add(Dropout(rate = 0.25))
model.add(Flatten())  # flatten the feature maps into a vector before the dense layer
model.add(Dense(64, activation = 'relu'))
model.add(Dropout(rate = 0.25))
model.add(Dense(1, activation = 'sigmoid'))
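To see where this network’s parameters live, here is the shape arithmetic worked out by hand. It assumes Keras defaults (‘valid’ padding, stride 1 for the convolutions, pooling stride equal to the pool size) and a Flatten layer between the last pooling layer and the first Dense layer; dropout layers have no parameters and don’t change shapes. The helpers `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel):
    # 'valid' convolution with stride 1
    return size - kernel + 1

def pool_out(size, pool):
    # 'valid' max pooling with stride equal to the pool size
    return (size - pool) // pool + 1

size, channels = 224, 3
params = {}

size = conv_out(size, 5)                      # Conv2D(64, 5x5)
params['conv1'] = 5 * 5 * channels * 64 + 64  # weights + biases
channels = 64
size = pool_out(size, 3)                      # MaxPool2D(3x3)

size = conv_out(size, 3)                      # Conv2D(64, 3x3)
params['conv2'] = 3 * 3 * channels * 64 + 64
size = pool_out(size, 3)                      # MaxPool2D(3x3)

flat = size * size * channels                 # Flatten
params['dense'] = flat * 64 + 64              # Dense(64)
params['output'] = 64 * 1 + 1                 # Dense(1, sigmoid)

print('final feature map: %d x %d x %d' % (size, size, channels))
print('flattened features:', flat)
print('total params:', sum(params.values()))
```

Under these assumptions, almost all of the roughly 2.2 million parameters sit in the first Dense layer; model.summary() should report matching numbers.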

We will compile our model with Adam and a binary cross-entropy loss (i.e. log-loss for 2 classes).

model.compile(loss = 'binary_crossentropy',
              optimizer = 'adam',
              metrics = ['accuracy'])

You can train your classifier with the following (for speed, I only ran 2 epochs, with a batch size of 64):

model.fit(X_train, y_train, batch_size = 64, epochs = 2, verbose = 1)
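To get a feel for how much training that is: with 14,104 training images and a batch size of 64, each epoch takes ceil(14104 / 64) gradient steps, and the last batch of each epoch is smaller than the rest:

```python
import math

n_train, batch_size, epochs = 14104, 64, 2
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, 'steps per epoch, last batch of', last_batch, 'images')
print(steps_per_epoch * epochs, 'weight updates in total')
```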

Predictions and Model Performance

We can calculate predictions for both training and validation as:

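# predict_proba was removed from recent Keras versions; with a sigmoid output,
# predict already returns the probability of "warbler"
y_train_preds = model.predict(X_train, verbose = 1)
y_valid_preds = model.predict(X_valid, verbose = 1)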

I’m going to save the validation predictions in df_valid for further analysis:

df_valid['pred'] = y_valid_preds[:, 0]

We can look at the warbler species the model did best on (highest average score) with:

df_valid.loc[(df_valid.OUTPUT_LABEL == 1) ].groupby('bird_class').pred.mean().sort_values(ascending = False)

From looking at a few pictures, it seems the model does better on warblers with yellow in their plumage than on warblers without yellow.

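We can also look at the species that the model tends to think are warblers but are not:

df_valid.loc[(df_valid.OUTPUT_LABEL == 0)].groupby('bird_class').pred.mean().sort_values(ascending = False)

Goldfinches show up near the top of this list, which makes sense since they are very yellow!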

We can calculate the performance across a range of metrics (for a tutorial on classification metrics, see my post here):

from sklearn.metrics import roc_auc_score, accuracy_score, \
    precision_score, recall_score
def calc_specificity(y_actual, y_pred, thresh):
    # specificity = true negatives / all actual negatives
    # (<= so it is the exact complement of the > thresh used below)
    return sum((y_pred <= thresh) & (y_actual == 0)) / sum(y_actual == 0)
def print_report(y_actual, y_pred, thresh):
    auc = roc_auc_score(y_actual, y_pred)
    # scores above thresh count as a predicted warbler
    accuracy = accuracy_score(y_actual, (y_pred > thresh))
    recall = recall_score(y_actual, (y_pred > thresh))
    precision = precision_score(y_actual, (y_pred > thresh))
    specificity = calc_specificity(y_actual, y_pred, thresh)
    print('AUC:%.3f' % auc)
    print('accuracy:%.3f' % accuracy)
    print('recall:%.3f' % recall)
    print('precision:%.3f' % precision)
    print('specificity:%.3f' % specificity)
    print('pred pos:%.3f' % (sum(y_pred > thresh) / len(y_actual)))
    print(' ')
    return auc, accuracy, recall, precision, specificity
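To make these metrics concrete, here is a small hand-checkable example that computes each one from the four confusion-matrix counts with NumPy, using the same convention as print_report (a score above thresh counts as a predicted warbler). The labels and scores are made up.

```python
import numpy as np

y_actual = np.array([1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([0.9, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3])
thresh = 0.5

pred_pos = y_pred > thresh
tp = np.sum(pred_pos & (y_actual == 1))   # warblers predicted as warblers
fn = np.sum(~pred_pos & (y_actual == 1))  # warblers missed
fp = np.sum(pred_pos & (y_actual == 0))   # non-warblers flagged as warblers
tn = np.sum(~pred_pos & (y_actual == 0))  # non-warblers correctly rejected

accuracy = (tp + tn) / len(y_actual)
recall = tp / (tp + fn)        # a.k.a. sensitivity
precision = tp / (tp + fp)
specificity = tn / (tn + fp)
print(tp, fn, fp, tn)
print('accuracy %.3f recall %.3f precision %.3f specificity %.3f'
      % (accuracy, recall, precision, specificity))
```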

Since we balanced the data, let’s set a threshold of 0.50 to label as predicted Warbler:

thresh = 0.5
print_report(y_train, y_train_preds[:,0], thresh);
print_report(y_valid, y_valid_preds[:,0], thresh);

We can plot the ROC curve with:

from sklearn.metrics import roc_curve, roc_auc_score
fpr_train, tpr_train, t_train = roc_curve(y_train, y_train_preds[:,0])
auc_train = roc_auc_score(y_train, y_train_preds[:,0])
fpr_valid, tpr_valid, t_valid = roc_curve(y_valid, y_valid_preds[:,0])
auc_valid = roc_auc_score(y_valid, y_valid_preds[:,0])
plt.plot(fpr_train, tpr_train, 'r-', label = 'Train AUC:%.3f'%auc_train)
plt.plot(fpr_valid, tpr_valid, 'b-', label = 'Valid AUC:%.3f'%auc_valid)
plt.plot([0,1],[0,1], 'k--')  # chance-level diagonal
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()
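The AUC has a handy interpretation: it is the probability that a randomly chosen warbler image gets a higher score than a randomly chosen non-warbler image, with ties counting as half. Here is a brute-force sketch of that definition on made-up scores; the helper `pairwise_auc` is hypothetical, and roc_auc_score should give the same value on the same inputs.

```python
from itertools import product

def pairwise_auc(y_actual, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_actual, y_score) if y == 1]
    neg = [s for y, s in zip(y_actual, y_score) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

y_actual = [1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3]
print('AUC: %.3f' % pairwise_auc(y_actual, y_score))
```

This compares every positive with every negative, so it is only meant for building intuition; roc_auc_score works from the ROC curve, which it builds by sorting the scores, and scales to the full validation set.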

As we can see, the validation AUC of this simple model is quite high. That gives me great hope of building a classifier to help me label bird images.

Let’s test it on a few of my own images:

file = 'magnolia2.png'
x = load_img(file, target_size = (IMG_SIZE, IMG_SIZE), color_mode = 'rgb')
x = img_to_array(x) / 255
# add a batch dimension: Keras expects (n_images, height, width, channels)
x = x.reshape(1, IMG_SIZE, IMG_SIZE, 3)
print('prob it is warbler:%.3f' % model.predict(x, verbose = 1)[0][0])

The model correctly classifies the magnolia warbler as a warbler.

It also correctly classifies the Surf Scoter as not a warbler.


In this post, we built a simple CNN model to predict if a bird picture is a warbler!