What type of Kingfisher is this?



Every time someone in my social circle shares a Kingfisher photo along with an identification, I wonder how they manage to identify the species, given that the Kingfisher family has more than 100 species.

Computer Vision has greatly benefited from deep learning and neural networks; it can now recognize objects in images with high accuracy. I tried to find out whether machine learning can identify Kingfisher types. Some pre-existing machine learning models trained on the COCO image dataset (https://cocodataset.org/#home) can detect 4 types, but that is all pre-trained models can offer.

I am not a Data Scientist, but I still wanted to take a stab at it and build a machine/deep learning model that predicts Kingfisher types.

To apply machine learning, one does not need a deep understanding of Data Science, as there are tools and APIs that provide abstraction. Some level of understanding becomes necessary only if you want to customize or tinker with the model, or deviate from example code.

Why use AI/Deep Learning?

For those who are new to AI or deep learning: doing image recognition with conventional programming would be dauntingly difficult, because the recognition has to be spatially invariant. No matter how the Kingfisher is angled, tilted, or posed (flying, sitting on a branch, catching prey), it should still be classified as the correct type. If a kid is taught what a dog looks like, the kid recognizes a dog fairly accurately from any angle. We need similar spatially invariant detection in the model, which is why we use AI/deep learning to detect patterns that are layered up to form a complete image.

For image classification, a Convolutional Neural Network (CNN) is used. For people who are new to machine learning, an excellent video by educator and YouTuber 3Blue1Brown gives the gist: https://youtu.be/aircAruvnKk

DataSet:

The biggest problem in machine learning is not the actual code to implement it (thanks to Keras and other tools, that part is simple) but the dataset itself.

A lack of data often results in poor model accuracy, and finding or creating a good dataset is a challenge.

I sampled image searches for a few Kingfisher types and realized that the number of available images varies widely from type to type. Some Kingfishers are photographed more often than others: being more photogenic or more common leads to more clicks. This imbalance in the data would bias the model toward predicting the largest class more often than the others; in effect, the model would "overfit" to the most common type. In statistics, "overfitting" means the model follows the training data so closely that it fails to generalize when it sees new data.

To avoid this, I decided to limit the number of Kingfisher types instead of covering all 140. The reduced number of classes was also dictated by my ability to find sample images for training.

In the search for human-curated training images, I stumbled upon http://orientalbirdimages.org

This site has an excellent collection of Kingfisher species. I decided to scrape it for types that had more than 30 images and found 17 such types. The COCO dataset provided an additional 2, bringing the total to 19 types.

These are the types of Kingfisher that the model will be able to recognize:

{
"0": "Banded-Kingfisher",
"1": "Black-capped-Kingfisher",
"2": "Blue-eared-Kingfisher",
"3": "Brown-winged-Kingfisher",
"4": "Collared-Kingfisher",
"5": "Common-Kingfisher",
"6": "Green-Kingfisher",
"7": "Green-backed-Kingfisher",
"8": "Javan-Kingfisher",
"9": "Lilac-Kingfisher",
"10": "Oriental-Dwarf-Kingfisher",
"11": "Pied-Kingfisher",
"12": "Ringed-Kingfisher",
"13": "Ruddy-Kingfisher",
"14": "Rufous-collared-Kingfisher",
"15": "Silvery-Kingfisher",
"16": "Spotted-Kingfisher",
"17": "Stork-billed-Kingfisher",
"18": "White-breasted-Kingfisher"
}

An image count of 30 is still small for training and validation purposes, so I decided to get additional images from Google Image search to ensure I had at least 40 images of each type.

For this, I found a good script to scrape the images from Google search –
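The script itself is not reproduced here. Purely as an illustration, a minimal sketch of the same idea using the icrawler package (an assumption for this write-up, not the script I actually used; species names, folder layout, and counts are examples only) could look like this:

# Minimal sketch of bulk image collection with icrawler (pip3 install icrawler).
# NOT the script referenced above; names and counts are illustrative.
from icrawler.builtin import GoogleImageCrawler

species = ["Common Kingfisher", "Pied Kingfisher", "Stork-billed Kingfisher"]

for name in species:
    crawler = GoogleImageCrawler(
        storage={"root_dir": "raw_images/" + name.replace(" ", "-")}
    )
    # Fetch up to 40 images per species to top up the dataset
    crawler.crawl(keyword=name + " bird", max_num=40)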

After gathering the images, one relevant observation stands out: most of the images are taken by photographers (rookie or expert) with decent gear, and most photographers tend to follow the Rule of Thirds (https://en.wikipedia.org/wiki/Rule_of_thirds) to make the picture more aesthetically pleasing. As a result, a large share of the pixels belongs to the background rather than the subject.

Photo by Vincent van Zalinge on Unsplash

In the above image, the pixels dedicated to the bird are a small percentage of the entire image. Furthermore, images need to be downsized to keep training time manageable; a typical target size is around 224 x 224 pixels. If we reduce the above image to 224 x 224 pixels, much of the bird detail would be averaged away and lost. This slows the training process, as the model takes longer to converge toward the desired accuracy.

To reduce the image size while preserving the bird's details, we need to extract the bird from the image. To do so, I employed YOLO. YOLO is also a Convolutional Neural Network, built on a framework with the fancy name "Darknet", and it is pre-trained to identify common objects very quickly. YOLO allows us to add a bounding box to an image like this.

All we have to do is use this bounding box to crop out the bird and save it as a new file.

Here is how it can be done —

https://gist.github.com/sachinkale/57d6ababc8d9063646961af9659ac62e
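The gist above contains my implementation. As a rough sketch of just the cropping step (assuming OpenCV is installed and the YOLO detector has already returned a pixel bounding box as x, y, width, height; file names are illustrative):

# Crop a detected bird out of a photo using OpenCV.
# Assumes the YOLO step already produced a bounding box (x, y, w, h) in pixels.
import cv2

def crop_and_save(src_path, dst_path, box, margin=10):
    image = cv2.imread(src_path)
    if image is None:
        raise FileNotFoundError(src_path)
    x, y, w, h = box
    height, width = image.shape[:2]
    # Keep a small margin around the box, clamped to the image borders
    x1 = max(x - margin, 0)
    y1 = max(y - margin, 0)
    x2 = min(x + w + margin, width)
    y2 = min(y + h + margin, height)
    cv2.imwrite(dst_path, image[y1:y2, x1:x2])

crop_and_save("raw/common_kingfisher_01.jpg",
              "cropped/common_kingfisher_01.jpg",
              box=(350, 120, 400, 380))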

Once the bird pixels are extracted from the images, we are ready to create and train our model!

One more thing: we have a small number of images per type (around 50). To increase this, we apply image augmentation, which means we take an image and create more variants of it by rotating, shearing, and flipping it. This increases our training set size. We will incorporate this using ImageDataGenerator in our model, as shown in the sketch below.
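Here is a minimal sketch of such augmentation with Keras's ImageDataGenerator (directory names and parameter values are assumptions, not the exact settings I used):

# Image augmentation with Keras's ImageDataGenerator; values are illustrative.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # normalize pixel values to [0, 1]
    rotation_range=30,       # random rotations
    shear_range=0.2,         # random shearing
    zoom_range=0.2,          # random zoom
    horizontal_flip=True,    # mirror images left/right
    validation_split=0.2,    # hold out part of the data for validation
)

train_generator = train_datagen.flow_from_directory(
    "cropped_images",
    target_size=(224, 224),
    batch_size=16,
    class_mode="categorical",
    subset="training",
)

validation_generator = train_datagen.flow_from_directory(
    "cropped_images",
    target_size=(224, 224),
    batch_size=16,
    class_mode="categorical",
    subset="validation",
)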

Model:

For the model, there are two options. One can write a model from scratch, like the one I did below:
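The embedded code is not reproduced here; as a sketch, a from-scratch CNN for 19 classes might look something like this (layer sizes are illustrative, not my exact architecture):

# Illustrative from-scratch CNN for 19 Kingfisher classes (224x224 RGB input).
from tensorflow.keras import layers, models

scratch_model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(19, activation="softmax"),
])
scratch_model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])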

OR one can use a pre-trained model as a base and apply "transfer learning" from it to specialized layers of your own. The advantage of this hybrid approach over writing your own model is that the base model comes with weights already tuned to identify a bird, which saves a lot of training cycles and also improves validation accuracy.

I used VGG16 as the base model because it has fewer layers (reducing training time) and it still outperforms many other models.

Here is the base model with added layers:
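The embedded code is not reproduced here; a minimal sketch of the transfer-learning setup follows (the head layer sizes are assumptions, and train_generator/validation_generator come from the augmentation sketch above):

# VGG16 as a frozen base with a small classification head on top.
# Head layer sizes are assumptions, not necessarily the exact ones I used.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base_model = VGG16(weights="imagenet",
                   include_top=False,
                   input_shape=(224, 224, 3))
base_model.trainable = False   # keep the pre-trained ImageNet weights fixed

model = models.Sequential([
    base_model,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(19, activation="softmax"),   # one output per Kingfisher type
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

model.fit(train_generator,
          validation_data=validation_generator,
          epochs=30)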

Note that we need to save the model and the class-map IDs to files for future use.
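Continuing the sketch above (file names are assumptions), saving both could look like this:

# Persist the trained model and the class-index map for later predictions.
import json

model.save("vggK5-weights-best.h5")
with open("class_map.json", "w") as f:
    # flow_from_directory assigns an integer index to each class folder
    json.dump(train_generator.class_indices, f, indent=2)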

Training Results:

Training on around 1,000 images across 19 categories of Kingfisher took roughly 2 hours for 30 epochs on my laptop.

A validation accuracy (how correctly the model classifies unseen Kingfisher images) of 90% was achieved.

Time for Prediction:

Finally, the model is ready to be used for prediction.

The model needs to be loaded along with the class-map definition –
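As a sketch (file names and paths are assumptions; the actual prediction code is in vggpredict.py in the repo linked below):

# Load the saved model and class map, then classify a single image.
import json
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

model = load_model("vggK5-weights-best.h5")
with open("class_map.json") as f:
    class_indices = json.load(f)
# Invert {"Common-Kingfisher": 5, ...} into {5: "Common-Kingfisher", ...}
index_to_class = {v: k for k, v in class_indices.items()}

img = image.load_img("test_kingfisher.jpg", target_size=(224, 224))
x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)

probs = model.predict(x)[0]
best = int(np.argmax(probs))
print(index_to_class[best], "with", round(float(probs[best]) * 100, 1), "% confidence")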

To find images that the model has never seen, I borrowed Kingfisher images from a friend and wildlife photographer – Sushil Khekare (https://www.instagram.com/skhekare/?hl=en ).

Results

Comparing Model performance with existing solutions:

Google Cloud Vision API: the Google Cloud Vision API was able to identify it only as "Coraciiformes", the scientific name of the order that Kingfishers belong to.

Merlin Bird ID App:

The Merlin Bird ID App came close: it identified the image as a Black-backed Dwarf Kingfisher rather than an Oriental Dwarf Kingfisher. Merlin's model may not have been trained with Oriental Dwarf Kingfisher images, or this may simply be one of the 10% of failures, since its accuracy is also around 90% (https://news.cornell.edu/stories/2016/12/merlin-bird-photo-id-mobile-app-launches).

Conclusion

Convolutional Neural Networks are game changers in computer vision, and it amazes me how accurately they can predict classes.

If we feed training images for all 140 classes to this model, it could still classify Kingfishers in a spatially invariant way, and it could be extended to more bird types. The only limiting factor is the availability of human-curated training images.

It would be difficult to find a single human being who can identify all 140 Kingfisher types. It is incredible that a few lines of code (thanks to the underlying APIs) can achieve this feat. We have come this far!

Next Steps

Although the validation accuracy is 90%, the model will fail 10% of the time, so there is scope for improvement. Improvement can come from using more sample images and from adding techniques like Dropout to further tune the model, as sketched below.
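As an illustration only (the dropout rate and layer sizes are assumptions), a Dropout layer could be inserted into the classification head like this:

# Adding Dropout to the dense head to reduce overfitting; values are illustrative.
from tensorflow.keras import layers, models

head_with_dropout = models.Sequential([
    layers.Flatten(input_shape=(7, 7, 512)),  # VGG16 feature-map shape for 224x224 input
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),                      # randomly drop half the units during training
    layers.Dense(19, activation="softmax"),
])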

If someone is interested in furthering the validation accuracy or adding more classes, here is the model — https://www.dropbox.com/s/zbqge8umwd20pal/vggK5-weights-best.h5?dl=0

If someone wants to use it only for identification:

1. Install Python 3
2. Install TensorFlow using pip: pip3 install tensorflow
3. Download the model file: https://www.dropbox.com/s/zbqge8umwd20pal/vggK5-weights-best.h5?dl=0
4. Download the prediction code: https://github.com/sachinkale/kingfisherdetection/blob/master/model/vggpredict.py
5. Replace the image file with your Kingfisher image and run:
python3 vggpredict.py

Code Repo link: https://github.com/sachinkale/kingfisherdetection.git

References and Credits

Photo by Dennis Buchner on Unsplash

Photo by Boris Smokrovic on Unsplash

Photo by Peter Scholten on Unsplash

Photo by Chris Charles on Unsplash

Transfer Learning : https://www.tensorflow.org/tutorials/images/transfer_learning