Original article was published on Deep Learning on Medium
Wheat head detection using Faster R-CNN in PyTorch
Object detection is a computer vision task in which, instead of assigning a single label to the whole image, we locate each object in the image and classify it. It has many applications, such as detecting empty parking spaces, tracking objects in CCTV footage, and monitoring crops, which is the one we’ll be looking at.
There are many ways to perform object detection; Faster R-CNN is just one of them. Other techniques, such as YOLO and SSD, also work well. Faster R-CNN is worth learning because it has given state-of-the-art results in many competitions and is used in real applications like the Pinterest app.
This article gives a brief overview of Faster R-CNN, the concepts related to it, and its implementation in PyTorch.
The architecture of Faster R-CNN looks as follows:
The network starts by extracting features from an image. This part of the architecture is called the backbone of the network. Any standard image classification architecture can be used. We will be using ResNet50 with Feature Pyramid Network as our backbone.
In the second step, the Region Proposal Network is used to generate regions that potentially contain an object. This network is a real boost over past approaches like sliding a window over the full image to find potential regions, which, as you can imagine, is a really slow and computationally expensive process.
Another improvement over past approaches in this network is the use of anchor boxes to detect objects at various scales and aspect ratios.
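To make the idea of anchor boxes concrete, here is a minimal pure-Python sketch that generates one anchor per (scale, aspect ratio) pair around a single location. The function name `make_anchors`, the scales, and the ratios are illustrative assumptions; this is not the exact code any library uses, although defining the ratio as height divided by width and keeping the area fixed per scale is a common convention.

```python
import math

def make_anchors(cx, cy, scales, aspect_ratios):
    """Return anchors as (x_min, y_min, x_max, y_max), centred on (cx, cy)."""
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            # ratio = height / width; width and height are chosen so that
            # the anchor area stays equal to scale ** 2.
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((
                cx - width / 2, cy - height / 2,
                cx + width / 2, cy + height / 2,
            ))
    return anchors

# Hypothetical settings: 3 scales x 3 aspect ratios = 9 anchors per location.
anchors = make_anchors(100, 100, scales=(32, 64, 128), aspect_ratios=(0.5, 1.0, 2.0))
for box in anchors:
    print([round(v, 1) for v in box])
```

In a real detector the same set of anchors is repeated at every location of the feature map, and the Region Proposal Network predicts, for each anchor, whether it contains an object and how to adjust its coordinates.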
Finally, as you can see from the architecture, the same feature map is used for detecting regions and classification. This idea of reusing feature maps is super smart, and you will see lots of architectures doing this. Take the feature map and attach multiple heads to it for multiple tasks.
Let’s now implement Faster R-CNN in PyTorch and understand some more terms along the way.
Implementing Faster R-CNN in PyTorch
Kaggle recently hosted a competition for identifying wheat heads. I’ve used the data from this competition to train a model.
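The data is available as a CSV file along with the corresponding images. For each box, we have the coordinates of one corner (x_min, y_min), which is the top-left corner in image coordinates where y increases downward, along with the width and height of the box.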
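Torchvision's detection models expect boxes as `[x_min, y_min, x_max, y_max]`, so boxes stored as a corner plus width and height need converting first. The sketch below shows one way to do that with the standard library. The column names `image_id` and `bbox`, the string format of the box, and the sample rows are assumptions for illustration, not values taken from the competition data.

```python
import ast
import csv
import io
from collections import defaultdict

# Hypothetical rows; column names and values are assumptions for illustration.
SAMPLE_CSV = '''image_id,bbox
img_001,"[10.0, 20.0, 30.0, 40.0]"
img_001,"[100.0, 50.0, 25.0, 25.0]"
img_002,"[0.0, 0.0, 64.0, 32.0]"
'''

def xywh_to_xyxy(box):
    """Convert [x_min, y_min, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

def load_boxes(csv_file):
    """Group converted boxes by image id."""
    boxes = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # The box is stored as a string such as "[10.0, 20.0, 30.0, 40.0]".
        box = ast.literal_eval(row["bbox"])
        boxes[row["image_id"]].append(xywh_to_xyxy(box))
    return dict(boxes)

boxes_by_image = load_boxes(io.StringIO(SAMPLE_CSV))
print(boxes_by_image["img_001"])
```

To read a real file, pass an open file handle to `load_boxes` in place of the `io.StringIO` object.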