One year of Custom Detector

The 3 bets

One and a half years ago, when we started brainstorming ways to increase the number of detectors on our platform, we came up with 3 options:

  1. Add additional built-in detectors that we train on large datasets (SpaceNet, XView, …)
  2. Allow users to sub-categorize the output of built-in detectors (e.g. detect houses and then classify each house into having a solar panel or not)
  3. Allow users to build their own Custom Detector: A Deep Learning model that is trained to detect whatever they are interested in

The first two options were things that we knew how to do: we had done them before, and that was the process we were used to as ML practitioners.

The third option seemed a bit crazy: we had no idea how to do it, and no idea whether it would actually work; most of the ML literature would make you think it wouldn’t.

But we thought the bet was worth making and that it would be too bad to discard option 3 without even trying. We could always fall back to one of the first two options if things went wrong. The Custom Detector was born.

What’s in a Custom Detector?

A Custom Detector is a Machine Learning model that detects objects in images. To build such a model, you need three things (see the sketch after this list):

  • An image
  • Some examples of what you want to detect (training annotations)
  • An area in the image where all the examples are annotated (training area)
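
To make these ingredients concrete, here is a minimal sketch of how such a training input could be represented in code. All names here are hypothetical and chosen purely for illustration; they are not the platform’s actual API:

```python
from dataclasses import dataclass
from typing import List, Tuple

# A polygon as a list of (x, y) pixel coordinates.
Polygon = List[Tuple[float, float]]

@dataclass
class TrainingInput:
    """Hypothetical container for the three ingredients above;
    field names are illustrative, not the platform's actual API."""
    image_path: str             # the image to train on
    annotations: List[Polygon]  # examples of the object to detect
    training_area: Polygon      # region where every object is annotated

example = TrainingInput(
    image_path="farm_field.tif",
    annotations=[
        [(120, 80), (140, 80), (140, 100), (120, 100)],
        [(300, 210), (322, 210), (322, 232), (300, 232)],
    ],
    training_area=[(0, 0), (500, 0), (500, 400), (0, 400)],
)
```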

Our vision for the first version of the Custom Detector was really to automate something that users would otherwise have done manually, for example counting crops in a drone image of a farm field. We also wanted the Custom Detector to work with few examples (say, 10 annotations) and to be an iterative process: training a detector should take on the order of minutes and the UI should be quick & intuitive.

We started testing some popular cutting-edge detection models, but we quickly figured out that:

  • Training them is more in the hours/days ballpark than minutes
  • They require a large amount of annotations

This is because these models are complex and were designed for complex datasets like MS COCO, which pose multi-class, multi-scale problems.

Our focus is on geospatial imagery, and one of its defining traits is that the images are taken from above and have a known spatial resolution (e.g. one pixel is 0.3 cm by 0.3 cm on the ground). This means you don’t have objects that appear smaller because of their distance: you basically don’t have to deal with perspective.
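
As a rough illustration of what a known spatial resolution buys you: the number of pixels an object spans follows directly from its physical size, and it stays the same across the whole image. The object size and resolutions below are assumed example values, not figures from this post:

```python
def object_size_px(object_size_m: float, resolution_m_per_px: float) -> float:
    """Number of pixels an object spans at a fixed ground resolution.
    The value is the same everywhere in the image because nadir
    (top-down) imagery has no perspective."""
    return object_size_m / resolution_m_per_px

# Illustrative numbers only (assumptions, not values from this post):
print(object_size_px(4.5, 0.30))  # a ~4.5 m car at 30 cm/pixel -> 15 px
print(object_size_px(4.5, 0.03))  # the same car at 3 cm/pixel  -> 150 px
```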

This lack of perspective, combined with the fact that we went for a single-class Custom Detector, allowed us to start with smaller models. Our first Custom Detector model took about 10 minutes to train on a K80 GPU.
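
For a sense of what a "smaller model" could look like in this single-class, single-scale setting, here is a minimal PyTorch sketch of a tiny fully-convolutional network that scores each pixel for one class. This is an assumption made purely for illustration, not the architecture we actually used:

```python
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """A deliberately small fully-convolutional network that outputs a
    per-pixel score for a single object class. Purely illustrative: the
    actual Custom Detector architecture is not described in this post."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # 1-channel "object present" logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Input: (N, 3, H, W) image tiles; output: (N, 1, H, W) logits.
        return self.net(x)

model = TinyDetector()
print(sum(p.numel() for p in model.parameters()))  # ~14k parameters
```

A network in this size range can plausibly be trained in minutes rather than hours, which is the regime described above.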

Ship a simple model and then iterate

This initial version of the Custom Detector was honestly embarrassing. The engineering team coined the concept of ‘Maximally Embarrassing Product’. Although it did work on some use cases, it failed pretty badly on a lot of others. We also knew that this was a first version and had a long list of ideas to improve the model.

Yet, we shipped it and we learnt so much…

First, we learnt by looking at what kinds of objects users were trying to detect. We had talked to people before and therefore had some ideas of what they would be interested in, but once we made the Custom Detector available to anybody, we got people trying completely unexpected use cases. This ranged from detecting wooden debris in rivers to counting cattle.

As embarrassing as it was, this early release of the Custom Detector allowed us to start aggregating the various datasets that users were building, which bootstrapped our model improvement efforts.

One of our first car detection experiments. As you can tell, we detect everything but cars…