Recent trends in the application of Deep Learning for Object Detection

Original article was published by Kiran Narayanan on Artificial Intelligence on Medium


Applications of Deep Learning for real-time Object Detection

The global computer vision market was valued at $27.3 billion in 2019, with a projected CAGR of 19% from 2020 to 2027 [1]. One of the core computer vision tasks is object detection, which has a broad range of industrial applications. Examples include detecting cancers in radiology images, detecting manufacturing defects in modern factories, surveillance and public safety, traffic safety (detection of seat belts, parking in restricted areas) and retail (signage visibility, stock level analysis).

What is object detection?

There are four types of visual recognition tasks in computer vision. First, image classification, which is the assignment of labels to images, for example, labelling cows in a picture of a farm as cows. Second, object detection, which is to not only label the cow but also to locate the cow using a bounding box. Third, semantic segmentation, which predicts a label for each pixel of an image without differentiating between objects that share the same label. Fourth, instance segmentation, which combines detection and segmentation: each individual object is both labeled and segmented at the pixel level.

Different types of computer vision tasks

What is Deep Learning?

Deep learning is a subset of machine learning that can process data from a very wide variety of sources. Compared to traditional machine learning, it requires less data preprocessing by humans and can often produce more accurate predictions from the data. In deep learning, interconnected layers of software-based calculators known as neurons form a neural network. There are many layers of such neurons, hence the word “deep” neural network. The network ingests data and processes it through each layer, with each layer learning increasingly complex features of the data. Once a deep neural network has learned how to make determinations from input data correctly, it can then use what it has learned to make determinations about new data. For example, once it learns what an object looks like, it can recognize the object in a new image. In other words, a deep neural network that has learned how to recognize cows can quickly detect cows in new images.

How a “Neural network” a.k.a “AI model” works: The network processes signals by sending them through a network of nodes analogous to neurons. Signals pass from one node to another along links. “Learning” improves the outcome by adjusting the weights that amplify or damp the signals each link carries. Nodes are typically arranged in a series of layers, in other words, a “deep” neural network. Image from Waldrop, M. Mitchell, PNAS, 2019, 116(4)
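The signal-passing idea described in the caption above can be sketched in a few lines of plain Python: each neuron sums its weighted inputs, and the weights amplify or damp each link's signal. The weights below are hand-picked for illustration only, not learned:

```python
def relu(x):
    # A common activation: pass positive signals, damp negatives to zero.
    return [max(0.0, v) for v in x]

def dense(inputs, weights, biases):
    # Each output neuron sums its weighted input signals plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hand-picked weights for a tiny 2-layer network (illustrative, not trained).
W1 = [[0.5, -0.2], [0.8, 0.1]]   # layer 1: 2 inputs -> 2 hidden neurons
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]               # layer 2: 2 hidden neurons -> 1 output
b2 = [0.0]

def forward(x):
    hidden = relu(dense(x, W1, b1))  # layer 1 transforms the raw signals
    return dense(hidden, W2, b2)     # layer 2 produces the "determination"

print(forward([1.0, 2.0]))
```

"Learning" would then consist of nudging `W1`, `b1`, `W2`, `b2` (typically via backpropagation) until the output matches the desired labels.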

Technical detail: How does deep learning for object detection work?

Sequence of tasks involved in object detection
Use of a deep neural network for object detection
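One concrete step in the detection sequence above is non-maximum suppression (NMS), which most detection pipelines apply after the network proposes scored candidate boxes, so that heavily overlapping candidates collapse into a single detection. A minimal pure-Python sketch (the `(x1, y1, x2, y2)` box format and 0.5 threshold are common conventions, assumed here for illustration):

```python
def iou(a, b):
    # Intersection-over-Union of boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # Greedily keep the highest-scoring box, drop boxes that overlap it
    # too much, and repeat with the remainder.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the two overlapping boxes collapse to one
```

Here the first two boxes overlap heavily, so only the higher-scoring one survives, while the distant third box is kept.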

Recent trends in applications of deep learning for object detection

Overall, the accuracy and performance of state-of-the-art deep learning models reported in 2019 are significantly higher than those of previous years. Higher accuracy has a profound impact on applications of the technology in medical imaging as well as surveillance systems. Improvement in performance means results can be inferred much faster on modern edge-based computing systems, paving the way for applications such as real-time drone-based video analytics.

Specifically, the new improvements to deep learning models came by way of the following advancements:

1. Face Detection Mean Average Precision went above 90%

Face detection is the computer vision problem of detecting human faces in images, which is the first step in applications such as face verification, face alignment and facial recognition. Face detection differs from generic object detection in two ways. First, the range of object scales is larger, and blurring is more common. Second, face detection has a single target class and depends strongly on the structural characteristics of the face.

WIDER FACE is currently the most commonly used benchmark for evaluating face detection algorithms. The high variance of face scales and large number of faces per image make WIDER FACE the hardest benchmark for face detection, with three evaluation metrics (easy, medium and hard).

In 2019, PyramidBox++ [2], VIM-FD [3], ISRN [4], Retinaface [5], AlnnoFace [6] and RefineFace [7] all reported mAP scores greater than 90% on the easy, medium and hard metrics. This is a significant improvement over previous years.
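The Average Precision behind these scores is straightforward to sketch: detections are sorted by descending confidence, each is marked true or false positive against the ground truth, and the area under the resulting precision-recall curve is accumulated. A simplified single-class sketch (the TP/FP flags and ground-truth count below are made up for illustration; benchmarks such as WIDER FACE apply their own matching rules on top of this):

```python
def average_precision(tp_flags, num_gt):
    # tp_flags: for detections sorted by descending confidence, True if the
    # detection matches a ground-truth object (e.g. IoU >= 0.5), else False.
    ap, tp, fp = 0.0, 0, 0
    prev_recall = 0.0
    for is_tp in tp_flags:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recall = tp / num_gt
        precision = tp / (tp + fp)
        ap += precision * (recall - prev_recall)  # area under the PR curve
        prev_recall = recall
    return ap

# Hypothetical run: 5 detections evaluated against 4 ground-truth faces.
print(round(average_precision([True, True, False, True, True], num_gt=4), 4))
```

Mean Average Precision (mAP) is then the mean of this quantity over classes (or, for single-class face detection, over evaluation subsets).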

2. Recent trends in pedestrian detection

CityPersons is a newer and challenging benchmark for pedestrian detection. The dataset is split into subsets according to the height and visibility level of the objects, which allows detectors to be evaluated more comprehensively. In 2019, the APD model reported a 30% improvement in detection performance over 2018 results [8].