High-Precision, Semi-Automatic Image Annotator by FORMCEPT

Source: Deep Learning on Medium

for Building High-quality Datasets to Train Object Detection Models

If you are looking to convert your images, videos or 3D point clouds into pixel-perfect, accurately annotated training data, your search ends here.

Computer Vision is perhaps one of the most important emerging fields in AI. It cuts across diverse areas of science, engineering, and technology. At the engineering level, it aims to mimic the human visual system. It covers a wide array of tasks such as image recognition, object detection, image segmentation, pose estimation, image generation, image super-resolution and so on.

Among these, Object Detection, in particular, is concerned with detecting and identifying key elements such as people and things in images, stock video footage or live video streams. This is achieved by broadly dividing an image or video frame into objects, backgrounds and noise, and then segmenting the objects in finer detail.

For example, on a road, Object Detection could encompass applications like pedestrian detection, face recognition, vehicle detection, etc.

Image Source: https://github.com/facebookresearch/detectron2

Image Annotation for Object Detection:

When we look at an image, for example, the human eye can:

  • Detect the elements in an image
  • Detect where each element is located in a particular scene (in case it is moving, the location keeps changing)
  • Compare colour, size and shape across elements
  • Recognize the points of similarity between 2 or more images
  • Recognize the points of difference between 2 or more images
  • Understand the continuity in images, if any (for example, two screenshots of the same scene in a video taken at different points in time).

Image Annotation for Object Detection helps an algorithm attain these capabilities by training it on an image database that is annotated and labelled with high precision. Image annotation is the process of marking images with identifiers/labels. The more complex the task, the more accurately annotated data the training requires. Further, the training needs to accommodate real-time feedback to reduce errors and to improve the capability of the Object Detection algorithm.

How Object Detection Works:

Broadly speaking, the Object Detection algorithm gives two specific outcomes:

  1. Object name
  2. Object position
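The two outcomes above can be sketched as a small record. This is only an illustration: the `Detection` class and its field names are hypothetical, and the position is assumed to be an axis-aligned 2D box in pixel coordinates.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Detection:
    """One detected object: its name and its position (hypothetical schema)."""
    name: str      # object name, e.g. "pedestrian"
    x_min: float   # left edge of the box, in pixels
    y_min: float   # top edge of the box
    x_max: float   # right edge of the box
    y_max: float   # bottom edge of the box

    def center(self) -> Tuple[float, float]:
        """Centre point of the box."""
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)

    def area(self) -> float:
        """Box area in square pixels (0 for a degenerate box)."""
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


# Made-up example values, not real model output.
detections = [
    Detection("pedestrian", 40, 60, 90, 200),
    Detection("car", 120, 100, 320, 220),
]
```

A real detector would usually also attach a confidence score to each detection; it is left out here to keep the sketch to the two outcomes named above.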

The position of an object in an image can be shown by drawing a 2D or 3D bounding-box around the object. The various ways in which images need to be annotated are:

  • Labelling objects and backgrounds using bounding boxes, polygons, cuboids, lines and points
  • Labelling the entire image without drawing boxes (e.g. generation of ‘alt-text’ for images by Microsoft PowerPoint)
  • Customizing labels and matching them to object attributes
  • Enabling a search function across images (search by labels and attributes)
  • Enabling auto-suggest for quick selection of object names
  • Converting videos to non-redundant image frames

2D object detection serves use cases such as searching for entities in multimedia files, vehicle detection, license plate recognition (LPR), pedestrian counting, digitization of hand-written text and many more. A significant use case for identifying object name and position is the Autonomous Driving System. It relies on 3D Object Detection to scan the static and moving objects around the autonomous vehicle so that the vehicle can drive itself safely on the road. 3D detection helps to estimate the direction and the distance between the self-driving car and other cars, pedestrians, traffic lights and other traffic or non-traffic objects. Other use cases of 3D Object Detection include Parking Lot Management, Smart City Planning, Traffic Optimization and Vehicle Lifecycle Management.
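As a rough illustration of the distance and direction mentioned above, the sketch below computes both from the ground-plane position of a detected object's 3D box centre. The function name and the coordinate convention (x forward, y to the left, angles in radians, positive to the left) are assumptions made for this example.

```python
import math


def distance_and_bearing(ego_xy, obj_xy, ego_heading_rad=0.0):
    """Distance (same unit as the inputs) and bearing of an object
    relative to the ego vehicle's heading, on the ground plane."""
    dx = obj_xy[0] - ego_xy[0]
    dy = obj_xy[1] - ego_xy[1]
    distance = math.hypot(dx, dy)
    # Angle to the object minus the ego heading, wrapped to [-pi, pi].
    bearing = math.atan2(dy, dx) - ego_heading_rad
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return distance, bearing


# A pedestrian 3 m ahead and 4 m to the left of the car (made-up values).
dist, bearing = distance_and_bearing((0.0, 0.0), (3.0, 4.0))
```

With these inputs the pedestrian is 5 m away, at roughly 53 degrees to the left of the direction of travel.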

Various Data Formats in Image Annotation:

The common formats for images are .jpeg and .png. Medical images (X-ray scans, CT scans, MRIs) are often stored in more complex formats like DICOM. Common video data formats include .mov, .avi and .mp4.

High-quality machine learning models can only be created from high-quality training data that covers all possible scenarios and outcomes.

This is where we come in.

Use Case: Autonomous Vehicles

Now that the self-driving car is no longer science fiction, innovators across the world are competing to outsmart each other and build cars that make drivers truly redundant. Today, almost all autonomous vehicle companies are targeting level-5 autonomy (i.e. 100% driverless).

Autotech companies use a setup in which LiDARs are calibrated together with cameras and work in sync with them to perceive the world around the vehicle. As a result, these companies are seeing a huge surge in the demand for data to train the deep learning systems built around these sensors. Over time, LiDAR sensors have been improved to capture this information in real time.
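One common use of a LiDAR-to-camera calibration is to project LiDAR points into the camera image so that 3D and 2D annotations line up. The sketch below assumes a standard pinhole camera model without lens distortion, with the camera frame having z pointing forward; the function name, the rotation and translation (extrinsics) and the focal lengths and principal point (intrinsics) are placeholder values for illustration, not figures from any real sensor rig.

```python
def project_lidar_point(point_lidar, rotation, translation, fx, fy, cx, cy):
    """Project one 3D LiDAR point to pixel coordinates (u, v).

    rotation:    3x3 list of rows, rotating LiDAR frame into camera frame
    translation: 3-vector, LiDAR origin expressed in the camera frame
    fx, fy:      focal lengths in pixels; cx, cy: principal point in pixels
    Returns None if the point lies behind the camera.
    """
    x, y, z = point_lidar
    # Rigid transform into the camera frame: p_cam = R * p_lidar + t
    xc, yc, zc = (
        row[0] * x + row[1] * y + row[2] * z + t
        for row, t in zip(rotation, translation)
    )
    if zc <= 0:
        return None  # not visible in the image
    # Pinhole projection (lens distortion ignored in this sketch).
    u = fx * xc / zc + cx
    v = fy * yc / zc + cy
    return u, v


# Placeholder calibration: sensors assumed aligned and co-located.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_lidar_point(
    (1.0, 0.5, 10.0), identity, [0.0, 0.0, 0.0],
    fx=1000.0, fy=1000.0, cx=640.0, cy=360.0,
)
```

In a real rig the rotation and translation come from the calibration procedure, and the projected pixel must also be checked against the image bounds.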