HybridPose: neural network recognizes object pose in 6D

Source: Deep Learning on Medium

HybridPose is a neural network model for estimating the 6D pose of an object. The model takes an image of the object as input and predicts keypoints, edge vectors, and symmetry correspondences, from which the pose is then recovered. Using several intermediate representations makes the predictions more robust, for example when objects occlude one another. On the Occlusion Linemod dataset, the authors report a 67.4% improvement over the previous state of the art in the accuracy of the predicted poses.

Standard approaches to 6D pose estimation use a single representation to encode the object's pose. HybridPose instead uses a hybrid intermediate representation that captures several kinds of geometric information about the object: keypoints, edge vectors between keypoints, and symmetry correspondences (pairs of image pixels that mirror each other under the object's reflective symmetry).
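
As a rough illustration of what these intermediate representations contain, the sketch below derives edge vectors from a set of 2D keypoints and expresses symmetry correspondences as per-pixel offsets. The function names, the choice to connect every pair of keypoints, and the sample coordinates are assumptions made for illustration; this is not the authors' code, and in HybridPose these quantities are predicted by networks rather than computed from known points.

```python
from itertools import combinations


def edge_vectors(keypoints):
    """Return the displacement vector between every pair of 2D keypoints."""
    edges = {}
    for (i, (xi, yi)), (j, (xj, yj)) in combinations(enumerate(keypoints), 2):
        # Edge (i, j) points from keypoint i to keypoint j.
        edges[(i, j)] = (xj - xi, yj - yi)
    return edges


def symmetry_offsets(pixel_pairs):
    """Map each pixel to the offset that leads to its mirrored counterpart."""
    return {p: (q[0] - p[0], q[1] - p[1]) for p, q in pixel_pairs}


# Hypothetical keypoints (pixel coordinates) and mirrored pixel pairs.
keypoints = [(10.0, 20.0), (30.0, 20.0), (20.0, 40.0)]
pairs = [((12, 25), (28, 25)), ((15, 30), (25, 30))]

edges = edge_vectors(keypoints)
offsets = symmetry_offsets(pairs)
```

The point of the sketch is that the three representations are redundant views of the same geometry, so a pose solver can still lean on edges or symmetry when some keypoints are occluded.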

Neural network architecture

HybridPose takes as input an image containing an object of a known class, captured by a pinhole camera with known parameters. It outputs the 6D pose of the object relative to the camera, that is, its 3D rotation and 3D translation. HybridPose uses three prediction networks to estimate:

  • a set of object keypoints;
  • a set of edge vectors between keypoints;
  • a set of symmetry correspondences between image pixels.

Model training pipeline
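
To make the notion of a 6D pose under a pinhole camera concrete, here is a minimal sketch that projects a 3D point on the object into the image, given a rotation, a translation and the camera intrinsics. The function name `project`, the parameter names (`fx`, `fy`, `cx`, `cy`) and the numeric values are assumptions chosen for illustration. It uses plain Python lists to stay dependency-free; a real implementation would more likely use an array library.

```python
def project(point, rotation, translation, fx, fy, cx, cy):
    """Project a 3D object-space point to pixel coordinates.

    rotation is a 3x3 matrix (list of rows) and translation a 3-vector;
    together they form the 6D pose of the object relative to the camera.
    """
    # Move the point into the camera frame: X_cam = R * X + t.
    xc, yc, zc = (
        sum(rotation[r][k] * point[k] for k in range(3)) + translation[r]
        for r in range(3)
    )
    # Perspective division followed by the pinhole intrinsics.
    return (fx * xc / zc + cx, fy * yc / zc + cy)


# Hypothetical pose: no rotation, object 2 units in front of the camera.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project((0.1, -0.2, 0.0), identity, (0.0, 0.0, 2.0),
                fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Pose estimation runs this relationship in reverse: given predicted 2D quantities and the known 3D object model, it searches for the rotation and translation that best explain them.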

Evaluation of the model

The researchers tested the model on the Linemod dataset, using ADD(-S) accuracy as the metric. ADD(-S) accuracy is the percentage of test cases in which the average distance between the object's model points under the predicted pose and under the ground-truth pose is less than 10% of the object's diameter. HybridPose was compared with baseline approaches for 6D object pose estimation: PoseCNN, Oberweger et al., Hu et al., PVNet and DPOD. The comparison below shows that HybridPose produces more accurate results on Occlusion Linemod, the subset of Linemod made up of images in which objects overlap.
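
The metric can be sketched in a few lines. The code below follows the commonly used definition: ADD averages the distance between corresponding model points under the two poses, while ADD-S, used for symmetric objects, averages the distance from each ground-truth point to its nearest predicted point. The function names and the toy four-point model are assumptions for illustration, not the evaluation code from the paper.

```python
import math


def transform(points, rotation, translation):
    """Apply a rigid transform (3x3 rotation rows, 3-vector) to 3D points."""
    return [
        tuple(sum(rotation[r][k] * p[k] for k in range(3)) + translation[r]
              for r in range(3))
        for p in points
    ]


def add_distance(model_points, pose_pred, pose_gt, symmetric=False):
    """Average distance between model points under two (R, t) poses."""
    pred = transform(model_points, *pose_pred)
    gt = transform(model_points, *pose_gt)
    if symmetric:
        # ADD-S: match each ground-truth point to its closest predicted point.
        dists = [min(math.dist(g, p) for p in pred) for g in gt]
    else:
        # ADD: compare each model point with itself under the other pose.
        dists = [math.dist(g, p) for g, p in zip(gt, pred)]
    return sum(dists) / len(dists)


def add_accuracy(distances, diameter, threshold=0.1):
    """Fraction of test cases whose distance is below threshold * diameter."""
    return sum(d < threshold * diameter for d in distances) / len(distances)


# Toy symmetric object; the predicted pose is off by a 180-degree turn about z.
model = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
half_turn = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 5.0)

add = add_distance(model, (half_turn, t), (identity, t))
add_s = add_distance(model, (half_turn, t), (identity, t), symmetric=True)
```

In this toy case the half turn maps the object onto itself, so ADD-S counts the prediction as correct while plain ADD does not, which is why the symmetric variant is used for symmetric objects.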

Accuracy of models for different types of objects from the Occlusion Linemod dataset

Source: https://arxiv.org/pdf/2001.01869.pdf

GitHub: https://github.com/chensong1995/HybridPose