Source: Deep Learning on Medium
AI Scholar: A Large-Scale Clustered and Densely Annotated Dataset for Object Grasping
Introducing GraspNet Dataset
This research summary is just one of many that are distributed weekly on the AI scholar newsletter. To start receiving the weekly newsletter, sign up here.
Machine Learning (ML) and Deep Learning (DL) have attracted a lot of interest in recent years. ML is a subfield of Artificial Intelligence (AI) in which algorithms learn and improve by studying large amounts of available data. DL, in turn, is a subfield of ML in which algorithms are inspired by the structure and function of the brain, using artificial neural networks (ANNs).
As you know, training these models requires big sets of data, and collecting a sufficient amount of such data is expensive. As a result, some fields suffer from insufficient training data and a lack of the evaluation benchmarks needed to build robust models. One such field is object grasping, a fundamental problem in computer vision with many applications.
Current Challenges in Object Grasping
To start with, grasp poses have varied representations, including rectangle and 6D pose representations, which are evaluated with different metrics. The differences in evaluation metrics make it difficult to compare methods directly in a unified manner. Moreover, performing the evaluation on real robots radically increases costs.
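To make the two representations concrete, here is a minimal sketch of what each one typically stores. The class and field names are illustrative assumptions, not the paper's definitions: a rectangle grasp lives in the 2D image plane, while a 6D grasp is a full gripper pose in 3D space.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class RectangleGrasp:
    """2D rectangle representation: an oriented box in the image plane."""
    center: Tuple[float, float]  # (u, v) pixel coordinates
    angle: float                 # in-plane rotation, radians
    width: float                 # gripper opening, pixels
    height: float                # jaw size, pixels

@dataclass
class SixDofGrasp:
    """6D pose representation: full gripper pose in the camera/world frame."""
    translation: Tuple[float, float, float]  # (x, y, z) in meters
    rotation: Tuple[float, float, float]     # e.g. roll, pitch, yaw in radians
    width: float                             # gripper opening, meters

# A rectangle grasp can only express in-plane rotation; a 6D grasp can
# approach the object from any direction, which is why metrics defined
# on one do not transfer directly to the other.
rect = RectangleGrasp(center=(320.0, 240.0), angle=0.5, width=40.0, height=20.0)
grasp_6d = SixDofGrasp(translation=(0.1, 0.0, 0.5), rotation=(0.0, 1.57, 0.0), width=0.08)
```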
The other challenge is that it is difficult to obtain large-scale, high-quality training data for object grasping. Current datasets annotated by humans are typically small in scale and provide only sparse annotations. And while large-scale datasets can be obtained from simulated environments, the visual domain gap between simulation and reality degrades the performance of algorithms in real-world applications.
Introducing GraspNet Dataset
Object grasping is critical for many applications in industry, agriculture, and the service trade. However, for cluttered scenes, research suffers from insufficient training data and the lack of evaluation benchmarks, as mentioned above.
In this paper, the researchers contribute a large-scale grasp pose detection dataset, GraspNet, together with a unified evaluation system. The dataset comprises 87,040 RGB-D images with over 370 million grasp poses.
The evaluation system directly reports whether a grasp is successful by analytic computation, so it can evaluate any kind of grasp pose without exhaustively labeling ground-truth poses. The authors conduct extensive experiments to show that the proposed dataset and evaluation system align well with real-world scenarios.
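To give a flavor of what an analytic grasp test can look like, here is a hedged sketch of the classic two-finger antipodal force-closure check. This is not the paper's exact metric (GraspNet scores grasps under a range of friction coefficients); it is the standard textbook condition that the line through both contact points must lie inside both friction cones.

```python
import numpy as np

def in_friction_cone(axis, normal, mu):
    """True if `axis` lies within the friction cone around `normal`
    whose half-angle is arctan(mu)."""
    axis = np.asarray(axis, dtype=float)
    normal = np.asarray(normal, dtype=float)
    cos_angle = float(np.dot(axis, normal) /
                      (np.linalg.norm(axis) * np.linalg.norm(normal)))
    return bool(cos_angle >= np.cos(np.arctan(mu)))

def antipodal_force_closure(p1, n1, p2, n2, mu=0.4):
    """Two-finger force-closure test: the line connecting the contact
    points p1, p2 must lie inside both friction cones.  n1 and n2 are
    inward-pointing surface normals at the contacts; mu is an assumed
    friction coefficient."""
    d = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    return in_friction_cone(d, n1, mu) and in_friction_cone(-d, n2, mu)

# Contacts on opposite faces of a box, normals facing each other:
# the connecting line is aligned with both normals, so closure holds.
ok = antipodal_force_closure([0, 0, 0], [1, 0, 0], [0.05, 0, 0], [-1, 0, 0])
```

Because a test like this is purely geometric, it can score any candidate grasp pose against an object model without a human ever labeling that pose, which is the point of an analytic evaluation system.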
Potential Uses and Effects
This work builds a large-scale dataset for cluttered-scene object grasping. It consists of images taken by real-world sensors and has rich, dense annotations. The proposed unified evaluation system promotes development in this area, and the methodology greatly reduces the labor of annotating grasp poses.
In the future, the researchers intend to extend the dataset to multi-finger grippers and vacuum-based end effectors. The dataset, source code, and models will soon be made publicly available.
Read more: Introducing GraspNet