What is data labeling?

Source: Deep Learning on Medium

In this post, I’ll talk about what data labeling is and why it is so important for building machine learning models. Data labeling is the process of attaching the ground truth to each object in a dataset. So why do we need it? To answer that question, we first need some insight into how a machine learning algorithm works.

How does a machine categorize cats and dogs?

How does a machine learning algorithm work?

It is similar to how we ourselves learn to tell cats from dogs. First, we collect many pictures of cats and dogs. Second, we are shown many cat pictures and told that this kind of animal is a cat. Third, the same process is repeated for dogs. Finally, we know which kind of animal is a cat and which is a dog. A machine learns these things in essentially the same way.
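The learning steps above can be sketched in a few lines of Python. This is only a toy illustration, not how a real image model works: it assumes each animal is described by two made-up numbers (body weight and ear length) instead of pixels, and the names `train` and `predict` are hypothetical helpers written for this example.

```python
from statistics import mean

# Hypothetical toy features: (body weight in kg, ear length in cm).
# The numbers are invented for illustration only.
labeled_examples = [
    ((4.0, 6.5), "cat"),
    ((3.5, 6.0), "cat"),
    ((4.5, 7.0), "cat"),
    ((20.0, 11.0), "dog"),
    ((25.0, 12.0), "dog"),
    ((30.0, 13.0), "dog"),
]

def train(examples):
    """Compute one average feature vector (centroid) per label."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

centroids = train(labeled_examples)
print(predict(centroids, (5.0, 6.8)))    # a small animal with short ears
print(predict(centroids, (22.0, 11.5)))  # a larger animal with longer ears
```

The "learning" here is just averaging the examples that share a label, which is why the labels have to be there before training can start.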

But here comes a problem: how does the machine know the answer before it has learned from the data samples? The answer is data labeling. Because we humans already have the knowledge, we can give the machine the right answers. The process of producing these answers for the computer is called data labeling. If we give it wrong answers, it will learn the wrong thing. For example, if we label both cats and dogs as “cat”, the machine will recognize a dog as a cat when it is given a dog. That’s why data annotation quality is critical to a deep learning model.
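To make the effect of wrong labels concrete, here is a minimal sketch using a one-nearest-neighbor rule on a single invented feature (body weight). The data and the helper `nearest_neighbor_label` are assumptions made for this example. A real deep learning model is far more complex, but the principle is the same: it can only repeat the answers it was given.

```python
def nearest_neighbor_label(examples, weight_kg):
    """Return the label of the training example closest in weight."""
    closest = min(examples, key=lambda example: abs(example[0] - weight_kg))
    return closest[1]

# Hypothetical toy data: (body weight in kg, label).
correct_labels = [
    (3.5, "cat"), (4.0, "cat"), (4.5, "cat"),
    (20.0, "dog"), (25.0, "dog"), (30.0, "dog"),
]

# The same animals, but the annotator labeled every one of them "cat".
wrong_labels = [(weight, "cat") for weight, _ in correct_labels]

new_dog_weight = 24.0
print(nearest_neighbor_label(correct_labels, new_dog_weight))  # dog
print(nearest_neighbor_label(wrong_labels, new_dog_weight))    # cat
```

The learning rule is identical in both calls. Only the labels differ, and that alone turns a correct prediction into a wrong one.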

What are the common types of data labeling?

These types of data labeling are the basics. Of them, the most widely used are classification and object detection. In real-world applications, we often run object detection first and then classify each detected object. The other labeling methods serve more complicated scenarios, such as autonomous driving, facial landmark detection, or skeleton detection. If you want to learn more about them, check out the link above.
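As a rough sketch of how the two most common label types differ, the records below show one possible way to store them. The class names, field names, and file paths are all hypothetical. Real annotation tools each define their own formats, so treat this as an illustration of the idea rather than any tool's actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ClassificationLabel:
    """One label for the whole image."""
    image_path: str
    class_name: str

@dataclass
class BoundingBox:
    """One labeled object: a rectangle in pixel coordinates plus its class."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    class_name: str

    def area(self) -> int:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

@dataclass
class DetectionLabel:
    """Zero or more labeled boxes for one image."""
    image_path: str
    boxes: List[BoundingBox]

def is_valid_box(box: BoundingBox, image_width: int, image_height: int) -> bool:
    """Basic quality check: the box is non-empty and lies inside the image."""
    return (0 <= box.x_min < box.x_max <= image_width
            and 0 <= box.y_min < box.y_max <= image_height)

# Hypothetical example records.
cat_photo = ClassificationLabel("images/cat_001.jpg", "cat")
street = DetectionLabel(
    "images/street_001.jpg",
    [BoundingBox(10, 20, 110, 220, "dog"), BoundingBox(150, 40, 250, 140, "cat")],
)

for box in street.boxes:
    print(box.class_name, box.area(), is_valid_box(box, 640, 480))
```

A classification label answers "what is in this image?", while a detection label also answers "where is it?", which is why detection annotations take more effort to produce and to check.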


This article described what data labeling is and how it fits into the machine learning process. In addition, we listed several common types of data labeling and their use cases.