Original article was published on Deep Learning on Medium
Why use Single Number Evaluation Metric in Machine Learning?
If you are tuning hyperparameters, or attempting various ideas Learning algorithms, or simply trying out various options for constructing your learning machine. You will notice your development would be much quicker if you have Single Number Evaluation Metric that quickly lets you tell whether the new thing you are doing works better than the last idea, or worse. So when teams start on a project for machine learning you need to set your problem to a single real number evaluation metric. Let us take a look at one case.
We often get an idea, code it, run the experiment to see how it worked and Then use the experiment’s findings to improve your thinking. And then keep going around that loop as you continue to improve your algorithm. So let’s assume you ‘d previously created some classifier A for your classifier. And by changing the training sets and the hyperparameters or Another thing, now you’ve trained a new classifier, B. So one fair way to determine the classifier ‘s performance is to look at its accuracy and recall.
So if classifier A has 95% precision, this means that when classifier A says something is a cat, there’s a 95% chance it really is a cat. So if classifier A is 90% recall, this means that of all of the images in, say, your dev sets that really are cats, classifier A accurately pulled out 90% of them. The problem with using precision recall as your evaluation metric is that if classifier A does better on recall, which it does here, the classifier B does better on precision, then you’re not sure which classifier is better.
And with two evaluation metrics, it is difficult to know how to quickly pick one of the two or quickly pick one of the ten. So what I recommend is rather than using two numbers, precision and recall, to pick a classifier,you just have to find a new evaluation metric that combines precision and recall.
In the literature on machine learning, the standard way of combining precision and Recall is what’s called an F1 score. In this case however, Immediately, you can see that classifier A has a better F1 score. And assuming F1 score is a reasonable way to combine accuracy with recall; Then you can easily pick Classifier A over Classifier B.
Evaluation metric allows you to quickly tell if classifier A or classifier B is better, and therefore having a dev set plus single number evaluation metric distance to speed up iterating. It speeds up this iterative process of improving your machine learning algorithm.