Top 10 model evaluation metrics for classification ML models

Original article was published on Artificial Intelligence on Medium

4. Accuracy

The three metrics discussed above are general-purpose metrics: they apply irrespective of the kind of training and test data that you have and the kind of classification algorithm you have deployed for your problem statement.

We now move on to metrics that are well suited to a particular type of data.

Let’s start with Accuracy, a metric that is best used for a balanced dataset. Refer to the diagram below, which is sourced from this Medium article.

Source: Link

As you can see, a balanced dataset is one where the 1’s and 0’s, yes’s and no’s, positives and negatives are equally represented in the training data. On the other hand, if the ratio of the two class labels is skewed, then our model will become biased towards one category.

Assuming we have a balanced dataset, let’s learn what Accuracy is.

Accuracy is the proximity of measurement results to the true value. For a classifier, it tells us what fraction of all observations the model assigned to the correct class label.

For example, suppose that our classification model is trying to predict customer attrition. In the image above, out of the 700 customers who actually attrited (TP+FN), the model correctly classified 500 (TP). Similarly, out of the 300 customers who were actually retained (FP+TN), the model correctly classified 200 (TN).

Accuracy = (TP + TN) / Total customers

In the above scenario, the accuracy of the model on the test dataset of 1000 customers is (500 + 200) / 1000 = 70%.
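The same arithmetic can be sketched in a few lines of Python. This is only an illustration: the helper name `accuracy` is made up for this example, and the FN and FP counts are not stated directly in the text but derived from it (700 - 500 = 200 and 300 - 200 = 100).

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all observations that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Counts from the attrition example above.
tp, fn = 500, 200  # 700 customers actually attrited; 500 of them were caught
tn, fp = 200, 100  # 300 customers were actually retained; 200 of them were caught

acc = accuracy(tp, tn, fp, fn)
print(f"Accuracy: {acc:.0%}")  # Accuracy: 70%
```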

Now, we learned that Accuracy is a metric that should be used only for a balanced dataset. Why is that so? Let’s look at an example to understand that.

In this example, the model was trained on an imbalanced dataset, and the test dataset is imbalanced as well. The Accuracy metric has a score of 72%, which might give us the impression that our model is doing a good job at classification. But look closer: this model is doing a terrible job of predicting the Negative class labels. It predicted only 20 correct outcomes out of 100 negative-label observations, which is just 20% of that class. This is why the Accuracy metric should not be used if you have an imbalanced dataset.
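To see how a single accuracy figure can hide this, here is a rough sketch that breaks the score down by class. The text gives only two numbers for this example (72% accuracy, and 20 correct out of 100 negatives), so the positive-class counts below are assumptions chosen to reproduce the 72% figure, not values from the original diagram.

```python
# From the text: 100 negative observations, only 20 predicted correctly.
tn, fp = 20, 80

# ASSUMED for illustration: 900 positive observations, 700 predicted correctly.
tp, fn = 700, 200

total = tp + tn + fp + fn
overall_accuracy = (tp + tn) / total

# Share of each class that the model got right.
positive_hit_rate = tp / (tp + fn)
negative_hit_rate = tn / (tn + fp)

print(f"Overall accuracy:  {overall_accuracy:.0%}")   # Overall accuracy:  72%
print(f"Positive class:    {positive_hit_rate:.0%}")
print(f"Negative class:    {negative_hit_rate:.0%}")  # Negative class:    20%
```

Because the positive class dominates the test set under these assumed counts, its correct predictions carry the overall score even though four out of five negatives are misclassified.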

The next question is: what should be used instead if you have an imbalanced dataset? The answer is Recall and Precision. Let’s learn more about these.