Original article was published on Artificial Intelligence on Medium
We will explain the differences between the evaluation metrics listed above. But first, in order to understand these metrics, you will need to know what False Positives and False Negatives are and how the two differ. Please see the article below for an explanation:
In order to calculate the different evaluation metrics, you will need to know the number of False Positives (FP), False Negatives (FN), True Positives (TP), and True Negatives (TN) your model produced.
Let’s start with the simplest of the four evaluation metrics: Accuracy. Accuracy is simply the number of observations our model predicted correctly divided by the total number of observations:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
For example, let’s say we have a machine that classifies whether a fruit is an apple or not. In a sample of hundreds of apples and oranges, the accuracy of the machine is the number of apples it correctly classified as apples, plus the number of oranges it correctly classified as not apples, divided by the total number of apples and oranges. It is a simple and effective measurement as long as the numbers of apples and oranges are roughly equal. Otherwise, we may have to use a different evaluation metric.
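As a quick sketch, here is the accuracy calculation in Python. The counts are hypothetical, chosen only to illustrate the formula:

```python
# Hypothetical confusion counts for an apple-vs-not-apple classifier
tp = 40  # apples correctly labeled as apples
tn = 45  # oranges correctly labeled as not apples
fp = 5   # oranges incorrectly labeled as apples
fn = 10  # apples the model missed

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.85
```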
Precision is the measure of how many of our model’s positive predictions were actually correct, that is, the number of True Positives divided by the total number of positive predictions, correct and incorrect.
Precision = TP / (TP + FP)
Using our apples and oranges example, precision is the number of correctly classified apples divided by the sum of the apples correctly labeled as apples and the oranges incorrectly labeled as apples. In other words, precision measures how many of the fruits we labeled as apples really were apples.
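Continuing with the same hypothetical counts, the precision calculation looks like this:

```python
# Hypothetical counts for the apple classifier
tp = 40  # apples correctly labeled as apples
fp = 5   # oranges incorrectly labeled as apples

precision = tp / (tp + fp)
print(round(precision, 3))  # 0.889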
Recall is the measure of how many of the actual positive observations our model correctly predicted, that is, the number of True Positives divided by the total number of actual positives.
Recall = TP / (TP + FN)
In our apples and oranges example, recall is the number of apples labeled correctly divided by the total number of apples present. In other words, recall measures how many apples we missed in the entire sample of fruit.
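Using the same hypothetical counts again, recall comes out as:

```python
# Hypothetical counts for the apple classifier
tp = 40  # apples correctly labeled as apples
fn = 10  # apples the model missed (labeled as not apples)

recall = tp / (tp + fn)
print(recall)  # 0.8
```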
If we focus on only one of these scores, we may end up neglecting the other. To combat this, we can use the F1 Score, which strikes a balance between the Precision and Recall scores. To calculate the F1 Score, plug the Precision and Recall scores into the following formula:
F1 Score = 2 * ( (Precision * Recall) / (Precision + Recall) )
Using our apples and oranges example, the F1 Score balances Precision and Recall: it penalizes both the oranges misclassified as apples (False Positives) and the apples not correctly classified as apples (False Negatives).
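Putting it together with the hypothetical Precision and Recall values from the earlier sketches, the F1 Score is the harmonic mean of the two:

```python
# Hypothetical Precision and Recall from the earlier examples
precision = 40 / 45  # correctly labeled apples / everything labeled "apple"
recall = 40 / 50     # correctly labeled apples / all actual apples

# F1 is the harmonic mean of Precision and Recall
f1_score = 2 * ((precision * recall) / (precision + recall))
print(round(f1_score, 3))  # 0.842
```

Note that because F1 is a harmonic mean, a very low Precision or a very low Recall drags the score down sharply, which is exactly why it is preferred when we care about both kinds of error.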