Original article was published on Artificial Intelligence on Medium

# Evaluation Metrics

We will explain the differences between the evaluation metrics listed above. But first, in order to understand these metrics, you will need to know what **False Positives and False Negatives** are and how the two differ.

In order to calculate the different evaluation metrics, you will need to know the number of *False Positives (FP), False Negatives (FN), True Positives (TP), and True Negatives (TN)* your model produced.

## Accuracy

Let’s start with the simplest of the four evaluation metrics — Accuracy. Accuracy is simply the number of observations our model predicted correctly divided by the total number of observations:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

For example, let’s say we have a machine that classifies whether a fruit is an apple or not. In a sample of hundreds of apples and oranges, the accuracy of the machine is the number of apples it correctly classified as *apples* plus the number of oranges it classified as *not apples*, divided by the total number of apples and oranges. It is a simple and effective measurement as long as the numbers of apples and oranges are roughly the same. Otherwise, we may have to use a different evaluation metric.
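As a minimal sketch, accuracy can be computed directly from the four counts. The function name and the confusion-matrix counts below are hypothetical, chosen only to illustrate the formula:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts for the apple classifier:
# 40 apples correctly labeled, 45 oranges correctly rejected,
# 5 oranges mislabeled as apples, 10 apples missed.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```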

## Precision

Precision is the measure of how many of our model’s positive predictions were actually correct, out of all the observations it predicted as positive — both correct and incorrect:

Precision = TP / (TP + FP)

Using our apples and oranges example, precision is the number of correctly classified apples divided by the sum of the apples *correctly* labeled as apples and the oranges *incorrectly* labeled as apples. In other words, precision measures how many of the fruits we labeled as apples really were apples.
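The same style of sketch works for precision; only the true positives and false positives matter here (the counts are again hypothetical):

```python
def precision(tp, fp):
    """Of everything we labeled 'apple', the fraction that really was an apple."""
    return tp / (tp + fp)

# 40 correctly labeled apples, 10 oranges mistakenly labeled as apples
print(precision(tp=40, fp=10))  # 0.8
```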

## Recall

Recall is the measure of how many of the truly positive observations our model correctly predicted, out of all the observations that actually are positive:

Recall = TP / (TP + FN)

In our apples and oranges example, recall measures the number of apples labeled correctly divided by the *total number* of apples present. In other words, recall tells us how many apples we might have missed in the entire sample of fruit.
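Recall follows the same pattern, this time using the false negatives — the apples we missed (counts hypothetical, for illustration):

```python
def recall(tp, fn):
    """Of all the actual apples, the fraction we managed to label as apples."""
    return tp / (tp + fn)

# 40 apples labeled correctly, 10 apples missed by the classifier
print(recall(tp=40, fn=10))  # 0.8
```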

## F1 Score

If we focus on only one of these scores, we might end up neglecting the other. To combat this we can use the F1 Score, which is the harmonic mean of Precision and Recall and strikes a balance between the two. To calculate the F1 Score, you need to know the Precision and Recall scores and input them into the following formula:

F1 Score = 2 * ( (Precision * Recall) / (Precision + Recall) )

Using our apples and oranges example, the F1 Score balances Precision and Recall: it penalizes both the oranges misclassified as apples (*False Positives*) and the apples not correctly classified as apples (*False Negatives*), so it only stays high when both kinds of error are low.