Simplifying Precision, Recall and other Evaluation Metrics

Original article can be found here (source): Artificial Intelligence on Medium


Explaining evaluation metrics in basic terms

Machine learning and statistics terms can be very convoluted, as if they were made to be understood by machines only. They come with unintuitive, similar-sounding names like False and True Positives, Precision, Recall, Area Under ROC, Sensitivity, Specificity, Insanity. OK, the last one wasn’t real.

There are some great articles on precision and recall already, but whenever I read them, or the discussions on Stack Exchange, the messy terms all mix up in my mind and I’m left more confused than an unlabelled confusion matrix. As a result, I’ve never felt like I understood them fully.

A confused confusion matrix

But to know how well our model is working, it is important to master the evaluation metrics and understand them at a deep level. So what does a data scientist really need to know to evaluate a classification model? I explain the 3 core terms below, using visuals and some examples so they stick in our brains better.


Accuracy

Literally how accurate the model is at guessing the correct labels: the number of correct predictions divided by the total number of predictions. If your dataset is pretty balanced and you care about getting every category correct, this is all you need to worry about.
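
To make that concrete, here is a minimal sketch in plain Python. The labels are made up for illustration, and `accuracy` is a hypothetical helper written for this example rather than a library function. It counts how many predictions match the true labels and divides by the total, first on a balanced toy dataset and then on an imbalanced one.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one label")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical balanced dataset: 3 cats and 3 dogs.
y_true = ["cat", "dog", "cat", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "cat"]
balanced_score = accuracy(y_true, y_pred)  # 4 of the 6 guesses are correct

# Hypothetical imbalanced dataset: 9 negatives and a single positive.
# A "model" that always predicts negative never finds the positive case.
imb_true = [0] * 9 + [1]
imb_pred = [0] * 10
imbalanced_score = accuracy(imb_true, imb_pred)  # 9 of the 10 guesses are correct

print(f"balanced:   {balanced_score:.2f}")
print(f"imbalanced: {imbalanced_score:.2f}")
```

The second case is why the "pretty balanced" condition matters: a model that never predicts the rare class still gets 9 out of 10 labels right, so accuracy alone would make it look better than the first model even though it misses every positive.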