[PyTorch] Performance Evaluation of a Classification Model: Confusion Matrix

Original article was published by Yeseul Lee on Deep Learning on Medium

There are several ways to evaluate the performance of a classification model. One of them is the confusion matrix, which groups the model’s predictions according to the predicted class and the actual class. From the confusion matrix we can calculate the model’s accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1 score, all of which are useful performance indicators for a classifier.

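A binary classifier has a 2×2 confusion matrix: one axis is the actual class, the other is the predicted class, and each cell counts the samples with that combination. (In general, if the model has n classes, the confusion matrix has shape n×n.)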
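To make the n×n shape concrete, here is a minimal sketch in plain Python. The function name `confusion_matrix_n`, the toy labels, and the choice of rows for actual classes and columns for predicted classes are assumptions made for illustration; the opposite orientation is equally common.

```python
def confusion_matrix_n(labels, preds, num_classes):
    """Count (actual, predicted) pairs.

    Assumed layout: rows are actual classes, columns are predicted classes.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(labels, preds):
        matrix[actual][predicted] += 1
    return matrix


# Toy data for a hypothetical 3-class problem
labels = [0, 1, 2, 2, 1, 0]
preds = [0, 2, 2, 2, 1, 1]

cm3 = confusion_matrix_n(labels, preds, num_classes=3)
for row in cm3:
    print(row)
```

The diagonal cells hold the correctly classified samples, and every off-diagonal cell shows which class was confused with which.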

Let’s define some basic terms.

  • True Positive(TP): The model predicted ‘Positive’ and the actual class is ‘Positive’, so the prediction is correct (‘True’).
  • False Positive(FP): The model predicted ‘Positive’ but the actual class is ‘Negative’, so the prediction is wrong (‘False’).
  • False Negative(FN): The model predicted ‘Negative’ but the actual class is ‘Positive’, so the prediction is wrong (‘False’).
  • True Negative(TN): The model predicted ‘Negative’ and the actual class is ‘Negative’, so the prediction is correct (‘True’).
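The four definitions above can be sketched as a small counting function. This is an illustrative example only: the name `count_outcomes`, the `positive` parameter, and the toy data are assumptions, with label 1 treated as the ‘Positive’ class.

```python
def count_outcomes(labels, preds, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(labels, preds):
        if predicted == positive and actual == positive:
            tp += 1  # predicted Positive, actually Positive
        elif predicted == positive:
            fp += 1  # predicted Positive, actually Negative
        elif actual == positive:
            fn += 1  # predicted Negative, actually Positive
        else:
            tn += 1  # predicted Negative, actually Negative
    return tp, fp, fn, tn


# Toy data: 1 = Positive, 0 = Negative
labels = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = count_outcomes(labels, preds)
print(tp, fp, fn, tn)
```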

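These are the performance criteria calculated from the confusion matrix. Here P = TP+FN is the number of actual positives and N = TN+FP is the number of actual negatives.

  • Accuracy: (TP+TN)/(P+N)
  • Sensitivity (also called recall): TP/P
  • Specificity: TN/N
  • PPV (also called precision): TP/(TP+FP)
  • NPV: TN/(TN+FN)
  • F1 score: 2*(PPV*Sensitivity)/(PPV+Sensitivity) = (2*TP)/(2*TP+FP+FN)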
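As a quick check of these formulas, the sketch below computes every metric from the four counts. The function name `classification_metrics` and the example counts are made up for illustration, and the sketch assumes every denominator is non-zero (it does not guard against, for example, a test set with no positives).

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the metrics listed above from the four confusion-matrix counts."""
    p = tp + fn  # actual positives
    n = tn + fp  # actual negatives
    sensitivity = tp / p
    ppv = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (p + n),
        "sensitivity": sensitivity,
        "specificity": tn / n,
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical counts, not taken from any real model
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the two forms of the F1 score agree: 2*(0.8*40/45)/(0.8+40/45) and (2*40)/(2*40+10+5) both equal 80/95.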

The following PyTorch code calculates the confusion matrix and its accuracy, sensitivity, specificity, PPV and NPV.

import numpy as np
import torch

# Assumes testloaders, testset_sizes, device and class_names are defined
# elsewhere in the script, and that label 1 is the 'Positive' class.
def getConfusionMatrix(model, show_image=False):
    model.eval()  # set the model to evaluation mode
    # rows: actual class, columns: predicted class (index 0 = Positive, 1 = Negative)
    confusion_matrix = np.zeros((2, 2), dtype=int)  # initialize a confusion matrix
    num_images = testset_sizes['test']  # size of the test set

    with torch.no_grad():  # disable gradient tracking while testing the model
        for i, (inputs, labels) in enumerate(testloaders['test']):
            inputs = inputs.to(device)
            labels = labels.to(device)
            # get predictions of the model
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            # update the confusion matrix
            # (cell indices are inferred from the formulas printed below)
            for j in range(inputs.size()[0]):
                if preds[j] == 1 and labels[j] == 1:
                    confusion_matrix[0][0] += 1  # TP
                elif preds[j] == 1 and labels[j] == 0:
                    confusion_matrix[1][0] += 1  # FP
                elif preds[j] == 0 and labels[j] == 1:
                    confusion_matrix[0][1] += 1  # FN
                elif preds[j] == 0 and labels[j] == 0:
                    confusion_matrix[1][1] += 1  # TN
                # show the predicted class of the image
                if show_image:
                    print('predicted: {}'.format(class_names[int(preds[j])]))

    # print results
    print('Confusion Matrix: ')
    print(confusion_matrix)
    print('Accuracy: ', 100*(confusion_matrix[0][0]+confusion_matrix[1][1])/num_images)
    print('Sensitivity: ', 100*confusion_matrix[0][0]/(confusion_matrix[0][0]+confusion_matrix[0][1]))
    print('Specificity: ', 100*confusion_matrix[1][1]/(confusion_matrix[1][1]+confusion_matrix[1][0]))
    print('PPV: ', 100*confusion_matrix[0][0]/(confusion_matrix[0][0]+confusion_matrix[1][0]))
    print('NPV: ', 100*confusion_matrix[1][1]/(confusion_matrix[1][1]+confusion_matrix[0][1]))

    return confusion_matrix