Original article was published by Yeseul Lee on Deep Learning on Medium

# [PyTorch] Performance Evaluation of a Classification Model: Confusion Matrix

There are several ways to evaluate the performance of a classification model. One of them is the ‘Confusion Matrix’, which sorts predictions into groups depending on the model’s predicted class and the actual class. From the confusion matrix we can compute the model’s accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score, which are useful performance indicators of the classifier.

Here is an example confusion matrix (2×2) for a binary classifier; in the layout used below, rows correspond to the actual class and columns to the predicted class. (If the model has n classes, the confusion matrix has shape n×n.)

Let’s define some basic terminologies.

- True Positive (TP): The model predicted ‘Positive’ and its actual class is ‘Positive’, so the prediction is ‘True’
- False Positive (FP): The model predicted ‘Positive’ and its actual class is ‘Negative’, so the prediction is ‘False’
- False Negative (FN): The model predicted ‘Negative’ and its actual class is ‘Positive’, so the prediction is ‘False’
- True Negative (TN): The model predicted ‘Negative’ and its actual class is ‘Negative’, so the prediction is ‘True’
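As a quick sketch, the four cases above can be tallied with a loop over predicted and actual labels (the `preds` and `labels` arrays here are made-up example data, with 1 = positive and 0 = negative):

```python
# Hypothetical example data: predicted labels and actual labels.
preds  = [1, 1, 0, 0, 1, 0]
labels = [1, 0, 1, 0, 1, 0]

counts = {'TP': 0, 'FP': 0, 'FN': 0, 'TN': 0}
for p, y in zip(preds, labels):
    if p == 1 and y == 1:
        counts['TP'] += 1   # predicted positive, actually positive
    elif p == 1 and y == 0:
        counts['FP'] += 1   # predicted positive, actually negative
    elif p == 0 and y == 1:
        counts['FN'] += 1   # predicted negative, actually positive
    else:
        counts['TN'] += 1   # predicted negative, actually negative

print(counts)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```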

These are the performance criteria calculated from the confusion matrix.

(P=TP+FN, N=TN+FP)

- Accuracy: (TP+TN)/(P+N)
- Sensitivity: TP/P
- Specificity: TN/N
- PPV: TP/(TP+FP)
- NPV: TN/(TN+FN)
- F1 score: 2*(PPV*Sensitivity)/(PPV+Sensitivity) =(2*TP)/(2*TP+FP+FN)

Below is PyTorch code that calculates the confusion matrix and its accuracy, sensitivity, specificity, PPV, and NPV. (It assumes that `device`, `testloaders`, `testset_sizes`, `class_names`, and `imshow` were already defined during training and data setup.)

```python
import numpy as np
import torch

def getConfusionMatrix(model, show_image=False):
    model.eval()  # set the model to evaluation mode
    confusion_matrix = np.zeros((2, 2), dtype=int)  # rows: actual class, columns: predicted class
    num_images = testset_sizes['test']  # size of the test set

    with torch.no_grad():  # disable gradient tracking to test the model
        for inputs, labels in testloaders['test']:
            inputs = inputs.to(device)
            labels = labels.to(device)

            # get predictions of the model
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            # update the confusion matrix
            for j in range(inputs.size(0)):
                if preds[j] == 1 and labels[j] == 1:
                    term = 'TP'
                    confusion_matrix[0][0] += 1
                elif preds[j] == 1 and labels[j] == 0:
                    term = 'FP'
                    confusion_matrix[1][0] += 1
                elif preds[j] == 0 and labels[j] == 1:
                    term = 'FN'
                    confusion_matrix[0][1] += 1
                else:  # preds[j] == 0 and labels[j] == 0
                    term = 'TN'
                    confusion_matrix[1][1] += 1

                # show the image and its cell in the confusion matrix
                if show_image:
                    print('predicted: {}'.format(class_names[preds[j]]))
                    print(term)
                    imshow(inputs.cpu().data[j])
                    print()

    # print results
    print('Confusion Matrix: ')
    print(confusion_matrix)
    print()
    print('Accuracy: ', 100 * (confusion_matrix[0][0] + confusion_matrix[1][1]) / num_images)
    print('Sensitivity: ', 100 * confusion_matrix[0][0] / (confusion_matrix[0][0] + confusion_matrix[0][1]))
    print('Specificity: ', 100 * confusion_matrix[1][1] / (confusion_matrix[1][1] + confusion_matrix[1][0]))
    print('PPV: ', 100 * confusion_matrix[0][0] / (confusion_matrix[0][0] + confusion_matrix[1][0]))
    print('NPV: ', 100 * confusion_matrix[1][1] / (confusion_matrix[1][1] + confusion_matrix[0][1]))

    return confusion_matrix
```
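As an aside, the per-sample loop can also be replaced by a single counting operation. The sketch below uses NumPy's `bincount` on synthetic arrays (the same idea works on GPU tensors with `torch.bincount`); the encoding is my own choice, picked so the result matches the layout used above, with rows = actual class and columns = predicted class:

```python
import numpy as np

def confusion_matrix_fast(preds, labels):
    # Encode each (label, pred) pair as an index 0..3, then count occurrences:
    # index 0 -> TP, 1 -> FN, 2 -> FP, 3 -> TN, so reshape(2, 2) yields
    # [[TP, FN], [FP, TN]] (row 0 = actual positive, col 0 = predicted positive).
    idx = 2 * (1 - labels) + (1 - preds)
    return np.bincount(idx, minlength=4).reshape(2, 2)

# Hypothetical example data (1 = positive, 0 = negative).
preds  = np.array([1, 1, 0, 0, 1, 0])
labels = np.array([1, 0, 1, 0, 1, 0])
print(confusion_matrix_fast(preds, labels))  # [[2 1]
                                             #  [1 2]]
```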