Image classification tutorials in PyTorch — transfer learning

Source: Deep Learning on Medium

Image classification is a machine learning/deep learning task in which we classify images into human-labeled classes. Convolutional neural networks are the standard choice for image data, and several good pre-trained architectures come built into PyTorch’s torchvision package. We will use the recently published EfficientNet, as it gives excellent accuracy; its PyTorch implementation is given here. I will try to make this article complete with data analysis, error plots, a confusion matrix, etc. The GitHub repository for this article is here. We will use the Intel scene classification data available on Kaggle.


Image datasets are usually too big to load entirely into memory, so we build a dataloader that loads images on the fly as we iterate over the data, keeping only a few images in memory at a time (the number of images loaded at once is the batch size). To build a dataloader we either need the images divided into one folder per class, or, if we instead have a list of image names and their corresponding labels, we need to write a custom dataset class and wrap it in a dataloader. Our dataset was already organized into separate folders, so I have used the first method. To see an example of the second method click here; the “FaceLandmarksDataset” class in that link is a custom dataset class. See the comments in the code for a basic explanation. The dataset folder has 6 subfolders, one per class. I have also split the test data into two parts, valid and test.

import numpy as np
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

# We need to pass the path to the folder containing the class folders
train_data = torchvision.datasets.ImageFolder(root='../input/seg_train/seg_train', transform=train_transforms)
train_loader = DataLoader(train_data, batch_size=8, shuffle=True)
test_data = torchvision.datasets.ImageFolder(root='../input/seg_test/seg_test', transform=test_transforms)

# Splitting the test data into valid and test sets
valid_size = 0.2
data_len = len(test_data)
indices = list(range(data_len))
split1 = int(np.floor(valid_size * data_len))
valid_idx, test_idx = indices[:split1], indices[split1:]
valid_sampler = SubsetRandomSampler(valid_idx)
test_sampler = SubsetRandomSampler(test_idx)
valid_loader = DataLoader(test_data, batch_size=8, sampler=valid_sampler)
test_loader = DataLoader(test_data, batch_size=8, sampler=test_sampler)
dataloaders = {'train': train_loader, 'val': valid_loader}
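For completeness, the second method (a custom dataset class, as in the “FaceLandmarksDataset” example linked above) looks roughly like this. This is a minimal sketch under my own assumptions: the class name `PathLabelDataset` and the `image_paths`/`labels` arguments are illustrative, not from the article’s repository.

```python
from torch.utils.data import Dataset, DataLoader
from PIL import Image

class PathLabelDataset(Dataset):
    """Minimal custom dataset: parallel lists of image paths and integer labels."""
    def __init__(self, image_paths, labels, transform=None):
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        # The dataloader uses this to know how many samples exist
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Load one image lazily, so memory only holds the current batch
        image = Image.open(self.image_paths[idx]).convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        return image, self.labels[idx]
```

A dataset like this is wrapped exactly like ImageFolder, e.g. `DataLoader(PathLabelDataset(paths, labels, train_transforms), batch_size=8, shuffle=True)`.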

Here train_loader and test_loader are dataloaders created with the torchvision package. We have also referenced train_transforms: a pipeline of data augmentation techniques to apply to the dataset. If you don’t want any augmentation, it can simply contain the functions to resize the images and convert them into PyTorch tensors, which we need to do before feeding them into the neural network. Image augmentation is usually beneficial, so I have declared it as:

im_size = 150
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(size=315, scale=(0.95, 1.0)),
    transforms.RandomRotation(degrees=10),
    transforms.RandomHorizontalFlip(),
    transforms.CenterCrop(size=299),  # ImageNet standard
    transforms.ToTensor(),
    transforms.Normalize((0.4302, 0.4575, 0.4539), (0.2361, 0.2347, 0.2432))
])

The mean and standard deviation fed to the normalization function (transforms.Normalize((0.4302, 0.4575, 0.4539), (0.2361, 0.2347, 0.2432))) can simply be set to 0.5 per channel, but it is better to use the actual mean and standard deviation of the dataset. For a custom dataset they can be calculated using:

# Per-channel mean and std accumulated over the whole dataset
mean = 0.
std = 0.
nb_samples = 0
for data, _ in dataloader:
    batch_samples = data.size(0)
    # Flatten each image to (channels, pixels)
    data = data.view(batch_samples, data.size(1), -1)
    mean += data.mean(2).sum(0)
    std += data.std(2).sum(0)
    nb_samples += batch_samples
mean /= nb_samples
std /= nb_samples

Data analysis

We need to visualize the dataset together with its class labels; the code below will help with that. We can get the dataset’s class names as:

classes = train_data.classes

We can make it easier to work with labels by building an encoder and a decoder for the classes.

# Encoder and decoder to convert between classes and integers
decoder = {}
for i in range(len(classes)):
    decoder[classes[i]] = i
encoder = {}
for i in range(len(classes)):
    encoder[i] = classes[i]

This code will visualize n_figures images together with their class labels.

import matplotlib.pyplot as plt
import random

# Plotting random images from the dataset
def class_plot(data, encoder, inv_normalize, n_figures=12):
    n_row = int(n_figures / 3)
    fig, axes = plt.subplots(figsize=(14, 10), nrows=n_row, ncols=3)
    for ax in axes.flatten():
        a = random.randint(0, len(data) - 1)
        (image, label) = data[a]
        label = int(label)
        l = encoder[label]
        image = inv_normalize(image)
        # Channels-first tensor -> channels-last array for imshow
        image = image.numpy().transpose(1, 2, 0)
        im = ax.imshow(image)
        ax.set_title(l)
        ax.axis('off')
    plt.show()

Here we have used inv_normalize. If we normalized the original image, we need to inverse-normalize it before visualizing it. Since transforms.Normalize computes (x − mean) / std, applying it a second time with mean = −mean/std and std = 1/std undoes the first normalization. So we can define inv_normalize with the same transform, only the mean and standard deviation change:

inv_normalize = transforms.Normalize(
    mean=[-0.4302/0.2361, -0.4575/0.2347, -0.4539/0.2432],
    std=[1/0.2361, 1/0.2347, 1/0.2432]
)


As mentioned earlier, I have used EfficientNet. If you want to use a model that ships with PyTorch instead, just replace line 4 (the backbone assignment) with

self.resnet = models.resnet34(pretrained = True)

This makes the backbone a ResNet-34; the other models available in PyTorch can be seen here.

Here we have created a custom neural network class.

For classification, we use cross-entropy loss. We could also use negative log-likelihood loss, but then the output of our neural network must be log-softmax.
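The equivalence is easy to verify: CrossEntropyLoss applied to raw logits gives exactly the same value as NLLLoss applied to log-softmax outputs. A quick check (the tensor shapes here are just for illustration):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 6)            # raw network outputs: 4 images, 6 classes
targets = torch.tensor([0, 2, 5, 1])  # ground-truth class indices

# Option 1: cross-entropy directly on raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: log-softmax outputs + negative log-likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

assert torch.allclose(ce, nll)  # the two losses are identical
```

This is why a network trained with CrossEntropyLoss should not have a softmax layer at the end: the loss applies log-softmax internally.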


To find a good learning rate to begin with, I used a learning rate range test as suggested in the fast.ai course. For PyTorch, I used the implementation available on GitHub.

from lr_finder import LRFinder
optimizer_ft = optim.Adam(classifier.parameters(), lr=0.0000001)
lr_finder = LRFinder(classifier, optimizer_ft, criterion, device=device)
lr_finder.range_test(train_loader, end_lr=1, num_iter=500)

This gives us a loss vs learning rate plot, from which we can pick the largest learning rate at which the loss is still decreasing quickly. To reduce overfitting I have also used early stopping, which is available for PyTorch on GitHub. Early stopping halts training based on the validation loss.
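The idea behind early stopping can be sketched in a few lines. This is a minimal patience-based version under my own assumptions (the class name `EarlyStopping` and the `patience`/`min_delta` parameters are illustrative); the GitHub implementation referenced above also checkpoints the best model, which this sketch omits.

```python
class EarlyStopping:
    """Stop training when validation loss stops improving (minimal sketch)."""
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait for an improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float('inf')
        self.counter = 0
        self.early_stop = False

    def __call__(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: reset the counter
            self.counter = 0
        else:
            self.counter += 1          # no improvement this epoch
            if self.counter >= self.patience:
                self.early_stop = True
        return self.early_stop
```

In the training loop you would call something like `if stopper(valid_loss): break` at the end of each epoch.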

Testing and performance metrics:

To check the performance of the model we need predictions on the test split of the data. The function below takes the test loader and the model as arguments and returns the predicted labels, the true labels, the true and predicted labels of the wrong predictions, and the list of wrongly predicted images. The wrong-prediction lists will help us plot the misclassified images.

batch_size = 8
sm = nn.Softmax(dim=1)
def test(model, dataloader):
    running_corrects = 0
    running_loss = 0.0
    pred = []
    true = []
    pred_wrong = []
    true_wrong = []
    image = []

    for batch_idx, (data, target) in enumerate(dataloader):
        data = data.type(torch.cuda.FloatTensor)
        target = target.type(torch.cuda.LongTensor)
        output = model(data)
        loss = criterion(output, target)  # criterion is the loss used in training
        output = sm(output)
        _, preds = torch.max(output, 1)
        running_corrects = running_corrects + torch.sum(preds == target)
        running_loss += loss.item() * data.size(0)
        preds = preds.cpu().numpy()
        target = target.cpu().numpy()
        preds = np.reshape(preds, (len(preds), 1))
        target = np.reshape(target, (len(target), 1))
        data = data.cpu().numpy()

        # Collect all predictions; keep images and labels for the wrong ones
        for i in range(len(preds)):
            pred.append(preds[i])
            true.append(target[i])
            if preds[i] != target[i]:
                pred_wrong.append(preds[i])
                true_wrong.append(target[i])
                image.append(data[i])

    epoch_acc = running_corrects.double() / (len(dataloader) * batch_size)
    epoch_loss = running_loss / (len(dataloader) * batch_size)
    print('Test loss: {:.4f} Acc: {:.4f}'.format(epoch_loss, epoch_acc))
    return true, pred, image, true_wrong, pred_wrong

Now we can plot the images with wrong predictions using the code below. It takes the true_wrong, pred_wrong, and image lists returned by the function above as input, along with the inv_normalize transform defined earlier.

def wrong_plot(true, ima, pred, encoder, inv_normalize, n_figures=12):
    print('Classes in order Actual and Predicted')
    n_row = int(n_figures / 3)
    fig, axes = plt.subplots(figsize=(14, 10), nrows=n_row, ncols=3)
    for ax in axes.flatten():
        a = random.randint(0, len(true) - 1)
        image, correct, wrong = ima[a], true[a], pred[a]
        image = torch.from_numpy(image)
        correct = int(correct)
        c = encoder[correct]
        wrong = int(wrong)
        w = encoder[wrong]
        f = 'A:' + c + ',' + 'P:' + w  # actual and predicted class names
        if inv_normalize is not None:
            image = inv_normalize(image)
        image = image.numpy().transpose(1, 2, 0)
        im = ax.imshow(image)
        ax.set_title(f)
        ax.axis('off')
    plt.show()

[Figure: sample wrong predictions, titled with actual (A) and predicted (P) classes]

Now we will check the model’s performance using several metrics: accuracy, F1 score, precision, and recall. We will also plot the confusion matrix of the predictions.

from sklearn import metrics

def performance_matrix(true, pred):
    precision = metrics.precision_score(true, pred, average='macro')
    recall = metrics.recall_score(true, pred, average='macro')
    accuracy = metrics.accuracy_score(true, pred)
    f1_score = metrics.f1_score(true, pred, average='macro')
    print('Confusion Matrix:\n', metrics.confusion_matrix(true, pred))
    print('Precision: {} Recall: {} Accuracy: {} F1 score: {}'.format(
        precision * 100, recall * 100, accuracy * 100, f1_score * 100))


Precision: 90.78, Recall: 90.68, Accuracy: 90.33, F1 score: 90.44

Confusion matrix:

To plot a confusion matrix like this, use the code below:

def plot_confusion_matrix(y_true, y_pred, classes,
                          normalize=False,
                          title=None,
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if not title:
        if normalize:
            title = 'Normalized confusion matrix'
        else:
            title = 'Confusion matrix, without normalization'
    # Compute confusion matrix
    cm = metrics.confusion_matrix(y_true, y_pred)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)
    fig, ax = plt.subplots()
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # We want to show all ticks...
    # ... and label them with the respective list entries
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')
    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")
    # Loop over data dimensions and create text annotations.
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return ax

plot_confusion_matrix(true, pred, classes=classes, title='Confusion matrix, without normalization')

This function takes the true and predicted labels as input, along with the names of the classes.

Thank you for reading. Please comment if I have made any mistakes.