Using PyTorch for building a Convolutional Neural Network (CNN) model

Original article was published on Deep Learning on Medium


Import the required libraries

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.utils import make_grid
import numpy as np
import matplotlib.pyplot as plt

We load the data using the datasets module, which is part of torchvision. The only transformation we apply to this data set is transforms.ToTensor(), which converts each image to a tensor.

transform = transforms.ToTensor()

Loading the train and test data

train_data = datasets.CIFAR10(root=r"c:\path", download=True, train=True, transform=transform)
test_data = datasets.CIFAR10(root=r"c:\path", download=True, train=False, transform=transform)

We load the data in batches so that the whole data set never has to be processed at once, which helps avoid out-of-memory errors. Here we use a batch size of 100 images for both the train and test sets.

train_loader = DataLoader(train_data, batch_size= 100, shuffle=True)
test_loader = DataLoader(test_data, batch_size = 100, shuffle=True)
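As a quick check of how batching behaves, here is a minimal sketch that uses random tensors as a stand-in for CIFAR-10, so nothing needs to be downloaded. The names fake_images, fake_labels and fake_loader, and the sample count of 250, are made up purely for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 250 random "images" shaped like CIFAR-10 samples.
fake_images = torch.rand(250, 3, 32, 32)
fake_labels = torch.randint(0, 10, (250,))
fake_data = TensorDataset(fake_images, fake_labels)

fake_loader = DataLoader(fake_data, batch_size=100, shuffle=True)

# Collect the size of every batch the loader yields.
batch_sizes = [images.shape[0] for images, labels in fake_loader]
print(len(fake_loader), batch_sizes)  # 3 batches: 100, 100 and a final partial batch of 50
```

By the same arithmetic, the 50,000 CIFAR-10 training images split into 500 batches of 100 images each.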

We can visualize some of the loaded images using make_grid

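# Grab a single batch from the training loader
images, labels = next(iter(train_loader))

# Arrange the first 10 images in a grid with 5 images per row
im = make_grid(images[0:10], nrow=5)
plt.figure(figsize=(10, 4))
# make_grid returns [channels, height, width]; imshow expects [height, width, channels]
plt.imshow(np.transpose(im.numpy(), (1, 2, 0)))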

Let's check the dimensions of each batch. As you can see below, each batch has 100 images and each image has the dimensions [3, 32, 32], where 3 is the number of color channels and 32*32 is the image height and width.

images.shape

torch.Size([100, 3, 32, 32])

We are going to build a CNN model with the architecture given below.

The initial layers of a CNN are usually arranged as a sequence: blocks of convolution layer + ReLU + pooling are placed one after another. The second part of the model is referred to as the classification layer, which does the job of classifying the images into their respective classes.
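The two-part structure described above can be sketched roughly as follows. This is only an illustration under assumed settings: the channel counts (6 and 16), the 3*3 kernels and the single linear layer are hypothetical choices, not necessarily the ones used for the final model.

```python
import torch
import torch.nn as nn

# Feature-learning part: convolution + ReLU + pooling blocks in sequence.
# The channel counts (6, 16) are illustrative assumptions.
features = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=3, stride=1),   # [N, 3, 32, 32] -> [N, 6, 30, 30]
    nn.ReLU(),
    nn.MaxPool2d(2),                            # -> [N, 6, 15, 15]
    nn.Conv2d(6, 16, kernel_size=3, stride=1),  # -> [N, 16, 13, 13]
    nn.ReLU(),
    nn.MaxPool2d(2),                            # -> [N, 16, 6, 6]
)

# Classification part: flatten the feature maps and map them to 10 class scores.
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(16 * 6 * 6, 10),
)

x = torch.rand(4, 3, 32, 32)  # a dummy batch of 4 CIFAR-10-sized images
feature_maps = features(x)
scores = classifier(feature_maps)
print(feature_maps.shape, scores.shape)
```

The input size of the linear layer (16 * 6 * 6) has to match the flattened output of the last pooling step, which is why the shapes are tracked in the comments.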

Feature learning happens in the initial layers of the CNN, which find patterns in an image. They do this by convolving filters over the image and looking for patterns. The first few layers of a CNN can identify simple patterns such as lines and corners; as these patterns are passed down through the network, deeper layers start recognizing more complex features. The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution: a filter slides across the image and, at each position, is multiplied element-wise with the pixels it covers and the products are summed. This property makes CNNs really good at identifying objects in images.

In terms of architecture, the image is converted to feature maps. The number of feature maps we want from an image is arbitrary (a hyperparameter). We use PyTorch's conv2d operation to convert the image to feature maps. These feature maps are then downsampled using a max pooling function. This step is repeated multiple times, depending on the depth of the model we choose.
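To see that the number of feature maps is simply a setting we pick, the sketch below runs a dummy batch through nn.Conv2d with a few different out_channels values, followed by max pooling. The values 6, 16 and 32 are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.rand(100, 3, 32, 32)  # dummy batch with the same shape as one CIFAR-10 batch

shapes = {}
for out_channels in (6, 16, 32):  # arbitrary choices, purely for illustration
    conv = nn.Conv2d(in_channels=3, out_channels=out_channels, kernel_size=3)
    feature_maps = conv(x)                  # one feature map per output channel
    pooled = F.max_pool2d(feature_maps, 2)  # halves height and width
    shapes[out_channels] = (tuple(feature_maps.shape), tuple(pooled.shape))
    print(out_channels, feature_maps.shape, pooled.shape)
```

Only the channel dimension changes with out_channels; the spatial size after convolution and pooling depends on the kernel size, stride and padding instead.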

We use the stride, filter size and padding to decide how the kernel traverses the image. By default stride = 1, which means the filter moves one step at a time. The filter size is the dimension of the square filter; for example, a filter size of 3 means we are using a 3*3 square filter. Padding is 0 by default; when set to a positive value, it adds that many rows and columns of values (zeros by default) around the border of the image.
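These three settings together determine the size of the output. For a square n*n input, the output width and height follow floor((n + 2 * padding - filter size) / stride) + 1. The helper conv_output_size below is a hypothetical name introduced for this sketch; it compares the formula against what nn.Conv2d actually produces for a few arbitrary settings.

```python
import torch
import torch.nn as nn

def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output width/height for an n x n input: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel_size) // stride + 1

# Compare the formula with what nn.Conv2d actually produces.
x = torch.rand(1, 3, 32, 32)
settings = [(3, 1, 0), (3, 1, 1), (5, 2, 0), (3, 2, 1)]  # (kernel, stride, padding)
results = []
for k, s, p in settings:
    conv = nn.Conv2d(3, 1, kernel_size=k, stride=s, padding=p)
    actual = conv(x).shape[-1]
    predicted = conv_output_size(32, k, s, p)
    results.append((predicted, actual))
    print(f"kernel={k} stride={s} padding={p}: predicted {predicted}, actual {actual}")
```

For example, a 3*3 filter with stride 1 and no padding shrinks a 32*32 image to 30*30, while padding = 1 keeps it at 32*32.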

Below you can see how the data, when convolved with a 3*3 square filter, is converted to a feature map.
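The same operation can be reproduced by hand on a toy example. In this sketch the 4*4 image and the vertical-edge filter are made-up values; the loop slides the filter over the image, multiplies element-wise and sums, and the result is compared with F.conv2d. Note that PyTorch's conv2d does not flip the kernel (strictly speaking it computes a cross-correlation), which is what the loop mirrors.

```python
import torch
import torch.nn.functional as F

image = torch.arange(16, dtype=torch.float32).reshape(4, 4)  # toy 4x4 "image"
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]])                       # toy vertical-edge filter

# Slide the 3x3 filter over the image: multiply element-wise, then sum.
out_size = image.shape[0] - kernel.shape[0] + 1              # 4 - 3 + 1 = 2
manual = torch.zeros(out_size, out_size)
for i in range(out_size):
    for j in range(out_size):
        patch = image[i:i + 3, j:j + 3]
        manual[i, j] = (patch * kernel).sum()

# F.conv2d expects [batch, channels, height, width] inputs and weights.
builtin = F.conv2d(image.reshape(1, 1, 4, 4), kernel.reshape(1, 1, 3, 3))

print(manual)
print(torch.allclose(manual, builtin.reshape(out_size, out_size)))
```

The 4*4 input becomes a 2*2 feature map because the 3*3 filter only fits in two positions along each axis.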

The illustration below shows how padding = 1 lets the filter pass over the corner pixels of the image more times than with padding = 0.
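The effect of padding on corner pixels can also be counted directly. The helper visit_counts is a hypothetical function written for this sketch; it assumes a stride of 1 and a toy 4*4 image, and counts how many filter positions cover each original pixel.

```python
import torch

def visit_counts(size, kernel, padding):
    """Count how many filter positions (stride 1) cover each original pixel."""
    padded = size + 2 * padding
    counts = torch.zeros(padded, padded, dtype=torch.int64)
    # Place the filter at every valid position and mark the pixels it covers.
    for i in range(padded - kernel + 1):
        for j in range(padded - kernel + 1):
            counts[i:i + kernel, j:j + kernel] += 1
    # Drop the padded border so only the original pixels remain.
    return counts[padding:padded - padding, padding:padded - padding]

no_pad = visit_counts(size=4, kernel=3, padding=0)
with_pad = visit_counts(size=4, kernel=3, padding=1)
print(no_pad)
print(with_pad)
print(no_pad[0, 0].item(), with_pad[0, 0].item())  # corner pixel: 1 visit vs 4 visits
```

Without padding the corner pixel contributes to a single output value, so information at the border is under-used; with padding = 1 it contributes to four.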

A conv2d layer is followed by a pooling layer, which downsamples the feature maps. Two types of pooling are commonly used: average and max pooling. The illustration below uses max pooling with a filter size of 2*2. A 2*2 filter (moved with a stride of 2) halves each dimension of the feature map, i.e. from m*m to [m/2]*[m/2], rounding down when m is odd.
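As a small worked example, the sketch below applies both pooling types to a made-up 4*4 feature map holding the values 1 to 16, and then checks the rounding behavior on an odd-sized input. It relies on the default behavior of F.max_pool2d and F.avg_pool2d, where the stride equals the kernel size.

```python
import torch
import torch.nn.functional as F

# Toy feature map with values 1..16, shaped [batch, channels, height, width].
x = torch.arange(1, 17, dtype=torch.float32).reshape(1, 1, 4, 4)

max_pooled = F.max_pool2d(x, kernel_size=2)  # keeps the largest value in each 2x2 block
avg_pooled = F.avg_pool2d(x, kernel_size=2)  # keeps the mean of each 2x2 block
print(max_pooled.squeeze())  # [[6, 8], [14, 16]]
print(avg_pooled.squeeze())  # [[3.5, 5.5], [11.5, 13.5]]

# With an odd size the result is rounded down: 15 x 15 becomes 7 x 7.
odd = torch.rand(1, 1, 15, 15)
odd_pooled = F.max_pool2d(odd, kernel_size=2)
print(odd_pooled.shape)
```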