Original article was published on Deep Learning on Medium
Introduction to Transfer Learning
Transfer Learning allows us to take pre-trained models, trained on large datasets of text, images or other data, and reuse them directly for our own benefit. I’d like to explain Transfer Learning with a simple example. You might have learned to ride a bicycle, and it probably took quite some time to learn to balance and ride it. But once you are experienced with it, you can use that knowledge to ride a motorcycle without much effort. The same mechanism is very much valid for Transfer Learning. You take a deep learning model previously trained on a huge dataset and then train only a few of its layers (mostly the fully connected layers) on your small dataset, which requires far less computation power and memory. PyTorch has a huge collection of pre-trained models ready to be used.
Pneumonia X-Ray Dataset
In this article, I would like to talk about a dataset of pneumonia X-ray images and how we can use Transfer Learning in PyTorch to classify X-rays as normal or pneumonia-affected. The dataset is available on Kaggle and can be obtained here. The total size of the dataset, which contains X-ray images, is more than 1 GB. The dataset has three folders, viz. train, val and test. Each of these folders contains two sub-folders, normal and pneumonia, and each of those contains X-ray images of varying dimensions.
I shall be using the PyTorch framework with the pre-trained ResNet18 model (a network 18 layers deep). The fully connected (FC) layer will be replaced by a custom Sequential layer. This has to be done because ResNet18’s default configuration has 1000 outputs, but our requirement is just two outputs, viz. normal or pneumonia. So, we will replace the last layer to have just two outputs. This FC layer shall be trained with the images available in the train folder and validated using the images from the val folder. The train_loss, val_loss and val_accuracy shall be recorded. The following steps will be followed.
First of all, load all the required modules.
Each image shall be resized to 128 × 128. Each batch shall contain 64 images. The ImageNet mean and standard deviation shall be used for normalisation.
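These hyper-parameters can be captured as constants (the variable names are my own):

```python
image_size = 128   # every image will be resized to 128 x 128
batch_size = 64    # images per batch
# ImageNet channel statistics, the usual choice when using a pretrained backbone
imagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
```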
For preprocessing, we apply a number of transformations to the images, such as resizing (as the images have different dimensions), horizontal flipping, rotation and normalisation. Resizing and normalisation are applied to the train, val and test datasets alike, while the random flips and rotations (data augmentation) are applied to the training images. Data augmentation has proven very effective, as the model gets to train on differently transformed images in every epoch. This may help the model generalize well.
Using the ImageFolder class from torchvision, we create the datasets. It expects all the images to be organized in class-labelled folders. For example, in the train folder there are two sub-folders, normal and pneumonia, each containing the respective images.
Inspecting each folder, we see that there are 5216 images in the train folder, 16 images in the val folder and 624 images in the test folder.
Now, we create data loaders so that it can be fed to the model later. The batch size as set above is 64 and the images are shuffled for randomization.
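One way to build the loaders, sketched as a helper (make_loaders is my own name; num_workers and pin_memory are common performance settings, not stated in the article):

```python
from torch.utils.data import DataLoader

def make_loaders(train_ds, val_ds, test_ds, batch_size=64):
    # Shuffle only the training data; worker processes and pinned memory
    # speed up host-to-GPU transfers
    train_dl = DataLoader(train_ds, batch_size, shuffle=True,
                          num_workers=2, pin_memory=True)
    val_dl = DataLoader(val_ds, batch_size, num_workers=2, pin_memory=True)
    test_dl = DataLoader(test_ds, batch_size, num_workers=2, pin_memory=True)
    return train_dl, val_dl, test_dl
```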
To exploit the GPU if one is available, we shall use the code below.
Select the device (cpu or cuda whichever is available)
Load all the data to the cuda or cpu device
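The two steps above can be sketched as follows (get_default_device, to_device and DeviceDataLoader are assumed helper names, a pattern common in PyTorch tutorials):

```python
import torch

def get_default_device():
    # Pick the GPU when CUDA is available, otherwise fall back to the CPU
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def to_device(data, device):
    # Move a tensor (or a list/tuple of tensors) to the chosen device
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader:
    """Wrap a DataLoader so every batch is moved to the device on the fly."""
    def __init__(self, dl, device):
        self.dl, self.device = dl, device
    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)
    def __len__(self):
        return len(self.dl)
```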
Now, we prepare the ResNet18 model. We use the pretrained model and replace the fully connected (FC) layer with a custom Sequential layer as shown below. We use three linear layers, each supported by batch normalization and dropout for regularization, with ReLU as the activation function. The very last layer reduces to two outputs, where LogSoftmax produces the output log-probabilities.
Now, once the model is ready, we create an instance of the model and call the freeze function to allow only the FC layers to be trained.
The number of epochs is set to 5 and the learning rate is chosen to be 1e-4. For the loss calculation, we choose NLLLoss(), which pairs naturally with the LogSoftmax output layer. For optimisation, we use the Adam optimizer.
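These choices can be sketched as below; the tiny nn.Sequential model is only a stand-in so the snippet is self-contained, and in the article it would be the modified ResNet18:

```python
import torch
import torch.nn as nn

num_epochs = 5
lr = 1e-4

# Stand-in model for illustration only
model = nn.Sequential(nn.Linear(8, 2), nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()  # negative log-likelihood, matching the LogSoftmax head
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
```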
For training the model, we iterate over the epochs, recording the train_loss, val_loss and val_accuracy along the way using the evaluate function.
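A sketch of the training loop and the evaluate function (the function signatures are my own; the article's exact implementation may differ):

```python
import torch

def evaluate(model, loader, criterion, device="cpu"):
    # Average loss and accuracy over a whole data loader
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            out = model(xb)
            total_loss += criterion(out, yb).item() * len(yb)
            correct += (out.argmax(dim=1) == yb).sum().item()
            count += len(yb)
    return total_loss / count, correct / count

def fit(model, train_dl, val_dl, criterion, optimizer, epochs, device="cpu"):
    history = []
    for epoch in range(epochs):
        model.train()
        running_loss, count = 0.0, 0
        for xb, yb in train_dl:
            xb, yb = xb.to(device), yb.to(device)
            loss = criterion(model(xb), yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * len(yb)
            count += len(yb)
        val_loss, val_acc = evaluate(model, val_dl, criterion, device)
        history.append({"train_loss": running_loss / count,
                        "val_loss": val_loss, "val_accuracy": val_acc})
        print(f"epoch {epoch + 1}: train_loss={running_loss / count:.4f} "
              f"val_loss={val_loss:.4f} val_accuracy={val_acc:.4f}")
    return history
```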
Now, we plot the training and validation losses. The validation accuracy is found to be 81%.
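The loss curves can be plotted from the recorded history, for example (plot_losses is my own helper name):

```python
import matplotlib.pyplot as plt

def plot_losses(history):
    # history: one dict per epoch with 'train_loss' and 'val_loss' keys
    fig, ax = plt.subplots()
    ax.plot([h["train_loss"] for h in history], label="train loss")
    ax.plot([h["val_loss"] for h in history], label="val loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.legend()
    plt.show()
    return fig
```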
Now, we evaluate over the test dataset.
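Test accuracy can be computed from the model's log-probability outputs with a small helper like this (accuracy is an assumed name):

```python
import torch

def accuracy(outputs, labels):
    # outputs: (N, 2) log-probabilities; labels: (N,) class indices
    preds = outputs.argmax(dim=1)
    return (preds == labels).float().mean().item()
```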
The test accuracy is calculated to be 82.5%.
We store the weights of the trained model on this new dataset.
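Saving and restoring the trained weights might look like this; the file name is an assumption, and the tiny stand-in model just keeps the snippet self-contained:

```python
import torch
import torch.nn as nn

# Stand-in; in the article this is the fine-tuned ResNet18
model = nn.Sequential(nn.Linear(4, 2))

# Save only the learned parameters (the recommended PyTorch practice)
torch.save(model.state_dict(), "pneumonia-resnet18.pth")

# Later, rebuild the same architecture and restore the weights
model.load_state_dict(torch.load("pneumonia-resnet18.pth"))
```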
Thanks to Transfer Learning, we were able to obtain a pretty good accuracy of 82.5% with just 5 epochs. We could improve the accuracy further by training for more epochs or by trying different hyper-parameters such as the learning rate and batch size.