Transfer learning for Deep Neural Networks using TensorFlow

Original article was published on Deep Learning on Medium

A practical, hands-on example of how to apply transfer learning with TensorFlow.

In this article, we will learn how to use transfer learning for a classification task.

One of the most powerful ideas in deep learning is that we can take the knowledge that a neural network has learned from one task and apply that knowledge to another task. This is called transfer learning.

Transfer learning makes sense when we have a lot of data for the problem we are transferring from and relatively little data for the problem we are transferring to.

As a first step, let's import the required modules and load the cats_vs_dogs dataset from TensorFlow Datasets. We will use only 20% of the dataset, because we want to see how transfer learning behaves when little training data is available.
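A minimal sketch of this loading step, assuming the tensorflow_datasets package is installed. The helper names make_splits and load_cats_vs_dogs are hypothetical. The way the 20% is divided into training, validation, and test parts (16%, 2%, and 2% of the full dataset) is also an assumption, made because cats_vs_dogs ships with a single "train" split.

```python
def make_splits(total_pct=20, val_pct=2, test_pct=2):
    """Build TFDS split strings that together cover the first `total_pct` percent."""
    train_end = total_pct - val_pct - test_pct
    val_end = train_end + val_pct
    return [
        f"train[:{train_end}%]",               # training part
        f"train[{train_end}%:{val_end}%]",     # validation part
        f"train[{val_end}%:{total_pct}%]",     # test part
    ]


def load_cats_vs_dogs(total_pct=20):
    """Load the three subsets as (image, label) datasets plus the dataset metadata."""
    # Imported here so that make_splits can be used without tensorflow_datasets.
    import tensorflow_datasets as tfds

    (raw_train, raw_val, raw_test), info = tfds.load(
        "cats_vs_dogs",
        split=make_splits(total_pct),
        as_supervised=True,  # yield (image, label) pairs instead of dictionaries
        with_info=True,      # also return the dataset metadata
    )
    return raw_train, raw_val, raw_test, info


# Usage (downloads the dataset on the first call):
# raw_train, raw_val, raw_test, info = load_cats_vs_dogs()
# print(info.features["label"].names)
```

Slicing a single split by percentage keeps the three subsets disjoint, which matters when the validation and test accuracies are later compared with the training accuracy.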

Note: I prefer explaining the code using comments in the code snippets.

Sample images from the dataset

As we can see, the images in the dataset have different shapes, so let's resize them to a common shape and group them into batches for training. Please refer to the tf.image and tf.data.Dataset documentation before moving on to the next part.
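One way to write this formatting step is sketched below. The 160 x 160 target size is an assumption chosen so that MobileNet V2 produces a 5 x 5 x 1280 feature map. The batch size, the shuffle buffer, and the function names format_example and make_batches are also illustrative choices rather than values taken from the original notebook.

```python
import tensorflow as tf

IMG_SIZE = 160    # assumed target height and width
BATCH_SIZE = 32   # assumed batch size


def format_example(image, label):
    """Convert one image to float32 in [-1, 1] and resize it to IMG_SIZE x IMG_SIZE."""
    image = tf.cast(image, tf.float32)
    image = image / 127.5 - 1.0  # map pixel values from [0, 255] to [-1, 1]
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    return image, label


def make_batches(dataset, shuffle=False):
    """Apply format_example to every element and group the results into batches."""
    dataset = dataset.map(format_example, num_parallel_calls=tf.data.AUTOTUNE)
    if shuffle:
        dataset = dataset.shuffle(1000)  # assumed shuffle buffer size
    return dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)


# Usage with the raw (image, label) datasets:
# train_batches = make_batches(raw_train, shuffle=True)
# val_batches = make_batches(raw_val)
# test_batches = make_batches(raw_test)
```

Resizing has to happen before batching, because tensors of different shapes cannot be stacked into a single batch.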

formatting images to the required format

Non-Pre-Trained Model:

Let us use the MobileNet V2 neural network for our example. We can import it directly from tf.keras.applications, which provides several built-in architectures. Each of them can be created either with randomly initialized weights or with pre-trained weights, depending on the “weights” parameter.

First, we will check the accuracy of the model on our small dataset without pre-trained weights, and later compare it with a pre-trained model. To do so, we set weights = None and replace the last classification layer, as our application has only two classes.
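A minimal sketch of creating the randomly initialized network, assuming a 160 x 160 RGB input. The helper count_trainable_params is hypothetical and is only there to inspect the model without reading the full summary.

```python
import math

import tensorflow as tf

IMG_SHAPE = (160, 160, 3)  # assumed input size

original_model = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE,
    include_top=False,  # leave out the 1000-class ImageNet classifier
    weights=None,       # random initialization, nothing pre-trained
)


def count_trainable_params(model):
    """Sum the number of elements over all trainable weight tensors."""
    return sum(math.prod(int(d) for d in w.shape) for w in model.trainable_weights)


print(original_model.output_shape)             # shape of the final feature map
print(count_trainable_params(original_model))  # number of trainable parameters
```

Setting include_top=False is what allows the classification layer to be replaced by one that fits a two-class problem.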

Creating basic non-pre-trained model

From the output of original_model.summary(), we can see that the model has 2,223,872 trainable parameters. Its output has shape (batch_size, 5, 5, 1280). To turn this into a class prediction, we add a GlobalAveragePooling2D layer, which averages each 5 x 5 feature map into a single value, followed by Dense(1, activation = “sigmoid”) as the last layer. A single sigmoid unit is enough because our dataset has only two classes.
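The classification head described above can be sketched as a small helper. The name add_binary_classifier is hypothetical, and base_model stands for any network whose output is a batch of feature maps, such as MobileNet V2 created without its top layer.

```python
import tensorflow as tf


def add_binary_classifier(base_model):
    """Stack global average pooling and a single sigmoid unit on top of base_model."""
    return tf.keras.Sequential([
        base_model,
        # Average each feature map: (batch, 5, 5, 1280) -> (batch, 1280)
        tf.keras.layers.GlobalAveragePooling2D(),
        # One sigmoid output: the predicted probability of the positive class
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])


# Usage with a base model created without its top layer:
# model = add_binary_classifier(original_model)
# model.summary()
```

The head adds only 1,281 parameters (1,280 weights and one bias), so nearly all of the capacity remains in the convolutional base.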

Adding an output layer to the model

Let us train the model for 10 epochs and look at the loss and accuracy on the training, validation, and test sets.
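A sketch of the training and evaluation step, under stated assumptions: the Adam optimizer, the learning rate, and the helper name train_and_evaluate are illustrative and not taken from the original notebook. Because the last layer already applies a sigmoid, the loss is configured with from_logits=False. Passing from_logits=True would apply the sigmoid a second time, and on balanced classes the reported loss could then not drop much below about 0.5, even at high accuracy.

```python
import tensorflow as tf


def train_and_evaluate(model, train_batches, val_batches, test_batches,
                       epochs=10, learning_rate=1e-4):
    """Compile the model for binary classification, train it, and score it on the test set."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        # The model outputs probabilities (sigmoid), not raw logits.
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
        metrics=["accuracy"],
    )
    history = model.fit(
        train_batches,
        validation_data=val_batches,
        epochs=epochs,
        verbose=0,
    )
    test_loss, test_acc = model.evaluate(test_batches, verbose=0)
    return history, test_loss, test_acc


# Usage with batched (image, label) datasets:
# history, test_loss, test_acc = train_and_evaluate(
#     model, train_batches, val_batches, test_batches)
# print(history.history["accuracy"][-1], history.history["val_accuracy"][-1], test_acc)
```

Keeping the same helper for all three models makes the later comparison fair, since only the weights and the set of trainable layers change.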

training and testing the model

Non-pre-trained model: Epochs = 10
training loss: 0.5750, training accuracy: 0.8306
val_loss: 0.6958, val_accuracy: 0.4815
test_loss: 0.6991, test acc: 0.4952

In this article, we will use two ways to customize a pre-trained model:

Feature Extraction/Frozen Pre-Trained Model: We will use the representations learned by a previous network to extract meaningful features from new samples. We simply add a new classifier on top of the pre-trained model and train only that classifier from scratch, so that the feature maps learned on the original dataset are reused as they are.

Fine-Tuning/Unfrozen Pre-Trained Model: We will unfreeze a few of the top layers of a pre-trained model and jointly train both the newly-added classifier layer and the unfrozen layers of the pre-trained model. This allows us to “fine-tune” the representations of the higher-order features in the base model to make them more relevant for this particular task.

Frozen Pre-Trained Model:

In the next step, let us work with the same model and the same datasets, but this time we will import the model together with the weights it learned by training on the “imagenet” dataset. We can load these weights by setting the parameter weights = “imagenet”. As in the previous case, we will define the classification layer of the model according to our application. In this section, we will train only the classifier and freeze all layers of the pre-trained model. We set frozen_model.trainable = False to achieve this.
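A minimal sketch of the frozen setup, assuming a 160 x 160 input. The function name build_frozen_model is hypothetical, and the functional-API layout is one possible choice. Inside the function, frozen_model refers to the pre-trained base without its top layer. Calling the function with weights=“imagenet” downloads the pre-trained weights on first use.

```python
import tensorflow as tf


def build_frozen_model(weights="imagenet", input_shape=(160, 160, 3)):
    """MobileNet V2 base with all layers frozen, plus a trainable binary classifier."""
    frozen_model = tf.keras.applications.MobileNetV2(
        input_shape=input_shape,
        include_top=False,
        weights=weights,
    )
    frozen_model.trainable = False  # freeze every layer of the pre-trained base

    inputs = tf.keras.Input(shape=input_shape)
    # training=False keeps the BatchNormalization layers in inference mode.
    x = frozen_model(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)


# Usage (downloads the ImageNet weights on the first call):
# model = build_frozen_model()
# model.summary()
```

With the base frozen, only the 1,281 parameters of the final Dense layer are updated during training.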

building, training, and testing the frozen pre-trained model

Frozen pre-trained model: Epoch 10/10
training loss: 0.5322, training accuracy: 0.9652
val_loss: 0.5226, val_accuracy: 0.9725
test_loss: 0.5315, test acc: 0.9664

Unfrozen Pre-Trained Model:

In this section, we will unfreeze some of the topmost layers of the pre-trained model, add a classification layer, and then fine-tune the model by training it on the available small dataset. We can unfreeze layers by setting layer.trainable = True for a chosen number of layers. In this case, the model will have around 1,862,592 trainable parameters.
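A sketch of partial unfreezing. The cut-off index fine_tune_at=100 is an assumption, since the article does not say how many layers were unfrozen, and the function name build_fine_tuned_model is hypothetical. The exact number of trainable parameters depends on where the cut is placed.

```python
import tensorflow as tf


def build_fine_tuned_model(weights="imagenet", fine_tune_at=100,
                           input_shape=(160, 160, 3)):
    """MobileNet V2 with layers before `fine_tune_at` frozen and the rest trainable."""
    base_model = tf.keras.applications.MobileNetV2(
        input_shape=input_shape,
        include_top=False,
        weights=weights,
    )
    base_model.trainable = True
    # Freeze the lower layers, which hold the more generic features.
    for layer in base_model.layers[:fine_tune_at]:
        layer.trainable = False

    inputs = tf.keras.Input(shape=input_shape)
    # training=False keeps the BatchNormalization layers in inference mode.
    x = base_model(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs), base_model


# Usage (downloads the ImageNet weights on the first call):
# model, base_model = build_fine_tuned_model()
# print(len(base_model.layers), "layers in the base model")
```

When fine-tuning, a small learning rate is a common choice, so that the pre-trained weights are adjusted gradually rather than overwritten.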

building, training, and testing the unfrozen pre-trained model

Unfrozen pre-trained model: Epoch 10/10
training loss: 0.5030, training accuracy: 0.9989
val_loss: 0.5060, val_accuracy: 0.9828
test_loss: 0.5123, test acc: 0.9810

Comparison:

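We can observe that both pre-trained models clearly outperformed the model trained from scratch: test accuracy was 0.9664 for the frozen model and 0.9810 for the fine-tuned model, against 0.4952 for the non-pre-trained model. The non-pre-trained model reached a training accuracy of 0.8306 but stayed near chance level on the validation and test sets, which indicates that it overfit the small training set.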
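The accuracies reported above can be collected side by side with a few lines of plain Python. The numbers are copied from the training logs in this article. The helper format_table and the "gap" column (training accuracy minus test accuracy) are illustrative additions.

```python
# Accuracies after 10 epochs, copied from the results reported above.
results = {
    "Non-pre-trained": {"train": 0.8306, "val": 0.4815, "test": 0.4952},
    "Frozen pre-trained": {"train": 0.9652, "val": 0.9725, "test": 0.9664},
    "Unfrozen pre-trained": {"train": 0.9989, "val": 0.9828, "test": 0.9810},
}


def format_table(results):
    """Return a plain-text table with one row per model."""
    lines = [f"{'Model':<22}{'train':>8}{'val':>8}{'test':>8}{'gap':>8}"]
    for name, acc in results.items():
        gap = acc["train"] - acc["test"]  # a large gap points to overfitting
        lines.append(
            f"{name:<22}{acc['train']:>8.4f}{acc['val']:>8.4f}"
            f"{acc['test']:>8.4f}{gap:>8.4f}"
        )
    return "\n".join(lines)


print(format_table(results))
```

The gap column makes the overfitting of the non-pre-trained model visible at a glance, while both pre-trained models generalize almost as well as they fit the training data.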

Comparison of above-discussed models

Conclusion:

Using a pre-trained model for feature extraction: When working with a small dataset, it is common practice to take advantage of features learned by a model trained on a larger dataset in the same domain. This is achieved by instantiating the pre-trained model and placing a fully connected classifier on top of it.

The pre-trained model is “frozen” and only the classifier weights are updated during training. In this case, all the features associated with each image are extracted by the convolutional layers, and we train only a classifier that determines the image class given the set of extracted features.

Fine-tuning a pre-trained model: In order to further improve performance, the top-level layers of the pre-trained models could be repurposed via fine-tuning to the new dataset.

In this case, we tuned the weights so that the model learned high-level features specific to our dataset. Usually, this technique is recommended when the training dataset is large enough and very similar to the original dataset on which the pre-trained model was trained.

The complete Jupyter notebook can be found on my GitHub.

Please provide feedback on the article if any areas of my writing can be improved. Thank you.