Part 3: Image Classification using Features Extracted by Transfer Learning in Keras

Source: Deep Learning on Medium

By Ahmed F. Gad, Alibaba Cloud Community Blog author

Welcome to a new part of the series, in which the Fruits360 dataset is classified with Keras in a Jupyter notebook, using features extracted by transfer learning from MobileNet, a pre-trained convolutional neural network (CNN).

In Part 2, a Jupyter notebook was created with Python code for downloading the Fruits360 dataset. After the download, the training and test data were read into NumPy arrays that will later be used for feature extraction. The arrays were saved to the disk associated with the notebook, which is lost if the notebook is recycled or crashes. Thus, we may have to rerun the code to regenerate these arrays later.

In this tutorial, which is Part 3 of the series, we are going to transfer the learning of MobileNet for working with the Fruits360 dataset. The sections covered in this tutorial are as follows:

  • Train Data Generator.
  • Validation Data Generator.
  • Loading MobileNet.
  • Removing the Last FC Layers from the Original Model.
  • Adding New FC Layers to the Modified Model.
  • Compiling the New Model.
  • MobileNet Transfer Learning over the Fruits360 Dataset.

Let’s get started.

Train Data Generator

Whether a model is trained from scratch or adapted from a pre-trained model by transfer learning, it has to be fed with data. The data is first loaded into the machine RAM and then fed to the model. Unfortunately, some datasets are too large to be loaded into memory at once, which causes out-of-memory errors. To overcome this issue, the data is not fed to the model all at once but in batches, where each batch contains a pre-defined number of images.

Assume that a batch size of 32 is used. What does that mean? It means that just 32 images are fed to the model at a time. If, for example, the data has 320 samples, then dividing the number of samples (320) by the batch size (32) returns 10. This means that the entire training data is fed to the model in 10 steps. Together, these steps make up a single training epoch. So, a single epoch, in this case, has 10 steps.
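The batching idea can be sketched in a few lines of plain Python. This is only an illustration: the helper name iter_batches is made up for this sketch, and integers stand in for the images.

```python
from typing import Iterator, List, Sequence


def iter_batches(samples: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `samples`, each holding at most `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])


# 320 placeholder "images" (integers stand in for image arrays).
samples = list(range(320))
batches = list(iter_batches(samples, batch_size=32))

steps_per_epoch = len(batches)       # number of steps in one epoch
first_batch_size = len(batches[0])   # images fed to the model in one step
print(steps_per_epoch, first_batch_size)
```

Only one batch needs to be held in memory at a time, which is the point of feeding the data this way.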

For the Fruits360 dataset, there are 52,718 training samples. If a batch size of 32 is used, then the total number of steps for feeding the entire training data to the model is equal to 52,718/32=1647.4375, which is rounded up to 1648 steps. Thus a single epoch has 1648 steps.
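The rounding can be checked with a short calculation. The sketch below also works out how many images are left for the final, smaller batch; the variable names are chosen for this illustration only.

```python
import math

num_train_samples = 52_718   # Fruits360 training images, as quoted above
batch_size = 32

exact = num_train_samples / batch_size        # fractional number of batches
steps_per_epoch = math.ceil(exact)            # round up so that no image is skipped

# All steps except the last one carry a full batch.
last_batch_size = num_train_samples - (steps_per_epoch - 1) * batch_size

print(exact, steps_per_epoch, last_batch_size)
```

Rounding down instead would drop the images of the last partial batch from every epoch.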

Keras supports a class named ImageDataGenerator for generating batches of tensor image data. It can also do real-time data augmentation. The next line creates an instance of the ImageDataGenerator class.

from tensorflow import keras

train_datagen = keras.preprocessing.image.ImageDataGenerator()

Just creating an instance of this class does not mean it is ready to generate images. We have to give this instance the data from which the batches will be generated. There are 2 main sources from which the data can be supplied which are:

  1. Directory.
  2. Pandas DataFrame.

We are going to use the directory option. If you decide to use the Pandas DataFrame option instead, use the flow_from_dataframe() method, which accepts a prepared DataFrame describing the image data. You do not need to build the DataFrame from scratch: you can load the NumPy arrays that were created in Part 2 and convert them into a Pandas DataFrame using the DataFrame() constructor.

To load the data into the generator from a directory, use the flow_from_directory() method. This method accepts many arguments, but only 2 of them must be specified in our experiment:

  1. directory: Directory from which the images will be loaded for creating the batches.
  2. target_size: Target size of the loaded image which is (256, 256) by default. This has to be changed to reflect the input size expected by MobileNet. In this series, we are going to use (128, 128) as the image size.

The directory refers to the path from which the images will be read. This directory is expected to include a number of folders equal to the number of classes within the dataset. This is how the number of classes is deduced. Remember that the dataset was explored in Part 2 and that its training and test data are organized in the format expected by this method. That is, there is a training directory containing 102 folders, one for each class. The same holds for the test data.
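To make the folder-per-class convention concrete, here is a rough stand-in for how class labels can be deduced from a directory. It is not the Keras implementation: infer_class_indices is a hypothetical helper, the three class folders are invented, and sorting the folder names is assumed to mirror the order Keras uses. The real generator exposes a comparable mapping through its class_indices attribute.

```python
import os
import tempfile
from typing import Dict


def infer_class_indices(directory: str) -> Dict[str, int]:
    """Map each subfolder name to an integer label, in sorted order."""
    class_names = sorted(
        entry for entry in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, entry))
    )
    return {name: index for index, name in enumerate(class_names)}


# Build a tiny stand-in for the Training directory with three made-up class folders.
with tempfile.TemporaryDirectory() as root:
    for name in ("Banana", "Apple", "Cherry"):
        os.makedirs(os.path.join(root, name))
    class_indices = infer_class_indices(root)

print(class_indices)
```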

Besides the previous 2 arguments that are mandatory to be supplied, the below 3 arguments are useful to know.

  • color_mode: Color mode of the loaded images which is set to ‘rgb’ by default. If you wish to work with the model in a different color space, then specify it.
  • class_mode: Class mode which is set to ‘categorical’ by default, so that each label is returned as a one-hot encoded vector with one entry per class.
  • batch_size: Number of images in each batch which is 32 by default.
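As a small illustration of what a categorical label looks like, the sketch below builds a one-hot vector for one of the 102 classes. The function one_hot and the chosen class index are hypothetical; Keras builds these vectors internally.

```python
from typing import List


def one_hot(class_index: int, num_classes: int) -> List[float]:
    """Return a vector of zeros with a single 1.0 at `class_index`."""
    vector = [0.0] * num_classes
    vector[class_index] = 1.0
    return vector


# Label for an image that belongs to the class with index 2 (out of 102 classes).
label = one_hot(class_index=2, num_classes=102)
print(len(label), sum(label), label[:4])
```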

Having seen how to build an instance of the ImageDataGenerator class and which parameters to pass to the flow_from_directory() method, here is the complete code for building the data generator for the training data. The generator saved in the train_generator variable will be used later for the transfer learning of MobileNet.

import os
import tensorflow as tf
from tensorflow import keras

zip_file = tf.keras.utils.get_file(origin="https://storage.googleapis.com/kaggle-datasets/5857/414958/fruits-360_dataset.zip?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1566314394&Signature=rqm4aS%2FIxgWrNSJ84ccMfQGpjJzZ7gZ9WmsKok6quQMyKin14vJyHBMSjqXSHR%2B6isrN2POYfwXqAgibIkeAy%2FcB2OIrxTsBtCmKUQKuzqdGMnXiLUiSKw0XgKUfvYOTC%2F0gdWMv%2B2xMLMjZQ3CYwUHNnPoDRPm9GopyyA6kZO%2B0UpwB59uwhADNiDNdVgD3GPMVleo4hPdOBVHpaWl%2F%2B%2BPDkOmQdlcH6b%2F983JHaktssmnCu8f0LVeQjzZY96d24O4H85x8wdZtmkHZCoFiIgCCMU%2BKMMBAbTL66QiUUB%2FW%2FpULPlpzN9sBBUR2yydB3CUwqLmSjAcwz3wQ%2FpIhzg%3D%3D",
fname="fruits-360.zip", extract=True)
base_dir, _ = os.path.splitext(zip_file)

train_dir = os.path.join(base_dir, 'Training')

image_size = 128

train_datagen = keras.preprocessing.image.ImageDataGenerator()

train_generator = train_datagen.flow_from_directory(directory=train_dir, target_size=(image_size, image_size))

After preparing the training data generator, the next section discusses building the validation data generator.

Validation Data Generator

Building the validation data generator is similar to building the generator of the training data. The code used for this purpose is listed below. The main difference is that the variable assigned to the directory argument of the flow_from_directory() method is now validation_dir rather than train_dir. The value of the validation_dir variable is created by joining the dataset directory saved in the base_dir variable with the word Test, so that the test data is loaded for model validation. The validation data generator is finally saved in the validation_generator variable. It will be used later for the transfer learning of MobileNet.

import os
import tensorflow as tf
from tensorflow import keras

zip_file = tf.keras.utils.get_file(origin="https://storage.googleapis.com/kaggle-datasets/5857/414958/fruits-360_dataset.zip?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1566314394&Signature=rqm4aS%2FIxgWrNSJ84ccMfQGpjJzZ7gZ9WmsKok6quQMyKin14vJyHBMSjqXSHR%2B6isrN2POYfwXqAgibIkeAy%2FcB2OIrxTsBtCmKUQKuzqdGMnXiLUiSKw0XgKUfvYOTC%2F0gdWMv%2B2xMLMjZQ3CYwUHNnPoDRPm9GopyyA6kZO%2B0UpwB59uwhADNiDNdVgD3GPMVleo4hPdOBVHpaWl%2F%2B%2BPDkOmQdlcH6b%2F983JHaktssmnCu8f0LVeQjzZY96d24O4H85x8wdZtmkHZCoFiIgCCMU%2BKMMBAbTL66QiUUB%2FW%2FpULPlpzN9sBBUR2yydB3CUwqLmSjAcwz3wQ%2FpIhzg%3D%3D",
fname="fruits-360.zip", extract=True)
base_dir, _ = os.path.splitext(zip_file)

validation_dir = os.path.join(base_dir, 'Test')

image_size = 128

validation_datagen = keras.preprocessing.image.ImageDataGenerator()

validation_generator = validation_datagen.flow_from_directory(directory=validation_dir, target_size=(image_size, image_size))

At this point, both the train and validation data generators are created. The next section loads the MobileNet model.

Loading MobileNet

TensorFlow has a module, tensorflow.keras.applications, which holds a number of pre-trained deep learning models. Loading any of these models gives you not only its architecture but also its trained weights. So, you are ready to use it either for making predictions or for transfer learning. What is the difference between loading the model for making predictions and loading it for transfer learning? This is an important question.

When a model is loaded for making predictions, the entire model is loaded, including the last fully connected (FC) layers. For transfer learning, these layers are not included. But why not load these layers for transfer learning? The answer is simple.

If a model is trained on a given dataset, then its last FC layer has a number of neurons equal to the number of classes within this dataset. For example, a dataset might have 1,000 classes, and thus the last FC layer has 1,000 neurons. For transfer learning, suppose that the new dataset has 102 classes, as is the case in this series. Thus, we need the last FC layer to have 102 neurons rather than 1,000 neurons. As a result, the last FC layers in the original model are no longer needed, and we have to remove them and replace them with FC layers working with just 102 classes.

According to the above discussion, there are 2 steps which are as follows:

  1. Removing the last FC layers from the original model.
  2. Adding new FC layers to the modified model.

Let’s discuss how to do each of these steps.

Removing the Last FC Layers from the Original Model

The tensorflow.keras.applications module has a function named MobileNet() for loading the MobileNet model as given below. It returns the model as it is without any changes.

import tensorflow as tf

base_model = tf.keras.applications.MobileNet()

After loading the original model, we can summarize it using the summary() method as given below.

import tensorflow as tf

base_model = tf.keras.applications.MobileNet()
base_model.summary()

The summary() method prints the complete model architecture. You can check the entire output of this function in the notebook; only a few entries are listed below. At the beginning of the output, the input image shape expected by the model is (224, 224, 3). At the end of the output, the model returns a vector of 1,000 values for each input image, which indicates that it works with 1,000 classes.

_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
...
_________________________________________________________________
conv_dw_13_relu (ReLU) (None, 7, 7, 1024) 0
_________________________________________________________________
conv_pw_13 (Conv2D) (None, 7, 7, 1024) 1048576
_________________________________________________________________
conv_pw_13_bn (BatchNormaliz (None, 7, 7, 1024) 4096
_________________________________________________________________
conv_pw_13_relu (ReLU) (None, 7, 7, 1024) 0
_________________________________________________________________
conv_preds (Conv2D) (None, 1, 1, 1000) 1025000
_________________________________________________________________
reshape_2 (Reshape) (None, 1000) 0
_________________________________________________________________
act_softmax (Activation) (None, 1000) 0
=================================================================

When the model is loaded this way, all of its layers are marked as trainable, as if we were going to retrain it. This is visible in the part of the summary() output below, which summarizes the network parameters. It indicates that there are 4,231,976 trainable parameters in the network. We do not need to retrain the model, and later we will inform the model about that. For now, let’s just focus on customizing the model for the Fruits360 dataset.

Total params: 4,253,864
Trainable params: 4,231,976
Non-trainable params: 21,888
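The 21,888 non-trainable parameters can be accounted for by the batch-normalization layers, each of which stores a moving mean and a moving variance per channel. The sketch below reproduces the count under the assumption that the loaded model follows the standard MobileNet v1 layout with a width multiplier of 1.0; the channel widths are taken from that architecture, not from the summary shown here.

```python
# Output channels of the first standard convolution in MobileNet v1 (width multiplier 1.0).
first_conv_channels = 32

# (input channels, output channels) of the 13 depthwise-separable blocks.
blocks = [
    (32, 64), (64, 128), (128, 128), (128, 256), (256, 256), (256, 512),
    (512, 512), (512, 512), (512, 512), (512, 512), (512, 512),
    (512, 1024), (1024, 1024),
]

# Every convolution is followed by batch normalization: one after the first conv,
# one after each depthwise conv (input channels) and one after each pointwise conv
# (output channels).
bn_channels = first_conv_channels + sum(c_in + c_out for c_in, c_out in blocks)

# Each normalized channel keeps a moving mean and a moving variance. They are
# updated from batch statistics rather than by gradient descent.
non_trainable = 2 * bn_channels
print(bn_channels, non_trainable)
```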

Based on the output of the summary() method, what do you expect to change to make this model suitable for working with the Fruits360 dataset?

The first change is to reduce the shape of the input images from (224, 224, 3) to (128, 128, 3). The second change is to make the model return a vector of length 102 for each input image, to make it suitable for the Fruits360 dataset. In order to apply these 2 changes, the MobileNet() function accepts 2 arguments that are used in our experiments, which are listed below.

  1. input_shape
  2. include_top

The input_shape argument specifies the shape of the input images, which is (128, 128, 3) for our experiment. Fortunately, when loading a pre-trained model, we can choose whether to keep or drop the last FC layers. The include_top argument serves this purpose. If set to True, then the last FC layers will be loaded. If False, then the last FC layers will not be loaded. The new function call is given below.

import tensorflow as tf

IMG_SHAPE = (128, 128, 3)  # input shape used throughout this series

base_model = tf.keras.applications.MobileNet(input_shape=IMG_SHAPE, include_top=False)
base_model.summary()

The result of the summary() method is given below. The result indicates that the input image shape expected by the model is now (128, 128, 3) and that the top layers in the model that were working for 1,000 classes are removed. Note that the shape of the output from the MobileNet in this case is (4, 4, 1024).

_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 128, 128, 3)] 0
_________________________________________________________________
...
_________________________________________________________________
conv_dw_13_relu (ReLU) (None, 4, 4, 1024) 0
_________________________________________________________________
conv_pw_13 (Conv2D) (None, 4, 4, 1024) 1048576
_________________________________________________________________
conv_pw_13_bn (BatchNormaliz (None, 4, 4, 1024) 4096
_________________________________________________________________
conv_pw_13_relu (ReLU) (None, 4, 4, 1024) 0
=================================================================
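The (4, 4, 1024) output shape follows from the network reducing the spatial resolution by a factor of 32 overall. The helper below is a hypothetical sketch of that relationship, assuming the standard MobileNet v1 layout and an input size divisible by 32; it is not part of Keras.

```python
from typing import Tuple


def mobilenet_feature_shape(height: int, width: int,
                            total_stride: int = 32,
                            channels: int = 1024) -> Tuple[int, int, int]:
    """Shape of the last convolutional feature map for a given input size.

    Assumes five stride-2 stages (2**5 == 32) and input sizes divisible by 32.
    """
    return (height // total_stride, width // total_stride, channels)


print(mobilenet_feature_shape(128, 128))
print(mobilenet_feature_shape(224, 224))
```

Under these assumptions a 128x128 input gives a 4x4 map, while the default 224x224 input would give a 7x7 map with the same 1,024 channels.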

After removing the top layers from the MobileNet architecture, the parameter summary printed by the summary() method is given below. The network now has 3,206,976 trainable parameters rather than the 4,231,976 it had before removing the top layers. All of the remaining layers are still marked as trainable.

Total params: 3,228,864
Trainable params: 3,206,976
Non-trainable params: 21,888
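The drop in the parameter count matches the size of the removed classification layer, conv_preds, which maps 1,024 features to 1,000 classes. A quick check using only the numbers printed in the two summaries:

```python
features = 1024
imagenet_classes = 1000

# Weights plus one bias per class in the removed conv_preds layer.
conv_preds_params = features * imagenet_classes + imagenet_classes

total_with_top, total_without_top = 4_253_864, 3_228_864
trainable_with_top, trainable_without_top = 4_231_976, 3_206_976

total_difference = total_with_top - total_without_top
trainable_difference = trainable_with_top - trainable_without_top

print(conv_preds_params, total_difference, trainable_difference)
```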

To inform the network that it will not be retrained, the trainable attribute of the loaded model is set to False. This indicates that no layer will be trained.

import tensorflow as tf

IMG_SHAPE = (128, 128, 3)

base_model = tf.keras.applications.MobileNet(input_shape=IMG_SHAPE, include_top=False)
base_model.trainable = False  # freeze every layer of the pre-trained network
base_model.summary()

Setting the trainable attribute to False does not change the network architecture, but it does change the number of trainable parameters, as shown in the output of the summary() method below. Now, there are 0 trainable parameters and all parameters are non-trainable. As a result, the pre-trained values of all of these parameters will be used as they are.

Total params: 3,228,864
Trainable params: 0
Non-trainable params: 3,228,864

At this time, the first change required in the MobileNet model is fulfilled. Let’s move to the second change which is to add top layers to the model to make it suitable for working with the 102 classes of the Fruits360 dataset.

Adding New FC Layers to the Modified Model

The layers in the MobileNet model that were working with 1,000 classes are now removed, and we need to add new layers that can work with the 102 classes in the Fruits360 dataset. How to do that? By creating a sequential model using the Sequential class in Keras, as listed below.

The constructor of the class Sequential() accepts a list defining the sequence of layers. The constructor returns a model with such layers. Note that we do not need to create a new architecture but just make use of the architecture in the modified model stored in the base_model variable after appending new layers at the top of it to make it suitable for our dataset. To do this, just use the base_model variable at the beginning of the list followed by at least a single FC layer with the number of neurons equal to the number of classes in the dataset.

According to the code below, there are 2 layers added at the top of the modified MobileNet architecture which are:

  1. Average pooling layer.
  2. FC layer with 102 neurons using the Dense class. Note that an FC layer is also called a dense layer and this is why the class used for building the FC layer is called Dense.

A question arises here. If a single FC layer is sufficient for transfer learning, why would one add more layers on top? The reason is that transfer learning adapts the previously trained parameters to the new dataset. Adapting more layers gives the model more capacity to fit the new dataset, so it often, though not always, leads to more accurate predictions. The price is more trainable parameters, longer training, and a higher risk of overfitting.

import tensorflow as tf
from tensorflow import keras

IMG_SHAPE = (128, 128, 3)

base_model = tf.keras.applications.MobileNet(input_shape=IMG_SHAPE, include_top=False)
base_model.trainable = False

base_model = tf.keras.Sequential([
    base_model,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(102, activation='sigmoid')
])
base_model.summary()

In this case, we added just 2 layers at the top of the architecture. The newly added 2 layers are trainable by default. You can add more layers but take care as this increases the number of trainable parameters and thus more time for transfer learning.
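To show what the average pooling layer does to the (4, 4, 1024) output of MobileNet, here is a plain-Python sketch of global average pooling. It is a conceptual illustration with made-up values, not the Keras implementation; the function name global_average_pool is hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [row][column][channel]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [total / (rows * cols) for total in totals]


# A 4x4 map with 1024 channels in which channel c holds the constant value c.
feature_map = [[[float(c) for c in range(1024)] for _ in range(4)] for _ in range(4)]
pooled = global_average_pool(feature_map)
print(len(pooled), pooled[:3])
```

Each image is reduced to one value per channel, which is why the pooling layer outputs a vector of length 1,024 and has no parameters of its own.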

You can not only add new layers but also make some of the existing layers in MobileNet trainable. This is by indexing the layer of your choice according to the line below.

base_model.layers[n].trainable = True

In this series, the 2 newly added layers will be the only trainable layers. The output of the summary() method is listed below. The first row indicates that the output of the modified MobileNet is (4, 4, 1024). This will be the input to the newly added average pooling layer. Finally, the pooling layer output will be the input to the FC layer, which outputs 102 values, one for each class.

_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
mobilenet_1.00_128 (Model) (None, 4, 4, 1024) 3228864
_________________________________________________________________
global_average_pooling2d_3 ( (None, 1024) 0
_________________________________________________________________
dense_1 (Dense) (None, 102) 104550
=================================================================
Total params: 3,333,414
Trainable params: 104,550
Non-trainable params: 3,228,864

According to the above output, there are just 104,550 trainable parameters, which are the parameters of the FC layer. Note that the pooling layer has no trainable parameters. The number of non-trainable parameters is 3,228,864. The values of these parameters are taken as they are from the pre-trained MobileNet.
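The 104,550 figure can be reproduced by hand: the FC layer connects each of the 1,024 pooled features to each of the 102 outputs and adds one bias per output. A short check, with variable names chosen for this illustration:

```python
pooled_features = 1024   # length of the pooling layer output
num_classes = 102        # neurons in the new FC layer

weights = pooled_features * num_classes
biases = num_classes
dense_params = weights + biases

frozen_params = 3_228_864            # MobileNet parameters, all non-trainable
total_params = frozen_params + dense_params

print(weights, biases, dense_params, total_params)
```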

At this time, we have successfully created a new model that makes use of the pre-trained MobileNet model for transferring its learning to the Fruits360 dataset. The remaining step before training the model is compiling it which is discussed in the next section.

Compiling the New Model

Before training the model, the following items must be specified. No model can be trained without specifying such parameters.

  1. Optimizer for training the model and learning rate.
  2. Loss function.
  3. Metrics.

These parameters are specified using the compile() method as given in the modified code below. This method accepts a number of parameters of which the 3 below are used in our experiment.

  1. optimizer: Accepts the optimizer to be used. While creating the optimizer, the learning rate can be specified.
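  2. loss: Loss function. The binary crossentropy loss function is used in this experiment, paired with the sigmoid output layer. For single-label data such as Fruits360, a softmax output with categorical crossentropy is the more common pairing.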
  3. metrics: Metrics to be evaluated while training and testing the model. Because we are working on a classification project, then just the accuracy is used.

import tensorflow as tf
from tensorflow import keras

IMG_SHAPE = (128, 128, 3)

base_model = tf.keras.applications.MobileNet(input_shape=IMG_SHAPE, include_top=False)
base_model.trainable = False

base_model = tf.keras.Sequential([
    base_model,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(102, activation='sigmoid')
])
base_model.summary()

# learning_rate replaces the older lr keyword, which newer Keras versions reject.
base_model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
                   loss='binary_crossentropy',
                   metrics=['accuracy'])

After compiling the model, we are ready to train it, which is discussed in the next section. Note that training the new model means just training the 104,550 parameters of the newly added FC layer.
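To give a feel for the loss passed to compile(), the sketch below computes binary crossentropy on sigmoid outputs and, for comparison, categorical crossentropy on softmax outputs for a single made-up 3-class example. The functions are simplified stand-ins written for this illustration, not the Keras implementations, and the logits are invented. Which pairing works better for Fruits360 is not something this sketch settles.

```python
import math
from typing import List


def sigmoid(logits: List[float]) -> List[float]:
    """Squash each logit independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


def softmax(logits: List[float]) -> List[float]:
    """Turn logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_crossentropy(y_true: List[float], probs: List[float]) -> float:
    """Mean of the per-output binary crossentropy terms."""
    terms = [-(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
             for t, p in zip(y_true, probs)]
    return sum(terms) / len(terms)


def categorical_crossentropy(y_true: List[float], probs: List[float]) -> float:
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(y_true, probs) if t > 0)


y_true = [0.0, 1.0, 0.0]     # one-hot label: the second class is correct
logits = [1.0, 2.0, 0.5]     # made-up raw outputs of the last layer

sigmoid_probs = sigmoid(logits)
softmax_probs = softmax(logits)

bce = binary_crossentropy(y_true, sigmoid_probs)
cce = categorical_crossentropy(y_true, softmax_probs)
print(sum(sigmoid_probs), sum(softmax_probs), bce, cce)
```

The sigmoid outputs are scored independently and need not sum to 1, whereas the softmax outputs form a single probability distribution over the classes.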

MobileNet Transfer Learning over the Fruits360 Dataset

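In order to train the model, the fit_generator() method is used. (In TensorFlow 2.1 and later, fit_generator() is deprecated and fit() accepts generators directly.) Of the arguments accepted by this method, the following are used.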

  • generator: Train generator.
  • steps_per_epoch: Always set to ceil(number of training samples / batch size).
  • epochs: Number of training epochs.
  • validation_data: Validation generator.
  • validation_steps: Could be calculated as ceil(number of validation samples / batch size)

The modified code that calls the fit_generator() method with the proper arguments is listed below. Note that the batch_size argument in the flow_from_directory() method is explicitly set to 32 although its default is also 32. The reason is that the batch size is used to calculate the values assigned to the steps_per_epoch and validation_steps arguments in the fit_generator() method. So, I prefer setting it explicitly in case you need to change the default value later.

Note that steps_per_epoch is calculated as the ceiling of the number of training samples divided by the batch size. The number of training samples is read from the n attribute of train_generator. Because there are 52,718 samples for training, the number of steps per epoch is 1648. With 25 epochs, the total number of steps for training the model is 1648*25=41,200.

The value of the validation_steps argument is calculated in the same way, as the ceiling of the number of validation samples divided by the batch size. The number of validation samples is read from the n attribute of validation_generator. Because there are 17,692 samples for testing, the number of validation steps is 553.
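The step counts for the whole run can be double-checked with integer arithmetic alone. The helper ceil_div is a hypothetical name used only in this sketch; the sample counts are the ones quoted above.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division that avoids floating point."""
    return -(-numerator // denominator)


num_train_samples = 52_718
num_validation_samples = 17_692
batch_size = 32
epochs = 25

steps_per_epoch = ceil_div(num_train_samples, batch_size)
validation_steps = ceil_div(num_validation_samples, batch_size)
total_training_steps = steps_per_epoch * epochs

print(steps_per_epoch, validation_steps, total_training_steps)
```

Keeping these values as integers is convenient because they count batches and are passed as the steps_per_epoch and validation_steps arguments.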

import os
import tensorflow as tf
from tensorflow import keras
import numpy

zip_file = tf.keras.utils.get_file(origin="https://storage.googleapis.com/kaggle-datasets/5857/414958/fruits-360_dataset.zip?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1566314394&Signature=rqm4aS%2FIxgWrNSJ84ccMfQGpjJzZ7gZ9WmsKok6quQMyKin14vJyHBMSjqXSHR%2B6isrN2POYfwXqAgibIkeAy%2FcB2OIrxTsBtCmKUQKuzqdGMnXiLUiSKw0XgKUfvYOTC%2F0gdWMv%2B2xMLMjZQ3CYwUHNnPoDRPm9GopyyA6kZO%2B0UpwB59uwhADNiDNdVgD3GPMVleo4hPdOBVHpaWl%2F%2B%2BPDkOmQdlcH6b%2F983JHaktssmnCu8f0LVeQjzZY96d24O4H85x8wdZtmkHZCoFiIgCCMU%2BKMMBAbTL66QiUUB%2FW%2FpULPlpzN9sBBUR2yydB3CUwqLmSjAcwz3wQ%2FpIhzg%3D%3D",
fname="fruits-360.zip", extract=True)
base_dir, _ = os.path.splitext(zip_file)

train_dir = os.path.join(base_dir, 'Training')
validation_dir = os.path.join(base_dir, 'Test')

image_size = 128
batch_size = 32

train_datagen = keras.preprocessing.image.ImageDataGenerator()

train_generator = train_datagen.flow_from_directory(directory=train_dir, target_size=(image_size, image_size), batch_size=batch_size)

validation_datagen = keras.preprocessing.image.ImageDataGenerator()

validation_generator = validation_datagen.flow_from_directory(directory=validation_dir, target_size=(image_size, image_size), batch_size=batch_size)

IMG_SHAPE = (image_size, image_size, 3)

base_model = tf.keras.applications.MobileNet(input_shape=IMG_SHAPE, include_top=False)
base_model.trainable = False

base_model = tf.keras.Sequential([
base_model,
keras.layers.GlobalAveragePooling2D(),
keras.layers.Dense(102, activation='sigmoid')
])

base_model.summary()

base_model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
loss='binary_crossentropy',
metrics=['accuracy'])

epochs = 25
# numpy.ceil returns a float, so convert to int because step counts are whole numbers.
steps_per_epoch = int(numpy.ceil(train_generator.n / batch_size))
validation_steps = int(numpy.ceil(validation_generator.n / batch_size))

history = base_model.fit_generator(generator=train_generator,
steps_per_epoch = steps_per_epoch,
epochs=epochs,
validation_data=validation_generator,
validation_steps=validation_steps)

base_model.save('MobileNet_TransferLearning_Fruits360v48.h5')

Note that completing all 25 epochs can be time-consuming, so you can reduce the number of epochs. What matters is that the model reaches a good accuracy by the time training stops.

After training completes, the model is saved under the name MobileNet_TransferLearning_Fruits360v48 with the extension h5, which denotes the HDF5 format for storing structured data.

Conclusion

This tutorial discussed loading the pre-trained MobileNet model and applying transfer learning to it for the Fruits360 dataset. The generated model is saved for later use.

In the next tutorial [Part 4], the new model will be used for extracting features from the Fruits360 dataset. This is by feeding the NumPy arrays produced in Part 2 to the model saved in this tutorial.