Implementing a fully convolutional network (FCN) in TensorFlow 2

A tutorial on building, training and deploying a small and nimble FCN model for image classification in TensorFlow using Keras

Convolutional neural networks (CNNs) work great for computer vision tasks. Using a pre-trained model trained on huge datasets like ImageNet or COCO, we can quickly specialize these architectures to work on our own dataset. This process is termed transfer learning. However, there’s a catch! Pre-trained models for image classification and object detection tasks are usually trained on fixed input image sizes. These typically range from 224x224x3 to somewhere around 512x512x3, and mostly have an aspect ratio of 1, i.e. the width and height of the image are equal. If they are not equal, the images are resized to equal height and width.

Newer architectures do have the ability to handle variable input image sizes, but this is more common in object detection and segmentation tasks than in image classification. Recently, I came across an interesting use case in which I had 5 different classes of images, with only minuscule differences between the classes. Also, the aspect ratio of the images was higher than usual: the average height was around 30 pixels and the average width around 300 pixels. This was an interesting one for the following reasons:

  1. Resizing the images easily distorted the important features
  2. Pre-trained architectures were gargantuan and always overfitted the dataset
  3. The task demanded low latency

The need for a CNN with variable input dimensions

I tried base models of MobileNet and EfficientNet, but nothing worked. There was a need for a network that didn’t place any restrictions on input image size and could still perform the image classification task at hand. The first thing that struck me was the fully convolutional network (FCN). An FCN is a network that does not contain any “Dense” layers (as in traditional CNNs); instead, it contains 1×1 convolutions that perform the task of fully connected (Dense) layers. Though the absence of dense layers is what makes variable-sized inputs possible, there are a couple of techniques that let us use dense layers while still keeping variable input dimensions. This tutorial delineates some of those techniques. In this tutorial, we will go through the following steps:

  1. Building a fully convolutional network (FCN) in TensorFlow using Keras
  2. Downloading and splitting a sample dataset
  3. Creating a generator in Keras to load and process a batch of data in memory
  4. Training the network with variable batch dimensions
  5. Deploying the model using TensorFlow Serving

Get the scriptures

As always in my tutorials, here’s the link to the project uploaded on GitHub. Please clone the repo and follow the tutorial step by step for better understanding. Note: the code snippets in this article highlight only a part of the actual scripts; please refer to the GitHub repo for the complete code.

1. Designing the engine (model.py)

We build our FCN model by stacking convolution blocks consisting of 2D convolution layers (Conv2D) and the required regularization (Dropout and BatchNormalization). Regularization prevents overfitting and helps with quick convergence. We also add an activation layer to incorporate non-linearity. In Keras, the input batch dimension is added automatically, so we don’t need to specify it in the input layer. Since the height and width of our input images are variable, we specify the input shape as (None, None, 3). The 3 is the number of channels in our image, which is fixed at 3 for color (RGB) images.
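As a rough illustration (not the exact code from the repo; the filter counts, kernel size, stride and dropout rate below are placeholders), a convolution block and the variable-size input could look like this:

import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(x, filters, kernel_size=3, strides=2):
    # Conv2D (valid padding) + BatchNormalization + ReLU + Dropout form one block
    x = layers.Conv2D(filters, kernel_size, strides=strides)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dropout(0.2)(x)
    return x

# Height and width are left as None so any image size (above the minimum) is accepted
inputs = layers.Input(shape=(None, None, 3))
x = conv_block(inputs, 32)
x = conv_block(x, 64)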

Minimum image dimension requirement

After applying a convolution block to the input, the height and width of the input decrease based on the values of kernel_size and strides. If the input image size is too small, we might fall short of the minimum required height and width (which should be greater than or equal to the kernel size) for the next convolution block. A trial-and-error way to determine the minimum input dimension is as follows:

  1. Decide the number of convolution blocks to stack
  2. Choose any input shape, say (32, 32, 3), and stack the convolution blocks with an increasing number of filters
  3. Try building the model and print model.summary() to view the output shape of each layer.
  4. Ensure that you get (1, 1, num_of_filters) as the output dimension from the last convolution block (this will be the input to the fully connected layers).
  5. Try decreasing/increasing the input shape, kernel size or strides to satisfy the condition in step 4. The input shape, along with the other configurations, that satisfies the condition is the minimum input dimension required by your network (a minimal sketch of this check follows the list).
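Assuming the conv_block helper and imports from the earlier sketch, that trial-and-error check could look like this (with these placeholder settings of kernel size 3 and stride 2, four blocks reduce a 32x32 input down to 1x1):

# Build a throwaway model with a fixed trial input just to inspect output shapes
trial_inputs = layers.Input(shape=(32, 32, 3))
x = conv_block(trial_inputs, 32)    # 32x32 -> 15x15
x = conv_block(x, 64)               # 15x15 -> 7x7
x = conv_block(x, 128)              # 7x7   -> 3x3
x = conv_block(x, 256)              # 3x3   -> 1x1
trial_model = models.Model(trial_inputs, x)
trial_model.summary()  # the last block should report (None, 1, 1, 256); if not, adjust input size, kernel_size or strides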

There’s also a mathematical way to calculate the spatial size of the output volume as a function of the input volume, which is illustrated here. After finding the minimum input dimension, we now need to pass the output of the last convolution block to the fully connected layers. However, any input that has a dimension greater than the minimum input dimension needs to be pooled down to satisfy the condition in step 4. We’ll see how to do that using our main ingredient.
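For reference, the commonly used form of that relationship, for an input of width W, kernel size K, padding P and stride S, is:

output_width = floor((W - K + 2P) / S) + 1

and the same formula applies to the height. With the sketch above (W = 32, K = 3, P = 0, S = 2), this gives 15, 7, 3 and finally 1 after the four blocks.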

The main ingredient

The fully connected layers (FC layers) are the ones that will perform the classification tasks for us. There are two ways in which we can build FC layers:

  1. Dense layers
  2. 1×1 convolutions

If we want to use dense layers, then the model’s input dimensions have to be fixed, because the number of inputs to a dense layer has to be predefined when the layer is created. Specifically, we want the height and width in (height, width, num_of_filters) from the output of the last convolution block to be constant, in practice 1. The number of filters is always fixed, since we define those values in every convolution block.

The input to a 1×1 convolution can be (1, 1, num_of_filters) or (height, width, num_of_filters), since it mimics the functionality of an FC layer along the num_of_filters dimension. However, the input to the last layer (the softmax activation layer), after the 1×1 convolutions, must be of fixed length (the number of classes).

The main ingredient: GlobalMaxPooling2D() / GlobalAveragePooling2D(). These Keras layers convert an input of dimension (height, width, num_of_filters) to (1, 1, num_of_filters), essentially taking the max or the average of the values along the height and width dimensions for every filter.
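One way to wire the head together (a hedged sketch continuing from the earlier block sketches; num_classes is 5 for the flowers dataset, and the commented-out Dense variant mirrors the 1×1 convolution one):

num_classes = 5  # daisy, dandelion, rose, sunflower, tulip

# Pool the variable-sized feature map down to one value per filter
x = layers.GlobalMaxPooling2D()(x)          # -> (batch, num_of_filters)

# Option A: Dense layers acting as the classifier
# x = layers.Dense(64, activation='relu')(x)
# outputs = layers.Dense(num_classes, activation='softmax')(x)

# Option B: 1x1 convolutions acting as the classifier
x = layers.Reshape((1, 1, -1))(x)           # back to (1, 1, num_of_filters) for Conv2D
x = layers.Conv2D(64, 1, activation='relu')(x)
x = layers.Conv2D(num_classes, 1)(x)
x = layers.Flatten()(x)                     # -> (batch, num_classes), a fixed length
outputs = layers.Activation('softmax')(x)

model = models.Model(inputs, outputs)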

Dense layers vs. 1×1 convolutions

The code includes both dense layers (commented out) and 1×1 convolutions. After building and training the model with both configurations, here are some of my observations:

  1. Both models contain an equal number of trainable parameters.
  2. Similar training and inference time.
  3. Dense layers generalize better than 1×1 convolutions.

The third point cannot be generalized, because it depends on factors such as the number of images in the dataset, the data augmentation used, model initialization, etc. However, these were the observations in my experiments. You can run the script independently, to check that the model builds successfully, with the command $python model.py.

2. Downloading the fuel (data.py)

The flowers dataset being used in this tutorial is primarily intended to illustrate the challenges that we face while training a model with variable input dimensions. Some interesting datasets to test our FCN model on might come from the medical imaging domain, which contains microscopic features that are crucial in classifying images, or from datasets containing geometric patterns/shapes that get distorted after resizing the image.

The script provided (data.py) needs to be run independently ($python data.py). It’ll perform the following tasks:

  1. Downloads flower dataset which contains 5 classes (‘daisy’, ‘dandelion’, ‘rose’, ‘sunflower’, ‘tulip’). More details about the dataset here.
  2. Splits the dataset into training and validation sets. You can set the number of images to be copied into training and validation sets.
  3. Gives statistics about the dataset like minimum, average and maximum height and width of the images.

This script downloads the .tar file and extracts its contents in the current directory using keras.utils.get_file(). If you want to use TensorFlow Datasets (TFDS), you can check out this tutorial, which illustrates the usage of TFDS along with data augmentation.
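A hedged sketch of that download step (the flower_photos.tgz URL below is the one used in the official TensorFlow tutorials; the actual data.py also splits the extracted images into train/val folders and prints the size statistics):

import tensorflow as tf

# Download and extract the flowers archive; with untar=True, get_file returns the extracted folder path
data_dir = tf.keras.utils.get_file(
    'flower_photos',
    origin='https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True,
    cache_dir='.')   # cache under the current directory instead of ~/.keras
print(data_dir)      # one sub-folder per class: daisy, dandelion, rose, sunflower, tulip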

3. The special carburetor (generator.py)

We want to train our model on varying input dimensions. Every image in a given batch, and across batches, has different dimensions. So what’s the problem? Let’s take a step back and revisit how we train a traditional image classifier. In traditional image classifiers, the images are resized to a given dimension, packed into batches by converting them into a numpy array or tensors, and this batch of data is forward propagated through the model. The metrics (loss, accuracy, etc.) are evaluated across this batch. The gradients to be backpropagated are calculated based on these metrics.

We cannot resize our images (since we’ll lose our microscopic features). And since we cannot resize our images, converting them into batches of numpy arrays becomes impossible. That’s because if you have a list of 10 images of dimension (height, width, 3) with different values for height and width and you try to pass it to np.array(), the resulting array has a shape of (10,) and not (10, height, width, 3)! However, our model expects the input dimensions to be of the latter shape. A workaround is to write a custom training loop that performs the following (a rough sketch follows the list):

  1. We pass each image in the list (batch) through the model by converting (height, width, 3) to (1, height, width, 3) using np.expand_dims(img, axis=0).
  2. Accumulate the metrics for each image in the Python list (batch).
  3. Calculate the loss and the gradients using the accumulated metrics. Apply the gradient update to the model.
  4. Reset the values for the metrics and create a new list (batch) of images.
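For completeness, a rough sketch of such a per-image loop with gradient accumulation (assuming a built FCN model and a Python list batch of (image, one-hot label) pairs; as explained next, I don't recommend it):

import numpy as np
import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

accumulated_grads = [tf.zeros_like(v) for v in model.trainable_variables]
total_loss = 0.0
for img, label in batch:                                           # each image has its own (height, width, 3)
    with tf.GradientTape() as tape:
        pred = model(np.expand_dims(img, axis=0), training=True)   # (1, height, width, 3)
        loss = loss_fn(np.expand_dims(label, axis=0), pred)
    grads = tape.gradient(loss, model.trainable_variables)
    accumulated_grads = [a + g for a, g in zip(accumulated_grads, grads)]
    total_loss += float(loss)

# Average the accumulated gradients over the batch and apply a single update
accumulated_grads = [g / len(batch) for g in accumulated_grads]
optimizer.apply_gradients(zip(accumulated_grads, model.trainable_variables))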

I tried out the above-mentioned steps, and my suggestion is not to go with that strategy. It’s arduous, results in complex and unmaintainable code, and runs very slowly! Everyone loves the elegant and kerassical model.fit() and model.fit_generator(). The latter is what we’ll use here! But first, the carburetor.

A carburetor is a device that mixes air and fuel for internal combustion engines in the proper air-fuel ratio for combustion. And that’s what we need, air! We find the max height and width of the images in a batch and pad every other image with zeros so that every image in the batch has equal dimensions. Now we can easily convert the batch to a numpy array or a tensor and pass it to fit_generator(). The model automatically learns to ignore the zeros (essentially black pixels) and learns features from the intended portion of the padded image. This way every image in a batch has equal dimensions, but every batch has a different shape (due to the difference in the max height and width of images across batches). You can run the generator.py file independently using $python generator.py and cross-check the output.
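A minimal sketch of that padding logic (the actual generator.py also handles loading, shuffling and labels; the helper name here is illustrative):

import numpy as np

def pad_batch(images):
    # images: list of arrays of shape (height, width, 3), with differing heights/widths
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    batch = np.zeros((len(images), max_h, max_w, 3), dtype=np.float32)
    for i, img in enumerate(images):
        # place each image in the top-left corner; the remainder stays zero (black pixels)
        batch[i, :img.shape[0], :img.shape[1], :] = img
    return batch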

Creating generators in Keras is dead simple and there’s a great tutorial to get started with it here. One great addition to generator.py would be to include support for data augmentation, you can get some inspiration for it here.

4. Ignition to cognition (train.py)

The training script imports and instantiates the following classes:

  1. Generator: We need to specify the path to train and val directories created by data.py.
  2. FCN_model: We need to specify the number of classes required in the final output layer.

The above objects are passed to the train() function, which compiles the model with the Adam optimizer and the categorical cross-entropy loss function. We create a checkpoint callback which saves the best model during training. The best model is determined by the loss calculated on the validation set at the end of each epoch. As we can see, fit_generator() simplifies the code to a great extent and is pleasing to the eyes.
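A hedged sketch of what train() roughly does (the callback arguments, epoch count and file name below are illustrative, not necessarily those used in the repo):

from tensorflow.keras.callbacks import ModelCheckpoint

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Keep only the best snapshot, judged by validation loss at the end of each epoch
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True)

model.fit_generator(train_generator,
                    validation_data=val_generator,
                    epochs=50,
                    callbacks=[checkpoint])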

I would suggest performing training on Google Colab unless you have a GPU in your local machine. The GitHub repo includes a Colab notebook which puts all the pieces together required for training. You can modify the python scripts in Colab itself and train different model configurations on the dataset of your choice. Once you’ve completed the training you can download the best snapshot to your local machine from the “Files” tab in Colab.

5. Deploying model using TensorFlow Serving (inference.py)

After you’ve downloaded the model, you need to export it to the SavedModel format using export_savedmodel.py. Specify the path to the downloaded model (.h5 file) in the main function and execute the script using the command $python export_savedmodel.py. This script uses TensorFlow 2.0 features to load a Keras model from an .h5 file and save it in the TensorFlow SavedModel format. The SavedModel will be exported to the export_path specified in the script. This SavedModel is required by the TensorFlow Serving Docker image.
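A minimal sketch of that conversion, assuming the downloaded snapshot is called model.h5 (TensorFlow Serving expects a numbered version sub-directory, hence the /1 at the end of the export path):

import tensorflow as tf

# Load the trained Keras model and re-export it as a SavedModel
model = tf.keras.models.load_model('model.h5')
tf.saved_model.save(model, './flower_classifier/1')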

To start TensorFlow Serving server, go to the directory where the SavedModel is exported (./flower_classifier in this case) and run the following command (Note: You must have Docker installed on your machine):

$ docker run --rm -t -p 8501:8501 -v "$(pwd):/models/flower_classifier" -e MODEL_NAME=flower_classifier --name flower_classifier tensorflow/serving

The above command performs the following steps:

  1. Pulls the tensorflow/serving Docker image if it is not present locally.
  2. The “-p” flag maps port 8501 on the local machine to port 8501 in the Docker container.
  3. The “-v” flag mounts your current directory (specified by $(pwd)) to /models/flower_classifier in the Docker container.
  4. The “-e” flag sets an environment variable in the Docker container, which is used by the TensorFlow Serving server to create the REST endpoint.
  5. The “--rm” flag automatically removes the container, along with any anonymous volumes associated with it, when it exits.
  6. The “-t” flag allocates a pseudo-terminal, so the container logs show up in your current terminal. You can press CTRL+C to go back to your terminal and the container will continue to run in the background.

You can verify that your container is running in the background using the $ docker ps command. You can also see the container logs using $ docker logs your_container_id. The inference.py script contains the code to construct batches of images with uniform dimensions and send those batches as a POST request to the TensorFlow Serving server. The output received from the server is decoded and printed in the terminal.
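A hedged sketch of that request (the endpoint path follows the standard TensorFlow Serving REST API; pad_batch is the illustrative padding helper from the generator section, and images is a list of loaded image arrays):

import json
import requests

batch = pad_batch(images)      # batch of uniform dimensions, padded with zeros
payload = json.dumps({"instances": batch.tolist()})

response = requests.post(
    'http://localhost:8501/v1/models/flower_classifier:predict',
    data=payload)
predictions = response.json()['predictions']   # one probability vector per image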

The dream conveyance

In this tutorial, we understood the following:

  1. Building a vanilla fully convolutional network for image classification with variable input dimensions.
  2. Training FCN models with equal image shapes within a batch and varying shapes across batches.
  3. Deploying trained models using TensorFlow Serving docker image.

Note that this tutorial throws light on only a single component of a machine learning workflow. ML pipelines consist of enormous training, inference and monitoring cycles that are specific to organizations and their use cases. Building these pipelines requires a deeper understanding of the driver, its passengers and the route of the vehicle. Only then is it possible to deliver the dream conveyance!