CNN Image Classification using CIFAR-10 dataset on Google Colab TPU

Why on Google Colab TPU?
Training any deep learning model on a large amount of data is very time consuming, and sometimes the training process runs out of memory on your system. GPUs were introduced to solve this problem: NVIDIA GTX-series GPUs in particular have CUDA cores that can process large amounts of data far faster than a CPU. The problem is that budget laptops don't come with the GTX-series graphics cards we need for GPU training. This is where Google comes in. They introduced a platform named Colaboratory, i.e. Colab: a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. It offers three runtimes, CPU, GPU, and TPU, and you can change the runtime at any time.

Google Colab already provides free GPU access (one K80 core) to everyone, and free TPU access as well, even though a TPU is about 10x more expensive (Google Cloud currently charges $4.50 USD per TPU per hour versus $0.45 USD per K80 core per hour). To train faster, use the TPU rather than the GPU.

TPU stands for Tensor Processing Unit. A Cloud TPU device consists of four independent chips, and each chip contains two compute cores, called Tensor Cores, which include scalar, vector, and matrix (MXU) units.

In 2015, Google established its first TPU center to power products like Google Calls, Translation, Photos, and Gmail. To make this technology accessible to all data scientists and developers, they soon after released the Cloud TPU, meant to provide an easy-to-use, scalable, and powerful cloud-based processing unit to run cutting-edge models on the cloud.

For the artificial neural networks behind these AI applications, TPUs have been reported to be 15 to 30 times faster than contemporary CPUs and GPUs.

TPUs are designed specifically for machine learning, and optimized for various basic operations commonly used in neural network training and inference. GPUs are more general purpose and flexible massively-parallel processors that can be programmed to do neural network operations, but also tasks with very different compute/memory access patterns like computer graphics.

Image Classification: 
Image classification is the first task to understand in computer vision: build a model that can classify images by their features. To extract those features we use a CNN (Convolutional Neural Network). Here we use the CIFAR-10 dataset, which consists of 60,000 32×32 colour images in 10 classes, with 6,000 images per class.

The classes in the dataset are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

Processes: 
I. Import libraries and download the dataset.
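
For reference, here is a minimal sketch of this step with tf.keras (assuming a TensorFlow/Keras environment; the exact imports in the original notebook may differ):

```python
from tensorflow import keras

# CIFAR-10 ships with Keras, so downloading it is a one-liner.
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 1)
```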

II. Normalize the data and convert the RGB images to grayscale for faster training.
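
A sketch of one way to do this; the luminance weights below are a standard choice, not necessarily the ones used in the original notebook:

```python
import numpy as np

def to_grayscale(images):
    """Scale uint8 RGB images to [0, 1] and collapse them to one channel."""
    images = images.astype("float32") / 255.0               # normalize
    gray = np.dot(images[..., :3], [0.299, 0.587, 0.114])   # weighted RGB average
    return gray[..., np.newaxis]                             # shape (N, 32, 32, 1)

x_train_gray = to_grayscale(x_train)
x_test_gray = to_grayscale(x_test)
```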

III. Split the dataset into training and validation sets.

StratifiedKFold shuffles and splits only once, so its test sets do not overlap, while StratifiedShuffleSplit reshuffles before each of its n_splits splits, so its test sets can overlap.
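
A sketch of the split using scikit-learn's StratifiedShuffleSplit (the 90/10 ratio and the random seed are assumptions):

```python
from sklearn.model_selection import StratifiedShuffleSplit

# One stratified shuffle split: 90% train, 10% validation.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=42)
train_idx, valid_idx = next(splitter.split(x_train_gray, y_train.ravel()))

x_tr, y_tr = x_train_gray[train_idx], y_train[train_idx]
x_val, y_val = x_train_gray[valid_idx], y_train[valid_idx]
```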

IV. Define the model (a sketch of the full model follows the layer notes below).

  • Conv2D — Performs convolution.
  • filters: Number of output channels.
  • kernel_size: An integer or tuple/list of 2 integers, specifying the width and height of the 2D convolution window. In general kernel_size should be 2×2 or 3×3 to extract good local features. You can increase it in later layers, but only until the model starts to overfit.
  • padding: padding=”same” adds zero padding to the input, so that the output has the same width and height, padding=’valid’ performs convolution only in locations where kernel and the input fully overlap.
  • If you train the network with a large batch size (say 10 or more), use a BatchNormalization layer. BatchNormalization normalizes the activations of the previous layer at each batch, i.e. it applies a transformation that keeps the mean activation close to 0 and the activation standard deviation close to 1.
  • activation: ELU units seem to learn faster than other units, and they can learn models that are at least as good as ReLU-based networks.
  • input_shape: Shape of input. i.e. (32,32,1)
  • MaxPooling2D — Performs 2D max pooling.
  • Flatten — Flattens the input, does not affect the batch size.
  • Dense — Fully-connected layer.
  • Dropout — Randomly deactivates a fraction of the neurons during training to reduce overfitting.
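
Putting those layers together, here is a minimal sketch of such a CNN; the filter counts, dropout rates, and exact layer ordering are assumptions rather than the original architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(32, 32, 1), num_classes=10):
    """Small CNN built from the layers described above."""
    return keras.Sequential([
        layers.Conv2D(32, (3, 3), padding="same", input_shape=input_shape),
        layers.BatchNormalization(),
        layers.Activation("elu"),
        layers.Conv2D(32, (3, 3), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("elu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),

        layers.Conv2D(64, (3, 3), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("elu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),

        layers.Flatten(),
        layers.Dense(256, activation="elu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
```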

V. Set up the Colab TPU settings and compile the model (a setup sketch follows the loss notes below).

  • If your targets are one-hot encoded, use categorical_crossentropy.
  • Examples of one-hot encodings:
  • [1,0,0]
  • [0,1,0]
  • [0,0,1]
  • But if your targets are integers, use sparse_categorical_crossentropy.
  • Examples of integer encodings (for the sake of completion):
  • 1
  • 2
  • 3
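
A sketch of the TPU setup and compilation using TensorFlow 2.x's TPUStrategy; the original post likely used the older TF 1.x keras_to_tpu_model API, and the optimizer choice here is an assumption:

```python
import tensorflow as tf

# Connect to the Colab TPU runtime and initialize it.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Build and compile the model inside the TPU strategy scope.
with strategy.scope():
    model = build_model()  # the sketch from the model-definition step
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # CIFAR-10 labels are integers 0-9
        metrics=["accuracy"],
    )
```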

VI. Fit the data into the model.

fit() takes NumPy arrays that already sit in memory, while fit_generator() takes data from a sequence generator such as keras.utils.Sequence, which is slower but can stream data that does not fit in memory. (In recent TensorFlow releases, fit() also accepts generators and fit_generator() is deprecated.)

To avoid overfitting I used data augmentation, which is a regularization method: it generates new data from the existing data, which helps reduce validation loss and improve accuracy.
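
For example, with Keras' ImageDataGenerator (the specific transformations and batch size are assumptions):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Light augmentation: small shifts plus horizontal flips.
datagen = ImageDataGenerator(
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
train_flow = datagen.flow(x_tr, y_tr, batch_size=128)
```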

Early stopping is another way to fight overfitting: it terminates training as soon as no further improvement is observed during training.
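
A sketch combining the augmented generator above with early stopping; the patience value and epoch count are assumptions:

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop training once validation loss stops improving for a few epochs.
early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)

history = model.fit(
    train_flow,                      # augmented batches from the generator above
    validation_data=(x_val, y_val),  # note: a tf.data pipeline is usually preferred on TPU
    epochs=50,
    callbacks=[early_stop],
)
```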

VI. Prediction

For simplicity, I take 16 test images for prediction.
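
A sketch of that prediction step (the class names follow the standard CIFAR-10 ordering):

```python
import numpy as np

class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

# Predict on the first 16 test images.
sample = x_test_gray[:16]
pred_labels = np.argmax(model.predict(sample), axis=1)

for true, pred in zip(y_test[:16].ravel(), pred_labels):
    print(f"true: {class_names[true]:<10}  predicted: {class_names[pred]}")
```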

Perfectly predicted!

Final Words

The most vital part was training the network, because optimizing a neural network is hard and time consuming. You can further improve the network by tuning hyperparameters. The code is available in my GitHub repository.

For any suggestions, contact me on LinkedIn. Have a nice day!