7. Introduction to Deep Learning with Computer Vision — MNIST Handwritten digit recognition hands-on

Written by Nilesh Singh & Praveen Kumar.

Prerequisites: Filters & Kernels, Channels & Features

In this blog, we will learn to code and create our very first neural network. The primary task of our network is simply to demonstrate that it works; as a bonus, it will also identify handwritten digits on its own.

So let’s cut the chatter and talk code.

The first thing to bear in mind is that we will be using Python 3.x on the Google Colab platform. We will be using Keras as our primary library for writing the network. It is a simple, elegant, yet extremely powerful API that runs on top of TensorFlow or Theano.

The first logical thing to do would be to fire up Colab, if you have not already.

You should get a Welcome to Colaboratory page. In the top left corner, look for the File menu and go to File -> New Python 3 Notebook.

After you have fired up the notebook, let us recall some of the most commonly used keyboard shortcuts so that you do not have to run and manipulate each cell through the menus every time.

To run each cell individually: SHIFT + [ENTER]

To delete a cell: CTRL + M + D

To insert a cell below: CTRL + M + B

To insert a cell above: CTRL + M + A

But, but, but, the good news is: if you find them hard to remember, you can customize each shortcut in the preference settings, or press CTRL + M + H to bring up the full list of shortcuts.

Let’s build a neural network with 7 hidden layers

In the first code cell, write this import statement and press SHIFT + [ENTER] (that's how you run a code cell in Colab):

import keras

We will now import a few more modules. Don't worry about understanding the purpose of everything in one go; we will ease you into each piece as and when we use it.

import numpy as np

from keras.models import Sequential
from keras.layers import Flatten
from keras.layers import Convolution2D
from keras.utils import np_utils

from keras.datasets import mnist

Here the line worth noting is the last import statement, which imports the whole MNIST dataset. It contains all the images and their labels that we will use to train our handwritten digit recognition model. MNIST consists of 60k training and 10k testing images. All images are grayscale and 28x28 pixels in size.

Now that we have all the required libraries and the dataset in place, we will split the dataset into training and testing parts. We will build a model that is trained on the training set. It's like teaching a baby what an apple looks like; in the same way, our model will learn by seeing multiple instances of handwritten digits and become able to classify them as 0 through 9.

Even though we are confident our model has accurately learned all the digits, we need to gauge its learning by showing it some unseen images. If the model accurately classifies these new, unseen images, we can confidently say we nailed it!!!

Since we are using a built-in dataset from Keras, it can be loaded and split into training and test sets with just one line of code.

##splitting MNIST dataset into training and testing sets, images are loaded into memory as well
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Here X_train and X_test contain the training and testing images respectively, while y_train and y_test contain the corresponding labels. Also, all the variables created in this step are numpy arrays.
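If you are curious, a quick optional sanity check (our addition, run right after mnist.load_data()) confirms the sizes mentioned above:

print(type(X_train)) ##<class 'numpy.ndarray'>
print(X_train.shape, y_train.shape) ##(60000, 28, 28) (60000,) -> 60k training images and labels
print(X_test.shape, y_test.shape) ##(10000, 28, 28) (10000,) -> 10k test images and labels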

In the next step, we add a depth dimension of 1, which signifies that our data has only one channel. This is done using the reshape function of numpy arrays.

##Basically adding a depth dimension to the data, so our MNIST data are single channeled
X_train = X_train.reshape(X_train.shape[0], 28, 28,1)
X_test = X_test.reshape(X_test.shape[0], 28, 28,1)

Next, we normalize our data from the 0–255 range to the 0–1 range. Why? Because normalizing brings all the images onto the same scale and proportion. It helps the network train faster and often gives better accuracy (sometimes counter-intuitively). We will try to understand more about it in upcoming sessions. So we simply divide our images by 255 so that all pixel values fall between 0 and 1.

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
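
As a quick optional check (ours, not part of the original flow), you can verify that the scaling worked:

print(X_train.min(), X_train.max()) ##should print 0.0 1.0 after dividing by 255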

Now, once we have normalized our images, we go on to convert our integer image labels into categorical (one-hot) vectors. These labels are the ground truth for our model. Let's just do this and not get into the why and what; we will explain as we progress deeper into the sessions.

# Convert 1-dimensional class arrays to 10-dimensional class matrices(making the data categorical)
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
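
To make the idea concrete, here is a tiny illustration of our own using the same np_utils helper imported above: the label 3 becomes a 10-element vector with a 1 in position 3 and zeros everywhere else.

print(np_utils.to_categorical([3], 10)) ##[[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]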

In the next code cell, we will define and create a very simple neural network. This may seem complex and weird, but believe us, this is one of the simplest architectures you will ever write. Let's write the code and then try to understand it briefly.

NOTE: DO NOT WORRY IF YOU DO NOT UNDERSTAND. You will as we move on. You have our word!!

from keras.layers import Activation, MaxPooling2D

model = Sequential()
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(28,28,1))) #receptive field=3,input channel=1
model.add(Convolution2D(64, 3, 3, activation='relu')) #receptive field=5,input channel=32
model.add(Convolution2D(128, 3, 3, activation='relu')) #receptive field=7,input channel=64

model.add(MaxPooling2D(pool_size=(2, 2))) #receptive field=14

model.add(Convolution2D(256, 3, 3, activation='relu')) #receptive field=16,input channel=128
model.add(Convolution2D(512, 3, 3, activation='relu')) #receptive field=18,input channel=256
model.add(Convolution2D(1024, 3, 3, activation='relu')) #receptive field=20,input channel=512
model.add(Convolution2D(2048, 3, 3, activation='relu')) #receptive field=22,input channel=1024
model.add(Convolution2D(10, 3, 3)) #receptive field=24,input channel=2048

model.add(Flatten())
model.add(Activation('softmax'))

model.summary()

In the 2nd line, we declare a neural network and make it Sequential. Sequential simply means that the layers are stacked one after another: each layer takes its input from the previous layer, and its output goes straight to the next layer. There are no connections that skip layers; it follows a cascaded layer design.

Then in the next line, we create a convolutional layer by using the Convolution2D() function and add it to the model using model.add().

In Convolution2D(32, 3, 3, activation='relu', input_shape=(28,28,1)):

  • 32 means that we want 32 kernels
  • 3, 3 means that each of those 32 kernels is of size 3x3, so we are performing a 3x3 convolution here
  • Activations are basically non-linear functions that determine how data is passed from one layer to the next. Without them, any neural network would collapse into a simple linear regression model. The activation function ReLU used here is quite simple: it passes all positive values through unchanged and zeroes out all negative values (see the small sketch after this list).
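
As a minimal illustration of ReLU (our own sketch, not part of the original code), here is how it behaves on a few numbers:

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
relu_out = np.maximum(0, x) ##ReLU keeps positive values as they are and zeroes out the negatives
print(relu_out) ##[0.  0.  0.  1.5 3. ]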

We repeat these layers, linking one to the next.

Towards the end, we add a Flatten() layer, which converts the output of the final convolutional layer into a 1D array of 10 numbers (because there are 10 categories in our data: the digits 0 to 9).
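
If you want to see why exactly 10 numbers come out of Flatten(), here is a small back-of-the-envelope sketch (our own, assuming 3x3 convolutions with no padding and a 2x2 max pool): each convolution shrinks the spatial size by 2, and the pooling halves it, so the 28x28 input ends up as a 1x1 feature map with 10 channels just before Flatten().

size = 28
for layer in ["conv 32", "conv 64", "conv 128"]:
    size -= 2 ##a 3x3 convolution without padding shrinks each side by 2
    print(layer, size) ##26, 24, 22
size //= 2 ##2x2 max pooling halves the spatial size
print("maxpool", size) ##11
for layer in ["conv 256", "conv 512", "conv 1024", "conv 2048", "conv 10"]:
    size -= 2
    print(layer, size) ##9, 7, 5, 3, 1 -> the final feature map is 1x1 with 10 channels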

When the last statement is executed, we get a nice and structured summary of the model with details of each layer.

We have just created the architecture of the model. Now let's fuel it up by compiling it, and run it by training on the input images.

##we are compiling our model using adam as optimizer
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
##Training the model in this cell
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10, verbose=1)

We have a loss metric that is used by our model during backpropagation: every time the model predicts the wrong label compared with the ground truth, we punish it with a penalty. We also have accuracy as a metric, which is simply the performance measure we use to assess our model. So in this case, after every epoch, we monitor and judge the performance of the model using the accuracy metric. Let's ignore optimizers for the time being, since we will cover them in depth in upcoming sessions.
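
For the curious, here is a tiny numeric sketch of the categorical cross-entropy loss (our own illustration, not from the original post): it is the negative log of the probability the model assigned to the correct class, so a confident correct prediction gives a small penalty and a confident wrong one gives a large penalty.

import numpy as np

y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0]) ##one-hot ground truth: the digit is 3
y_prob = np.array([0.01, 0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]) ##model's predicted probabilities
loss = -np.sum(y_true * np.log(y_prob)) ##categorical cross-entropy for one sample
print(loss) ##about 0.105, small because the model is confident and correct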

We fit, or train, the model. Wow!!! Finally, we are doing some training, ahann?? Yea!! After 10 epochs, our model has successfully learned all the digits from 0 to 9.

Let's move on and put our model to the test. Fun time begins….

Please note that we have deliberately left explanations for some functions and statements as we plan to cover all of them in greater detail in the near future.

So let’s test it.

##Evaluating the trained model on test data
score = model.evaluate(X_test, Y_test)

##predicting the labels of test dataset
y_pred = model.predict(X_test)

This should print the score of the model. Since we compiled with metrics=['accuracy'], model.evaluate returns both the loss and the accuracy; the accuracy should be somewhere around 99.2%, not at all bad for a first network.
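
To actually see those numbers (a small addition of ours, assuming the cells above have been run), you can print the evaluation result and cross-check the accuracy against the raw predictions:

##score is [loss, accuracy] because we compiled the model with metrics=['accuracy']
print("Test loss:", score[0])
print("Test accuracy:", score[1])

##y_pred holds 10 probabilities per image; argmax picks the most likely digit
predicted_digits = np.argmax(y_pred, axis=1)
print("Accuracy recomputed from predictions:", np.mean(predicted_digits == y_test))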

Congratulations!!!

You made it!!!!!

You are a tough cookie and you nailed it. You deserve a pat on the back.

Get some rest!!!

Hope you enjoyed it.

Interesting exercise: do check out your model's summary, try different kernel values, and let us know how many parameters you are training today.

NOTE: We are starting a new Telegram group to tackle all the questions and any sort of queries. You can openly discuss concepts with other participants and get more insights; this will be more helpful as we move further along the publication. [Follow this LINK to join]