How a Beginner Can Start with CNNs Using the MNIST Dataset

Introduction:

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics.

They have applications in image and video recognition, recommender systems, image classification, medical image analysis, natural language processing, and financial time series.

CNN stages

Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex.

Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.

Architecture

A convolutional neural network consists of an input layer, an output layer, and multiple hidden layers. The hidden layers of a CNN typically consist of a series of convolutional layers, each computing a dot product between a small filter and local patches of its input.

The convolution is commonly followed by a ReLU activation and by further layers such as pooling layers, fully connected layers, and normalization layers. These are referred to as hidden layers because their inputs and outputs are not exposed directly at the network's input or output.

Convolution:

Convolution is a generalized form of multiplication: it is similar to a dot product performed between a small matrix (the kernel) and patches of a larger matrix (the image).

In convolution, the image is filtered with a small kernel or filter, which reduces the size of the picture without losing the relationships between pixels. Let's say a (3*3) image is convolved with a kernel of size (2*2).

How the convolution process takes place

The convolution of the (3*3) image matrix with the (2*2) filter matrix then produces a (2*2) output, which is called the feature map, as the sketch below shows.
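
A minimal NumPy sketch of this step; the image and kernel values here are made up purely for illustration:

import numpy as np

# a (3*3) image convolved with a (2*2) kernel, stride 1, no padding -> (2*2) feature map
image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
kernel = np.array([[1, 0],
                   [0, 1]])

out_h = image.shape[0] - kernel.shape[0] + 1   # 2
out_w = image.shape[1] - kernel.shape[1] + 1   # 2
feature_map = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        # element-wise multiply the patch with the kernel and sum the result
        patch = image[i:i + kernel.shape[0], j:j + kernel.shape[1]]
        feature_map[i, j] = np.sum(patch * kernel)

print(feature_map)   # [[ 6.  8.], [12. 14.]]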

Hyperparameters:

-Pooling

Pooling layers subsample their input. Pooling reduces the number of parameters by summarizing each window of pixels with its maximum, average, or sum value. The most common way to do pooling is to apply a max operation to the result of each filter. Pooling therefore reduces the output dimensionality.

Pooling gives a degree of invariance to translation, rotation, and scaling.

Translation invariance: wherever the face is in the image, the network should be able to locate it.

Rotational invariance: whether the face is straight or tilted, the network should be able to locate it.

Scale invariance: whether the face is small or big, the network should be able to locate it.

How max pooling and average pooling take place
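
A small sketch of both operations with a (2*2) pool and stride 2 on a made-up (4*4) input:

import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [3, 4, 1, 8]])

pool = 2
max_pooled = np.zeros((2, 2))
avg_pooled = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = x[i * pool:(i + 1) * pool, j * pool:(j + 1) * pool]
        max_pooled[i, j] = window.max()    # keep the strongest activation
        avg_pooled[i, j] = window.mean()   # keep the average activation

print(max_pooled)   # [[6. 4.], [7. 9.]]
print(avg_pooled)   # [[3.75 2.25], [4. 4.5]]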

-Padding

Zero Padding: the nice feature of zero padding is that it lets us control the spatial size of the output volume (we often use it to exactly preserve the spatial size of the input volume, so that the input and output width and height are the same).

Valid Padding: drop the part of the image where the filter does not fit. This is called valid padding because it keeps only the valid part of the image.

How padding and strides take place

-Strides

The stride is the number of pixels by which the filter shifts over the input matrix. When the stride is 1, we move the filter 1 pixel at a time; when the stride is 2, we move the filter 2 pixels at a time, and so on, so larger strides produce smaller outputs.

Together, the padding and the stride determine the spatial size of the output feature map, as the sketch below shows.
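
For an input of width w, filter size f, padding p, and stride s, the output width is (w - f + 2p)/s + 1. A tiny helper to play with this (the example numbers are arbitrary):

def conv_output_size(w, f, p, s):
    # standard convolution output-size formula
    return (w - f + 2 * p) // s + 1

print(conv_output_size(28, 5, 0, 1))   # valid padding, stride 1 -> 24
print(conv_output_size(28, 5, 2, 1))   # zero padding of 2 ('same') -> 28
print(conv_output_size(28, 5, 0, 2))   # valid padding, stride 2 -> 12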

Activation unit:

tanh, sigmoid, and ReLU are all used as activation functions; ReLU is used most often because it generally gives better performance.

How activation functions work
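
Minimal NumPy sketches of the three activations, just to show their ranges:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes values into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # clips negative values to 0

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), tanh(z), relu(z))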

Implementation of CNN on MNIST dataset

Now let's consider the MNIST dataset. Each image is a 28 by 28 pixel grayscale square, and there are 10 digits (0 to 9), i.e. 10 classes to predict.

Import the required libraries:

Credits: https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py
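
A sketch of the imports, following the credited Keras example; matplotlib is added here on the assumption that it produces the plots mentioned later:

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
import matplotlib.pyplot as plt   # assumed here for plotting the training curves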

After importing the libraries, load the MNIST data from Keras. The dataset contains 60,000 training samples and 10,000 test samples, each 28 by 28 pixels.

For data preparation, the images are reshaped and the pixel values are normalized. Whatever form the input takes (images, text, audio, numbers), it must finally be converted into a numeric representation for processing; here y_train and y_test are converted into binary (one-hot) class vectors.
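
A sketch of this preparation step, assuming the channels-last image format; it prints the shapes shown below:

num_classes = 10
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# reshape to (samples, 28, 28, 1) and scale the pixel values to [0, 1]
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype('float32') / 255
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1).astype('float32') / 255

# convert the digit labels into one-hot (binary) class vectors
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')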

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples

Plot the training and validation accuracy curves to check whether the model is overfitting or underfitting; a small helper for this is sketched below.
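
A minimal matplotlib sketch, assuming the call to model.fit(...) later on is assigned to a variable named history:

import matplotlib.pyplot as plt

def plot_history(history):
    # compare training and validation accuracy per epoch
    plt.plot(history.history['accuracy'], label='train accuracy')
    plt.plot(history.history['val_accuracy'], label='validation accuracy')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend()
    plt.show()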

Building the model:

Here I stacked convolutional layers with ReLU activation, max pooling (pool size (2*2)), and dropout, using valid (no) padding and a stride of 1; after the convolutional stages the model is flattened and passed through a fully connected layer of 512 units before the final 10-class output. The exact layer shapes and parameter counts are shown in the model summary below.

We can change all these values according to the required model.
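
A sketch of a model that reproduces the layer shapes and parameter counts in the summary below. The kernel sizes (7*7), (5*5), and (3*3) are inferred from the printed output shapes, and the dropout rates and optimizer follow the credited Keras example, so treat this as an approximation rather than the exact original code:

model = Sequential()
model.add(Conv2D(32, kernel_size=(7, 7), activation='relu',
                 input_shape=(28, 28, 1)))                    # -> (22, 22, 32)
model.add(Conv2D(64, kernel_size=(5, 5), activation='relu'))  # -> (18, 18, 64)
model.add(MaxPooling2D(pool_size=(2, 2)))                     # -> (9, 9, 64)
model.add(Dropout(0.25))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu')) # -> (7, 7, 128)
model.add(MaxPooling2D(pool_size=(2, 2)))                     # -> (3, 3, 128)
model.add(Dropout(0.25))
model.add(Flatten())                                          # -> 1152 features
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.summary()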

Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_11 (Conv2D) (None, 22, 22, 32) 1600
_________________________________________________________________
conv2d_12 (Conv2D) (None, 18, 18, 64) 51264
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 9, 9, 64) 0
_________________________________________________________________
dropout_11 (Dropout) (None, 9, 9, 64) 0
_________________________________________________________________
conv2d_13 (Conv2D) (None, 7, 7, 128) 73856
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 3, 3, 128) 0
_________________________________________________________________
dropout_12 (Dropout) (None, 3, 3, 128) 0
_________________________________________________________________
flatten_6 (Flatten) (None, 1152) 0
_________________________________________________________________
dense_11 (Dense) (None, 512) 590336
_________________________________________________________________
dropout_13 (Dropout) (None, 512) 0
_________________________________________________________________
dense_12 (Dense) (None, 10) 5130
=================================================================
Total params: 722,186
Trainable params: 722,186
Non-trainable params: 0
_________________________________________________________________
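
A sketch of the training and evaluation calls that would produce the log below; the batch size of 128 is an assumption carried over from the credited example:

batch_size = 128
epochs = 15

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# plot the learning curves with the plot_history helper sketched earlier
plot_history(history)
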
Train on 60000 samples, validate on 10000 samples
Epoch 1/15
60000/60000 [==============================] - 225s 4ms/step - loss: 0.2804 - accuracy: 0.9097 - val_loss: 0.0493 - val_accuracy: 0.9831
Epoch 2/15
60000/60000 [==============================] - 225s 4ms/step - loss: 0.0775 - accuracy: 0.9762 - val_loss: 0.0334 - val_accuracy: 0.9890
Epoch 3/15
60000/60000 [==============================] - 225s 4ms/step - loss: 0.0542 - accuracy: 0.9834 - val_loss: 0.0306 - val_accuracy: 0.9898
Epoch 4/15
60000/60000 [==============================] - 226s 4ms/step - loss: 0.0456 - accuracy: 0.9862 - val_loss: 0.0238 - val_accuracy: 0.9922
Epoch 5/15
60000/60000 [==============================] - 227s 4ms/step - loss: 0.0384 - accuracy: 0.9882 - val_loss: 0.0206 - val_accuracy: 0.9932
Epoch 6/15
60000/60000 [==============================] - 230s 4ms/step - loss: 0.0331 - accuracy: 0.9899 - val_loss: 0.0193 - val_accuracy: 0.9941
Epoch 7/15
60000/60000 [==============================] - 230s 4ms/step - loss: 0.0308 - accuracy: 0.9904 - val_loss: 0.0190 - val_accuracy: 0.9939
Epoch 8/15
60000/60000 [==============================] - 230s 4ms/step - loss: 0.0281 - accuracy: 0.9912 - val_loss: 0.0234 - val_accuracy: 0.9922
Epoch 9/15
60000/60000 [==============================] - 237s 4ms/step - loss: 0.0253 - accuracy: 0.9922 - val_loss: 0.0195 - val_accuracy: 0.9941
Epoch 10/15
60000/60000 [==============================] - 230s 4ms/step - loss: 0.0229 - accuracy: 0.9928 - val_loss: 0.0170 - val_accuracy: 0.9939
Epoch 11/15
60000/60000 [==============================] - 227s 4ms/step - loss: 0.0224 - accuracy: 0.9926 - val_loss: 0.0149 - val_accuracy: 0.9953
Epoch 12/15
60000/60000 [==============================] - 234s 4ms/step - loss: 0.0192 - accuracy: 0.9938 - val_loss: 0.0168 - val_accuracy: 0.9950
Epoch 13/15
60000/60000 [==============================] - 229s 4ms/step - loss: 0.0185 - accuracy: 0.9941 - val_loss: 0.0150 - val_accuracy: 0.9955
Epoch 14/15
60000/60000 [==============================] - 230s 4ms/step - loss: 0.0173 - accuracy: 0.9948 - val_loss: 0.0187 - val_accuracy: 0.9941
Epoch 15/15
60000/60000 [==============================] - 233s 4ms/step - loss: 0.0169 - accuracy: 0.9947 - val_loss: 0.0188 - val_accuracy: 0.9951
Test loss: 0.01883504462489873
Test accuracy: 0.9951000213623047

The best epoch is 13; the model starts to overfit slightly from epoch 14. The test accuracy obtained is 99.51%.

I hope this article helps you gain a basic understanding of how to build and train a CNN. You can go further with your own data by making the required changes and keeping these points in mind.

Thanks for giving your valuable time to read this article. This is my first post, and I will keep posting more.

References:

  1. https://en.wikipedia.org/wiki/Convolutional_neural_network
  2. https://www.appliedaicourse.com/
  3. Google Images