Source: Deep Learning on Medium
A Beginner's Guide to CNNs Using the MNIST Dataset
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics.
They have applications in image and video recognition, recommender systems, image classification, medical image analysis, natural language processing, and financial time series.
Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex.
Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.
A convolutional neural network consists of an input layer, an output layer, and multiple hidden layers. The hidden layers of a CNN typically include a series of convolutional layers, each of which computes a dot product between a learned filter and a local patch of its input.
Each convolution is commonly followed by a ReLU activation, and the convolutional layers are interleaved with pooling layers and normalization layers, with fully connected layers at the end. These are called hidden layers because their inputs and outputs are not directly observed.
Convolution can be thought of as a sliding dot product: at each position, the filter's weights are multiplied element-wise with the pixels underneath and summed.
In convolution, the image is filtered with a small kernel (filter), which reduces the size of the picture without losing the relationships between neighboring pixels. For example, say a (3×3) image is convolved with a kernel of size (2×2).
Sliding the (2×2) filter over the (3×3) image produces a (2×2) output, which is called a feature map.
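The sliding-dot-product idea above can be sketched in plain NumPy. This is a minimal illustration (valid cross-correlation, which is what CNN layers actually compute); the example image and kernel values are made up for demonstration:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel:
    slide the kernel over the image, multiply element-wise, and sum."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
kernel = np.array([[1, 0],
                   [0, 1]], dtype=float)  # arbitrary 2x2 filter for illustration

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (2, 2): a 3x3 image with a 2x2 kernel gives a 2x2 feature map
print(feature_map)
```

Each output cell is the sum over one 2×2 window weighted by the kernel, which is why the 3×3 input shrinks to a 2×2 feature map.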
Pooling layers subsample their input: they reduce the number of parameters by keeping only the maximum, average, or sum of the values inside each window. The most common choice is max pooling, which applies a max operation over each window of the feature map. Pooling reduces the output dimensionality.
Pooling gives invariance to translation, rotation, scaling.
Translation invariance: wherever the face is in the image, the network should still be able to locate it.
Rotational invariance: whether the face is straight or tilted, the network should still be able to locate it.
Scale invariance: whether the face is small or big, the network should still be able to locate it.
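Max pooling itself is simple to write down. The following is a small NumPy sketch of non-overlapping 2×2 max pooling (the input values are arbitrary):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling on a 2-D array: keep the maximum of each window."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]], dtype=float)

pooled = max_pool(x)
print(pooled)  # 4x4 input -> 2x2 output, keeping the max of each 2x2 window
```

Because only the largest activation in each window survives, small shifts of a feature inside a window do not change the output, which is where the invariance comes from.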
Zero Padding: The nice feature of zero padding is that it will allow us to control the spatial size of the output volumes ( we will use it to exactly preserve the spatial size of the input volume so the input and output width and height are the same).
Valid Padding: Drop the part of the image where the filter did not fit. This is called valid padding which keeps only valid part of the image.
Stride is the number of pixels the filter shifts over the input matrix. When the stride is 1, we move the filter 1 pixel at a time; when the stride is 2, we move it 2 pixels at a time, and so on.
Padding and stride together determine the spatial size of the convolution output.
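The standard output-size formula makes this concrete: for input width W, filter size F, padding P, and stride S, the output width is (W − F + 2P)/S + 1. A quick sketch:

```python
def conv_output_size(w, f, p, s):
    """Output width for input width w, filter size f, padding p, stride s."""
    return (w - f + 2 * p) // s + 1

# 'Same' (zero) padding with stride 1 preserves size: 28 stays 28 with a 5x5 filter, p=2.
print(conv_output_size(28, 5, 2, 1))  # 28
# 'Valid' padding shrinks the map: 28 -> 24 with a 5x5 filter and stride 1.
print(conv_output_size(28, 5, 0, 1))  # 24
# A stride of 2 roughly halves the spatial size.
print(conv_output_size(28, 5, 0, 2))  # 12
```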
tanh, sigmoid, and ReLU are all used as activation functions; ReLU is the most common choice because it generally trains faster and performs better.
Implementation of CNN on MNIST dataset
Now let’s consider the MNIST dataset. Each image is a 28-by-28-pixel greyscale square. There are 10 digits (0 to 9), i.e. 10 classes to predict.
Import the required libraries:
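One possible set of imports for this tutorial (this assumes TensorFlow's bundled Keras; the original article may have used standalone Keras):

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.utils import to_categorical

num_classes = 10  # digits 0-9
```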
After importing the libraries, load the MNIST data from Keras. The dataset consists of 60,000 training samples and 10,000 test samples, each 28 by 28 pixels.
For data preparation, reshape the images and normalize the pixel values. Whatever form the input takes (images, text, audio, numerical data), it must ultimately be converted into a numeric format the network can process. The labels y_train and y_test are converted into one-hot (binary) vectors.
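A minimal sketch of the loading and preparation steps described above (the exact code is not shown in the article, so details such as dtype choices are assumptions):

```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load the 60,000 training and 10,000 test images.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Add a channel dimension (28, 28) -> (28, 28, 1) and scale pixels to [0, 1].
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# One-hot encode the labels: e.g. digit 3 -> [0,0,0,1,0,0,0,0,0,0].
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")
```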
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
After training, plot the accuracy and loss curves to check whether the model is overfitting or underfitting.
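One way to plot those curves, assuming `history` is the object returned by `model.fit(..., validation_data=...)` later in this article (the exact plotting code is not shown in the original):

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training vs. validation accuracy and loss from a Keras History object."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history.history["accuracy"], label="train")
    ax1.plot(history.history["val_accuracy"], label="validation")
    ax1.set_title("Accuracy"); ax1.set_xlabel("epoch"); ax1.legend()
    ax2.plot(history.history["loss"], label="train")
    ax2.plot(history.history["val_loss"], label="validation")
    ax2.set_title("Loss"); ax2.set_xlabel("epoch"); ax2.legend()
    return fig

# plot_history(history); plt.show()  # after history = model.fit(...)
```

A widening gap between the training and validation curves is the visual signature of overfitting.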
Building the model:
Here I used an input layer and a stack of convolutional hidden layers with ReLU activations, interleaved with max pooling (pool size 2×2) and dropout. After the convolutional stages the model is flattened and passed through fully connected layers.
We can change all these values according to the required model.
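A sketch of a model that reproduces the summary printed below. Note that the parameter counts in the summary imply kernel sizes of 7×7, 5×5, and 3×3 with valid padding (e.g. 7·7·1·32 + 32 = 1,600), so this reconstruction follows the summary; the dropout rates are assumptions, since dropout does not affect the parameter counts:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    # (28,28,1) -> (22,22,32): 7*7*1*32 + 32 = 1,600 params
    Conv2D(32, kernel_size=(7, 7), activation="relu", input_shape=(28, 28, 1)),
    # (22,22,32) -> (18,18,64): 5*5*32*64 + 64 = 51,264 params
    Conv2D(64, kernel_size=(5, 5), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),   # (18,18,64) -> (9,9,64)
    Dropout(0.25),
    # (9,9,64) -> (7,7,128): 3*3*64*128 + 128 = 73,856 params
    Conv2D(128, kernel_size=(3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),   # (7,7,128) -> (3,3,128)
    Dropout(0.25),
    Flatten(),                        # 3*3*128 = 1,152 units
    Dense(512, activation="relu"),    # 1,152*512 + 512 = 590,336 params
    Dropout(0.5),
    Dense(10, activation="softmax"),  # 512*10 + 10 = 5,130 params
])
model.summary()
```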
Layer (type) Output Shape Param #
conv2d_11 (Conv2D) (None, 22, 22, 32) 1600
conv2d_12 (Conv2D) (None, 18, 18, 64) 51264
max_pooling2d_6 (MaxPooling2D) (None, 9, 9, 64) 0
dropout_11 (Dropout) (None, 9, 9, 64) 0
conv2d_13 (Conv2D) (None, 7, 7, 128) 73856
max_pooling2d_7 (MaxPooling2D) (None, 3, 3, 128) 0
dropout_12 (Dropout) (None, 3, 3, 128) 0
flatten_6 (Flatten) (None, 1152) 0
dense_11 (Dense) (None, 512) 590336
dropout_13 (Dropout) (None, 512) 0
dense_12 (Dense) (None, 10) 5130
Total params: 722,186
Trainable params: 722,186
Non-trainable params: 0
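A plausible compile-and-train step that would produce the log below; the optimizer, batch size, and epoch count are assumptions, as they are not stated in the article:

```python
def compile_and_train(model, x_train, y_train, x_val, y_val,
                      batch_size=128, epochs=15):
    """Compile with cross-entropy loss and fit, validating on held-out data."""
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model.fit(x_train, y_train,
                     batch_size=batch_size,
                     epochs=epochs,
                     verbose=1,
                     validation_data=(x_val, y_val))

# history = compile_and_train(model, x_train, y_train, x_test, y_test)
```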
Train on 60000 samples, validate on 10000 samples
60000/60000 [==============================] - 225s 4ms/step - loss: 0.2804 - accuracy: 0.9097 - val_loss: 0.0493 - val_accuracy: 0.9831
60000/60000 [==============================] - 225s 4ms/step - loss: 0.0775 - accuracy: 0.9762 - val_loss: 0.0334 - val_accuracy: 0.9890
60000/60000 [==============================] - 225s 4ms/step - loss: 0.0542 - accuracy: 0.9834 - val_loss: 0.0306 - val_accuracy: 0.9898
60000/60000 [==============================] - 226s 4ms/step - loss: 0.0456 - accuracy: 0.9862 - val_loss: 0.0238 - val_accuracy: 0.9922
60000/60000 [==============================] - 227s 4ms/step - loss: 0.0384 - accuracy: 0.9882 - val_loss: 0.0206 - val_accuracy: 0.9932
60000/60000 [==============================] - 230s 4ms/step - loss: 0.0331 - accuracy: 0.9899 - val_loss: 0.0193 - val_accuracy: 0.9941
60000/60000 [==============================] - 230s 4ms/step - loss: 0.0308 - accuracy: 0.9904 - val_loss: 0.0190 - val_accuracy: 0.9939
60000/60000 [==============================] - 230s 4ms/step - loss: 0.0281 - accuracy: 0.9912 - val_loss: 0.0234 - val_accuracy: 0.9922
60000/60000 [==============================] - 237s 4ms/step - loss: 0.0253 - accuracy: 0.9922 - val_loss: 0.0195 - val_accuracy: 0.9941
60000/60000 [==============================] - 230s 4ms/step - loss: 0.0229 - accuracy: 0.9928 - val_loss: 0.0170 - val_accuracy: 0.9939
60000/60000 [==============================] - 227s 4ms/step - loss: 0.0224 - accuracy: 0.9926 - val_loss: 0.0149 - val_accuracy: 0.9953
60000/60000 [==============================] - 234s 4ms/step - loss: 0.0192 - accuracy: 0.9938 - val_loss: 0.0168 - val_accuracy: 0.9950
60000/60000 [==============================] - 229s 4ms/step - loss: 0.0185 - accuracy: 0.9941 - val_loss: 0.0150 - val_accuracy: 0.9955
60000/60000 [==============================] - 230s 4ms/step - loss: 0.0173 - accuracy: 0.9948 - val_loss: 0.0187 - val_accuracy: 0.9941
60000/60000 [==============================] - 233s 4ms/step - loss: 0.0169 - accuracy: 0.9947 - val_loss: 0.0188 - val_accuracy: 0.9951
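The final numbers below come from evaluating on the test set; a minimal sketch of that step (it assumes the `model`, `x_test`, and `y_test` defined earlier):

```python
def evaluate_model(model, x_test, y_test):
    """Return [loss, accuracy] on the held-out test set."""
    score = model.evaluate(x_test, y_test, verbose=0)
    print("Test loss:", score[0])
    print("Test accuracy:", score[1])
    return score

# score = evaluate_model(model, x_test, y_test)
```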
Test loss: 0.01883504462489873
Test accuracy: 0.9951000213623047
The best epoch is 13; from epoch 14 the validation loss starts rising, indicating overfitting. The test accuracy obtained is 99.51%.
I hope this article gives you the basic knowledge of how to build and train a CNN. You can go on to train on your own data, making the required changes while keeping these key points in mind.
Thanks for taking the time to read this article. This is my first post, and I will keep posting more.