Original article can be found here (source): Deep Learning on Medium
Getting it to top 6% in Kaggle’s MNIST Digit Recognizer from scratch -3.
Making our CNN more robust
3.1 Batch Normalization
In part 1, we normalized our input layer by rescaling each image's pixel values to the range 0 to 1. In a similar fashion, we can normalize the inputs to the hidden layers using Batch Normalization. It has several advantages; the most useful are that the model typically needs fewer epochs to converge, and that each hidden layer learns more independently of the others because it sees less internal covariate shift, that is, less change in the distribution of its inputs as earlier layers update. It also has a mild regularizing effect, which can help reduce overfitting. Keras provides a BatchNormalization layer that can be added directly to the model.
More on Batch Normalisation here.
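To make the normalization step concrete, here is a minimal NumPy sketch of what a batch normalization layer computes during training. It is an illustration, not the Keras implementation: the function name batch_norm_forward, the eps value, and the toy activations are assumptions, and the running statistics used at inference time are left out.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature (biased) variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # roughly zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale (gamma) and shift (beta)

# Toy hidden-layer activations (assumed): 64 samples, 4 features, far from zero mean
rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
out = batch_norm_forward(activations, gamma=np.ones(4), beta=np.zeros(4))
```

With gamma set to ones and beta to zeros, every feature column of out has a mean close to 0 and a standard deviation close to 1, regardless of the scale of the incoming activations.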
3.2 Dropout — Regularising our Deep Neural Network
During the training phase of the neural network, some nodes in a hidden layer are dropped at random: each node is switched off with a probability equal to the dropout rate, which we specify as a value between 0 and 1. This makes the network architecturally different on every training step, which helps prevent overfitting. You can start with a dropout rate of 0.4 and experiment a bit to find the sweet spot; I usually prefer values between 0.4 and 0.6. Keras provides a Dropout layer that takes this rate as its argument.
More on dropout here.
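As an illustration of the mechanism, the following NumPy sketch applies "inverted" dropout to a layer's activations. The function name dropout_forward and the toy data are assumptions, and scaling the kept activations by 1 / (1 - rate) is the common convention rather than something stated in this article.

```python
import numpy as np

def dropout_forward(x, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the rest."""
    if not training or rate == 0.0:
        return x                              # dropout is a no-op at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob    # True where the node is kept
    return x * mask / keep_prob               # rescale so the expected value is unchanged

rng = np.random.default_rng(42)
hidden = np.ones((1000, 100))                 # toy activations (assumed)
dropped = dropout_forward(hidden, rate=0.4, rng=rng)
```

Roughly 40% of the entries in dropped are zero and the remaining ones are scaled up, so the average activation stays close to its original value.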
3.3 Dropping learning rate as training progresses
We can reduce the learning rate as training progresses, for example after a fixed number of epochs. This matters because when we trained our models in part 1 and part 2, the accuracy kept fluctuating: at times it decreased instead of increasing as training progressed. Reducing the learning rate helps prevent the optimizer from overshooting the optimum. We can implement our own learning rate scheduler as follows:
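from keras.callbacks import LearningRateScheduler
# x is the epoch index; initial_lr (an assumed starting rate, e.g. 1e-3) is multiplied by droppingrate (between 0 and 1) every epoch
annealer = LearningRateScheduler(lambda x: initial_lr * droppingrate**x)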
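To see what such a schedule produces, here is a small pure-Python sketch of exponential decay. The starting rate of 1e-3 and the decay factor of 0.95 are assumed values for illustration; the article does not state the ones it used.

```python
def exponential_decay(epoch, initial_lr=1e-3, droppingrate=0.95):
    """Return the learning rate for a given epoch index (starting at 0)."""
    return initial_lr * droppingrate ** epoch

# Learning rates for the first five epochs
schedule = [exponential_decay(epoch) for epoch in range(5)]
```

The rate starts at initial_lr and shrinks by 5% every epoch. A function like this takes the epoch index as its first argument, so it can be passed to LearningRateScheduler in place of the lambda.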
3.4 Data Augmentation
Deep nets are always thirsty for data: in general, the more data you provide to the network, the more accurate it tends to be. Data augmentation is an important step, especially when building complex neural networks, because as the number of trainable parameters increases, the model needs more data to generalize well. TensorFlow provides an option to augment data in real time during training.
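Independently of any particular library, the idea can be illustrated with a small NumPy sketch that creates new training images by randomly shifting an existing one. Random shifts are only one typical transformation and are an assumption here, as are the helper names shift_image and random_shift and the toy 28x28 image.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Shift a 2-D image by dy rows and dx columns, filling gaps with zeros."""
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Part of the source image that remains visible after the shift
    src = image[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    top, left = max(0, dy), max(0, dx)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted

def random_shift(image, max_shift, rng):
    """Shift the image by a random offset of at most max_shift pixels per axis."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return shift_image(image, int(dy), int(dx))

rng = np.random.default_rng(0)
digit = np.zeros((28, 28), dtype=np.float32)
digit[9:19, 9:19] = 1.0                       # toy stand-in for a digit
augmented = [random_shift(digit, max_shift=3, rng=rng) for _ in range(8)]
```

Each entry in augmented shows the same shape at a slightly different position, so the network sees more variety without any new labelled data.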
Let’s build our CNN using all of the above techniques:
The model scores 0.99728 on Kaggle with a rank under 150 (top 7%).
3.5 Ensembling the different models:
Run the model 5 separate times and you will see that the training and test scores vary from run to run. There is therefore no guarantee that the same model will hit the 0.99728 accuracy score on every run. It is better to:
a. run the model a few times
b. try different models by experimenting with the number of hidden layers, kernel size, pooling layers, learning rate, dropout rate, neurons in dense layers, etc. Select the good models and ensemble them.
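One simple way to ensemble the selected models is to average their predicted class probabilities and take the most likely class. The sketch below assumes that approach; the article does not say how its outputs were combined, and the function name ensemble_predict and the toy probabilities are illustrative.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average class probabilities from several models and pick the top class."""
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs.argmax(axis=1)

# Toy predictions (assumed): 3 models, 2 samples, 3 classes
model_a = np.array([[0.6, 0.4, 0.0], [0.1, 0.5, 0.4]])
model_b = np.array([[0.3, 0.7, 0.0], [0.2, 0.2, 0.6]])
model_c = np.array([[0.7, 0.3, 0.0], [0.1, 0.3, 0.6]])
labels = ensemble_predict([model_a, model_b, model_c])
```

In this toy case model_b disagrees with the others on the first sample and model_a on the second, but the averaged probabilities follow the majority in both cases.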
Keeping the template as it is, and just by changing the kernel size in the convolutional layers and ensembling the output from different models, I was able to hit 0.99771 with a rank of 120, i.e. top 6% as of 31st March 2020.