Going beyond 99% — MNIST Handwritten Digits Recognition

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Dense, Dropout, Flatten, MaxPooling2D)
from tensorflow.keras.regularizers import l2

model = Sequential([
    # Layer 1
    Conv2D(filters=32, kernel_size=5, strides=1, activation='relu',
           input_shape=(32, 32, 1), kernel_regularizer=l2(0.0005)),
    # Layer 2
    Conv2D(filters=32, kernel_size=5, strides=1, use_bias=False),
    # Layer 3
    BatchNormalization(),
    # -------------------------------- #
    Activation('relu'),
    MaxPooling2D(pool_size=2, strides=2),
    Dropout(0.25),
    # -------------------------------- #
    # Layer 4
    Conv2D(filters=64, kernel_size=3, strides=1, activation='relu',
           kernel_regularizer=l2(0.0005)),
    # Layer 5
    Conv2D(filters=64, kernel_size=3, strides=1, use_bias=False),
    # Layer 6
    BatchNormalization(),
    # -------------------------------- #
    Activation('relu'),
    MaxPooling2D(pool_size=2, strides=2),
    Dropout(0.25),
    Flatten(),
    # -------------------------------- #
    # Layer 7
    Dense(units=256, use_bias=False),
    # Layer 8
    BatchNormalization(),
    # -------------------------------- #
    Activation('relu'),
    # -------------------------------- #
    # Layer 9
    Dense(units=128, use_bias=False),
    # Layer 10
    BatchNormalization(),
    # -------------------------------- #
    Activation('relu'),
    # -------------------------------- #
    # Layer 11
    Dense(units=84, use_bias=False),
    # Layer 12
    BatchNormalization(),
    # -------------------------------- #
    Activation('relu'),
    Dropout(0.25),
    # -------------------------------- #
    # Output
    Dense(units=10, activation='softmax')
])
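
Note that the model's input_shape is (32, 32, 1), while raw MNIST digits are 28x28. A minimal preprocessing sketch, assuming the images are scaled to [0, 1] and zero-padded to 32x32 (LeNet-5 style); this illustrates one way to match the expected input, not necessarily the article's exact pipeline:

import numpy as np
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()

def preprocess(x):
    # Scale pixels to [0, 1] and add a channel axis: (N, 28, 28) -> (N, 28, 28, 1)
    x = x.astype('float32') / 255.0
    x = x[..., np.newaxis]
    # Zero-pad to 32x32 to match input_shape (assumed here, LeNet-5 style)
    return np.pad(x, ((0, 0), (2, 2), (2, 2), (0, 0)), mode='constant')

x_train, x_test = preprocess(x_train), preprocess(x_test)
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)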

Results

The model was trained for 30 epochs and gave the following results (a sketch of a typical training call follows the list).

  • Training Accuracy of 99.82%
  • Dev Set Accuracy of 99.62%
  • Test Set Accuracy of 99.41%
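
For reference, a minimal sketch of how such a model is typically compiled and trained; the optimizer, batch size, and validation split below are assumptions, only the 30 epochs come from the text:

# Training sketch: optimizer, batch size, and validation split are
# assumptions, not taken from the article; only the 30 epochs are stated.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=128,        # assumed
                    epochs=30,             # as stated above
                    validation_split=0.1)  # assumed dev split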

Wrap Up

The optimizations described above helped push the accuracy of the model well beyond the 99% mark.

We can note that there are still some signs of overfitting, as the accuracy drops on the test set. Feel free to play around and try to reduce the variance further; data augmentation, sketched below, is one common option. If you think there are other ways to improve the model, please leave a comment.
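
A minimal augmentation sketch using Keras' ImageDataGenerator; the transform ranges are illustrative assumptions, not values from the article:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Mild random transforms; the ranges are illustrative assumptions.
datagen = ImageDataGenerator(rotation_range=10,
                             zoom_range=0.10,
                             width_shift_range=0.10,
                             height_shift_range=0.10)

model.fit(datagen.flow(x_train, y_train, batch_size=128), epochs=30)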

All the code and the results are available on GitHub here.
