Bangla Character Recognition System — The Deep Learning Way (2/n)


Our DenseNet

Architecture

Coming back to the project at hand, here is our version of DenseNet (lovingly called SayanNet v4 😛 )

https://ibb.co/HGtkwWs (also shown at the very end. It is huge.)

However, the basic structure is as follows (a rough code sketch is included after the parameter counts below):

  • Input Layer — Takes the images as 64×64×1 (since all images are grayscale)
  • (Optional) Augmentation Layer.
  • 3 Dense Blocks — 4 convolutional layers per block, each with 12 filters. Each block is followed by a transition block with an average pooling layer (2×2 with strides of 2×2) to downsample the output dimensions.
  • Global Average Pooling Layer — Transforms the final convolutional layer's output into a single vector.
  • 3 Output Dense Layers — One for each target.
  • Several Dropout and Batch Normalization layers between the convolutional layers to reduce over-fitting and maintain signal strength.

In total we have:
Total parameters: 173,730
Trainable parameters: 170,802
Non-trainable parameters: 2,928
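
To make that structure concrete, here is a minimal Keras-style sketch of the blocks described above. The layer ordering inside a block, the dropout rate, and the three output-head class counts are assumptions for illustration and not the exact SayanNet v4 definition (so the parameter counts will not match the numbers above).

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth=12):
    # Each layer adds `growth` feature maps and is concatenated with its inputs,
    # which is the defining property of a dense block.
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(growth, 3, padding="same")(y)
        y = layers.Dropout(0.2)(y)  # dropout rate is an assumption
        x = layers.Concatenate()([x, y])
    return x

def transition_block(x):
    # The post describes the transition as 2x2 average pooling with 2x2 strides.
    return layers.AveragePooling2D(pool_size=2, strides=2)(x)

inputs = layers.Input(shape=(64, 64, 1))  # 64x64 grayscale images
x = inputs
for _ in range(3):  # 3 dense blocks, each followed by a transition block
    x = dense_block(x)
    x = transition_block(x)
x = layers.GlobalAveragePooling2D()(x)

# Three output heads, one per target; the class counts are assumptions.
root = layers.Dense(168, activation="softmax", name="grapheme_root")(x)
vowel = layers.Dense(11, activation="softmax", name="vowel_diacritic")(x)
consonant = layers.Dense(7, activation="softmax", name="consonant_diacritic")(x)

model = tf.keras.Model(inputs, [root, vowel, consonant])
model.summary()
```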

Other Hyperparameters

Optimizer — We used the Adam⁶ optimizer with a starting learning rate of 0.01.

Loss function — Since each target value is a class label (with no numerical significance), we used Sparse Categorical Cross-Entropy⁷ as our loss function for all outputs.

Metrics — Although the competition webpage scores on a weighted recall metric, we used Accuracy⁸ as our training metric.
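
Continuing the sketch above, compiling the model with these choices could look like the following. The output-head names are the assumed ones from the earlier sketch.

```python
# Compile with Adam (starting LR 0.01), sparse categorical cross-entropy on
# every output, and accuracy as the training metric for each head.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss={
        "grapheme_root": "sparse_categorical_crossentropy",
        "vowel_diacritic": "sparse_categorical_crossentropy",
        "consonant_diacritic": "sparse_categorical_crossentropy",
    },
    metrics=["accuracy"],
)
```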

Learning Rate — We used a variable learning rate via TensorFlow's native `ReduceLROnPlateau`⁹ callback on all 3 output losses. We started off with a learning rate of 0.01 and reduced it by a factor of 0.2 each time the loss plateaued, down to a minimum of 0.0001.
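
One way to monitor all three output losses is to create one `ReduceLROnPlateau` callback per head, as sketched below. The monitored metric names assume the output heads are named as in the earlier sketch, and the patience value is an assumption.

```python
# One ReduceLROnPlateau per output loss: reduce LR by a factor of 0.2 when the
# monitored validation loss plateaus, down to a floor of 0.0001.
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor=f"val_{head}_loss",
        factor=0.2,
        patience=3,     # assumed patience
        min_lr=0.0001,
        verbose=1,
    )
    for head in ["grapheme_root", "vowel_diacritic", "consonant_diacritic"]
]
```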

Preliminary Results

As mentioned in the previous article, since our training data is imbalanced, we used scikit-learn's `train_test_split`¹⁰ function to do a class-stratified split of the entire data-set into 90% training and 10% validation.
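
A sketch of that split is shown below. It assumes a pandas DataFrame `labels_df` holding the image ids and target columns, and that stratification is done on a combined grapheme label column; both names and the random seed are assumptions.

```python
from sklearn.model_selection import train_test_split

# 90% training / 10% validation, stratified by class to preserve the
# imbalanced label distribution in both splits.
train_df, valid_df = train_test_split(
    labels_df,
    test_size=0.10,
    stratify=labels_df["grapheme"],  # assumed column name for stratification
    random_state=42,                 # assumed seed for reproducibility
)
```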

We trained the model with a batch size of 32 for a total of 100 epochs.
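
The training call itself could then look like this, assuming `x_train`/`x_valid` hold the 64×64×1 image arrays and `y_train`/`y_valid` are dictionaries of label arrays keyed by the assumed output-head names.

```python
# Train for 100 epochs with a batch size of 32, using the per-output
# ReduceLROnPlateau callbacks defined earlier.
history = model.fit(
    x_train,
    y_train,
    validation_data=(x_valid, y_valid),
    batch_size=32,
    epochs=100,
    callbacks=callbacks,
)
```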