Source: Deep Learning on Medium
Our Dense Net
Coming back to the project at hand, here is our version of DenseNet (lovingly called SayanNet v4 😛 )
https://ibb.co/HGtkwWs (also shown at the very end. It is huge.)
However, the basic structure is as follows:
- Input Layer — Takes the images as 64x64x1 (since all images are grayscale)
- (Optional) Augmentation Layer.
- 3 Dense Blocks — Each with 4 convolutional layers of 12 filters, followed by a transition block whose average pooling layer (2×2 with strides of 2×2) downsamples the output dimensions.
- Global Average Pooling Layer — To transform the final convolutional layer's output into a single vector.
- 3 Output Dense Layers — One for each target.
- Several Dropout and Batch Normalization layers between the convolutional layers to reduce over-fitting and maintain signal strength.
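The structure above can be sketched in Keras roughly as follows. This is a minimal sketch, not our exact code: the exact layer ordering inside a block, the per-target class counts, and the omission of Dropout here are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=12):
    """One dense block: each conv layer sees the concatenation of all previous outputs."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = layers.Concatenate()([x, y])
    return x

def transition_block(x):
    """Downsample with 2x2 average pooling, strides 2x2."""
    return layers.AveragePooling2D(pool_size=2, strides=2)(x)

inputs = layers.Input(shape=(64, 64, 1))  # 64x64 grayscale images
x = inputs
for _ in range(3):  # three dense blocks, each followed by a transition block
    x = dense_block(x)
    x = transition_block(x)
x = layers.GlobalAveragePooling2D()(x)  # final feature maps -> single vector
# one Dense output per target; the class counts here are hypothetical placeholders
out_a = layers.Dense(168, activation="softmax", name="target_a")(x)
out_b = layers.Dense(11, activation="softmax", name="target_b")(x)
out_c = layers.Dense(7, activation="softmax", name="target_c")(x)
model = tf.keras.Model(inputs, [out_a, out_b, out_c])
```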
In total we have:
Total parameters: 173,730
Trainable parameters: 170,802
Non-trainable parameters: 2,928
Optimizer — We used the Adam⁶ optimizer with a starting learning rate of 0.01.
Loss function — Since each value is a label (with no numerical significance), we used Sparse Categorical Cross-Entropy⁷ as our loss function for all outputs.
Metrics — Although the competition webpage scores on a weighted Recall metric, we used Accuracy⁸ as our training metric.
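The optimizer, loss, and metric choices above come together in the compile step. A minimal sketch, assuming a multi-output Keras model (the tiny stand-in model here is not our architecture):

```python
import tensorflow as tf

# stand-in model with three named softmax outputs, just to show the compile call
inputs = tf.keras.Input(shape=(64, 64, 1))
x = tf.keras.layers.GlobalAveragePooling2D()(inputs)
outputs = [tf.keras.layers.Dense(n, activation="softmax", name=name)(x)
           for n, name in [(10, "target_a"), (10, "target_b"), (10, "target_c")]]
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # starting LR of 0.01
    loss="sparse_categorical_crossentropy",  # integer labels, no one-hot encoding needed
    metrics=["accuracy"],  # applied to every output
)
```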
Learning Rate — We used a variable learning rate, via TensorFlow's native `ReduceLROnPlateau`⁹ callback on all 3 output losses. We started off with a learning rate of 0.01 and reduced it by a factor of 0.2 each time the loss plateaued, down to a minimum of 0.0001.
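The schedule above might be wired up as follows. The monitored loss names and the `patience` value are assumptions; the factor and floor are the ones stated above.

```python
import tensorflow as tf

# one ReduceLROnPlateau callback per output loss; pass this list to model.fit(...)
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor=f"val_{name}_loss",  # hypothetical output names
        factor=0.2,       # shrink the learning rate by 5x on each plateau
        patience=3,       # epochs with no improvement before reducing (an assumption)
        min_lr=0.0001,    # never go below this floor
    )
    for name in ("target_a", "target_b", "target_c")
]
```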
As mentioned in the previous article, since our training data is imbalanced, we used scikit-learn's `train_test_split`¹⁰ function to do a class-stratified split of the entire data-set into 90% training and 10% validation.
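The stratified split looks roughly like this; the toy arrays stand in for the real images and labels, and the `random_state` is an arbitrary choice:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# toy data: 100 samples, 2 features, 2 classes; stratify preserves class proportions
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.10,   # 90% training, 10% validation
    stratify=y,       # keep the class ratio identical in both splits
    random_state=42,
)
```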
We trained the model with a batch size of 32 for a total of 100 epochs.
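The training call itself uses the stated batch size and epoch count. A self-contained sketch with a tiny stand-in model and random data (the real call would pass our DenseNet, the stratified splits, and the callbacks):

```python
import numpy as np
import tensorflow as tf

# toy stand-in data and a single-output toy model, just to show the fit call
X_train = np.random.rand(64, 8).astype("float32")
y_train = np.random.randint(0, 3, size=64)
X_val = np.random.rand(16, 8).astype("float32")
y_val = np.random.randint(0, 3, size=16)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    batch_size=32,    # as stated above
    epochs=100,       # as stated above
    verbose=0,
)
```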