Source: Deep Learning on Medium
Why do we need SqueezeNet?
Nowadays, technology is advancing rapidly: self-driving cars and IoT are going to be household topics in the next few years. Many of these systems are controlled remotely; in a self-driving car, for example, the system needs to communicate constantly with servers. If we have a model that is small in size, we can easily deploy it in the cloud and ship it to devices. That is why we need an architecture that is small yet achieves the same level of accuracy that larger architectures achieve.
Here are some advantages of SqueezeNet —
- More efficient distributed training.
- Less overhead when exporting new models to clients.
SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques, we are able to compress SqueezeNet to less than 0.5MB (510× smaller than AlexNet).
- Replace 3×3 filters with 1×1 filters — we use as many 1×1 filters as possible, since replacing a 3×3 filter with a 1×1 filter reduces the number of parameters by 9×. One may think that swapping 3×3 filters for 1×1 filters would hurt performance, since a 1×1 filter has less spatial information to work on, but this is not the case. A 3×3 filter captures the spatial information of pixels close to each other, while a 1×1 filter zeroes in on a single pixel and captures relationships amongst its channels.
- Decrease the number of input channels to 3×3 filters — to keep the total number of parameters in a CNN small, it is important not only to decrease the number of 3×3 filters, but also to decrease the number of input channels feeding into them. SqueezeNet does this using squeeze layers. The paper introduces the "fire module", which consists of a squeeze layer followed by an expand layer: the squeeze layer uses only 1×1 filters, while the expand layer uses a combination of 1×1 and 3×3 filters. By squeezing first, the authors limit the number of inputs to the 3×3 filters, and hence the number of parameters in the layer.
- Downsample late in the network so that convolution layers have large activation maps — having reduced the sheer number of parameters, how does the model get the most out of those that remain? The authors downsample the feature maps late in the network, and this actually increases accuracy. This is a marked contrast to networks like VGG, where a large feature map shrinks steadily as the network approaches its end. The authors cite a paper by K. He and H. Sun showing that a similar delayed downsampling leads to higher classification accuracy.
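To make the first two strategies concrete, here is a back-of-the-envelope parameter count (weights only, biases ignored) for a layer mapping 96 input channels to 128 output channels. The channel numbers are illustrative choices on my part, not values quoted from the paper.

```python
def conv_params(in_ch, out_ch, k):
    """Weight count of a k x k convolution (biases ignored)."""
    return in_ch * out_ch * k * k

plain_3x3 = conv_params(96, 128, 3)   # 110,592 weights
plain_1x1 = conv_params(96, 128, 1)   # 12,288 weights: exactly 9x fewer

# Fire-module style: squeeze 96 -> 16 channels with 1x1 filters, then
# expand with 64 1x1 filters and 64 3x3 filters (128 outputs in total).
fire_style = (conv_params(96, 16, 1)      # squeeze layer
              + conv_params(16, 64, 1)    # expand, 1x1 part
              + conv_params(16, 64, 3))   # expand, 3x3 part

print(plain_3x3, plain_1x1, fire_style)  # 110592 12288 11776
```

Even though the fire-style stack still uses 3×3 filters, squeezing the input down to 16 channels first brings it close to the cost of a pure 1×1 layer.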
The SqueezeNet architecture is built from these fire modules, which enable it to bring down the number of parameters.
Another thing that surprised me is the lack of fully connected (dense) layers at the end, which one would see in a typical CNN architecture. The dense layers at the end learn the relationships between the high-level features and the classes the network is trying to identify: they are designed to learn that noses and ears make up a face, and that wheels and lights indicate a car. In this architecture, however, that extra learning step seems to be embedded within the transformations between the various fire modules, and the final classification is produced by a 1×1 convolution followed by global average pooling.
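As a sketch of how a network can classify without dense layers, here is a minimal PyTorch version of such a classifier head. The 512 input channels and 13×13 spatial size are illustrative of a late-stage feature map, not exact values from the text.

```python
import torch
import torch.nn as nn

num_classes = 1000  # ImageNet

# A final 1x1 convolution emits one feature map per class; global
# average pooling then collapses each map to a single class score,
# so no fully connected layer is needed.
head = nn.Sequential(
    nn.Conv2d(512, num_classes, kernel_size=1),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1),  # global average pooling
    nn.Flatten(),             # (N, num_classes, 1, 1) -> (N, num_classes)
)

logits = head(torch.randn(2, 512, 13, 13))
print(logits.shape)  # torch.Size([2, 1000])
```

A nice side effect of this design is that the classifier has no fixed input resolution: the pooling adapts to whatever spatial size the feature map has.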
SqueezeNet is able to achieve accuracy nearly equal to AlexNet with 50× fewer parameters. The most impressive part is that applying Deep Compression to the already small model reduces it to a size 510× smaller than AlexNet.
I will now share the SqueezeNet model, which consists of 8 fire modules sandwiched between 2 convolutional layers, one at the start and one at the end.
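Here is a minimal PyTorch sketch of that layout, following the fire-module description above. The channel sizes follow the v1.0 configuration reported in the paper, but this is an illustrative reimplementation on my part, not the authors' code.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Fire module: squeeze with 1x1 convs, then expand with
    parallel 1x1 and 3x3 convs whose outputs are concatenated."""
    def __init__(self, in_ch, s1x1, e1x1, e3x3):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, s1x1, kernel_size=1)
        self.expand1x1 = nn.Conv2d(s1x1, e1x1, kernel_size=1)
        self.expand3x3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

def squeezenet(num_classes=1000):
    # conv1, 8 fire modules, conv10 -- with max-pooling pushed late
    return nn.Sequential(
        nn.Conv2d(3, 96, kernel_size=7, stride=2), nn.ReLU(inplace=True),
        nn.MaxPool2d(3, stride=2),
        Fire(96, 16, 64, 64),      # fire2
        Fire(128, 16, 64, 64),     # fire3
        Fire(128, 32, 128, 128),   # fire4
        nn.MaxPool2d(3, stride=2),
        Fire(256, 32, 128, 128),   # fire5
        Fire(256, 48, 192, 192),   # fire6
        Fire(384, 48, 192, 192),   # fire7
        Fire(384, 64, 256, 256),   # fire8
        nn.MaxPool2d(3, stride=2),
        Fire(512, 64, 256, 256),   # fire9
        nn.Conv2d(512, num_classes, kernel_size=1), nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

model = squeezenet()
out = model(torch.randn(1, 3, 224, 224))
n_params = sum(p.numel() for p in model.parameters())
print(out.shape, n_params)  # roughly 1.25M parameters
```

Counting the parameters of this sketch lands around 1.25 million, in the same ballpark as the roughly 50× reduction versus AlexNet's ~60 million that the paper reports.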
So this is the SqueezeNet model. Feel free to leave your comments below if you need any help. Thanks for reading my post.
You can go through the official SqueezeNet paper — Click here