Why are Convolutional Neural Networks better than feed-forward networks? An introduction to AlexNet & ZFNet

Source: Deep Learning on Medium


Deep Learning is such an interesting topic in today’s world. It has the power to predict things with its somewhat complex but fascinating algorithms.

While I am still studying this topic, I was really fascinated by Convolutional Neural Networks. After my dive into feed-forward neural networks, I had this question in my mind: why move to CNNs at all, if feed-forward networks are so interesting and so capable of performing classification? I wasn’t really planning to study CNNs, but I still took my pen and paper and started researching them on the internet. All I wanted to know was why they are so popular and how they are better than FFNs. So I started reading articles online. The more I read, the more I wanted to know, and after a few weeks I can tell you why they are such an interesting topic.

CNN on the left & FFN on the right

If we have an image as an input, every pixel carries three values (its RGB values), and a fully connected network needs a separate weight for each of them. We can apply a feed-forward network to images, but for a standard input of size 227×227 the number of input values becomes 227×227×3 ≈ 155,000 — roughly 10⁵. Every single neuron in the first layer would need that many weights, so even a modest layer adds up to millions of parameters, which makes feed-forward networks impractical for handling images. In CNNs, a kernel is built (a kernel is basically a small matrix of weights) and those weights are shared as the kernel slides horizontally and vertically across the image. A max-pooling operation then shrinks the spatial size of the feature maps (a pool with stride 2 halves each spatial dimension), and the choice of padding and stride further controls the size of the output, which keeps the parameter count down.
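To put numbers on this comparison, here is a minimal sketch. The 1000-neuron dense layer is a hypothetical size chosen for illustration; the convolution layer uses AlexNet-style first-layer dimensions (96 filters of 11×11 over 3 channels):

```python
def dense_layer_params(n_inputs, n_neurons):
    """Fully connected layer: one weight per input per neuron, plus a bias."""
    return n_neurons * (n_inputs + 1)

def conv_layer_params(kernel_h, kernel_w, in_channels, n_filters):
    """Convolution layer: shared weights plus one bias per filter."""
    return n_filters * (kernel_h * kernel_w * in_channels + 1)

inputs = 227 * 227 * 3                       # 154,587 values in one RGB image
dense = dense_layer_params(inputs, 1000)     # hypothetical 1000-neuron layer
conv = conv_layer_params(11, 11, 3, 96)      # AlexNet-style first conv layer

print(dense)  # 154,588,000 -> over 150 million parameters
print(conv)   # 34,944      -> under 35 thousand parameters
```

Weight sharing is what makes the difference: the dense layer pays for every pixel separately, while the conv layer reuses the same small kernel everywhere in the image.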

CNNs look like really complex webs of 3-D connections, but they are an easy topic, especially because we use frameworks like PyTorch :). Also, as we proceed through the convolution layers, the kernel size decreases and the feature maps keep shrinking, unlike in feed-forward networks.
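To make the effect of kernel size, stride, and padding concrete, here is a small sketch using the standard output-size formula. The layer sizes are those of AlexNet’s first stage (an 11×11 kernel with stride 4 and no padding, followed by a 3×3 max pool with stride 2):

```python
def conv_output_size(input_size, kernel, stride=1, padding=0):
    """Standard formula: floor((W - K + 2P) / S) + 1."""
    return (input_size - kernel + 2 * padding) // stride + 1

# AlexNet's first stage: 227x227 input, 11x11 kernel, stride 4, no padding
after_conv = conv_output_size(227, kernel=11, stride=4)  # 55x55 feature maps
# followed by a 3x3 max pool with stride 2
after_pool = conv_output_size(55, kernel=3, stride=2)    # 27x27 feature maps

print(after_conv, after_pool)  # 55 27
```

A single conv-plus-pool stage already shrinks each spatial dimension from 227 to 27, which is why the later layers have far less work to do.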

Let’s discuss two old but interesting pre-built CNNs:


AlexNet Structure

The structure gives rise to a total of 27.55 million parameters, of which about 24 million come from the last three fully connected layers alone, and the rest from the convolutional layers before them. The kernel size decreases, or stays the same, as we go deeper. There are 96 kernels (filters) in the first convolution layer, and each of them slides horizontally and vertically across the input. The 11×11 kernel size means we capture a large area of pixels in the image at once. As for the hyperparameters, there is no exact method to determine why these particular values are used. Deep learning is a trial-and-error process: we check the accuracy and keep the settings where accuracy is high.
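As a quick sanity check on where the first convolution layer’s parameters come from (sizes as described above: 96 filters of 11×11 applied to a 3-channel input):

```python
filters = 96
kernel_h, kernel_w, in_channels = 11, 11, 3

# Each filter shares one small set of weights across the whole image.
weights_per_filter = kernel_h * kernel_w * in_channels  # 363 shared weights
total_weights = filters * weights_per_filter            # 34,848 weights
total_biases = filters                                  # one bias per filter

print(weights_per_filter, total_weights + total_biases)  # 363 34944
```

Under 35,000 parameters for the layer that touches the most pixels — the fully connected layers at the end are what actually dominate the parameter budget.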

The top-5 error rate of this network is 16.4%.

AlexNet was trained for six days across two Nvidia GeForce GTX 580 GPUs, which is why the network is split into two pipelines. Isn’t that awesome? 😀

More about AlexNet: Just Deep Learning things!