Convolutional Neural Networks — Part 5: Why Convolutions?

The original article was published by Brighton Nkomo on Artificial Intelligence on Medium.


The second way that convolutional networks get away with having relatively few parameters is by having sparse connections. Here’s what that means:

If you look at the zero in the output, it is computed by a 3 by 3 convolution, so it depends only on a 3 by 3 grid of inputs. It is as if the green-circled zero output unit on the right is connected to only 9 of the 36 input features, the ones shaded in green. In particular, the other 27 pixel values, those not shaded in green, have no effect on the green-circled zero.
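To make sparse connectivity concrete, here is a minimal pure-Python sketch. The 6 by 6 input values (a bright left half, a dark right half) and the vertical-edge filter are assumptions chosen for illustration, not values read off the figure, and `conv2d_valid` and `receptive_field` are hypothetical helper names. Like most deep-learning libraries, it computes cross-correlation (no kernel flip) with stride 1 and no padding.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution as used in deep learning (cross-correlation,
    no kernel flip), stride 1, no padding."""
    rows, cols = len(image), len(image[0])
    f = len(kernel)
    out = []
    for i in range(rows - f + 1):
        out_row = []
        for j in range(cols - f + 1):
            # Each output unit reads only the f x f window starting at (i, j).
            out_row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                               for di in range(f) for dj in range(f)))
        out.append(out_row)
    return out


def receptive_field(i, j, f=3):
    """Input coordinates that output unit (i, j) is connected to."""
    return {(i + di, j + dj) for di in range(f) for dj in range(f)}


# Assumed 6x6 input: bright (10) left half, dark (0) right half.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
# Assumed 3x3 vertical-edge filter.
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

out = conv2d_valid(image, kernel)
print(out[0])                      # [0, 30, 30, 0]
print(len(receptive_field(0, 0)))  # 9 (out of 36 inputs)

# Changing a pixel outside the top-left 3x3 window leaves out[0][0] unchanged...
perturbed = [row[:] for row in image]
perturbed[5][5] = 999
print(conv2d_valid(perturbed, kernel)[0][0])  # 0
# ...while changing a pixel inside that window does change it.
perturbed[0][0] = 20
print(conv2d_valid(perturbed, kernel)[0][0])  # 10
```

Every entry of `out` is computed the same way from its own 3 by 3 window, so each output unit is connected to just 9 inputs and the remaining 27 cannot influence it.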

As another example, the red-circled 30 in the output depends only on the 9 red-shaded input features. It is as if only those 9 input features are connected to the red-circled 30, and the other pixels do not affect this output at all. Through these two mechanisms (parameter sharing and sparse connections), a neural network has far fewer parameters, which allows it to be trained with smaller training sets and makes it less prone to overfitting.
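The effect of the two mechanisms on the parameter count can be checked with a little arithmetic. The sketch below compares a fully connected layer, a hypothetical "locally connected" layer that has sparse connections but no sharing, and a convolutional layer that has both. The layer sizes are assumptions for illustration, bias terms are included, and the function names are illustrative rather than any library's API.

```python
def dense_params(n_in, n_out):
    """Fully connected: one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


def locally_connected_params(out_h, out_w, f, c_in, n_filters):
    """Sparse connections only: every output position has its own
    f*f*c_in weights and bias (no sharing across positions)."""
    return out_h * out_w * n_filters * (f * f * c_in + 1)


def conv_params(f, c_in, n_filters):
    """Sparse connections plus parameter sharing: one f*f*c_in filter
    and one bias per output channel, reused at every position."""
    return n_filters * (f * f * c_in + 1)


# Assumed small case: 6x6x1 input -> 4x4x1 output with one 3x3 filter.
print(dense_params(6 * 6, 4 * 4))               # 592
print(locally_connected_params(4, 4, 3, 1, 1))  # 160
print(conv_params(3, 1, 1))                     # 10

# Assumed larger case: 32x32x3 input -> 28x28x6 output with six 5x5 filters.
print(dense_params(32 * 32 * 3, 28 * 28 * 6))   # 14455392
print(conv_params(5, 3, 6))                     # 456
```

In the small case, sparse connections alone cut the count from 592 to 160; sharing one filter across all 16 positions brings it down to 10, a number that no longer depends on the image size at all.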

Sometimes you also hear that convolutional neural networks are very good at capturing translation invariance: the observation that a picture of a cat shifted a couple of pixels to the right is still, pretty clearly, a cat. The convolutional structure helps the network encode the fact that an image shifted by a few pixels should produce similar features and should probably be assigned the same output label. Because the same filter is applied at every position of the image, in both the early and the later layers, the network can more readily learn to be robust to such shifts, that is, to capture the desirable property of translation invariance. These are a couple of the reasons why convolutional neural networks work so well in computer vision.
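The effect of applying the same filter everywhere can be seen in a small pure-Python sketch. Strictly speaking, a convolution layer shifts its output along with its input (the feature map moves by the same amount), and a position-independent summary of that map, such as its maximum, is what stays the same. The 6 by 8 image, the edge filter, and the helper names `conv2d_valid` and `shift_right` are assumptions for illustration.

```python
def conv2d_valid(image, kernel):
    """'Valid' cross-correlation (the deep-learning 'convolution'), stride 1, no padding."""
    f = len(kernel)
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(f) for dj in range(f))
             for j in range(len(image[0]) - f + 1)]
            for i in range(len(image) - f + 1)]


def shift_right(image, k, fill):
    """Shift every row k pixels to the right, padding the left edge with `fill`."""
    return [[fill] * k + row[:-k] for row in image]


# Assumed 6x8 image containing a vertical edge, and a 3x3 vertical-edge filter.
image = [[10, 10, 10, 0, 0, 0, 0, 0] for _ in range(6)]
kernel = [[1, 0, -1]] * 3

original = conv2d_valid(image, kernel)
shifted = conv2d_valid(shift_right(image, 1, fill=10), kernel)

print(original[0])  # [0, 30, 30, 0, 0, 0]
print(shifted[0])   # [0, 0, 30, 30, 0, 0]

# The same filter detects the edge wherever it is: the feature map simply moves.
print(all(shifted[r][j + 1] == original[r][j]
          for r in range(len(original))
          for j in range(len(original[0]) - 1)))  # True

# A summary that ignores position, like the maximum response, is unchanged.
print(max(map(max, original)) == max(map(max, shifted)))  # True
```

Because no weight is tied to a particular location, nothing has to be relearned when the cat, or the edge, moves a pixel to the right.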