Introduction to Convolutional Neural Networks

Original article was published on Deep Learning on Medium


Convolutional Neural Networks (CNNs) are a class of deep learning models used for image recognition, image classification, object detection, and similar tasks. A CNN takes an image as input, processes it, and classifies it under certain categories.

Here every image is passed through a series of convolution filters, pooling, flattening, and fully connected layers. A softmax activation function is then applied to classify the object with probabilistic values between 0 and 1.

The main reason for applying the softmax function is best seen with an example. Suppose we are classifying cats and dogs, and the network says the image is a dog with a value of 0.80 and a cat with a value of 0.35. These values sum to 1.15 rather than 1, so they don’t make sense as probabilities. To overcome this we use the softmax function: it takes the values from the last layer and scales them to lie between 0 and 1 while making sure they sum to 1.
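To make the scaling concrete, here is a minimal sketch of softmax in plain Python. The function name `softmax` and the raw scores are illustrative assumptions, not values from a trained network.

```python
import math

def softmax(scores):
    """Scale raw scores to values between 0 and 1 that sum to 1."""
    # Subtracting the maximum leaves the result unchanged
    # but keeps exp() from overflowing on large scores.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from the last layer for the classes [dog, cat].
probs = softmax([2.0, 1.0])
print(probs)       # the larger score gets the larger probability
print(sum(probs))  # sums to 1, up to floating-point rounding
```

Whatever the raw scores are, the outputs can now be read as probabilities over the classes.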

How a CNN works

1. Convolution Layer

It is the first layer to extract features from the image. It keeps the relation between pixels by learning image features using small squares of input data. It extracts features such as edges and corners from the input image.

It performs a dot product (element-wise multiplication followed by a sum) between two matrices: one is the kernel, also called the filter or feature detector, and the other is the portion of the image the kernel currently covers. The kernel can be of any size, but in the visualization shown below the kernel is a 3 by 3 matrix. It moves from left to right, and then row by row, until it reaches the end of the image. For an image and a kernel that contain only 0s and 1s, the dot product is simply the number of positions where both have a 1. If one value matches we place 1 at the respective position of the resultant matrix (the feature map), and if two values match we place 2. The representation is shown below.
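The sliding dot product can also be sketched in a few lines of plain Python. This is only an illustration with stride 1 and no padding; the function name `convolve2d` and the 5 by 5 binary image are made up for the example. As in most deep learning libraries, the kernel is not flipped.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)
    and take the dot product at every position."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Hypothetical binary image containing an X-shaped pattern.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
# A 3 by 3 kernel that looks for the same X shape.
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
# [2, 0, 2]
# [0, 5, 0]
# [2, 0, 2]
```

The centre value is 5 because all five 1s of the kernel line up with 1s in the image there, so the feature map is strongest exactly where the pattern sits.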


Stride: It is the number of pixels by which we slide our filter matrix over the input matrix. When the stride is 1, we move the filter one pixel at a time.

Padding: It is the addition of zeros around the border of the input so that the filter fits the input for the chosen stride.
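The effect of stride and padding on the output size can be worked out with a small helper. The sketch below uses the standard formula floor((input + 2 * padding - kernel) / stride) + 1; the names `zero_pad` and `output_size` are illustrative, not from any library.

```python
def zero_pad(image, pad):
    """Surround the image with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded

def output_size(input_size, kernel_size, stride=1, pad=0):
    """Number of positions the filter can take along one dimension."""
    return (input_size + 2 * pad - kernel_size) // stride + 1

print(zero_pad([[1, 2], [3, 4]], 1))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]

print(output_size(5, 3, stride=1))         # 3
print(output_size(5, 3, stride=2))         # 2
print(output_size(5, 3, stride=1, pad=1))  # 5, same size as the input
print(output_size(6, 3, stride=2))         # 2, the last column is never covered
print(output_size(6, 3, stride=2, pad=1))  # 3, padding lets the filter cover every column
```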

ReLU Layer

ReLU activation function

We apply a rectifier function (ReLU, which replaces every negative value with 0) to the feature map, i.e. the convolved feature, to add non-linearity to the network. The reason is that images are highly non-linear in nature, but convolution itself is a linear operation, so when we create feature maps there is a risk that the result is something linear and the non-linearity is lost. Applying the rectifier function brings the non-linearity back.
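As a quick illustration, the rectifier is just max(0, x) applied to every value of the feature map. The function name `relu` and the sample values below are assumptions for the sketch.

```python
def relu(feature_map):
    """Replace every negative value in the feature map with 0."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hypothetical feature map with some negative responses.
print(relu([[-3, 2], [0, -1]]))  # [[0, 2], [0, 0]]
```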

2. Max Pooling

In this layer, the size of the feature map is reduced while the important information is kept. With a 2 by 2 window and a stride of 2, the reduction is 75%. Pooling can be of different types:

  • Max Pooling
  • Average Pooling
  • Sum Pooling

Max pooling is used most often. Pooling not only reduces the size by 75% but also helps prevent overfitting and cuts down the amount of processing.
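A minimal sketch of max pooling, assuming a 2 by 2 window and a stride of 2 (the setting that gives the 75% reduction). The function name `max_pool` and the sample feature map are illustrative.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep only the largest value in each size-by-size window."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4 by 4 feature map: 16 values in, 4 values out.
feature_map = [
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
]
pooled = max_pool(feature_map)
print(pooled)  # [[6, 8], [3, 4]]
```

Swapping `max(window)` for `sum(window) / len(window)` or `sum(window)` would give average pooling or sum pooling instead.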



3. Flattening

It breaks the spatial structure of the data and transforms your two-dimensional data into one-dimensional data. This is done to feed the output of the CNN to the fully connected network (to classify the features learned by the CNN) or to feed the output to the softmax function to get the probabilities.
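In code, this step amounts to reading the matrix row by row into a single list. The function name `flatten` and the sample values are assumptions for the sketch.

```python
def flatten(feature_map):
    """Turn a two-dimensional feature map into a one-dimensional list, row by row."""
    return [v for row in feature_map for v in row]

# Hypothetical 2 by 2 pooled feature map.
print(flatten([[6, 8], [3, 4]]))  # [6, 8, 3, 4]
```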

4. Fully Connected Layer

It is a simple feedforward neural network, and it makes up the last few layers in the model. The output from the final pooling layer is flattened and then fed into the fully connected layer. The role of this layer is to take the results from the flatten layer and use them to classify the image to its corresponding label.

Till now we have highlighted some important features of the image, reduced its size, and removed some unnecessary features to speed up the process, but we haven’t classified the image yet.

In the above diagram, the feature map matrix is converted into a vector. With the fully connected layers, we combine all these features together to create a model. Finally, an activation function such as softmax or sigmoid classifies the outputs as a cat, dog, car, truck, etc.
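The last step can be sketched as one fully connected layer followed by softmax. Everything specific here is an assumption for illustration: the function names, the labels, the feature vector, and especially the weights and biases, which a real network would learn during training.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: every output is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(w_row, inputs)) + b
        for w_row, b in zip(weights, biases)
    ]

def softmax(scores):
    """Scale raw scores to values between 0 and 1 that sum to 1."""
    shift = max(scores)  # keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["cat", "dog"]
features = [6, 8, 3, 4]  # hypothetical flattened feature vector
weights = [              # hypothetical weights, one row per label
    [0.1, 0.2, 0.0, -0.1],
    [-0.2, 0.1, 0.3, 0.0],
]
biases = [0.0, 0.5]      # hypothetical biases

scores = dense(features, weights, biases)
probs = softmax(scores)
predicted = labels[probs.index(max(probs))]
print(scores, probs, predicted)
```

The label with the highest probability is taken as the prediction. With these made-up numbers that is "cat".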

That’s all for now. I hope you enjoyed the post. In my next blog, we will discuss Recurrent Neural Networks.