Original article was published on Deep Learning on Medium

# Understanding Neural Networks

This article focuses on building an in-depth understanding of neural network architecture. Later, we will try to implement it in a Jupyter notebook.

In my previous article, I briefly discussed deep learning and how to get started with it. If you haven’t read that article, please read it here to get an intuitive idea of deep learning and machine learning.

If you have already read it, then let’s get started!

# The perceptron

Perceptron?! Some of you may wonder what it is; some of you may already know. Either way, a perceptron is the structural building block of a neural network. It is as simple as that. Combining many perceptrons into layers gives you a deep neural network. A perceptron architecture may look like this:

Here, there are 2 layers in total: an input layer and an output layer. But in the machine learning world, developers don’t count the input as a layer, so they will say, “this is a single-layer perceptron model”. So, when someone says, “I have built a 5-layer neural network”, don’t count the input as a layer.

So, what does this perceptron model do? As you can see in the above diagram, we have 2 inputs, a single node with sigma and integration signs on it, and then the output. This node computes two mathematical expressions to give the output. First, it takes the weighted sum of the inputs plus a bias. Then that sum is passed through a **non-linear** activation function, which produces the predicted output. This whole process is called forward propagation in the neural network.
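To make the two steps concrete, here is a minimal sketch of forward propagation for one perceptron in plain Python. The sigmoid activation and the input, weight, and bias values are assumptions chosen for illustration; any non-linear activation would fit the description above.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1). Assumed activation."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron_forward(inputs, weights, bias):
    """Forward propagation for a single perceptron."""
    # Step 1: weighted sum of the inputs plus the bias (often called Z).
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 2: pass Z through a non-linear activation function.
    return sigmoid(z)


# Two inputs, as in the diagram; the numbers are made up for illustration.
y_hat = perceptron_forward(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(y_hat)  # prints 0.5, because Z = 0.5 - 0.5 + 0.0 = 0 and sigmoid(0) = 0.5
```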

Have a look at this image:

The inspiration for forward propagation is taken from logistic regression. If you know the logistic regression algorithm, this may seem familiar to you; if you don’t, that’s fine, it is not required here. The weights (W) and the biases (b) are the parameters that are “trained” by the neural network, and by “trained” I mean they are adjusted to values that make the loss as small as possible.

The output (“y” in the above diagram) is the prediction made by the neural network. Loosely speaking, the loss of the neural network measures how far the predicted value is from the actual value. But it is not as simple as that: we do compare the predicted value with the actual value, but we don’t use the direct difference. Let me explain what I mean.

# The Loss and the Cost function

One thing you should know before moving on: the loss is calculated from the predicted value, and the neural network computes the predicted value from “Z”, which depends on “W” and “b”. Ultimately, we can say that the loss depends on “W” and “b”. So, “W” and “b” should be set to values that give the minimum loss. To be clear, **a neural network always minimizes the loss instead of maximizing the accuracy**.
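The dependence of the loss on “W” and “b” can be made visible with a tiny sketch. It assumes a single input, a sigmoid activation, and binary cross-entropy as the loss; none of these are fixed by the text above, and the training example and candidate parameter values are made up. Scanning a handful of candidates like this is only meant to show the idea; it is not how a network is actually trained.

```python
import math


def predict(x, w, b):
    """Prediction for one input: sigmoid of Z = w * x + b (assumed setup)."""
    z = w * x + b
    return 1.0 / (1.0 + math.exp(-z))


def loss(y_true, y_pred):
    """Binary cross-entropy for one example (an assumed choice of loss)."""
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))


x, y = 2.0, 1.0  # one made-up training example: input and actual value

# Hypothetical (w, b) candidates; each gives a different loss for the same example.
candidates = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]
for w, b in candidates:
    print(f"w={w:+.1f}, b={b:+.1f} -> loss={loss(y, predict(x, w, b)):.4f}")

# The "best" parameters among the candidates are the ones with the smallest loss.
best_w, best_b = min(candidates, key=lambda wb: loss(y, predict(x, wb[0], wb[1])))
print("lowest loss at:", best_w, best_b)
```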

When solving a deep learning problem, the dataset is usually large. For example, let’s say that we have to build an image classifier that classifies images of cats and dogs (you can consider it the “Hello World!” of computer vision 🙂 ). To train the neural network, we need as many images of cats and dogs as we can get. In machine learning, each image of a dog or a cat is considered a **“training example”**. To train a good neural network, we need a good number of training examples. The loss function gives the loss calculated for a single training example. So what we actually optimize when training a neural network is the **cost function**. The cost function can be defined as the average of all the losses calculated separately for each training example.
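The distinction between loss and cost can be sketched in a few lines of Python. The binary cross-entropy loss used here is an assumption (a common choice for a two-class problem such as cats versus dogs), and the labels and predictions are made-up numbers, not results from a real model.

```python
import math


def binary_cross_entropy(y_true, y_pred):
    """Loss for ONE training example (assumed loss; the text does not fix one)."""
    eps = 1e-12  # keeps log() away from exactly 0
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))


def cost(y_true_all, y_pred_all, loss_fn=binary_cross_entropy):
    """Cost: the average of the losses computed separately for each example."""
    losses = [loss_fn(y, p) for y, p in zip(y_true_all, y_pred_all)]
    return sum(losses) / len(losses)


# Four made-up training examples: 1 = dog, 0 = cat (hypothetical encoding).
labels = [1, 0, 1, 0]
predictions = [0.9, 0.2, 0.6, 0.4]
print(cost(labels, predictions))
```

Passing the loss in as `loss_fn` keeps the averaging step separate from the choice of loss, which mirrors the definition above: the cost is the same average whichever per-example loss is used.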

Let us assume there are “m” training examples. Then the cost function is:
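Written out from the definition above (the average of the per-example losses), a general form would be the following. The symbols $J$ for the cost, $L$ for the per-example loss, and $\hat{y}^{(i)}$ and $y^{(i)}$ for the predicted and actual values of the $i$-th training example are notation assumed here for illustration; the specific choice of $L$ is left open.

$$
J(W, b) = \frac{1}{m} \sum_{i=1}^{m} L\left(\hat{y}^{(i)}, y^{(i)}\right)
$$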