Source: Deep Learning on Medium

## Feed Forward

To go forward through this neural network, we compute the values of each layer in turn.

**Input:**

*x = z¹ = a¹*

**H1 Layer:**

*z² = w¹x + b¹ = w¹a¹ + b¹*

*a² = f(z²) = f(w¹a¹ + b¹)*

**H2 Layer:**

*z³ = w²a² + b²*

*a³ = f(z³) = f(w²a² + b²)*

**Output:**

*z⁴ = w³a³*

*o = f(z⁴) = f(w³a³) = f(w³f(w²f(w¹a¹ + b¹) + b²))*

This is known as feed-forward propagation. We repeat it again and again, altering the weights and biases, to bring the output closer to the desired value.
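The equations above can be sketched in NumPy. The layer sizes, the random initialization, and the choice of sigmoid for *f* are illustrative assumptions, not prescribed by the article:

```python
import numpy as np

def f(z):
    """Sigmoid activation, one common choice for f."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

a1 = rng.standard_normal(3)                               # input: x = z1 = a1
w1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
w2, b2 = rng.standard_normal((4, 4)), rng.standard_normal(4)
w3 = rng.standard_normal((2, 4))                          # output layer (no bias, as above)

z2 = w1 @ a1 + b1; a2 = f(z2)                             # H1 layer
z3 = w2 @ a2 + b2; a3 = f(z3)                             # H2 layer
z4 = w3 @ a3;      o  = f(z4)                             # output o = f(w3 a3)
```

Each line mirrors one equation: a matrix-vector product, a bias addition, then the activation *f* applied elementwise.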

## Gradient Descent and Backpropagation

But the question is how to alter the weights. That is exactly the recipe this algorithm provides: we use gradient descent, propagate backward (hence the name backpropagation), and alter the weights as desired.

Recall that our desired output is *y*, and the current output is *o*. To evaluate the difference in our prediction, we introduce a **cost function**, also called a **loss function**. It could be as simple as **MSE** (Mean Squared Error) or slightly more complex, such as the **cross-entropy** function. Here, let's call it *C*, so our cost function is *C(o, y)*.
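As a concrete sketch, MSE takes only a couple of lines; the sample vectors below are made up for illustration:

```python
import numpy as np

def mse(o, y):
    """Mean squared error: the average of the squared prediction errors."""
    return np.mean((o - y) ** 2)

o = np.array([0.8, 0.1])   # current network output (illustrative)
y = np.array([1.0, 0.0])   # desired output (illustrative)
cost = mse(o, y)           # ((-0.2)^2 + (0.1)^2) / 2 = 0.025
```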

Now that we have quantified the difference in the form of a function, we can bring calculus into play. We need a procedure that decreases the gap between our predicted value and the actual output. This is where the gradient descent algorithm comes in: it minimizes a function by repeatedly moving in the direction of steepest descent, which is the negative of the gradient.
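A minimal sketch of the idea on a one-variable function, *C(w) = (w − 3)²*, whose gradient is *2(w − 3)*. The example function, starting point, and learning rate are all assumptions made for illustration:

```python
def grad(w):
    """Gradient of C(w) = (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0        # initial guess for the parameter
lr = 0.1       # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)   # step in the direction of the negative gradient
# w converges toward the minimizer w = 3
```

In a real network, *w* is the whole collection of weights and biases, and backpropagation is what supplies the gradient at each step.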

Let’s dig deeper into gradient descent. The gradient of a function tells us how much the function changes with respect to a change in a particular quantity; it is computed by taking the partial derivative of the function with respect to that quantity. In this case, we want to see how much our cost function changes with respect to a change in our weights and biases, which gives us an idea of how to alter them to reach the minimum possible error in our prediction.
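In symbols, each weight and bias is nudged against its own partial derivative of the cost; here *η* denotes the learning rate, a symbol introduced for this sketch rather than defined earlier in the text:

*w ← w − η ∂C/∂w*

*b ← b − η ∂C/∂b*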

Going back to our example: