Original article was published on Artificial Intelligence on Medium
The Workflow of Neural Network Explained
This article explains the workflow and formulas of neural networks using the backpropagation method.
Neural networks were developed to mimic the human brain. Though we are not there yet, neural networks are very effective in machine learning. They were very popular in the 1980s and 1990s and have recently become popular again, because computers are now fast enough to train large neural networks in a reasonable time. In this article, I will discuss how a neural network works.
Ideas of Neural Network
In a simple neural network, neurons are the basic computation units. They take the input features and pass them on as output. Here is what a basic neural network looks like:
Here, ‘layer1’ holds the input features. Layer1 feeds into layer2, which in turn produces layer3, the output layer that holds the predicted class, or hypothesis. Layer2 is the hidden layer; you can use more than one hidden layer. Just like logistic regression, the neural network uses the sigmoid activation function to produce the hypothesis. The terms x0 in layer1 and a0 in layer2 are the bias units.
The process of moving from layer1 to layer3 is called forward propagation. The steps of forward propagation are:
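To make the sigmoid activation and the bias units concrete, here is a minimal sketch using only the Python standard library. The names `sigmoid` and `add_bias` are illustrative helpers, not part of any library. The sigmoid is written in a numerically stable form, branching on the sign of z so that `math.exp` never overflows for large negative inputs.

```python
import math

def sigmoid(z):
    """Sigmoid activation g(z) = 1 / (1 + e^(-z))."""
    # Branch on the sign so math.exp never receives a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def add_bias(features):
    """Prepend the bias unit (x0 or a0), which is always 1."""
    return [1.0] + list(features)

x = add_bias([0.5, -1.2, 3.0])  # three features become four values with x0
print(x)             # [1.0, 0.5, -1.2, 3.0]
print(sigmoid(0.0))  # 0.5
```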
1. Initialize the coefficients theta for each input feature and for the bias term. Suppose there are 10 input features. Adding a bias term of 1 makes 11. Say we have 100 training examples, that is, 100 rows of data. In that case, the input matrix is 100 by 11. Now determine the size of theta1: the number of rows must equal the number of columns of the input matrix (11 in this example), and the number of columns should be the number of units in the hidden layer.
2. Multiply the input matrix by theta1 and pass the result through the sigmoid activation function.
Here, a2 = g(X · theta1) represents the hidden layer, or layer2, where X is the input matrix with its bias column. The function g(z) is the sigmoid, g(z) = 1 / (1 + e^(-z)).
Elaborating on how layer2 is computed: each hidden unit j is the sigmoid of a weighted sum of all the inputs, including the bias unit x0 = 1, that is, a_j = g(theta1_0j·x0 + theta1_1j·x1 + ... + theta1_10j·x10).
Together, these activations make up layer2.
3. After calculating the hidden layer (layer2), add a bias unit to layer2 as well.
4. Initialize theta2 for the hidden layer. Its size is the number of units in layer2, including the bias unit, by the number of units in the next layer. In this example, the next layer is the output layer, as we do not have any more hidden layers.
5. Then follow the same process as before: multiply the hidden layer by theta2 and pass the result through the sigmoid activation function to get the hypothesis.
This is the process of forward-propagation.
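The forward-propagation steps above can be sketched in plain Python, with lists of lists standing in for matrices. The sketch reuses the sizes from the text (100 examples, 10 features plus a bias column). The 5 hidden units, the single output unit, the random stand-in data, and the small random initialization range are assumptions for illustration, and all helper names are hypothetical. Each theta is laid out as (units in the previous layer + 1) by (units in the next layer), so each layer is its input matrix times theta.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function g(z) = 1 / (1 + e^(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matmul(a, b):
    # Product of an (n x k) and a (k x p) matrix stored as lists of lists
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]

def add_bias_column(m):
    # Prepend the bias unit (a column of ones) to every row
    return [[1.0] + row for row in m]

def random_theta(rows, cols, eps=0.12):
    # Small random starting weights, so the hidden units do not all start out identical
    return [[random.uniform(-eps, eps) for _ in range(cols)] for _ in range(rows)]

def forward(x, theta1, theta2):
    a1 = add_bias_column(x)                                     # step 1: m x (n + 1)
    a2 = [[sigmoid(z) for z in r] for r in matmul(a1, theta1)]  # step 2: m x hidden
    a2 = add_bias_column(a2)                                    # step 3: m x (hidden + 1)
    a3 = [[sigmoid(z) for z in r] for r in matmul(a2, theta2)]  # step 5: m x outputs
    return a1, a2, a3

random.seed(0)
m, n = 100, 10            # 100 training examples, 10 input features (as in the text)
hidden, outputs = 5, 1    # assumed sizes for illustration
x = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]  # stand-in data
theta1 = random_theta(n + 1, hidden)        # 11 x 5: rows = columns of the input matrix
theta2 = random_theta(hidden + 1, outputs)  # step 4, 6 x 1: rows = hidden units + bias

a1, a2, a3 = forward(x, theta1, theta2)
print(len(a1), len(a1[0]))  # 100 11
print(len(a2), len(a2[0]))  # 100 6
print(len(a3), len(a3[0]))  # 100 1
```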
Backpropagation is the process of moving backward from the output layer to layer2. In this process, we calculate the error terms (deltas).
1. First, subtract the hypothesis from the original output y. That will be our delta3.
2. Next, propagate the error back to the hidden layer to get delta2. Multiply delta3 by the transpose of theta2, leaving out the row of bias weights, then multiply that element-wise by a^(2) times (1 - a^(2)), which is the derivative of the sigmoid: delta2 = (delta3 · theta2 transposed) .* a^(2) .* (1 - a^(2)). Here the superscript (2) on a denotes layer2; please do not misread it as a square.
3. Divide the deltas by the number of training examples m to get the unregularized version of the gradient.
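Here is a minimal sketch of the three backpropagation steps in plain Python. It assumes the same matrix layout as in forward propagation (theta1 is (features + 1) by hidden units, theta2 is (hidden units + 1) by outputs) and the sign convention delta3 = y - hypothesis. In the last step, each delta is also multiplied by the transposed activations that feed into its layer, which gives one gradient entry per weight, with the same shape as the corresponding theta. Because delta3 is y minus the hypothesis, these are the negatives of the cross-entropy cost gradients. The toy numbers and helper names are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function g(z) = 1 / (1 + e^(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matmul(a, b):
    # Product of an (n x k) and a (k x p) matrix stored as lists of lists
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add_bias_column(m):
    return [[1.0] + row for row in m]

def forward(x, theta1, theta2):
    a1 = add_bias_column(x)
    a2 = add_bias_column([[sigmoid(z) for z in r] for r in matmul(a1, theta1)])
    a3 = [[sigmoid(z) for z in r] for r in matmul(a2, theta2)]
    return a1, a2, a3

def backprop(x, y, theta1, theta2):
    m = len(x)
    a1, a2, a3 = forward(x, theta1, theta2)
    # Step 1: delta3 = y - hypothesis
    delta3 = [[yv - hv for yv, hv in zip(yr, hr)] for yr, hr in zip(y, a3)]
    # Step 2: send delta3 back through theta2 without its bias row,
    # then multiply element-wise by the sigmoid derivative a2 * (1 - a2)
    back = matmul(delta3, transpose(theta2[1:]))
    delta2 = [[b * a * (1 - a) for b, a in zip(br, ar[1:])] for br, ar in zip(back, a2)]
    # Step 3: divide by m; pairing each delta with the activations that feed it
    # gives one gradient entry per weight (negated, since delta3 = y - h)
    grad1 = [[v / m for v in r] for r in matmul(transpose(a1), delta2)]
    grad2 = [[v / m for v in r] for r in matmul(transpose(a2), delta3)]
    return grad1, grad2

# Hypothetical toy values: 2 examples, 2 features, 2 hidden units, 1 output
x = [[0.5, -1.0], [1.5, 2.0]]
y = [[1.0], [0.0]]
theta1 = [[0.1, -0.2], [0.3, 0.05], [-0.1, 0.2]]  # (2 + 1) x 2
theta2 = [[0.05], [0.4], [-0.3]]                  # (2 + 1) x 1

grad1, grad2 = backprop(x, y, theta1, theta2)
print(len(grad1), len(grad1[0]))  # 3 2, same shape as theta1
print(len(grad2), len(grad2[0]))  # 3 1, same shape as theta2
```

A finite-difference check (nudging one weight up and down and comparing the change in cost) is a cheap way to confirm that the deltas are wired correctly before training.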
Train The Network
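Revise the thetas. Multiply the transposed input matrix by delta2, scale the result by a learning rate, and add it to theta1; update theta2 the same way, using layer2 (with its bias unit) and delta3. Because delta3 was defined as y minus the hypothesis, adding this term moves theta downhill on the cost. Pay attention to the dimensions: each update must have the same shape as the theta it revises.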
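A minimal sketch of the update step, assuming the deltas follow the y - hypothesis convention from the backpropagation section, so the step is added to theta rather than subtracted. The input matrix, delta2, and starting theta1 are hypothetical values, chosen only to show that the update keeps theta1's shape of (features + 1) by hidden units; `update_theta` and `alpha` (the learning rate) are illustrative names.

```python
def matmul(a, b):
    # Product of an (n x k) and a (k x p) matrix stored as lists of lists
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def update_theta(theta, activations, delta, alpha):
    """Return theta + alpha * (activations^T · delta) / m.

    delta follows the y - hypothesis convention, so adding the step
    moves theta downhill on the cost.
    """
    m = len(activations)
    step = matmul(transpose(activations), delta)  # same shape as theta
    return [[w + alpha * s / m for w, s in zip(w_row, s_row)]
            for w_row, s_row in zip(theta, step)]

# Hypothetical values: 2 examples, 2 input features + bias, 2 hidden units
a1 = [[1.0, 0.5, -1.0],
      [1.0, 1.5, 2.0]]                  # input matrix with bias column, 2 x 3
delta2 = [[0.1, -0.2],
          [0.0, 0.3]]                   # hidden-layer error terms, 2 x 2
theta1 = [[0.0, 0.0] for _ in range(3)]  # 3 x 2: (features + 1) x hidden units

theta1 = update_theta(theta1, a1, delta2, alpha=1.0)
print(theta1)
```

The same function updates theta2 when given layer2 (with its bias column) and delta3.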
Repeat forward propagation and backpropagation with the revised parameters until the cost stops decreasing, which means it has reached its minimum. Here is the formula for the cost function:
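Written out in the standard form of the regularized cross-entropy cost for a network with one hidden layer, it reads as follows. The notation is assumed here: m training examples, K output units, s_l units in layer l, h_Θ(x^(i))_k the k-th output for example i, and row 0 of each Θ^(l) (the bias weights, in the layout used above) excluded from the regularization sum.

```latex
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}
  \Big[ y_k^{(i)} \log\big(h_\Theta(x^{(i)})_k\big)
      + \big(1 - y_k^{(i)}\big) \log\big(1 - h_\Theta(x^{(i)})_k\big) \Big]
  + \frac{\lambda}{2m} \sum_{l=1}^{2} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}
    \big(\Theta^{(l)}_{ij}\big)^2
```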
The first part of this equation is the cost function of logistic regression, summed over the output units; the only addition is a regularization term at the end. You can start with lambda = 1.
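Putting the pieces together, here is a small end-to-end sketch in plain Python. It computes the regularized cost with lambda = 1 and repeats forward propagation, backpropagation, and the theta update until the cost stops improving. Two parts go beyond the text and are assumptions. First, the update also subtracts the regularization gradient (lambda / m times each non-bias weight), so the gradient matches the regularized cost. Second, "optimum" is approximated by stopping once the cost improves by less than 1e-7 per pass. The toy data set, the 3 hidden units, the learning rate of 1.0, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function g(z) = 1 / (1 + e^(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matmul(a, b):
    # Product of an (n x k) and a (k x p) matrix stored as lists of lists
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add_bias_column(m):
    return [[1.0] + row for row in m]

def forward(x, theta1, theta2):
    a1 = add_bias_column(x)
    a2 = add_bias_column([[sigmoid(z) for z in r] for r in matmul(a1, theta1)])
    a3 = [[sigmoid(z) for z in r] for r in matmul(a2, theta2)]
    return a1, a2, a3

def cost(h, y, theta1, theta2, lam=1.0):
    m = len(y)
    # Logistic-regression-style cross-entropy, summed over outputs, averaged over m
    ce = -sum(yv * math.log(hv) + (1 - yv) * math.log(1 - hv)
              for yr, hr in zip(y, h) for yv, hv in zip(yr, hr)) / m
    # Regularization term: squared weights, skipping row 0 (bias weights) of each theta
    reg = sum(w * w for theta in (theta1, theta2) for row in theta[1:] for w in row)
    return ce + lam / (2 * m) * reg

def train_step(x, y, theta1, theta2, alpha, lam=1.0):
    m = len(x)
    a1, a2, a3 = forward(x, theta1, theta2)
    delta3 = [[yv - hv for yv, hv in zip(yr, hr)] for yr, hr in zip(y, a3)]  # y - h
    back = matmul(delta3, transpose(theta2[1:]))                              # drop bias row
    delta2 = [[b * a * (1 - a) for b, a in zip(br, ar[1:])] for br, ar in zip(back, a2)]

    def revise(theta, act, delta):
        step = matmul(transpose(act), delta)  # downhill direction, same shape as theta
        # theta + alpha * (step - lam * theta) / m, leaving the bias row (i == 0) unregularized
        return [[w + alpha * (s - (lam * w if i > 0 else 0.0)) / m
                 for w, s in zip(w_row, s_row)]
                for i, (w_row, s_row) in enumerate(zip(theta, step))]

    return revise(theta1, a1, delta2), revise(theta2, a2, delta3)

random.seed(1)
# Hypothetical toy data: label is 1 when x1 + x2 > 0
x = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(20)]
y = [[1.0 if a + b > 0 else 0.0] for a, b in x]
theta1 = [[random.uniform(-0.12, 0.12) for _ in range(3)] for _ in range(3)]  # (2 + 1) x 3
theta2 = [[random.uniform(-0.12, 0.12)] for _ in range(4)]                    # (3 + 1) x 1

history = []
for _ in range(2000):
    history.append(cost(forward(x, theta1, theta2)[2], y, theta1, theta2))
    if len(history) > 1 and history[-2] - history[-1] < 1e-7:
        break  # cost has stopped improving
    theta1, theta2 = train_step(x, y, theta1, theta2, alpha=1.0)

print(f"cost went from {history[0]:.4f} to {history[-1]:.4f} in {len(history)} passes")
```

Printing `history` every few hundred passes is an easy way to see whether the learning rate is too large (the cost jumps around) or too small (it barely moves).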
Thank you so much for reading this article. Training a neural network is a complex process, and I have tried to explain how to code one as simply as I can.