Source: Deep Learning on Medium
This post aims to explain the basics of neural networks by drawing an analogy between life lessons and neural networks, without the math behind them (the math will be explained in a separate post). It assumes that the reader has come across terms like neurons, weights and learning rate.
If you have just started to learn neural networks and cannot really fathom what happens underneath, then welcome to the club. To a lot of people, including me, the whole concept and working of neural networks seemed to be a complete black box. Building an intuition for neural networks by relating them to everyday life has helped me immensely in my deep learning career.
TRAINING THE NEURAL NETWORK & FORWARD PROPAGATION
The standard explanation of neural networks is that they work similarly to the human brain. When a baby is born, its brain has not learnt anything yet (loosely speaking). For example, at this stage the brain has not seen dogs or cats, nor can it differentiate between them. A neural network at the start is like a baby’s brain: it has not learnt anything. To teach either the network or the brain, a lot of labelled examples need to be given. The parent tries to teach the child the difference between a dog and a cat by showing various visual examples. This is exactly what is called training the neural network with labelled examples. Within training, the step in which an example is fed into the network and passes through its neurons to produce a guess (say, cat) is termed forward propagation.
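For readers who like to see things in code, here is a minimal sketch of a forward pass through a single neuron. The feature values, the all-zero starting weights and the helper names (`sigmoid`, `forward`) are assumptions made up for illustration; real networks have many neurons arranged in layers.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(features, weights, bias):
    """One forward pass through a single neuron: weighted sum, then activation."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical features for one animal picture, e.g. [ear pointiness, snout length].
example = [0.9, 0.2]
# An untrained neuron: every weight starts at zero, like the newborn's brain.
weights = [0.0, 0.0]
bias = 0.0

p_cat = forward(example, weights, bias)
print(f"Predicted probability of 'cat': {p_cat:.2f}")
```

With all weights at zero the neuron has no opinion yet, so its guess sits exactly halfway between cat and dog.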
Questions like how exactly the brain learns and what exactly the neurons do may arise. The brain, as we all know, is made up of neurons. We can say that a neuron gets turned on if the amount of signal reaching it crosses a particular threshold; otherwise it is said to be off.
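The on/off behaviour described above can be sketched as a simple threshold neuron. The connection strengths and the threshold value here are hypothetical, and most modern networks replace the hard on/off switch with smoother activation functions such as sigmoid or ReLU.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True when the weighted sum of the inputs crosses the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return total > threshold

weights = [0.6, 0.4]   # hypothetical strengths of two incoming connections
threshold = 0.5        # hypothetical firing threshold

strong_signal = [1.0, 0.5]   # weighted sum = 0.6*1.0 + 0.4*0.5 = 0.8
weak_signal = [0.2, 0.1]     # weighted sum = 0.6*0.2 + 0.4*0.1 = 0.16

print(neuron_fires(strong_signal, weights, threshold))  # True
print(neuron_fires(weak_signal, weights, threshold))    # False
```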
Let us say certain neurons get turned on (or fire) when a cat is seen and another set of neurons get turned on when a dog is seen. The rate at which a child grasps information is unique to every child. Some may learn faster, some may take a little more time to learn. When the child is a fast learner, i.e. grasps information faster, the neurons in the brain tend to update their information faster. This measure of the pace of learning is called the learning rate in neural networks. It controls how big a change is made to the information (the weights in a neural network) after each correction. Faster is not always better, though: a learning rate that is too high can make the network overshoot and never settle on good weights.
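To see the effect of the learning rate, here is a small sketch that assumes a toy problem with a single weight whose ideal value is 3. The function name `train` and the three rates are chosen only for illustration.

```python
def train(learning_rate, steps=10, target=3.0):
    """Gradient descent on the toy loss (w - target)**2, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        gradient = 2.0 * (w - target)      # slope of the loss at the current w
        w = w - learning_rate * gradient   # the step is scaled by the learning rate
    return w

slow = train(learning_rate=0.01)
fast = train(learning_rate=0.1)
too_fast = train(learning_rate=1.1)

print(f"slow learner ends at w = {slow:.3f}")
print(f"fast learner ends at w = {fast:.3f}")
print(f"overshooting learner ends at w = {too_fast:.3f}")
```

After the same ten steps, the faster learner ends up closer to 3 than the slow one, while the overly eager one jumps past the target on every step and drifts further away.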
ADJUSTING THE WEIGHTS & BACK PROPAGATION
Now that the network/brain has started learning, the process of learning may not be foolproof, so the network/brain needs to be tested. In the case of a child, the parent now shows a new variety of cat and checks whether the child has learnt to recognize cats properly and to differentiate between a cat and a dog. In neural networks, examples held back purely for this kind of checking are called the test set. There is a chance that the child says dog in place of cat. In this case, the parent corrects the child by telling it that it is a cat. This correction needs to be passed back to all the neurons in the network that contributed to the wrong answer. This is termed backpropagation in neural networks. (Strictly speaking, a network is corrected only on its training examples; the test set is used just to measure how well it has learnt.)
As mentioned earlier, the information in the brain is updated and the child learns that it is a cat. This adjusting of information is what is known as weight adjustment in neural networks.
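A single correction of this kind can be sketched in a few lines, assuming the simplest possible case: one sigmoid neuron trained with cross-entropy loss, where the error signal is just the prediction minus the label. The features, starting weights and learning rate are hypothetical. In a network with several layers, backpropagation uses the chain rule to carry this error further back, layer by layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias):
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def update(features, label, weights, bias, learning_rate):
    """One correction step for a single sigmoid neuron with cross-entropy loss."""
    prediction = predict(features, weights, bias)
    error = prediction - label   # gradient of the loss w.r.t. the weighted sum
    # Pass the error back to each weight in proportion to the input it received.
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias

cat_picture = [0.9, 0.2]           # hypothetical features
label = 1.0                        # 1.0 means "cat", 0.0 means "dog"
weights, bias = [-0.5, 0.3], 0.0   # weights that currently lean towards "dog"

before = predict(cat_picture, weights, bias)
weights, bias = update(cat_picture, label, weights, bias, learning_rate=0.5)
after = predict(cat_picture, weights, bias)
print(f"P(cat) before correction: {before:.3f}, after: {after:.3f}")
```

After one correction the neuron is more willing to say cat for the same picture, which is the code version of the child being told "no, that is a cat".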
Another way to gain an intuition about weight adjustment is to compare it with the knobs that vintage television sets had. There used to be two knobs, one for tuning the height of the picture and the other for tuning its width. Both knobs had to be turned until the height and width of the picture were optimal for viewing. In a neural network, every weight is such a knob, and training keeps turning them until the predictions look right.
COST FUNCTION AND LOSS FUNCTION
Technically, the amount of deviation between the ground truth and the prediction is measured by the loss function, and the aim is to minimise it. By a common convention, the loss refers to a single example, while the cost function is the average loss over all the training examples. Let us go back to the example of the child. The parent wants the child to learn quickly and, to make life easier, would like the child to get things right after just a few examples. Fewer mistakes mean less correcting, which gives a better intuition of why we strive to minimize the loss.
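As a rough sketch, here is how a loss and a cost could be computed for cat-versus-dog guesses. Cross-entropy is used as the loss here by assumption (the post does not pick one), and the labels and guesses are made-up numbers.

```python
import math

def cross_entropy(prediction, label):
    """Loss for one example: small when the prediction agrees with the label."""
    eps = 1e-12  # guards against log(0)
    prediction = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(prediction) + (1.0 - label) * math.log(1.0 - prediction))

def cost(predictions, labels):
    """Cost: the average loss over a batch of examples."""
    losses = [cross_entropy(p, y) for p, y in zip(predictions, labels)]
    return sum(losses) / len(losses)

labels = [1.0, 0.0, 1.0]          # hypothetical ground truth: cat, dog, cat
good_guesses = [0.9, 0.1, 0.8]    # predicted probability of "cat"
poor_guesses = [0.4, 0.7, 0.3]

print(f"cost of good guesses: {cost(good_guesses, labels):.3f}")
print(f"cost of poor guesses: {cost(poor_guesses, labels):.3f}")
```

The guesses that sit closer to the ground truth give the lower cost, and that number is what training tries to push down.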
Let us say the child has to be taught the art of recognizing animals, but all that the child has seen is a handful of pictures of cats. The next time the child is shown a picture of a dog, it wrongly calls it a cat. The child has started memorizing what it was shown rather than actually learning to differentiate between animals. In neural networks, memorizing the training examples instead of learning patterns that carry over to new ones is called overfitting. One way to reduce it is to use more, and more diverse, training examples.
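The memorizing child can be caricatured in code as a lookup table. This is a deliberately extreme toy, not a neural network: the `train_memorizer` helper, the features and the default answer are all made up for illustration. What it does show is the usual symptom of overfitting, namely a large gap between accuracy on the training pictures and accuracy on unseen ones.

```python
def train_memorizer(examples):
    """'Learn' by storing every training picture with its label, nothing more."""
    return {tuple(features): label for features, label in examples}

def predict(memory, features, default="cat"):
    """Answer from memory if this exact picture was seen; otherwise guess the default."""
    return memory.get(tuple(features), default)

def accuracy(memory, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(memory, f) == label for f, label in examples)
    return correct / len(examples)

# Hypothetical [ear pointiness, snout length] features.
training_set = [([0.9, 0.2], "cat"), ([0.8, 0.1], "cat"), ([0.3, 0.9], "dog")]
unseen_set = [([0.85, 0.15], "cat"), ([0.2, 0.8], "dog"), ([0.25, 0.95], "dog")]

memory = train_memorizer(training_set)
print(f"accuracy on training pictures: {accuracy(memory, training_set):.2f}")
print(f"accuracy on unseen pictures:   {accuracy(memory, unseen_set):.2f}")
```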
Let us say the child is now assigned the task of learning the difference between various birds. Internally, the brain recalls general features it has already learnt, such as what eyes look like, and applies that knowledge to the task of classifying birds. This speeds up the process. This transfer of information between tasks is known as transfer learning in neural networks. Essentially, the information in each neuron (its weights) is initialised with the weights learnt on the previous task rather than starting from scratch.
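Here is a toy sketch of that idea, assuming a model with a single weight and two made-up tasks that are closely related. Training on the new task is started once from zero and once from the weight learnt on the earlier task, and the number of steps needed is compared. Real transfer learning typically reuses the early layers of a large pretrained network, which this sketch does not attempt to show.

```python
def mse(w, data):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def mse_gradient(w, data):
    """Gradient of the mean squared error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)

def train(w, data, learning_rate=0.05, tolerance=1e-4, max_steps=1000):
    """Run gradient descent from the starting weight w; return (weight, steps used)."""
    steps = 0
    while mse(w, data) > tolerance and steps < max_steps:
        w -= learning_rate * mse_gradient(w, data)
        steps += 1
    return w, steps

xs = [1.0, 2.0, 3.0]
task_a = [(x, 2.0 * x) for x in xs]   # hypothetical earlier task
task_b = [(x, 2.2 * x) for x in xs]   # hypothetical related new task

w_a, _ = train(0.0, task_a)              # learn the earlier task from scratch
_, steps_scratch = train(0.0, task_b)    # new task, starting from scratch
_, steps_transfer = train(w_a, task_b)   # new task, starting from task A's weight

print(f"steps from scratch: {steps_scratch}, steps with transfer: {steps_transfer}")
```

Because the two tasks are similar, the weight from the earlier task is already close to what the new task needs, so fewer steps are required.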
When do you stop teaching a child? Only when you are convinced that the child is doing the task well. And the child is said to be doing the task well only when it predicts correctly on a good number of test cases. Similarly, the neural network is said to have converged when the loss has decreased substantially and then stays roughly constant (ideally close to zero) over many training cycles.
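One simple way to decide that the loss "remains constant" is to stop when it has barely changed for a few cycles in a row. The sketch below assumes that rule; the `has_converged` helper, the tolerance and the patience value are hypothetical choices, and the training run is the same kind of one-weight toy problem rather than a real network.

```python
def has_converged(loss_history, tolerance=1e-4, patience=3):
    """True when the loss changed by less than `tolerance` for `patience` epochs in a row."""
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-(patience + 1):]
    changes = [abs(recent[i] - recent[i + 1]) for i in range(patience)]
    return all(change < tolerance for change in changes)

# Toy training run: minimise (w - 3)**2 by gradient descent.
w, learning_rate = 0.0, 0.1
loss_history = []
for epoch in range(1000):
    loss = (w - 3.0) ** 2
    loss_history.append(loss)
    if has_converged(loss_history):
        break   # the loss has flattened out, so stop teaching
    w -= learning_rate * 2.0 * (w - 3.0)

print(f"stopped after {epoch + 1} epochs with loss {loss_history[-1]:.6f}")
```

A flat loss only says that the network has stopped improving on what it was taught; as with the child, how well it does on test cases is still the final check.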
Neural networks are seeing rapid development in this decade, especially with heavy computing resources becoming accessible. I hope you have something to take away after reading this post. Please do watch this space for weekly posts on neural networks and AI.