Ever since the advent of computers, and of technology in general, the idea of creating intelligent systems has fascinated us. A system that can learn, adapt to changing circumstances, and produce information relevant to solving real-world problems has long been one of the most researched topics in computer science. It’s not just that such an adaptive program would be awesome to have (though that is indeed a noble cause); the real-world implications of such a system would be immense.
The good news is that we have already made an amazing amount of progress in just the last few years. This is evident from the amount of news coverage Artificial Intelligence, and Deep Learning in particular, has received recently; even people with little interest in the field have read about it just to know what the hype is all about.
But why just the last couple of years? Have we recently stumbled upon some game-changing technique that has caused all this interest in the field?
Well, not really.
The fact is that the concepts on which today’s Deep Learning is based have been known since 1943, when McCulloch and Pitts proposed the first mathematical model of a neuron! The ideas and the mathematics behind all this have been with us all along; it is the technology to implement them that had not caught up, well, until a few years ago.
(Artificial) Neural Networks
The most beautiful thing about Deep Learning is that it is based on how we humans learn and process information. Everything we do, every memory we have, every action we take is controlled by our nervous system, which is composed of (you guessed it) neurons!
All the information our brain processes and stores is encoded in the connections between different neurons and in their relative strengths. That is the concept on which neural networks are built. At the most basic level, a neural network is just a bunch of neurons connected to each other. This leads to an interesting property: the structure of a neural network is independent of the job it has to do. The only things that change are the parameters of the interconnections between neurons. So a neural network created for one job can do another job as well; it just has to be re-trained, that is, to learn everything about the new job.
But how does an artificial neural network actually learn?
To understand that, we must learn a little about the basic structure of an ANN (artificial neural network). The simplest ANN can be built from three layers of “neurons”: the input layer, the hidden layer, and the output layer. Information flows in at the input layer, through the hidden layer, and out at the output layer.
In the image above, the green circles represent the neurons in the input layer, the blue circles the hidden layer, and the red circle the output layer. The arrows represent the connections between the neurons.
Each connection has a number associated with it, called its connection weight, and each neuron has a number and a formula associated with it, called its threshold value and its activation function respectively. These are the parameters of the neural network.
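To make this concrete, here is a minimal sketch of such a three-layer network in plain Python. The weights, biases, and sigmoid activation are made up for illustration; a real network would learn these values during training.

```python
import math

def sigmoid(x):
    # A common activation function: squashes any input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    # Each neuron sums its weighted inputs, adds its bias (threshold value),
    # and passes the result through the activation function.
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# A tiny network: 3 input neurons -> 2 hidden neurons -> 1 output neuron.
# These parameter values are arbitrary, purely for demonstration.
hidden_w = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0]]
output_b = [0.2]

x = [0.9, 0.1, 0.4]                              # one example input
hidden = layer_forward(x, hidden_w, hidden_b)    # input layer -> hidden layer
output = layer_forward(hidden, output_w, output_b)  # hidden layer -> output
```

Running the input through the layers like this is called a forward pass; the result lands in `output` as a single number between 0 and 1.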
When a neural network is being trained (in one common method of training), it is provided with a set of inputs as well as their corresponding outputs. It runs each input through the layers of the network: using the parameters above, each neuron transforms what it receives and forwards the result to the next layer, and so on. The result at the output layer is then compared with the supplied output, and the network checks how far apart the two are. It then adjusts the parameters of each neuron through special algorithms (gradient descent, backpropagation) designed to bring the produced and expected outputs as close together as possible. The network learns to adjust its weights and threshold values to arrive at the correct output. This is what “learning” means for an artificial neural network. The process is repeated a (very large) number of times until the produced and expected outputs are as close as possible. That completes the training.
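The loop described above can be sketched for a single neuron. This is a deliberately tiny toy, learning the logical OR of two inputs with plain gradient descent, not a full backpropagation implementation; the learning rate and epoch count are arbitrary choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Tiny training set: inputs paired with their expected outputs (logical OR).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w = [0.0, 0.0]   # connection weights
b = 0.0          # bias (threshold value)
lr = 0.5         # learning rate: how big each adjustment step is

for epoch in range(2000):
    for inputs, target in data:
        # Forward pass: weighted sum plus bias, then the activation function.
        pred = sigmoid(w[0] * inputs[0] + w[1] * inputs[1] + b)
        # How far apart the produced and expected outputs are.
        err = pred - target
        # Gradient descent: nudge each parameter to shrink the error.
        grad = err * pred * (1 - pred)
        w[0] -= lr * grad * inputs[0]
        w[1] -= lr * grad * inputs[1]
        b -= lr * grad
```

After many repetitions, the neuron's predictions for the training inputs end up close to the expected outputs, which is exactly the "as close as possible" criterion described above.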
Now, when new inputs are supplied to the trained neural network, we can be fairly confident that its predicted outputs will be close to the actual ones. Such ANNs can be used to predict house prices from features of the houses (square footage, location, number of rooms, etc.), to classify objects and images, and much more!
ANNs like the one above, with a limited number of layers and neurons, can only do so much. To represent more complex features, and to “learn” increasingly complex models for predicting and classifying information that depends on thousands or even millions of features, we need ANNs a little more complex than the one above. This is accomplished simply by increasing the number of hidden layers and / or the number of neurons per hidden layer. More layers and more neurons can represent increasingly complex models, but they also come at the cost of more time- and power-consuming computation.
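Adding depth is then just a matter of stacking more of the same kind of layer. A rough sketch, with hypothetical layer sizes and random, untrained weights, only to show the structure:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(n_in, n_out):
    # One fully connected layer: random weights, zero biases (untrained).
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    # Push the input through every layer in turn, however many there are.
    for weights, biases in layers:
        x = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

random.seed(0)
# A "deep" network: 4 inputs, three hidden layers of 8 neurons, 1 output.
net = [dense(4, 8), dense(8, 8), dense(8, 8), dense(8, 1)]
y = forward([0.1, 0.9, 0.5, 0.3], net)
```

Deepening the network is just appending more `dense(...)` entries to the list; nothing else in the forward pass changes, which illustrates why the structure is independent of depth, while the cost of computing each pass grows with it.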
Neural networks that consist of more than three layers of neurons (counting the input and output layers) are called Deep Neural Networks, and training them is called Deep Learning.
And now, with deep neural networks, extremely complex problems of prediction and classification can be solved in very much the same way.
This is the beauty of how such simple constructs can do such amazing jobs!
Thank you for reading.
Disclaimer: This article is only meant for general information and no specific care has been taken to make it mathematically and / or conceptually accurate. It’s only meant to give a general idea.