Source: Deep Learning on Medium
Perceptron — The Neuro Computer
The perceptron was first introduced in 1957 by Frank Rosenblatt as a computer program (and eventually a dedicated machine) inspired by neurons in the human brain. Rather than hand-crafting the input space for the model, the perceptron is designed to automatically adjust each input's importance (its weight) for the desired output. The perceptron learns a linear decision boundary, which is used for binary classification.
How it works
1) Given the weight vector and the input vector, multiply them and add a bias term:
“y = W·X + b”
2) Transform the output of the function above with some activation function: z = TF(y).
3) If z is above some threshold, return 1; otherwise return 0.
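The three steps above can be sketched in a few lines of NumPy. The AND example and its weights below are my own illustrative choice, not from the article:

```python
import numpy as np

def perceptron_predict(W, X, b, threshold=0.0):
    """One forward pass of a perceptron."""
    y = np.dot(W, X) + b               # step 1: weighted sum plus bias
    z = y                              # step 2: activation (identity here)
    return 1 if z > threshold else 0   # step 3: threshold to a binary label

# Example: weights that make the perceptron compute logical AND
W = np.array([1.0, 1.0])
b = -1.5
print(perceptron_predict(W, np.array([1.0, 1.0]), b))  # 1
print(perceptron_predict(W, np.array([0.0, 1.0]), b))  # 0
```

Any activation could be substituted at step 2; with the identity, steps 2 and 3 together are just a step function.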
After the model produces an output, the predicted value and the real value are passed into a loss function, which tells us how well the model performed. We find a combination of weights and biases that minimizes the loss function using gradient descent: we take the derivative of the loss function to see in which direction (+ or −) each weight should be updated, and scale the update by a learning rate.
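This update loop can be sketched with the classic perceptron learning rule; the `train_perceptron` helper and the logical-OR data below are illustrative choices of mine, not from the article:

```python
import numpy as np

def train_perceptron(X, t, lr=0.1, epochs=20):
    """Perceptron learning rule: nudge weights toward the target,
    scaled by the learning rate lr."""
    W = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            pred = 1 if np.dot(W, x) + b > 0 else 0
            err = target - pred      # 0, +1, or -1: the update direction
            W += lr * err * x        # move each weight in that direction
            b += lr * err
    return W, b

# Linearly separable data: logical OR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 1])
W, b = train_perceptron(X, t)
preds = [1 if np.dot(W, x) + b > 0 else 0 for x in X]
print(preds)  # [0, 1, 1, 1]
```

Because the data are linearly separable, the perceptron convergence theorem guarantees this loop finds a separating line in finitely many updates.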
It’s beautiful because of its real-world similarity to how humans learn. We predict something unknown, get an answer from our environment, and most of the time adjust our thinking process to make a better estimate the next time.
However, Marvin Minsky and Seymour A. Papert, in their book Perceptrons, proved that a single perceptron cannot fit most real-world problems, where the data are not linearly separable (the XOR function is the classic example). After the book, perceptrons lost attention until the 2000s.
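A quick way to see the limitation: train a perceptron on XOR. No line can put (0,1) and (1,0) on one side and (0,0) and (1,1) on the other, so the learning rule can never get all four points right. This demo and its constants are my own sketch:

```python
import numpy as np

# XOR targets: not linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 0])

W, b = np.zeros(2), 0.0
for _ in range(100):                      # perceptron learning rule
    for x, target in zip(X, t):
        pred = 1 if np.dot(W, x) + b > 0 else 0
        W += 0.1 * (target - pred) * x
        b += 0.1 * (target - pred)

preds = np.array([1 if np.dot(W, x) + b > 0 else 0 for x in X])
accuracy = (preds == t).mean()
print(accuracy)  # at most 0.75: at least one XOR point is always misclassified
```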
The Universal Approximation Theorem
Geoffrey Hinton, a British-Canadian computer scientist and cognitive scientist, did not let go of the perceptron model and, taking advantage of ever-increasing computing power, brought us the deep learning era. A single perceptron cannot represent complex relationships like extracting meaning from an image, or many other complex domains. But by adding one more layer (with a nonlinear activation), you can approximate any continuous function with finitely many neurons [2 - Ian Goodfellow, Deep Learning Book]. By stacking multiple linear separators you can create curved decision boundaries, while keeping the power to adjust the importance of each input.
How Multilayer Perceptrons Work
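As a concrete sketch of what one extra layer buys, a hand-wired two-layer network with step activations can represent XOR, which no single perceptron can. The weights below are one illustrative choice of mine: the hidden units compute OR and AND of the inputs, and the output combines them.

```python
import numpy as np

def step(v):
    """Threshold activation: 1 where v > 0, else 0."""
    return (v > 0).astype(int)

# Hidden layer: two linear separators
W1 = np.array([[1.0, 1.0],    # h1 fires for OR(x1, x2)
               [1.0, 1.0]])   # h2 fires for AND(x1, x2)
b1 = np.array([-0.5, -1.5])

# Output layer: XOR = OR and not AND
W2 = np.array([1.0, -2.0])
b2 = -0.5

def mlp_xor(x):
    h = step(W1 @ x + b1)             # hidden activations
    return 1 if W2 @ h + b2 > 0 else 0

for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(x, mlp_xor(np.array(x, dtype=float)))
# [0,0] -> 0, [0,1] -> 1, [1,0] -> 1, [1,1] -> 0
```

In practice these weights are learned by gradient descent rather than hand-picked, but the example shows how two stacked linear separators carve out a region no single line can.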
People call multilayer perceptrons deep neural networks; deep because they have more than one layer, and in many domains we don’t know what the model is constructing inside its hidden layers. So we have a model that can represent, or draw a map from, any input to any output; but how do we use it in the real world? To find better representations of complex domains, people added many layers and many neurons to the model. However, neural networks have problems: they consume a lot of data to approximate a function, and they do not generalize to domains other than the ones they were trained on.
The human brain operates more like a machine than a single formula. There have to be different models, each doing its own job while working together as one. This multi-model network should find its way to represent the world efficiently and learn quickly.
The next steps for better representations
As we try to make computers understand our world, we need to make our formulas more efficient and more selective about which signals they attend to. Humans have been navigating the world through symbols and capsules of information, not bits like the pixels of an image. Models that prioritize chunks of information can make better predictions, and they can share those chunks to build representations of what is new to them.