Original article was published by Anjali Bhardwaj on Deep Learning on Medium
What is a Perceptron? – Basics of Neural Networks
An overview of the history of perceptrons and how they work
A single-layer perceptron is the basic unit of a neural network. A perceptron consists of input values, weights and a bias, a weighted sum, and an activation function.
In the last decade, we have witnessed an explosion in machine learning technology, from personalized social media feeds to algorithms that can remove objects from videos. Like a lot of other self-learners, I decided it was my turn to get my feet wet in the world of AI. Recently, I started my journey by taking a course on Udacity called Deep Learning with PyTorch. Naturally, this article is inspired by the course and I highly recommend you check it out!
If you have taken the course, or read anything about neural networks, one of the first concepts you will probably hear about is the perceptron. But what is a perceptron and why is it used? How does it work? What is the history behind it? In this post we will briefly address each of these questions.
A little bit of history
The perceptron was first introduced by American psychologist Frank Rosenblatt in 1957 at Cornell Aeronautical Laboratory (here is a link to the original paper if you are interested). Rosenblatt was heavily inspired by the biological neuron and its ability to learn. Rosenblatt’s perceptron consists of one or more inputs, a processor and only one output.
Originally, Rosenblatt’s idea was to create a physical machine that behaves like a neuron; however, its first implementation was software tested on the IBM 704. Rosenblatt later implemented the perceptron in custom-built hardware with the intention of using it for image recognition.
Although initially Rosenblatt and the AI community were optimistic about the technology, it was later shown that the perceptron could only learn linearly separable patterns. In other words, it could only classify data points that a straight line can divide, which made it perform poorly on many pattern-recognition tasks.
At the time, the poor classification results (and some other bad press) caused the public to lose interest in the technology. Today, however, we work around the problem of linear separation by combining perceptrons into multi-layer networks with non-linear activation functions.
Let’s take a look at how perceptrons work today.
A perceptron works by taking in some numerical inputs along with what are known as weights, and a bias. It multiplies each input by its respective weight and adds the products together (this is known as the weighted sum). The bias is then added to the weighted sum, and the activation function takes this result as input and returns the final output.
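The steps above can be sketched in a few lines of Python. This is a minimal, illustrative version (the function and parameter names are my own, not from the course), with the activation function passed in so we can swap it later:

```python
def perceptron(inputs, weights, bias, activation):
    # Multiply each input by its weight and add the products (the weighted sum)
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Add the bias, then pass the result through the activation function
    return activation(weighted_sum + bias)

# Example with a simple step activation: output 1 if the input is >= 0, else 0
step = lambda z: 1 if z >= 0 else 0
perceptron([1.0, 1.0], [0.5, 0.5], 0.0, step)  # returns 1
```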
Wow that was confusing… let’s break that down by building a perceptron.
A perceptron consists of four parts: input values, weights and a bias, a weighted sum, and an activation function.
Assume we have a single neuron and three inputs x1, x2, x3 multiplied by the weights w1, w2, w3 respectively as shown below,
The idea is simple, given the numerical value of the inputs and the weights, there is a function, inside the neuron, that will produce an output. The question now is, what is this function?
One function may look like
z = w1·x1 + w2·x2 + w3·x3
This function is called the weighted sum, because it is the sum of each input multiplied by its weight. This looks like a good function, but what if we wanted the outputs to fall into a certain range, say 0 to 1?
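As a quick sanity check, here is the weighted sum computed for some made-up inputs and weights (the numbers are arbitrary, chosen only for illustration):

```python
x = [1.0, 2.0, 3.0]   # inputs x1, x2, x3
w = [0.2, 0.4, -0.1]  # weights w1, w2, w3

# z = 1.0*0.2 + 2.0*0.4 + 3.0*(-0.1) = 0.2 + 0.8 - 0.3 = 0.7
z = sum(xi * wi for xi, wi in zip(x, w))
```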
We can do this by using something known as an activation function. An activation function is a function that converts the input given (the input in this case would be the weighted sum) into a certain output based on a set of rules.
There are different kinds of activation functions that exist, for example:
- Hyperbolic Tangent: used to output a number from -1 to 1.
- Logistic Function: used to output a number from 0 to 1.
Since the range we are looking for is between 0 and 1, we will be using a Logistic Function to achieve this.
Logistic functions have the formula,
g(z) = 1 / (1 + e^(-z))
Where the graph looks like,
Notice that g(z) lies between the points 0 and 1 and that this graph is not linear. This will allow us to output numbers that are between 0 and 1 which is exactly what we need to build our perceptron.
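The logistic function is short enough to write out directly. A small sketch using Python's standard library, checking its behavior at a few points:

```python
import math

def sigmoid(z):
    # Logistic function: maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

sigmoid(0)    # 0.5 (the midpoint of the curve)
sigmoid(10)   # very close to 1
sigmoid(-10)  # very close to 0
```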
Now we have almost everything we need to make our perceptron. The last thing we are missing is the bias. The bias can be thought of as a threshold: it shifts the value the weighted sum must reach before the output fires. So the final neuron equation looks like:
output = g(w1·x1 + w2·x2 + w3·x3 + b)
Represented visually we see (where typically the bias is represented near the inputs),
Notice that the activation function takes the weighted sum plus the bias as its input and produces a single output. Using the logistic function, this output will be between 0 and 1.
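Putting the pieces together, a complete neuron is just the weighted sum, plus the bias, passed through the logistic function. Again, this is a sketch with illustrative names and arbitrary example numbers:

```python
import math

def sigmoid(z):
    # Logistic activation: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    # Weighted sum of inputs and weights, plus the bias
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return sigmoid(z)

neuron([1.0, 2.0, 3.0], [0.2, 0.4, -0.1], 0.0)  # a value between 0 and 1
```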
Why are perceptrons used?
Perceptrons are the building blocks of neural networks, and they are typically used for supervised learning of binary classifiers. This is best explained through an example. Let’s take a simple perceptron with two inputs, x and y, multiplied by the weights wx and wy respectively; it also contains a bias.
Let’s also create a graph with two different categories of data represented with red and blue dots.
Notice that the x-axis is labeled after the input x and the y-axis is labeled after the input y.
Suppose our goal was to separate this data so that there is a distinction between the blue dots and the red dots. How can we use a perceptron to do this?
A perceptron can create a decision boundary for a binary classification, where a decision boundary is a line (or surface) on a graph that separates the different classes of data points.
Let’s play with the function to better understand this. We can say,
wx = 0.5
wy = 0.5
and b = 0
Then the function for the perceptron will look like,
0.5x + 0.5y = 0
and the graph will look like,
Let’s suppose that the activation function in this case is a simple step function that outputs either 0 or 1. The perceptron function will then label the blue dots as 1 and the red dots as 0. In other words,
if 0.5x + 0.5y >= 0, then 1
if 0.5x + 0.5y < 0, then 0.
Therefore, the function 0.5x + 0.5y = 0 creates a decision boundary that separates the red and blue points.
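The classification rule above can be written as a tiny function. This sketch assumes the step activation described earlier, with the weights and bias from our example as defaults:

```python
def classify(x, y, wx=0.5, wy=0.5, b=0.0):
    # Step activation: 1 on or above the line 0.5x + 0.5y = 0, else 0
    return 1 if wx * x + wy * y + b >= 0 else 0

classify(1, 1)    # 1 (the blue side of the boundary)
classify(-2, -3)  # 0 (the red side of the boundary)
```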
Overall, we see that a perceptron can do basic classification using a decision boundary.
Note: In this example the weights and bias were chosen by hand to classify the points, but what if we did not know which weights would create a good separation of the data? Is there a way the perceptron could learn to classify the points on its own (assuming the data is linearly separable)? The answer is yes! There is a method called the ‘perceptron trick’; I will let you look into this one on your own :).