Neural Networks: Basics

Source: Deep Learning on Medium

Understanding the basics of neural networks will be beneficial by helping you stay engaged in a conversation around this topic.

Artificial neural networks (ANNs) are all the hype in machine learning, and as a result a great deal of research is under way. ANNs have driven progress in computer vision, where they tolerate noisy data; in self-driving cars, where they predict where the lane lines will be; and in natural language processing (NLP), which lets you talk to your voice assistants.

Let’s start with the definition: an artificial neural network is a mathematical structure that maps a given input to a desired output. With that, a dozen questions probably pop into your mind. Let’s answer some of the common ones to understand the basics of neural networks.

Who Invented Neural Networks?

The answer varies depending on who you ask. Some people give credit to the very first theory, by Warren McCulloch and Walter Pitts, who described how a neuron could be represented mathematically. In this article, however, we credit a psychologist named Frank Rosenblatt.


Rosenblatt published a paper titled “The Perceptron: A Perceiving and Recognizing Automaton” in 1957. This publication describes the building blocks of neural networks, perceptrons, and how these artificial neurons could learn from data. He is credited with an early form of supervised learning, which allowed the neuron to alter its own weights based on its accuracy.

McCulloch and Pitts

McCulloch and Pitts published a paper titled “A Logical Calculus of the Ideas Immanent in Nervous Activity” in 1943, which heavily influenced Rosenblatt’s research. In that paper, they described how the “all-or-nothing” characteristic of the biological neuron could be represented mathematically.

When a neuron receives a signal, it does not automatically pass the signal downstream. It holds onto the signal until a threshold is met, and then it sends the message along its axon to be received by its neighboring cells.


Donald Hebb, a psychologist and pioneer of neuropsychology, published a book in 1949 titled “The Organization of Behavior: A Neuropsychological Theory”. This groundbreaking work gave psychology a whole new rule, called Hebb’s rule. The rule describes how one neuron can excite another and how, after repeated activation, cell A becomes more efficient at activating cell B. It is often summarized as ‘neurons that fire together wire together’.

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

Hebb’s rule definition

What Is A Perceptron and How Does It Make Up Neural Networks?

A perceptron is the building block of neural networks, much as nucleic acids, lipids, carbohydrates, and proteins are the building blocks of biology. A perceptron also has 4 parts:

  1. Input values
  2. Weights and bias
  3. Summation function
  4. Activation function

Each input value is multiplied by its weight, and the results are added together, along with the bias, in the summation function. The activation function then squashes that sum into a fixed range, either -1 to 1 or 0 to 1, depending on the function used. This determines whether the perceptron is activated. The more negative the sum, the closer the output is to the minimum (-1 or 0); the more positive the sum, the closer it is to the maximum (1). If the perceptron is activated, the output is sent along its way.
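To make the four parts concrete, here is a minimal sketch of a single perceptron. The sigmoid activation (which squashes to the 0 to 1 range) and the specific input, weight, and bias values are assumptions chosen only for illustration; they do not come from any particular dataset.

```python
import math

def sigmoid(z):
    """Activation function: squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    # Summation function: multiply each input by its weight, add them up, add the bias.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function: squash the sum into a fixed range.
    return sigmoid(total)

# Hypothetical values, for illustration only.
activation = perceptron(inputs=[1.0, 0.5], weights=[0.8, -0.4], bias=0.1)
print(round(activation, 3))
```

A strongly negative sum would push the result toward 0, and a strongly positive sum would push it toward 1.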

Single Perceptron

A single perceptron produces a single output in a yes-or-no format. Single perceptrons are great for linearly separable datasets: groups that can be separated using a single straight line.

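Perceptrons are a good fit for logic gates. A single perceptron can represent ‘AND’ and ‘OR’, while ‘XOR’ is not linearly separable and needs several perceptrons working together. Let’s break these down: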

‘AND’: if both of the inputs are true, the result is true.

‘OR’: if either or both of the inputs are true, the result is true.

‘XOR’ (exclusive ‘OR’): if either of the inputs is true, but not both, the result is true.
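The gates can be sketched with perceptrons that use a simple step activation (fire if the weighted sum is above 0). The weights and biases below are hand-picked assumptions, not learned values. Note that ‘AND’ and ‘OR’ each fit in one perceptron, while this sketch builds ‘XOR’ out of three, because no single straight line separates its true cases from its false ones.

```python
def step_perceptron(x1, x2, w1, w2, bias):
    """Return 1 if the weighted sum plus bias is above 0, otherwise 0."""
    return 1 if (x1 * w1 + x2 * w2 + bias) > 0 else 0

def AND(x1, x2):
    # Fires only when both inputs are 1 (1 + 1 - 1.5 > 0).
    return step_perceptron(x1, x2, 1, 1, -1.5)

def OR(x1, x2):
    # Fires when at least one input is 1 (1 - 0.5 > 0).
    return step_perceptron(x1, x2, 1, 1, -0.5)

def XOR(x1, x2):
    # "OR but not AND": two perceptrons feed a third one.
    return step_perceptron(OR(x1, x2), AND(x1, x2), 1, -1, -0.5)

print("a b AND OR XOR")
for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), " ", OR(a, b), "", XOR(a, b))
```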

Neural Network

However, if more than one perceptron is present and they are joined in a layered fashion, we have produced a neural network. The output from the first layer becomes the input for the second layer, and so on, until the output layer sums up the total activation. The total activation is the network’s confidence level in its final decision.

What Are The Neural Network Layers?

Neural networks have 3 types of layers:

  1. Input Layer
  2. Hidden Layer
  3. Output Layer

The input layer is the starting point of the network. This is the layer where the values that will be used for the prediction are brought in. The hidden layer is where the network works its “magic”: the activations are calculated from the input values throughout this area. There can be as few as 1 hidden layer or as many as the project needs. Finally, the output layer is where the activation of each of the final nodes is used to select a solution.

Each unit is connected to every unit of the subsequent layer. This allows perceptrons in neighboring layers to communicate with one another and lets the network settle on weights that are optimal for its use.

Just remember, the first layer is the input and the last layer is the output. Anything in between is the hidden layer.

How Does The Computer “See” Neural Networks?

The diagram that is frequently used to represent neural networks is the human-friendly version. Computers work with and view them in matrix form.

We will go over the feedforward portion first, in which the network turns its inputs into a prediction. This is accomplished using matrix multiplication.

Matrix Multiplication

Let’s break down a neural network with 2 input values, 1 hidden layer containing 3 nodes, and an output layer with 2 nodes.

We will keep this easy to follow by using letters unique to each element so that you can reference it as we go.

Weights x Input Layer

Matrix multiplication is done by taking each row of the first matrix, multiplying its elements with the respective elements of each column of the second matrix, and adding up the products. The main rule for this operation is that the number of columns in the first matrix must match the number of rows in the second matrix. Let us take a look at the first set of matrices we will be using for this network.

If the matrix of input values comes first, the result is undefined because the rule above is not honored. However, if we flip the matrices, we can perform the multiplication, with the weight matrix as the first, or left, matrix and the input matrix as the second, or right, matrix.
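The rule and the ordering can be checked with a small sketch. The `matmul` helper and the numbers in the two matrices are assumptions written for illustration (a real project would more likely use a numerical library); only the shapes, 2 inputs and 3 hidden nodes, come from the network described here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    # The main rule: columns of the first must equal rows of the second.
    if len(a[0]) != len(b):
        raise ValueError(
            f"columns of first ({len(a[0])}) must equal rows of second ({len(b)})"
        )
    # Each result cell is a row of `a` times a column of `b`, summed up.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

inputs = [[1], [2]]                    # 2 x 1: one column of 2 input values
weights = [[1, 2], [3, 4], [5, 6]]     # 3 x 2: one row per hidden node

# Inputs first: 1 column versus 3 rows, so the product is undefined.
try:
    matmul(inputs, weights)
    inputs_first_works = True
except ValueError as err:
    inputs_first_works = False
    print("inputs x weights failed:", err)

# Weights first: 2 columns versus 2 rows, so the product is a 3 x 1 matrix.
hidden = matmul(weights, inputs)
print(hidden)
```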

Let’s perform the first multiplication in this network.

The resulting matrix contains 3 rows, which correspond to the 3 nodes in the hidden layer.
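Written out, the first multiplication might look like the following. The symbols are assumptions made for this sketch: x for the input values, w for the hidden-layer weights (first index is the hidden node, second index is the input), and h for the hidden-layer results.

```latex
\begin{bmatrix}
w_{00} & w_{01} \\
w_{10} & w_{11} \\
w_{20} & w_{21}
\end{bmatrix}
\begin{bmatrix}
x_0 \\
x_1
\end{bmatrix}
=
\begin{bmatrix}
w_{00} x_0 + w_{01} x_1 \\
w_{10} x_0 + w_{11} x_1 \\
w_{20} x_0 + w_{21} x_1
\end{bmatrix}
=
\begin{bmatrix}
h_0 \\
h_1 \\
h_2
\end{bmatrix}
```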

Weights x Hidden Layer

Let’s do the final multiplication.

The output matrix holds the final result, and its 2 rows correspond to the 2 nodes of the output layer.

The result of the multiplications above is the activation level of each output neuron. If O₀ were higher than O₁, the prediction associated with O₀ would be given as the solution.
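Putting both multiplications together, a feedforward pass through the 2-3-2 network can be sketched as follows. Everything numeric here is an assumption for illustration: the weight values are made up, a sigmoid activation is applied after each layer, and biases are left out to keep the sketch short.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of first must equal rows of second")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def sigmoid_all(matrix):
    """Apply the sigmoid activation to every element of a matrix."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in matrix]

inputs = [[1.0], [0.0]]                              # 2 x 1 input layer
w_hidden = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]]    # 3 x 2, hypothetical weights
w_output = [[0.6, -0.1, 0.4], [-0.3, 0.9, 0.5]]      # 2 x 3, hypothetical weights

hidden = sigmoid_all(matmul(w_hidden, inputs))   # 3 x 1: one row per hidden node
output = sigmoid_all(matmul(w_output, hidden))   # 2 x 1: O_0 and O_1

# The node with the highest activation gives the prediction.
prediction = max(range(len(output)), key=lambda i: output[i][0])
print("O_0 =", round(output[0][0], 3), "O_1 =", round(output[1][0], 3))
print("predicted output node:", prediction)
```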

How Do Neural Networks “Learn”?

Neural networks learn in 2 steps: feedforward (which we just went over) and backpropagation. Backpropagation can itself be broken down into 2 steps: calculating the cost, then minimizing the cost.

The cost is the difference between the predicted value from the network and the expected value from the dataset. The larger the cost, the greater the error, so the goal is to have the smallest possible cost. Minimizing the cost by altering the weights and biases is the name of the game. To think of it simply, it’s feedforward in reverse: the error is passed backward through the layers, again using matrix multiplication, and each weight is adjusted according to how much it contributed to that error.

One common method of minimizing the cost function is through gradient descent.

Gradient Descent

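The goal is to find the minimum of the cost function. Gradient descent works toward it in small steps: it measures how the cost changes as each weight changes (the gradient), multiplies that by the learning rate, and moves the weights in the opposite direction. A common cost function is Mean-Squared Error, which averages the squared differences between the actual and expected outputs.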
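One common way to write Mean-Squared Error is shown here. The notation is an assumption for this sketch: n is the number of training examples, ŷ is the network's output for example i, and y is the expected output. Some texts divide by 2n instead of n to simplify the derivative.

```latex
J(\Theta) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
```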

The cost J(Θ), written as a function of the weights Θ, is what the gradient descent function uses.
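A standard form of the gradient descent update is sketched here, where α is the learning rate and ∇J(Θ) is the gradient of the cost with respect to the weights. The step is repeated until the cost stops shrinking.

```latex
\Theta \leftarrow \Theta - \alpha \, \nabla J(\Theta)
```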

The learning rate, α, is basically how big a step to take toward convergence (another way of saying finding a minimum of the cost, ideally the global one). To give you a visual, imagine you are standing on top of a hill with a gorgeous valley below you. You start to make your way down the side of the hill. Fighting the urge to listen to your inner child and roll down the hill rapidly (potentially missing your target), you take your time and step down carefully, avoiding any stones and sidestepping any trees in your way.

The learning rate is like the size of the steps you take to get to the valley. If the learning rate is too large, you may keep rolling up the other side and miss the valley entirely with the momentum from your roll. If the learning rate is too small, it will be dark before you even reach the valley, the optimal goal. Finding the proper learning rate for your model is essential in creating an effective and efficient neural network.
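The effect of the learning rate can be seen in a toy sketch. This is an assumed setup, not a full neural network: a model with a single weight, y = w * x, is fitted to three made-up data points with Mean-Squared Error as the cost, and the best weight is 2. The three learning rates are chosen only to show the "just right", "too large", and "too small" cases.

```python
XS = [1.0, 2.0, 3.0]   # toy inputs (hypothetical data)
YS = [2.0, 4.0, 6.0]   # toy expected outputs; the best weight is 2

def mse(w):
    """Mean-squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / len(XS)

def mse_gradient(w):
    """Derivative of the mean-squared error with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(XS, YS)) / len(XS)

def gradient_descent(learning_rate, steps=20, w=0.0):
    """Repeatedly step the weight against the gradient of the cost."""
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w)
    return w

w_good = gradient_descent(0.1)     # reaches the valley
w_big = gradient_descent(0.3)      # overshoots further on every step
w_small = gradient_descent(0.001)  # still far from the valley after 20 steps

for name, w in [("good", w_good), ("too large", w_big), ("too small", w_small)]:
    print(f"{name:>9}: w = {w:.4g}, cost = {mse(w):.4g}")
```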


Until we learn again,