Source: Deep Learning on Medium

# Deep Neural Networks, Inspired by Math and Neuroscience

“It was the worst possible time. Everyone else was doing something different.”

— Bengio, co-director of the CIFAR program

## A Brief History of Neural Networks

In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts created a computational model based on mathematics, called the Threshold Logic Unit (TLU), to describe how neurons might work. Simulations of neural networks were not possible until computers became more advanced in the 1950s.

Before the 2000s, it was considered one of the worst areas of research. LeCun and Hinton have both mentioned how, in this period, their papers were routinely rejected from publication simply because their subject was neural networks. The artificial neuron was considered useless, as a single one cannot even solve the XOR logic problem.

The research was funded by the Canadian Institute for Advanced Research (CIFAR). With continuous hard work from all the researchers in the field, Krizhevsky, Sutskever, and Hinton won the ImageNet competition in 2012 with Convolutional Neural Networks (CNN), a model first created by Yann LeCun in 1998. Deep Neural Networks were in almost every single paper thereafter.

There were other reasons why neural networks succeeded only long after they were invented. First of all, these models depend heavily on computational power; as computers became more advanced over the years, the models eventually became feasible. More importantly, training such large models adequately requires a huge amount of data. After the internet became popular with consumers, a large amount of data was generated daily, and social media, eCommerce, and mobile devices gave everyone even more opportunities to create digital data. These new technologies also created demand for applications in Computer Vision (CV) and Natural Language Processing (NLP).

There are two opposing schools of thought in the field of Artificial Intelligence: **Connectionism vs. Symbolism**. Connectionism takes the approach of cognitive science, hoping to explain intelligent behavior through networks of simple, interconnected units. Symbolism, on the other hand, relies on reasoning based on mathematical and logical operations. While many mathematical models have succeeded at their given tasks, there is also an adequate body of research showing that approaches based on cognitive science do accelerate the learning process of Artificial Neural Networks.

## Artificial Neural Network (ANN)

The term “Deep Learning” refers to the field involving Deep Neural Networks, meaning Artificial Neural Networks with one or more hidden layers. An Artificial Neural Network is a Directed Acyclic Graph (DAG) consisting of connected layers of artificial neurons. These artificial neurons are also called “Perceptrons,” and thus Artificial Neural Networks are sometimes called Multi-Layer Perceptrons (MLPs). Values are fed into the Input Layer, which is the first layer of the network, and the result comes out of the Output Layer, which is the last layer.

We can think of a Multi-Layer Perceptron as a voting scheme: to reach a decision, each perceptron in the input layer sends a weighted vote to the perceptrons in the next layer, and so on, until the vote is finalized by the perceptrons in the output layer. Activation functions are generally steep in the middle so that activation values tend toward either end, approximating binary decision values. Here is a place where we can visualize how an Artificial Neural Network classifies data.

**Forward Propagation & Activation Functions**

In each of the layers, there are numbers of Perceptrons connected with each other. The mechanism between the Perceptrons can be defined by the following terms:

- **Activation (*a*)**: The activation of a perceptron is calculated by summing the weighted activations of the perceptrons in the previous layer, adding a bias, and then applying an activation function.
- **Bias (*b*)**: Each time a perceptron receives a value, it applies a bias to that value.
- **Weight (*w*)**: Each activation value from the previous layer's perceptrons is multiplied by a weight before being summed together.
- **Activation Function (*σ*)**: The function that transforms the biased, weighted sum into the perceptron's activation value.
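Putting these terms together, the computation performed by a single perceptron can be sketched in plain Python. All inputs, weights, and the bias below are hypothetical values chosen only for illustration:

```python
import math

def sigmoid(z):
    """A basic activation function; squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_activation(inputs, weights, bias):
    """a = σ(Σ w_i · x_i + b): multiply each incoming activation by its
    weight, sum them, add the bias, then apply the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Two incoming activations with hypothetical weights and bias.
print(perceptron_activation([1.0, 0.5], [0.4, -0.2], 0.1))
```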

There are several types of activation functions, each with its own specialties. The Sigmoid Function (σ), with its characteristic S-shaped curve, is the most basic one; the Rectified Linear Unit (ReLU) is the most popular.
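A minimal sketch of these two functions in Python, with a few sample inputs chosen only to show their different shapes:

```python
import math

def sigmoid(z):
    # S-shaped curve: steep around z = 0, saturating toward 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Rectified Linear Unit: zero for negative inputs, identity otherwise.
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  relu={relu(z):.1f}")
```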

Each perceptron propagates its value to the perceptrons in the following layer, eventually arriving at the output layer. This process is called **Forward Propagation**.
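The forward pass can be sketched for a tiny, fully-connected network. The layer sizes (2 inputs, 3 hidden perceptrons, 1 output) and every weight and bias below are hypothetical, chosen only to make the mechanics concrete:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(layers, inputs):
    """Propagate activations layer by layer.

    `layers` is a list of (weights, biases) pairs, where `weights`
    holds one row of incoming weights per perceptron in that layer.
    """
    a = inputs
    for weights, biases in layers:
        # Each perceptron: weighted sum of the previous layer's
        # activations, plus its bias, through the activation function.
        a = [sigmoid(sum(w * x for w, x in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a

# A 2-3-1 network with hypothetical parameters.
layers = [
    ([[0.1, 0.4], [0.3, -0.2], [-0.5, 0.2]], [0.0, 0.1, -0.1]),  # hidden
    ([[0.6, -0.4, 0.2]], [0.05]),                                # output
]
print(forward(layers, [1.0, 0.5]))
```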

**Back Propagation & Cost Functions**

Forward Propagation is based on the weights, the biases, and the activation function, but what determines these values? The activation function is selected beforehand, but for a large neural network it would be impossible to select appropriate weights and biases manually.

In the field of Machine Learning, models are supposed to “learn” from the data on their own; this learning process is also called “training.” Usually, the data is split into two different sets: the **Training Set** and the **Test Set**. The Training Set is used to “train” the model into a more mature state, and then its performance is evaluated on the Test Set.
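A common way to make this split is to shuffle the data and hold out a fraction for testing. The function name and the 80/20 ratio below are my own choices for illustration, not from the article:

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data, then hold out a fraction for testing."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))
train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))
```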

There are many different methods to “train” an Artificial Neural Network, but the most popular one is with **Back Propagation**.

Before Back Propagation, the weights and biases of the neural network are usually initialized randomly, drawn from a normal distribution. The neural network then performs a Forward Propagation. Since the weights and biases were initialized at random, the result of this first Forward Propagation is usually way off. A **Cost Function** is then used to calculate the difference between the expected result and the output of the neural network. Once calculated, this difference is used to adjust the weights and biases of the previous layer. The process propagates backward through the layers, and thus it is called “Back Propagation.”
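As an illustrative sketch rather than the full algorithm, the following trains a single sigmoid perceptron on the AND gate by gradient descent: parameters are initialized from a normal distribution, a forward pass is run, a squared-error cost measures the difference from the target, and the chain rule turns that difference into weight and bias updates. With only one layer there is nothing to propagate further back, but the update rule shown here is the same one Back Propagation applies layer by layer. The learning rate and epoch count are hypothetical choices:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset: the AND gate (linearly separable, so one perceptron suffices).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

rng = random.Random(0)
w = [rng.gauss(0.0, 1.0) for _ in range(2)]  # weights: random normal init
b = rng.gauss(0.0, 1.0)                      # bias: random normal init
lr = 5.0                                     # learning rate (hypothetical)

for _ in range(5000):
    for x, y in data:
        # Forward pass.
        a = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Chain rule for the cost C = 0.5 * (a - y)**2:
        # dC/dz = (a - y) * a * (1 - a), where z is the weighted sum.
        delta = (a - y) * a * (1 - a)
        # Gradient-descent update of each weight and the bias.
        w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
        b -= lr * delta

for x, y in data:
    a = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    print(x, "->", round(a, 3), "target:", y)
```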

Here is a more formal tutorial on Back Propagation, as it requires some advanced math to explain fully. An explanation of neural networks with code examples can be found here, where the author uses matrix operations to simulate a neural network in Python.