Deep Learning — Beginner's Guide

Neural Networks

Neuron

Neural networks are inspired by the architecture of the human brain, and the neuron is their basic building block. It functions much like a biological neuron: it takes inputs, processes them, and produces an output.

Weights

When an input enters a neuron, it is multiplied by a weight. For instance, if a neuron has 2 inputs, then each input has a weight associated with it. These weights are randomly initialized and later updated during model training. A weight of zero signifies that the particular feature has no influence on the output.

For example, consider an input a and a weight w1. Once it passes through the node, it becomes a*w1. So for two inputs of value 1 with associated weights 0.8 and 0.2, the weighted inputs are 1*0.8 = 0.8 and 1*0.2 = 0.2.

Bias

Bias is another component applied to the input along with the weights. It is added to the weighted input to shift its range.

a*w1 + bias
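
As a quick, minimal sketch in Python with NumPy: using the weights 0.8 and 0.2 from the example above and a made-up bias of 0.5, the neuron's pre-activation value is just the weighted sum of its inputs plus the bias.

```python
import numpy as np

# Illustrative numbers only: two inputs, their weights, and a bias.
inputs = np.array([1.0, 1.0])    # a1 and a2
weights = np.array([0.8, 0.2])   # w1 and w2, normally random at first
bias = 0.5

# a1*w1 + a2*w2 + bias
pre_activation = np.dot(inputs, weights) + bias
print(pre_activation)  # 1.0*0.8 + 1.0*0.2 + 0.5 = 1.5
```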

Activation Functions

These functions decide whether a neuron should be activated or not, based on the weighted sum of its inputs plus the bias. The major goal of an activation function is to introduce non-linearity into the output of a neuron.

A neural network without activation functions is essentially a linear regression model. The activation function performs a non-linear transformation of the input, allowing the network to learn and perform complex tasks.

Some of the popular types of activation functions –

  • Sigmoid or Logistic
  • Tanh
  • ReLU

Sigmoid Activation Function — f(x) = 1 / (1 + exp(-x))

It ranges between 0 and 1 and has an S-shaped curve that is easy to understand and apply. However, it has fallen out of popularity because of the vanishing gradient problem, and because its output is not zero-centered, which can push gradient updates in different directions and make optimization harder.
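
A direct NumPy sketch of the formula, with a few made-up sample inputs to show the squashing behaviour:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs go toward 0, large positive inputs toward 1.
print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```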

Hyperbolic Tangent Activation Function (Tanh): f(x) = (1 - exp(-2x)) / (1 + exp(-2x))

It ranges between -1 and 1. The output of the Tanh activation function is zero-centered, which overcomes that issue of the sigmoid activation function. Hence, optimization is easier, and it is often chosen over the sigmoid function.

However, it also suffers from a vanishing gradient problem.
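
The same kind of NumPy sketch, written directly from the formula above (it is numerically equivalent to np.tanh):

```python
import numpy as np

def tanh(x):
    """Tanh activation: squashes any real number into (-1, 1), zero-centered."""
    return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))

# Outputs are centered around 0, unlike the sigmoid.
print(tanh(np.array([-2.0, 0.0, 2.0])))  # ~[-0.964, 0.0, 0.964]
```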

ReLU — Rectified Linear Unit: R(x) = max(0, x), i.e., if x < 0, R(x) = 0, and if x >= 0, R(x) = x.

It is the most widely used activation function since it is simple and efficient.

Almost every deep learning model uses ReLU these days due to its ability to mitigate the vanishing gradient problem. However, ReLU has its own issue: for negative inputs its gradient is zero, so a weight update can push a neuron into a region where it is never activated on any data point, resulting in a dead neuron.

To overcome this issue of dying neurons, a modified version was introduced: Leaky ReLU. It introduces a small slope for negative inputs to keep the updates alive.
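
A NumPy sketch of both; the 0.01 negative slope used here is a common default for Leaky ReLU, not part of the definition:

```python
import numpy as np

def relu(x):
    """ReLU: passes positive values through, zeroes out negatives."""
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: a small slope for negative inputs keeps gradients alive."""
    return np.where(x > 0, x, slope * x)

x = np.array([-3.0, 0.0, 2.0])
print(relu(x))        # [ 0.    0.    2. ]
print(leaky_relu(x))  # [-0.03  0.    2. ]
```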

Architecture

Input layer

The nodes in the input layer do not modify the data; in most cases no computations are performed in this layer, and the nodes simply pass the information on to the next layer. In contrast, the nodes in the hidden and output layers are active, and most data modifications are performed in those layers. The variables in the input nodes (x1, x2, x3, …, xn) hold the data to be evaluated. The data can be pixel values of an image, stock prices, or the output of some other algorithm, such as features used by a classifier in cancer detection (e.g., diameter, edge sharpness).

Hidden layer

All values from the input layer are passed on to the hidden layer, where a fully connected structure is formed. Computations are carried out in this layer: each value arriving from the previous layer is multiplied by a weight, the weighted inputs are added to produce a single value, and that value is then passed through an activation function.

Output Layer

The outputs from the hidden layers are passed to the output layer, where the active nodes combine and modify the data to produce the output values of the network.

A neural network can have any number of layers and any number of nodes per layer. Most applications use a three-layer structure with at most a few hundred input nodes. The hidden layer is usually around 10% of the size of the input layer.
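
Putting the layers together, here is a minimal sketch of one forward pass through a small fully connected network; the layer sizes (3 inputs, 4 hidden nodes, 2 outputs), the random weights, and the choice of sigmoid activations are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

x = np.array([0.5, 0.1, 0.9])   # 3 input nodes (made-up values)
W1 = rng.normal(size=(3, 4))    # input -> hidden weights, randomly initialized
b1 = np.zeros(4)                # hidden-layer biases
W2 = rng.normal(size=(4, 2))    # hidden -> output weights
b2 = np.zeros(2)                # output-layer biases

hidden = sigmoid(x @ W1 + b1)   # weighted sums + bias, then activation
output = sigmoid(hidden @ W2 + b2)  # same pattern in the output layer
print(output)                   # 2 output values, each in (0, 1)
```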

However, a neural network does not work exactly like our brain; the connections between biological neurons are much more complicated than those implemented by an ANN.

The human brain is much more complex, and there is much more to learn from it. There are many things we do not know about the human brain, which makes it hard to understand how we should model an artificial brain that reasons at a human level. Whenever we train a neural network, we want our model to learn the optimal weights (w) that best predict the required outcome (y) given the inputs (x).
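
As a minimal sketch of what "learning the weights" means in practice, here is plain gradient descent on a single linear neuron with a mean-squared-error loss. The toy data, learning rate, and number of steps are made up for illustration; real networks use the same idea via backpropagation across all layers.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: targets generated from known weights (0.8, 0.2) and bias 0.5.
X = rng.normal(size=(100, 2))          # inputs x: 100 samples, 2 features
true_w = np.array([0.8, 0.2])
y = X @ true_w + 0.5                   # required outcomes y

w = np.zeros(2)                        # weights to be learned
b = 0.0                                # bias to be learned
lr = 0.1                               # learning rate

for _ in range(200):
    pred = X @ w + b                   # forward pass
    error = pred - y
    grad_w = X.T @ error / len(X)      # gradient of MSE w.r.t. weights
    grad_b = error.mean()              # gradient of MSE w.r.t. bias
    w -= lr * grad_w                   # gradient-descent update
    b -= lr * grad_b

print(w, b)                            # approaches [0.8, 0.2] and 0.5
```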