Original article can be found here (source): Deep Learning on Medium
ARTIFICIAL NEURAL NETWORK
In this work, we concentrate on artificial neural networks (ANNs), which are a useful tool in machine learning and data mining. They are largely inspired by how neurons work in the brain. To understand what an ANN does, the idea behind biological neural networks (BNNs) should first be clarified. Each biological neuron has dendrites, which receive information from other neurons and transmit signals to the cell body. The cell body is where the signals from the dendrites are combined and passed on. Finally, the axon sends the resulting signal to other neurons, and the process repeats. Similarly, each artificial neuron acts as a function that takes weighted inputs from other neurons, just as dendrites do, performs a processing step like the cell body, and produces an output for other artificial neurons to use. Since biological neural networks are far too complicated to simulate and understand in full, ANNs mimic only their most basic parts. The representation of a simple biological neuron and its equivalent functions in an artificial neuron can be seen in Fig. 1.
Basically, an ANN has three components: weights, a bias and an activation function, as formulated in Eq. 1, where b is the bias, w is the weight matrix, x is the input and f() is the activation function. Their schematic representation in an ANN can be seen in Fig. 2.
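As a rough illustration of how the three components fit together, here is a minimal sketch of a single artificial neuron. It assumes Eq. 1 has the usual form y = f(w · x + b); the function names neuron and sigmoid and the example values are made up for illustration.

```python
import math

def sigmoid(z):
    """Sigmoid activation, used here as one possible choice for f()."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, f):
    """Assumed form of Eq. 1: weighted sum of the inputs plus bias, passed through f."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)

# Illustrative values only (not taken from the article).
x = [1.0, 2.0]      # inputs
w = [0.5, -0.25]    # one weight per input
b = 0.0             # bias
y = neuron(x, w, b, sigmoid)
print(y)
```

Here the weighted sum is 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0, so the neuron outputs sigmoid(0).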
The first main component is the weight, which represents the strength of the connection between units; it is w in Eq. 1 and W in Fig. 2. It determines how much influence an input has on the output value. The main purpose of training a neural network is to find the weights that minimize the loss, or error. Initially, we select the weights randomly. Then we train the network by repeatedly adjusting the weights, penalizing those that contribute to the error, until the outputs are close to the desired ones.
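The idea of starting from a random weight and adjusting it to reduce the error can be sketched as follows. This is only an illustration under stated assumptions: the article does not name a training algorithm or loss, so plain gradient descent on a squared error is assumed here, and the function train, the learning rate and the toy data are hypothetical.

```python
import random

def train(data, lr=0.05, epochs=500, seed=0):
    """Fit y = w * x + b by gradient descent (assumed method, squared-error loss)."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # weight is initialised randomly
    b = 0.0
    for _ in range(epochs):
        for x, target in data:
            error = (w * x + b) - target
            # Gradient of 0.5 * error**2 is error * x for w and error for b.
            w -= lr * error * x
            b -= lr * error
    return w, b

# Toy data generated from y = 2x + 1 (illustrative only).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train(data)
print(w, b)
```

After training, w and b should be close to the values 2 and 1 that generated the toy data.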
The second main part of an ANN is the bias, which is b in Eq. 1 and Bias in Fig. 2. We can think of the bias as the intercept in a linear equation. It acts as a constant that shifts the weighted sum, which helps the model fit the given data and makes it more flexible.
The last part of an ANN is the activation function, which is f() in Eq. 1. It is also called the transfer function. It decides whether, and how strongly, a neuron is activated, based on the weighted sum plus the bias. Activation functions introduce non-linearity to neural networks: without them, an ANN, however many layers it has, acts as a linear model, much like linear regression. Rectified Linear Unit (ReLU), tanh and sigmoid are the most common activation functions. The role of the activation function in an ANN can be seen in Fig. 2.
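To see why the activation function matters, the sketch below stacks two single-input layers with and without a non-linearity. The weights and biases are arbitrary illustrative numbers and the helper two_layers is hypothetical; with no activation (the identity function) the two layers reduce to one linear expression, while ReLU breaks that linearity.

```python
def relu(z):
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)

def identity(z):
    """Stands in for 'no activation function'."""
    return z

def two_layers(x, f):
    """Two stacked one-input neurons with arbitrary illustrative weights."""
    h = f(2.0 * x + 1.0)      # first layer
    return f(-3.0 * h + 0.5)  # second layer

for x in [-1.0, 0.0, 1.0]:
    # Without activation the result always equals the single line -6x - 2.5.
    print(x, two_layers(x, identity), -6.0 * x - 2.5, two_layers(x, relu))
```

For any straight-line function g, g(1) + g(-1) equals 2 * g(0); the ReLU version violates this, which shows it is no longer a linear model.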
Tanh and sigmoid are commonly used as activation functions in recurrent neural networks, which will be explained in the RNN part. The output of the tanh function lies in (-1, 1) and that of the sigmoid function in (0, 1). The gradient of tanh ranges between 0 and 1, and the gradient of sigmoid ranges between 0 and 0.25. These gradient values will play an important role when the vanishing gradient problem is explained in the Vanishing Gradient Problem part.
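The gradient ranges quoted above can be checked numerically with a small sketch. It uses the standard derivative formulas tanh'(z) = 1 - tanh(z)^2 and sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)); the sampling grid over [-5, 5] is an arbitrary choice for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)**2."""
    return 1.0 - math.tanh(z) ** 2

# Sample points from -5.0 to 5.0 in steps of 0.1 (arbitrary grid).
zs = [i / 10.0 for i in range(-50, 51)]
max_tanh_grad = max(tanh_grad(z) for z in zs)
max_sigmoid_grad = max(sigmoid_grad(z) for z in zs)
print(max_tanh_grad, max_sigmoid_grad)
```

Both gradients peak at z = 0 and shrink toward 0 as the input moves away from it in either direction.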