Source: Deep Learning on Medium
One of the biggest tasks in machine learning is classification. We need to create a boundary that can separate two (or more) kinds of instances. For example, we may want to build a classifier that predicts that these 20 persons will default and the rest will not. Mathematically, we are trying to identify the boundary in n-dimensional space (the feature space) that separates one class (defaulters) from the other class (non-defaulters).
Every point in the plot represents a person (an instance, or row). Every point has three features (income, age and professional experience in years), so the feature space is three-dimensional (X, Y, Z). Colour marks the class: blue points are non-defaulters, whereas red points are defaulters.
Our aim is to identify the boundary that separates the classes (blue and red, that is, non-defaulters and defaulters). Here we can easily imagine a simple plane that separates them. Mathematically, the plane is aX + bY + cZ + d = 0, where the coefficients a, b, c and d are to be found.
In other words, by using this boundary as a classifier, we can predict the class for any new point (X, Y, Z). The rule is: a point for which aX + bY + cZ + d > 0 belongs to one class (blue here); otherwise it belongs to the other class (red here).
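The decision rule above can be sketched in a few lines of Python. The coefficient values and the two sample points are hypothetical, chosen only for illustration; in practice a, b, c and d would be learned from data.

```python
def classify(x, y, z, a, b, c, d):
    """Return 'blue' if the point is on the positive side of the plane, else 'red'."""
    score = a * x + b * y + c * z + d
    return "blue" if score > 0 else "red"

# Hypothetical coefficients (assumed for illustration, not taken from the article).
A, B, C, D = 1.0, 0.5, 0.5, -20.0

# X = income, Y = age, Z = professional experience in years (made-up values).
print(classify(30.0, 40.0, 15.0, A, B, C, D))  # score = 37.5, so 'blue'
print(classify(3.0, 22.0, 1.0, A, B, C, D))    # score = -5.5, so 'red'
```

A point lying exactly on the plane (score equal to 0) falls into the "otherwise" branch, so this sketch labels it red.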
In real life, there is always non-linearity: the relationship bends at extreme values of X, Y and Z. For example, if your income is more than 25 lakh INR per annum, the chance of default is very low. The chance of default varies noticeably between incomes of 7 lakh and 10 lakh, but hardly at all between 32 lakh and 35 lakh. That is non-linearity.
(lakh: 100 thousand, INR: Indian Rupee)
So, to introduce non-linearity, we use activation functions (sigmoid, hyperbolic tangent).
For the sake of simplicity, we now predict a class (Z) from a two-dimensional feature space (X and Y). As mentioned above, most boundaries in real-life problems are non-linear, and we need activation functions to model them.
The sigmoid function maps any value to a value between 0 and 1, whereas the hyperbolic tangent maps any value to a value between -1 and 1. ReLU, Leaky ReLU and ELU are more recent activation functions.
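As a rough sketch, the activation functions named above can be written with Python's standard `math` module. The default `slope` and `alpha` values are common choices assumed here, not values given in the article. The last lines illustrate the "bending at extreme values" idea: equal-sized steps in the input change the sigmoid output a lot near the centre and very little far from it.

```python
import math

def sigmoid(v):
    """Map any real value into (0, 1); written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def tanh(v):
    """Map any real value into (-1, 1)."""
    return math.tanh(v)

def relu(v):
    """Pass positive values through, clip negative values to 0."""
    return max(0.0, v)

def leaky_relu(v, slope=0.01):
    """Like ReLU, but negative values keep a small slope (assumed default 0.01)."""
    return v if v > 0 else slope * v

def elu(v, alpha=1.0):
    """Like ReLU, but negative values decay smoothly towards -alpha."""
    return v if v > 0 else alpha * (math.exp(v) - 1.0)

# Same step size (3 units) in the input, very different change in the output.
change_near_centre = sigmoid(1.5) - sigmoid(-1.5)
change_at_extreme = sigmoid(8.0) - sigmoid(5.0)
print(change_near_centre, change_at_extreme)
```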
So the boundary changes from a plane to a non-linear surface that separates the classes.
Congratulations! You have understood the relevance of activation functions, non-linearity and boundaries.
Now let us go to another level of understanding.
Here we can see how the values change in each layer. We just need a functional approximation to the boundary that separates the classes. That approximation is generally a composition of linear functions with non-linear (activation) functions.
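To make "linear function, then activation, layer by layer" concrete, here is a minimal forward pass for a tiny network with two inputs (X and Y), three hidden units and one output. The layer sizes, the weights `W1`, `B1`, `W2`, `B2` and the helper names are all hypothetical, picked only to show how values flow through the layers.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense(inputs, weights, biases, activation):
    """One layer: every unit computes activation(w . inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical weights and biases (assumed, untrained values).
W1 = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]  # 3 hidden units, 2 inputs each
B1 = [0.0, 0.0, -0.5]
W2 = [[1.0, 1.0, -1.0]]                      # 1 output unit, 3 inputs
B2 = [-0.5]

def forward(x, y):
    """Return the hidden-layer values and the final output for a point (x, y)."""
    hidden = dense([x, y], W1, B1, math.tanh)  # linear step, then tanh squashing
    output = dense(hidden, W2, B2, sigmoid)    # linear step, then sigmoid squashing
    return hidden, output[0]

hidden, prob = forward(0.2, 0.9)
print(hidden, prob)
```

The output lies between 0 and 1, so it can be read as a score for one of the two classes; thresholding it (for example at 0.5) gives the non-linear boundary.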
Linear functions of X and Y are formed in the next layer. Depending on the coefficients a, b, c and d (initial values at first, optimal values after tuning) and the bias values, different functions/graphs are produced as shown, and the activation functions then squash those values. Initially we do not know the optimal values of the coefficients that make the final function approximate the boundary between the classes. That is why we need backpropagation: it tunes the coefficients (weights) towards their optimal values.
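The tuning step can be sketched as follows, assuming a 2-2-1 network (tanh hidden units, sigmoid output), a cross-entropy loss, plain gradient descent and an XOR-style toy dataset. None of these choices come from the article; the starting weights, the learning rate and the number of steps are likewise hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Toy data that no straight line can separate (assumed for illustration).
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def init_params():
    # Hypothetical, deliberately asymmetric starting weights.
    return {"W1": [[0.5, -0.4], [0.3, 0.8]], "b1": [0.1, -0.2],
            "W2": [0.7, -0.6], "b2": 0.05}

def forward(p, x):
    h = [math.tanh(p["W1"][j][0] * x[0] + p["W1"][j][1] * x[1] + p["b1"][j])
         for j in range(2)]
    out = sigmoid(p["W2"][0] * h[0] + p["W2"][1] * h[1] + p["b2"])
    return h, out

def loss(p):
    """Mean cross-entropy over the toy dataset."""
    total = 0.0
    for x, t in DATA:
        _, out = forward(p, x)
        total -= t * math.log(out) + (1.0 - t) * math.log(1.0 - out)
    return total / len(DATA)

def train_step(p, lr):
    """One backpropagation pass over DATA followed by a gradient-descent update."""
    g = {"W1": [[0.0, 0.0], [0.0, 0.0]], "b1": [0.0, 0.0], "W2": [0.0, 0.0], "b2": 0.0}
    for x, t in DATA:
        h, out = forward(p, x)
        d_out = (out - t) / len(DATA)  # gradient at the output pre-activation
        g["b2"] += d_out
        for j in range(2):
            g["W2"][j] += d_out * h[j]
            d_h = d_out * p["W2"][j] * (1.0 - h[j] ** 2)  # chain rule through tanh
            g["b1"][j] += d_h
            g["W1"][j][0] += d_h * x[0]
            g["W1"][j][1] += d_h * x[1]
    p["b2"] -= lr * g["b2"]
    for j in range(2):
        p["W2"][j] -= lr * g["W2"][j]
        p["b1"][j] -= lr * g["b1"][j]
        for i in range(2):
            p["W1"][j][i] -= lr * g["W1"][j][i]

params = init_params()
loss_before = loss(params)
for _ in range(200):
    train_step(params, lr=0.1)
loss_after = loss(params)
print(loss_before, loss_after)
```

The point of the sketch is only that each step nudges the weights in the direction that lowers the loss; a network this small is not guaranteed to separate the toy data perfectly.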
Hope you have enjoyed this visual understanding of neural networks.