Source: Deep Learning on Medium
What is the activation function?
The activation function decides whether the weighted sum of a neuron's inputs should be fired (passed on) or not. Is it possible to do this without an activation function? It is, but the network would then reduce to a linear transformation, essentially linear regression, which can never solve complex tasks. Herein lies the advantage of the activation function: it introduces nonlinearity, which lets the network handle more complex tasks like language translation and image classification.
Let’s explore 4 common types of activation functions: threshold, sigmoid, rectifier, and hyperbolic tangent function (tanh).
1. Threshold function:
In the figure below, the threshold function has the weighted sum on the x-axis and values of 0 or 1 on the y-axis. You'll notice that if the weighted sum is greater than 0, the signal is fired; if it's not, the signal isn't passed. As such, it's basically a yes-or-no function. But can you see any problem with this type of function?
Let's consider a situation in which we have more than one neuron, each representing a different class. If several of them activate, they all output 1, so how can we distinguish or rank them? The main problem with this type of function is classification: we need graded (analog) activation values rather than a hard 0 or 1. Let's see whether the next function solves this problem.
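A minimal sketch of the threshold (step) function described above, following the article's convention that the signal fires only when the weighted sum is strictly greater than 0 (the function name is my own):

```python
def threshold(weighted_sum):
    # Step activation: fire (output 1) if the weighted sum is positive,
    # otherwise don't pass the signal (output 0).
    return 1 if weighted_sum > 0 else 0
```

Note how the hard 0/1 output discards all magnitude information, which is exactly the classification problem discussed above: two neurons with very different weighted sums can produce the identical output 1.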
2. Sigmoid function:
We can say that the sigmoid is an analog activation function: in its steep region around 0, any change in the x value (the weighted sum) causes a significant change in the output, which makes it very good for classification outputs. Another advantage is that its output is bounded between 0 and 1, so it can be used for predicting probabilities, since we don't want our model to predict a probability below 0 or above 1. It does have disadvantages, two of which are that its outputs are not zero-centered and that it is computationally expensive.
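A sketch of the sigmoid from its standard definition, 1 / (1 + e^(-x)), showing the bounded (0, 1) output range mentioned above:

```python
import math

def sigmoid(x):
    # Squashes any real-valued weighted sum into (0, 1);
    # steepest (most sensitive) around x = 0.
    return 1.0 / (1.0 + math.exp(-x))
```

Because the output never leaves (0, 1), it can be read directly as a probability, but note that sigmoid(0) = 0.5, not 0, so the outputs are not zero-centered.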
3. Rectifier function:
This function returns the weighted sum unchanged if it is greater than 0 (positive values); otherwise, the output is 0. One problem with this function is that its range runs from 0 to infinity, meaning it is unbounded above; on the other hand, it is much cheaper to compute than the sigmoid function.
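The rectifier (commonly called ReLU) can be sketched in one line, which is why it is so much cheaper than the sigmoid: no exponentials, just a comparison:

```python
def relu(weighted_sum):
    # Passes positive weighted sums through unchanged; clamps negatives to 0.
    # Unbounded above: relu(x) grows without limit as x grows.
    return max(0.0, weighted_sum)
```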
4. Hyperbolic tangent function (tanh):
This function looks like the sigmoid, but the difference is that it also goes below 0. Its range is from -1 to 1, which means it's zero-centered, making it easier to model inputs that have strongly negative, neutral, and strongly positive values.
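A sketch of tanh from its definition, (e^x - e^(-x)) / (e^x + e^(-x)); Python's standard library also provides it directly as `math.tanh`:

```python
import math

def tanh(x):
    # Hyperbolic tangent from its definition: sigmoid-shaped,
    # but zero-centered with range (-1, 1).
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))
```

Unlike the sigmoid, tanh(0) = 0, which is what makes it zero-centered: strongly negative inputs map near -1, neutral inputs near 0, and strongly positive inputs near 1.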
If you're confused about which function to use, the answer depends on which function is faster in your case, since a cheaper function leads to faster training. If you want to classify inputs, the sigmoid function is a good choice for the output.
For example, in the next figure, we have some inputs. We apply the rectifier activation function on the hidden layer, then apply the sigmoid function to get the output. In general, we need to choose the activation function according to the task at hand.
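The combination described above, a rectifier on the hidden layer and a sigmoid on the output, can be sketched as a tiny forward pass. The layer sizes and random weights here are purely illustrative, not from the article's figure:

```python
import numpy as np

def relu(z):
    # Rectifier applied element-wise to the hidden layer.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes the final weighted sum into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical shapes: 3 input features, 4 hidden units, 1 output.
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

hidden = relu(W1 @ x + b1)          # rectifier on the hidden layer
output = sigmoid(W2 @ hidden + b2)  # sigmoid output, usable as a probability
```

The hidden activations are guaranteed non-negative (the rectifier clamps them), and the final output always lies strictly between 0 and 1, matching the role of each function discussed above.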