Different Types of Activation Functions

Original article can be found here (source): Deep Learning on Medium

Sigmoid, tanh, ReLU & Leaky ReLU

Sigmoid

The sigmoid function's range is (0, 1): its output approaches 0 and 1 but never reaches either value.

Advantages

It is well suited to the output layer in binary classification, because its output lies strictly between 0 and 1 and can be read as a probability.

Drawbacks

It is often avoided in hidden layers because, as x becomes large in magnitude (positive or negative), the curve flattens and the gradient becomes very small, which can slow the learning of our model.
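To make the shrinking gradient concrete, here is a minimal sketch in plain Python. It assumes the standard logistic definition sigmoid(x) = 1 / (1 + e^-x), whose derivative is sigmoid(x) * (1 - sigmoid(x)). The function names are illustrative and do not come from any particular library.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written so math.exp never overflows for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Print the output and the gradient as x moves away from zero.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.5f}")
```

The gradient peaks at 0.25 when x = 0 and falls toward zero as x grows in either direction, which is the slowdown described above.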

tanh

The tanh function's range is (-1, 1).

Advantages

This is often preferred over the sigmoid function in hidden layers because it is steeper near zero (its maximum gradient is 1, versus 0.25 for the sigmoid), so for small input values the gradient is larger and learning is faster.

Drawbacks

Because its output ranges between -1 and 1, it cannot be used directly as the activation function of the last layer for binary classification, where we need an output between 0 and 1.

Moreover, just like the sigmoid function, its gradient becomes very small for inputs of large magnitude, which again slows learning.
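The comparison with the sigmoid can be checked numerically. The sketch below uses only Python's standard math module and assumes the usual derivative formulas: 1 - tanh(x)^2 for tanh and s(x) * (1 - s(x)) for the sigmoid. The helper names are illustrative.

```python
import math


def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t


def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic sigmoid (fine for the moderate x used here)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


# Compare the two gradients near zero and far from zero.
for x in (0.0, 0.5, 2.0, 5.0):
    print(f"x={x:4.1f}  tanh'={tanh_grad(x):.5f}  sigmoid'={sigmoid_grad(x):.5f}")
```

Near zero the tanh gradient is several times larger than the sigmoid gradient, but both shrink toward zero as x grows, so tanh shares the same saturation problem.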

ReLU

Advantages

Nowadays, this is the most commonly used activation for hidden layers: its gradient is 1 for every positive input, so it stays significant and does not shrink as the input grows large.

Drawbacks

For negative values, the gradient drops to 0, so the affected units receive no weight updates, which can slow learning significantly. ReLU therefore works best when most of the input values for a given layer are positive.
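As a minimal sketch, assuming the usual definition relu(x) = max(0, x), the function and its gradient can be written in a few lines of plain Python. The gradient at exactly x = 0 is undefined mathematically; returning 0 there is a convention chosen for this example.

```python
def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs through, clamps the rest to 0."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Gradient of ReLU: 1 for positive x, 0 otherwise (0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


# The gradient stays at 1 for any positive input and is 0 for negative inputs.
for x in (-5.0, -0.1, 0.5, 50.0):
    print(f"x={x:5.1f}  relu={relu(x):5.1f}  gradient={relu_grad(x):.1f}")
```

Unlike the sigmoid and tanh, the gradient does not shrink for large positive inputs, but it is exactly zero on the whole negative side.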

Leaky ReLU

Advantages

This is a modified version of the ReLU function.

For negative values, the function keeps a small non-zero slope, which ensures that the gradient never drops to zero, so the affected units keep learning.

Drawbacks

This is not a major drawback, but the slope of the function in the negative region is an extra hyperparameter that sometimes needs to be fine-tuned.
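A minimal sketch of the idea in plain Python follows. The parameter name alpha and its default of 0.01 are assumptions made for illustration (0.01 is a commonly used value); the article does not prescribe a specific slope.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: identity for positive x, a small slope alpha for the rest."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Gradient of Leaky ReLU: 1 for positive x, alpha otherwise (never zero if alpha > 0)."""
    return 1.0 if x > 0 else alpha


# alpha is the negative-side slope that may need tuning (hypothetical values).
for alpha in (0.01, 0.1):
    print(f"alpha={alpha}: f(-2)={leaky_relu(-2.0, alpha):.3f}  "
          f"gradient={leaky_relu_grad(-2.0, alpha)}")
```

Because the negative-side gradient equals alpha rather than 0, a unit with negative inputs still receives small weight updates; the choice of alpha is the tuning decision mentioned above.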