Source: Deep Learning on Medium

In this ANN model that we’ll be looking at, I used the Rectifier and the Sigmoid function. How did I use both? Here’s the intuition:

Since my output variable is binary, I use the Rectifier function as the activation in my hidden layers, and then I use the Sigmoid function in the output layer to determine the probability that the output will be 1 rather than 0.
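To make the intuition concrete, here is a minimal sketch of the two activation functions and a single forward pass through one hidden layer. The weights, biases, and input values are hypothetical numbers chosen only for illustration, and the `forward` helper is my own name, not something from a library.

```python
import math

def relu(z):
    """Rectifier: passes positive values through and clips negatives to 0."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_weights, hidden_biases, out_weights, out_bias):
    """One hidden layer with ReLU units, then a single sigmoid output unit."""
    hidden = [
        relu(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    z_out = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return sigmoid(z_out)

# Hypothetical input and parameters (assumptions, not trained values)
x = [0.5, -1.2]
hidden_weights = [[0.4, 0.1], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1]
out_weights = [1.5, -2.0]
out_bias = 0.2

p = forward(x, hidden_weights, hidden_biases, out_weights, out_bias)
predicted_class = 1 if p >= 0.5 else 0  # threshold the probability at 0.5
```

The hidden layer only decides which signals get passed on; it is the sigmoid at the end that turns the result into something you can read as a probability.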

The difference between the actual value and the predicted value is generally measured by a cost function (error).
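As a rough sketch of what a cost function looks like in code, here is binary cross-entropy. This is an assumption on my part for the example: it is a common choice when the output is a sigmoid probability, but other cost functions (such as squared error) would fit the same description. The sample labels and predictions are made up.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average cost over all rows; lower means predictions are closer to the labels."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Made-up labels and two sets of predicted probabilities
y_true = [1, 0, 1]
good_predictions = [0.9, 0.1, 0.8]
bad_predictions = [0.2, 0.7, 0.4]

good_cost = binary_cross_entropy(y_true, good_predictions)
bad_cost = binary_cross_entropy(y_true, bad_predictions)
```

Predictions that sit close to the true labels give a small cost, and confident wrong predictions give a large one.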

The goal is to minimize the cost (loss) function, since this brings the predicted value closer to the actual value. This is usually done by adjusting the weights applied to the input variables. Sometimes it can take a lot of time and computational power to search for the global minimum of the cost function directly, so it makes sense to use a gradient descent approach to make this process much faster.

Gradient descent uses the slope of the loss function at a given point and moves downhill, step by step, to find the lowest point of the function. However, if my function is not convex (which is more likely with more degrees of freedom), I could end up at a local minimum rather than the global minimum of the function, and the network wouldn’t perform as well.
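Here is a small sketch of that local-minimum problem on a one-dimensional function. The function `f`, the learning rate, and the starting points are all assumptions I picked for the illustration; a real network's loss has many more dimensions.

```python
def f(x):
    """A non-convex function with two valleys of different depth."""
    return x**4 - 3 * x**2 + x

def slope(x):
    """Derivative of f, i.e. the slope at point x."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x, lr=0.01, steps=1000):
    """Repeatedly step against the slope, starting from x."""
    for _ in range(steps):
        x -= lr * slope(x)
    return x

x_from_right = gradient_descent(2.0)   # settles in the shallower valley
x_from_left = gradient_descent(-2.0)   # settles in the deeper valley
```

Both runs stop where the slope is zero, but only the one starting from the left reaches the deeper valley. Where you start decides where you end up.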

Therefore, I use the stochastic gradient descent method, which updates the weights after each individual row instead of after a pass over the whole dataset. The extra randomness in these updates can help it get out of local minima, so I have a better chance of finding the global minimum. Each update is also cheaper than in regular (batch) gradient descent, since it is computed from a single row rather than from all of them.
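A minimal sketch of the per-row update, using a single sigmoid unit and a tiny made-up dataset so it stays readable. The data, learning rate, number of epochs, and the cross-entropy cost are all assumptions for the example, not the settings used in this project.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(rows, w, b):
    """Average binary cross-entropy over all rows."""
    total = 0.0
    for x, y in rows:
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(rows)

def sgd(rows, lr=0.1, epochs=200, seed=0):
    """Stochastic gradient descent: update the weights after every single row."""
    rng = random.Random(seed)
    rows = list(rows)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(rows)          # visit the rows in a random order each epoch
        for x, y in rows:
            error = sigmoid(w * x + b) - y   # gradient of the cost w.r.t. the pre-activation
            w -= lr * error * x
            b -= lr * error
    return w, b

# Made-up rows of (feature, label)
rows = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)]

initial_loss = mean_loss(rows, 0.0, 0.0)
w, b = sgd(rows)
final_loss = mean_loss(rows, w, b)
```

Notice that the weights change inside the inner loop, once per row. Batch gradient descent would instead add up the errors over all rows and update once per epoch.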

# II. || The problem ||

This is a dataset of a firm’s customers. It’s not a real dataset, but it resembles real data. The aim is to build an ANN to predict whether a customer will leave the company or not, given certain demographic characteristics such as age, gender, salary, credit score, whether they are active or not, etc.

Thus, we can classify this problem as a demographic segmentation model.

Such a model could be used to predict anything, not just customer churn. You could also try to predict things like whether the customer should get a loan, or if the customer is more likely to purchase a new product. The only change would be to relabel the variables so that we’re predicting the right thing.

Below, you can see a workflow of the ANN model I created in this project. This includes all the steps required to build such a model. After you learn it, you can refer to this diagram to help you remember the steps.