Neural Network from Scratch

Source: Deep Learning on Medium

While playing with this nice tool, I thought it would be interesting to recreate it myself. Doing so helped me understand how neural networks work and how they learn the best parameters.

We do not use Keras, TensorFlow, or any other framework to build and train our network; it is built from scratch using our own functions. This is very useful for understanding the concepts and basic parameters of a neural network. We use sklearn only to generate a random dataset, and we write our own functions to train the network on it.

Please find the code on my GitHub 🙂

Let’s go!!

We start with a binary classification problem, so we generate a binary dataset using sklearn, specifically make_circles. We use n_samples=1000 (the total number of points generated), shuffle=True (whether to shuffle the samples), noise=0.09 (the standard deviation of the Gaussian noise added to the data) and factor=0.46 (the scale factor between the inner and outer circle).
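A minimal sketch of this step could look as follows. The four arguments come from the description above; random_state=0 and the reshape of the labels into a column vector are assumptions added here to make the example reproducible and to match a single output neuron.

```python
from sklearn.datasets import make_circles

# Parameters as described in the text; random_state is an added assumption
# so that the same points are generated on every run.
X, y = make_circles(
    n_samples=1000,
    shuffle=True,
    noise=0.09,
    factor=0.46,
    random_state=0,
)

# Reshape the labels into a column vector (assumption) so that they line up
# with the single-neuron output of the network.
Y = y.reshape(-1, 1)

print(X.shape, Y.shape)  # (1000, 2) (1000, 1)
```

Each row of X holds the two coordinates of a point, and Y holds 0 for the outer circle and 1 for the inner circle.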

Our neural network will have to learn to classify this data distribution:

Data to train our neural network

Neural Network

We need to build a class that holds the basic parameters of each layer. The basic unit of a neural network is called a perceptron, which is an algorithm for learning a binary classifier. If we take a look at it, we can see that there are 3 parameters:

Perceptron, basic unit of a neural network
  • w: weights
  • b: bias
  • activation_function: the activation function that converts the value z to z_act. It maps the resulting values z into a desired range, such as 0 to 1 or -1 to 1, depending on the choice of activation function.

Mathematically, we can say that the function of a perceptron is:

Perceptron function
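In its usual form, the perceptron computes z = w · x + b and then z_act = activation_function(z). The sketch below illustrates this with a sigmoid activation; the function name perceptron and the example values are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Squash z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def perceptron(x, w, b, activation_function=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = np.dot(x, w) + b
    z_act = activation_function(z)
    return z_act


# Illustrative values (assumptions): z = 0.5 * 2.0 + (-1.0) * 1.0 + 0.0 = 0.0
x = np.array([0.5, -1.0])
w = np.array([2.0, 1.0])
b = 0.0

print(perceptron(x, w, b))  # sigmoid(0.0) = 0.5
```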

The basic unit of a neural network is called a perceptron

So, if we combine many perceptrons (linear models), we build a neural network. Basically, the neural network combines linear models, each followed by an activation function, to create a non-linear model that can extract non-linear patterns from the data. This is the key idea behind neural networks.

Combining a lot of perceptrons, we build a neural network

Now that we have understood some basic concepts, we are going to create a class called layers to define the basic parameters of each layer.
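A sketch of such a class, holding the three parameters listed earlier, might look like this. The class is named Layer here, and the constructor arguments and the uniform initialization in [-1, 1] are assumptions; the original code may initialize the weights differently.

```python
import numpy as np


class Layer:
    """Holds the parameters of one fully connected layer."""

    def __init__(self, num_inputs, num_neurons, activation_function, rng=None):
        # rng is an optional NumPy random generator (assumption, for reproducibility).
        rng = np.random.default_rng() if rng is None else rng
        # One column of weights per neuron, one bias per neuron.
        # Uniform values in [-1, 1] are an assumed initialization scheme.
        self.w = rng.uniform(-1.0, 1.0, size=(num_inputs, num_neurons))
        self.b = rng.uniform(-1.0, 1.0, size=(1, num_neurons))
        self.activation_function = activation_function


hidden = Layer(
    num_inputs=2,
    num_neurons=8,
    activation_function=np.tanh,
    rng=np.random.default_rng(0),
)
print(hidden.w.shape, hidden.b.shape)  # (2, 8) (1, 8)
```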

A neural network combines linear models to create a non-linear model for extracting non-linear patterns in the data

Activation Functions

We define 2 types of activation functions: sigmoid and tanh. There are more activation functions we could use, but for now we start with these 2. We also define the derivative of each activation function, because we are going to use it in the back-propagation process.
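The two functions and their derivatives can be sketched with NumPy as follows. The function names are assumptions; the formulas are the standard ones.

```python
import numpy as np


def sigmoid(z):
    """Maps z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_derivative(z):
    """d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh(z):
    """Maps z into the range (-1, 1)."""
    return np.tanh(z)


def tanh_derivative(z):
    """d/dz tanh(z) = 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2


print(sigmoid(0.0), sigmoid_derivative(0.0))  # 0.5 0.25
print(tanh(0.0), tanh_derivative(0.0))        # 0.0 1.0
```

Both derivatives reach their maximum at z = 0 and shrink towards 0 for large positive or negative z.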

We can plot the sigmoid and tanh functions with their derivatives.

sigmoid and tanh functions with their derivatives.

Build Neural Network

Now that we have the activation functions and the basic parameters of each layer, we can build our architecture from both of them. We only need one more thing: we have to define how many layers, and how many neurons per layer, our neural network is going to have.

For this reason, we create an array called num_neurons_per_layers with the number of neurons in each layer. For instance:

num_neurons_per_layers = [input_attrib, 8, 16, 32, 16, 8, 1]

  • input_attrib = the number of features in our data. Each point in our binary classification dataset has 2 features.
  • hidden_layers = these layers are defined by: [8, 16, 32, 16, 8]
  • output_layer = the last layer gives us the prediction. As this is a binary problem, we only need 1 neuron in this layer.
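To see how this array translates into parameters, we can pair each layer size with the next one: every pair gives the shape of one weight matrix and one bias vector. This is only a sketch; storing each layer as a (w, b) tuple and initializing uniformly in [-1, 1] are assumptions made for brevity.

```python
import numpy as np

input_attrib = 2  # two features per point
num_neurons_per_layers = [input_attrib, 8, 16, 32, 16, 8, 1]

rng = np.random.default_rng(0)
network = []
# Pair consecutive sizes: (2, 8), (8, 16), (16, 32), (32, 16), (16, 8), (8, 1).
for n_in, n_out in zip(num_neurons_per_layers[:-1], num_neurons_per_layers[1:]):
    w = rng.uniform(-1.0, 1.0, size=(n_in, n_out))  # assumed initialization
    b = rng.uniform(-1.0, 1.0, size=(1, n_out))
    network.append((w, b))

for w, b in network:
    print(w.shape, b.shape)

# Total number of trainable values (weights plus biases).
num_parameters = sum(w.size + b.size for w, b in network)
print(num_parameters)  # 1385
```

Note that the 7 entries of the array produce 6 sets of parameters, because the first entry only describes the input.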


To train the neural network we need to define the error function, which calculates the difference between the output of the neural network, y_pred, and the real result, y_true. We use the mean squared error. We also define the derivative of this function, because we will use it in the back-propagation process.
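A sketch of the mean squared error and its derivative with respect to y_pred is shown below. The function names are assumptions. Some from-scratch implementations drop the constant factor 2/n from the derivative, which only rescales the learning rate; the exact form is kept here.

```python
import numpy as np


def mse(y_pred, y_true):
    """Mean of the squared differences between prediction and target."""
    return np.mean((y_pred - y_true) ** 2)


def mse_derivative(y_pred, y_true):
    """Gradient of mse with respect to each element of y_pred."""
    return 2.0 * (y_pred - y_true) / y_true.size


y_true = np.array([[0.0], [1.0]])
y_pred = np.array([[0.5], [1.0]])

print(mse(y_pred, y_true))             # (0.25 + 0.0) / 2 = 0.125
print(mse_derivative(y_pred, y_true))  # [[0.5], [0.0]]
```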

Basically, we split the training process in:

  • Forward: We take each layer in turn, calculate the equation explained above, and apply the activation function to the result z to get z_act.
  • Back-propagation: In this part we start at the end of the neural network, so we apply reversed to iterate from the last layer. For each layer we calculate the error of the current layer from the error of the layer that follows it. Then we apply gradient descent to update the weights of the neural network.
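The two phases above can be sketched in a single training step. This is an illustration under stated assumptions, not the original code: layers are stored as dictionaries, hidden layers use tanh and the output layer uses sigmoid, and the names init_network, forward and train_step are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_derivative(z):
    return 1.0 - np.tanh(z) ** 2


# Each entry maps a name to (function, derivative).
ACTIVATIONS = {"tanh": (np.tanh, tanh_derivative), "sigmoid": (sigmoid, sigmoid_derivative)}


def init_network(sizes, rng):
    """Assumed layout: tanh hidden layers, sigmoid output layer."""
    layers = []
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        act = "sigmoid" if i == len(sizes) - 2 else "tanh"
        layers.append({
            "w": rng.uniform(-1.0, 1.0, size=(n_in, n_out)),
            "b": rng.uniform(-1.0, 1.0, size=(1, n_out)),
            "act": act,
        })
    return layers


def forward(layers, X):
    """Return a list of (z, z_act) per layer; entry 0 holds the input."""
    cache = [(None, X)]
    for layer in layers:
        f, _ = ACTIVATIONS[layer["act"]]
        z = cache[-1][1] @ layer["w"] + layer["b"]
        cache.append((z, f(z)))
    return cache


def train_step(layers, X, Y, learning_rate):
    """One forward pass, one backward pass, one gradient descent update."""
    cache = forward(layers, X)
    y_pred = cache[-1][1]
    # Derivative of the mean squared error with respect to y_pred.
    grad_act = 2.0 * (y_pred - Y) / Y.size
    for i in reversed(range(len(layers))):
        layer = layers[i]
        z, _ = cache[i + 1]
        a_prev = cache[i][1]
        _, f_prime = ACTIVATIONS[layer["act"]]
        delta = grad_act * f_prime(z)
        # Propagate the error to the previous layer before updating the weights.
        grad_act = delta @ layer["w"].T
        layer["w"] -= learning_rate * (a_prev.T @ delta)
        layer["b"] -= learning_rate * delta.sum(axis=0, keepdims=True)
    # Loss measured before this update.
    return np.mean((y_pred - Y) ** 2)


# Tiny illustrative dataset (assumption), not the circles data.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [0.0]])
network = init_network([2, 8, 1], np.random.default_rng(0))
for epoch in range(5):
    loss = train_step(network, X, Y, learning_rate=0.03)
    print(epoch, loss)
```

The error passed to the previous layer is computed with the weights as they were before the update, so that every layer's gradient refers to the same forward pass.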

The equations for back-propagation can be found here.

We use a loop to train the neural network for 2000 epochs with learning_rate = 0.03. We plot the results of the neural network as a heatmap to see how they change from epoch to epoch.
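One way to prepare the data for such a heatmap is to evaluate the network on a regular grid of points covering the dataset. The sketch below assumes a hypothetical predict callable that maps an array of points to one value per point; dummy_predict is only a stand-in for the trained network, and the grid range and resolution are arbitrary choices.

```python
import numpy as np


def prediction_grid(predict, x_range=(-1.5, 1.5), y_range=(-1.5, 1.5), resolution=50):
    """Evaluate predict on a resolution x resolution grid of 2D points."""
    xs = np.linspace(x_range[0], x_range[1], resolution)
    ys = np.linspace(y_range[0], y_range[1], resolution)
    grid_x, grid_y = np.meshgrid(xs, ys)
    # Flatten the grid into a list of (x, y) points, one per row.
    points = np.column_stack([grid_x.ravel(), grid_y.ravel()])
    values = np.asarray(predict(points)).reshape(grid_x.shape)
    return grid_x, grid_y, values


def dummy_predict(points):
    """Stand-in for the trained network: 1 inside a circle of radius 0.7, else 0."""
    radius = np.sqrt((points ** 2).sum(axis=1))
    return (radius < 0.7).astype(float)


grid_x, grid_y, values = prediction_grid(dummy_predict)
print(values.shape)  # (50, 50)
```

The three returned arrays can then be passed to a plotting function such as matplotlib's pcolormesh once per epoch (or every few epochs) to produce the images.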

We can combine the generated images (saved in ./results/to{}.png) into a GIF that shows the training process.

If you find this article interesting, please feel free to take my code to improve it. Share this story with your contacts and clap if you like it 🙂