Build Your Own Neural Network From Scratch with Python

Original article was published on Deep Learning on Medium

There are many Python libraries for building and training neural networks, such as TensorFlow and Keras. But to really understand neural networks, we need to understand their basic structure and be able to build and train a network of our own.

Neural networks can model complex, non-linear relationships in data that simpler machine learning algorithms such as linear regression cannot capture. In fact, a neural network can be interpreted as a stack of linear regressions, where each node is the output of one linear regression passed through an activation function.

In the diagram below, the input layer has 3 nodes, the hidden layer has 4 nodes, and the output layer has 2 nodes. The input layer is not counted in the number of layers, so this is a 2-layer network.

2-layered NN architecture

Each node in the 2 layers is the output of one linear regression. At each layer, after the linear regressions are computed, the values are passed through an activation function.

The basic algorithm for a neural network should be something like this.

for n epochs:
1. forward_propagation() #predicting output
2. backward_propagation() #updating parameters according to loss

The function names suggest the basic structure of the algorithm. In this article, we will build a 2-layer neural network.

Let’s look at the common naming conventions we will be using.

Z is the linear forward value, i.e. Z = X.W + b (weights W and bias b), and A is the node activation, i.e. A = sigmoid(Z). We will be using the sigmoid activation function for our network.
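To make the notation concrete, here is a minimal NumPy sketch of one layer. The input values, the random seed, and the bias name `b` are made up for illustration, and the product is written as `X @ W` on the assumption that each row of X is one sample.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Three samples with 3 input features each (made-up numbers for illustration).
X = np.array([[0.5, -1.0,  2.0],
              [1.5,  0.0, -0.5],
              [0.0,  0.0,  0.0]])

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # one column per node: 4 linear regressions
b = np.zeros((1, 4))         # one bias per node (hypothetical name)

Z = X @ W + b   # linear forward value, shape (3, 4)
A = sigmoid(Z)  # node activations, same shape as Z
```

Each column of `Z` is the output of one linear regression on the three input features, and `A` squashes those values into the range (0, 1).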

1. Initializing parameters

First, we initialize the weights and layer sizes for our neural network.

We define the variables as follows:

  • [num_samples, n_x] : Shape of the input
  • n_x : Number of features of input X
  • n_h: Nodes in hidden layer
  • n_y : Number of target values for each sample

First, we set the layer sizes based on the shapes of the input and output and our choice of hidden layer size.

Then we initialize the weights based on the layer sizes.

  • In the diagram above, for the first layer, we need to define our weights in such a way that we can compute 4 (n_h) linear regressions, one for each node in the next layer.
  • So the weights in the first layer, W1, should have shape (3,4) or (n_x,n_h): four (n_h) linear regressions on three (n_x) input features.

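With this analogy, you can easily guess the shape of the weights W2 in the above network: (4,2) or (n_h,n_y), since the output layer computes two (n_y) linear regressions on the four (n_h) hidden activations.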

Let’s define our function to initialize weights.
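The original code for this step is not reproduced here, so the following is only a sketch of what such a function could look like. The function names, the dictionary keys, the 0.1 scaling of the random weights, and the zero-initialized biases are all assumptions.

```python
import numpy as np

def layer_sizes(X, Y, n_h=4):
    """Derive the layer sizes from the data shapes and the chosen hidden size."""
    n_x = X.shape[1]  # number of input features
    n_y = Y.shape[1]  # number of target values per sample
    return n_x, n_h, n_y

def initialize_parameters(n_x, n_h, n_y, seed=0):
    """Return small random weights and zero biases for a 2-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(size=(n_x, n_h)) * 0.1,  # assumed scale of 0.1
        "b1": np.zeros((1, n_h)),
        "W2": rng.normal(size=(n_h, n_y)) * 0.1,
        "b2": np.zeros((1, n_y)),
    }

# Placeholder data matching the 3-4-2 network in the diagram.
X = np.zeros((5, 3))
Y = np.zeros((5, 2))
params = initialize_parameters(*layer_sizes(X, Y, n_h=4))
for name, value in params.items():
    print(name, value.shape)
# W1 (3, 4)
# b1 (1, 4)
# W2 (4, 2)
# b2 (1, 2)
```

The weights are random rather than zero so that the hidden nodes do not all start out identical and receive identical updates.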

2. Forward Propagation

The forward propagation consists of a bunch of linear regressions combined with a non-linear activation function at each layer.

We will use the sigmoid function as the activation function for our neural network.

Forward propagation is pretty straightforward once you get the weights right.

output = sigmoid( sigmoid(X.W1 + b1).W2 + b2 )

Let’s define our forward function.
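A possible forward function is sketched below. It assumes the parameters are stored in a dictionary with the keys `W1`, `b1`, `W2`, `b2` (hypothetical names) and that each row of X is one sample; the `cache` it returns is a convenience for the backward step, not something the article prescribes.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, params):
    """Run one forward pass; return the output and the cached activations."""
    Z1 = X @ params["W1"] + params["b1"]   # shape (num_samples, n_h)
    A1 = sigmoid(Z1)
    Z2 = A1 @ params["W2"] + params["b2"]  # shape (num_samples, n_y)
    A2 = sigmoid(Z2)
    cache = {"A1": A1, "A2": A2}           # reused by the backward step
    return A2, cache

# Tiny usage example with the 3-4-2 architecture from the diagram.
rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(size=(3, 4)) * 0.1, "b1": np.zeros((1, 4)),
    "W2": rng.normal(size=(4, 2)) * 0.1, "b2": np.zeros((1, 2)),
}
X = rng.normal(size=(5, 3))
output, cache = forward_propagation(X, params)
print(output.shape)  # (5, 2)
```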

Forward propagation

3. Backward Propagation

This is the tricky part. Backpropagation computes the gradient of the loss with respect to each weight and then moves the weights a small step in the direction that reduces the loss. We will use the Mean Squared Error (MSE) loss for our network. The calculation looks complicated but it is, in fact, just the chain rule from calculus.

Derivatives of the loss function w.r.t. weights and sigmoid derivative
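The original figure with these derivatives is not available here, so the following is a reconstruction under stated assumptions: the loss is the squared error summed over outputs and averaged over the m samples, each row of X is one sample, the symbol ⊙ denotes element-wise multiplication, and the bias gradients sum over the sample index i.

```latex
\begin{aligned}
L &= \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{n_y}\bigl(A_2 - Y\bigr)_{ik}^{2},
\qquad \sigma'(z) = \sigma(z)\bigl(1-\sigma(z)\bigr) \\
\frac{\partial L}{\partial Z_2} &= \frac{2}{m}\,(A_2 - Y)\odot A_2\odot(1-A_2) \\
\frac{\partial L}{\partial W_2} &= A_1^{\top}\,\frac{\partial L}{\partial Z_2},
\qquad
\frac{\partial L}{\partial b_2} = \sum_{i=1}^{m}\Bigl(\frac{\partial L}{\partial Z_2}\Bigr)_{i,:} \\
\frac{\partial L}{\partial Z_1} &= \Bigl(\frac{\partial L}{\partial Z_2}\,W_2^{\top}\Bigr)\odot A_1\odot(1-A_1) \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial Z_1},
\qquad
\frac{\partial L}{\partial b_1} = \sum_{i=1}^{m}\Bigl(\frac{\partial L}{\partial Z_1}\Bigr)_{i,:}
\end{aligned}
```

Because the sigmoid derivative can be written in terms of the activation itself, the backward pass only needs the values A1 and A2 already computed in the forward pass.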

That was pretty simple, right? Now, let’s define our function for backward propagation. We will compute the gradients and update the weights in the backward step.
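One way to write the backward step is sketched below. It assumes an MSE loss averaged over samples, a `cache` dictionary holding the activations `A1` and `A2` from the forward pass, and a plain gradient descent update with a `learning_rate` argument; these names and the default rate are illustrative choices, not the article's confirmed code.

```python
import numpy as np

def backward_propagation(X, Y, params, cache, learning_rate=0.1):
    """Compute MSE gradients for a 2-layer sigmoid network and update params in place."""
    m = X.shape[0]
    A1, A2 = cache["A1"], cache["A2"]

    # Output layer: dL/dZ2 = dL/dA2 * sigmoid'(Z2), with sigmoid'(Z2) = A2 * (1 - A2)
    dZ2 = (2.0 / m) * (A2 - Y) * A2 * (1.0 - A2)
    dW2 = A1.T @ dZ2
    db2 = dZ2.sum(axis=0, keepdims=True)

    # Hidden layer: push the error back through W2, then through the sigmoid
    dZ1 = (dZ2 @ params["W2"].T) * A1 * (1.0 - A1)
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)

    # Gradient descent update (all gradients are computed before any update)
    params["W1"] -= learning_rate * dW1
    params["b1"] -= learning_rate * db1
    params["W2"] -= learning_rate * dW2
    params["b2"] -= learning_rate * db2

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```

All four gradients are computed before any parameter changes, so the hidden-layer gradient uses the same W2 that produced the forward pass.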

Backward propagation

4. Training

One epoch in training consists of one forward and one backward propagation step over the whole dataset.

We will use the breast cancer dataset from the sklearn.datasets module. The dataset has 30 features and 569 samples. We will train our network for 1000 epochs with 4 nodes in the hidden layer.
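The training loop could be sketched as follows. The learning rate, the weight scale, the random seeds, the feature standardization, and the function names `train` and `breast_cancer_demo` are all assumptions added for this sketch; the forward and backward steps are inlined so that the block stands on its own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, n_h=4, epochs=1000, learning_rate=0.5, seed=0):
    """Full-batch gradient descent on MSE for a 2-layer sigmoid network."""
    rng = np.random.default_rng(seed)
    m, n_x = X.shape
    n_y = Y.shape[1]
    W1, b1 = rng.normal(size=(n_x, n_h)) * 0.1, np.zeros((1, n_h))
    W2, b2 = rng.normal(size=(n_h, n_y)) * 0.1, np.zeros((1, n_y))
    losses = []
    for _ in range(epochs):
        # Forward propagation
        A1 = sigmoid(X @ W1 + b1)
        A2 = sigmoid(A1 @ W2 + b2)
        losses.append(float(np.sum((A2 - Y) ** 2) / m))
        # Backward propagation (gradients first, then updates)
        dZ2 = (2.0 / m) * (A2 - Y) * A2 * (1.0 - A2)
        dZ1 = (dZ2 @ W2.T) * A1 * (1.0 - A1)
        W2 -= learning_rate * (A1.T @ dZ2)
        b2 -= learning_rate * dZ2.sum(axis=0, keepdims=True)
        W1 -= learning_rate * (X.T @ dZ1)
        b1 -= learning_rate * dZ1.sum(axis=0, keepdims=True)

    def predict(X_new):
        return sigmoid(sigmoid(X_new @ W1 + b1) @ W2 + b2)

    return predict, losses

def breast_cancer_demo():
    """Train on the sklearn breast cancer data (requires scikit-learn)."""
    from sklearn.datasets import load_breast_cancer
    X, y = load_breast_cancer(return_X_y=True)  # X: (569, 30), y: (569,)
    X = (X - X.mean(axis=0)) / X.std(axis=0)    # assumption: standardize features
    Y = y.reshape(-1, 1).astype(float)
    predict, losses = train(X, Y, n_h=4, epochs=1000)
    # Compare predictions with the actual labels for 10 random samples.
    idx = np.random.default_rng(1).choice(len(X), size=10, replace=False)
    print(np.hstack([predict(X[idx]).round(2), Y[idx]]))
    return losses
```

Calling `breast_cancer_demo()` trains the network and prints predicted values next to the actual labels. The features are standardized here because the raw values span very different scales, which tends to saturate the sigmoid; whether the original code did the same is not known.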

Comparing predicted and actual values of 10 random samples.

Looks like our network worked really well!

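Note that the predicted values differ slightly from the actual values. This is expected, since a sigmoid output never reaches exactly 0 or 1. A small gap on the training data does not by itself show that the model generalizes well; checking that requires evaluating on held-out data.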


In our neural network, we used the sigmoid function as the activation function. But there are other activation functions that can be useful depending on the type of problem. In principle, the activation function can be any differentiable function of your choice, as long as it is well suited to the particular problem you are trying to solve. The common activation functions used in neural networks can be found here.
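As an illustration of how another activation could be swapped in, here is a short sketch of two widely used alternatives, tanh and ReLU, together with the derivatives the backward step would need. The function names are hypothetical, and the rest of the network would have to call them in place of the sigmoid and its derivative.

```python
import numpy as np

def tanh(z):
    """Hyperbolic tangent: maps values into (-1, 1)."""
    return np.tanh(z)

def tanh_derivative(a):
    """Derivative of tanh, written in terms of its output a = tanh(z)."""
    return 1.0 - a ** 2

def relu(z):
    """Rectified linear unit: keeps positive values, zeroes out the rest."""
    return np.maximum(0.0, z)

def relu_derivative(z):
    """Slope of ReLU: 1 where z > 0, else 0 (0 is used at z = 0 by convention)."""
    return (z > 0).astype(float)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))             # [0. 0. 3.]
print(relu_derivative(z))  # [0. 0. 1.]
```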