Implementing A Simple Artificial Neural Network from Scratch in Python

Unveiling the math and logic behind it.

What’s a Neural Network?

In layman’s terms, a neural network is just a mathematical function: you feed it a vector of values, those values are transformed by the parameters inside the function, and a value (or vector of values) comes out as the desired output.

Now, getting back to the world of data science, a neural network mimics the structure of the human brain: it consists of simple but highly interconnected nodes, called neurons, organized into layers. These layers process information received from external inputs through dynamic learning and send out the desired outputs. So, basically, we have a set of inputs and a set of target values, and we try to predict outputs that match those targets as closely as possible.

Biological Neuron vs Artificial Neuron

A neural network’s architecture consists of:

  • An input layer, x
  • An arbitrary number of hidden layers
  • A set of weights, W, and biases, b, between the layers
  • An activation function, 𝞼, for each hidden layer
  • An output layer, ŷ
The basic architecture of a neural network

Let’s Get Our Hands Dirty with Math

I’m assuming that you know all those fancy terminologies behind neural networks and have a little bit of knowledge of calculus.

Let’s start implementing a simple 2-layer artificial neural network.

Let the input be X = [x1, x2] = [0.1, 0.3], the target be Y = [1], the activation function for the hidden layer be “ReLU”, and the activation function for the output layer be “Sigmoid”.

Okay, it’s time to train our Neural Network:

Step 1: Initialize W and b as random values.

[[w1,w2],[w3,w4]] = [[-0.1,0.2],[0.2,-0.3]]

[b11,b12] = [0,0]

[[w5],[w6]] = [[0.3],[-0.1]]

[b21] = [0]

https://gist.github.com/vamc-stash/670724f74e4ba20093b126a9689271e5
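The author’s actual initialization code is in the gist above; as a rough sketch (not the gist itself), a NumPy version of this step, with the layer sizes from our example and a seed chosen here purely for reproducibility, could look like this:

import numpy as np

np.random.seed(42)  # assumed seed, only so the run is reproducible

# 2 input features -> 2 hidden neurons -> 1 output neuron
W1 = np.random.randn(2, 2) * 0.1   # plays the role of [[w1, w2], [w3, w4]]
b1 = np.zeros((1, 2))              # [b11, b12]
W2 = np.random.randn(2, 1) * 0.1   # [[w5], [w6]]
b2 = np.zeros((1, 1))              # [b21]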

Step 2: Feed Forward Propagation

Z = W.X + b

A = activation_function(Z)

For the 1st layer,

z1 = w1*x1 + w3*x2 + b11 = -0.1*0.1 + 0.2*0.3 + 0 = 0.05

a1 = ReLU(0.05) = max(0,0.05) = 0.05

z2 = w2*x1 + w4*x2 + b12 = 0.2*0.1 + -0.3*0.3 + 0 = -0.07

a2 = ReLU(-0.07) = max(0,-0.07) = 0.0

a1, a2 are the inputs to the 2nd layer.

For the 2nd layer,

z3 = w5*a1 + w6*a2 + b21 = 0.3*0.05 + -0.1*0.0 + 0 = 0.015

a3 = Sigmoid(0.015) = 1/(1+e^(-z3)) = 0.504

ŷ = a3

https://gist.github.com/vamc-stash/3e25838f9d7cc496e23260e5fee2ab17
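To make the arithmetic above reproducible, here is a small sketch of the forward pass (again, not necessarily the code in the gist), using a row-vector convention so that Z = X·W + b rather than W·X + b:

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0.1, 0.3]])                  # [x1, x2]
W1 = np.array([[-0.1, 0.2], [0.2, -0.3]])   # [[w1, w2], [w3, w4]]
b1 = np.zeros((1, 2))                       # [b11, b12]
W2 = np.array([[0.3], [-0.1]])              # [[w5], [w6]]
b2 = np.zeros((1, 1))                       # [b21]

Z1 = X @ W1 + b1        # [[z1, z2]] = [[0.05, -0.07]]
A1 = relu(Z1)           # [[a1, a2]] = [[0.05, 0.0]]
Z2 = A1 @ W2 + b2       # [[z3]]     = [[0.015]]
A2 = sigmoid(Z2)        # [[a3]]     ≈ [[0.504]]  -> ŷ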

Step 3: Compute Error

The cost function we use is “binary_crossentropy”.

E = -Σ (c = 1 to C) y_c log(ŷ_c)

where C is the number of classes, y is the target value and ŷ is the predicted value.

Since our classification is binary, C = 2, and the cross-entropy function becomes:

Error E = -y log(ŷ) - (1-y) log(1-ŷ)

= -1*log(0.504) - (1-1)*log(1-0.504)

= 0.685 (using the natural logarithm, which is what the gradients in the next step assume)

https://gist.github.com/vamc-stash/7a9864c0b73ffdb2a99160c792373d04
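As a quick check of that number, a one-line version of the loss (a sketch, using the natural log as noted above) might be:

import numpy as np

y, y_hat = 1.0, 0.504   # target and predicted value from the forward pass

# binary cross-entropy for a single example
E = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(E)   # ≈ 0.685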

Step 4: Backward Propagation

For Output Layer,

dE/dw5 = dE/da3 * da3/dz3 * dz3/dw5 →eq.1

dE/da3 = dE / dŷ = d (- y log(ŷ)-(1-y)log(1-ŷ)) / dŷ

= -( y / ŷ) + ((1-y)/(1-ŷ))

= -(1/0.504) + 0 = -1.985 →eq.2

da3/dz3 = d(1/(1+e^(-z3))) / dz3 = e^(-z3) / (1+e^(-z3))²

= 0.249 →eq.3

dz3/dw5 = d(w5*a1 + w6*a2 + b21)/dw5 = a1 = 0.05 →eq.4

dE/dw5 = 𝚫w5 = -1.985 * 0.249 * 0.05 = -0.0247 (from eq.1)

Similarly, for w6 and b21,

dE/dw6 = dE/da3 * da3/dz3 * dz3/dw6 →eq.5

dz3/dw6 = d(w5*a1 + w6*a2 + b21)/dw6 = a2 = 0.0 →eq.6

dE/dw6 = 𝚫w6 = -1.985 * 0.249 * 0.0 = 0.0 (from eq.5, eq.2, eq.3)

dE/ db21 = dE/da3 * da3/dz3 * dz3/db21

dz3/db21 = d(w5*a1 + w6*a2 + b21)/db21 = 1

dE/db21 = 𝚫b21 = -1.985 * 0.249 * 1 = -0.4942

For Hidden layer,

dE/dw1 = dE/da1 * da1/dz1 * dz1/dw1 →eq.7

dE/da1 = dE/dz3 * dz3/da1 = (dE/da3 * da3/dz3) * dz3/da1 →eq.8

(If there are more nodes in the output layer, then the error propagated back to a node in the preceding layer should be accumulated from all the current-layer nodes that connect to it. For example, if there are two nodes in the output layer, with errors and outputs E1, ŷ1 and E2, ŷ2 respectively, then the error propagated to node z1 is dE/da1 = dE1/da1 + dE2/da1.)

dz3/da1 = d(w5*a1 + w6*a2 + b21)/da1 = w5 = 0.3 →eq.9

dE/da1 = -1.985*0.249*0.3 = -0.1482 →eq.10 (from eq.2, eq.3, eq.9)

da1/dz1 = d(max(0,z1))/dz1 = 1.0 (since z1 > 0) →eq.11

dz1/dw1 = d(w1*x1 + w3*x2 + b11)/dw1 = x1 = 0.1 →eq.12

dE/dw1 = 𝚫w1 = -0.1482 * 1.0 * 0.1 = -0.01482 (from eq.7)

Similarly,

dE/dw2 = dE/da2 * da2/dz2 * dz2/dw2 →eq.13

dE/da2 = dE/dz3 * dz3/da2 = (dE/da3 * da3/dz3) * dz3/da2

dz3/da2 = d(w5*a1 + w6*a2 + b21)/da2 = w6 = -0.1 →eq.14

dE/da2 = -1.985*0.249*-0.1 = 0.0494 →eq.15 (from eq.2, eq.3, eq.14)

da2/dz2 = d(max(0,z2))/dz2 = 0.0 (since z2 < 0) →eq.16

dz2/dw2 = d(w2*x1 + w4*x2 + b12)/dw2 = x1 = 0.1→eq.17

dE/dw2 = 𝚫w2 = 0.0494 * 0.0 * 0.1 = 0.0 (from eq.13)

We can calculate similarly for w3 and w4

dE/dw3 = 𝚫w3 = -0.1482 * 1.0 * 0.3 = -0.04446

dE/dw4 = 𝚫w4 = 0.0494 * 0.0 * 0.3 = 0.0

For b11, b12

dE/ db11 = dE/da1 * da1/dz1 * dz1/db11

dz1/db11 = d(w1*x1 + w3*x2 + b11)/db11 = 1

dE/db11 = 𝚫b11 = -0.1482 * 1.0 * 1 = -0.1482 (from eq.10, eq.11)

dE/ db12 = dE/da2 * da2/dz2 * dz2/db12

dz2/db12 = d(w2*x1 + w4*x2 + b12)/db12 = 1

dE/db12 = 𝚫b12 = 0.0494 * 0.0 * 1 = 0.0 (from eq.15, eq.16)

https://gist.github.com/vamc-stash/2f35083ab29c76922698b57c958a42c3
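Collecting the chain-rule pieces of Step 4, one possible sketch of the gradient computation with the numbers from our example (small differences from the values above come from rounding) is:

# values carried over from the forward pass
x1, x2 = 0.1, 0.3
a1, a2, a3 = 0.05, 0.0, 0.504      # a3 = ŷ
w5, w6 = 0.3, -0.1
y = 1.0

# output layer
dE_da3 = -(y / a3) + (1 - y) / (1 - a3)   # ≈ -1.984
da3_dz3 = a3 * (1 - a3)                   # sigmoid'(z3) ≈ 0.25
dE_dz3 = dE_da3 * da3_dz3                 # ≈ -0.496

dw5  = dE_dz3 * a1                        # ≈ -0.0248
dw6  = dE_dz3 * a2                        # 0.0
db21 = dE_dz3                             # ≈ -0.496

# hidden layer (ReLU derivative: 1 if z > 0, else 0; note a > 0 exactly when z > 0)
dE_dz1 = dE_dz3 * w5 * (1.0 if a1 > 0 else 0.0)   # ≈ -0.149
dE_dz2 = dE_dz3 * w6 * (1.0 if a2 > 0 else 0.0)   # 0.0

dw1, dw3 = dE_dz1 * x1, dE_dz1 * x2
dw2, dw4 = dE_dz2 * x1, dE_dz2 * x2
db11, db12 = dE_dz1, dE_dz2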

Step 5: Update parameters W and b

Let learning rate, 𝞮 = 0.01

W = W - 𝞮(𝚫W)

b = b - 𝞮(𝚫b)

w1 = w1 - 𝞮(𝚫w1) = -0.1 - 0.01*(-0.01482) = -0.0999

w2 = w2 - 𝞮(𝚫w2) = 0.2 - 0.01*(0) = 0.2

w3 = w3 - 𝞮(𝚫w3) = 0.2 - 0.01*(-0.04446) = 0.2004

w4 = w4 - 𝞮(𝚫w4) = -0.3 - 0.01*(0) = -0.3

w5 = w5 - 𝞮(𝚫w5) = 0.3 - 0.01*(-0.0247) = 0.3002

w6 = w6 - 𝞮(𝚫w6) = -0.1 - 0.01*(0) = -0.1

b11 = b11 - 𝞮(𝚫b11) = 0 - 0.01*(-0.1482) = 0.001482

b12 = b12 - 𝞮(𝚫b12) = 0 - 0.01*(0) = 0

b21 = b21 - 𝞮(𝚫b21) = 0 - 0.01*(-0.4942) = 0.004942

https://gist.github.com/vamc-stash/f5eaa33973474c46746c88403f55e4b9
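The update itself is one line per parameter; a vectorized sketch of this step (with the gradients from Step 4 hard-coded, just for illustration) could be:

import numpy as np

lr = 0.01   # learning rate 𝞮

W1 = np.array([[-0.1, 0.2], [0.2, -0.3]])            # [[w1, w2], [w3, w4]]
dW1 = np.array([[-0.01482, 0.0], [-0.04446, 0.0]])   # [[𝚫w1, 𝚫w2], [𝚫w3, 𝚫w4]]

W1 -= lr * dW1   # gradient-descent step: W = W - 𝞮(𝚫W)
print(W1)        # ≈ [[-0.0999, 0.2], [0.2004, -0.3]]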

Now, repeat the process from Step 2 until the error is minimized.

After training for a certain number of iterations, you will have a 2-layer neural network model for a binary classification task, built without any deep learning libraries. Now you can start making predictions…

https://gist.github.com/vamc-stash/5dab6347b1405fd48def0524af033c63
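The prediction gist is linked above; under the assumption that prediction is just another forward pass followed by thresholding the sigmoid output at 0.5, a minimal sketch would be:

import numpy as np

def predict(X, W1, b1, W2, b2, threshold=0.5):
    # forward pass: ReLU hidden layer, sigmoid output layer
    A1 = np.maximum(0, X @ W1 + b1)
    A2 = 1 / (1 + np.exp(-(A1 @ W2 + b2)))
    # class 1 if the predicted probability reaches the threshold, else class 0
    return (A2 >= threshold).astype(int)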

Assembling code chunks

Let’s put all our code together:

https://gist.github.com/vamc-stash/468d4cea50bb59107947eef9870c2502
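The author’s full implementation is in the gist above. As a rough, condensed sketch of how the five steps fit into a single training loop (an illustration only, with its own small design choices, such as using the shortcut dE/dz3 = ŷ - y for the sigmoid plus cross-entropy pair):

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(X, Y, hidden=2, lr=0.01, epochs=1000):
    np.random.seed(0)                              # assumed seed for reproducibility
    W1 = np.random.randn(X.shape[1], hidden) * 0.1
    b1 = np.zeros((1, hidden))
    W2 = np.random.randn(hidden, 1) * 0.1
    b2 = np.zeros((1, 1))

    for _ in range(epochs):
        # Step 2: feed-forward propagation
        Z1 = X @ W1 + b1
        A1 = relu(Z1)
        A2 = sigmoid(A1 @ W2 + b2)                 # ŷ

        # Step 4: backward propagation
        m = X.shape[0]
        dZ2 = (A2 - Y) / m                         # dE/dz3 for sigmoid + cross-entropy
        dW2 = A1.T @ dZ2
        db2 = dZ2.sum(axis=0, keepdims=True)
        dZ1 = (dZ2 @ W2.T) * (Z1 > 0)              # ReLU derivative
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0, keepdims=True)

        # Step 5: update parameters
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    return W1, b1, W2, b2

# usage on our single training example
params = train(np.array([[0.1, 0.3]]), np.array([[1.0]]))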

But Why From Scratch?

There are many libraries in Python that let you create a neural network without getting your hands very dirty. Yet it’s better to have an intuition of how neural networks work, and of the beauty and logic behind them; that understanding is essential for designing effective models.