How Does a Single Neural Net Work?


We all know the buzz around the field of Artificial Intelligence.

But are you among the many who think it is a fancy field that would be very difficult to understand?

Well, that's not entirely true.

If you are interested, already have some basic understanding of math, and are willing to learn new things, you can crack it in no time.

Let me explain what it actually is.

Artificial intelligence is a combination of many disciplines, and deep learning is the backbone of all its components.

Handwritten number predictor (a deep learning model)

Deep learning is nothing but the art of building a working, efficient model out of multiple layers of neural nets, designed to perform a specific task.

In order to understand more about deep learning, we first need to understand:

What is an Artificial Neural Net?

An artificial neural net works very much like an actual neuron in our brain; it was basically designed to mimic our own neurons.

Neuron firing

A biological neuron cell is connected to millions and millions of other neurons, forming a network.
And each network of neurons in a different part of our brain governs a different aspect of our day-to-day functioning.

Similarly, as part of deep learning, we try to build artificial neural networks with many layers of neural nets to perform specific tasks like classification problems, playing an Atari game, language translation and so on.

To understand what these artificial neural networks are, we need to know how a single neural net works.

There are plenty of articles on the internet explaining how a neural network functions.
But this is my take on the functioning of a single neural net, and I have tried to explain it with a very simple working program.

Before jumping in, it will be easier to follow with a fundamental understanding of linear algebra and Python programming.
But I will try to explain it even if you do not have that background.

We can divide this into two parts.

  1. Theoretical Explanation of the Neural Net
  2. Practical implementation using a Python program.

Theoretical Explanation of the Neural Net

A neural net should be considered as a function, or a matrix of coefficients, which converts the inputs into the desired output.

Every variable we see from now on will be an array of numbers, or a matrix.

It can be represented as below.

Y = A X

where Y is the output,

X is the input and

A is the function or set of coefficients that acts upon the input to give us the output.
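
As a toy illustration (the numbers here are arbitrary and of my own choosing), with NumPy this relationship looks like:

    import numpy as np

    A = np.array([[2.0, -1.0, 0.5]])     # 1x3 matrix of coefficients
    X = np.array([[1.0], [0.0], [2.0]])  # 3x1 input matrix
    Y = np.dot(A, X)                     # 1x1 output: [[3.0]]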

In conventional computer programming, we know the input X and we create the function A ourselves, as combinations of logic in a computer program, to attain the desired output Y.

But here, instead of creating the function A with combinations of logic, we teach the model to learn what the function A is by training it on already available inputs X and outputs Y as data sets.

If you are wondering how this learning takes place,

it's actually how everyone learns:

By making mistakes!

Initially, our model randomly assigns the set of coefficients A, and this set of coefficients acts upon the input to give an output (the predicted output),

which we then compare with the actual output in the training stage,
calculating an error value (the difference between the actual output and the predicted output).

E = Y - AX

where E is the error value,

Y is the actual output, and

AX is the predicted output, which we saw before.

Based on this error value, the model determines how much the set of coefficients needs to change
so that the predicted output comes closer to the actual output.

This process of updating the coefficients is done through a method called gradient descent, carried out via back propagation.

This is what makes the model learn from its mistakes: looking back at the error values and adjusting the coefficients of the function A.

Back propagation process.

This back propagation can be done by simply differentiating the error equation.

But why do we need to apply differentiation?

Generally, differentiating an equation gives you the rate at which the equation changes for a minuscule change in the value of its input.

If we plot all the values of the coefficients of A and see how the error equation changes with respect to them,
we get a convex-shaped graph like the one below.

Error function vs. coefficients

From now on, we will call these coefficients the nodal weights.

dE = d(Y - AX)

Differentiating the error equation above with respect to the nodal weights gives us a value.

If we add or subtract this value to or from the nodal weights, the error value starts moving towards the minimum point of the convex-shaped graph. This minimum point is called the global minimum.

This is the point where the error becomes minimal (sometimes close to zero).

So our objective is to iteratively run this process of updating the nodal weights until the error reaches the global minimum.
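
As a toy illustration of this idea (I use a squared error here so the curve is convex, and an arbitrary learning rate; both are my own choices, not something from the equations above):

    import numpy as np

    x, y = 2.0, 6.0            # one training pair; the ideal coefficient is 3
    a = np.random.random()     # start from a random coefficient
    learning_rate = 0.1

    for _ in range(100):
        error = y - a * x              # E = Y - AX
        gradient = -2 * error * x      # derivative of the squared error with respect to a
        a -= learning_rate * gradient  # step towards the global minimum

    print(a)  # converges to approximately 3.0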

Now let's jump into the implementation part.

Practical implementation using a Python program

This Python program has two main parts:

  1. Neural net class (This class will contain all the necessary components required in a neural net, as below)

a. Nodal weights

b. Activation function

c. Prediction function

d. Training function

2. Main function (This function will call the neural net with an input and get back a predicted output)

As we move along the code, we will learn about each individual component.

First, we need to import the dependencies required for this program.

The important dependency needed here is the NumPy library,

which contains most of the required mathematical operations and helper functions.
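
Assuming the usual import convention, that is a single line:

    import numpy as np  # matrix operations and math helpers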

Now we create our class for the neural net, with an __init__ function as shown below.

1. Neural net Class

a. Nodal Weights

In Python, the initialization of a class's variables happens in its __init__ function, which initializes the variables associated with the entire class.

For people who do not have any experience with Python, just think of this function as initializing the nodal weights with some random values.

For further clarity on the __init__ function, please refer to the link below.

https://micropyramid.com/blog/understand-self-and-__init__-method-in-python-class/

We have created the nodal weights as a 3x1 (3 rows and 1 column) matrix, because our input has 3 binary variables, which get multiplied by the nodal weights to produce a single binary output.
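
A minimal sketch of that initialization (the class name NeuralNet and the fixed random seed are assumptions of mine) could look like this:

    class NeuralNet:

        def __init__(self):
            # Seed the random number generator so every run starts from the same weights
            np.random.seed(1)
            # 3x1 matrix of nodal weights, randomly initialized between -1 and 1
            self.nodal_weights = 2 * np.random.random((3, 1)) - 1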

We will fine-tune these nodal weights using gradient descent, trying to reduce the error value calculated between the predicted and the actual output.

b. Activation function
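
A minimal sketch of an activation function method (assumed to live inside the NeuralNet class above) could be:

    def sigmoid(self, t):
        # Squash any input t to a value between 0 and 1
        return 1 / (1 + np.exp(-t))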

The above function is called the activation function; as the name suggests, it activates the neuron we are trying to build.

We should consider this activation function the functional characteristic of a neuron, just as a biological neuron's nucleus, cell body and so on constitute the characteristics of that neuron.

Similarly, here the activation function determines the characteristics of the neural net, and we can change this activation function depending on the problem we are trying to solve.

The activation function we use here is the sigmoid, which maps every input value t to an output between 0 and 1,

thereby transforming the inputs into a more normalized form, which helps us understand the relationship between the input and the output better.

This sigmoid is helpful in binary classification problems, where the output is either 0 or 1. In this program, too, we are performing a binary classification, since our output is either 0 or 1.

To know more about the sigmoid activation function and all the other types of activation functions available, please check out the link below.

Now we need to take the derivative of the activation function, as we have already seen in the back propagation process.

Only the derivative of the activation function is needed in this case, because apart from the multiplication by the nodal weights, which is linear, no other operation is performed on the input.

But in more complex neural networks, many such activation functions will be involved, so the derivatives of each of those functions will be required.
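
A sketch of the derivative (using the common shortcut of expressing the sigmoid's gradient in terms of its own output s) could be:

    def sigmoid_derivative(self, s):
        # s is assumed to already be a sigmoid output, so the gradient is s * (1 - s)
        return s * (1 - s)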

c. Prediction function
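
A minimal sketch of this prediction method (the name brain follows the article's own wording) could be:

    def brain(self, inputs):
        # Combine the inputs with the nodal weights and activate through the sigmoid
        return self.sigmoid(np.dot(inputs, self.nodal_weights))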

The brain function above is where the important and interesting stuff happens.

This is where the input and the nodal weights get combined and activated through the sigmoid. So the output will be as below:

output = 1 / (1 + exp(-(inputs (.) nodal_weights)))

where (.) denotes the dot product

d. Training function

This training function is what makes the neural network learn.

Since we already know how the neural network learns, let's focus only on the implementation part.

The basic intuition behind this training function is that, by calling the brain function (the prediction function) in iteration, the neural net does the following (see the sketch after this list):

  1. Predicts the output
  2. Compares the predicted output with the actual output and calculates an error value
  3. Based on the error value, updates the nodal_weights using gradient descent

These 3 steps continue until the iterations are finished.
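
A minimal sketch of such a training loop (the update rule is the standard single-layer gradient descent step, written with the names used above) could be:

    def train(self, inputs, outputs, iterations):
        for _ in range(iterations):
            predicted = self.brain(inputs)  # 1. predict the output
            error = outputs - predicted     # 2. compare with the actual output
            # 3. update the nodal weights using gradient descent
            adjustment = np.dot(inputs.T, error * self.sigmoid_derivative(predicted))
            self.nodal_weights += adjustment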

2. Main function

Now we have everything in place, and we just need to train the model with the data sets below.

Inputs — [0 0 1], [1 1 1], [1 0 1], [0 1 1]

Outputs — [0], [1], [1], [0]

So the model takes in 3 inputs and gives out 1 output.

Below is the main function, which trains the model with the data set and predicts the output for a test input that is not in the training set.

The model will predict the output for the test input [1 1 0].

Let's see what the output and the trained nodal weights are.
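
Putting it all together, a sketch of the main function (10,000 training iterations is an arbitrary choice of mine) could be:

    if __name__ == "__main__":
        net = NeuralNet()

        # Training data: four examples, each with 3 inputs and 1 output
        inputs = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
        outputs = np.array([[0, 1, 1, 0]]).T

        net.train(inputs, outputs, 10000)
        print("Trained nodal weights:")
        print(net.nodal_weights)

        # Predict for a test input the model has never seen
        print("Prediction for [1 1 0]:", net.brain(np.array([1, 1, 0])))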

And voilà, our model has calculated new nodal weights and predicted a value of approximately 1 for our new test input.

An interesting thing to note here: if we closely observe the input training data set, we can see that the first column of the input matrix is the same as the output.

Now, if we look at the newly updated nodal weights, only the first row has a positive value, while the rest of the rows have negative values.

So, through training, our neural net has learned that the first column of the input alone contributes to the output. Hence it got a large positive value, while the other 2 columns got negative values.

And when this is passed into the sigmoid function, as below,

sigmoid(inputs (.) self.nodal_weights)

the first column of the input is multiplied by the positive weight, and the rest of the input columns are multiplied by their respective negative weights.

And the sigmoid function squashes any large positive value towards 1 and any large negative value towards 0.

Hence, if the first column of the input is 1, the output will also be approximately equal to 1. We do not need to worry much about the rest of the inputs, as they are multiplied by negative or near-zero weights.

That’s it!!!

We have our simple, fully functioning single neural net.

The Python code was inspired by (I just made a few tweaks to the code from) the blog post by Milo Spencer.

I found his program and data set the simplest and easiest to understand, but there were a few fundamental questions I had to search elsewhere to fully understand the underlying concepts, so I thought it would be helpful for a beginner to have everything in one place.

I hope you have enjoyed learning about the neural net.

Clap if you liked it 😄