Source: Deep Learning on Medium

### 1.0 Introduction

From the graph above, it is evident that in our field of work, data and accuracy are very important. Deep networks give us better accuracy from a limited amount of data.

In this lesson we will talk briefly about **neural networks**. Neural networks form the basis of deep networks, and deep networks power many of the machine learning applications we use today, including OCR, machine translation (my favorite), object classification and detection in photographs (used by Facebook), automatic game playing, and many more.

*The easiest way to define neural networks: neural networks are **black-box function approximators**.*

A **neural network** is a **black box** in the sense that, while it can approximate almost any function, studying its structure won't give you much insight into the structure of the function being approximated.
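To make "function approximator" concrete: the two-layer network trained on XOR later in this lesson computes nothing more than the composed function below. The notation is mine, chosen to mirror the code's `theta1`, `b1`, `theta2`, `b2`, with $\sigma$ the sigmoid. Training only adjusts the numbers inside $\Theta_1, b_1, \Theta_2, b_2$, and reading those numbers off rarely tells you what function they encode, which is the black-box point above.

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
h_1 = \sigma(x\,\Theta_1 + b_1), \qquad
\hat{y} = h_2 = \sigma(h_1\,\Theta_2 + b_2)
$$

with $x \in \mathbb{R}^{1\times 2}$, $\Theta_1 \in \mathbb{R}^{2\times 8}$, $b_1 \in \mathbb{R}^{8}$, $\Theta_2 \in \mathbb{R}^{8\times 1}$ and $b_2 \in \mathbb{R}$.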

Two years ago, I wrote my first neural network using only Python's numerical libraries, writing out every formula step by step. If you want to understand how neural networks work, the best practice is to write the code without scikit-learn or TensorFlow. It took me three days and nights (I could barely sleep or eat) to finish that small piece of code, and I got help from **StackOverflow/DataScience** to improve my model's accuracy. That is the most important part: in the end, it all boils down to how accurate your model is.

### 1.1 XOR Gate

We will take the XOR gate as an example and try to approximate its function.

*I just hope you are using a Linux machine and not a Windows machine.*

Copy and paste this code once you understand it, and read the comments carefully. Since I explained logistic regression in the last lesson, I won't explain the code again.
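Why do we need a hidden layer at all? Here is a quick sketch, assuming the logistic-regression setup from the previous lesson (a single sigmoid unit, binary cross-entropy, full-batch gradient descent). The learning rate of 0.5 and the 5000 iterations are my own choices, not the author's. Trained on the XOR table, plain logistic regression ends up predicting about 0.5 for every input, because no single straight line separates the 1s from the 0s. On this table the best a linear model can do under cross-entropy is a constant 0.5.

```python
import numpy as np

# XOR truth table (same data as the main example)
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

np.random.seed(0)
# Logistic regression: one sigmoid unit, no hidden layer
theta = np.random.rand(2, 1) - 0.5
b = np.zeros(1)
alpha = 0.5  # learning rate (assumed value for this sketch)

for _ in range(5000):
    h = 1 / (1 + np.exp(-(x.dot(theta) + b)))
    dz = h - y                       # gradient of binary cross-entropy w.r.t. z
    theta -= alpha * x.T.dot(dz)
    b -= alpha * np.sum(dz, axis=0)

pred = 1 / (1 + np.exp(-(x.dot(theta) + b)))
print(pred.ravel())  # every prediction ends up close to 0.5
```

Adding the hidden layer of 8 sigmoid neurons in the main code is what lets the network bend the decision boundary and fit XOR.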

Here is the code for our simple XOR example; the network will learn to approximate the XOR function.

```python
import numpy as np

# Inputs: the truth table for the XOR gate; 'y' holds the expected outputs.
# Each row of 'x' is one training example; each column is one feature.
# 'x' is a (4, 2) matrix: 4 examples and 2 features.
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Seed the RNG so the random weights don't change every time the program runs.
np.random.seed(0)

# Optional, but it's a good idea to start with both positive and negative weights.
theta1 = np.random.rand(2, 8) - 0.5  # 8 neurons in the hidden layer (hidden features)
theta2 = np.random.rand(8, 1) - 0.5

# Necessary: each bias vector must have as many entries as its layer has neurons.
b1 = np.zeros(8)
b2 = np.zeros(1)

alpha = 0.01  # learning rate
# 'lamda' is the L2 regularization strength, to prevent overfitting
# (not really necessary for this example).
lamda = 0.001

# More iterations than you might think! We have so little training data
# that we need to repeat it many times.
for i in range(1, 40000):
    # Forward pass
    z1 = x.dot(theta1) + b1
    h1 = 1 / (1 + np.exp(-z1))
    z2 = h1.dot(theta2) + b2
    h2 = 1 / (1 + np.exp(-z2))

    # This dz2 term assumes binary cross-entropy loss.
    dz2 = h2 - y
    # You could also use squared-error loss; the extra h2 terms are the
    # derivative of the sigmoid activation. It converges more slowly, though:
    # dz2 = (h2 - y) * h2 * (1 - h2)

    # Backward pass: backpropagate the error to get every gradient
    dw2 = np.dot(h1.T, dz2)
    db2 = np.sum(dz2, axis=0)
    dz1 = np.dot(dz2, theta2.T) * h1 * (1 - h1)
    dw1 = np.dot(x.T, dz1)
    db1 = np.sum(dz1, axis=0)

    # The L2 regularization terms are ADDED to the weight gradients.
    dw2 += lamda * theta2
    dw1 += lamda * theta1

    # Gradient-descent update
    theta1 += -alpha * dw1
    theta2 += -alpha * dw2
    b1 += -alpha * db1
    b2 += -alpha * db2

# 'input1' holds the XOR inputs we test the trained network on.
input1 = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
z1 = np.dot(input1, theta1) + b1
h1 = 1 / (1 + np.exp(-z1))
z2 = np.dot(h1, theta2) + b2
h2 = 1 / (1 + np.exp(-z2))
print(h2)

# Output for input1:
# [[ 0.01031446]
#  [ 0.0201576 ]
#  [ 0.9824826 ]
#  [ 0.98584079]]
# which is approximately [0, 0, 1, 1]
```
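The comment on `dz2` says it assumes binary cross-entropy loss. Here is the standard calculus behind that line, in my own notation, for one training example with label $y$ and output $h_2 = \sigma(z_2)$. The sigmoid's derivative cancels against the derivative of the loss:

$$
L = -\big[\,y \log h_2 + (1-y)\log(1-h_2)\,\big], \qquad \frac{\partial h_2}{\partial z_2} = h_2(1-h_2)
$$

$$
\frac{\partial L}{\partial z_2} = \left(-\frac{y}{h_2} + \frac{1-y}{1-h_2}\right) h_2(1-h_2) = -y(1-h_2) + (1-y)\,h_2 = h_2 - y
$$

With squared error $L = \tfrac12 (h_2 - y)^2$ nothing cancels, giving $\partial L / \partial z_2 = (h_2 - y)\,h_2(1-h_2)$, the commented-out alternative. Summing over the 4 examples and adding the L2 penalty gives the weight gradient that the `dw2` lines compute:

$$
J = \sum_{i=1}^{4} L^{(i)} + \frac{\lambda}{2}\left(\lVert\Theta_1\rVert^2 + \lVert\Theta_2\rVert^2\right)
\quad\Rightarrow\quad
\frac{\partial J}{\partial \Theta_2} = h_1^{\top}(h_2 - y) + \lambda\,\Theta_2
$$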

I have commented the code as if it were written for a five-year-old, so I hope everyone can follow it.
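When you write backpropagation by hand, a numerical gradient check is the usual way to catch mistakes. This is a minimal sketch, not part of the original lesson: the helper names `forward`, `loss`, `analytic_grads` and `numerical_grad` are hypothetical. It compares the lesson's backprop formulas against central finite differences of the regularized cross-entropy objective, using random nonzero biases so the check is not trivial.

```python
import numpy as np

def forward(x, theta1, b1, theta2, b2):
    h1 = 1 / (1 + np.exp(-(x.dot(theta1) + b1)))
    h2 = 1 / (1 + np.exp(-(h1.dot(theta2) + b2)))
    return h1, h2

def loss(x, y, theta1, b1, theta2, b2, lamda):
    # Summed binary cross-entropy plus the L2 penalty (lamda/2) * ||theta||^2,
    # the objective whose gradient the lesson's update lines follow
    _, h2 = forward(x, theta1, b1, theta2, b2)
    bce = -np.sum(y * np.log(h2) + (1 - y) * np.log(1 - h2))
    l2 = 0.5 * lamda * (np.sum(theta1 ** 2) + np.sum(theta2 ** 2))
    return bce + l2

def analytic_grads(x, y, theta1, b1, theta2, b2, lamda):
    # Same backprop formulas as the lesson's training loop
    h1, h2 = forward(x, theta1, b1, theta2, b2)
    dz2 = h2 - y
    dw2 = h1.T.dot(dz2) + lamda * theta2
    db2 = np.sum(dz2, axis=0)
    dz1 = dz2.dot(theta2.T) * h1 * (1 - h1)
    dw1 = x.T.dot(dz1) + lamda * theta1
    db1 = np.sum(dz1, axis=0)
    return dw1, db1, dw2, db2

def numerical_grad(f, param, eps=1e-6):
    # Central differences: nudge one entry of 'param' in place, then restore it
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        plus = f()
        param[idx] = old - eps
        minus = f()
        param[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
np.random.seed(0)
theta1 = np.random.rand(2, 8) - 0.5
theta2 = np.random.rand(8, 1) - 0.5
b1 = np.random.rand(8) - 0.5
b2 = np.random.rand(1) - 0.5
lamda = 0.001

# 'f' reads the same arrays that numerical_grad perturbs in place
f = lambda: loss(x, y, theta1, b1, theta2, b2, lamda)
analytic = analytic_grads(x, y, theta1, b1, theta2, b2, lamda)
numeric = [numerical_grad(f, p) for p in (theta1, b1, theta2, b2)]

max_err = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(max_err)  # expected to be far below 1e-6 if the formulas are right
```

If you break one line on purpose (for example, drop `* h1 * (1 - h1)` from `dz1`), the error jumps by several orders of magnitude.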

If we add more hidden layers to this network, it becomes a deep neural network, the kind used throughout deep reinforcement learning. Each added hidden layer increases the training time.
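Here is a minimal sketch of that generalization. It keeps everything else from the XOR code (sigmoid activations, binary cross-entropy, L2 penalty, full-batch gradient descent) and simply loops over a list of layers. `train_deep` and `layer_sizes` are hypothetical names of my own. The weights come from NumPy's `default_rng` rather than `np.random.seed`, so the numbers will not match the two-layer run.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_deep(x, y, layer_sizes, alpha=0.01, lamda=0.001, iterations=40000, seed=0):
    """Train a fully connected sigmoid network with full-batch gradient descent.

    layer_sizes: e.g. [2, 8, 8, 1] = 2 inputs, two hidden layers of 8, 1 output.
    Returns (weights, biases, losses); losses is the regularized objective per step.
    """
    rng = np.random.default_rng(seed)
    weights = [rng.random((n_in, n_out)) - 0.5
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
    losses = []
    for _ in range(iterations):
        # Forward pass: keep every layer's activation for backprop
        activations = [x]
        for w, b in zip(weights, biases):
            activations.append(sigmoid(activations[-1].dot(w) + b))
        h = activations[-1]
        bce = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
        l2 = 0.5 * lamda * sum(np.sum(w ** 2) for w in weights)
        losses.append(bce + l2)

        # Backward pass: the two-layer rules, applied from the last layer down
        dz = h - y  # binary cross-entropy with a sigmoid output
        for layer in reversed(range(len(weights))):
            dw = activations[layer].T.dot(dz) + lamda * weights[layer]
            db = np.sum(dz, axis=0)
            if layer > 0:
                # Propagate through this layer's weights before updating them
                a = activations[layer]
                dz = dz.dot(weights[layer].T) * a * (1 - a)
            weights[layer] -= alpha * dw
            biases[layer] -= alpha * db
    return weights, biases, losses

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
weights, biases, losses = train_deep(x, y, [2, 8, 8, 1])
print(losses[0], losses[-1])
```

The loss should drop over training. How close the outputs get to `[0, 1, 1, 0]` depends on the learning rate, depth and iteration count; stacks of sigmoid layers tend to need more iterations than the shallow version, which is where the extra training time comes from.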

From the next lesson we return to value learning, and pretty soon we will be talking about how **AlphaGo** works. Now that we have covered all the prerequisites, we are good to go.