RL Lesson 1: Part 3



1.0 Introduction

From the graph above, it is evident that in our field of work, data and accuracy are very important to us. Deep networks give us better accuracy even with a limited amount of data.

We will talk briefly about neural networks in this lesson. Neural networks form the basis of deep networks, and deep networks are part of almost every machine learning application we use these days, including OCR, machine translation (my favorite), object classification and detection in photographs (used by Facebook), automatic game playing, and much more; I could keep going.

The easiest way to define a neural network is as a black-box function approximator.

A neural network is a black box in the sense that while it can approximate any function, studying its structure won't give you any insight into the structure of the function being approximated.
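To make the "black box" idea concrete, here is a minimal sketch of a one-hidden-layer network viewed purely as a function: the layer sizes and random weights are arbitrary choices of mine (they happen to match the XOR network later in this post), and inspecting them tells you nothing readable about the function being computed.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# A tiny "black box": two weight matrices and two biases define some
# function approximator f_hat with 2 inputs, 8 hidden units, 1 output.
rng = np.random.RandomState(0)
W1, b1 = rng.rand(2, 8) - 0.5, np.zeros(8)
W2, b2 = rng.rand(8, 1) - 0.5, np.zeros(1)

def f_hat(x):
    hidden = sigmoid(x.dot(W1) + b1)
    return sigmoid(hidden.dot(W2) + b2)

# Looking at W1 or W2 does not tell you what function this computes;
# you only find out by feeding inputs through it.
print(f_hat(np.array([[0.0, 1.0]])))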

Two years back I wrote my first neural network using only basic mathematical Python libraries, writing out all the formulas step by step. If you want to understand how neural networks work, the best practice is to write the code without using scikit-learn or TensorFlow. It took me three days and nights (I could barely sleep or eat) to complete that small piece of code, and I took help from Stack Overflow/Data Science to improve my model's accuracy, which is the most important part: in the end, it boils down to how accurate your model is.
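For contrast, here is a minimal sketch of the same kind of problem solved with scikit-learn's MLPClassifier, which is exactly why the library route teaches you so little about what happens inside. The hyperparameters below are arbitrary choices of mine, and on such a tiny dataset it may need a different random_state or more iterations to converge.

import numpy as np
from sklearn.neural_network import MLPClassifier

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# One hidden layer of 8 sigmoid units, roughly mirroring the
# hand-written network later in this post.
clf = MLPClassifier(hidden_layer_sizes=(8,), activation='logistic',
                    solver='lbfgs', max_iter=10000, random_state=0)
clf.fit(x, y)
print(clf.predict(x))  # ideally [0 1 1 0]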

1.1 XOR Gate

We will take the XOR gate as an example and try to approximate its function.

XOR function: the output is 1 only when the two inputs differ.

The truth table for the XOR gate:

A   B   A XOR B
0   0   0
0   1   1
1   0   1
1   1   0
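If you want to sanity-check the table, Python's ^ operator computes XOR directly; this little snippet is just a check, not part of the network code.

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, a ^ b)   # prints the truth table row by row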

I just hope you are using a Linux machine and not a Windows machine.

Copy and paste this code once you understand it, and read the comments carefully. Since I explained logistic regression in the last lesson, I will not explain the code again here.

Here we have a simple example of an XOR gate; we will approximate the XOR function.
import numpy as np

# Initializing the inputs: 'x' is the truth table for the XOR gate and
# 'y' is the output column of that truth table.
# Each row of 'x' is one training example; each column is one feature.
# So 'x' is a (4,2) matrix: 4 examples and 2 features.
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Seed the generator so the random weights do not change every time the program runs.
np.random.seed(0)

# Optional, but a good idea to have both positive and negative initial weights.
# 8 neurons in our hidden layer; we can also think of them as learned features.
theta1 = np.random.rand(2, 8) - 0.5
theta2 = np.random.rand(8, 1) - 0.5

# Necessary: the bias terms should have the same number of dimensions as their layer.
b1 = np.zeros(8)
b2 = np.zeros(1)

alpha = 0.01
# 'lamda' is the regularization strength to prevent overfitting;
# not really necessary for this example, though.
lamda = 0.001

# More iterations than you might think! Because we have so little
# training data, we need to repeat it a lot.
for i in range(1, 40000):
    # Forward pass
    z1 = x.dot(theta1) + b1
    h1 = 1 / (1 + np.exp(-z1))
    z2 = h1.dot(theta2) + b2
    h2 = 1 / (1 + np.exp(-z2))

    # This dz term assumes binary cross-entropy loss.
    dz2 = h2 - y
    # You could also have stuck with squared-error loss; the extra h2 terms
    # are the derivative of the sigmoid transfer function.
    # It converges more slowly, though:
    # dz2 = (h2 - y) * h2 * (1 - h2)

    # Backward pass (the same as before, just with fewer temporary variables).
    dw2 = np.dot(h1.T, dz2)
    db2 = np.sum(dz2, axis=0)
    dz1 = np.dot(dz2, theta2.T) * h1 * (1 - h1)
    dw1 = np.dot(x.T, dz1)
    db1 = np.sum(dz1, axis=0)

    # The L2 regularisation terms ADD to the gradients of the weights.
    dw2 += lamda * theta2
    dw1 += lamda * theta1

    # Gradient-descent updates
    theta1 += -alpha * dw1
    theta2 += -alpha * dw2
    b1 += -alpha * db1
    b2 += -alpha * db2

# 'input1' is the input we feed to the trained XOR network.
input1 = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
z1 = np.dot(input1, theta1) + b1
h1 = 1 / (1 + np.exp(-z1))
z2 = np.dot(h1, theta2) + b2
h2 = 1 / (1 + np.exp(-z2))
print(h2)

# Output for input1 for the XOR gate:
# [[ 0.01031446]
#  [ 0.0201576 ]
#  [ 0.9824826 ]
#  [ 0.98584079]]
# which is approximately [0, 0, 1, 1]

I have explained the code with comments as if it were written for a 5-year-old, so I hope everyone can follow it.

If we add more hidden layers to this network, it becomes a deep neural network, which is what all of deep reinforcement learning builds on; with each added hidden layer, the training time increases. A rough sketch of what an extra layer looks like is shown below.
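As a sketch of what "adding more layers" means in the code above, the forward pass with a second hidden layer just repeats the same pattern. The layer sizes here are my own arbitrary choice, and training it would need one more backprop step, which is not shown.

import numpy as np

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Two hidden layers of 8 units instead of one (sizes are illustrative).
np.random.seed(0)
theta1 = np.random.rand(2, 8) - 0.5
theta2 = np.random.rand(8, 8) - 0.5
theta3 = np.random.rand(8, 1) - 0.5
b1, b2, b3 = np.zeros(8), np.zeros(8), np.zeros(1)

# Each extra layer adds one more weight matrix, bias and sigmoid to the
# forward pass (and one more step to the backward pass when training).
z1 = x.dot(theta1) + b1
h1 = 1 / (1 + np.exp(-z1))
z2 = h1.dot(theta2) + b2
h2 = 1 / (1 + np.exp(-z2))
z3 = h2.dot(theta3) + b3
h3 = 1 / (1 + np.exp(-z3))   # untrained output, just to show the shape
print(h3.shape)              # (4, 1)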


Back to value learning from the next lesson; pretty soon we will be talking about how AlphaGo works. Now that all the prerequisites are covered, we are good to go.