Building a Neural Network with a Single Hidden Layer using Numpy

Implement a 2-class classification neural network with a single hidden layer using Numpy

In the previous post, we discussed how to build a simple neural network using NumPy. In this post, we will build a neural network with a single hidden layer.

1. Import Libraries

We will import the basic Python libraries that we need: numpy, matplotlib (for plotting graphs), and scikit-learn (for data preparation and analysis).

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

2. Dataset

We will use the Banknote Authentication Dataset, which involves predicting whether a given banknote is authentic based on several measures taken from a photograph of it. It is a binary (2-class) classification problem with 1,372 observations, 4 input variables, and 1 output variable. For more details, see the dataset's page in the UCI Machine Learning Repository.

data = np.genfromtxt('data_banknote_authentication.txt', delimiter=',')
X = data[:, :4]
y = data[:, 4]

We can visualize the dataset with a scatter plot of the first two features. We can see that the two classes (authentic and not authentic) are largely separable. Our goal is to build a model that fits this data, i.e. a neural network that classifies regions of the feature space as either authentic or not authentic.

plt.scatter(X[:, 0], X[:, 1], alpha=0.2, c=y, cmap='viridis')
plt.xlabel('variance of wavelet')
plt.ylabel('skewness of wavelet');

Now, let us divide the data into a training set and a test set. This can be accomplished using sklearn's train_test_split() function: 20% of the data is reserved for testing and 80% for training. We will also check the sizes of the training and test sets, which will be useful later when designing the neural network model.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Transpose so that each column is one example, and reshape the labels to (1, m)
X_train = X_train.T
y_train = y_train.reshape(1, y_train.shape[0])
X_test = X_test.T
y_test = y_test.reshape(1, y_test.shape[0])

print('Train X Shape: ', X_train.shape)
print('Train Y Shape: ', y_train.shape)
print('I have m = %d training examples!' % (X_train.shape[1]))

print('\nTest X Shape: ', X_test.shape)

3. Neural Network Model

The general methodology to build a Neural Network is to:

1. Define the neural network structure (# of input units, # of hidden units, etc.).
2. Initialize the model's parameters
3. Loop:
- Implement forward propagation
- Compute loss
- Implement backward propagation to get the gradients
- Update parameters (gradient descent)

We will build a neural network with a single hidden layer: an input layer, one hidden layer, and a single-unit output layer.

3.1 Define Structure

We need to define the number of input units, the number of hidden units, and the number of output units. The number of input units equals the number of features in the dataset (4), the hidden layer is set to 4 units (for this example), and since the problem is binary classification, we use a single output unit.

def define_structure(X, Y):
    input_unit = X.shape[0]   # size of the input layer
    hidden_unit = 4           # hidden layer of size 4
    output_unit = Y.shape[0]  # size of the output layer
    return (input_unit, hidden_unit, output_unit)

(input_unit, hidden_unit, output_unit) = define_structure(X_train, y_train)
print("The size of the input layer is: " + str(input_unit))
print("The size of the hidden layer is: " + str(hidden_unit))
print("The size of the output layer is: " + str(output_unit))

3.2 Initialize Model Parameters

We need to initialize the weight matrices and bias vectors. The weights are initialized randomly with small values, while the biases are set to zero. This can be done using the following function.

def parameters_initialization(input_unit, hidden_unit, output_unit):
    np.random.seed(2)
    W1 = np.random.randn(hidden_unit, input_unit) * 0.01
    b1 = np.zeros((hidden_unit, 1))
    W2 = np.random.randn(output_unit, hidden_unit) * 0.01
    b2 = np.zeros((output_unit, 1))
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
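
As a quick sanity check (my addition, not part of the original post), we can initialize the parameters for the sizes found above and print their shapes. For this network they should be W1: (4, 4), b1: (4, 1), W2: (1, 4), b2: (1, 1).

parameters = parameters_initialization(input_unit, hidden_unit, output_unit)
for name, value in parameters.items():
    print(name, value.shape)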

3.3.1 Forward Propagation

For forward propagation, given the set of input features (X), we need to compute the pre-activation and activation for each layer. For the hidden layer, we are using the tanh activation function:

Z1 = W1·X + b1,  A1 = tanh(Z1)

Similarly, for the output layer, we are using the sigmoid activation function:

Z2 = W2·A1 + b2,  A2 = σ(Z2) = 1 / (1 + e^(-Z2))

We can use the following code to implement forward propagation.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_propagation(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}

    return A2, cache
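
To confirm the dimensions line up, we can run a single forward pass on the training data with the freshly initialized parameters from above (an illustrative check, not in the original post):

A2, cache = forward_propagation(X_train, parameters)
print(A2.shape)  # one activation per training example: (1, m)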

3.3.2 Compute Cost

We will compute the cross-entropy cost. In the above section, we calculated A2. Using A2 and the true labels Y, the cross-entropy cost is

J = -(1/m) Σᵢ [ y⁽ⁱ⁾ log(a2⁽ⁱ⁾) + (1 - y⁽ⁱ⁾) log(1 - a2⁽ⁱ⁾) ]

which can be implemented as follows.

def cross_entropy_cost(A2, Y, parameters):
    # number of training examples
    m = Y.shape[1]
    # compute the cross-entropy cost
    logprobs = np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2))
    cost = - np.sum(logprobs) / m
    cost = float(np.squeeze(cost))

    return cost
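
One practical caveat (my addition, not from the original post): if A2 ever reaches exactly 0 or 1, np.log returns -inf and the cost becomes undefined. A common defensive tweak is to clip the activations before taking the logarithm; the sketch below shows one way to do it.

def cross_entropy_cost_stable(A2, Y, parameters, eps=1e-12):
    # clip activations away from exactly 0 and 1 so np.log stays finite
    m = Y.shape[1]
    A2 = np.clip(A2, eps, 1 - eps)
    logprobs = np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2))
    cost = - np.sum(logprobs) / m
    return float(np.squeeze(cost))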

3.3.3 Backpropagation

We need to calculate the gradients of the cost with respect to the different parameters. With the notation above (following Andrew Ng's Deep Learning course), the gradients are:

dZ2 = A2 - Y
dW2 = (1/m) dZ2 A1ᵀ
db2 = (1/m) Σ dZ2
dZ1 = (W2ᵀ dZ2) ∗ (1 - A1²)   (∗ is element-wise; 1 - A1² is the derivative of tanh)
dW1 = (1/m) dZ1 Xᵀ
db1 = (1/m) Σ dZ1

def backward_propagation(parameters, cache, X, Y):
    # number of training examples
    m = X.shape[1]

    W1 = parameters['W1']
    W2 = parameters['W2']
    A1 = cache['A1']
    A2 = cache['A2']

    dZ2 = A2 - Y
    dW2 = (1/m) * np.dot(dZ2, A1.T)
    db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))
    dW1 = (1/m) * np.dot(dZ1, X.T)
    db1 = (1/m) * np.sum(dZ1, axis=1, keepdims=True)

    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

    return grads
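
To convince yourself that the analytic gradients are correct, a small numerical gradient check is useful. The sketch below (my addition, not part of the original post) perturbs a single entry of W2 and compares the central finite-difference estimate of the cost gradient with the corresponding entry of dW2; the two numbers should agree to several decimal places.

def check_one_gradient(parameters, X, Y, epsilon=1e-7):
    # analytic gradient for W2[0, 0]
    A2, cache = forward_propagation(X, parameters)
    grads = backward_propagation(parameters, cache, X, Y)
    analytic = grads['dW2'][0, 0]

    # numerical gradient via central finite differences
    params_plus = {key: value.copy() for key, value in parameters.items()}
    params_minus = {key: value.copy() for key, value in parameters.items()}
    params_plus['W2'][0, 0] += epsilon
    params_minus['W2'][0, 0] -= epsilon
    cost_plus = cross_entropy_cost(forward_propagation(X, params_plus)[0], Y, params_plus)
    cost_minus = cross_entropy_cost(forward_propagation(X, params_minus)[0], Y, params_minus)
    numerical = (cost_plus - cost_minus) / (2 * epsilon)

    print('analytic: %e, numerical: %e' % (analytic, numerical))

check_one_gradient(parameters, X_train, y_train)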

3.3.4 Gradient Descent (update parameters)

We need to update the parameters using the gradient descent rule, i.e.

𝜃 = 𝜃 − 𝛼 ∂J/∂𝜃

where 𝛼 is the learning rate and 𝜃 stands for any of the parameters (W1, b1, W2, b2).

def gradient_descent(parameters, grads, learning_rate=0.01):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']

    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

    return parameters

4. Neural Network Model

Finally, putting all the functions together, we can build the neural network model with a single hidden layer.

def neural_network_model(X, Y, hidden_unit, num_iterations=1000):
    np.random.seed(3)
    input_unit = define_structure(X, Y)[0]
    output_unit = define_structure(X, Y)[2]

    parameters = parameters_initialization(input_unit, hidden_unit, output_unit)

    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    for i in range(0, num_iterations):
        A2, cache = forward_propagation(X, parameters)
        cost = cross_entropy_cost(A2, Y, parameters)
        grads = backward_propagation(parameters, cache, X, Y)
        parameters = gradient_descent(parameters, grads)
        if i % 5 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    return parameters

parameters = neural_network_model(X_train, y_train, 4, num_iterations=1000)
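
If you want to visualize how the cost decreases, a lightly modified training loop (a sketch, my addition rather than part of the original post) can record the cost at every iteration and plot it with matplotlib:

def train_with_history(X, Y, hidden_unit, num_iterations=1000, learning_rate=0.01):
    # same training loop as above, but keeping the cost at every iteration
    input_unit, _, output_unit = define_structure(X, Y)
    parameters = parameters_initialization(input_unit, hidden_unit, output_unit)
    costs = []
    for i in range(num_iterations):
        A2, cache = forward_propagation(X, parameters)
        costs.append(cross_entropy_cost(A2, Y, parameters))
        grads = backward_propagation(parameters, cache, X, Y)
        parameters = gradient_descent(parameters, grads, learning_rate)
    return parameters, costs

parameters, costs = train_with_history(X_train, y_train, 4)
plt.plot(costs)
plt.xlabel('iteration')
plt.ylabel('cross-entropy cost')
plt.show()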

5. Prediction

Using the learned parameters, we can predict the class for each example by running forward propagation.

def prediction(parameters, X):
    A2, cache = forward_propagation(X, parameters)
    predictions = np.round(A2)

    return predictions

If the activation is greater than 0.5, the prediction is 1; otherwise it is 0.

predictions = prediction(parameters, X_train)
print ('Accuracy Train: %d' % float((np.dot(y_train, predictions.T) + np.dot(1 - y_train, 1 - predictions.T))/float(y_train.size)*100) + '%')
predictions = prediction(parameters, X_test)
print ('Accuracy Test: %d' % float((np.dot(y_test, predictions.T) + np.dot(1 - y_test, 1 - predictions.T))/float(y_test.size)*100) + '%')
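
The dot-product expressions above count the correctly classified positive and negative examples; an equivalent and arguably more readable check (my addition, not from the original post) is a simple element-wise comparison:

print('Accuracy Train: %.1f%%' % (np.mean(prediction(parameters, X_train) == y_train) * 100))
print('Accuracy Test: %.1f%%' % (np.mean(prediction(parameters, X_test) == y_test) * 100))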

As we can see, the training accuracy is around 97%, which means the model fits the training data well. The test accuracy is around 96%. Given the simplicity of the model and the small size of the dataset, we can consider this a good result.