Deep Learning: Explore Neural Network, Perceptron, Convolution and Pooling

Original article was published on Artificial Intelligence on Medium


ANN inspired by the biological neuron:

  • Dendrites receive signals from other neurons.
  • The soma (cell body) sums the incoming signals.
  • When the sum exceeds a threshold, the cell fires; that is, it transmits a signal over its axon to other cells.
  • The dendrites and the cell body together handle the incoming signals.

Single-layer Perceptron

•In a biological neuron (BN), dendrites receive signals from other neurons.

•In an ANN, the input is received through nodes.

•In a BN, the soma sums the incoming signals.

•In a BN, when the sum exceeds a threshold, the cell fires; that is, it transmits a signal over its axon to other cells.

•In an ANN, a transfer function decides the output of the neuron, and the output of the transfer function is fed to the final block.

x1, x2, and x3 are the input nodes of the Single Layer perceptron.

0.3 is the weight associated with each node of the Neural network.

0.4 is the bias value for the neural network.

y = 0.3·x1 + 0.3·x2 + 0.3·x3 - 0.4

Here y can be positive or negative, depending on the values of x1, x2, and x3.
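The weighted sum above can be checked with a few lines of NumPy. This is only a minimal sketch: the function name perceptron_output and the two sample inputs are made up for illustration, while the weights 0.3 and the bias 0.4 come from the text.

```python
import numpy as np

# Values taken from the text: every input has weight 0.3, and the bias is 0.4.
weights = np.array([0.3, 0.3, 0.3])
bias = 0.4


def perceptron_output(x):
    """Weighted sum of the inputs minus the bias (hypothetical helper name)."""
    return float(np.dot(weights, x) - bias)


# Illustrative inputs (not from the article) showing both signs of y.
y_pos = perceptron_output(np.array([1.0, 1.0, 1.0]))  # 0.9 - 0.4
y_neg = perceptron_output(np.array([0.0, 1.0, 0.0]))  # 0.3 - 0.4

print(y_pos, y_neg)
```

With all three inputs switched on the sum clears the bias and y is positive; with a single active input it does not, and y is negative.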

Many layers are present in the model.

•Input Layer, Hidden layer, Output Layer.

•W13 is the weight associated with x1

•W23 is the weight associated with x2

•W33 is the weight associated with x3

•b3 is the bias

•Out y23 = x1·w13 + x2·w23 + x3·w33 + b3

The structure of a simple three-layer neural network is shown in Fig. 4.1. Every neuron of one layer is connected to all neurons of the next layer, but the value it sends is multiplied by a so-called weight, which determines how much of the quantity from the previous layer is transmitted to a given neuron of the next layer. The weight does not depend on the initial neuron alone; it depends on the initial neuron-destination neuron pair. This means that the link between, say, neuron N5 and neuron M7 has a weight wk, while the link between neurons N5 and M3 has a different weight, wj.

FIG 4.1

Multilayer Perceptron

These weights can happen to have the same value by accident, but in most cases, they will have different values.

The flow of information through the neural network goes from the first-layer neurons (input layer), via the second-layer neurons (hidden layer) to the third-layer neurons (output neurons).

We return now to Fig. 4.1. The input layer consists of three neurons, each of which can accept one input value; they are represented by the variables x1, x2, x3 (the actual input values will be the values for these variables). Accepting input is the only thing the first layer does. Every neuron in the input layer can take a single input. It is possible to have fewer input values than input neurons (then you can hand 0 to the unused neurons), but the network cannot take in more input values than it has input neurons.

Inputs can be represented as a sequence x1, x2, …, xn (which is actually the same as a row vector) or as a column vector x := (x1, x2, …, xn)⊤. These are different representations of the same data, and we will always choose the representation that makes it easier and faster to compute the operations we might need. In our choice of data representation, we are not constrained by anything else but computational efficiency.

As we already noted, every neuron from the input layer is connected to every neuron from the hidden layer, but neurons of the same layer are not interconnected.

Every connection between neuron j in layer k and neuron m in layer n has a weight denoted by w_jm^kn (subscript jm for the neurons, superscript kn for the layers), and, since it is usually clear from the context which layers are concerned, we may omit the superscript and write simply wjm. The weight regulates how much of the initial value will be forwarded to a given neuron, so if the input is 12 and the weight to the destination neuron is 0.25, the destination will receive the value 3.

When representing operations in neural networks as vectors and matrices, we want to minimize the use of transpositions (since each one of them has a computational cost), and keep the operations as natural and simple as possible. On the other hand, matrix transposition is not that expensive, and it is sometimes better to keep things intuitive rather than fast.

For example, we denote the weight connecting the second neuron in layer 1 and the third neuron in layer 2 with a variable named w23. We see that the index retains information on which neurons in the layers are connected, but one might ask where we store the information about which layers are in question. The answer is very simple: that information is best stored in the matrix name in the program code, e.g. input_to_hidden_w. Note that we can call a matrix by its ‘mathematical name’, e.g. u, or by its ‘code name’, e.g. hidden_to_output_w. All the weights wjm connecting two layers are collected in one such weight matrix.
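To make the matrix naming concrete, here is a small sketch of a forward pass written with weight matrices. The matrix names input_to_hidden_w and hidden_to_output_w come from the text; the layer sizes (3, 2, 1), all weight values, and the convention that entry [j, m] holds the weight from neuron j to neuron m are assumptions chosen for illustration. Biases and nonlinearities are left out to keep the focus on the weights.

```python
import numpy as np

# Input as a row vector; the first value 12 matches the example in the text.
x = np.array([12.0, 0.0, 0.0])

# Assumed convention: entry [j, m] is the weight from input neuron j
# to hidden neuron m. The values are made up for illustration.
input_to_hidden_w = np.array([
    [0.25, 0.10],
    [0.50, 0.20],
    [0.75, 0.30],
])

# Weights from the two hidden neurons to a single output neuron (made up).
hidden_to_output_w = np.array([
    [1.0],
    [2.0],
])

# With a row-vector input no transposition is needed.
hidden = x @ input_to_hidden_w        # first entry: 12 * 0.25 = 3
output = hidden @ hidden_to_output_w  # 3.0 * 1.0 + 1.2 * 2.0

print(hidden, output)
```

Storing the input as a row vector lets both layers be applied with a plain matrix product, which is one way to avoid the transpositions mentioned earlier.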

Specifications of Neural Network

For a full specification of a neural network we need:

• The number of layers in a network

• The size of the input (recall that this is the same as the number of neurons in the input layer)

• The number of neurons in the hidden layer

• The number of neurons in the output layer

• Initial values for weights

• Initial values for biases

Note that the neurons are not objects. They exist as entries in a matrix, and as such, their number is necessary for specifying the matrices. The weights and biases play a crucial role: the whole point of a neural network is to find a good set of weights and biases, and this is done through training via backpropagation, which is the reverse of a forward pass. The idea is to measure the error the network makes when classifying and then modify the weights so that this error becomes very small. Backpropagation is the most important subject in deep learning.
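The specification list above maps directly onto a handful of arrays. The following sketch assumes a hypothetical helper init_network, small random initial weights, and zero initial biases; none of these choices come from the article, and they only illustrate that the neurons themselves never appear as objects.

```python
import numpy as np


def init_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per pair of adjacent layers.

    layer_sizes lists the number of neurons per layer, input first.
    The initialization scheme is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(0.0, 0.1, size=(n_in, n_out))  # initial weights
        b = np.zeros(n_out)                           # initial biases
        params.append((w, b))
    return params


# Three layers: 3 input neurons, 4 hidden neurons, 1 output neuron.
params = init_network([3, 4, 1])
shapes = [(w.shape, b.shape) for w, b in params]
print(shapes)
```

The number of neurons per layer shows up only as the dimensions of the matrices and bias vectors.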

Neural Network Weights.

As we noted before, the learning process in the neurons is simply the modification or update of weights and biases during training with backpropagation. We will explain the backpropagation algorithm shortly. During classification, only the forward pass is made. One of the early learning procedures for artificial neurons is known as perceptron learning. The perceptron consisted of a binary threshold neuron (also known as a binary threshold unit) and the perceptron learning rule, and altogether it looks like a modified logistic regression. Let us formally define the binary threshold neuron:

z = b + Σi wi·xi

y = 1 if z ≥ 0, and y = 0 otherwise

where xi are the inputs, wi the weights, b is the bias and z is the logit. The second equation defines the decision, which is usually done with a nonlinearity, but here a binary step function is used instead (hence the name).
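A binary threshold neuron takes only a few lines of code. The sketch below assumes the usual convention that the neuron outputs 1 when the logit z is at least 0; the function name and the sample numbers are invented for illustration.

```python
import numpy as np


def binary_threshold_neuron(x, w, b):
    """Return the logit z and the binary decision y.

    Assumes the step function fires (y = 1) when z >= 0.
    """
    z = b + np.dot(w, x)
    y = 1 if z >= 0 else 0
    return float(z), y


# Illustrative values: same inputs and weights, two different biases.
x = np.array([1.0, 2.0])
w = np.array([0.5, -1.0])

z_low, y_low = binary_threshold_neuron(x, w, b=1.0)    # z = 1 + 0.5 - 2
z_high, y_high = binary_threshold_neuron(x, w, b=2.0)  # z = 2 + 0.5 - 2

print(z_low, y_low, z_high, y_high)
```

Raising the bias alone is enough to push the logit over the threshold and flip the decision.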

Perceptron Training

1. Choose a training case.

2. If the predicted output matches the output label, do nothing.

3. If the perceptron predicts a 0 and it should have predicted a 1, add the input vector to the weight vector.

4. If the perceptron predicts a 1 and it should have predicted a 0, subtract the input vector from the weight vector.

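As an example, take the input vector to be x = (0.3, 0.4)⊤ and let the bias be b = 0.5, the weights w = (2, −3)⊤ and the target t = 1. We start by calculating the current classification result: z = b + w1·x1 + w2·x2 = 0.5 + 2·0.3 + (−3)·0.4 = −0.1. Since z is below the threshold of 0, the perceptron outputs 0 while the target is 1, so by rule 3 we add the input vector to the weight vector, which gives w = (2.3, −2.6)⊤.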
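The four training rules and the example values can be traced in code. This is a sketch under two assumptions: the neuron fires when the logit is at least 0, and, as in the rules as stated, only the weight vector is updated while the bias stays fixed. The function names are hypothetical.

```python
import numpy as np


def predict(x, w, b):
    """Binary threshold decision; assumes the neuron fires when z >= 0."""
    return 1 if b + np.dot(w, x) >= 0 else 0


def perceptron_update(x, w, b, target):
    """Apply one step of the perceptron learning rule to the weights."""
    y = predict(x, w, b)
    if y == target:
        return w          # rule 2: prediction is correct, do nothing
    if y == 0 and target == 1:
        return w + x      # rule 3: add the input vector
    return w - x          # rule 4: subtract the input vector


# Values from the example in the text.
x = np.array([0.3, 0.4])
w = np.array([2.0, -3.0])
b = 0.5
t = 1

z_before = b + np.dot(w, x)            # 0.5 + 0.6 - 1.2
w_new = perceptron_update(x, w, b, t)  # prediction 0, target 1: add x
z_after = b + np.dot(w_new, x)         # 0.5 + 0.69 - 1.04

print(z_before, w_new, z_after, predict(x, w_new, b))
```

After a single update the logit for this training case becomes positive, so the perceptron now predicts the target 1.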

Convolutional Neural Network

The idea which LeCun and his team implemented was older, and built on the ideas of David H. Hubel and Torsten Wiesel presented in their 1968 seminal paper [2], which won them the 1981 Nobel Prize in Physiology or Medicine. They explored the animal visual cortex and found connections between activities in a small but well-defined area of the brain and activities in small regions of the visual field. In some cases, it was even possible to pinpoint exact neurons that were in charge of a part of the visual field.

This led them to the discovery of the receptive field, which is a concept used to describe the link between parts of the visual fields and individual neurons which process the information.

The idea of a receptive field is the third and final component we need to build convolutional neural networks. But what are the other two components? The first was a technical detail: flattening images (2D arrays) to vectors.

Even though most modern implementations deal readily with arrays, under the hood they are often flattened to vectors. We adopt this approach in our explanation since it has less handwaving, and enables the reader to grasp some technical details along the way.

You can see an illustration of flattening a 3 by 3 image at the top of Fig. 6.1. The second component is the one that will take the image vector and give it to a single workhorse neuron which will be in charge of processing. Can you figure out what we can use? If you said ‘logistic regression’, you were right! We will, however, be using a different activation function, but the structure will be the same.

Let us review the situation in 2D as if we did not flatten the image into a vector. This is the classical setting for convolutional layers, and such layers are called 2D convolutional layers or planar convolutional layers.

If we were to use 3D, we would call it spatial, and for 4D or more hyperspatial. In the literature it is common to refer to the 2D convolutional layer as ‘spatial’, but this makes one’s spider-sense tingle.

The inputs of the logistic regression (the local receptive field) should now also be 2D, and this is the reason why we most often use 4, 9 and 16 inputs, since they form squares of 2 by 2, 3 by 3, and 4 by 4 respectively. The stride now represents a move of this square on the image: it starts from the left and goes to the right, and when it reaches the end of a row, it moves one row down, returns all the way to the left without scanning, and starts scanning from left to right again (you can see the steps of this process in the top part of Fig. 6.2). One thing that becomes obvious is that we will now get fewer outputs.

If we use a 3 by 3 local receptive field to scan a 10 by 10 image, as the output from the local receptive field we will get an 8 by 8 array (see the bottom part of Fig. 6.2). This completes a convolutional layer.
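The scanning procedure can be written as two nested loops. The sketch below is a plain NumPy illustration, not the way deep learning libraries implement convolution; the function name, the averaging kernel and the test image are assumptions, and the activation function is left out so that only the scanning is shown.

```python
import numpy as np


def convolve2d_valid(image, kernel, bias=0.0, stride=1):
    """Slide a local receptive field over the image, left to right, top to bottom.

    Each output entry is the weighted sum of one patch plus a bias.
    No padding is used, so the output is smaller than the input.
    """
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            out[i, j] = np.sum(patch * kernel) + bias
    return out


# A 10 by 10 image scanned by a 3 by 3 receptive field (made-up values).
image = np.arange(100, dtype=float).reshape(10, 10)
kernel = np.ones((3, 3)) / 9.0  # averaging weights, chosen for illustration
feature_map = convolve2d_valid(image, kernel)

print(feature_map.shape)
```

With a stride of 1 the receptive field fits in 10 - 3 + 1 = 8 positions along each axis, which is where the 8 by 8 output comes from.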

A convolutional neural network has multiple layers. Imagine a convolutional neural network consisting of three convolutional layers and one fully-connected layer. Suppose it will be processing an image of size 10 by 10 and that all three layers have a local receptive field of 3 by 3.

Its task is to decide whether a picture has a car in it or not. Let us see how the network works. The first layer takes a 10 by 10 image, produces an output (it has randomly initialized weights and bias) of size 8 by 8, which is then given to the second convolutional layer (which has its own local receptive field with randomly initialized weights and biases but we have decided to have it also 3 by 3), which produces an output of size 6 by 6, and this is given to the third layer (which has a third local receptive field).

This third convolutional layer produces a 4 by 4 image. We then flatten it to a 16-dimensional vector and feed it to a standard fully-connected layer which has one output neuron and uses a logistic function as its nonlinearity.

This is actually another logistic regression in disguise, but it could have had more than one output neuron, and then it would not be a proper logistic regression, so we call it a fully-connected layer of size 1. The input layer size is not specified and it is assumed to be equal to the output of the previous layer. Then, since it uses the logistic function, it produces an output between 0 and 1 and compares its output to the image label. The error is calculated and backpropagated, and this is repeated for every image in the dataset which completes the training of the network.
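The shrinking sizes in this walkthrough (10 by 10, then 8 by 8, 6 by 6 and 4 by 4, then a 16-dimensional vector) can be traced with a forward pass through randomly initialized layers. This is a sketch only: the ReLU activation inside the convolutional layers, the random weight scale, and all function and variable names are assumptions, and no training is performed.

```python
import numpy as np


def conv_valid(image, kernel, bias):
    """One convolutional layer: 3 by 3 scan without padding, then ReLU (assumed)."""
    k = kernel.shape[0]
    n = image.shape[0] - k + 1
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel) + bias
    return np.maximum(out, 0.0)


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
image = rng.random((10, 10))  # stand-in for a 10 by 10 picture

# Three convolutional layers, each with its own randomly initialized
# 3 by 3 receptive field and bias.
activation = image
sizes = [activation.shape]
for _ in range(3):
    kernel = rng.normal(0.0, 0.1, size=(3, 3))
    activation = conv_valid(activation, kernel, bias=0.0)
    sizes.append(activation.shape)

# Flatten the 4 by 4 result and feed it to a fully-connected layer of size 1.
flat = activation.reshape(-1)
dense_w = rng.normal(0.0, 0.1, size=flat.shape[0])
dense_b = 0.0
prob_car = logistic(np.dot(dense_w, flat) + dense_b)

print(sizes, flat.shape, prob_car)
```

During training, prob_car would be compared with the image label and the resulting error backpropagated through all four layers.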

Feature Maps and Pooling

Usually, a convolutional neural network is composed of a convolutional layer followed by a max-pooling layer, followed by a convolutional layer, and so on. As the image goes through the network, after a number of layers, we get a small image with a lot of channels.
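Max-pooling itself is a simple operation: the feature map is cut into small blocks and only the largest value of each block is kept. The sketch below assumes non-overlapping 2 by 2 blocks (the same pool size as in the Keras example at the end of this article); the function name and the sample feature map are made up.

```python
import numpy as np


def max_pool2d(feature_map, pool=2):
    """Keep the maximum of every non-overlapping pool-by-pool block.

    Rows and columns that do not fill a whole block are dropped.
    """
    h, w = feature_map.shape
    h_out, w_out = h // pool, w // pool
    trimmed = feature_map[:h_out * pool, :w_out * pool]
    blocks = trimmed.reshape(h_out, pool, w_out, pool)
    return blocks.max(axis=(1, 3))


feature_map = np.array([
    [1.0, 2.0, 5.0, 6.0],
    [3.0, 4.0, 7.0, 8.0],
    [9.0, 10.0, 13.0, 14.0],
    [11.0, 12.0, 15.0, 16.0],
])
pooled = max_pool2d(feature_map)

print(pooled)
```

Each pooling layer halves the height and width, which is how the image becomes small after a number of layers.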

Then we can flatten this to a vector and use a simple logistic regression at the end to extract which parts are relevant for our classification problem. The logistic regression (this time with the logistic function) will pick out which parts of the representation will be used for classification and create a result that will be compared with the target and then the error will be backpropagated. This forms a complete convolutional neural network.

A simple but fully functional convolutional network with four layers is shown in Fig. 6.3. Why are convolutional neural networks easier to train? The answer is in the number of parameters used. A five-layer deep fully connected neural network for MNIST has a lot of weights, through which we need to backpropagate. A five-layer convolutional network (containing only convolutional layers) with all receptive fields of 3 by 3 has 45 weights and 5 biases. Notice that this configuration can be used for arbitrarily large images: we do not have to expand the input layer (which is a convolutional layer in our case), but we will then need more convolutional layers to shrink the image. Even if we add feature maps, the training of each feature map is independent of the others, i.e. we can train them in parallel.

This makes the process not only computationally fast, but we can also split it across many processors. By contrast, backpropagating errors through a regular feed-forward fully connected network is highly sequential, since we need to have the derivatives of the outer layers to compute the derivatives of the inner layers.
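The parameter counts are easy to verify. In the sketch below, the convolutional count follows the text (one 3 by 3 receptive field and one bias per layer), while the layer sizes of the fully connected network are purely hypothetical, since the article does not give them.

```python
def conv_param_count(num_layers, field_size=3):
    """Weights and biases when each layer has a single square receptive field."""
    weights = num_layers * field_size * field_size
    biases = num_layers
    return weights, biases


def dense_param_count(layer_sizes):
    """Weights and biases of a fully connected network with the given layer sizes."""
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights, biases


# Five convolutional layers with 3 by 3 receptive fields, as in the text.
conv_weights, conv_biases = conv_param_count(5)

# Hypothetical five-layer fully connected network for 28 by 28 MNIST images.
dense_sizes = [784, 100, 100, 100, 10]
dense_weights, dense_biases = dense_param_count(dense_sizes)

print(conv_weights, conv_biases, dense_weights, dense_biases)
```

Even with these modest hidden layers, the fully connected network has tens of thousands of weights against the 45 of the convolutional one.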

Example:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.utils import to_categorical
from keras.datasets import mnist

# Load MNIST and add a channel dimension: (samples, 28, 28, 1)
(train_samples, train_labels), (test_samples, test_labels) = mnist.load_data()
train_samples = train_samples.reshape(train_samples.shape[0], 28, 28, 1)
test_samples = test_samples.reshape(test_samples.shape[0], 28, 28, 1)

# Scale pixel values from the range 0-255 to the range 0-1
train_samples = train_samples.astype('float32') / 255
test_samples = test_samples.astype('float32') / 255

# One-hot encode the ten digit classes
c_train_labels = to_categorical(train_labels, 10)
c_test_labels = to_categorical(test_labels, 10)

# Two convolution + max-pooling stages, then dropout and a softmax classifier
convnet = Sequential()
convnet.add(Conv2D(32, (4, 4), activation='relu', input_shape=(28, 28, 1)))
convnet.add(MaxPooling2D(pool_size=(2, 2)))
convnet.add(Conv2D(32, (3, 3), activation='relu'))
convnet.add(MaxPooling2D(pool_size=(2, 2)))
convnet.add(Dropout(0.3))
convnet.add(Flatten())
convnet.add(Dense(10, activation='softmax'))

# The original listing omits the compile step, which Keras requires before fit();
# the loss and optimizer chosen here are assumptions
convnet.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

convnet.fit(train_samples, c_train_labels, batch_size=32, epochs=20, verbose=1)

# evaluate() returns [loss, accuracy] because accuracy is the only metric requested
metrics = convnet.evaluate(test_samples, c_test_labels, verbose=1)
print()
print("accuracy: %.2f%%" % (metrics[1] * 100))

predictions = convnet.predict(test_samples)

Pentagon Space Team Engineer: Abhijith Nalathavada Mutt

References:

Sandro Skansi, Introduction to Deep Learning.