Understanding and Implementing LeNet-5 CNN Architecture (Deep Learning)

Original article was published on Artificial Intelligence on Medium

LeNet-5 TensorFlow Implementation

We begin implementation by importing the libraries we will be utilizing:

  • TensorFlow: An open-source platform for the implementation, training, and deployment of machine learning models.
  • Keras: An open-source library used for the implementation of neural network architectures that run on both CPUs and GPUs.
  • Numpy: A library for numerical computation with n-dimensional arrays.
import tensorflow as tf
from tensorflow import keras
import numpy as np

Next, we load the MNIST dataset using the Keras library. The Keras library has a suite of datasets readily available for use with easy accessibility.

We are also required to partition the dataset into testing, validation and training. Here are some quick descriptions of each partition category.

  • Training Dataset: This is the group of our dataset used to train the neural network directly. Training data refers to the dataset partition exposed to the neural network during training.
  • Validation Dataset: This group of the dataset is utilized during training to assess the performance of the network at various iterations.
  • Test Dataset: This partition of the dataset evaluates the performance of our network after the completion of the training phase.

It is also required that the pixel intensity of the images within the dataset are normalized from the value range 0–255 to 0–1.

(train_x, train_y), (test_x, test_y) = keras.datasets.mnist.load_data()
train_x = train_x / 255.0
test_x = test_x / 255.0
train_x = tf.expand_dims(train_x, 3)
test_x = tf.expand_dims(test_x, 3)
val_x = train_x[:5000]
val_y = train_y[:5000]

In the code snippet above, we expand the dimensions of the training and dataset. The reason we do this is that during the training and evaluation phases, the network expects the images to be presented within batches; the extra dimension is representative of the numbers of images in a batch.

The code below is the main part where we implement the actual LeNet-5 based neural network.

Keras provides tools required to implement the classification model. Keras presents a Sequential API for stacking layers of the neural network on top of each other.

lenet_5_model = keras.models.Sequential([
keras.layers.Conv2D(6, kernel_size=5, strides=1, activation='tanh', input_shape=train_x[0].shape, padding='same'), #C1
keras.layers.AveragePooling2D(), #S2
keras.layers.Conv2D(16, kernel_size=5, strides=1, activation='tanh', padding='valid'), #C3
keras.layers.AveragePooling2D(), #S4
keras.layers.Flatten(), #Flatten
keras.layers.Dense(120, activation='tanh'), #C5
keras.layers.Dense(84, activation='tanh'), #F6
keras.layers.Dense(10, activation='softmax') #Output layer

We first assign the variable’lenet_5_model'to an instance of the tf.keras.Sequential class constructor.

Within the class constructor, we then proceed to define the layers within our model.

The C1 layer is defined by the linekeras.layers.Conv2D(6, kernel_size=5, strides=1, activation='tanh', input_shape=train_x[0].shape, padding='same'). We are using the tf.keras.layers.Conv2D class to construct the convolutional layers within the network. We pass a couple of arguments which are described here.

  • Activation Function: A mathematical operation that transforms the result or signals of neurons into a normalized output. An activation function is a component of a neural network that introduces non-linearity within the network. The inclusion of the activation function enables the neural network to have greater representational power and solve complex functions.

The rest of the convolutional layers follow the same layer definition as C1 with some different values entered for the arguments.

In the original paper where the LeNet-5 architecture was introduced, subsampling layers were utilized. Within the subsampling layer the average of the pixel values that fall within the 2×2 pooling window was taken, after that, the value is multiplied with a coefficient value. A bias is added to the final result, and all this is done before the values are passed through the activation function.

But in our implemented LeNet-5 neural network, we’re utilizing the tf.keras.layers.AveragePooling2D constructor. We don’ t pass any arguments into the constructor as some default values for the required arguments are initialized when the constructor is called. Remember that the pooling layer role within the network is to downsample the feature maps as they move through the network.

There are two more types of layers within the network, the flatten layer and the dense layers.

The flatten layer is created with the class constructor tf.keras.layers.Flatten.

The purpose of this layer is to transform its input to a 1-dimensional array that can be fed into the subsequent dense layers.

The dense layers have a specified number of units or neurons within each layer, F6 has 84, while the output layer has ten units.

The last dense layer has ten units that correspond to the number of classes that are within the MNIST dataset. The activation function for the output layer is a softmax activation function.

  • Softmax: An activation function that is utilized to derive the probability distribution of a set of numbers within an input vector. The output of a softmax activation function is a vector in which its set of values represents the probability of an occurrence of a class/event. The values within the vector all add up to 1.

Now we can compile and build the model.

lenet_5_model.compile(optimizer=’adam’, loss=keras.losses.sparse_categorical_crossentropy, metrics=[‘accuracy’])

Keras provides the ‘compile’ method through the model object we have instantiated earlier. The compile function enables the actual building of the model we have implemented behind the scene with some additional characteristics such as the loss function, optimizer, and metrics.

To train the network, we utilize a loss function that calculates the difference between the predicted values provided by the network and actual values of the training data.

The loss values accompanied by an optimization algorithm(Adam) facilitates the number of changes made to the weights within the network. Supporting factors such as momentum and learning rate schedule, provide the ideal environment to enable the network training to converge, herby getting the loss values as close to zero as possible.

During training, we’ll also validate our model after every epoch with the valuation dataset partition created earlier

lenet_5_model.fit(train_x, train_y, epochs=5, validation_data=(val_x, val_y))

After training, you will notice that your model achieves a validation accuracy of over 90%. But for a more explicit verification of the performance of the model on an unseen dataset, we will evaluate the trained model on the test dataset partition created earlier.

lenet_5_model.evaluate(test_x, test_y)
>> [0.04592850968674757, 0.9859]

After training my model, I was able to achieve 98% accuracy on the test dataset, which is quite useful for such a simple network.

Here’s GitHub link for the code presented in this article: