Digit recognizer using CNN

Original article was published on Artificial Intelligence on Medium

Building a simple convolutional neural network (CNN) on the MNIST dataset to recognize handwritten digits.

Image by author


MNIST (“Modified National Institute of Standards and Technology”) is the de facto “Hello World” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

Data Processing:

import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

The dataset contains 60,000 training images and 10,000 test images. load_data() returns them already split into training and test sets. x_train and x_test hold the grayscale pixel values (0 to 255), while y_train and y_test hold the labels 0–9 identifying the digit in each image.

Next, check the shape of the dataset to see whether it can be fed to a CNN. x_train.shape returns (60000, 28, 28), which means the training set holds 60,000 images of 28 × 28 pixels each.
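As a rough illustration of that check, the sketch below unpacks the shape of a small stand-in array. The array `sample_images` and its size of 6 images are assumptions made to keep the example light; on the real data the same unpacking of x_train.shape would give 60000 as the first entry.

```python
import numpy as np

# Stand-in for x_train (assumption): 6 blank 28x28 grayscale images.
sample_images = np.zeros((6, 28, 28), dtype=np.uint8)

# The shape tuple reads as (number of images, height, width).
num_images, height, width = sample_images.shape
print(sample_images.ndim)          # number of dimensions
print(num_images, height, width)   # images, rows, columns
```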

The Conv2D layer in Keras expects a 4-dimensional array (images, height, width, channels), but as we can see from above, we have a 3-dimensional NumPy array.

x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)

Here we reshape the 3-dimensional NumPy array into a 4-dimensional one by adding a single channel axis. Next, we cast the values to float so that the division that follows produces floating-point results.

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

Now for normalization, which is almost always worth doing for neural network inputs. We divide the pixel values by 255, the maximum grayscale value minus the minimum (255 − 0), so that every value falls between 0 and 1.

x_train /= 255
x_test /= 255
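
To see what the cast and the division do, here is a minimal sketch on a tiny hand-made array. The name `pixels` and its values are illustrative assumptions, not taken from MNIST.

```python
import numpy as np

# Three example grayscale values: black, mid-gray, white (assumed sample).
pixels = np.array([0, 128, 255], dtype=np.uint8)

# Cast first: in-place true division is not allowed on an integer array.
pixels = pixels.astype('float32')
pixels /= 255

print(pixels.dtype)                # float32
print(pixels.min(), pixels.max())  # smallest and largest value after scaling
```

After scaling, black maps to 0 and white maps to 1, with everything else in between.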

Building the Model:

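from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D
model = Sequential()
model.add(Conv2D(28, kernel_size=(3,3), input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())  # flatten the 2D feature maps before the fully connected layers
model.add(Dense(128, activation=tf.nn.relu))
model.add(Dropout(0.2))  # rate of 0.2 is an assumed value; tune as needed
model.add(Dense(10, activation=tf.nn.softmax))  # one output per digit class, 0-9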

I have a TensorFlow background, so I use the Keras API to build the model. I import the Sequential model from Keras and add Conv2D, MaxPooling2D, Flatten, Dropout, and Dense layers.
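
To follow how the image size changes through these layers, the sketch below works out the arithmetic by hand. It assumes the Keras defaults for Conv2D (no padding, stride 1) and for MaxPooling2D (stride equal to the pool size); the helper names `conv_output_size` and `pool_output_size` are hypothetical and not part of Keras.

```python
def conv_output_size(size, kernel):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_output_size(size, pool):
    """Side length after non-overlapping max pooling."""
    return size // pool

filters = 28                                    # number of Conv2D filters in the model
after_conv = conv_output_size(28, 3)            # 28x28 input, 3x3 kernel
after_pool = pool_output_size(after_conv, 2)    # 2x2 pooling
flattened = after_pool * after_pool * filters   # length of the vector Flatten produces

print(after_conv, after_pool, flattened)  # 26 13 4732
```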

Dropout layers combat overfitting by randomly ignoring some of the neurons during training, while the Flatten layer converts the 2D feature maps into a 1D array before the fully connected layers.
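
A minimal NumPy sketch of the idea behind dropout during training follows. It is an illustration, not the Keras implementation: the rate of 0.5, the seed, and the variable names are assumptions. Surviving activations are scaled up by 1 / (1 - rate) so that their expected sum stays the same, which is also what the Keras Dropout layer does at training time.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
rate = 0.5                       # fraction of activations to drop (assumed)
activations = np.ones(10, dtype=np.float32)

# Keep each activation with probability (1 - rate), zero out the rest.
keep_mask = rng.random(activations.shape) >= rate
dropped = activations * keep_mask / (1.0 - rate)

print(dropped)  # each entry is either 0.0 or 2.0
```

At prediction time no neurons are dropped, so the layer passes its input through unchanged.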

Compiling and fitting the Model:

So far, we have defined a CNN that is not yet compiled or trained. Next I set an optimizer and a loss function, choose a metric to report, and fit the model on the training data. I used the Adam optimizer because it is a widely used default that often performs well without much tuning.

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x=x_train,y=y_train, epochs=10)

Here we get pretty high accuracy with just 10 epochs. Since the dataset doesn’t need heavy computational power, you can experiment with the number of epochs, as well as with the optimizer, loss function, and metrics.
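
To make the loss and metric named in compile() more concrete, here is a small pure-Python sketch of what they measure for integer labels. The probabilities and labels are made-up values, and the function names are hypothetical; Keras computes related quantities in batched form and averages the loss over each batch.

```python
import math

def sparse_categorical_crossentropy(probs, label):
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[label])

def accuracy(all_probs, labels):
    """Fraction of samples whose highest-probability class is the label."""
    hits = sum(1 for p, y in zip(all_probs, labels) if p.index(max(p)) == y)
    return hits / len(labels)

# Two made-up predictions over three classes (real MNIST output has ten).
all_probs = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]
labels = [1, 2]

losses = [sparse_categorical_crossentropy(p, y) for p, y in zip(all_probs, labels)]
print(losses)                       # small loss for the confident correct guess
print(accuracy(all_probs, labels))  # 0.5
```

The labels stay as plain integers, which is why the "sparse" variant of the loss fits y_train without one-hot encoding.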

Model Evaluation:

model.evaluate(x_test, y_test)

When the model is evaluated on the test set, we see that just 10 epochs gave us an accuracy of 98.59% at a very low loss.

Now to check its prediction:

import matplotlib.pyplot as plt
image_index = 2853
plt.imshow(x_test[image_index].reshape(28, 28), cmap='Greys')
plt.show()
pred = model.predict(x_test[image_index].reshape(1, 28, 28, 1))
print(pred.argmax())  # index of the highest score is the predicted digit

Here we select an image and run it through the model to get the prediction, then display both the image and the prediction to see if it’s accurate.
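
As a sketch of how to read the prediction: assuming the final layer outputs one score per digit, predict() returns an array of shape (1, 10) for a single image, and the predicted digit is the index of the largest score. The scores below are invented for illustration.

```python
import numpy as np

# Made-up model output for one image: one score per digit 0-9.
pred = np.array([[0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.90]])

digit = int(pred.argmax(axis=1)[0])   # index of the highest score
print(digit)  # 9
```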

Image by author

And that is how you can build and train a simple convolutional neural network. You can apply the same approach to many other classification problems. Respond to this article with what you applied it to.