Simple Neural Network on MNIST Handwritten Digit Dataset

Original article was published by Muhammad Ardi on Becoming Human: Artificial Intelligence Magazine

MNIST Handwritten Digit Dataset.

Hello world! Ardi here. Today I would like to share a simple project of mine: implementing a Neural Network for a classification problem. As the title suggests, I will be performing classification on the MNIST Handwritten Digit dataset. So, without further ado, let’s do this!

Note: the full code is available at the end of this writing.

So the first thing to do is to import all the required modules. Here I use NumPy to process matrix values, Matplotlib to show images and Keras to build the Neural Network model. The MNIST dataset itself is also loaded from the Keras framework.

import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Dense, Flatten
from keras.models import Sequential
from keras.utils import to_categorical
from keras.datasets import mnist

Next, we can load the dataset using the following code. Note that this may take a while, especially if this is your first time working with the MNIST dataset, since Keras has to download it first. After running the code below, we will have 4 variables, namely X_train, y_train, X_test and y_test, where X holds the images and y holds the target labels. The train and test sets consist of 60000 and 10000 images respectively, and all of the images already have the same size (28 by 28 pixels).

# Load MNIST handwritten digit data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

By the way, you can check the numbers I mentioned above using the following script:

print(X_train.shape)
print(X_test.shape)

Then the output is going to be something like this:

(60000, 28, 28)
(10000, 28, 28)

It is also worth remembering that in each of these shapes the first number is the number of images, the first 28 is the height of each image in pixels, and the last 28 is its width.
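To make the axis order concrete, here is a small NumPy sketch. It assumes a randomly generated array of 3 fake images in place of the real X_train, since only the shape matters here:

```python
import numpy as np

# Assumption: random pixels stand in for real MNIST images; only 3 images
# are created to keep the example small.
images = np.random.randint(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Axis 0 indexes images, axis 1 indexes pixel rows (height),
# axis 2 indexes pixel columns (width).
num_images, height, width = images.shape
first_image = images[0]   # one 28 x 28 image
top_row = first_image[0]  # the top row of that image: 28 pixels, left to right

print(num_images, height, width)         # 3 28 28
print(first_image.shape, top_row.shape)  # (28, 28) (28,)
```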

You can also print out the shape of the target labels (y) like this:

print(y_train.shape)
print(y_test.shape)

Then it gives the following output:

(60000,)
(10000,)

The target labels are stored in a 1-dimensional array, since each label is simply a single number (the digit from 0 to 9). However, this label representation is not what our Neural Network expects, so we need to convert it into a one-hot representation before training the model (more on this later).

Up to this point you might be wondering what the MNIST digit images look like. So now I want to show the first 5 images in the dataset using the following code:

# Display the first 5 training images with their labels as titles
fig, axes = plt.subplots(ncols=5, sharex=False,
                         sharey=True, figsize=(10, 4))
for i in range(5):
    axes[i].imshow(X_train[i], cmap='gray')
    axes[i].set_title(str(y_train[i]))
plt.show()

After running the code you will have this output:

The first 5 images of MNIST Digit dataset.

The images above show the handwritten digits (X) along with their labels (y) above each image.

As promised earlier, now we will turn all the labels into a one-hot representation. This can be done easily using the to_categorical() function from Keras. Before using the function in our main program, I will explain a bit about how it works. In the example below, I am going to find the one-hot representation of the class with label 3, where the total number of classes is 10.

to_categorical(3, num_classes=10)


array([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.], dtype=float32)

You can see that the output is a simple array in which every value is zero except the one at index 3, which is 1. And that’s it. This representation is called one-hot encoding. Now what we want to do in our program is to one-hot encode all the target labels (both y_train and y_test), which can be done using the following code:

# Convert y_train into one-hot format
temp = []
for i in range(len(y_train)):
    temp.append(to_categorical(y_train[i], num_classes=10))
y_train = np.array(temp)
# Convert y_test into one-hot format
temp = []
for i in range(len(y_test)):
    temp.append(to_categorical(y_test[i], num_classes=10))
y_test = np.array(temp)
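The loop above calls to_categorical() once per label. Keras's to_categorical() also accepts a whole array, so y_train = to_categorical(y_train, num_classes=10) should give the same result in a single call. To show what such a call computes, here is a NumPy-only sketch; one_hot() is a hypothetical helper for illustration, not part of Keras, and it assumes integer labels in the range 0 to num_classes - 1:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float32 matrix of one-hot rows.

    Hypothetical helper; assumes integer labels in [0, num_classes).
    """
    labels = np.asarray(labels)
    # Row k of the identity matrix is exactly the one-hot vector for class k,
    # so indexing the identity matrix with the labels picks one row per label.
    return np.eye(num_classes, dtype=np.float32)[labels]

example_labels = np.array([5, 0, 4, 1, 9])  # illustrative labels
encoded = one_hot(example_labels)
print(encoded.shape)  # (5, 10)
print(encoded[0])     # 1.0 at index 5, 0.0 everywhere else
```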

Now we can check the new shape of y_train and y_test:

print(y_train.shape)
print(y_test.shape)

If all target labels have been converted into one-hot representation, the output should look like this:

(60000, 10)
(10000, 10)

Alright, so at this point our target labels have the correct shape. Now we can start creating the Neural Network model using Keras.

The first thing to do is to initialize a Sequential model, after which we can add layers to it. Here I start the model with a Flatten layer, because we need to reshape each 28 by 28 pixel image (2 dimensions) into 784 values (1 dimension). Next, we connect these 784 values to a Dense layer of 5 neurons with a sigmoid activation function. You can freely choose any number of neurons for this layer, but since I want the model to be simple and fast to train, I just go with 5 neurons in this case. The last thing to add is another Dense layer with a softmax activation function, which acts as our output layer. This last layer needs 10 neurons because our classification task has 10 different classes.

# Create simple Neural Network model
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))  # 28 x 28 image -> 784 values
model.add(Dense(5, activation='sigmoid'))
model.add(Dense(10, activation='softmax'))
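To see roughly what the Flatten, Dense(5, sigmoid) and Dense(10, softmax) layers compute, here is a NumPy sketch of a single forward pass. It assumes random weights and random input images in place of a trained model and real data, so only the shapes and the fact that each output row sums to 1 are meaningful; Keras's internal implementation may differ in details:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtracting the row-wise max before exponentiating avoids overflow
    # and does not change the result.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Assumption: random data and weights stand in for MNIST and a trained model.
images = rng.random((4, 28, 28))                   # a batch of 4 "images"
W1, b1 = rng.normal(size=(784, 5)), np.zeros(5)    # hidden Dense(5) parameters
W2, b2 = rng.normal(size=(5, 10)), np.zeros(10)    # output Dense(10) parameters

x = images.reshape(len(images), -1)  # Flatten: (4, 28, 28) -> (4, 784)
h = sigmoid(x @ W1 + b1)             # hidden layer: (4, 784) -> (4, 5)
probs = softmax(h @ W2 + b2)         # output layer: (4, 5) -> (4, 10)

print(x.shape, h.shape, probs.shape)  # (4, 784) (4, 5) (4, 10)
print(probs.sum(axis=1))              # each row sums to 1 (up to rounding)
```

Because the softmax outputs of each row sum to 1, every row can be read as a probability distribution over the 10 digit classes.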

We can also use the code below to see the details of our architecture:

model.summary()

The output lists the details of the layers inside our Neural Network:

Model: "sequential_1"
Layer (type)            Output Shape     Param #
flatten_1 (Flatten)     (None, 784)      0
dense_1 (Dense)         (None, 5)        3925
dense_2 (Dense)         (None, 10)       60
Total params: 3,985
Trainable params: 3,985
Non-trainable params: 0
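The Param # column follows from simple arithmetic: a Dense layer has one weight per (input, neuron) pair plus one bias per neuron, while Flatten only reshapes and has no parameters. A quick check in plain Python, where dense_params() is a hypothetical helper for illustration:

```python
def dense_params(inputs, units):
    # One weight per (input, unit) pair plus one bias per unit.
    return inputs * units + units

flatten_params = 0                     # Flatten has no weights
hidden_params = dense_params(784, 5)   # 784 * 5 + 5 = 3925
output_params = dense_params(5, 10)    # 5 * 10 + 10 = 60
total = flatten_params + hidden_params + output_params

print(hidden_params, output_params, total)  # 3925 60 3985
```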

After constructing the Neural Network classifier, we need to compile it with the following code:

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

The code above passes categorical cross-entropy as the loss function because it is the standard choice for multiclass classification with one-hot encoded targets and a softmax output layer. Next, we use the Adam optimizer, since it is a robust default that tends to work well in many cases without much tuning. Lastly, we pass accuracy in the metrics argument in order to measure the performance of our classifier.
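For intuition, here is a minimal NumPy sketch of categorical cross-entropy on a toy 3-class batch. The function name, the toy arrays and the eps clipping constant are assumptions for illustration; Keras's own implementation may handle edge cases differently:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch (illustrative sketch).

    y_true: one-hot targets, shape (n, k); y_pred: probabilities, shape (n, k).
    """
    # Clip probabilities so that log(0) never occurs; eps is an assumed constant.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # With one-hot targets, only the log-probability of the true class survives.
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype=float)   # true classes 2 and 0
confident = np.array([[0.05, 0.05, 0.90], [0.80, 0.10, 0.10]])
wrong = np.array([[0.70, 0.20, 0.10], [0.10, 0.10, 0.80]])

print(categorical_crossentropy(y_true, confident))  # about 0.164
print(categorical_crossentropy(y_true, wrong))      # about 2.303
```

Correct, confident predictions give a small loss, while assigning low probability to the true class gives a large one.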

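Now into the fun part: training our Neural Network! Training a model is easy, since all we need to do is call the fit() method on our model. Here I also pass the test data as validation data so that we can monitor the performance on unseen images after each epoch:

model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))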

And it will show the progress details like this:

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 10s 170us/step - loss: 1.5269 - acc: 0.5605 - val_loss: 1.1787 - val_acc: 0.6808
Epoch 2/5
60000/60000 [==============================] - 10s 169us/step - loss: 1.0848 - acc: 0.6741 - val_loss: 0.9602 - val_acc: 0.7235
Epoch 3/5
60000/60000 [==============================] - 10s 172us/step - loss: 0.9469 - acc: 0.7269 - val_loss: 0.8891 - val_acc: 0.7506
Epoch 4/5
60000/60000 [==============================] - 10s 173us/step - loss: 0.8878 - acc: 0.7409 - val_loss: 0.8795 - val_acc: 0.7650
Epoch 5/5
60000/60000 [==============================] - 10s 172us/step - loss: 0.8525 - acc: 0.7542 - val_loss: 0.8133 - val_acc: 0.7737

According to the output above, the accuracy increases on both the training and the test data over the 5 training epochs. I think this result is pretty good: with a relatively simple Neural Network model we obtain roughly 75% accuracy on the training data and 77% on the test data, even though this result can still be improved.

Now we can try to make predictions on the images stored in our X_test variable.

predictions = model.predict(X_test)

However, the result might be confusing, as it looks like the following:

[[9.5367432e-06 1.0506779e-02 3.1652153e-03 ... 5.9073067e-01
3.7065744e-03 1.2163597e-01]
[1.0621548e-03 1.6309917e-03 1.5031934e-02 ... 1.5586615e-05
1.7473400e-03 8.7112188e-05]
[1.5623271e-03 1.6000962e-01 1.8915981e-02 ... 7.9721212e-04
1.2158155e-03 9.9062920e-05]
[1.3842285e-03 4.1633844e-05 1.8185675e-03 ... 4.7475308e-02
2.2819310e-02 3.5299832e-01]
[5.5582732e-02 3.3888221e-04 9.7544789e-03 ... 2.3180246e-04
2.1919787e-01 4.4040084e-03]
[1.2986362e-03 1.9049346e-03 4.7103435e-02 ... 5.5095553e-04
1.4519393e-03 3.0362308e-03]]

This output actually has shape (10000, 10): each row stores the predicted probability of each of the 10 classes for one test sample. Run the following code to find the actual prediction of the model, which is the class with the highest probability:

predictions = np.argmax(predictions, axis=1)

Then it gives the following result (this is the prediction of each test sample):

[7 2 1 ... 4 5 6]
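The same argmax idea can also turn the one-hot y_test back into class numbers, so that predictions can be compared with the true labels. Here is a small NumPy sketch on toy arrays (3 samples and 4 classes instead of 10000 and 10, an assumption to keep it readable) that computes the predictions, the accuracy and the indices of misclassified samples:

```python
import numpy as np

# Toy stand-ins (assumption): predicted probabilities and one-hot labels.
probs = np.array([
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.05, 0.20, 0.70],
    [0.60, 0.20, 0.10, 0.10],
])
y_true_onehot = np.eye(4)[[1, 3, 2]]  # one-hot labels for classes 1, 3 and 2

predicted = np.argmax(probs, axis=1)          # most probable class per row
actual = np.argmax(y_true_onehot, axis=1)     # one-hot rows back to class numbers
accuracy = np.mean(predicted == actual)       # fraction of correct predictions
misclassified = np.flatnonzero(predicted != actual)  # indices of wrong predictions

print(predicted)      # [1 3 0]
print(accuracy)       # 2 of 3 correct, about 0.667
print(misclassified)  # [2]
```

Applied to the output of model.predict(X_test) and the one-hot y_test, the same lines should give an accuracy close to the final val_acc in the training log.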

Lastly, using the code below we can display some test images along with their predictions:

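# Display the first 10 test images with their predicted labels as titles
fig, axes = plt.subplots(ncols=10, sharex=False,
                         sharey=True, figsize=(20, 4))
for i in range(10):
    axes[i].imshow(X_test[i], cmap='gray')
    axes[i].set_title(str(predictions[i]))
plt.show()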

This gives an output that looks like this:

Image predictions.

The output image above shows the first 10 test images, with the predicted label above each digit. You can see that most of the digits are classified correctly. Only the 9th picture (from the left) is misclassified: it should be a five (I think), but it is predicted as a four.

So that’s it! I hope you learned something from this post! Feel free to ask a question or give a suggestion so that I can write better tutorials in the next posts.

Note: here is the full code that I promised earlier. I suggest running it in Jupyter Notebook / Google Colab / Kaggle Notebook or a similar environment so that you can better understand each line of the code.

Simple Neural Network on MNIST Handwritten Digit Dataset was originally published in Becoming Human: Artificial Intelligence Magazine on Medium.