Visualizing Convolution Neural Network [The Hidden Truth]

Original article was published on Deep Learning on Medium

If you are interested in computer vision or deep learning, this post might be quite insightful.

Nowadays people build complex neural networks for computer vision tasks, one of the major ones being image classification. Building a model is simple if you know how to code, but in the end you never know what is inside the network. It is like a black box into which you feed inputs and get outputs, like magic. It is important to know what actually goes on inside the network when you pass it a particular input: how does the network react and produce a favorable outcome?

This blog is all about guiding you to visualize a model you build using TensorFlow, one of the most popular machine learning libraries. The tutorial is divided into two parts:

1. Build and train a model

2. Visualize the model internally

Build and train the model

Step 1: Download and process the data

For this example we are going to keep it simple and short by using the world-famous MNIST dataset of handwritten digits, also called the Modified National Institute of Standards and Technology dataset.

Import The Important Libraries

import tensorflow as tf
import matplotlib.pyplot as plt
import random
import numpy as np

TensorFlow for building the network and downloading the dataset

Matplotlib for plotting graphs and visualizing

Random for generating random indices

NumPy for array operations

Download Data

mnist_data = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist_data.load_data()
print("No of training examples " + str(train_images.shape[0]))
print("No of test examples " + str(test_images.shape[0]))
print("Shape of each image " + str(train_images[0].shape))
'''
No of training examples 60000
No of test examples 10000
Shape of each image (28, 28)

View The Images

def disp():
    id = random.randint(0, train_images.shape[0] - 1)  # random index into the training set
    img = train_images[id]
    l = train_labels[id]
    plt.imshow(img)
    print("This is image of Number " + str(l))

disp()
'''
This is image of Number 0
'''

Each image is a 2D array of 28 × 28 pixels.

Preprocess the data for input

train_images = train_images.reshape(train_images.shape[0], 28, 28, 1) / 255
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1) / 255

This is done because the Conv2D layer in TensorFlow expects images of shape (height, width, channels), here (28, 28, 1), rather than (height, width). Dividing by 255 also scales the pixel values into the range [0, 1].

print("Shape of each image " + str(train_images[0].shape))
'''
Shape of each image (28, 28, 1)
'''

Now we have successfully changed the shape and made the data ready to be fed into the model. Trust me, this is the most critical step in machine learning: always verify the shape of your inputs and check that it matches the model's input shape. No matter how complex your model is, if the shape is not right, everything falls apart.
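As a quick sanity check, the reshape-and-scale step can be tried on a dummy batch (plain NumPy, no dataset download needed; `fake_images` is a made-up stand-in for `train_images`):

```python
import numpy as np

# A made-up stand-in for train_images: 4 grayscale 28x28 images with pixel values 0-255
fake_images = np.random.randint(0, 256, size=(4, 28, 28)).astype('float64')

# Same transformation as in the tutorial: add a channel axis and scale to [0, 1]
fake_images = fake_images.reshape(fake_images.shape[0], 28, 28, 1) / 255

print(fake_images.shape)  # (4, 28, 28, 1)
```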

Step 2. Making The Model

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)))
model.add(tf.keras.layers.MaxPool2D((2,2)))
model.add(tf.keras.layers.Dropout(0.3))
model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
model.add(tf.keras.layers.MaxPool2D((2,2)))
model.add(tf.keras.layers.Dropout(0.3))
model.add(tf.keras.layers.Conv2D(filters=128, kernel_size=(3,3), activation='relu'))
model.add(tf.keras.layers.MaxPool2D((2,2)))
model.add(tf.keras.layers.Dropout(0.3))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
model.summary()

Conv2D, MaxPool2D, Dropout, Dense, Flatten are the basic building blocks that are given to us by TensorFlow Keras implementation.

Our Architecture

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 13, 13, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 11, 11, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 5, 5, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 3, 3, 128) 73856
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 1, 1, 128) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 1, 1, 128) 0
_________________________________________________________________
flatten (Flatten) (None, 128) 0
_________________________________________________________________
dense (Dense) (None, 128) 16512
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 110,474
Trainable params: 110,474
Non-trainable params: 0
_________________________________________________________________
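The parameter counts in the summary can be verified by hand: a Conv2D layer has kernel_height × kernel_width × in_channels × filters weights plus one bias per filter, and a Dense layer has in_units × out_units weights plus one bias per output unit. A small standalone check:

```python
def conv2d_params(kh, kw, in_ch, filters):
    # one (kh x kw x in_ch) kernel per filter, plus one bias per filter
    return kh * kw * in_ch * filters + filters

def dense_params(in_units, out_units):
    # full weight matrix plus one bias per output unit
    return in_units * out_units + out_units

counts = [
    conv2d_params(3, 3, 1, 32),    # conv2d   -> 320
    conv2d_params(3, 3, 32, 64),   # conv2d_1 -> 18496
    conv2d_params(3, 3, 64, 128),  # conv2d_2 -> 73856
    dense_params(128, 128),        # dense    -> 16512
    dense_params(128, 10),         # dense_1  -> 1290
]
print(counts, sum(counts))  # total 110474, matching model.summary()
```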

Observe the output shapes carefully. This is our model's architecture: not a fancy model, but a decent one. One thing I learned while building various models is that the kernel size and the number of filters follow a simple pattern that always seems to work well: double the number of filters from one convolutional block to the next, and keep the kernel size an odd number.
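The shrinking output shapes follow directly from the layer arithmetic: a 'valid' 3×3 convolution reduces each spatial dimension by 2 (n → n − 3 + 1), and 2×2 max pooling halves it with floor division. Tracing the 28×28 input through the three blocks reproduces the sizes in the summary:

```python
size = 28
sizes = []
for _ in range(3):       # three Conv2D + MaxPool2D blocks
    size = size - 3 + 1  # 'valid' 3x3 conv: n -> n - k + 1
    sizes.append(size)
    size = size // 2     # 2x2 max pool: floor(n / 2)
    sizes.append(size)
print(sizes)  # [26, 13, 11, 5, 3, 1] -- matches model.summary()
```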

Step 3: Most Exciting Step, the training step

checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath="best_weights.hdf5",
                                                  monitor='val_accuracy',
                                                  verbose=1, save_best_only=True)
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, batch_size=128,
                    validation_data=(test_images, test_labels), epochs=100,
                    callbacks=[checkpointer, es])

Here we use some callback functions, one of the most helpful tools provided by TensorFlow: they let you interact with the model while it is training, so you do not lose your model if anything goes wrong. The first callback I used is EarlyStopping, which stops training if the validation loss does not improve for 2 consecutive epochs, helping to avoid overfitting. The second is ModelCheckpoint, which saves the model with the best parameters. You can train without these, but they help when you are unsure how many epochs to set. I used the most common and reliable optimizer, Adam. I deliberately used sparse categorical cross-entropy because our labels are plain integers from 0 to 9, while the model outputs a vector of 10 probabilities, one per digit; for example [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1].

This output indicates that the most likely digit is 6, with a score of 0.9.
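The predicted digit is simply the index of the largest value in that output vector, which is exactly what np.argmax returns. A tiny NumPy check with the example vector above:

```python
import numpy as np

# Hypothetical softmax output: the index of each entry is the digit it scores
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1]
print(np.argmax(probs))  # 6 -- the highest score sits at index 6
```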

Step 4: Watch our model train

a. Loss

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Graph')
plt.legend(('train', 'test'))

b. Accuracy

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Accuracy Graph')
plt.legend(('train', 'test'))

We observe that the model is pretty good, with train and test accuracy of 98.22% and 98.9%. Looking at the loss plot you might say the train and validation curves are far apart; is that good enough? But observe the scale on the left: the test loss is very small, which indicates the model generalizes well.

Step 5: Predict

Let's do the step that is the actual task of this algorithm: predicting the digit.

def disp_pred():
    id = random.randint(0, test_images.shape[0] - 1)  # random index into the test set
    img = test_images[id]
    l = test_labels[id]
    plt.imshow(img.reshape(28, 28))
    print("This is image of Number " + str(l))
    pred = np.argmax(model.predict(img.reshape(1, 28, 28, 1)))
    print("This is Predicted to be Number " + str(pred))

disp_pred()
'''
This is image of Number 5
This is Predicted to be Number 5
'''
This is the digit 5, and it is predicted to be the number 5.

Let's now begin the part this post is dedicated to: visualizing the network.

Step 1. We first decide which layers' activations we want to visualize and build our activation model.

For example, in my architecture I take layers [0:8], that is, from conv2d to dropout_2:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 13, 13, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 11, 11, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 5, 5, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 3, 3, 128) 73856
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 1, 1, 128) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 1, 1, 128) 0
_________________________________________________________________
flatten (Flatten) (None, 128) 0
_________________________________________________________________
dense (Dense) (None, 128) 16512
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================

Make a list of layers we need

layer_outputs = [layer.output for layer in model.layers[0:8]]
activation_model = tf.keras.Model(inputs=model.input, outputs=layer_outputs)

Step 2: We now choose a random image from the test dataset on which we will use our activation model.

img = test_images[51].reshape(1,28,28,1)

Step 3: We now use our activation model to output activations from the selected layers.

activations = activation_model.predict(img)

Step 4: Use Matplotlib to visualize each layer

Names of the layers, so you can have them as part of your plot

layer_names = []
for layer in model.layers[0:8]:
    layer_names.append(layer.name)

Displays the feature maps

images_per_row = 16
for layer_name, layer_activation in zip(layer_names, activations):
    n_features = layer_activation.shape[-1]    # number of channels in this layer
    size = layer_activation.shape[1]           # each feature map is (size, size)
    n_cols = n_features // images_per_row      # rows of tiles in the display grid
    display_grid = np.zeros((size * n_cols, images_per_row * size))
    for col in range(n_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0, :, :, col * images_per_row + row]
            channel_image -= channel_image.mean()
            channel_image /= (channel_image.std() + 1e-5)  # avoid division by zero for all-zero channels
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size,
                         row * size : (row + 1) * size] = channel_image
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')
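The per-channel post-processing in that loop just rescales each activation map to a displayable 0-255 range: center the values at 128, spread them by 64 per standard deviation, then clip and convert to uint8. A tiny standalone example of the same transform on a toy 2×2 activation map:

```python
import numpy as np

channel_image = np.array([[0., 1.], [2., 3.]])  # toy activation map
channel_image -= channel_image.mean()
channel_image /= (channel_image.std() + 1e-5)   # avoid division by zero
channel_image *= 64
channel_image += 128
channel_image = np.clip(channel_image, 0, 255).astype('uint8')
print(channel_image)  # values now centered around 128 in the 0-255 display range
```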

Layer 1

Layer 2

Layer 3

We observe all the filters in layers 1, 2, and 3 (32, 64, and 128 filters respectively) and how they react to an image of the number 3.

This actually helps visualize how the deeper layers react to an image and how these features contribute to the final predictions.

I hope this post helps you understand convolutional networks.

Happy Deep Learning!