Convolutional Neural Networks: Part 9

Second section in a series of Python Deep Learning posts.

For the first section, check out: Python Deep Learning

Visualizing Heatmaps of Class Activation

I’ll introduce one more visualization technique: one that is useful for understanding which parts of a given image led a CNN to its final classification decision. This is helpful for debugging the decision process of a CNN, particularly in the case of a classification mistake. It also allows you to locate specific objects in an image.

This general category of techniques is called class activation map (CAM) visualization, and it consists of producing heatmaps of class activation over input images. A class activation heatmap is a 2D grid of scores associated with a specific output class, computed for every location in any input image, indicating how important each location is with respect to the class under consideration. For instance, given an image fed into a dogs-versus-cats CNN, CAM visualization allows you to generate a heatmap for the class “cat,” indicating how cat-like different parts of the image are, and also a heatmap for the class “dog,” indicating how dog-like parts of the image are.

The specific implementation you’ll use is the one described in:

“Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization.”

Ramprasaath R. Selvaraju et al., arXiv (2017)

It’s very simple: it consists of taking the output feature map of a convolution layer, given an input image, and weighting every channel in that feature map by the gradient of the class score with respect to the channel. Intuitively, one way to understand this trick is that you’re weighting a spatial map of “how intensely the input image activates different channels” by “how important each channel is with regard to the class,” resulting in a spatial map of “how intensely the input image activates the class.”
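
The weighting described above can be sketched with plain NumPy on toy data. This is only an illustration of the arithmetic, not the paper's reference implementation: the array names and the 2 × 2 × 3 shapes are made up for the example, and the gradients are hard-coded rather than computed from a real network.

```python
import numpy as np

# Hypothetical toy data: a 2x2 feature map with 3 channels (height, width, channels).
feature_map = np.array([
    [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]],
    [[3.0, 0.0, 2.0], [0.0, 1.0, 2.0]],
])

# Hypothetical gradients of the class score with respect to each feature-map entry.
# Channel 0 helps the class, channel 1 is irrelevant, channel 2 hurts it.
grads = np.zeros_like(feature_map)
grads[..., 0] = 0.5
grads[..., 2] = -0.25

# One importance weight per channel: the gradient averaged over all spatial positions.
channel_weights = grads.mean(axis=(0, 1))

# Weight every channel by its importance, then average the channels into one 2D map.
toy_heatmap = (feature_map * channel_weights).mean(axis=-1)

print(channel_weights)
print(toy_heatmap)
```

In this toy case the largest heatmap value sits at the location where the helpful channel (channel 0) is most active, which is the behavior the intuition above describes.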

We’ll demonstrate this technique using the pretrained VGG16 network again.

>>> from keras.applications.vgg16 import VGG16
>>> model = VGG16(weights='imagenet')

Consider the image of Batman and The Joker, during the interrogation room scene, in “The Dark Knight.”

Let’s convert this image into something the VGG16 model can read: the model was trained on images of size 224 × 224, preprocessed according to a few rules that are packaged in the utility function keras.applications.vgg16.preprocess_input. So you need to load the image, resize it to 224 × 224, convert it to a Numpy float32 tensor, and apply these preprocessing rules.
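
To make the "few rules" less mysterious, here is a rough NumPy sketch of what preprocess_input does for VGG16: it reorders the channels from RGB to BGR and subtracts the per-channel ImageNet means, without rescaling. The function name vgg16_preprocess_sketch and the gray test image are invented for this illustration; in real code, keep using preprocess_input.

```python
import numpy as np

# Per-channel ImageNet means in BGR order.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_preprocess_sketch(rgb_batch):
    """Approximate VGG16 preprocessing: RGB -> BGR, then subtract channel means."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)  # reverse the channel axis
    return bgr - IMAGENET_BGR_MEAN                 # zero-center, no rescaling

# A stand-in for a loaded image: one 224x224 RGB image filled with mid-gray.
fake_batch = np.full((1, 224, 224, 3), 128.0, dtype=np.float32)
processed = vgg16_preprocess_sketch(fake_batch)

print(processed.shape, processed[0, 0, 0])
```

The batch dimension added by np.expand_dims is why the sketch works on arrays of shape (1, 224, 224, 3) rather than on a single image.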

>>> from keras.preprocessing import image
>>> from keras.applications.vgg16 import preprocess_input, decode_predictions
>>> import numpy as np
>>> img_path = 'Downloads/batman_joker.jpg'
>>> img = image.load_img(img_path, target_size=(224, 224))
>>> x = image.img_to_array(img)
>>> x = np.expand_dims(x, axis=0)
>>> x = preprocess_input(x)
>>> preds = model.predict(x)
>>> print('Predicted:', decode_predictions(preds, top=3)[0])
Downloading data from
40960/35363 [==================================] - 0s 1us/step
Predicted: [('n04229816', 'ski_mask', 0.46682864), ('n03724870', 'mask', 0.29899433), ('n02667093', 'abaya', 0.058608655)]

The top three classes predicted for this image are as follows:

  • Ski Mask (with 46.7% probability)
  • Mask (with 29.9% probability)
  • Abaya (with 5.9% probability)

The network has recognized the image as containing some kind of mask. It also assigns a much smaller probability to “abaya.” The entry in the prediction vector that was maximally activated is the one corresponding to the “Ski Mask” class, at index 796:

>>> np.argmax(preds[0])
796

To visualize which parts of the image are the most mask–like, let’s set up the Grad-CAM process.

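>>> from keras import backend as K
>>> mask_output = model.output[:, 796]  # "ski mask" entry of the prediction vector
>>> last_conv_layer = model.get_layer('block5_conv3')  # last convolutional layer in VGG16
>>> # Gradient of the "ski mask" score with respect to the feature map of block5_conv3
>>> grads = K.gradients(mask_output, last_conv_layer.output)[0]
>>> # Mean gradient per channel: a vector of shape (512,)
>>> pooled_grads = K.mean(grads, axis=(0, 1, 2))
>>> iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])
>>> pooled_grads_value, conv_layer_output_value = iterate([x])
>>> # Weight each channel by its importance for the class
>>> for i in range(512):
...     conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
...
>>> heatmap = np.mean(conv_layer_output_value, axis=-1)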
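
The channel-by-channel loop can also be written as a single broadcast multiplication. The sketch below checks that on random stand-in arrays; the names conv_output and pooled are placeholders with the same shapes you would get from block5_conv3 for a 224 × 224 input, not values from the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins with the shapes of the two arrays returned by the Keras function:
# a 14x14 feature map with 512 channels and one pooled gradient per channel.
conv_output = rng.random((14, 14, 512)).astype(np.float32)
pooled = rng.standard_normal(512).astype(np.float32)

# Loop version: scale each channel by its pooled gradient, one channel at a time.
looped = conv_output.copy()
for i in range(512):
    looped[:, :, i] *= pooled[i]

# Broadcasting version: the last axis of conv_output lines up with pooled.
vectorized = conv_output * pooled

print(np.allclose(looped, vectorized))
```

Either form works. The broadcast version is shorter and avoids hard-coding the number of channels.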

For visualization purposes, you’ll also normalize the heatmap between 0 and 1.

>>> heatmap = np.maximum(heatmap, 0)
>>> heatmap /= np.max(heatmap)
>>> import matplotlib.pyplot as plt
>>> plt.matshow(heatmap)
<matplotlib.image.AxesImage object at 0x7fd9fd39aeb8>

Mask class activation heatmap over the test picture
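
One edge case worth knowing about: if no location has a positive score for the class, the maximum after clipping is zero and the division produces NaN values. Here is a small defensive sketch, assuming you would rather get an all-zero heatmap in that situation. The helper normalize_heatmap is a hypothetical name, not part of Keras or NumPy.

```python
import numpy as np

def normalize_heatmap(raw):
    """Clip negatives to zero and scale to [0, 1]; return zeros if nothing is positive."""
    clipped = np.maximum(raw, 0)
    peak = clipped.max()
    if peak == 0:
        # Assumption: an empty heatmap is more useful than NaNs.
        return np.zeros_like(clipped, dtype=np.float32)
    return (clipped / peak).astype(np.float32)

example = np.array([[-1.0, 0.5], [2.0, 1.0]])
print(normalize_heatmap(example))
print(normalize_heatmap(np.full((2, 2), -3.0)))
```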

Finally, you’ll use OpenCV to generate an image that superimposes the heatmap you just obtained on the original image.

>>> import cv2
>>> img = cv2.imread(img_path)
>>> heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
>>> heatmap = np.uint8(255 * heatmap)
>>> heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
>>> superimposed_img = heatmap * 0.4 + img
>>> cv2.imwrite('Pictures/mask_cam.jpg', superimposed_img)

If you get an error about cv2, open another terminal, activate your venv, and install it:

$ pip install opencv-python

Run the commands again.
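
A note on the blending step: adding 40% of the heatmap to the image can push pixel values above 255. If you want to control that explicitly rather than rely on how the image is converted when it is written, one option is to clip the sum and cast it to 8-bit yourself. The sketch below uses tiny made-up arrays in place of the real heatmap and photo, and blend is a hypothetical helper.

```python
import numpy as np

def blend(heatmap_bgr, image_bgr, alpha=0.4):
    """Overlay a color heatmap on an image and return a valid uint8 image."""
    mixed = heatmap_bgr.astype(np.float32) * alpha + image_bgr.astype(np.float32)
    return np.clip(mixed, 0, 255).astype(np.uint8)

# Hypothetical 2x2 BGR arrays standing in for the colorized heatmap and the photo.
toy_heatmap = np.full((2, 2, 3), 255, dtype=np.uint8)
toy_image = np.array([[[0, 0, 0], [100, 100, 100]],
                      [[200, 200, 200], [255, 255, 255]]], dtype=np.uint8)

overlay = blend(toy_heatmap, toy_image)
print(overlay[..., 0])
```

The two brightest pixels saturate at 255 instead of overflowing, which keeps the saved image predictable.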

This visualization technique answers two important questions:

  • Why did the network think this image contained a Mask?
  • Where is the Mask located in the picture?

In particular, it’s interesting to note that the face of Batman is strongly activated: this is probably how the network can tell the difference between masks and faces.

CNN Summary

  • CNNs are the best tool for attacking visual-classification problems.
  • CNNs work by learning a hierarchy of modular patterns and concepts to represent the visual world.
  • The representations they learn are easy to inspect — CNNs are the opposite of black boxes!
  • You’re now capable of training your own CNN from scratch to solve an image-classification problem.
  • You understand how to use visual data augmentation to fight overfitting.
  • You know how to use a pretrained CNN to do feature extraction and fine-tuning.
  • You can generate visualizations of the filters learned by your CNNs, as well as heatmaps of class activity.

Source: Deep Learning on Medium