ICLR 2016 — Implementing the Context Module in TensorFlow with Interactive Code [Manual Back Prop with…

Image from Pixabay

Since I covered the dilated convolution operation in a previous post, it only makes sense to build a network out of it. So I thought it best to implement the Context Module presented in the paper "Multi-Scale Context Aggregation by Dilated Convolutions," which appeared at the International Conference on Learning Representations (ICLR) 2016.

There are two things I wish to point out:
1. The authors of the original paper initialized the weights using their own scheme; for this post I am simply going to initialize from a normal distribution.
2. For fun, let's train the network with Dilated Back Propagation as well.

Network Architecture (Table Form) / Experiment Set Up

Screen shot from this paper

Basically, the context module is a fully convolutional neural network built from dilated convolution operations. Our experiment setup is extremely simple: we perform multi-class classification on the MNIST data set. To accomplish this, we attach a fully connected neural network to the output of the Context Module network.
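As a rough shape walk-through in NumPy (the channel width and the stand-in features here are illustrative, not the actual values from the table above): "SAME"-padded dilated convolutions keep the spatial size fixed, so the fully connected head only has to flatten the module's output and map it to 10 MNIST classes.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, h, w = 4, 28, 28                     # MNIST-sized input batch
x = rng.standard_normal((batch, h, w, 1))

# Stand-in for the context module output: with "SAME" padding the
# dilated conv stack preserves (h, w) and only changes channels.
channels_out = 16                           # hypothetical width
feat = rng.standard_normal((batch, h, w, channels_out))

# Fully connected head for 10-way classification.
flat = feat.reshape(batch, -1)              # (4, 28 * 28 * 16)
w_fc = rng.standard_normal((flat.shape[1], 10)) * 0.01
logits = flat @ w_fc                        # (4, 10)
print(logits.shape)
```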

Network Architecture (OOP Form)

Left Image: Context Layers / Right Image: Fully Connected Neural Network

Red Line → Dilated Convolution Operation

Except for the red-lined portion above, which performs the dilated convolution operation, the network architecture is exactly the same as a standard convolutional neural network: convolution layers in front and a fully connected network at the back.
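The dilated convolution itself can be sketched in NumPy (in the spirit of my earlier NumPy post on the operation; the helper names here are my own): insert `rate - 1` zeros between the kernel taps, then run an ordinary "VALID" cross-correlation.

```python
import numpy as np

def dilate_kernel(k, rate):
    """Insert (rate - 1) zeros between the taps of a 2D kernel."""
    if rate == 1:
        return k
    kh, kw = k.shape
    out = np.zeros((rate * (kh - 1) + 1, rate * (kw - 1) + 1))
    out[::rate, ::rate] = k
    return out

def dilated_conv2d(x, k, rate):
    """'VALID' cross-correlation of x with a dilated kernel."""
    kd = dilate_kernel(k, rate)
    kh, kw = kd.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kd)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3))
print(dilated_conv2d(x, k, rate=1).shape)  # (3, 3)
print(dilated_conv2d(x, k, rate=2).shape)  # (1, 1) — the dilated kernel spans 5x5
```

With `rate=1` this reduces to a plain convolution; with `rate=2` a 3×3 kernel covers a 5×5 region while still using only nine weights.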

Dilation Factor / Feed Forward Operation

Red Lines → Dilation Factor to match the original paper

So, to keep the dilation factors the same as in the paper, we set them to 1, 1, 2, 4, 8, 16, 1, and 1, respectively.
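With 3×3 kernels in every layer except the final 1×1 layer, the receptive field of this stack can be checked in a few lines; the progression matches the 3×3 through 67×67 values reported in the paper's table.

```python
# (kernel_size, dilation) per layer, following the basic context module:
layers = [(3, 1), (3, 1), (3, 2), (3, 4), (3, 8), (3, 16), (3, 1), (1, 1)]

rf = 1            # receptive field of a single input pixel
fields = []
for k, d in layers:
    rf += (k - 1) * d   # stride is 1 everywhere, so no extra scaling
    fields.append(rf)
print(fields)  # [3, 5, 9, 17, 33, 65, 67, 67]
```

Doubling the dilation each layer grows the receptive field exponentially (3 → 65 in six layers) without any pooling or loss of resolution, which is the whole point of the module.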

Training Results Auto Differentiation (ADAM Optimizer)

Left Graph → Cost for Training Images Over Time Graph
Right Graph → Accuracy for Training Images Over Time Graph

Left Graph → Cost for Testing Images Over Time Graph
Right Graph → Accuracy for Testing Images Over Time Graph

These are among the most interesting results I have seen, especially for auto differentiation. Around the 50th epoch the model performs extremely well. However, after roughly the 100th epoch its performance starts to decrease, eventually settling around 77 percent accuracy on both the training and testing images.

Training Results (Broken) Dilated Back Prop (ADAM Optimizer)

Left Graph → Cost for Training Images Over Time Graph
Right Graph → Accuracy for Training Images Over Time Graph

Left Graph → Cost for Testing Images Over Time Graph
Right Graph → Accuracy for Testing Images Over Time Graph

While letting the model train, I went to the gym, and during my workout session I realized that my implementation of back propagation for the dilated convolution layer is wrong. Hence, I will call this model (Broken) Dilated Back Propagation. Interestingly, even though the process is broken, the model is still able to learn. While it underperformed compared to auto differentiation, it still ended up around 75 percent accuracy.
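One way to catch a broken backward pass like this before a long training run is a finite-difference gradient check. Below is a minimal NumPy sketch for the kernel gradient of a single-channel dilated convolution; the loss (`out.sum()`), the sizes, and the helper names are all illustrative, not the post's actual code.

```python
import numpy as np

def dilated_conv2d(x, k, rate):
    """'VALID' cross-correlation with a kernel dilated by `rate`."""
    kh, kw = k.shape
    span_h, span_w = rate * (kh - 1) + 1, rate * (kw - 1) + 1
    oh, ow = x.shape[0] - span_h + 1, x.shape[1] - span_w + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Strided slicing picks out the dilated taps directly.
            out[i, j] = np.sum(x[i:i + span_h:rate, j:j + span_w:rate] * k)
    return out

def analytic_grad_k(x, k, rate, dout):
    """Gradient of sum(dout * dilated_conv2d(x, k, rate)) w.r.t. k."""
    grad = np.zeros_like(k)
    oh, ow = dout.shape
    for a in range(k.shape[0]):
        for b in range(k.shape[1]):
            # Kernel tap (a, b) always multiplies x[i + a*rate, j + b*rate].
            grad[a, b] = np.sum(dout * x[a*rate:a*rate + oh, b*rate:b*rate + ow])
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((9, 9))
k = rng.standard_normal((3, 3))
rate = 2
dout = np.ones_like(dilated_conv2d(x, k, rate))  # dL/dout for L = out.sum()

analytic = analytic_grad_k(x, k, rate, dout)

# Numerically perturb each kernel entry and compare.
eps = 1e-6
numeric = np.zeros_like(k)
for a in range(3):
    for b in range(3):
        kp, km = k.copy(), k.copy()
        kp[a, b] += eps
        km[a, b] -= eps
        numeric[a, b] = (dilated_conv2d(x, kp, rate).sum()
                         - dilated_conv2d(x, km, rate).sum()) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # near zero when the backward pass is right
```

If a manual backward pass disagrees with the numeric gradient by more than a tiny tolerance, it is broken, even if (as seen above) gradient descent can sometimes still limp toward a usable model.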

Interactive Code

I have moved to Google Colab for the interactive code! You will need a Google account to view the notebooks, and since you cannot run read-only scripts in Google Colab, make a copy in your own playground. Finally, just FYI: I will never ask for permission to access your files on Google Drive. Happy coding!

To access code for (Broken) Dilated Back Propagation please click here.
To access code for Auto Differentiation please click here.

Final Words

Since this post is more about implementing the context module, I won't optimize it further. However, if you were able to get superior results, please comment below and describe how you achieved them.

If any errors are found, please email me at jae.duk.seo@gmail.com; if you wish to see the list of all of my writing, please view my website here.

Meanwhile, follow me on Twitter here, and visit my website or my YouTube channel for more content. I also did a comparison of Decoupled Neural Networks here, if you are interested.


  1. What is “semantic segmentation” compared to “segmentation” and “scene labeling”? Stackoverflow.com. Retrieved 12 March 2018, from https://stackoverflow.com/questions/33947823/what-is-semantic-segmentation-compared-to-segmentation-and-scene-labeling
  2. Yu, F., & Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.
  3. A 2017 Guide to Semantic Segmentation with Deep Learning. (2017). Blog.qure.ai. Retrieved 12 March 2018, from http://blog.qure.ai/notes/semantic-segmentation-deep-learning-review#dilation
  4. sklearn.utils.shuffle — scikit-learn 0.19.1 documentation. (2018). Scikit-learn.org. Retrieved 12 March 2018, from http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html
  5. Semantic Image Segmentation with DeepLab in Tensorflow. (2018). Research Blog. Retrieved 12 March 2018, from https://research.googleblog.com/2018/03/semantic-image-segmentation-with.html
  6. Denosing CT Scans using NN with Interactive Code — Part 3, Convolutional Residual Neural Networks…. (2018). Towards Data Science. Retrieved 12 March 2018, from https://towardsdatascience.com/denosing-lung-ct-scans-using-neural-networks-with-interactive-code-part-3-convolutional-residual-6dbb36b28be
  7. Understanding 2D Dilated Convolution Operation with Examples in Numpy and Tensorflow with…. (2018). Towards Data Science. Retrieved 12 March 2018, from https://towardsdatascience.com/understanding-2d-dilated-convolution-operation-with-examples-in-numpy-and-tensorflow-with-d376b3972b25
  8. Yu, F., & Koltun, V. (2015). Multi-Scale Context Aggregation by Dilated Convolutions. Arxiv.org. Retrieved 12 March 2018, from https://arxiv.org/abs/1511.07122
  9. tf.nn.conv2d_backprop_filter | TensorFlow. (2018). TensorFlow. Retrieved 12 March 2018, from https://www.tensorflow.org/api_docs/python/tf/nn/conv2d_backprop_filter
  10. tf.nn.conv2d_backprop_input | TensorFlow. (2018). TensorFlow. Retrieved 12 March 2018, from https://www.tensorflow.org/api_docs/python/tf/nn/conv2d_backprop_input
  11. Only Numpy: Dilated Back Propagation and Google Brain’s Gradient Noise with Interactive Code. (2018). Hacker Noon. Retrieved 12 March 2018, from https://hackernoon.com/only-numpy-dilated-back-propagation-and-google-brains-gradient-noise-with-interactive-code-3a527fc8003c
  12. How to get Tensorflow tensor dimensions (shape) as int values? Stackoverflow.com. Retrieved 13 March 2018, from https://stackoverflow.com/questions/40666316/how-to-get-tensorflow-tensor-dimensions-shape-as-int-values

Source: Deep Learning on Medium