Learning TensorFlow with Image Colorization

TensorFlow is an awesome machine learning framework by Google. It is one of the best frameworks for writing deployable machine learning code. I have come across many TensorFlow tutorials, so here I will try to cover the things I find missing in them. As a learning example, let's take grayscale image colorization as the problem and try to solve it with TensorFlow.

Note: This tutorial won’t guide you to a state-of-the-art image colorization model. Instead, I will introduce the very basics of TensorFlow and image colorization. You should try to build better networks, read research papers, and come up with a model that solves this problem well. I am also assuming you are familiar with the basics: what a variable is, what weights are, what deep learning is, and so on.


Importing the required libraries:

import os  # needed for os.listdir below
import numpy as np
import cv2
import tensorflow as tf
import skimage.color as color
import skimage.io as io


First we need to load our images into Python. The code below reads them with skimage’s io.imread, which returns images in RGB order (OpenCV’s imread would return BGR instead), and then converts them to Lab space. In Lab space, ‘l’ is the lightness (something like grayscale), and ‘a’ and ‘b’ carry the color. The input given to our final model will contain only gray levels, so this conversion is needed to separate the gray levels from the color. See the code snippet below:

mydir = r'imgs'
images = os.listdir(mydir)
N = len(images)  # N is the number of images for training
data = np.zeros([N, 256, 256, 3])
for count in range(N):
    # io.imread returns RGB; resize every image to 256x256
    img = cv2.resize(io.imread(os.path.join(mydir, images[count])), (256, 256))
    data[count, :, :, :] = img
num_train = N
# scale pixel values to [0, 1], then convert RGB to Lab
Xtrain = color.rgb2lab(data[:num_train] * 1.0 / 255)
xt = Xtrain[:, :, :, 0]   # 'l' channel: model input
yt = Xtrain[:, :, :, 1:]  # 'a' and 'b' channels: model targets
yt = yt / 128             # scale 'a' and 'b' to roughly [-1, 1]
xt = xt.reshape(num_train, 256, 256, 1)
yt = yt.reshape(num_train, 256, 256, 2)

The above code loads the images from the directory, resizes them to 256x256x3 and converts them to Lab color space. Here I am loading the images used for training the model. Once an image is in Lab space, index 0 of the last axis is the ‘l’ channel and indices 1 and 2 are the ‘a’ and ‘b’ channels. So the numpy array xt contains the ‘l’ channel (the input to the deep learning network) and the numpy array yt contains the ‘a’ and ‘b’ channels (what the model needs to predict). Dividing yt by 128 scales the ‘a’ and ‘b’ values to roughly [-1, 1].
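To make the channel split concrete, here is a minimal NumPy sketch. It uses a small random array as a stand-in for real Lab images (an assumption, so that it runs without any image files), and the names lab, l_channel, ab_scaled and restored are made up for this illustration.

```python
import numpy as np

# Hypothetical stand-in for a batch of Lab images: 2 images of 4x4 pixels.
# In Lab, 'l' lies in [0, 100] and 'a'/'b' lie roughly in [-128, 127].
rng = np.random.default_rng(0)
lab = np.empty((2, 4, 4, 3))
lab[..., 0] = rng.uniform(0, 100, size=(2, 4, 4))
lab[..., 1:] = rng.uniform(-128, 127, size=(2, 4, 4, 2))

l_channel = lab[..., 0:1]        # keeps a trailing channel axis: (2, 4, 4, 1)
ab_scaled = lab[..., 1:] / 128   # targets scaled to roughly [-1, 1]: (2, 4, 4, 2)

# Undo the scaling and put the three channels back together.
restored = np.concatenate([l_channel, ab_scaled * 128], axis=-1)
print(l_channel.shape, ab_scaled.shape, np.allclose(restored, lab))
```

Keeping the targets in roughly [-1, 1] fits the tanh activation used by the layers later in this tutorial, since tanh can only output values in that range.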


In TensorFlow 1.x, the API used throughout this tutorial, every computation is executed through a “session”. Building an operation does not compute its value. For example, the code block below prints only a description of the tensor, shown right after it:

import tensorflow as tf
a = tf.constant(5)
b = tf.constant(6)
c = a*b
print(c)


Tensor("mul:0", shape=(), dtype=int32)

If you want to see the value of c, you have to use a session. For example:

import tensorflow as tf
session = tf.Session()
a = tf.constant(5)
b = tf.constant(6)
c = a*b
print(session.run(c))

In this code, I created a session and used it to evaluate the value of c. The output is:

30


Now, let's move on to creating a model for image colorization.

First, something I would like to mention. I personally prefer using tf.nn instead of tf.layers. If you use tf.nn, you have to define everything, like weights and biases, yourself. This makes it much easier to manipulate the weights and biases, for example to implement weight sharing. Weight sharing is essentially when a few layers share some or all of their weights. (If you didn’t understand, feel free to ignore these lines.)
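If weight sharing is unclear, this small NumPy sketch may help. It is not part of the colorization model. The dense function and the names shared_w, shared_b and x_in are hypothetical, and it assumes the simplest case where two layers share all of their weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# One weight matrix and one bias vector, created once...
shared_w = rng.normal(scale=0.1, size=(4, 4))
shared_b = np.full(4, 0.1)

def dense(inputs, weights, bias):
    """A hypothetical fully connected layer with a tanh activation."""
    return np.tanh(inputs @ weights + bias)

x_in = rng.normal(size=(3, 4))
h1 = dense(x_in, shared_w, shared_b)  # first layer
h2 = dense(h1, shared_w, shared_b)    # second layer reuses the same weights
print(h2.shape)
```

Because you hold the weight variables yourself, reusing them in a second layer is just a matter of passing the same object again.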

Creating placeholders:

session = tf.Session()
x = tf.placeholder(tf.float32, shape = [None, 256, 256, 1], name = 'x')
ytrue = tf.placeholder(tf.float32, shape = [None, 256, 256, 2], name = 'ytrue')

In the above code block, we have defined a session and created two placeholders. Placeholders are basically proxies for values that will be passed in later. We will pass the input and ground-truth values to our model during training, so we need to declare placeholders for them. The None in each shape leaves the batch size open.

Creating helper functions:

I prefer using these helper functions since they make the task of making models much easier:

def create_weights(shape):
    # weights start as small random values
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def create_bias(size):
    # biases start at a small positive constant
    return tf.Variable(tf.constant(0.1, shape = [size]))

These are the two standard functions that you’ll find everywhere. We use them to create the weight and bias variables of a layer.
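To show what this initialization does, here is a rough NumPy sketch. It assumes the documented behaviour of tf.truncated_normal, where values more than two standard deviations from the mean are dropped and re-drawn. The truncated_normal function below is my own stand-in, not a TensorFlow API.

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, seed=0):
    """Sample N(0, stddev^2), re-drawing anything beyond 2 standard deviations."""
    rng = np.random.default_rng(seed)
    values = rng.normal(0.0, stddev, size=shape)
    too_far = np.abs(values) > 2 * stddev
    while too_far.any():
        # replace only the out-of-range entries with fresh samples
        values[too_far] = rng.normal(0.0, stddev, size=int(too_far.sum()))
        too_far = np.abs(values) > 2 * stddev
    return values

# shape is [filter_size, filter_size, num_channels, num_filters]
weights = truncated_normal([3, 3, 1, 8])
bias = np.full(8, 0.1)  # same constant start as create_bias
print(weights.shape, np.abs(weights).max() <= 0.2)
```

The truncation simply keeps any single weight from starting out unusually large.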

def convolution(inputs, num_channels, filter_size, num_filters):
    weights = create_weights(shape = [filter_size, filter_size, num_channels, num_filters])
    bias = create_bias(num_filters)

    ## convolutional layer
    layer = tf.nn.conv2d(input = inputs, filter = weights, strides= [1, 1, 1, 1], padding = 'SAME') + bias
    layer = tf.nn.tanh(layer)
    return layer

This function creates a convolutional layer. First, we create the weights, whose shape is [filter_size, filter_size, num_channels, num_filters]; this should be clear if you know convolutional networks. Then we define the bias. To implement the layer itself, we use tf.nn.conv2d (just as I mentioned earlier), add the bias term, apply an activation function (tanh), and return the output.
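If you want to see what such a layer computes, the following is a slow but explicit NumPy sketch for a single image. It assumes stride 1, zero padding and an odd filter size. The function conv2d_same is a hypothetical illustration, not how TensorFlow implements it.

```python
import numpy as np

def conv2d_same(image, weights, bias):
    """Stride-1, zero-padded ('SAME') convolution of one image, then tanh.

    image:   [height, width, num_channels]
    weights: [filter_size, filter_size, num_channels, num_filters]
    bias:    [num_filters]
    """
    k = weights.shape[0]
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    height, width, _ = image.shape
    out = np.zeros((height, width, weights.shape[3]))
    for i in range(height):
        for j in range(width):
            patch = padded[i:i + k, j:j + k, :]  # [k, k, num_channels]
            # sum over the window and the input channels, for every filter
            out[i, j, :] = np.tensordot(patch, weights, axes=3)
    return np.tanh(out + bias)

image = np.ones((5, 5, 1))
weights = np.ones((3, 3, 1, 2)) * 0.1
result = conv2d_same(image, weights, np.zeros(2))
print(result.shape)  # width and height are unchanged, channels become num_filters
```

With 'SAME' padding and stride 1, the width and height stay the same and only the number of channels changes, which is why the model below needs the pooling and upsampling layers to change the spatial size.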

def maxpool(inputs, kernel, stride):
    layer = tf.nn.max_pool(value = inputs, ksize = [1, kernel, kernel, 1], strides = [1, stride, stride, 1], padding = "SAME")
    return layer

def upsampling(inputs):
    # double the height and width of the input
    layer = tf.image.resize_nearest_neighbor(inputs, (2*inputs.get_shape().as_list()[1], 2*inputs.get_shape().as_list()[2]))
    return layer

In the above code snippet, we define two functions, maxpool and upsampling. For upsampling, I am using the tf.image.resize_nearest_neighbor function, to which I pass the output dimensions explicitly (the 2nd argument): twice the input height and width.
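Here is a minimal NumPy sketch of what these two operations do to a single feature map, assuming a 2x2 kernel with stride 2 and even input sizes. The helper names maxpool_2x2 and upsample_2x are hypothetical.

```python
import numpy as np

def maxpool_2x2(feature_map):
    """2x2 max pooling with stride 2 on [height, width, channels]."""
    h, w, c = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

def upsample_2x(feature_map):
    """Nearest-neighbour upsampling: repeat every row and column twice."""
    return feature_map.repeat(2, axis=0).repeat(2, axis=1)

fmap = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = maxpool_2x2(fmap)      # shape (2, 2, 1)
restored = upsample_2x(pooled)  # shape (4, 4, 1)
print(pooled[:, :, 0])
print(restored[:, :, 0])
```

Note that upsampling brings back the size but not the detail: every 2x2 block in restored holds the same value.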

Now, let's define the model for image colorization.

conv1 = convolution(x, 1, 3, 3)
max1 = maxpool(conv1, 2, 2)
conv2 = convolution(max1, 3, 3, 8)
max2 = maxpool(conv2, 2, 2)
conv3 = convolution(max2, 8, 3, 16)
max3 = maxpool(conv3, 2, 2)
conv4 = convolution(max3, 16, 3, 16)
max4 = maxpool(conv4, 2, 2)
conv5 = convolution(max4, 16, 3, 32)
max5 = maxpool(conv5, 2, 2)
conv6 = convolution(max5, 32, 3, 32)
max6 = maxpool(conv6, 2, 2)
conv7 = convolution(max6, 32, 3, 64)
upsample1 = upsampling(conv7)
conv8 = convolution(upsample1, 64, 3, 32)
upsample2 = upsampling(conv8)
conv9 = convolution(upsample2, 32, 3, 32)
upsample3 = upsampling(conv9)
conv10 = convolution(upsample3, 32, 3, 16)
upsample4 = upsampling(conv10)
conv11 = convolution(upsample4, 16, 3, 16)
upsample5 = upsampling(conv11)
conv12 = convolution(upsample5, 16, 3, 8)
upsample6 = upsampling(conv12)
conv13 = convolution(upsample6, 8, 3, 2)

In the above code block, we have code that uses the functions we created earlier to build a model. The input has shape [None, 256, 256, 1], where the 1 is for the ‘l’ channel. We first decrease the width and height of this input while increasing the number of channels, and then do the reverse to obtain a tensor of shape [None, 256, 256, 2]. These are our output ‘a’ and ‘b’ channels.
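To check the shapes without running TensorFlow, you can trace them by hand. The sketch below is plain Python. The layers list mirrors the model above, while the trace_shapes function is a hypothetical helper. It assumes what the helper functions imply: convolutions keep the size, each max pool halves it, and each upsampling doubles it.

```python
# (layer kind, output channels) in the order used by the model above
layers = [
    ("conv", 3), ("pool", None), ("conv", 8), ("pool", None),
    ("conv", 16), ("pool", None), ("conv", 16), ("pool", None),
    ("conv", 32), ("pool", None), ("conv", 32), ("pool", None),
    ("conv", 64),
    ("up", None), ("conv", 32), ("up", None), ("conv", 32),
    ("up", None), ("conv", 16), ("up", None), ("conv", 16),
    ("up", None), ("conv", 8), ("up", None), ("conv", 2),
]

def trace_shapes(size=256, channels=1):
    """Return the (height, width, channels) shape after every layer."""
    shapes = []
    for kind, out_channels in layers:
        if kind == "conv":
            channels = out_channels  # 'SAME' padding, stride 1: size unchanged
        elif kind == "pool":
            size = -(-size // 2)     # 'SAME' padding, stride 2: ceil(size / 2)
        else:
            size *= 2                # nearest-neighbour upsampling
        shapes.append((size, size, channels))
    return shapes

shapes = trace_shapes()
for (kind, _), shape in zip(layers, shapes):
    print(kind, shape)
```

The smallest tensor is the output of conv7, at 4x4 with 64 channels, and the last layer is back at 256x256 with 2 channels.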


Now, we will define the optimizer, cost function and train the model.

loss = tf.losses.mean_squared_error(labels = ytrue, predictions = conv13)
cost = tf.reduce_mean(loss)
optimizer = tf.train.AdamOptimizer(learning_rate = 0.0001).minimize(cost)
# initialize all variables before training
session.run(tf.global_variables_initializer())

In the above code block, I have defined the loss using TensorFlow’s built-in function for MSE loss. Note that I have passed conv13 as the “predictions” argument of the mean_squared_error function. This is because conv13 is the output of the model’s last layer.
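For reference, the quantity being minimized can be written in a couple of lines of NumPy. This is only a sketch of the formula, with made-up numbers, and the local mean_squared_error function is my own stand-in rather than the TensorFlow one.

```python
import numpy as np

def mean_squared_error(labels, predictions):
    """Average of the squared differences over every element."""
    return np.mean((labels - predictions) ** 2)

labels = np.array([[0.5, -0.5], [1.0, 0.0]])
predictions = np.array([[0.0, -0.5], [0.0, 0.0]])
mse = mean_squared_error(labels, predictions)  # (0.25 + 0 + 1 + 0) / 4
print(mse)
```

With its default arguments, tf.losses.mean_squared_error should already reduce to a single scalar, in which case the extra tf.reduce_mean leaves the value unchanged.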

Next, I have defined the optimizer. In TensorFlow we need to specify what the optimizer is going to optimize, so I have used .minimize(cost) here since I want to minimize the cost function. Finally, all the variables have to be initialized with the global variables initializer (tf.global_variables_initializer) before training.

num_epochs = 100
for i in range(num_epochs):
    # one optimization step on the full training set
    session.run(optimizer, feed_dict = {x: xt, ytrue:yt})
    lossvalue = session.run(cost, feed_dict = {x:xt, ytrue : yt})
    print("epoch: " + str(i) + " loss: " + str(lossvalue))

In the above code, I run the session on the optimizer so that it can optimize the cost function. I use the argument “feed_dict”, which feeds values to the proxies (placeholders) that I mentioned earlier.

Finally, after this, our little image colorization model is ready to use.

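output = session.run(conv13, feed_dict = {x: xt[0].reshape([1, 256, 256, 1])})*128
image = np.zeros([256, 256, 3])
image[:, :, 0] = xt[0][:, :, 0]  # 'l' channel from the input
image[:, :, 1:] = output[0]      # predicted 'a' and 'b' channels
image = color.lab2rgb(image)
io.imsave("test.jpg", image)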

In the code above, I run the model on an image from the training data itself and save it to disk.

Further improvements:

This is a very basic model. There are much more advanced models out there for image colorization that you should search for and try out, or you can make significant changes to the model we created in this tutorial to get better results. Learning is always much better when you explore things to the fullest extent.

Source: Deep Learning on Medium