Colorize your black & white world with Deep Learning Studio



Today, colorization is usually done by hand in Photoshop. To appreciate all the hard work behind this process, take a peek at this gorgeous colorization memory lane video:

In short, a picture can take up to one month to colorize. It requires extensive research. A face alone needs up to 20 layers of pink, green and blue shades to get it just right.

Today, deep learning is applied to almost every field, and it often makes older, hand-crafted approaches easier and faster.

So what if we could build a deep learning model that takes old black-and-white memories from our childhood and gives them some exciting colors?

In this article, I will demonstrate a simple neural network that can do this task for us, using Inception ResNet v2, which has been trained on 1.2 million images. To make the coloring pop, we’ll train our network on portraits from Unsplash.

Logic

In this section, I’ll outline how to render an image, the basics of digital colors, and the main logic for our neural network.

Black and white images can be represented in grids of pixels. Each pixel has a value that corresponds to its brightness. The values span from 0–255, from black to white.

Color images consist of three layers: a red layer, a green layer, and a blue layer. Imagine splitting a green leaf on a white background into the three channels. Intuitively, you might think that the plant is only present in the green layer.

But, as you see below, the leaf is present in all three channels. The layers not only determine color, but also brightness.

To achieve the color white, for example, you need an equal distribution of all colors. Adding an equal amount of red and blue makes the green brighter. Thus, a color image encodes both the color and the contrast using three layers:

Just like black and white images, each layer in a color image has a value from 0–255. The value 0 means that it has no color in this layer. If the value is 0 for all color channels, then the image pixel is black.
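
As a tiny illustration (a sketch using NumPy, with made-up pixel values), here is how a few pixels look as raw channel values:

import numpy as np

white = np.array([255, 255, 255])   # equal, maximal red, green and blue -> white
black = np.array([0, 0, 0])         # zero in every channel -> black
leaf  = np.array([60, 180, 75])     # a "green" pixel still carries some red and blue,
                                    # which control how bright it appears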

As you may know, a neural network creates a relationship between an input value and output value. To be more precise with our colorization task, the network needs to find the traits that link grayscale images with colored ones.

In sum, we are searching for the features that link a grid of grayscale values to the three color grids.

In other words, f([B&W]) = [R], [G], [B], where f() is the neural network, [B&W] is our grayscale input, and [R], [G], [B] are the three color layers we get as output.

Since we are dealing with high-resolution images, we need more computing power as the dataset grows. For that, I prefer using the Deep Learning Studio Jupyter notebooks by Deep Cognition, which provide Amazon deep learning instances with GPUs that can be used to train the model.

If you are not familiar with how to use Deep Learning Studio, take a look at these :)

Introduction

Complete Guide

A video walkthrough of Deep Cognition by Favio Vázquez

Information: The Python code and the dataset I have used can be downloaded from my GitHub repository.

Environment setup

One of the best parts of Deep Learning Studio is that, even when I want to code, I don’t have to go through the pain of setting up different environments and libraries on my computer. I can easily install them with a single click on Deep Learning Studio Cloud and use them from wherever I want.

1) Installing Python environment

To do this, go to the Environments tab available in DLS.

Then click on the environments you want to install.

For this task we will install the following environments:

  1. Python3
  2. Tensorflow-gpu-1.6.0
  3. Keras-gpu-2.1.5

2) Installing Python packages

Click on Launch Environment.

Then open Terminal.

In the terminal, run the following command:

pip install scikit-image

Uploading the dataset

Open the file browser and create a new folder for this project.

Upload the dataset that is available in my GitHub repository.

If you want, you can create your own custom dataset by uploading high-resolution color images to the Train folder and grayscale images to the Test folder.
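
The code later in this article reads from Train/ and Test/ and writes its outputs to result/, all relative to the notebook. Here is a minimal sketch to create that layout, in case the folders do not exist yet:

import os

for folder in ['Train', 'Test', 'result']:
    os.makedirs(folder, exist_ok=True)

# Train/  -> high-resolution color images used for training
# Test/   -> grayscale (or any) images you want the model to colorize
# result/ -> the colorized outputs are saved here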

Let’s start coding

Importing all the libraries

import keras
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.preprocessing import image
from keras.engine import Layer
from keras.applications.inception_resnet_v2 import preprocess_input
from keras.layers import Conv2D, UpSampling2D, InputLayer, Conv2DTranspose, Input, Reshape, merge, concatenate
from keras.layers import Activation, Dense, Dropout, Flatten
from keras.layers.normalization import BatchNormalization
from keras.callbacks import TensorBoard
from keras.models import Sequential, Model
from keras.layers.core import RepeatVector, Permute
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from skimage.color import rgb2lab, lab2rgb, rgb2gray, gray2rgb
from skimage.transform import resize
from skimage.io import imsave
import numpy as np
import os
import random
import tensorflow as tf

Reading all the images from the Train folder and loading the Inception weights

# Get images
X = []
for filename in os.listdir('Train/'):
    X.append(img_to_array(load_img('Train/'+filename)))
X = np.array(X, dtype=float)
Xtrain = 1.0/255*X

# Load weights
inception = InceptionResNetV2(weights='imagenet', include_top=True)
inception.graph = tf.get_default_graph()
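
As a quick, optional sanity check (not part of the original script), you can confirm that the classification head returns a 1000-dimensional vector per image; this is why the fusion branch below uses Input(shape=(1000,)):

print(inception.output_shape)   # expected: (None, 1000)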

Creating encoder and decoder along with a fusion layer.

In parallel to the encoder, the input images also run through one of today’s most powerful classifiers, Inception ResNet v2, a network trained on 1.2 million images. We extract the output of its classification layer and merge it with the output from the encoder.

By transferring the learning from the classifier to the coloring network, the network gets a sense of what is in the picture, which enables it to match an object representation with a coloring scheme.

encoder_input is fed into our encoder model; the output of the encoder is then fused with embed_input in the fusion layer; the output of the fusion layer is used as the input to our decoder model, which then returns the final output, decoder_output.

embed_input = Input(shape=(1000,))
#Encoder
encoder_input = Input(shape=(256, 256, 1,))
encoder_output = Conv2D(64, (3,3), activation='relu', padding='same', strides=2)(encoder_input)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same', strides=2)(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same')(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same')(encoder_output)
#Fusion
fusion_output = RepeatVector(32 * 32)(embed_input)
fusion_output = Reshape(([32, 32, 1000]))(fusion_output)
fusion_output = concatenate([encoder_output, fusion_output], axis=3)
fusion_output = Conv2D(256, (1, 1), activation='relu', padding='same')(fusion_output)
#Decoder
decoder_output = Conv2D(128, (3,3), activation='relu', padding='same')(fusion_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(16, (3,3), activation='relu', padding='same')(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation='tanh', padding='same')(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
model = Model(inputs=[encoder_input, embed_input], outputs=decoder_output)
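
If you want to verify the architecture before training (an optional check), the decoder should end in a 256×256 map with 2 channels, one for each of the a/b color layers:

model.summary()             # prints the full encoder-fusion-decoder stack
print(model.output_shape)   # expected: (None, 256, 256, 2)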

Now, we have to resize the image to fit into the Inception model. Then we use the preprocessor to format the pixel and color values according to the model. In the final step, we run it through the Inception network and extract the final layer of the model.

def create_inception_embedding(grayscaled_rgb):
    grayscaled_rgb_resized = []
    for i in grayscaled_rgb:
        i = resize(i, (299, 299, 3), mode='constant')
        grayscaled_rgb_resized.append(i)
    grayscaled_rgb_resized = np.array(grayscaled_rgb_resized)
    grayscaled_rgb_resized = preprocess_input(grayscaled_rgb_resized)
    with inception.graph.as_default():
        embed = inception.predict(grayscaled_rgb_resized)
    return embed

With ImageDataGenerator, we adjust the settings for our image generator. This way, no image is ever exactly the same, which helps the network learn and generalize. shear_range tilts the image to the left or right, and the other settings are zoom, rotation, and horizontal flip.

# Image transformer
datagen = ImageDataGenerator(
    shear_range=0.2,
    zoom_range=0.2,
    rotation_range=20,
    horizontal_flip=True)

# Generate training data
batch_size = 10

We use the images from our folder, Xtrain, to generate new images based on the settings above. Then, we extract the black-and-white layer for X_batch and the two color layers for Y_batch.

To create our batch, we use the tweaked images. We convert them to black and white and run them through the Inception ResNet model.

def image_a_b_gen(batch_size):
    for batch in datagen.flow(Xtrain, batch_size=batch_size):
        grayscaled_rgb = gray2rgb(rgb2gray(batch))
        embed = create_inception_embedding(grayscaled_rgb)
        lab_batch = rgb2lab(batch)
        X_batch = lab_batch[:,:,:,0]
        X_batch = X_batch.reshape(X_batch.shape+(1,))
        Y_batch = lab_batch[:,:,:,1:] / 128
        yield ([X_batch, embed], Y_batch)
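
Before training, you can pull a single batch from the generator to check the shapes (an optional sketch, assuming the training images are 256×256 as the model expects):

[X_sample, embed_sample], Y_sample = next(image_a_b_gen(batch_size))
print(X_sample.shape)       # (batch_size, 256, 256, 1) -> grayscale L channel
print(embed_sample.shape)   # (batch_size, 1000)        -> Inception embedding
print(Y_sample.shape)       # (batch_size, 256, 256, 2) -> a/b color channels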

Now we will compile our model using the RMSprop optimizer and mean squared error as the loss function.

The stronger your GPU is, the more images you can fit into a batch. With this setup, you can use 50–100 images per batch. steps_per_epoch is calculated by dividing the number of training images by your batch size; a small sketch of this follows the training call below.

#Train model      
model.compile(optimizer='rmsprop', loss='mse')
model.fit_generator(image_a_b_gen(batch_size), epochs=50, steps_per_epoch=1)
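
With only 10 training images and a batch size of 10, one step per epoch covers the whole dataset, which is why steps_per_epoch is 1 above. For a larger dataset, here is a sketch of the same call with a computed steps_per_epoch:

steps = len(Xtrain) // batch_size   # e.g. 100 images with a batch size of 10 -> 10 steps
model.fit_generator(image_a_b_gen(batch_size), epochs=50, steps_per_epoch=steps)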

1.0/255 indicates that we are using a 24-bit RGB color space. It means that we are using numbers between 0–255 for each color channel. This results in 16.7 million color combinations.

Since humans can only perceive 2–10 million colors, it does not make much sense to use a larger color space.

The Lab color space has a different range in comparison to RGB. The color spectrum ab in Lab ranges from -128 to 128. By dividing all values in the output layer by 128, we bound the range between -1 and 1.

We match it with our neural network, which also returns values between -1 and 1.
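
A short sketch (using the scikit-image functions already imported) shows these ranges on our own training data:

lab = rgb2lab(Xtrain)                            # Xtrain is already scaled to 0-1
print(lab[:,:,:,0].min(), lab[:,:,:,0].max())    # L channel: roughly 0 to 100
ab = lab[:,:,:,1:] / 128
print(ab.min(), ab.max())                        # a/b after dividing by 128: roughly -1 to 1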

After converting the color space using the function rgb2lab(), we select the grayscale layer with [:, :, :, 0]. This is our input for the neural network. [:, :, :, 1:] selects the two color layers, green–red and blue–yellow.

color_me = []
for filename in os.listdir('Test/'):
    color_me.append(img_to_array(load_img('Test/'+filename)))
color_me = np.array(color_me, dtype=float)
gray_me = gray2rgb(rgb2gray(1.0/255*color_me))
color_me_embed = create_inception_embedding(gray_me)
color_me = rgb2lab(1.0/255*color_me)[:,:,:,0]
color_me = color_me.reshape(color_me.shape+(1,))

After training the neural network, we make a final prediction which we convert into a picture.

Here, we use a grayscale image as input and run it through our trained neural network. We take all the output values between -1 and 1 and multiply them by 128. This gives us the correct colors in the Lab color spectrum.

Lastly, we create a black RGB canvas by filling it with three layers of 0s. Then we copy the grayscale layer from our test image. Then we add our two color layers to the RGB canvas. This array of pixel values is then converted into a picture.

# Test model
output = model.predict([color_me, color_me_embed])
output = output * 128

# Output colorizations
for i in range(len(output)):
    cur = np.zeros((256, 256, 3))
    cur[:,:,0] = color_me[i][:,:,0]
    cur[:,:,1:] = output[i]
    imsave("result/img_"+str(i)+".png", lab2rgb(cur))

Results:

On a small dataset

Number of training images = 10

Number of testing images = 8

Test Data

After 50 epochs:

After 100 epochs:

After 1000 epochs:

After 2000 epochs:

So, the main purpose of this article was to show how to build a simple deep learning model using Deep Learning Studio.

Resources:

  1. Dataset download link.
  2. Original dataset on Unsplash.
  3. The Python code can be fetched from my repo here.
  4. If you want to know more about this approach, check here.
  5. To learn about the benefits of using Deep Learning Studio, check here.
  6. Check out Deep Cognition’s new website.

Support

If you like my articles, you can always support me with some money: https://www.paypal.me/rajat2712/usd50

Thank you for your attention

That you are using your time to read my work means the world to me. I fully mean that.

If you liked this story, go crazy with the clap (👏) button! It will help other people find my work. Also, share it with your friends!

Also, follow me if you want to! I would love that.

You can also check out my professional portfolio website.
