Source: Deep Learning on Medium
Andrew Tan, Quantitative Analyst
Preston Lim, Software Engineer
Tan Kai Wei, Data Scientist
Have you ever looked at an old black and white photo and wondered: what did the person taking this photo actually see?
Was there something about the arrangement of colours that compelled the photographer to capture this very moment? And if so, did the photographer see something that we — modern day viewers of this black and white photo — are not privy to?
While it is impossible to replicate the exact conditions in which the original photo was taken, it is possible to add colour to the photo to help us imagine what the photographer could have seen in that instant. It is incredible — almost magical — how a little bit of colour can bring us that much closer to that specific moment in time.
So for our hackathon in January, our team decided to build a deep learning colouriser tool trained specifically on old Singaporean photos.
An important note here: the point of colourisation is to generate an image with colours that are plausible. It by no means guarantees that the colourised image is an accurate representation of the actual snapshot in time.
Another note: colourisation is a field of active research and our model is by no means perfect — it works well on some images but not others.
If you have old black and white photos and would like to colourise them, you can do so here: Colourise.sg. We do not store any of the photos that you upload to our colouriser application. We are planning to run this service for the month of February — but might extend it depending on usage.
We will explain how we created Colourise.sg in the following sections of this blog post. Skip ahead to the end if you would like to check out more of our results!
Wait. Hasn’t this been done before?
Yes — and no.
While it is true that colourisation tools exist, to our knowledge, none of these tools have been trained to colour images specific to the Singaporean historical context.
Why does this matter, you ask? Take, for example, the black and white image from the beginning of the article and its colourised output created by the deep learning tool Algorithmia. Algorithmia produces an image that lacks colour intensity and is not very believable.
A major contributing factor to the poor colourisation of old Singaporean photos could be that old Singaporean black and white images are too different from the training dataset. The model used by Algorithmia (created by Zhang et al.) is trained using 1.3 million images from ImageNet, a commonly used image database created by researchers at Stanford University and Princeton University.
As such, ImageNet is unlikely to contain many images relevant to Singapore. What this means is that the model is unlikely to have learnt what the colours of an old Singaporean schoolyard scene could plausibly be.
We hypothesise that a tool trained on Singapore-specific historical images will produce more believable colourised old Singaporean photos than existing tools.
How does one colourise a black and white image?
Before we jump into how colourisation can be done by a computer programme, let’s first consider how colourisation is done by a human colourist.
Colourisation is an extremely time- and skill-intensive endeavour. In order to create an appropriately colourised photo, an experienced human colourist has to do two tasks:
- (1) do significant research on the historical, geographic, and cultural context of the photo in order to derive appropriate colours, and
- (2) colour the black and white image using software tools like Photoshop.
(This is of course an oversimplification of the work colourisation artists do — for a more detailed and accurate explanation, check out this great video by Vox.)
Similarly, a computer programme needs to perform the two tasks, albeit in a slightly different manner. A programme needs to:
- (1) identify objects in a black and white photo, and figure out a plausible colour for the objects given images that it has seen in the past, and
- (2) colour the black and white image.
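To see why step (1) can only ever produce a plausible guess, consider what a black and white photo throws away. The sketch below is ours, not part of the colouriser itself: it assumes the commonly used Rec. 601 luma weights to convert a colour pixel to grey, and the function name `to_grey` is made up for illustration. Very different colours can collapse to exactly the same shade of grey, so the original colour cannot be recovered from the grey value alone.

```python
def to_grey(rgb):
    """Convert one RGB pixel (0-255 per channel) to a single grey level.

    Assumption: Rec. 601 luma weights, a common greyscale conversion.
    """
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Three clearly different colours...
pixels = {
    "bright red": (255, 0, 0),
    "mid green": (0, 130, 0),
    "dark grey": (76, 76, 76),
}

# ...that all end up as the same grey level in a black and white photo.
grey_levels = {name: to_grey(rgb) for name, rgb in pixels.items()}
print(grey_levels)  # {'bright red': 76, 'mid green': 76, 'dark grey': 76}
```

This is why the programme has to recognise what an object is before it can pick a colour: the grey level narrows the options but does not choose between them.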
Colourisation using Generative Adversarial Networks (GANs) — a deep learning technique
To colourise black and white images, we employed a technique in deep learning known as Generative Adversarial Networks (GANs). This comprises:
- A first neural network — a ‘generator’ — with many mathematical parameters (> 20 million) that tries to predict the colour values at different pixels in a black and white image, based on features in the image, and
- A second neural network — the ‘discriminator’ — that tries to identify if the generated colours are photo-realistic compared to the original coloured image.
The model is trained until the generator can predict colours that the discriminator cannot effectively distinguish as fake. A simplified view of the architecture used for training is shown below:
We used the popular fast.ai and PyTorch libraries to develop our model, with an architecture and training steps inspired by Jason Antic (https://github.com/jantic/DeOldify). We trained our model on a new set of more than 500,000 old, publicly available, Singapore-based images that we compiled, using a local GPU cluster with NVIDIA V100 GPUs.
Other steps we took to improve our model included adding images from Google’s Open Images V4, especially for body parts that our model did not seem to do too well on (e.g. hands, legs, and arms which were hard for the model to identify), and modifying learning rates and batch sizes for better results.
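The tug-of-war between the generator and the discriminator described in this section can be illustrated with a few lines of plain Python. This is a minimal sketch, assuming the textbook binary cross-entropy form of the GAN objective; the actual losses used in our training (and in DeOldify) may well differ, and the function names here are hypothetical. The number passed in stands for the discriminator's estimate of the probability that an image is a real colour photo.

```python
import math


def bce(prob_real, label):
    """Binary cross-entropy for one discriminator prediction."""
    eps = 1e-12  # keep log() away from zero
    p = min(max(prob_real, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def discriminator_loss(p_on_real, p_on_generated):
    # The discriminator wants to say 1 for real photos and 0 for generated ones.
    return bce(p_on_real, 1) + bce(p_on_generated, 0)


def generator_loss(p_on_generated):
    # The generator wants the discriminator to say 1 for its output.
    return bce(p_on_generated, 1)


# Early on, the discriminator spots generated colours easily (5% "real").
early = generator_loss(0.05)
# Later, it can no longer tell and sits at 50/50.
late = generator_loss(0.5)
print(early > late)  # True: the generator's loss falls as it fools the discriminator
```

When the discriminator is reduced to guessing (0.5 on everything), its own loss settles at 2 × ln 2, which is one way of reading "cannot effectively distinguish as fake".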
Deploying our deep learning model as a web application
At this point, our deep learning model lived in our office’s local GPU cluster — which meant that only our team had access to the colouriser model. In order for the colouriser to be useful to anyone outside our team, we had to deploy it on the internet.
We went with Google Cloud Platform as our cloud provider for the colouriser service. The architecture is fairly simple, with:
- (1) a CDN offering DDoS protection and caching of static content,
- (2) an NGINX frontend proxy and static content server,
- (3) a load balancer that distributes traffic, and
- (4) backend colouriser services with NVIDIA Tesla K80 GPUs that perform the actual colourisation.
The colourisation step is compute intensive and takes approximately 3 seconds to complete per image. As such, we decided to shield the backend colouriser services by using an NGINX server to queue requests to the backend. If the rate of incoming requests far exceeds the rate that our backend services can handle, the NGINX server immediately returns a status response to the client asking the user to try again later.
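The queue-then-reject behaviour can be sketched in Python as follows. This is only an illustration of the idea, not our NGINX configuration: the class name, the queue limit, and the specific status codes (202 for accepted, 503 for "try again later") are all assumptions made for the example.

```python
from collections import deque


class AdmissionQueue:
    """Toy stand-in for a front-end proxy that shields slow backends."""

    def __init__(self, max_waiting):
        self.max_waiting = max_waiting  # hypothetical queue limit
        self.waiting = deque()

    def submit(self, request_id):
        # Queue full: reject immediately instead of letting requests pile up.
        if len(self.waiting) >= self.max_waiting:
            return 503  # assumed "please try again later" status
        self.waiting.append(request_id)
        return 202  # assumed "accepted, waiting for a backend" status

    def next_for_backend(self):
        # A free backend picks up the oldest waiting request, if any.
        return self.waiting.popleft() if self.waiting else None


queue = AdmissionQueue(max_waiting=2)
statuses = [queue.submit(name) for name in ("a", "b", "c")]
print(statuses)  # [202, 202, 503]
```

Rejecting early keeps waiting times bounded: a user either gets a result within a predictable time or is told straight away to come back later.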
The key highlight of this architecture is that the colouriser service virtual machines (VMs) are autoscaled in response to how much traffic each VM has to service. This saves on cost because additional VMs are only switched on when there is demand for them.
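As a rough back-of-the-envelope illustration of the scaling arithmetic, the sketch below estimates how many VMs a given request rate would need. It assumes each VM colourises one image at a time at about 3 seconds per image (the figure quoted above); the minimum and maximum VM counts and the function name are made up for the example, and the cloud provider's autoscaler uses its own signals and rules rather than this formula.

```python
import math

SECONDS_PER_IMAGE = 3  # approximate colourisation time quoted in this post


def vms_needed(requests_per_second, min_vms=1, max_vms=10):
    """Estimate the VM count for a given load (min/max are hypothetical)."""
    # Each request keeps one VM busy for SECONDS_PER_IMAGE seconds,
    # so the number of VMs busy at once is roughly rate x time.
    busy = math.ceil(requests_per_second * SECONDS_PER_IMAGE)
    return max(min_vms, min(max_vms, busy))


for rate in (0, 0.5, 2, 10):
    print(rate, vms_needed(rate))
# 0 1
# 0.5 2
# 2 6
# 10 10
```

At quiet times only the minimum number of VMs stays on, which is where the cost saving comes from. At the cap, any excess traffic is handled by the front end asking users to try again later.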
Here are some of our favourite results using photos obtained with permission from the New York Public Library and the National Archives of Singapore. We would like to note that our sources only provided us with the black and white photos and are not in any way responsible for the colourised output created by us.
Our model performs well on high resolution images that prominently feature human subjects (images where people occupy a large portion of the image) and natural scenery.
The following images look believable (at least to us) because they contain objects that appear often enough in the training image dataset. As a result, the model is able to identify the correct objects in the image and colour them believably.
Funky things happen when the model does not recognise objects in the photo.
Take for example the following image, “Japanese Surrender at Singapore”. The colouriser colours one (and only one) of the soldiers’ fists red, but gets the rest of the soldiers’ fists correct. This happens because, from the angle at which the photo was taken, the model is unable to tell that the clenched fist is actually a fist. And so the colouriser makes its best guess but does not quite get it right.
This happens again in the following image, “Minister of Finance Dr. Goh Keng Swee arrives at opening of Bata shoe factory in Telok Blangah”. The face of the man on the right of the photo is coloured a ghastly grey because half of the man’s face is hidden from view, and so the model is unable to identify the object in the photo.
This phenomenon is known as occlusion — one of the major challenges in computer vision, where object recognition algorithms have trouble identifying objects that are partially covered.
More cool results
Here are more cool results from the colouriser. Because, why not?
We would like to thank Richard Zhang and Jason Antic for their image colourisation research work that inspired this project. We would also like to thank the National Archives of Singapore, the New York Public Library, and the US Library of Congress for approving the use of archive photos in this article.