Source: Deep Learning on Medium
For all those who are not so familiar with computer vision, neural style transfer allows us to compose one image in the style of another. It is inspired by the paper "Image Style Transfer Using Convolutional Neural Networks" by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, which is a great read and worth checking out.
Ever wondered what would have happened if Leonardo da Vinci, Picasso, or Monet lived in our time, saw what we see every day, and made paintings, like photographs of our time, of the world around us?
A short summary of neural style transfer goes like this —
Neural style transfer is an optimization technique that takes three images — a content image (in our case, a photograph of the modern world), a style reference image (such as an artwork by a famous painter), and the input image you want to style — and blends them together so that the input image is transformed to look like the content image, but "painted" in the style of the style image.
I know it looks like magic, but believe me, it's science. The content image is of the Louvre museum in France, and the style image is Monet's painting called "Poppies".
Three images are passed as input: the content image, the style image, and a random noisy input image. Think of the random noise image as a canvas over which the neural network will paint. The output is a generated image produced after N steps of gradient descent, run in our case with the Adam (adaptive moment estimation) optimizer.
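The optimization idea above can be sketched with a toy example. This is not the real NST loss: the cost here is just the squared distance between the canvas and a target image, so plain gradient descent pulls the noise toward the target. In real NST the cost is the content plus style loss and the gradients come from backpropagation through the ConvNet.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.random((4, 4, 3))      # stand-in for what the real loss pulls toward
canvas = rng.random((4, 4, 3))      # the random-noise input image ("canvas")

learning_rate = 0.1
for step in range(500):             # N steps of gradient descent
    grad = 2 * (canvas - target)    # d/dcanvas of ||canvas - target||^2
    canvas -= learning_rate * grad  # plain gradient descent; Adam adds
                                    # per-parameter adaptive step sizes

print(np.abs(canvas - target).max())  # close to 0 after enough steps
```

The only difference in the real pipeline is what supplies `grad`: there, autodiff computes the gradient of the content + style cost with respect to the canvas pixels.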
We build an NST algorithm by following two steps:
- Using a pre-trained model
- Computing a cost function
Neural Style Transfer (NST) uses a previously trained convolutional network and builds on top of it, which is what the cool kids call transfer learning on a ConvNet.
Following the research paper https://arxiv.org/abs/1508.06576, we use VGG-19, the 19-layer version of the VGG network. This model has already been trained on the very large ImageNet database, and has thus learned to recognize a variety of low-level features (at the earlier layers) and high-level features (at the deeper layers). We will pick a middle layer, as it is the most suitable for our use case.
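A minimal sketch of loading VGG-19 and picking one intermediate layer as a feature extractor, using `tf.keras`. The choice of `block4_conv2` as the "middle" layer is an assumption for illustration, not something the paper prescribes.

```python
import tensorflow as tf

# weights="imagenet" would download the pretrained ImageNet weights;
# weights=None builds the same 19-layer architecture without them.
vgg = tf.keras.applications.VGG19(include_top=False, weights=None)
vgg.trainable = False  # the network is frozen; only the canvas image is optimized

content_layer = "block4_conv2"  # assumed middle-layer choice for the content cost
feature_model = tf.keras.Model(
    inputs=vgg.input,
    outputs=vgg.get_layer(content_layer).output,
)
```

Running an image through `feature_model` yields the nH×nW×nC activation tensor used by the cost functions below.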
As described in the paper, we will use the following cost function to calculate the loss and minimize it with Adam.
There are two independent costs that make up our total cost function:
- Jcontent(C,G) — the content cost between the content image and the generated image
- Jstyle(S,G) — the style cost between the style image and the generated image
Alpha and beta are hyperparameters whose values we have set to 10 and 40 respectively, giving the total cost J(G) = alpha * Jcontent(C,G) + beta * Jstyle(S,G). They drive the generated image toward either the content image or the style image, depending on their relative values.
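The weighted combination is trivial but worth seeing spelled out. In this sketch the two component costs are placeholder numbers; in the full pipeline they come from the content and style cost functions defined below.

```python
def total_cost(j_content, j_style, alpha=10, beta=40):
    """Total NST cost: J(G) = alpha * Jcontent(C, G) + beta * Jstyle(S, G).

    alpha/beta defaults match the article's choice of 10 and 40."""
    return alpha * j_content + beta * j_style

print(total_cost(0.5, 0.25))  # 10 * 0.5 + 40 * 0.25 = 15.0
```

Raising alpha relative to beta pushes the optimizer to preserve the content image; raising beta pushes it toward reproducing the style.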
For Jcontent(C,G) we have picked one particular hidden layer l to use. Now, set the image C as the input to the pretrained VGG network, and run forward propagation. Let a(C) be the hidden layer activations in the layer we have chosen. This will be an nH×nW×nC tensor. Repeat this process with the image G: set G as the input, and run forward propagation. Let a(G) be the corresponding hidden layer activations. We then define the content cost function as the normalized sum of squared differences, Jcontent(C,G) = (1 / (4 · nH · nW · nC)) · Σ (a(C) − a(G))².
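The content cost can be sketched directly in NumPy, assuming `a_C` and `a_G` are the (nH, nW, nC) activation tensors described above. The 1/(4·nH·nW·nC) normalization follows the formulation used here; the original paper uses a plain 1/2 factor instead.

```python
import numpy as np

def content_cost(a_C, a_G):
    """Jcontent(C, G) for one chosen hidden layer.

    a_C, a_G: (nH, nW, nC) activations of the content and generated images."""
    nH, nW, nC = a_C.shape
    return np.sum((a_C - a_G) ** 2) / (4 * nH * nW * nC)

a_C = np.ones((2, 2, 3))   # toy stand-ins for real VGG activations
a_G = np.zeros((2, 2, 3))
print(content_cost(a_C, a_G))  # 12 / (4 * 2 * 2 * 3) = 0.25
```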
For Jstyle(S,G) we first need to construct a Gram matrix. The Gram matrix G of a set of vectors (v1, …, vn) is the matrix of dot products, with entries Gij = vi · vj. In other words, Gij measures how similar vi is to vj: if they are highly similar, you would expect them to have a large dot product, and thus for Gij to be large.
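In NST the vectors being compared are the unrolled channels of a layer's activations: reshape the (nH, nW, nC) tensor into an (nC, nH·nW) matrix A whose rows are channels, and the Gram matrix is A·Aᵀ. A minimal sketch:

```python
import numpy as np

def gram_matrix(activations):
    """Gram matrix of a layer's activations.

    activations: (nH, nW, nC) tensor; returns an (nC, nC) matrix where
    entry [i, j] is the dot product of unrolled channels i and j."""
    nH, nW, nC = activations.shape
    A = activations.reshape(nH * nW, nC).T  # (nC, nH * nW): one row per channel
    return A @ A.T

a = np.arange(12, dtype=float).reshape(2, 2, 3)
G = gram_matrix(a)
print(G.shape)  # (3, 3): one entry per pair of channels
```

The diagonal entries measure how active each channel is on its own, while the off-diagonal entries capture which channels tend to fire together, which is what makes the Gram matrix a useful summary of "style".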
After generating the style matrix (Gram matrix), our goal is to minimize the distance between the Gram matrix of the "style" image S and that of the "generated" image G. For now, we are using only a single hidden layer a[l], and the corresponding style cost for this layer is defined as Jstyle[l](S,G) = (1 / (4 · nC² · (nH · nW)²)) · Σij (G(S)ij − G(G)ij)².
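Putting the pieces together, a single-layer style cost is the squared Frobenius distance between the two Gram matrices, normalized as above. A self-contained NumPy sketch, again assuming (nH, nW, nC) activation tensors:

```python
import numpy as np

def gram(acts):
    """Gram matrix of (nH, nW, nC) activations, unrolled channel-wise."""
    nH, nW, nC = acts.shape
    A = acts.reshape(nH * nW, nC).T  # (nC, nH * nW)
    return A @ A.T

def style_cost_layer(a_S, a_G):
    """Jstyle[l](S, G): normalized squared distance between Gram matrices."""
    nH, nW, nC = a_S.shape
    GS, GG = gram(a_S), gram(a_G)
    return np.sum((GS - GG) ** 2) / (4 * nC**2 * (nH * nW) ** 2)

a_S = np.ones((2, 2, 3))   # toy stand-ins for real VGG activations
a_G = np.zeros((2, 2, 3))
print(style_cost_layer(a_S, a_G))  # 144 / 576 = 0.25
```

In practice the paper combines this cost over several layers (with per-layer weights) so that both fine and coarse texture statistics of the style image are matched.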
I went ahead and played with multiple examples —