NST — Creating Artworks with the help of machine

Original article can be found here (source): Deep Learning on Medium


Generating new art by manipulating a base image to adopt the appearance of a different image

Many of us know how hard it is to create an outstanding piece of art; the effort and experience that go into creating one are commendable. While it seems like a patient, hard-to-acquire skill, you can actually create impressive artworks with just the help of a deep learning algorithm.

Neural Style Transfer (NST) is a deep learning algorithm that lets us create astounding artworks with no prior knowledge of art. The basic idea is to take a base image, which we call the Content Image, and manipulate its pixels to overlay another image, called the Style Image, to produce the desired art!

As deceptive as it seems, the method actually works. Take a look at the images created using this algorithm.

Left: Yellow Labrador Looking, from Wikimedia Commons; Middle: Kandinsky’s Painting; Right: Image output by the Model

The Content Image in this context is the one on the left, the one in the middle is the Style Image, and the one on the right is the Target Image; we'll use these terms throughout the read.

Before getting into the technicalities of the work, I will first try to establish the intuition behind it. Whenever we think of image editing, what comes to mind is different combinations of pixel values spanned across the three color channels, namely RGB. So think of NST as taking the Content Image and putting the Style Image on top of it as a new layer, like applying a filter over an image. The only difference is that the style is not applied directly to the pixels; instead, it is learned with the help of various image feature combinations.

If you're from the ML/DL community, you might know that there's no better tool than a CNN (Convolutional Neural Network) for extracting rich features from an image. Using a deep enough CNN, we can extract the high-level features we need for NST. For our task, we need the CNN to answer "what is in an image?" so that we can use that information to create the Target Image. As the layers of a CNN get deeper and deeper, they learn increasingly complex functions that can tell us what an image is actually made up of.

We feed our Content Image and Style Image to a CNN. Once it has extracted the features of each image, we compute the Total Loss function. This loss function is the weighted average of two losses, namely the Content Loss and the Style Loss. Intuitively, we iteratively update our output image in such a way that it minimizes the total loss, bringing the output as close as possible to the content of the Content Image and the style of the Style Image.

Content Loss: The content loss is defined as the L2 distance between the intermediate content representations, taken from higher layers of a pre-trained neural network, of the input image and the target image. Because higher layers produce feature maps that capture complex, high-level information about the input image, this is a suitable approximation for judging similarity in terms of content. The equation is shown below:

Equation for the Content Loss, defined as the squared-error loss between the two feature representations of the target and output images in a particular layer L
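As a reconstruction of the equation this caption describes (in the standard notation of Gatys et al., where F^l and P^l are the layer-l feature maps of the output and content images respectively):

```latex
\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l)
  = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}
```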

Style Loss: Similarly, we define the style loss as the L2 distance between the Gram matrices of the intermediate style representations of the style image (taken from lower layers of a pre-trained neural network) and of the output image. The lower layers capture simpler image features, which best encode the concept of style. By using the Gram matrix, we distribute and delocalize the spatial information in an image, approximating its style. Mathematically, it is just the matrix product of the vectorized feature-map matrix and its transpose.

Equation for the Gram Matrix. G is the inner product between the vectorized feature maps i and j in a particular layer L
Equation for the Style Loss, defined as the squared-error loss between the Gram matrices of the style reference and the output image, summed over a subset of particular layers.
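Reconstructing these two equations (again in the notation of Gatys et al.: G^l and A^l are the layer-l Gram matrices of the output and style images, N_l is the number of feature maps, M_l their spatial size, and w_l a per-layer weight):

```latex
G^{l}_{ij} = \sum_{k} F^{l}_{ik} \, F^{l}_{jk}

E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j}
        \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2},
\qquad
\mathcal{L}_{\text{style}} = \sum_{l} w_{l} \, E_{l}
```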

Finally, we combine these two losses into the Total Loss and iteratively optimize the target image to minimize it.


Now that we have grasped the main elements of the method, we'll move on to implementing NST with the help of TensorFlow.

First of all, we'll load all the packages we need for the implementation.
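A minimal import block for the steps that follow might look like this (the article's exact package list isn't shown; matplotlib and Pillow are assumptions for display and image I/O):

```python
# Core packages for the NST implementation
import numpy as np               # numeric arrays
import tensorflow as tf          # model, losses and optimisation
import matplotlib.pyplot as plt  # displaying images (assumed)
from PIL import Image            # image file handling (assumed)
```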

In the next step, you will write a helper function that loads the two images as arrays of numbers (because computers understand only numbers) and reshapes them to make them compatible with the model.

Now use the function created above to load our Content and Style Images.
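One way to sketch such a loader: read the file, convert it to a float tensor in [0, 1], resize so the longest side has a manageable dimension, and add a batch axis (the 512-pixel cap is an assumption):

```python
import tensorflow as tf

def load_img(path_to_img, max_dim=512):
    """Load an image file as a float32 tensor in [0, 1], resized so its
    longest side is max_dim, with a leading batch dimension."""
    img = tf.io.read_file(path_to_img)
    img = tf.image.decode_image(img, channels=3, expand_animations=False)
    img = tf.image.convert_image_dtype(img, tf.float32)  # uint8 -> [0, 1]
    shape = tf.cast(tf.shape(img)[:-1], tf.float32)      # (height, width)
    scale = max_dim / tf.reduce_max(shape)
    new_shape = tf.cast(shape * scale, tf.int32)
    img = tf.image.resize(img, new_shape)
    return img[tf.newaxis, :]                            # add batch axis
```

With this in place, `content_image = load_img("content.jpg")` and `style_image = load_img("style.jpg")` (file names hypothetical) load the two images.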

The images are reshaped. Now, you will load a pre-trained VGG19 model for extracting the features. As you will be using the model only for extracting features, you will not need the classifier part of the model.

We’ll take a look at the VGG19 model you just loaded and print the names of all the layers present in the network.
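These two steps might look like the following. Note that `tf.keras`'s VGG19 names its layers `blockN_convM` rather than the classic `convN_M` naming used below (so `conv4_2` appears as `block4_conv2`):

```python
import tensorflow as tf

# VGG19 pre-trained on ImageNet, without the classifier head
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False  # used purely as a fixed feature extractor

# Inspect the layer names so we can pick content and style layers
for layer in vgg.layers:
    print(layer.name)
```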

For the style features, however, you are interested in the following layers:

  • ‘conv1_1’
  • ‘conv2_1’
  • ‘conv3_1’
  • ‘conv4_1’
  • ‘conv5_1’

For content features, you will need conv4_2. You will store these in variables accordingly.
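In Keras naming, the selection above translates to these two lists (`convN_1` corresponds to `blockN_conv1`, and `conv4_2` to `block4_conv2`):

```python
# Layers whose activations we use for the style features
style_layers = ["block1_conv1", "block2_conv1", "block3_conv1",
                "block4_conv1", "block5_conv1"]
# Layer whose activations we use for the content features
content_layers = ["block4_conv2"]

num_style_layers = len(style_layers)
num_content_layers = len(content_layers)
```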

Now create a custom VGG model which will be composed of the specified layers. This will help you run forward passes on the images and extract the necessary features along the way.

Defining a gram matrix is super easy in TensorFlow, and you can do it in the following way:
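For example, using `einsum` to take inner products between channel feature vectors (normalising by the number of spatial locations is a common implementation choice, not the only one):

```python
import tensorflow as tf

def gram_matrix(input_tensor):
    """Gram matrix of a (batch, height, width, channels) feature map:
    inner products between channel feature vectors, averaged over locations."""
    result = tf.linalg.einsum("bijc,bijd->bcd", input_tensor, input_tensor)
    num_locations = tf.cast(tf.shape(input_tensor)[1] *
                            tf.shape(input_tensor)[2], tf.float32)
    return result / num_locations
```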

We will now define a custom model using the mini_model() function. This will be used for returning the content and style features from the respective images.

Now that we have defined the custom model let’s use it on the images to get the content and style features accordingly.
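Putting these two steps together, a self-contained sketch might look like this (the layer lists are repeated so the snippet runs on its own; `preprocess_input` converts images in [0, 1] to the input format VGG19 was trained on):

```python
import tensorflow as tf

style_layers = ["block1_conv1", "block2_conv1", "block3_conv1",
                "block4_conv1", "block5_conv1"]
content_layers = ["block4_conv2"]

# Equivalent to mini_model(style_layers + content_layers) from the step above
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False
extractor = tf.keras.Model(
    inputs=vgg.input,
    outputs=[vgg.get_layer(n).output for n in style_layers + content_layers])

def get_features(image):
    """Return (style feature maps, content feature maps) for an image in [0, 1]."""
    preprocessed = tf.keras.applications.vgg19.preprocess_input(image * 255.0)
    outputs = extractor(preprocessed)
    return outputs[:len(style_layers)], outputs[len(style_layers):]
```

The style features of the Style Image and the content features of the Content Image (`get_features(style_image)`, `get_features(content_image)`) then serve as the optimisation targets.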

We’ll load the optimizer function.
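Adam is a common choice here; the hyperparameter values below follow widely used NST tutorial settings and are assumptions, not necessarily the article's exact values:

```python
import tensorflow as tf

# Optimiser for updating the target image's pixels
opt = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
```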

Let’s now define the overall content and style weights and also the weights for each of the style representations as discussed earlier. Note that these are hyperparameters and are something you should play with.
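For instance (the numbers below are illustrative assumptions; this is exactly the knob worth experimenting with):

```python
# Overall trade-off between reproducing content and reproducing style
content_weight = 1e4
style_weight = 1e-2

# One weight per style layer; here the earlier, lower-level layers count more
style_layer_weights = [1.0, 0.8, 0.5, 0.3, 0.1]
```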

Now comes the most crucial part, which makes the process of neural style transfer a lot more fun — the loss function.
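One way to sketch the total loss over lists of feature maps (normalisation conventions vary between implementations; `gram_matrix` is repeated here so the snippet is self-contained):

```python
import tensorflow as tf

def gram_matrix(t):
    g = tf.linalg.einsum("bijc,bijd->bcd", t, t)
    return g / tf.cast(tf.shape(t)[1] * tf.shape(t)[2], tf.float32)

def total_loss(style_targets, content_targets, style_outputs, content_outputs,
               style_weight=1e-2, content_weight=1e4):
    """Weighted sum of the style loss (squared error between Gram matrices)
    and the content loss (squared error between raw feature maps)."""
    style_loss = tf.add_n([
        tf.reduce_mean(tf.square(gram_matrix(o) - gram_matrix(t)))
        for o, t in zip(style_outputs, style_targets)])
    style_loss *= style_weight / len(style_outputs)

    content_loss = tf.add_n([
        tf.reduce_mean(tf.square(o - t))
        for o, t in zip(content_outputs, content_targets)])
    content_loss *= content_weight / len(content_outputs)
    return style_loss + content_loss
```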

You will now write another function which will:

  • Calculate the gradients of the loss function you just defined.
  • Use these gradients to update the target image.

We'll set the target image to the content image at the beginning. And then we're ready to train our model.
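The last two steps can be sketched together. A placeholder objective stands in for the full content + style loss so the snippet runs on its own; note that the gradients update the image's pixels, not any network weights:

```python
import tensorflow as tf

content_image = tf.random.uniform((1, 32, 32, 3))  # stand-in for the loaded image

def compute_loss(image):
    # Placeholder objective; the real pipeline uses the content + style loss
    return tf.reduce_mean(tf.square(image - 0.5))

opt = tf.keras.optimizers.Adam(learning_rate=0.02)

@tf.function
def train_step(image):
    with tf.GradientTape() as tape:
        loss = compute_loss(image)
    grad = tape.gradient(loss, image)        # d(loss) / d(pixels)
    opt.apply_gradients([(grad, image)])     # update the image, not the network
    image.assign(tf.clip_by_value(image, 0.0, 1.0))  # keep pixel values valid
    return loss

# Start the target image as a copy of the content image, then optimise
target_image = tf.Variable(content_image)
for step in range(100):
    loss = train_step(target_image)
```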

Hurray! All our work here is done. In just a few minutes, you created a brand-new artwork of your own, and that too without any prior knowledge of art.

That's the power machine learning puts in your hands. Embrace it! Enhance it! And explore it! Give this article a clap if you liked it. Happy learning!