Neural Style Transfer using VGG model

Source: Deep Learning on Medium

A technique that transforms a digital image so that it adopts the style of a different image


Before we begin, let’s go to this website to get some inspiration. On the website, we choose a photo from the local computer (say, an image named Joey.jpg). Let’s call this the content image. Then we choose another image from the local computer, say style1.jpg, as the style image. The website produces a mixed image that preserves the contours of the content image and adds the texture and color pattern of the style image. The result is shown below.

Left: Original Image, Right: Style Image, Middle: Mixed Image


This is called Neural Style Transfer (NST), and it is done using Deep Learning, specifically a Convolutional Neural Network (CNN). I assume you are familiar with CNNs. If not, I highly recommend Andrew Ng’s course on CNNs.

Let us understand the basics of NST with the help of the following flowchart. It shows the style transfer algorithm running on a network with 13 convolutional layers (only a few are shown for simplicity). Two images are fed to the neural network: a content image and a style image. Our goal is to generate a mixed image that has the contours of the content image and the texture and color pattern of the style image. We do this by optimizing several loss functions.

The content loss measures the difference between the features activated by the content image and the features activated by the mixed image (which starts out as a noise image and gradually improves) at one or more layers. Minimizing this loss transfers the contours of the content image to the resulting mixed image.
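To make the content loss concrete, here is a minimal sketch. It uses NumPy instead of TensorFlow so that it runs on its own, and it assumes the loss is the mean squared error between two feature maps of the same shape. The function name `content_loss` and the random feature maps are illustrative stand-ins, not the code used later in this article.

```python
import numpy as np

def content_loss(mixed_features, content_features):
    """Mean squared error between two feature maps of the same shape.

    Assumption: each argument is the activation of one layer,
    shaped (height, width, channels).
    """
    mixed = np.asarray(mixed_features, dtype=np.float64)
    content = np.asarray(content_features, dtype=np.float64)
    return float(np.mean((mixed - content) ** 2))

# Stand-in feature maps; a real run would take them from a VGG layer.
rng = np.random.default_rng(0)
content_features = rng.normal(size=(4, 4, 8))
noise_features = rng.normal(size=(4, 4, 8))

print(content_loss(content_features, content_features))  # identical inputs give 0.0
print(content_loss(noise_features, content_features))    # a positive value
```

When several layers are used, one simple option is to average this value over the chosen layers.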

The style loss, in contrast, measures the difference between the so-called Gram matrices of the style image and the mixed image, again at one or more layers. The Gram matrix identifies which features are activated simultaneously at a given layer. Minimizing the style loss makes the mixed image mimic those activation patterns.
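The Gram matrix idea can be sketched in a few lines. This is a NumPy illustration under stated assumptions, not this article's TensorFlow code: the feature map is assumed to have shape (height, width, channels), and the style loss is assumed to be the mean squared difference between two Gram matrices. The names `gram_matrix` and `style_loss` are hypothetical.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel inner products of a (height, width, channels) map.

    Entry (i, j) is large when channels i and j tend to be active
    at the same spatial positions.
    """
    f = np.asarray(features, dtype=np.float64)
    channels = f.shape[-1]
    flat = f.reshape(-1, channels)   # one row per spatial position
    return flat.T @ flat             # shape: (channels, channels)

def style_loss(mixed_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    diff = gram_matrix(mixed_features) - gram_matrix(style_features)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
style_features = rng.normal(size=(4, 4, 3))

# Shuffling the spatial positions leaves the Gram matrix unchanged,
# which is why it captures texture rather than layout.
shuffled = rng.permutation(style_features.reshape(-1, 3)).reshape(4, 4, 3)
print(np.allclose(gram_matrix(style_features), gram_matrix(shuffled)))
```

Because the spatial positions are summed out, the style loss does not care where a pattern occurs in the image, only which features occur together.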

Using TensorFlow, we compute the gradient of the combined content and style loss with respect to the mixed image, and we update the mixed image repeatedly until the result is satisfactory. The implementation also involves a few additional details: calculating the Gram matrices, storing intermediate values for efficiency, a loss function for denoising the image, and normalizing the combined loss function so that its terms are scaled relative to each other.
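The denoising loss mentioned above is not spelled out here. One common choice, assumed in this sketch, is a total variation loss: the sum of absolute differences between neighboring pixels, which is small for smooth images and large for noisy ones. The function name `denoise_loss` is illustrative, and NumPy is used in place of TensorFlow to keep the example self-contained.

```python
import numpy as np

def denoise_loss(image):
    """Total variation of a (height, width, channels) image.

    Sums the absolute differences between each pixel and its
    neighbor below and its neighbor to the right.
    """
    img = np.asarray(image, dtype=np.float64)
    vertical = np.abs(img[1:, :, :] - img[:-1, :, :]).sum()
    horizontal = np.abs(img[:, 1:, :] - img[:, :-1, :]).sum()
    return float(vertical + horizontal)

flat_image = np.full((8, 8, 3), 0.5)                     # perfectly smooth
noisy_image = np.random.default_rng(0).random((8, 8, 3)) # random noise

print(denoise_loss(flat_image))   # a constant image has zero variation: 0.0
print(denoise_loss(noisy_image))  # noise gives a much larger value
```

Adding a small multiple of this value to the combined loss discourages the optimizer from leaving pixel-level noise in the mixed image.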

Coding:

Now that we have understood the algorithm, let us begin coding. The original paper uses the VGG-19 model, but here we are going to use the VGG-16 model, which is publicly available. Download the VGG-16 model from here (please note that it is a ~550 MB file).

In the root directory, create a new folder named vgg16 and place the downloaded file in it, along with the file from the GitHub link. Note that we have modified that file by commenting out the maybe_download function, since you have already downloaded the vgg16.tfmodel file.

Let’s import the libraries first. Then import the vgg16 model.