Neural Style Transfer with a Deep VGG Model


In this article, style transfer uses the features found in the 19-layer VGG network (VGG19), which is composed of a series of convolutional and pooling layers followed by a few fully-connected layers. In the image below, the convolutional layers are named by stack and by their order within that stack: conv1_1 is the first convolutional layer that an image is passed through, in the first stack; conv2_1 is the first convolutional layer in the second stack. The deepest convolutional layer in the network is conv5_4.

Figure 1. VGG19 architecture

Separating Style and Content

Style transfer relies on separating the content and style of an image. Given one content image and one style image, we aim to create a new target image, which should contain our desired content and style components:

  • objects and their arrangement are similar to that of the content image
  • style, colours, and textures are similar to that of the style image

In this notebook, we'll use a pre-trained VGG19 Net to extract content or style features from a passed-in image.

Load features in VGG19

VGG19 (Figure 1.) is split into two portions:

  • vgg19.features, which are all the convolutional and pooling layers
  • vgg19.classifier, which are the three linear classifier layers at the end

We only need the features portion; we'll load it in and "freeze" its weights, as in the sketch below.
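A minimal sketch of what that can look like with torchvision (the exact setup here is one reasonable choice, not the only way to do it):

import torch
from torchvision import models

# load only the "features" portion of VGG19 (we don't need the classifier)
vgg = models.vgg19(pretrained=True).features

# freeze all VGG parameters, since we're only optimizing the target image
for param in vgg.parameters():
    param.requires_grad_(False)

# move the model to the GPU, if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vgg.to(device)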

Load in Content and Style Images

You can load in any images you want! Below, we’ve provided a helper function for loading in any type and size of image. The load_image function also converts images to normalized Tensors.

Additionally, it will be easier to have smaller images and to squish the content and style images so that they are of the same size.
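A sketch of such a helper, assuming PIL and torchvision.transforms are available; load_image, the max_size default, and the file paths in the usage lines are this article's own choices (the normalization constants are the standard ImageNet statistics):

from PIL import Image
from torchvision import transforms

def load_image(img_path, max_size=400, shape=None):
    ''' Load an image, shrink large images so they are faster to work with,
        and convert the result to a normalized Tensor. '''
    image = Image.open(img_path).convert('RGB')

    # keep images reasonably small for faster processing
    size = min(max(image.size), max_size)
    if shape is not None:
        size = shape

    in_transform = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406),
                             (0.229, 0.224, 0.225))])

    # discard any transparent alpha channel and add the batch dimension
    image = in_transform(image)[:3, :, :].unsqueeze(0)
    return image

# load the content and style images (the file paths are placeholders)
content = load_image('images/content.jpg').to(device)
# squish the style image to the content image's size, which simplifies things later
style = load_image('images/style.jpg', shape=content.shape[-2:]).to(device)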

VGG19 Layers

To get the content and style representations of an image, we have to pass an image forward through the VGG19 network until we get to the desired layer(s) and then get the output from that layer.
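One way to do this is to walk through the modules of vgg19.features and collect the activations at the layers we care about. The numeric keys below are the indices of those conv layers inside torchvision's VGG19; get_features is our own helper, not a library function:

def get_features(image, model, layers=None):
    ''' Run an image forward through a model and get the features
        for a set of layers. '''
    if layers is None:
        layers = {'0': 'conv1_1',
                  '5': 'conv2_1',
                  '10': 'conv3_1',
                  '19': 'conv4_1',
                  '21': 'conv4_2',  # content representation
                  '28': 'conv5_1'}

    features = {}
    x = image
    # pass the image through each layer, saving the outputs we need
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x
    return features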

Gram Matrix

The output of every convolutional layer is a Tensor with dimensions corresponding to the batch_size, a depth d, and some height and width (h, w). The Gram matrix of a convolutional layer can be calculated as follows:

  • Get the depth, height, and width of a tensor using batch_size, d, h, w = tensor.size()
  • Reshape that tensor so that the spatial dimensions are flattened
  • Calculate the gram matrix by multiplying the reshaped tensor by its transpose (see the sketch after this list)
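Here is a sketch of those three steps as a small helper function:

def gram_matrix(tensor):
    ''' Calculate the Gram matrix of a given tensor. '''
    # get the batch_size, depth, height, and width of the tensor
    batch_size, d, h, w = tensor.size()

    # reshape so we're multiplying the features for each channel
    tensor = tensor.view(d, h * w)

    # calculate the gram matrix by multiplying the tensor by its transpose
    gram = torch.mm(tensor, tensor.t())
    return gram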

Putting it all Together

Now that we've written functions for extracting features and computing the gram matrix of a given convolutional layer, let's put all these pieces together! We'll extract the features from our images and calculate the gram matrices for each layer in our style representation.
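Roughly, using the helpers sketched above, that looks like this:

# get content and style features only once, before forming the target image
content_features = get_features(content, vgg)
style_features = get_features(style, vgg)

# calculate the gram matrices for each layer of our style representation
style_grams = {layer: gram_matrix(style_features[layer]) for layer in style_features}

# create a third "target" image and prep it for change;
# starting from a copy of the content image is a reasonable choice
target = content.clone().requires_grad_(True).to(device)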

Loss and Weights

Individual Layer Style Weights

Below, you are given the option to weight the style representation at each relevant layer. It's suggested that you use weights in the range 0–1 for these layers. By weighting earlier layers (conv1_1 and conv2_1) more, you can expect to get larger style artifacts in your resulting target image. Should you choose to weight later layers, you'll get more emphasis on smaller features. This is because each layer is a different size and together they create a multi-scale style representation!

Content and Style Weight

Just like in the paper, we define an alpha (content_weight) and a beta (style_weight). This ratio will affect how stylized your final image is. It’s recommended that you leave the content_weight = 1 and set the style_weight to achieve the ratio you want.

# weights for each style layer
# weighting earlier layers more will result in *larger* style artifacts
# notice we are excluding `conv4_2`, our content representation
style_weights = {'conv1_1': 1.,
                 'conv2_1': 0.8,
                 'conv3_1': 0.5,
                 'conv4_1': 0.3,
                 'conv5_1': 0.1}

content_weight = 1 # alpha
style_weight = 1e6 # beta

Updating the Target & Calculating Losses

You'll decide on a number of steps for which to update your image; this is similar to the training loop you've seen before, except that here we are changing only the target image and nothing else about VGG19 or any other image. Therefore, the number of steps is really up to you to set! I recommend using at least 2000 steps for good results, but you may want to start out with fewer steps if you are just testing out different weight values or experimenting with different images.

Inside the iteration loop, you’ll calculate the content and style losses and update your target image, accordingly.
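A sketch of the setup, using the Adam optimizer with a learning rate of 0.003 (both are assumptions, not requirements); note that we hand the optimizer the target image itself, not any network weights:

from torch import optim

# we are optimizing the pixels of the target image, not the VGG weights
optimizer = optim.Adam([target], lr=0.003)
steps = 2000  # how many iterations to update your image for

for ii in range(1, steps + 1):
    # get the features of the current target image
    target_features = get_features(target, vgg)
    # ... the content, style, and total losses described below are computed here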

Content Loss

The content loss will be the mean squared difference between the target and content features at layer conv4_2. This can be calculated as follows:

content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2'])**2)

Style Loss

The style loss is calculated in a similar way, only you have to iterate through a number of layers, specified by name in our dictionary style_weights.

You'll calculate the gram matrix for the target image, target_gram, and for the style image, style_gram, at each of these layers, then compare those gram matrices to get the layer_style_loss. Below, you'll see that this value is normalized by the size of the layer.
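Inside the iteration loop, after computing target_features, the style loss might be accumulated like this:

style_loss = 0
# add up the weighted style loss from each style layer
for layer in style_weights:
    # gram matrix of the target image's features at this layer
    target_feature = target_features[layer]
    target_gram = gram_matrix(target_feature)
    _, d, h, w = target_feature.shape
    # the style image's gram matrix for this layer was computed above
    style_gram = style_grams[layer]
    # weighted mean squared difference between the two gram matrices
    layer_style_loss = style_weights[layer] * torch.mean((target_gram - style_gram)**2)
    # normalize by the size of the layer
    style_loss += layer_style_loss / (d * h * w)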

Total Loss

Finally, you’ll create the total loss by adding up the style and content losses and weighting them with your specified alpha and beta!

Intermittently, we’ll print out this loss; don’t be alarmed if the loss is very large. It takes some time for an image’s style to change and you should focus on the appearance of your target image rather than any loss value. Still, you should see that this loss decreases over some number of iterations.
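The last few lines inside the loop might look like this (printing every 400 steps is an arbitrary choice):

# total loss, weighted by alpha and beta
total_loss = content_weight * content_loss + style_weight * style_loss

# update the target image
optimizer.zero_grad()
total_loss.backward()
optimizer.step()

# intermittently print out the loss
if ii % 400 == 0:
    print('Total loss: ', total_loss.item())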

Final output

You can find the complete code in my GitHub repository.