Source: Deep Learning on Medium
Neural Artistic Style Transfer Explained
I’ve been wanting to get an official oil painting done for my cat. Unfortunately, artists are expensive (plus they don’t accept ‘exposure’ as payment).
So I figured out how to make a computer paint for me.
Using PyTorch, I implemented Neural Artistic Style Transfer, a technique that uses deep learning to transfer the artistic style of a painting, or any other image, onto my cat.
In this article, I’ll cover how this technique works, with code samples so you can implement it on your own. You can view my code in full on Google Colab.
What is Artistic Style?
What do you think artistic style is?
Unfortunately, this article isn’t interactive, so I’ll just use my own definition. Artistic style consists of features like colour and texture. A painting of a mountain vs. a painting of a person will have very different content, but if they’re made by the same artist they can share the same style.
If we want to transfer style, we need some way to measure it. This can be done with a pretrained Convolutional Neural Network (CNN).
CNNs have many layers, each with many filters. Each filter learns a spatial pattern in images. For example: filters in early layers might learn simple features like detecting edges or specific colours, while filters in later layers can build off of their predecessors and find more complex features like shapes and faces. If a specific filter detects a feature in an image, we say that filter is activated.
Artistic style is how higher-level features such as shapes are represented on the pixel level.
Artistic style is about how you paint, not what you paint.
So how do we use CNNs to measure style?
You might be tempted to use the activations of filters in earlier layers, since these capture pixel patterns. Unfortunately, lower-level activations capture style (pixel patterns) and content, and there’s no way to separate the two.
2 Tools that Neural Artistic Style Transfer Uses to Capture Style
- The Gram matrix
- Correlations between filter activations in multiple layers
The Gram Matrix captures non-localized information about an image.
Correlations between filters are useful because when filters across multiple layers of a CNN are activated together, they’re capturing something general about the image: its style.
The Math Behind Artistic Style Transfer
Here’s the math behind these two techniques, and how I implemented them in PyTorch.
Style and the Gram Matrix
The Gram matrix captures non-localized information about an image. Non-localized information doesn’t care about where in an image something occurs. Non-localization is good when it comes to capturing style because we only care if a stylistic feature is present in an image, not where in an image that stylistic feature occurs. A Gram matrix results from multiplying a matrix with the transpose of itself.
Here’s how I implemented the Gram matrix in PyTorch:
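A minimal version looks something like this (a sketch, assuming a single image so the batch dimension can be folded away):

```python
import torch

def gram_matrix(tensor):
    # tensor: feature maps from one CNN layer, shape (batch, channels, height, width)
    _, c, h, w = tensor.size()
    # Flatten the spatial dimensions so each row holds one filter's activations
    features = tensor.view(c, h * w)
    # Multiply the matrix by its own transpose to get filter-to-filter correlations
    gram = torch.mm(features, features.t())
    return gram
```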
Because the matrix is multiplied by its own transpose, every filter’s activations are compared with every other filter’s, and the spatial information contained in the original matrix is summed away, leaving only the correlations between filters.
The Gram matrix is a key component in computing style loss.
Style loss is how different an image is stylistically from another image. To perform style transfer we’d like to minimize the style loss between our target image (image being modified) and the style image.
Here’s the equation for style loss:
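Writing $G^l$ for the Gram matrix of the target image’s activations at layer $l$, and $A^l$ for the Gram matrix of the style image’s, the loss can be written as (a simplified form of Gatys et al.’s formulation, with per-layer normalization constants folded into the weight):

```latex
L_{style} = \beta \sum_{l} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2}
```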
To get the style loss for an image, we measure the squared error between the Gram matrices (G) of the style representations of the target image (left term) and the style image (right term), across a list of predetermined layers (l) in a pretrained CNN. The result is scaled by beta, known as the style weight.
Over time this style loss is minimized to transfer the style of one image to another. When style loss is low, then style has been successfully transferred!
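In PyTorch, that style loss might look like the following sketch. The layer names, the per-layer normalization, and the default style weight are my assumptions; implementations vary:

```python
import torch

def gram(features):
    # features: (batch, channels, height, width) activations from one layer
    _, c, h, w = features.size()
    f = features.view(c, h * w)
    return torch.mm(f, f.t())

def style_loss(target_features, style_grams, style_weight=1e6):
    # target_features: dict mapping layer name -> target image activations
    # style_grams: dict mapping layer name -> precomputed Gram matrix of the style image
    loss = 0.0
    for layer, feat in target_features.items():
        _, c, h, w = feat.size()
        target_gram = gram(feat)
        # Mean squared error between the two Gram matrices, normalized by layer size
        loss += torch.mean((target_gram - style_grams[layer]) ** 2) / (c * h * w)
    return style_weight * loss
```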
Content Representation and Loss
When we transfer the style of a painting onto an image (e.g. my cat) we need to make sure that the image still has a cat in it. To do this we try to keep the content loss low. Content representations (and loss) don’t care about individual pixel values since these will be changed by the style representation later. They just care about the high-level features, such as if an ear is present in the image, and if so where.
Here’s the equation for content loss:
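With $F^l$ the activations of the target image at layer $l$ and $P^l$ those of the content image, the content loss can be written as (again a simplified form, matching the description below):

```latex
L_{content} = \alpha \sum_{l} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}
```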
To get the content loss for an image, we pass the target image and the content image through an image classification network (e.g. VGG-19) and measure the squared error between their activations in a list of predetermined layers (l), scaled by alpha, the content weight.
And here’s Content Loss in PyTorch:
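Something along these lines (a sketch; the layer choice and default weight are assumptions):

```python
import torch

def content_loss(target_features, content_features, content_weight=1.0):
    # Mean squared error between the activations of the target image and the
    # content image at a chosen layer (Gatys et al. use 'conv4_2' of VGG-19)
    return content_weight * torch.mean((target_features - content_features) ** 2)
```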
In style transfer, only the filter activations of the network are used. The network never changes what it thinks an ear looks like; in fact, it doesn’t change at all, and no weights are updated. Because of this, we can use a pretrained CNN without modifying it.
In my implementation of style transfer I chose to use VGG-19, a CNN with 19 layers that is very good at detecting features in images.
Here’s how you import a pretrained VGG, and freeze its weights, in PyTorch
As detailed above, the activations in VGG-19 are used to measure the content and style loss for an image.
Putting It All Together: Style Transfer
Content + Style = Pretty Picture
We now know how to compute the content and style loss for an image. How do we use this information to create new stylized images?
It’s pretty simple actually:
- Combine the two loss functions to get the total loss
- Backpropagate the total loss to the target image’s pixels, and follow the gradient to make the image more like the style image (rinse and repeat)
Here’s how I implemented the training loop in PyTorch:
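A minimal sketch of that loop, assuming the two loss computations are passed in as callables (hypothetical helpers that close over the VGG features; the optimizer, learning rate, and weights are choices, not fixed by the method):

```python
import torch
import torch.optim as optim

def run_style_transfer(target, compute_content_loss, compute_style_loss,
                       steps=2000, lr=0.003,
                       content_weight=1.0, style_weight=1e6):
    # target: the image being modified, a tensor with requires_grad=True.
    # The optimizer updates the image's pixels, not any network weights.
    optimizer = optim.Adam([target], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        total_loss = (content_weight * compute_content_loss(target)
                      + style_weight * compute_style_loss(target))
        # Gradients flow back to the target image's pixels
        total_loss.backward()
        optimizer.step()
    return target.detach()
```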
Style Transfer in Action
Here’s another example of artistic style transfer in action. This algorithm is amazing at producing pieces of artwork in only seconds.
Style Transfer in the Wild
This version of style transfer is the original method detailed by Gatys et al. (2015). Their algorithm is simple and good for artistic style transfer, but doesn’t give photo-realistic results.
In early 2018, researchers at NVIDIA created a much more powerful version of style transfer that creates photo-realistic results.
This is INSANE. Imagine you took a photo on an overcast day, but you want to make it look more atmospheric, as if it were taken at night. NVIDIA’s technique can do that automatically and, as far as I can see, perfectly.
- Using Neural Artistic Style Transfer, an image can be transformed to match the artistic style of another image
- Style can be captured by observing the correlations between filter activations in multiple layers of an image classification network
- Style transfer uses a pretrained image classification network, and doesn’t train the network in any way
- New research from NVIDIA allows for photo-realistic style transfer