Style Reversion, with Fast.ai tools
Note: This post assumes some familiarity with the basics of Neural Style Transfer. If you’re new to the topic, it will help to get a grasp of it before going on. I recommend this post by a former student of the Fast.ai course Cutting Edge Deep Learning for Coders: https://medium.com/@shivamgoel1791/everything-you-need-to-know-about-neural-style-transfer-994530cc9a6e
A successful idea
When I first discovered Neural Style Transfer I was fascinated but also a bit uncomfortable.
Such a powerful tool built on such a simple idea. In many cases the results were so original that they could be called art in their own right. It was also popular: lots of people were using their smartphones to turn their favorite selfies into Monet-like portraits. The topic became so successful that it reached a certain saturation point.
Why did the thing work so well? Nobody seemed to know.
But there was one claim often made about the algorithm that I knew for sure was not true. Most explanations would sooner or later include a sentence like “This is like having Monet painting for you.” And that claim, by design, is not true, and I can tell you why:
The reason: the STYLE of most works of art is CONTENT DEPENDENT. That “Starry Night” style made sense in that moment, for that landscape, but Van Gogh would never use it to portray your cat on a Sunday morning. He would choose a proper style, but we just cannot know which… or can we?
This implies that classical Style Transfer is a cool way of transferring the style of a single painting, not of an author.
So… what could we do to “learn” how a painter did what he did? We need the content. The original one. And we need to observe the options taken, the style chosen to abstract that content. After that, provided we have enough examples, a CNN could learn the patterns that make one author different from another.
All right… it’s almost certainly impossible. In my opinion the unsolvable part is “enough examples”. But what about step one, recovering original models? It was tempting to try.
How to recover original content or at least something close to it? My idea was to use the same approach of Style Transfer, just the other way around. Reverting style by defining content as the painting and style as a photo. I needed a proof of concept.
Of all the implementations of Style Transfer, the one I was most familiar with came from the Fast.ai Part 2 course, which I had attended as an international fellow months earlier, so I used that. As the content picture I chose a famous self-portrait of Van Gogh.
What about the style photo? It could be any portrait photo, as long as it shared general physical properties with Van Gogh, that is, red hair and blue-green eyes. It also had to be copyright-free. And, for the sake of clarity, as different as possible in content.
A quick search on the web gave me a couple of candidates, both of them women. That was OK even though Vincent was a man because style, I thought, was captured in a location-independent way. (This intuition is not completely true, as I was about to discover, but using the picture of a woman for a man’s style did work fine.)
It should have been a question of hours to make it work. The only parameters to tune were the levels of the CNN at which content and style were extracted, and the content loss vs. style loss weight.
For some reason, it didn’t work at first. Here is a sample of the kind of result I got initially with different parameters:
Quite horrible. But still, in almost all of the pictures there was a part that looked like the “real Vincent”. An eye, the nose… what was going on here? It took me a while to find out.
“Partial locality” of Style Gram Matrices
I finally realized what was going on. And I did so by thinking about how the layer-wise Gram matrix of filters actually keeps spatial information about the style image.
(If you don’t care about what Gram matrices do in NST you can skip the next paragraph, but in case you want to try style reversion on your own you will have to understand this, because it turned out to be critically important for getting a minimally viable proof of concept.)
Realize that for each convolutional layer, a Gram matrix captures the “fingerprint” of each filter’s activations over the image, both individually and in correlation with every other filter. In my specific example this implies that if, for example, we take only half of the woman’s photo, or the proportion of hair vs. skin changes… the style also changes. There is a spatial component in style. This insight made me inspect the style photo, trying to find out what strong spatial component was damaging the style reversion and creating all those dark parts.
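To make this concrete, here is a minimal sketch of the Gram matrix computation as it typically appears in NST implementations (I use NumPy for illustration; the Fast.ai notebook does the equivalent with PyTorch tensors, and the shapes here are assumptions):

```python
import numpy as np

def gram_matrix(feats):
    # feats: activations from one conv layer, shape (channels, h, w)
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)   # flatten the spatial dimensions
    # entry (i, j) sums filter_i * filter_j over all positions: exact
    # locations vanish, but global proportions (how much "dark area"
    # there is, hair vs. skin ratio) still leak into the statistics
    g = f @ f.T                   # (channels, channels)
    return g / (c * h * w)        # normalize by layer size
```

Note that the spatial sum is exactly why style is only “partially local”: positions are discarded, but the overall mass of each texture in the image is not.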
The most promising suspect was contrast. If style is not independent of the existence of, say, a big dark zone in the photo, then the algorithm will try to transfer that global contrast quality at any price, creating big darker zones. Just an intuition, but easy to test. I made a simple contrast reduction of the style photo, like this:
Reproducing it here because this modification was critical for the approach to work. The first image would never work, no matter the settings. The contrast-lowered image did work. There are more accurate ways of balancing the contrast of source and target, but in this case just lowering it worked well enough.
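Any image editor can do this, but the operation itself can be sketched in a few lines as blending every pixel toward the image mean (the 0.6 factor is an illustrative assumption, not the exact amount I used):

```python
import numpy as np

def lower_contrast(img, factor=0.6):
    # img: float array with values in [0, 1]
    # factor < 1.0 reduces contrast; factor = 1.0 is a no-op
    mean = img.mean()
    # pull every pixel toward the global mean, shrinking dark/bright extremes
    return mean + factor * (img - mean)
```

With Pillow, `ImageEnhance.Contrast(img).enhance(factor)` does the same thing on an image file.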
After that I reran exactly the same code. You can see the result below:
Much better, isn’t it? I would say the “style-reverted” image looks quite good; some steps have been taken towards a “realistic” face of Vincent. Surely it can be improved in many ways, but I consider it enough for a proof of concept.
The “recipe” and some take home ideas
The general approach should work with any “classic” implementation of Style Transfer. You can find the Fast.ai notebook I used on GitHub here: https://github.com/fastai/fastai/blob/master/courses/dl2/style-transfer.ipynb
Exact parameters will depend on the implementation and CNN architecture, but the main tricks to make it work were:
– Unless working with really big sizes, choose a content layer lower than in the normal approach. Realize that the higher you cut, the more pooling has happened and the more distortion will occur.
– Choose a compatible “style” photo. And remember that compatibility includes not only theme but also scale/masses and contrast, because the Gram matrix of filters will be sensitive to that.
– Weight style loss more than content loss; the optimal weight here was 10x the content loss (try different proportions).
– I used size = 432, the max size my laptop GPU could handle. Results were worse with smaller sizes; it is quite possible that a bigger size and proper tuning would give even better results.
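The weighting trick in the list above can be sketched as follows. This is my own illustrative code, not the notebook’s (function and variable names are assumptions); the 10x style weight is the value that worked for me:

```python
import numpy as np

def combined_loss(content_feat, target_content, grams, target_grams,
                  style_weight=10.0, content_weight=1.0):
    # content loss: mean squared error at a single (low) content layer
    c_loss = np.mean((content_feat - target_content) ** 2)
    # style loss: MSE between Gram matrices, summed over several layers
    s_loss = sum(np.mean((g - t) ** 2) for g, t in zip(grams, target_grams))
    # style weighted ~10x the content loss, as found above
    return content_weight * c_loss + style_weight * s_loss
```

In the actual optimization, the generated image’s pixels are updated by gradient descent to minimize this combined value.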
It can be argued that picking a photo to revert the style of a painting is just as arbitrary as doing the opposite. True, but there is one difference. There is much more variation in style within one artist’s works, and among artists, than there is variation in, for example, skin texture among physically similar people or plants of the same species. This makes it much easier to pick “sensible” candidates for style photos, as in the example above.
If only we had enough examples (but we don’t!), we could use that material with DL to abstract at least some of the choices individual artists made when creating.
I have presented the simple but appealing idea of using the Style Transfer algorithm to revert the style of a painting and approximate a more realistic image of its original content. I have also hinted at the possibility of using this artificial original to feed a neural net and abstract an author’s style, even though the lack of enough examples almost certainly makes that approach unfeasible.
I am so glad I found the time to put these intuitions into words. I hope it was of some interest. Maybe someone with the time and curiosity to explore further will achieve more interesting results with other themes and authors.
Non-DL bonus: For those interested in art who would like to understand why Van Gogh painted the way he did, there is a book in which he will tell you about it himself. His “Letters to Theo” (his dear brother, who supported him all his life) contain many hints of what he wanted to achieve: the search for a style more real than reality. Definitely worth reading.
Source: Deep Learning on Medium