Neural Networks — The 21st Century Artist

Original article was published by Varatharajah Vaseekaran on Artificial Intelligence on Medium


Creative art is one of the qualities that set humans apart from other species. From prehistoric cave paintings, to the sculptures and papyrus art of the Roman and Egyptian eras, to the lifelike portraits of the Renaissance, to the innovative work of modern artists such as Monet and Picasso, humanity has rarely failed to showcase its creative skill in any period. Yet even though creativity is a defining human trait, many people struggle to turn their vision and imagination into art.

This has held many people back for ages, but Neural Networks now lower that barrier considerably. This article focuses on Neural Style Transfer, a technique that applies the style of one image to another.

Different types of art ranging from the dawn of mankind to the current era.
Structural diagram of a Neural Network

Neural Networks have redefined how computers process visual information, from basic tasks such as telling cats from dogs to complex problems such as perception for self-driving cars and the analysis of massive satellite images. Deep Learning has revolutionised computer vision by giving computers the ability to extract and predict meaningful information from what they see. Deep Learning and Neural Networks are not restricted to solving real-life problems; they have also made a huge impact on recreational applications such as generating new music, art, and deep fakes and, as covered in this article, styling images after famous paintings.

Imagine that your display picture is bland and dull, and you want to spruce it up: voilà, Neural Networks come to the rescue. Your drab picture can be styled to look as if it were a painting by Picasso himself, thanks to advances in Deep Learning research.

You have a shaky picture of a sunset by the sea that doesn’t look good enough to post on Instagram: how about styling it after Van Gogh’s ‘Starry Night’? The picture below, developed by our team at Rootcode AI, gives a general overview of how Neural Style Transfer works.

An overview of how Neural Style Transfer works

The Algorithm

Rather than giving a comprehensive explanation with the full mathematical theory, this article gives a high-level overview of how the algorithm works. In the simplest terms, Neural Style Transfer takes two inputs: a content image (the main image that needs to be styled) and a style image (the image that provides the style). The Neural Network then generates a third image that has the content of the content image and the style of the style image.

In order to implement Neural Style Transfer, a VGG model was used (VGG stands for Visual Geometry Group, the University of Oxford research group that developed it). VGG models are large convolutional Neural Networks that take weeks to train on high-end GPUs; thankfully, pre-trained VGG models can be obtained easily from Keras. These networks were trained to classify images from the ImageNet database, which contains millions of labelled images (the standard pre-trained weights distinguish 1,000 classes). In Neural Style Transfer, our main objective is to style the content image, not to classify it; therefore, the dense layers and the softmax layer are removed from the VGG network, and the remaining network, which consists of only the convolutional and pooling layers, is used.
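As a rough illustration of the step above, the sketch below loads VGG16 from Keras without its dense and softmax layers and exposes the outputs of a few convolutional layers. The helper name `build_feature_extractor` is hypothetical, and the choice of VGG16 and of these particular layers is an assumption for illustration, not necessarily what was used for the images in this article.

```python
import tensorflow as tf

# Illustrative layer choices (assumptions); the names follow Keras' VGG16 naming.
CONTENT_LAYER = "block5_conv2"
STYLE_LAYERS = ["block1_conv1", "block2_conv1", "block3_conv1", "block4_conv1"]


def build_feature_extractor(layer_names, weights="imagenet"):
    """Hypothetical helper: return a model mapping an image batch to the
    outputs of the requested VGG16 layers.

    include_top=False drops the dense and softmax layers, so only the
    convolutional and pooling layers remain.
    """
    vgg = tf.keras.applications.VGG16(include_top=False, weights=weights)
    vgg.trainable = False  # the pre-trained weights are never updated
    outputs = [vgg.get_layer(name).output for name in layer_names]
    return tf.keras.Model(inputs=vgg.input, outputs=outputs)
```

Calling `build_feature_extractor(STYLE_LAYERS + [CONTENT_LAYER])` downloads the ImageNet weights on first use and returns a model that yields one feature map per requested layer.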

Image of a VGG 16 Network (Source: towardsdatascience)

Both the content image and the style image are scaled before being fed to the VGG network. First, the content image is passed through the VGG model. Convolutional layers behave differently from ordinary dense layers: they look for specific features in an image. The early layers identify basic features such as horizontal, vertical and diagonal edges, and the later convolutional layers build on the features recognized by the early layers to detect more complex features and their spatial arrangement. For the content image, the output is taken from one of the later convolutional layers, because we want to capture the complex features of the content image. These spatial features matter because the final generated image should resemble the content image.

When it comes to the style image, the final generated image should not look exactly like the style image; only its general style should be captured. To achieve that, the style image is also passed through the model, and the outputs are taken at several different convolutional layers (e.g. the 5th, 6th, 9th and 11th layers). A Gram Matrix is then computed from each of these outputs. The Gram Matrix records which features tend to occur together while discarding where in the image they occur, so it captures the general style of the style image and suppresses its spatial layout.
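To make the Gram Matrix a little more concrete, here is a minimal NumPy sketch. It assumes a single feature map of shape (height, width, channels) and divides by the number of spatial positions, which is one common normalisation among several. The small demo at the end shuffles the spatial positions of a random feature map to show that the Gram Matrix does not depend on where features occur.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of one feature map with shape (H, W, C).

    Entry (i, j) is the average product of channels i and j over all
    spatial positions, so the result has shape (C, C).
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)  # one row per spatial position
    return flat.T @ flat / (h * w)


# Demo on random data (a stand-in for a real VGG feature map).
rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 4, 3))
# Move every pixel's feature vector to a different spatial position.
shuffled = rng.permutation(fmap.reshape(-1, 3)).reshape(4, 4, 3)
print(np.allclose(gram_matrix(fmap), gram_matrix(shuffled)))  # prints True
```

Because the sum runs over every position, rearranging the positions leaves the matrix unchanged, which is why it describes style rather than layout.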

Finally, we generate a random image and feed it to the model, taking outputs at the same layers as before. These are compared with the stored outputs from the previous steps: the content features of the content image and the Gram Matrices of the style image. The optimization process then repeatedly adjusts the pixels of the random image to reduce the difference, gradually transforming it into the stylized content image. The final result is a beautifully transformed image.
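The optimization process could look roughly like the sketch below. It is written under several assumptions that do not come from this article: the helper name `stylize` is hypothetical, `extractor` is any callable that returns a list of feature maps with the content layer last, the losses are plain mean squared errors, the optimizer is Adam, and the loss weights are placeholders that would need tuning. Images are assumed to be batches of one with pixel values in [0, 1].

```python
import tensorflow as tf


def gram_matrix(features):
    """Gram matrix of a (1, H, W, C) feature map, computed in TensorFlow
    so that gradients can flow through it."""
    channels = features.shape[-1]
    flat = tf.reshape(features, (-1, channels))  # (H*W, C) for a batch of one
    n = tf.cast(tf.shape(flat)[0], features.dtype)
    return tf.matmul(flat, flat, transpose_a=True) / n


def stylize(extractor, content_image, style_image, steps=200,
            content_weight=1.0, style_weight=1.0, learning_rate=0.02):
    """Hypothetical helper: optimise a random image towards the content
    features of `content_image` and the Gram matrices of `style_image`.

    `extractor` must return a list of feature maps: style layers first,
    content layer last (an assumption of this sketch).
    """
    # Targets are computed once and stored.
    content_target = extractor(content_image)[-1]
    style_targets = [gram_matrix(f) for f in extractor(style_image)[:-1]]

    # Start from random noise; these pixels are the only thing being trained.
    generated = tf.Variable(tf.random.uniform(tf.shape(content_image)))
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    history = []

    for _ in range(steps):
        with tf.GradientTape() as tape:
            features = extractor(tf.identity(generated))
            content_loss = tf.reduce_mean(tf.square(features[-1] - content_target))
            style_loss = tf.add_n([
                tf.reduce_mean(tf.square(gram_matrix(f) - target))
                for f, target in zip(features[:-1], style_targets)
            ])
            loss = content_weight * content_loss + style_weight * style_loss
        grads = tape.gradient(loss, generated)
        optimizer.apply_gradients([(grads, generated)])
        # Keep pixel values in the valid range after each update.
        generated.assign(tf.clip_by_value(generated, 0.0, 1.0))
        history.append(float(loss))

    return generated, history
```

With a real VGG extractor, the images would also need the preprocessing that the pre-trained weights expect, and the balance between `content_weight` and `style_weight` decides how strongly the style dominates the result.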

Neural style transfer is one of the fun and exciting techniques that can be achieved using convolutional neural networks, and it gets better as the field of deep learning evolves.