Original article was published by Robert A. Gonsalves on Artificial Intelligence on Medium
For the past three months, I have been exploring the latest techniques in Artificial Intelligence (AI) and Machine Learning (ML) to create abstract art. During my investigation, I learned that three things are needed to create abstract paintings: (A) source images, (B) an ML model, and (C) a lot of time to train the model on a high-end GPU. Before I discuss my work, let’s take a look at some prior research.
Artificial Neural Networks
Warren McCulloch and Walter Pitts created a computational model for Neural Networks (NNs) back in 1943. Their work led to research into both biological processing in brains and the use of NNs for AI. Richard Nagyfi discusses the differences between Artificial Neural Networks (ANNs) and biological brains in this post. He describes an apt analogy that I will summarize here: ANNs are to brains as planes are to birds. Although the development of these technologies was inspired by biology, the actual implementations are very different!
Both ANNs and biological brains learn from external stimuli to understand things and predict outcomes. One of the key differences is that ANNs work with floating-point numbers and not just binary firing of neurons. With ANNs it’s numbers in and numbers out.
The diagram below shows the structure of a typical ANN. The inputs on the left are the numerical values that contain the incoming stimuli. The input layer is connected to one or more hidden layers that contain the memory of prior learning. The output layer, in this case just one number, is connected to each of the nodes in the hidden layer.
Each of the internal arrows represents a numerical weight that is used as a multiplier to modify the numbers in the layers as they are processed in the network from left to right. The system is trained with a dataset of input values and expected output values. The weights are initially set to random values. During training, the system runs through the training set multiple times, adjusting the weights to bring its outputs closer to the expected outputs. Eventually, the system will not only predict the outputs of the training set correctly, but it will also be able to predict outputs for unseen input values. This is the essence of Machine Learning (ML). The intelligence is in the weights. A more detailed discussion of the training process for ANNs can be found in Conor McDonald’s post, here.
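To make the training process concrete, here is a minimal sketch of a network with one hidden layer, written in plain Python. Everything in it is an assumption chosen for illustration: the function names, the sigmoid activation, the learning rate, and the toy dataset (the logical OR of two inputs, with a constant third input standing in for a bias). It is not code from any of the posts mentioned above.

```python
import math
import random


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_in, n_hidden, seed=0):
    """Start with random weights, as described in the text."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return w_hidden, w_out


def forward(inputs, w_hidden, w_out):
    """Numbers in, numbers out: input layer -> hidden layer -> one output."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)


def train(data, w_hidden, w_out, epochs=2000, lr=0.5):
    """Run through the training set many times, nudging the weights in place."""
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Gradient of the squared error at the output node
            # (the constant factor 2 is folded into the learning rate).
            d_out = (out - y) * out * (1.0 - out)
            for j, h in enumerate(hidden):
                # Error passed back to hidden node j, using the old output weight.
                d_hidden = d_out * w_out[j] * h * (1.0 - h)
                w_out[j] -= lr * d_out * h
                for i, xi in enumerate(x):
                    w_hidden[j][i] -= lr * d_hidden * xi


# Toy dataset (hypothetical): logical OR; the third input is a constant bias.
data = [([0.0, 0.0, 1.0], 0.0), ([0.0, 1.0, 1.0], 1.0),
        ([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]

w_hidden, w_out = init_weights(n_in=3, n_hidden=3, seed=0)
before = mean_squared_error(data, w_hidden, w_out)
train(data, w_hidden, w_out)
after = mean_squared_error(data, w_hidden, w_out)
print(f"error before training: {before:.3f}, after training: {after:.3f}")
```

The only thing that changes between the two error measurements is the weights, which is the sense in which the intelligence is in the weights.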
Generative Adversarial Networks
In 2014, Ian Goodfellow and seven coauthors at the Université de Montréal presented a paper on Generative Adversarial Networks (GANs). They came up with a way to train two ANNs that effectively compete with each other to create content like photos, songs, prose, and yes, paintings. The first ANN is called the Generator and the second is called the Discriminator. The Generator tries to create realistic output, in this case, a color painting. The Discriminator tries to tell real paintings from the training set apart from fake paintings produced by the Generator. Here’s what a GAN architecture looks like.
A vector of random noise is fed into the Generator, which then uses its trained weights to generate the resultant output, in this case, a color image. The Discriminator is trained by alternating between processing real paintings, with an expected output of 1, and fake paintings, with an expected output of -1. After each painting is sent to the Discriminator, it sends back detailed feedback about why the painting is not real, and the Generator adjusts its weights with this new knowledge to try to do better the next time. The two networks in the GAN are effectively trained together in an adversarial fashion. The Generator gets better at trying to pass off a fake image as real, and the Discriminator gets better at determining which input is real and which is fake. Eventually, the Generator gets pretty good at generating realistic-looking images. You can read more about GANs, and the math they use, in Shweta Goyal’s post here.
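The opposing goals of the two networks can be written as two small loss functions. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and it uses the logistic convention of the 2014 paper, where the Discriminator outputs a probability and fake images are labeled 0 rather than -1. A real GAN would compute these losses inside a deep learning framework and use them to update the weights.

```python
import math

EPS = 1e-12  # small constant so that log(0) is never evaluated


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real images are labeled 1, generated images 0.

    d_real and d_fake are lists of Discriminator outputs in (0, 1).
    """
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating Generator loss: low when the Discriminator is fooled."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


# Hypothetical Discriminator scores, made up for illustration only.
confident = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
guessing = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(f"Discriminator loss when confident: {confident:.3f}, when guessing: {guessing:.3f}")
print(f"Generator loss when caught: {generator_loss([0.1]):.3f}, "
      f"when it fools the Discriminator: {generator_loss([0.9]):.3f}")
```

The same scores on fake images push the two losses in opposite directions, which is what makes the training adversarial.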
Improved GANs for Large Images
Although the basic GAN described above works well with small images (e.g., 64×64 pixels), there are issues with larger images (e.g., 1024×1024 pixels). The basic GAN architecture has difficulty converging on good results for large images due to the unstructured nature of the pixels. It can’t see the forest for the trees. Researchers at NVIDIA developed a series of improved methods that allow for the training of GANs with larger images. The first is called “Progressive Growing of GANs”.
The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality. — Tero Karras et al., NVIDIA
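As a rough illustration of the idea in the quote, the sketch below lists the resolutions a progressively grown network would pass through and shows how a newly added layer can be blended in gradually. The function names are hypothetical, the 4×4 starting resolution and the doubling at each step are assumptions of this sketch, and the images are reduced to short lists of pixel values to keep it small.

```python
def resolution_schedule(start=4, final=1024):
    """Resolutions visited when each growth step doubles width and height.

    Assumes final is start multiplied by a power of two.
    """
    res = start
    schedule = [res]
    while res < final:
        res *= 2
        schedule.append(res)
    return schedule


def fade_in(old_pixels, new_pixels, alpha):
    """Blend the upsampled output of the old layers with the new layer's output.

    alpha ramps from 0.0 (only the old layers) to 1.0 (only the new layer)
    while the new layer is being introduced.
    """
    return [(1.0 - alpha) * o + alpha * n for o, n in zip(old_pixels, new_pixels)]


print(resolution_schedule())
print(fade_in([0.0, 1.0], [1.0, 0.0], alpha=0.25))
```

Blending in each new layer instead of switching it on at full strength keeps the low-resolution layers, which are already trained, from being disturbed by a sudden change.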
The team at NVIDIA continued their work on using GANs to generate large, realistic images, naming their architecture StyleGAN. They started with their Progressive Growing of GANs as a base model and added a Style Mapping Network, which injects style information at various resolutions into the Generator Network.
The team further improved the image creation results with StyleGAN2, allowing the GAN to efficiently create high-quality images with fewer unwanted artifacts. You can read more about these developments in Akria’s post, “From GAN basic to StyleGAN2”.
Prior Work to Create Art with GANs
Researchers have been looking to use GANs to create art since the GAN was introduced in 2014. A description of a system called ArtGAN was published in 2017 by Wei Ren Tan et al. from Shinshu University, Nagano, Japan. Their paper proposes to extend GANs…
… to synthetically generate more challenging and complex images such as artwork that have abstract characteristics. This is in contrast to most of the current solutions that focused on generating natural images such as room interiors, birds, flowers and faces. — Wei Ren Tan et al., Shinshu University
A broader survey of using GANs to create art was conducted by Drew Flaherty for his master’s thesis at the Queensland University of Technology in Brisbane, Australia. He experimented with various GANs including basic GANs, CycleGAN, BigGAN, Pix2Pix, and StyleGAN. Of everything he tried, he liked StyleGAN the best.
The best visual result from the research came from StyleGAN. … Visual quality of the outputs were relatively high considering the model was only partially trained, with progressive improvements from earlier iterations showing more defined lines, textures and forms, sharper detail, and more developed compositions overall. — Drew Flaherty, Queensland University of Technology
For his experiments, Flaherty used a large library of artwork gleaned from various sources, including WikiArt.org, the Google Arts Project, Saatchi Art, and Tumblr blogs. He noted that not all of the source images are in the public domain, but he discusses the doctrine of fair use and its implications for ML and AI.
For my experiment, named MachineRay, I gathered images of abstract paintings from WikiArt.org, processed them, and fed them into StyleGAN2 at a size of 1024×1024 pixels. I trained the GAN for three weeks on a GPU using Google Colab. I then processed the output images by adjusting the aspect ratio and running them through another ANN for a super-resolution resize. The resultant images are 4096 pixels wide or tall, depending on the aspect ratio. Here’s a diagram of the components.
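The final sizing step can be illustrated with a little arithmetic. The helper below is a hypothetical sketch and not the MachineRay code: it assumes the aspect ratio is given as width divided by height, and it only computes the target dimensions so that the longer side comes out at 4096 pixels. The actual resizing would be done by the super-resolution network.

```python
def output_size(aspect_ratio, long_side=4096):
    """Return (width, height) with the longer side equal to long_side.

    aspect_ratio is width / height (an assumption of this sketch).
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    if aspect_ratio >= 1.0:
        # Landscape or square: the width is the longer side.
        return long_side, round(long_side / aspect_ratio)
    # Portrait: the height is the longer side.
    return round(long_side * aspect_ratio), long_side


# Going from a 1024-pixel GAN output to a 4096-pixel side is a 4x enlargement.
scale_factor = 4096 // 1024

print(output_size(1.0), output_size(2.0), output_size(0.5), scale_factor)
```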
Gathering Source Images
To gather the source images, I wrote a Python script to scrape abstract paintings from WikiArt.org. Note that I filtered the images to keep only paintings that were labeled as being in the “Abstract” genre and in the Public Domain. These include images that were published before 1925 or images that were created by artists who died before 1950. The top artists represented in the set are Wassily Kandinsky, Theo van Doesburg, Paul Klee, Kazimir Malevich, Janos Mattis-Teutsch, Giacomo Balla, and Piet Mondrian. A snippet of the Python code is below, and the full source file is here.
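The filtering rule described above can be sketched as follows. This is not the scraper itself and it makes no calls to WikiArt: the record fields (`genre`, `year`, `artist_death_year`) and the sample records are hypothetical placeholders, used only to show the logic of the filter.

```python
def is_candidate(record):
    """Keep a painting if it is labeled Abstract and meets either date rule.

    record is a dict with the hypothetical keys "genre", "year" and
    "artist_death_year"; missing or None values never satisfy a date rule.
    """
    if record.get("genre") != "Abstract":
        return False
    year = record.get("year")
    death = record.get("artist_death_year")
    published_early = year is not None and year < 1925
    artist_died_early = death is not None and death < 1950
    return published_early or artist_died_early


# Placeholder records, invented purely to exercise the filter.
records = [
    {"title": "example A", "genre": "Abstract", "year": 1920, "artist_death_year": 1960},
    {"title": "example B", "genre": "Abstract", "year": 1930, "artist_death_year": 1944},
    {"title": "example C", "genre": "Abstract", "year": 1955, "artist_death_year": 1970},
    {"title": "example D", "genre": "Portrait", "year": 1910, "artist_death_year": 1930},
]

kept = [r["title"] for r in records if is_candidate(r)]
print(kept)
```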
I gathered about 900 images, but I removed images that had representational components or ones that were too small, cutting the number down to 850. Here is a random sampling of the source images.