Original article was published by Fathy Rashad on Artificial Intelligence on Medium
Generating Novel Content without Dataset
Rewriting the rules in GAN: Copy & paste features contextually
GANs have become the standard architecture for generating content with AI, but can they actually invent new content beyond what is available in the training dataset? Or are they just imitating the training data and mixing its features in new ways?
In this article, I will discuss the “Rewriting a Deep Generative Model” paper. It makes it possible to edit a GAN model directly so that it gives the output we want, even if that output does not match the existing dataset. The image above is an example of such an edit: the helmet feature is copied and pasted onto the horse contextually. I believe this capability will open up many interesting applications in the digital industry, such as generating fictitious content for animations or games where no dataset may exist.
A Generative Adversarial Network (GAN) is a generative model, which means it can generate realistic outputs similar to its training data. For example, a GAN trained on human faces can generate similar-looking, realistic faces. It does this by learning the distribution of the training data and generating new content that follows the same distribution.
A GAN learns this distribution “indirectly” through two networks: a discriminator that tries to tell real images from fake ones, and a generator that creates fake data to fool the discriminator. The two networks continuously compete and learn from each other until the generator produces realistic images and the discriminator becomes good at judging them.
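The adversarial game can be made concrete with a minimal numpy sketch of the two losses that GAN implementations commonly optimize. It assumes the discriminator outputs a probability that its input is real. The names `discriminator_loss` and `generator_loss` are hypothetical. The generator loss shown is the common "non-saturating" variant from the original GAN paper, not anything specific to the rewriting work.

```python
import numpy as np

EPS = 1e-12  # keeps log() finite when a probability is exactly 0 or 1

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator: it wants D(x) -> 1 on real
    images and D(G(z)) -> 0 on generated ones."""
    d_real = np.clip(d_real, EPS, 1 - EPS)
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return float(-np.mean(np.log(d_real)) - np.mean(np.log(1 - d_fake)))

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants the discriminator
    to assign a high 'real' probability D(G(z)) to its fakes."""
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return float(-np.mean(np.log(d_fake)))

# A discriminator that confidently separates real (0.9) from fake (0.1) has a
# low loss, while the generator it is beating has a high loss.
d_real = np.array([0.9, 0.9])
d_fake = np.array([0.1, 0.1])
print(discriminator_loss(d_real, d_fake))  # ~0.211
print(generator_loss(d_fake))              # ~2.303
```

During training, each loss is minimized with respect to its own network's parameters, and the two updates alternate. That alternation is the "competition" described above.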
Although a GAN can learn the general data distribution and generate diverse images from it, it is still limited to what exists in the training data. Take a GAN trained on faces: it can generate new faces that do not exist in the dataset, but it cannot invent an entirely new face with novel features. You can only expect it to combine what it already knows in new ways.
This is no problem if we only want to generate normal faces. But what if we want faces with bushy eyebrows, or maybe a third eye? The GAN cannot generate these because the training data contains no samples with bushy eyebrows or a third eye. A quick solution would be to edit the generated faces with a photo-editing tool, but that is not feasible if we want to generate tons of such images. A generative model would suit the problem better, but how do we make a GAN generate our desired images when no such dataset exists?
Rewriting GAN Rules
In 2020, MIT and Adobe Research published an interesting paper titled “Rewriting a Deep Generative Model”, which lets us edit a GAN model directly and generate novel content. What does model rewriting mean? Instead of letting the model optimize itself based on training data or labels, we directly set the rules (parameters) we want it to follow, so that it gives us the desired results. Want a helmet on a horse? No problem. We can copy the helmet features and put them on the horse-head features. However, this requires understanding the internal parameters and how they affect the output, which has been quite a challenge in the past. Even so, the paper shows that it is feasible.
The difference between training and rewriting is akin to the difference between natural selection and genetic engineering. While training allows efficient optimization of a global objective, it does not allow direct specification of internal mechanisms. In contrast, rewriting allows a person to directly choose the internal rules they wish to include, even if these choices do not happen to match an existing data set or optimize a global objective.
– David Bau (Lead author of the paper)
As David Bau says, rewriting a model is like genetic engineering. It is like inserting the DNA of a glowing jellyfish into a cat to make a cat that glows in the dark.
How It Works
How do you actually rewrite a generative model? The paper proposes treating the weights of a generator layer as an Optimal Linear Associative Memory (OLAM), which stores associations between key-value pairs. We select a layer L whose output represents the value V, which encodes output features of the image such as a smiling expression. The output of the previous layer, L-1, represents the key K, which encodes a meaningful context such as the location of the mouth. The weight W between layer L-1 and layer L then acts as a linear associative memory that stores the association between K and V.
We can think of a K → V association as a rule in the model. For instance, imagine we have a StyleGAN model trained on horses, and we want to rewrite it to put helmets on the horses. We denote the desired helmet feature as V* and the context where we want to paste it, the horse head, as K*. To get the desired feature, we want to change the original rule K → V into our desired rule K* → V*. We do this by updating the weight W so that the model follows the new rule.
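A linear associative memory is easy to try out in a few lines. The sketch below is a toy illustration, not the paper's code. Random vectors stand in for layer L-1 features (keys) and layer L features (values). Numpy's pseudoinverse then gives the least-squares weights W that map each stored key to its value. The helper name `olam` and all sizes are hypothetical.

```python
import numpy as np

def olam(K, V):
    """Least-squares associative memory: the W minimizing ||V - W K||^2.
    Columns of K are keys; columns of V are the values they should recall."""
    return V @ np.linalg.pinv(K)

rng = np.random.default_rng(0)
key_dim, value_dim = 8, 4  # hypothetical feature sizes of layers L-1 and L

# Few rules: 5 random keys in 8 dimensions are linearly independent,
# so every association can be stored exactly.
K_few = rng.standard_normal((key_dim, 5))
V_few = rng.standard_normal((value_dim, 5))
W_few = olam(K_few, V_few)
print(np.allclose(W_few @ K_few, V_few))  # True

# Many rules: 20 keys in 8 dimensions cannot all be stored exactly, so the
# memory returns the best linear compromise and recall is only approximate.
K_many = rng.standard_normal((key_dim, 20))
V_many = rng.standard_normal((value_dim, 20))
W_many = olam(K_many, V_many)
print(np.allclose(W_many @ K_many, V_many))  # False
```

The second case hints at why a rewrite has to be careful. A layer typically holds more rules than it has dimensions, so changing one association can disturb the others.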
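How do we update W to achieve the goal K* → V*? We want to set the new rule while changing the old k → v rules as little as possible. Write the old keys and values as the columns of matrices K and V, the new key as k*, and the new value as v*. The goal then becomes:
$$W_1 = \arg\min_{W} \lVert V - W K \rVert^2 \quad \text{subject to} \quad v_* = W k_*$$
This is a constrained least-squares problem. It can be solved exactly by introducing a vector of Lagrange multipliers Λ:
$$W_1 K K^\top = V K^\top + \Lambda k_*^\top$$
The original weights W0 already solve the unconstrained problem, so $W_0 K K^\top = V K^\top$. Writing $C = K K^\top$, the solution then simplifies to:
$$W_1 = W_0 + \Lambda \left(C^{-1} k_*\right)^\top$$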
Hence the update has two components: the magnitude Λ and the update direction C^−1 k∗, which we denote d. The direction d depends only on the key k* (and the fixed statistics C of the existing keys), while only Λ depends on the value v*. Put simply, the direction restricts the update to the part of the weights that responds to the selected context k*, which minimizes interference with other rules. Λ then ensures that we achieve the desired v*. For more details on the math, I recommend reading the paper itself.
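Both claims can be checked with a short derivation. It assumes the update has the form $W_1 = W_0 + \Lambda d^\top$ with $d = C^{-1} k_*$, where $C = K K^\top$ is symmetric and positive definite:

```latex
\begin{aligned}
W_1 k_* = v_* &\;\Longrightarrow\; W_0 k_* + \Lambda\,(d^\top k_*) = v_*
  \;\Longrightarrow\; \Lambda = \frac{v_* - W_0 k_*}{d^\top k_*}, \\
W_1 k &= W_0 k + \Lambda\,(d^\top k) = W_0 k + \Lambda\,(k_*^\top C^{-1} k)
  \quad \text{for any other key } k.
\end{aligned}
```

Because $d^\top k_* = k_*^\top C^{-1} k_* > 0$, the division is always defined. Λ is set by how far the old output $W_0 k_*$ is from the target $v_*$. The output for any other key changes in proportion to $k_*^\top C^{-1} k$, so a key with $k_*^\top C^{-1} k = 0$ keeps its old value exactly.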
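In conclusion, the steps to get the updated weight W1 are:
1. Collect keys K from layer L-1 over many generated images and compute their statistics $C = K K^\top$.
2. Choose the context key k* (e.g. the horse-head features) and the target value v* (e.g. the copied helmet features).
3. Compute the update direction $d = C^{-1} k_*$.
4. Solve for Λ so that the new rule holds: $W_1 k_* = v_*$.
5. Update the original weights W0: $W_1 = W_0 + \Lambda d^\top$.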
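The whole procedure fits in a short numpy sketch of the purely linear case. It is an illustration under simplifying assumptions, not the authors' implementation. The layer is a plain matrix rather than a convolution inside StyleGAN or ProGAN. Random vectors stand in for the collected keys, the horse-head key k* and the helmet value v*. The names `rewrite_rule`, `k_star` and `v_star` are hypothetical. The sketch also compares the C^−1-directed update against a naive rank-one update along k* itself. Both enforce the new rule, but only the former solves the constrained least-squares problem, so it disturbs the old rules less.

```python
import numpy as np

def rewrite_rule(W0, K, k_star, v_star):
    """Rank-one rewrite of a linear layer W0 so that W1 @ k_star == v_star,
    while changing the outputs for the existing keys K as little as possible."""
    C = K @ K.T                                  # key statistics (assumed invertible)
    d = np.linalg.solve(C, k_star)               # update direction d = C^-1 k*
    lam = (v_star - W0 @ k_star) / (d @ k_star)  # magnitude chosen so W1 k* = v*
    return W0 + np.outer(lam, d)                 # W1 = W0 + lam d^T

rng = np.random.default_rng(1)
key_dim, value_dim, n_keys = 16, 8, 200          # hypothetical sizes

W0 = rng.standard_normal((value_dim, key_dim))   # original (pre-trained) weights
K = rng.standard_normal((key_dim, n_keys))       # keys collected from many images
V = W0 @ K                                       # old rules: values W0 already produces
k_star = rng.standard_normal(key_dim)            # stand-in for the "horse head" context
v_star = rng.standard_normal(value_dim)          # stand-in for the "helmet" feature

def old_rule_error(W):
    """Squared error on the old key -> value rules."""
    return float(np.sum((V - W @ K) ** 2))

W1 = rewrite_rule(W0, K, k_star, v_star)

# Naive alternative: move W0 toward v* along k* itself, ignoring C.
W_naive = W0 + np.outer(v_star - W0 @ k_star, k_star) / (k_star @ k_star)

print(np.allclose(W1 @ k_star, v_star))              # True: new rule holds
print(np.allclose(W_naive @ k_star, v_star))         # True: naive rule holds too
print(old_rule_error(W1) < old_rule_error(W_naive))  # True: old rules disturbed less
```

Changing `v_star` only changes `lam`, never `d`. This matches the split between magnitude and direction described above.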
The researchers experimented with rewriting pre-trained StyleGAN and ProGAN models to demonstrate the method. Demonstrations include putting helmets on horses, turning dome-shaped tops into tree tops, changing frowns into smiles, removing earrings, adding bushy eyebrows, and adding glasses. I recommend watching David Bau’s demonstration video, where he shows how to rewrite a model using the interface tool built by the researchers.