Step by Step Implementation of Conditional Generative Adversarial Networks

Original article was published on Deep Learning on Medium

A brief introduction to GANs

Let’s start our journey by briefly describing GANs. GANs are generative models, i.e. they are primarily used to generate content. A GAN architecture consists of two components: a Generator and a Discriminator.

The Generator is trained to generate samples similar to the training set. It takes random noise as input, passes it through its network, and produces an output with the same dimensions as the samples in the training set.

The objective of the Discriminator is to distinguish the generated samples (generated by the Generator model) from the real ones.

As training progresses, both the Generator and the Discriminator become adept at their respective tasks, i.e. the Generator learns to generate output that is close to the real samples, and the Discriminator learns to tell generated samples apart from real ones. The two models try to outdo each other: as the Discriminator improves, the Generator is pushed to generate samples that are more similar to the training set in order to confuse the Discriminator.

Photo: Simplified GAN architecture.

Resources for a detailed review of GANs:
Understanding Generative Adversarial Networks (GANs)
A Brief Introduction To GANs

Now, let’s look at a few interesting applications of GANs. An important thing to note is that “All this is generated by a neural network”.

Generation of human face images

Synthetic images produced by StyleGAN, a GAN created by Nvidia researchers.

Painting Generation

photo credit: James D. McCaffrey.

Generation of cartoon pictures

Photo: Anime Picture Generation.

Generating colored photographs from sketches

Photo: Taken from Image-to-Image Translation with Conditional Adversarial Networks, 2016.

Photos to emoticons

Photo: Taken from Unsupervised Cross-Domain Image Generation, 2016.

Generating a child’s face using parents’ pictures

Source: https://medium.com/swlh/familygan-generating-a-childs-face-using-his-parents-394d8face6a4

For more interesting applications you can read the following articles:
1. https://medium.com/@jonathan_hui/gan-some-cool-applications-of-gans-4c9ecca35900
2. https://machinelearningmastery.com/impressive-applications-of-generative-adversarial-networks/

Implementation of GAN in PyTorch

Let’s jump to the implementation part. I will use the MNIST dataset for this example. MNIST is a dataset consisting of 28 X 28 size images of handwritten digits. Our GAN model will be trained using this dataset and will eventually be able to generate similar digit images.

Here is the link to my GitHub repo for the code of this tutorial.

A typical machine learning setup consists of the following steps:

1. Define the Model

2. Define the Loss function

3. Define the optimizer

4. Train the model
— Forward pass
— Compute Loss
— Call the optimizer and update the weights

I would recommend using Google Colab with GPU runtime for faster execution.

Firstly, we will import some modules

import torch
from torchvision import transforms, datasets

The second import statement will be used for loading the MNIST dataset and the transformations that will be applied to the dataset.

import torch.nn as nn
from torch import optim

The torch.nn module will be used to create our model, and the optim module to define the optimizer. An optimizer is used to update the parameters of the model.

You can follow along even if you don’t understand any of the above jargon. I’ll briefly talk about these terms as we use them in the code.

Let’s select the device for computation. It’s important to use a GPU for faster computations.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

If you have changed the runtime type to GPU then the device variable would be set to “cuda”. You can verify that by printing this variable.
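
As a quick check of the device selection above, the following minimal sketch prints the chosen device and confirms that a tensor moved with .to(device) really lives there. The tensor `x` is only an illustrative placeholder.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)  # "cuda" on a GPU runtime, otherwise "cpu"

# Illustrative tensor: .to(device) returns a copy placed on the chosen device.
x = torch.ones(2, 2).to(device)
print(x.device)
```

Models and input tensors must end up on the same device, which is why the code later in this article calls .to(device) on both.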

Loading the dataset — PyTorch provides a simple way of loading popular datasets like MNIST.

training_parameters = {
    "n_epochs": 100,
    "batch_size": 100,
}
data_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.5,), (0.5,))
                   ])),
    batch_size=training_parameters["batch_size"], shuffle=True)

You can specify the transformations that you want to apply to the dataset. We will use two that are very common when working with image datasets: conversion to tensors and normalization.

The batch size also needs to be passed as a parameter to the DataLoader. We will use a batch_size of 100. The right batch size depends on your GPU capacity; Colab can handle a batch size of 100. If you encounter any issues related to GPU memory, reduce the batch_size to fit your GPU.
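
To see why Normalize is given 0.5 for both the mean and the standard deviation, here is a minimal sketch of the arithmetic it applies per channel, (x - mean) / std, written with plain torch operations instead of torchvision. The `fake_pixels` tensor is a made-up stand-in for ToTensor() output, which lies in [0, 1].

```python
import torch

# Made-up pixel values as ToTensor() would produce them: floats in [0, 1].
fake_pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

mean, std = 0.5, 0.5
# Same per-channel arithmetic as transforms.Normalize((0.5,), (0.5,)).
normalized = (fake_pixels - mean) / std

print(normalized)  # the values now span [-1, 1]
```

The resulting range of [-1, 1] matches the range of tanh, which the Generator uses in its output layer later in this article.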

Let’s look at an image from this dataset. I have used the matplotlib library to display it. The code snippet below displays the first image of the first batch of our dataset. Every image has a shape of 28 X 28.

%matplotlib inline
from matplotlib import pyplot as plt

for x, _ in data_loader:
    plt.imshow(x.numpy()[0][0], cmap='gray')
    break

MNIST image

Generator Model

Since each image in our training set has dimension 28 X 28, i.e. 784 values, the objective of our Generator model is to output a vector of 784 values. We will then reshape this vector into a 2D matrix of 28 X 28.
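
The reshaping between a 784-value vector and a 28 X 28 matrix can be sketched as follows. The batch size of 4 and the random `flat` tensor are illustrative stand-ins for real Generator output.

```python
import torch

batch_size = 4  # small illustrative batch, not the 100 used in this tutorial
flat = torch.randn(batch_size, 784)       # stand-in for Generator output
images = flat.view(batch_size, 28, 28)    # one 28 X 28 matrix per sample

print(flat.shape, images.shape)

# Flattening goes the other way, as needed for the Discriminator input.
flat_again = images.view(batch_size, 784)
```

view() only changes how the same values are indexed, so no pixel data is altered in either direction.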

Let’s follow the steps below to create our Generator model.

Step 1: Create a class that inherits from torch.nn.Module class

class GeneratorModel(nn.Module):

Step 2: Define two methods in this class — __init__() and forward()

__init__() method is used to declare all the components that will be used by the model. We will use three hidden layers for our model. You can play with this number by adding more layers or eliminating a few layers from this network. A hidden layer consists of a linear layer followed by an activation function.

A Linear layer is defined using two values — input dimension and output dimension. In order to exemplify this, let’s consider the input of dimension d with m such inputs in a batch. So the size of our effective input is m X d. Now, this is passed through a linear layer of dimension (d,k) (which is a matrix of dimension d X k). The output would be of dimension (m X k).
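
The shape arithmetic described above can be checked with a short sketch. The sizes m = 5, d = 100 and k = 256 are illustrative choices. Note that, in addition to the weight matrix, nn.Linear also holds a bias vector of size k by default.

```python
import torch
import torch.nn as nn

m, d, k = 5, 100, 256        # illustrative sizes: a batch of 5, mapping 100 -> 256
layer = nn.Linear(d, k)      # weight matrix plus a bias vector of size k
batch = torch.randn(m, d)    # effective input of size m X d
out = layer(batch)           # output of size m X k

print(out.shape)

# PyTorch stores the weight as (k, d) and computes batch @ weight.T + bias.
manual = batch @ layer.weight.T + layer.bias
```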

We will use Leaky ReLU as our activation function which is a variant of ReLU activation function. Let’s review ReLU and Leaky ReLU for completeness.

ReLU and Leaky ReLU activation functions

The left graph is for ReLU activation function and the right one is for Leaky ReLU. In ReLU negative values are suppressed to 0 while in Leaky ReLU negative values are multiplied by a small constant ‘a’ to reduce the magnitude of the value. We will use tanh activation function for the last layer (the output layer). These choices are standard choices in a machine learning setup.
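
A small numeric sketch of the three activation functions mentioned above, applied to a handful of made-up input values. The slope 0.2 is the value of ‘a’ used for Leaky ReLU in this tutorial's models.

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])   # made-up inputs

relu = nn.ReLU()
leaky = nn.LeakyReLU(0.2)    # negative_slope a = 0.2

relu_out = relu(x)           # negative values become 0
leaky_out = leaky(x)         # negative values are multiplied by 0.2
tanh_out = torch.tanh(x)     # every value is squashed into (-1, 1)

print(relu_out)
print(leaky_out)
print(tanh_out)
```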

This is what a hidden layer looks like:

nn.Sequential(nn.Linear(input_dim, 256),nn.LeakyReLU(0.2))

As previously mentioned, we will define three such hidden layers and an output layer with tanh activation function.

Let’s define the second method of our model class: the forward() method. This method takes the input (random noise in our case) and passes this input through the defined model sequentially and returns the output.

def forward(self, x):
    output = self.hidden_layer1(x)
    output = self.hidden_layer2(output)
    output = self.hidden_layer3(output)
    output = self.output_layer(output)
    return output.to(device)

We will use a random noise of dimension 100 in this example. Below is the full code for our Generator model class:

class GeneratorModel(nn.Module):
    def __init__(self):
        super(GeneratorModel, self).__init__()
        input_dim = 100
        output_dim = 784

        self.hidden_layer1 = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.LeakyReLU(0.2)
        )

        self.hidden_layer2 = nn.Sequential(
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2)
        )

        self.hidden_layer3 = nn.Sequential(
            nn.Linear(512, 1024),
            nn.LeakyReLU(0.2)
        )

        self.output_layer = nn.Sequential(
            nn.Linear(1024, output_dim),
            nn.Tanh()
        )

    def forward(self, x):
        output = self.hidden_layer1(x)
        output = self.hidden_layer2(output)
        output = self.hidden_layer3(output)
        output = self.output_layer(output)
        return output.to(device)
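
As a sanity check on the shapes, here is a minimal sketch that pushes a few noise vectors through a compact stand-in with the same layer sizes as GeneratorModel. The name `toy_generator` and the batch of 8 are assumptions made for illustration. The network is untrained, so its output is just noise-like values.

```python
import torch
import torch.nn as nn

# Compact stand-in with the same layer sizes as GeneratorModel above.
toy_generator = nn.Sequential(
    nn.Linear(100, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1024), nn.LeakyReLU(0.2),
    nn.Linear(1024, 784), nn.Tanh(),
)

noise = torch.randn(8, 100)        # 8 noise vectors of dimension 100
with torch.no_grad():
    fake = toy_generator(noise)    # 8 X 784, one flattened image per noise vector

print(fake.shape, fake.min().item(), fake.max().item())
```

Because of the final tanh, every output value lies in [-1, 1], the same range as the normalized MNIST images.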

Now, let’s define our Discriminator Model —

In terms of architecture, the Discriminator model is very similar to the Generator network, except for the output layer and the use of dropout. The Generator network is expected to generate an image (hence its output dimension is 784), whereas the Discriminator network needs to discriminate between a generated (fake) image and an actual image. So its output dimension is 1: the probability of the input being real.

We use sigmoid activation function instead of tanh here in the last layer. Explaining the concept of dropout is out of scope of this tutorial.

class DiscriminatorModel(nn.Module):
    def __init__(self):
        super(DiscriminatorModel, self).__init__()
        input_dim = 784
        output_dim = 1

        self.hidden_layer1 = nn.Sequential(
            nn.Linear(input_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3)
        )

        self.hidden_layer2 = nn.Sequential(
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3)
        )

        self.hidden_layer3 = nn.Sequential(
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3)
        )

        self.output_layer = nn.Sequential(
            nn.Linear(256, output_dim),
            nn.Sigmoid()
        )

    def forward(self, x):
        output = self.hidden_layer1(x)
        output = self.hidden_layer2(output)
        output = self.hidden_layer3(output)
        output = self.output_layer(output)
        return output.to(device)
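
The same kind of shape check works for the Discriminator. The sketch below uses a compact stand-in (`toy_discriminator`, an illustrative name) with the layer sizes of DiscriminatorModel and feeds it made-up inputs. It is put in eval mode first, since nn.Dropout is only active in training mode and would otherwise make repeated outputs differ.

```python
import torch
import torch.nn as nn

# Compact stand-in with the same layer sizes as DiscriminatorModel above.
toy_discriminator = nn.Sequential(
    nn.Linear(784, 1024), nn.LeakyReLU(0.2), nn.Dropout(0.3),
    nn.Linear(1024, 512), nn.LeakyReLU(0.2), nn.Dropout(0.3),
    nn.Linear(512, 256), nn.LeakyReLU(0.2), nn.Dropout(0.3),
    nn.Linear(256, 1), nn.Sigmoid(),
)

images = torch.rand(8, 784) * 2 - 1    # made-up flattened inputs in [-1, 1]

toy_discriminator.eval()               # dropout is switched off in eval mode
with torch.no_grad():
    p_first = toy_discriminator(images)
    p_second = toy_discriminator(images)

print(p_first.shape)                   # one probability per input image
```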

Now, we can initialize these models and move them to our device. Note that it is required to move these variables to the GPU (if available) so that all the computations can be performed on GPU.

discriminator = DiscriminatorModel()
generator = GeneratorModel()
discriminator.to(device)
generator.to(device)

This concludes the modeling part. Following the steps mentioned before, it’s time to define the loss function and the optimizer. Since we have two classes (real and fake), we will use binary cross-entropy loss. Furthermore, we will use the Adam optimizer for both of our models, i.e. the Generator and the Discriminator.

loss = nn.BCELoss()
discriminator_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002)
generator_optimizer = optim.Adam(generator.parameters(), lr=0.0002)
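
To make the loss concrete, the sketch below compares nn.BCELoss with the formula it implements: the mean of -log(p) when the target is 1 (real) and the mean of -log(1 - p) when the target is 0 (fake). The discriminator outputs in `d_out` are made-up probabilities.

```python
import math
import torch
import torch.nn as nn

loss = nn.BCELoss()

# Made-up discriminator outputs: probability that each sample is real.
d_out = torch.tensor([0.9, 0.6])

real_loss = loss(d_out, torch.ones(2))     # mean of -log(p)
fake_loss = loss(d_out, torch.zeros(2))    # mean of -log(1 - p)

# The same quantities computed by hand.
expected_real = -(math.log(0.9) + math.log(0.6)) / 2
expected_fake = -(math.log(0.1) + math.log(0.4)) / 2

print(real_loss.item(), expected_real)
print(fake_loss.item(), expected_fake)
```

The loss is small when the output agrees with the label and large when it does not, which is what lets the same function serve both models with different labels.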

Now the climax — The Training Loop

In a single training step, we need to update parameters of the Generator model as well as the Discriminator model.

Here is the outline of a single training step:

  1. Update Discriminator Model
    - Clear the optimizer gradients by calling optimizer.zero_grad()
    - Forward pass to the discriminator model with the true data as input and obtain the output.
    - Compute loss using the discriminator’s output and the true_labels (true_label is 1 for real data).
    - Forward pass to the discriminator model, but this time with the generated data as input, and obtain the output.
    - Compute loss using the discriminator’s output and the fake_labels (fake_label is 0 for generated data).
    - Average both the losses
    - Call loss.backward() to backpropagate, then optimizer.step() to update the weights of the discriminator model
  2. Update Generator Model
    - Clear the optimizer gradients by calling optimizer.zero_grad()
    - Forward pass to the discriminator model with the generated data as input and obtain the output.
    - Recall that the objective of the generator model is to fool the discriminator into labeling the generated data as real data. So, we compute the loss using the discriminator model’s output and the true_labels (this time the label is 1 even though the data is fake)
    - Call loss.backward() to backpropagate, then optimizer.step() to update the weights of the generator model.
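
The outline above can be sketched as one self-contained training step. This is an illustration on toy data, not the tutorial's MNIST code: the tiny one-layer models, the sizes, and the names `d_opt` and `g_opt` are assumptions chosen to keep the example short.

```python
import torch
import torch.nn as nn
from torch import optim

torch.manual_seed(0)
batch_size, noise_dim, data_dim = 16, 8, 4    # toy sizes, not the MNIST ones

# Tiny stand-in models; the tutorial's models are larger.
generator = nn.Sequential(nn.Linear(noise_dim, data_dim), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(data_dim, 1), nn.Sigmoid())

loss = nn.BCELoss()
d_opt = optim.Adam(discriminator.parameters(), lr=0.0002)
g_opt = optim.Adam(generator.parameters(), lr=0.0002)

true_data = torch.rand(batch_size, data_dim) * 2 - 1    # made-up "real" batch
noise = torch.randn(batch_size, noise_dim)
true_labels = torch.ones(batch_size)
fake_labels = torch.zeros(batch_size)
g_weight_before = generator[0].weight.clone()

# 1. Update the Discriminator
d_opt.zero_grad()
generated = generator(noise)
d_real = discriminator(true_data).view(batch_size)
# detach() stops gradients from flowing back into the Generator here.
d_fake = discriminator(generated.detach()).view(batch_size)
d_loss = (loss(d_real, true_labels) + loss(d_fake, fake_labels)) / 2
d_loss.backward()    # backpropagate
d_opt.step()         # update only the Discriminator weights
g_unchanged = torch.equal(generator[0].weight, g_weight_before)

# 2. Update the Generator
g_opt.zero_grad()
d_on_fake = discriminator(generator(noise)).view(batch_size)
g_loss = loss(d_on_fake, true_labels)    # generated data labeled as real
g_loss.backward()
g_opt.step()         # update only the Generator weights

print(d_loss.item(), g_loss.item())
```

Each optimizer only holds the parameters of its own model, so the Discriminator step leaves the Generator untouched and vice versa, even though the Generator's loss is computed through the Discriminator.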

Let’s see it in action. The comments will further explain the code.

n_epochs = training_parameters["n_epochs"]
batch_size = training_parameters["batch_size"]

for epoch_idx in range(n_epochs):
    G_loss = []
    D_loss = []
    for batch_idx, data_input in enumerate(data_loader):

        # Generate noise and move it to the device
        noise = torch.randn(batch_size, 100).to(device)
        # Forward pass
        generated_data = generator(noise)  # batch_size X 784

        true_data = data_input[0].view(batch_size, 784).to(device)  # batch_size X 784
        digit_labels = data_input[1]  # batch_size
        true_labels = torch.ones(batch_size).to(device)

        # Clear optimizer gradients
        discriminator_optimizer.zero_grad()
        # Forward pass with true data as input
        discriminator_output_for_true_data = discriminator(true_data).view(batch_size)
        # Compute Loss
        true_discriminator_loss = loss(discriminator_output_for_true_data, true_labels)
        # Forward pass with generated data as input
        discriminator_output_for_generated_data = discriminator(generated_data.detach()).view(batch_size)
        # Compute Loss
        generator_discriminator_loss = loss(
            discriminator_output_for_generated_data, torch.zeros(batch_size).to(device)
        )
        # Average the loss
        discriminator_loss = (
            true_discriminator_loss + generator_discriminator_loss
        ) / 2
        # Backpropagate the losses for Discriminator model
        discriminator_loss.backward()
        discriminator_optimizer.step()

        D_loss.append(discriminator_loss.data.item())

        # Clear optimizer gradients
        generator_optimizer.zero_grad()

        # It's a choice to generate the data again
        generated_data = generator(noise)  # batch_size X 784
        # Forward pass with the generated data
        discriminator_output_on_generated_data = discriminator(generated_data).view(batch_size)
        # Compute loss
        generator_loss = loss(discriminator_output_on_generated_data, true_labels)
        # Backpropagate losses for Generator model.
        generator_loss.backward()
        generator_optimizer.step()

        G_loss.append(generator_loss.data.item())
        # Evaluate the model
        if ((batch_idx + 1) % 500 == 0 and (epoch_idx + 1) % 10 == 0):
            print("Training Steps Completed: ", batch_idx)

            with torch.no_grad():
                noise = torch.randn(batch_size, 100).to(device)
                generated_data = generator(noise).cpu().view(batch_size, 28, 28)
                for x in generated_data:
                    plt.imshow(x.detach().numpy(), interpolation='nearest', cmap='gray')
                    plt.show()

                    # Show only the first generated image
                    break

    print('[%d/%d]: loss_d: %.3f, loss_g: %.3f' % (
        (epoch_idx), n_epochs, torch.mean(torch.FloatTensor(D_loss)), torch.mean(torch.FloatTensor(G_loss))))

From GANs to Conditional GANs

The simple GAN we implemented above has a serious limitation: it generates images unconditionally, i.e. we have no control over which digit the model produces. Conditional GANs (C-GANs) were introduced to overcome this limitation. The architecture of a C-GAN is the same as that of a normal GAN, but this time the model takes some metadata as input along with the random noise and conditions its output on it.

We will pass the digit value as metadata and constrain the above GAN model to generate an image of the input digit value.

A few modifications need to be done to achieve the above objective:

  1. The generator model will take random noise of dimension 100 and the digit value as input. We will use an embedding layer of size (10,10), which holds a 10-dimensional encoding for each of the 10 digits.
  2. We will concatenate the 10-dimensional embedding and the noise to get a 110-dimensional input (instead of 100 as in the normal Generator model) that will be fed to the first hidden layer. The rest of the network works the same way.
  3. The Discriminator model needs both of the above changes as well: the label embedding is concatenated with the flattened image, so its first hidden layer takes 784 + 10 = 794 inputs.
  4. In the training loop:
    — Pass the labels along with random noise to the Generator
    — Pass the labels along with the data to the Discriminator.
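
The embedding and concatenation described in the list above can be sketched in isolation. The batch of 4 and the example labels are illustrative. The embedding weights here are randomly initialized; in the real models they are learned during training.

```python
import torch
import torch.nn as nn

batch_size = 4
label_embedding = nn.Embedding(10, 10)    # 10 digits, one 10-dimensional vector each

noise = torch.randn(batch_size, 100)
labels = torch.tensor([0, 5, 5, 9])       # example digit labels (integer tensor)

c = label_embedding(labels)                      # batch_size X 10
generator_input = torch.cat([noise, c], 1)       # batch_size X 110

# The Discriminator does the same with a flattened image instead of noise.
flat_images = torch.randn(batch_size, 784)       # stand-in for real or generated data
discriminator_input = torch.cat([flat_images, c], 1)    # batch_size X 794

print(c.shape, generator_input.shape, discriminator_input.shape)
```

Equal labels map to the same embedding vector, so every sample conditioned on the digit 5 receives an identical 10-dimensional code alongside its own noise.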

Let’s see how these modifications can be incorporated in the code

Generator and Discriminator Models:

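Note: Compared with the earlier models, the changes are the label_embedding layer, the larger input_dim, and the extra labels argument to forward().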

class GeneratorModel(nn.Module):
    def __init__(self):
        super(GeneratorModel, self).__init__()
        input_dim = 100 + 10
        output_dim = 784
        self.label_embedding = nn.Embedding(10, 10)

        self.hidden_layer1 = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.LeakyReLU(0.2)
        )

        self.hidden_layer2 = nn.Sequential(
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2)
        )

        self.hidden_layer3 = nn.Sequential(
            nn.Linear(512, 1024),
            nn.LeakyReLU(0.2)
        )

        self.hidden_layer4 = nn.Sequential(
            nn.Linear(1024, output_dim),
            nn.Tanh()
        )

    def forward(self, x, labels):
        c = self.label_embedding(labels)
        x = torch.cat([x, c], 1)

        output = self.hidden_layer1(x)
        output = self.hidden_layer2(output)
        output = self.hidden_layer3(output)
        output = self.hidden_layer4(output)
        return output.to(device)

class DiscriminatorModel(nn.Module):
    def __init__(self):
        super(DiscriminatorModel, self).__init__()
        input_dim = 784 + 10
        output_dim = 1
        self.label_embedding = nn.Embedding(10, 10)

        self.hidden_layer1 = nn.Sequential(
            nn.Linear(input_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3)
        )

        self.hidden_layer2 = nn.Sequential(
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3)
        )

        self.hidden_layer3 = nn.Sequential(
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3)
        )

        self.hidden_layer4 = nn.Sequential(
            nn.Linear(256, output_dim),
            nn.Sigmoid()
        )

    def forward(self, x, labels):
        c = self.label_embedding(labels)
        x = torch.cat([x, c], 1)

        output = self.hidden_layer1(x)
        output = self.hidden_layer2(output)
        output = self.hidden_layer3(output)
        output = self.hidden_layer4(output)

        return output.to(device)

discriminator = DiscriminatorModel()
generator = GeneratorModel()
discriminator.to(device)
generator.to(device)

Training Loop —

n_epochs = training_parameters["n_epochs"]
batch_size = training_parameters["batch_size"]

# Re-create the optimizers so that they track the parameters
# of the new conditional models defined above.
discriminator_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002)
generator_optimizer = optim.Adam(generator.parameters(), lr=0.0002)

for epoch_idx in range(n_epochs):
    G_loss = []
    D_loss = []
    for batch_idx, data_input in enumerate(data_loader):

        noise = torch.randn(batch_size, 100).to(device)
        fake_labels = torch.randint(0, 10, (batch_size,)).to(device)
        generated_data = generator(noise, fake_labels)  # batch_size X 784

        # Discriminator
        true_data = data_input[0].view(batch_size, 784).to(device)  # batch_size X 784
        digit_labels = data_input[1].to(device)  # batch_size
        true_labels = torch.ones(batch_size).to(device)

        discriminator_optimizer.zero_grad()

        discriminator_output_for_true_data = discriminator(true_data, digit_labels).view(batch_size)
        true_discriminator_loss = loss(discriminator_output_for_true_data, true_labels)

        discriminator_output_for_generated_data = discriminator(generated_data.detach(), fake_labels).view(batch_size)
        generator_discriminator_loss = loss(
            discriminator_output_for_generated_data, torch.zeros(batch_size).to(device)
        )
        discriminator_loss = (
            true_discriminator_loss + generator_discriminator_loss
        ) / 2

        discriminator_loss.backward()
        discriminator_optimizer.step()

        D_loss.append(discriminator_loss.data.item())

        # Generator
        generator_optimizer.zero_grad()
        # It's a choice to generate the data again
        generated_data = generator(noise, fake_labels)  # batch_size X 784
        discriminator_output_on_generated_data = discriminator(generated_data, fake_labels).view(batch_size)
        generator_loss = loss(discriminator_output_on_generated_data, true_labels)
        generator_loss.backward()
        generator_optimizer.step()

        G_loss.append(generator_loss.data.item())
        if ((batch_idx + 1) % 500 == 0 and (epoch_idx + 1) % 10 == 0):
            print("Training Steps Completed: ", batch_idx)

            with torch.no_grad():
                noise = torch.randn(batch_size, 100).to(device)
                fake_labels = torch.randint(0, 10, (batch_size,)).to(device)
                generated_data = generator(noise, fake_labels).cpu().view(batch_size, 28, 28)
                for x in generated_data:
                    # Print the label of the first sample, then show its image
                    print(fake_labels[0].item())
                    plt.imshow(x.detach().numpy(), interpolation='nearest', cmap='gray')
                    plt.show()

                    break

    print('[%d/%d]: loss_d: %.3f, loss_g: %.3f' % (
        (epoch_idx), n_epochs, torch.mean(torch.FloatTensor(D_loss)), torch.mean(torch.FloatTensor(G_loss))))

Generated image when the input is 5

Generated image via C-GAN on input 5.
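
Once a conditional generator is trained, asking for a particular digit comes down to passing a label tensor filled with that digit. The sketch below shows the idea with a hypothetical helper, `sample_digit`, and a tiny untrained stand-in model, `TinyConditionalGenerator`, that only mimics the forward(x, labels) interface of the GeneratorModel above. Neither name comes from the original code, and the stand-in's output is not a recognizable digit.

```python
import torch
import torch.nn as nn

class TinyConditionalGenerator(nn.Module):
    """Hypothetical, untrained stand-in with the forward(x, labels) interface."""
    def __init__(self):
        super().__init__()
        self.label_embedding = nn.Embedding(10, 10)
        self.net = nn.Sequential(nn.Linear(110, 784), nn.Tanh())

    def forward(self, x, labels):
        c = self.label_embedding(labels)
        return self.net(torch.cat([x, c], 1))

def sample_digit(generator, digit, n_samples=3):
    """Hypothetical helper: generate n_samples images conditioned on one digit."""
    noise = torch.randn(n_samples, 100)
    # Every sample gets the same label, so all images are conditioned on `digit`.
    labels = torch.full((n_samples,), digit, dtype=torch.long)
    with torch.no_grad():
        return generator(noise, labels).view(n_samples, 28, 28)

images = sample_digit(TinyConditionalGenerator(), digit=5)
print(images.shape)
```

With the trained model from this article, the noise and label tensors would also need to be moved to the same device as the generator before the forward pass.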

Follow up exercises

  1. After working through the material covered in this article, try implementing the GAN and C-GAN architectures on the Fashion MNIST dataset.
  2. Try generating even numbers (in binary). Refer to the link for more details.
