Making a Face GAN with TensorFlow

I have always been fascinated by the miracles Machine Learning can perform. One of my most recent discoveries was GANs, also called Generative Adversarial Networks. The very idea that a computer can produce images without being given any starting image amazed me: you feed in a 100×1 array of random normal numbers and the GAN produces a random image (of the type of images it has been trained on). Exploring how it works excited me even more: a GAN is not one network but a combination of two networks competing with each other and improving in the process.

Image from https://www.tensorflow.org/tutorials/generative/dcgan

The first model generates an image from random noise and tries to fool the second model into thinking it is a real image, while the second model tries to tell the difference between real images and the images created by the first. The first model is called the ‘generator’ and is the one that actually produces the images. The second model is the ‘discriminator’, which tries to catch the generated images. As training goes on, both networks adjust their weights to accomplish their tasks. Eventually, the generator becomes good enough to fool the discriminator and reaches a reasonably low loss score.

Image from https://www.tensorflow.org/tutorials/generative/dcgan

The Architecture –

The architecture is pretty straightforward for both networks. The Generator Network uses Upsampling or Conv2DTranspose (transposed convolution) layers to reshape the input into the required higher-dimensional data space.
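
To make the shape arithmetic concrete, here is a minimal standalone sketch (not part of the article's model) showing how a Conv2DTranspose layer with stride 2 and 'same' padding doubles the spatial dimensions:

import tensorflow as tf
from tensorflow.keras import layers

# A transposed convolution with stride (2, 2) and 'same' padding
# doubles the spatial size of its input: (8, 8) -> (16, 16).
x = tf.random.normal([1, 8, 8, 128])  # one 8x8 feature map with 128 channels
upsample = layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same')
print(upsample(x).shape)  # (1, 16, 16, 64)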

The Discriminator Network is a normal CNN that uses Convolutional Layers followed by a final Dense layer to classify the image as real or fake.

The Loss Function –

The simplest and one of the most commonly used loss formulas for GANs is the ‘minimax’ loss, given by:

E_x[log(D(x))] + E_z[log(1 − D(G(z)))]

In this function:

· D(x) is the discriminator’s estimate of the probability that real data instance x is real.

· E_x is the expected value over all real data instances.

· G(z) is the generator’s output when given noise z.

· D(G(z)) is the discriminator’s estimate of the probability that a fake instance is real.

· E_z is the expected value over all random inputs to the generator (in effect, the expected value over all generated fake instances G(z)).

· The formula derives from the cross-entropy between the real and generated distributions.

The generator can’t directly affect the log(D(x)) term in the function, so, for the generator, minimizing the loss is equivalent to minimizing log(1 − D(G(z))).
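
A quick numeric illustration (my own example, not from the original article): if the discriminator is doing well, say D(x) = 0.9 for a real image and D(G(z)) = 0.1 for a fake one, the objective is close to its maximum; if the generator improves and D(G(z)) rises, the objective drops sharply:

import numpy as np

D_x = 0.9        # discriminator's score for a real image (hypothetical)
D_Gz_bad = 0.1   # score for an easily-spotted fake (hypothetical)
D_Gz_good = 0.9  # score for a convincing fake (hypothetical)

# The discriminator tries to make this value large...
print(np.log(D_x) + np.log(1 - D_Gz_bad))   # ~ -0.21
# ...while a better generator drives it down:
print(np.log(D_x) + np.log(1 - D_Gz_good))  # ~ -2.41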

Code –

The full code can be found in my GitHub repo

Okay, so now let’s get to the fun part: implementing the GAN network for creating human faces. The faces in the final result will be blurry and very low-res because the model is trained on a 64×64 image size and for a very short amount of time. To get more realistic images, increase the resolution and train for a longer period.

First, get the dataset (Labeled Faces in the Wild) from the following URL: http://vis-www.cs.umass.edu/lfw/lfw.tgz

!wget http://vis-www.cs.umass.edu/lfw/lfw.tgz

import tarfile
my_tar = tarfile.open('lfw.tgz')
my_tar.extractall('./lfw')  # specify which folder to extract to
my_tar.close()

Extract the contents of the file.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import PIL
from tensorflow.keras import layers
import os
%matplotlib inline

Import the necessary libraries.

images = []
for i in os.scandir('lfw/lfw'):
    for j in os.scandir(i.path):
        images.append(j.path)
images = tf.data.Dataset.from_tensor_slices(images)

Here we collect the file paths of all the images and create a tf.data.Dataset from them, which will help build the input pipeline for the training process without overloading memory.

def get_ds(path):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)  # floats in [0, 1]
    img = tf.divide(tf.subtract(tf.multiply(img, 255), 127.5), 127.5)  # rescale to [-1, 1]
    return tf.image.resize(img, (64, 64))

This is a simple function that receives the path of an image, reads it, normalizes it to the [-1, 1] range (matching the generator’s tanh output), and resizes it to 64×64.

BATCH_SIZE = 64
train_images = images.map(get_ds, num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(BATCH_SIZE).shuffle(60000)

Pretty straightforward code that maps each image path through the previous function to produce the training batches. The num_parallel_calls=tf.data.experimental.AUTOTUNE argument lets tf.data choose the number of parallel preprocessing calls dynamically, based on the CPU resources available.
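
One optional addition to such a pipeline (not in the original code) is prefetch, which overlaps preprocessing of the next batch with training on the current one:

# Optional: let tf.data prepare the next batch while the model trains on the current one.
train_images = train_images.prefetch(tf.data.experimental.AUTOTUNE)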

def make_generator_model():
    model = tf.keras.models.Sequential()
    model.add(layers.Dense(8*8*128, input_shape=(100,), use_bias=False))  # projects the 100-dim noise vector to 8*8*128 values
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Reshape((8, 8, 128)))
    assert model.output_shape == (None, 8, 8, 128)

    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(1, 1), padding='same', use_bias=False))  # stride (1,1) keeps the same spatial shape as the input
    assert model.output_shape == (None, 8, 8, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    ## output of shape (8, 8, 64)

    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))  # stride (2,2) doubles the size of the input
    assert model.output_shape == (None, 16, 16, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    ## output shape (16, 16, 64)

    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))  # stride (2,2) doubles the size of the input
    assert model.output_shape == (None, 32, 32, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(3, (5, 5), strides=(2, 2), padding='same', activation='tanh', use_bias=False))
    assert model.output_shape == (None, 64, 64, 3)
    ## output shape (64, 64, 3), the required shape

    return model

This function builds the generator model. The first layer is a Dense layer that accepts a 100-element random normal vector. The Reshape layer converts the output of the Dense layer into a 3-D tensor to pass into the Conv2DTranspose layers. These layers upscale the input step by step and finally output the result in the required shape.

generator = make_generator_model()
noise = tf.random.normal([1, 100])
generated_image = generator(noise, training=False)
plt.imshow((generated_image[0]*127.5+127.5).numpy().astype(np.uint8))
Image output of the generator model before training

The image produced by the generator before any training. The generator outputs values normalized to [-1, 1], which need to be rescaled to valid RGB values for display.

def make_discriminator_model():
    model = tf.keras.models.Sequential()
    model.add(layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same', input_shape=[64, 64, 3]))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Flatten())
    model.add(layers.Dense(1))

    return model

This function builds the Discriminator Model, which predicts whether an image is real or fake.

discriminator = make_discriminator_model()
decision = discriminator(generated_image)
print(decision)
Output of the discriminator model before training

The output of the discriminator model will be more negative if the image is fake, i.e. an image created by the generator model, and more positive if the image is real.
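
Since the final Dense layer has no activation, this raw score is a logit. As a small aside (not in the original article), you can squash it into a probability with a sigmoid:

# Convert the discriminator's logit into a probability in [0, 1].
# For an untrained model this lands near 0.5, i.e. "undecided".
probability = tf.sigmoid(decision)
print(probability)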

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)

The loss functions for both models, built from binary cross-entropy to implement the ‘minimax’ objective. Note that generator_loss labels the fake outputs as real (ones), so it maximizes log(D(G(z))) rather than literally minimizing log(1 − D(G(z))); this ‘non-saturating’ variant is standard practice because it gives the generator stronger gradients early in training.

generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

Both models use the Adam optimizer.
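
The training loop below calls checkpoint.save, but the article never defines the checkpoint object. Here is a minimal setup in the spirit of the TensorFlow DCGAN tutorial this code follows (the directory name is my assumption):

checkpoint_dir = './training_checkpoints'  # assumed path; any writable directory works
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt')
checkpoint = tf.train.Checkpoint(generator_optimizer=generator_optimizer,
                                 discriminator_optimizer=discriminator_optimizer,
                                 generator=generator,
                                 discriminator=discriminator)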

EPOCHS = 200
noise_dims = 100
num_egs_to_generate = 16
seed = tf.random.normal([num_egs_to_generate, noise_dims])

@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dims])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as dis_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gen_gradients = gen_tape.gradient(gen_loss, generator.trainable_variables)
    dis_gradients = dis_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gen_gradients, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(dis_gradients, discriminator.trainable_variables))

The main training function. It creates a batch of random 100-dimensional noise vectors as input for the generator, runs both models under gradient tapes, and applies the resulting gradients with each model’s optimizer.

def generate_and_save_output(model, epoch, test_input):
    predictions = model(test_input, training=False)
    fig = plt.figure(figsize=(4, 4))
    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i+1)
        plt.imshow((predictions[i]*127.5+127.5).numpy().astype(np.uint8))
        plt.axis('off')
    plt.savefig(f'image_at_epoch_{epoch}.png')
    plt.show()

A small function to plot and save the output of the generator model when called.

from IPython import display
import time

def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()
        for batch in dataset:
            train_step(batch)
        display.clear_output(wait=True)
        generate_and_save_output(generator, epoch+1, seed)

        if (epoch+1) % 15 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)

        print(f'Time for epoch {epoch + 1} is {time.time()-start}')

    display.clear_output(wait=True)
    generate_and_save_output(generator, epochs, seed)

train(train_images, EPOCHS)

The main loop, which runs the training function over every batch and calls generate_and_save_output after each epoch, saving a checkpoint every 15 epochs.

import imageio
import glob

anim_file = 'dcgan.gif'

with imageio.get_writer(anim_file, mode='I') as writer:
    filenames = glob.glob('image*.png')
    filenames = sorted(filenames)
    last = -1
    for i, filename in enumerate(filenames):
        frame = 2*(i**0.5)
        if round(frame) > round(last):
            last = frame
        else:
            continue
        image = imageio.imread(filename)
        writer.append_data(image)
    # append the final frame once more so the gif lingers on it
    image = imageio.imread(filename)
    writer.append_data(image)

import IPython
if IPython.version_info > (6, 2, 0, ''):
    display.Image(filename=anim_file)

A small block to make a GIF out of the images saved during training.

new_image = generator(tf.random.normal([1, 100]), training=False)
plt.imshow((new_image[0]*127.5+127.5).numpy().astype(np.uint8))
Image from the output of generator model after training

And voilà, the GAN for generating faces is finished. I hope you liked it. As mentioned earlier, all the code is in my GitHub repo. Feel free to comment and ask any questions.