Original article was published by Nikhil Nagaraj on Artificial Intelligence on Medium
DeepPrivacy: A Generative Adversarial Network for Face Anonymization
DeepPrivacy proposes a conditional generative adversarial network (CGAN) to anonymize faces in a given image. The model considers the original pose and background and aims to generate realistic faces that fit seamlessly into the original image.
A brief overview of Generative Adversarial Networks
Before delving into the architecture proposed by DeepPrivacy, this section takes a brief look at generative adversarial networks and their conditional counterparts.
Generative Adversarial Networks (GANs)
A GAN is a generative model built on the idea of a zero-sum game: it consists of two networks that compete against each other. One of these networks is the generator, which aims to produce samples that follow the real data distribution as closely as possible. The other is the discriminator, which aims to distinguish real samples from generated ones. The two networks are trained adversarially, alternating between optimizing the generator and optimizing the discriminator. Ideally, training stops when the discriminator is no longer able to distinguish between the real samples and their generated counterparts.
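To make the adversarial objective concrete, here is a minimal sketch of the classic cross-entropy GAN losses, computed directly on discriminator outputs. The function names are illustrative, and this is the textbook formulation, not necessarily the loss DeepPrivacy itself trains with.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss for the discriminator.

    d_real: discriminator outputs (probability of "real") on real samples.
    d_fake: discriminator outputs (probability of "real") on generated samples.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A confident discriminator separates real from generated samples well.
confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])

# A fooled discriminator outputs 0.5 everywhere; its loss equals 2 * ln(2).
fooled = discriminator_loss([0.5, 0.5], [0.5, 0.5])

print(confident, fooled)
```

In an alternating training loop, one step lowers `discriminator_loss` with the generator held fixed, and the next lowers `generator_loss` with the discriminator held fixed. The fooled case corresponds to the ideal stopping point described above.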
Conditional Generative Adversarial Networks (CGANs)
In a CGAN, both networks receive auxiliary conditioning information: the generator must produce samples that are consistent with the condition, and the discriminator must judge samples in light of it. For example, a CGAN might be required to generate an image of a handwritten number that matches a number given as its conditional input. The discriminator, in turn, must additionally check whether the number in the real or generated image matches the condition.
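The handwritten-number example above can be sketched as follows. One common way to feed the condition to both networks, assumed here for illustration, is to append a one-hot encoding of the label to each network's input. `NOISE_DIM`, `one_hot`, `generator_input`, and `discriminator_input` are hypothetical names, not part of any particular library.

```python
import random

NUM_CLASSES = 10  # digits 0-9
NOISE_DIM = 4     # hypothetical size of the latent noise vector

def one_hot(label, num_classes=NUM_CLASSES):
    """Encode an integer label as a one-hot list of floats."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def generator_input(label, noise_dim=NOISE_DIM):
    """Random noise followed by the condition the generator must satisfy."""
    noise = [random.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label)

def discriminator_input(image_pixels, label):
    """Flattened image followed by the condition it is checked against."""
    return list(image_pixels) + one_hot(label)

g_in = generator_input(3)
d_in = discriminator_input([0.1, 0.2, 0.3], 7)
print(len(g_in), len(d_in))
```

Because the discriminator sees the label alongside the image, it can reject a realistic-looking digit that does not match the requested number.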
DeepPrivacy proposes a CGAN, which generates images based on the surroundings of the face and sparse pose information.
The official implementation of the model uses the Dual Shot Face Detector (DSFD) to detect faces in the given image. A Mask R-CNN is used to estimate seven keypoints that describe the pose of the face: left/right eye, left/right ear, left/right shoulder, and nose. The detected face is then obfuscated, and the resulting image, along with the pose information, is fed to the generative network, which has a U-Net architecture and is trained with progressive growing.
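The preprocessing steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: the constant fill value, the keypoint ordering, and the flat normalized pose vector are choices made for this example, and the official implementation may obfuscate the face and encode the pose differently. The detector and keypoint model are replaced by hard-coded, hypothetical outputs.

```python
# Assumed ordering of the seven keypoints named in the text.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
]

def obfuscate_face(image, box, fill=0):
    """Return a copy of `image` with the face box set to a constant value.

    image: list of rows of pixel values; box: (x0, y0, x1, y1), end-exclusive.
    """
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]  # leave the input image untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = fill
    return out

def pose_vector(keypoints, width, height):
    """Flatten named (x, y) keypoints into a vector normalized to [0, 1]."""
    vec = []
    for name in KEYPOINT_NAMES:
        x, y = keypoints[name]
        vec.extend([x / width, y / height])
    return vec

# Hypothetical detector outputs for a tiny 4x4 grayscale image.
image = [[1, 1, 1, 1] for _ in range(4)]
face_box = (1, 1, 3, 3)
keypoints = {name: (2, 2) for name in KEYPOINT_NAMES}

masked = obfuscate_face(image, face_box)
pose = pose_vector(keypoints, width=4, height=4)
print(masked, len(pose))
```

The pair `(masked, pose)` stands in for the conditional input: the generator sees the surroundings of the face and the sparse pose, but never the original face pixels.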
DeepPrivacy can generate anonymized faces of relatively good quality that fit the background and pose information. However, the anonymization process is non-deterministic, so the anonymized faces are not consistent with one another even when the original face is the same. This is evident in the video below.
DeepPrivacy aims to generate an anonymized face but places no emphasis on preserving auxiliary, non-identifying information that can be gleaned from the face. CLEANIR aims to address this shortcoming.