Face Anonymization: A survey of what works and what doesn’t

Original article was published by Nikhil Nagaraj on Artificial Intelligence on Medium


DeepPrivacy: A Generative Adversarial Network for Face Anonymization

DeepPrivacy proposes a conditional generative adversarial network (CGAN) to anonymize faces in a given image. The model considers the original pose and background and aims to generate realistic faces that fit seamlessly into the original image.

A brief overview of Generative Adversarial Networks

Before delving into the architecture proposed by DeepPrivacy, this section takes a brief look at generative adversarial networks and their conditional counterparts.

Generative Adversarial Networks (GANs)

Based on the concept of a zero-sum game and used in generative modeling, a GAN consists of two networks that compete against each other. One of these networks is the generator, which aims to generate data as similar as possible to the real data distribution. The other is the discriminator, which aims to distinguish between the real data distribution and the distribution of the generated samples. Both these networks are trained in an adversarial manner, alternating between optimizing the generator and the discriminator. Ideally, training stops when the discriminator is no longer able to distinguish between the real samples and their generated counterparts.
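To make the adversarial setup concrete, here is a minimal sketch of the two losses in the standard GAN formulation (binary cross-entropy for the discriminator, and the commonly used non-saturating loss for the generator). The function names are illustrative, and the discriminator outputs are plain lists of probabilities rather than the output of a real network. This is the textbook objective only; the loss actually used to train DeepPrivacy may differ.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy that the discriminator minimizes.

    d_real: discriminator outputs in (0, 1) for real samples
    d_fake: discriminator outputs in (0, 1) for generated samples
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are rated as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A discriminator that cannot tell real from generated outputs 0.5 everywhere.
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates the two distributions well has a lower loss.
confident = discriminator_loss([0.9, 0.9], [0.1, 0.1])
```

In the idealized stopping condition described above, the discriminator outputs 0.5 for every sample, and its loss settles at 2 ln 2.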

Conditional Generative Adversarial Networks (CGANs)

In a CGAN, both the generator and the discriminator receive an auxiliary condition as an additional input. For example, a CGAN might be required to generate an image of a handwritten digit that matches the digit supplied as its condition. The discriminator, in turn, must additionally check whether the digit in the real or generated image matches that condition.

An overview of a CGAN where z represents the latent vector and y represents the condition. The generator and discriminator generate/discriminate samples per the condition ‘y’. [Source: Mirza et al.]
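One common way to feed the condition to both networks is to concatenate a one-hot encoding of y with the network's other input. The sketch below shows this for the handwritten-digit example. The helper names, the latent size, and the choice of concatenation are assumptions made for illustration, not details taken from DeepPrivacy.

```python
import random

NUM_CLASSES = 10  # digits 0-9 in the handwritten-digit example
LATENT_DIM = 4    # illustrative; real models use much larger latent vectors

def one_hot(label, num_classes=NUM_CLASSES):
    """Encode an integer label as a one-hot list."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def generator_input(z, label):
    """Concatenate the latent vector z with the one-hot condition y."""
    return list(z) + one_hot(label)

def discriminator_input(sample, label):
    """The discriminator sees the sample together with the same condition."""
    return list(sample) + one_hot(label)

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
g_in = generator_input(z, label=7)  # the generator is asked to draw a "7"
```

Because the condition is part of the discriminator's input as well, a realistic image of the wrong digit can still be rejected.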

Proposed Model

DeepPrivacy proposes a CGAN, which generates images based on the surroundings of the face and sparse pose information.

The overall model of DeepPrivacy. The face detection module detects faces while a pose estimation module generates sparse pose information. The pose information and the image (with the face obfuscated) are fed to a generative network. (Source: DeepPrivacy)

The official implementation uses the Dual Shot Face Detector (DSFD) to detect faces in the given image. A Mask R-CNN estimates seven keypoints that describe the pose of the face: the left and right eyes, the left and right ears, the left and right shoulders, and the nose. The detected face is then obfuscated, and the resulting image, along with the pose information, is fed to the generative network, which has a U-Net architecture and is trained with progressive growing.
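The two conditioning inputs described above can be sketched as follows: the face region is blanked out so the generator never sees the original face, and the seven keypoints are flattened into a pose vector. Everything here is a simplified assumption for illustration, including the function names, the fill value, the keypoint ordering, and the scaling of coordinates to [0, 1]. The official implementation may prepare its inputs differently.

```python
# Hypothetical ordering of the seven keypoints named in the text.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
]

def obfuscate_face(image, box, fill=0):
    """Return a copy of image with the box region replaced by fill.

    image: list of rows of pixel values
    box:   (x0, y0, x1, y1), with x1 and y1 exclusive
    """
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]  # copy so the original stays intact
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = fill
    return out

def pose_vector(keypoints, width, height):
    """Flatten seven (x, y) keypoints into 14 values scaled to [0, 1]."""
    vec = []
    for name in KEYPOINT_NAMES:
        x, y = keypoints[name]
        vec.extend([x / width, y / height])
    return vec

# Toy 6x4 "image" with a detected face box covering columns 1-3, rows 1-2.
image = [[1] * 6 for _ in range(4)]
masked = obfuscate_face(image, (1, 1, 4, 3))
keypoints = {name: (3, 2) for name in KEYPOINT_NAMES}
pose = pose_vector(keypoints, width=6, height=4)
```

The generator would then receive `masked` and `pose` as its condition, so the only information it has about the original face is where it was and how it was oriented.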

Results

Results obtained using DeepPrivacy. The left image in each pair is the original image, while the image on the right is its anonymized counterpart. In the original images, the red bounding box indicates the face detected by the DSFD. The red points are the facial keypoints detected by the Mask R-CNN.

DeepPrivacy generates relatively good-quality anonymized faces that fit the background and pose information. However, the anonymization process is non-deterministic, so the anonymized faces are not consistent from one image to the next, even when the original face is the same. This is evident in the video below.

A video anonymized using DeepPrivacy. Note the lack of temporal consistency in the anonymized face.

DeepPrivacy aims to generate an anonymized face without any emphasis on preserving auxiliary, non-identifying information that can be gleaned from the face. CLEANIR aims to address this shortcoming.