Face Unlock with 2D Data

Original article was published by Francesco Zuppichini on Deep Learning on Medium


A deep learning approach

All the code can be found here. An interactive version of this article can be downloaded from here

Today we are going to use deep learning to create a face unlock algorithm. To complete our puzzle, we need three main pieces.

  • an algorithm to find faces
  • a way to embed the faces in vector space
  • a function to compare the encoded faces

Find Faces

First of all, we need a way to find a face inside an image. We can use an end-to-end approach called MTCNN (Multi-task Cascaded Convolutional Networks).

A little technical background: it is called Cascaded because it is composed of multiple stages, and each stage has its own neural network. The following image shows the framework.

Image from https://arxiv.org/abs/1604.02878

We rely on the MTCNN implementation from the facenet-pytorch repo.

Data

We need images! I have put together a couple of images of myself, Leonardo DiCaprio and Matt Damon.

Following PyTorch best practices, I load the dataset using ImageFolder. I create the MTCNN instance and pass it to the dataset through the transform parameter.

My folder structure is the following:

The MTCNN will automatically crop and resize the input. I used image_size=160 because the model we are going to use was trained on images of that size. I also added 18 pixels of margin, just to be sure we include the whole face.
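A minimal sketch of this setup, assuming facenet-pytorch and torchvision are installed. The folder name `faces` and the helper name `build_face_dataset` are illustrative assumptions, not taken from the original code; only image_size=160 and margin=18 come from the text.

```python
def build_face_dataset(root="faces"):
    """Return an ImageFolder whose transform detects, crops and resizes the face.

    `root` is a hypothetical folder containing one sub-folder per person.
    """
    # Imported inside the function so the file still loads where these
    # packages are not installed.
    from facenet_pytorch import MTCNN
    from torchvision.datasets import ImageFolder

    # image_size and margin are the values quoted in the text.
    mtcnn = MTCNN(image_size=160, margin=18)

    # The MTCNN instance is used as the transform, so every item comes back
    # as a cropped face tensor. It returns None when no face is detected.
    return ImageFolder(root=root, transform=mtcnn)
```

Indexing the result, for example `build_face_dataset()[0]`, should give a `(face_tensor, label)` pair like the one printed below.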

(tensor([[[ 0.9023, 0.9180, 0.9180, ..., 0.8398, 0.8242, 0.8242], [ 0.9023, 0.9414, 0.9492, ..., 0.8555, 0.8320, 0.8164], [ 0.9336, 0.9805, 0.9727, ..., 0.8555, 0.8320, 0.7930], ..., [-0.7070, -0.7383, -0.7305, ..., 0.4102, 0.3320, 0.3711], [-0.7539, -0.7383, -0.7305, ..., 0.3789, 0.3633, 0.4102], [-0.7383, -0.7070, -0.7227, ..., 0.3242, 0.3945, 0.4023]], [[ 0.9492, 0.9492, 0.9492, ..., 0.9336, 0.9258, 0.9258], [ 0.9336, 0.9492, 0.9492, ..., 0.9492, 0.9336, 0.9258], [ 0.9414, 0.9648, 0.9414, ..., 0.9570, 0.9414, 0.9258], ..., [-0.3633, -0.3867, -0.3867, ..., 0.6133, 0.5352, 0.5820], [-0.3945, -0.3867, -0.3945, ..., 0.5820, 0.5742, 0.6211], [-0.3711, -0.3633, -0.4023, ..., 0.5273, 0.6055, 0.6211]], [[ 0.8867, 0.8867, 0.8945, ..., 0.8555, 0.8477, 0.8477], [ 0.8789, 0.8867, 0.8789, ..., 0.8789, 0.8633, 0.8477], [ 0.8867, 0.9023, 0.8633, ..., 0.9023, 0.8789, 0.8555], ..., [-0.0352, -0.0586, -0.0977, ..., 0.7617, 0.7070, 0.7461], [-0.0586, -0.0586, -0.0977, ..., 0.7617, 0.7617, 0.8086], [-0.0352, -0.0352, -0.1211, ..., 0.7227, 0.8086, 0.8086]]]), 0)

Perfect, the dataset returns a tensor. Let’s visualize all the inputs. They have been normalized by the MTCNN image-wise. The last three images of the last row are selfies of myself 🙂

The faces in our dataset, the last three are pictures of myself! 🙂 Image by Author

Embed

Our data pipeline is ready. To compare faces and find out if two are similar, we need a way to encode them in a vector space where, if two faces are similar, the two vectors associated with them are also similar.

We can use a model trained on one of the well-known face datasets, such as VGGFace2, and use the output of the last layer before the classification head (the latent space) as the encoder.

A model trained on a faces dataset must have learned important features about the inputs. The last layer (just before the fully connected layers) encodes the high-level features of these images. Thus, we can use it to embed our inputs in a vector space where, hopefully, similar images are close to each other.

In detail, we are going to use an Inception-ResNet trained on the VGGFace2 dataset. The embedding space has 512 dimensions.
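A minimal sketch of the embedding step, assuming facenet-pytorch's `InceptionResnetV1` with its `vggface2` pretrained weights. The helper name `embed_faces` and the batch size are illustrative assumptions, and the dataset is assumed to yield `(face_tensor, label)` pairs with a face found in every image.

```python
def embed_faces(dataset, batch_size=8):
    """Encode every face in `dataset` as a 512-dimensional vector."""
    # Imported inside the function so the file still loads where these
    # packages are not installed.
    import torch
    from torch.utils.data import DataLoader
    from facenet_pytorch import InceptionResnetV1

    # eval() switches off dropout and uses the stored batch-norm statistics.
    resnet = InceptionResnetV1(pretrained="vggface2").eval()
    loader = DataLoader(dataset, batch_size=batch_size)

    chunks = []
    with torch.no_grad():  # inference only, no gradients needed
        for faces, _labels in loader:
            chunks.append(resnet(faces))

    # One row per image, 512 columns.
    return torch.cat(chunks)
```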

torch.Size([8, 512])

Perfect, we had 8 images and we obtained 8 vectors.

Similarity

To compare our vectors, we can use cosine_similarity to see how close they are to each other. Cosine similarity outputs a value in the range [-1, 1]. In the trivial case, where the two compared vectors are the same, their similarity is 1. So the closer to 1, the more similar.

We can now compute the similarity between each pair in our dataset.
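To make the range concrete, here is a small dependency-free sketch of cosine similarity on toy vectors. It is written in plain Python only for illustration; on real embeddings one would normally call the PyTorch function instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, not real face embeddings.
same = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])         # about 1
opposite = cosine_similarity([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])  # about -1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])             # 0
```

Note that the score depends only on the direction of the vectors, not on their length.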
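The pairwise comparison can be sketched as a similarity matrix, where entry (i, j) is the cosine similarity between image i and image j. The toy 2D vectors and the name `pairwise_similarities` are assumptions for illustration; the real embeddings have 512 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pairwise_similarities(vectors):
    """Square matrix holding the similarity of every pair of vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for the face embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = pairwise_similarities(embeddings)
```

The matrix is symmetric with ones on the diagonal, which is why a heatmap of it shows a bright diagonal line.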

HeatMap showing all the faces in the dataset and their cosine similarities. Image by Author

Apparently, I am not very similar to Matt or Leo, but they have something in common!

We can go further and run PCA on the embeddings to project the images onto a 2D plane.

Images projected on a 2D plane using PCA. Image by Author

Take this image with a grain of salt. We are compressing 512 dimensions into 2, so we are losing a lot of information.

Okay, we have a way to find faces and to see if they are similar to each other. Now we can create our face unlock algorithm.

My idea is to take n images of the allowed person, find their center in the embedding space, select a threshold, and check whether the cosine similarity between the center and a new image is above or below it. If it is above the threshold, we unlock.
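This idea can be sketched in a few lines. The vectors below are toy values rather than real embeddings, and the threshold of 0.8 and the function names are hypothetical choices for illustration; a usable threshold has to be tuned on real data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of the enrolled embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def is_allowed(center, embedding, threshold):
    """Unlock only if the new embedding is similar enough to the center."""
    return cosine_similarity(center, embedding) >= threshold


# Toy 3-D stand-ins for the n embeddings of the allowed person.
enrolled = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.1]]
center = centroid(enrolled)

THRESHOLD = 0.8  # hypothetical value, not taken from the article

new_selfie = [0.95, 0.05, 0.05]  # points in roughly the same direction
stranger = [0.1, 0.9, 0.3]       # points somewhere else

unlocked_for_me = is_allowed(center, new_selfie, THRESHOLD)
unlocked_for_stranger = is_allowed(center, stranger, THRESHOLD)
```

A higher threshold makes the lock stricter: fewer impostors get in, but the owner is rejected more often.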

Let’s test it!

Lock. Image from https://www.timesofisrael.com/in-court-nso-group-accuses-facebook-of-lying-disregarding-international-law/
Lock. Image from https://en.wikipedia.org/wiki/Elon_Musk#/media/File:Elon_Musk_Royal_Society.jpg

People say I look like Rio from La Casa de Papel.

Lock. Image from https://falauniversidades.com.br/ator-la-casa-de-papel-elite-afastado-por-depressao-miguel-herran/

The similarity score was higher than the previous image, so I guess it is true!

Let’s try with a new selfie of mine.

Unlock. Image by Author

It worked, I am in.

Conclusions

We have seen an attractive way to create a face unlock algorithm using only 2D data (images). It relies on a neural network to encode the cropped faces in a high dimensional vector space where similar faces are close to each other. However, I don’t know how the model was trained and it may be easy to fool it (even if in my experiments the algorithm works well).

What if the model was trained without data augmentation? Then, probably, simply flipping an image of the same person could disrupt the latent representation.

A more robust training routine would be a self-supervised one (something like BYOL) that relies heavily on data augmentation.

Thank you for reading.

Francesco
