Do GANs Dream of Fake Images?

Source: Deep Learning on Medium


A deep-dive into image forensics: the effort of telling apart real images from fake ones

By Gidi Shperber
Barack Obama is one of the most popular characters for puppeteering

It is common knowledge nowadays that it's hard to tell real media from fake, be it text, audio, video or images.

Each type of media has its own forgery methods. And while faking text is (still) mostly done the old-fashioned way, faking images and videos has taken a great leap forward. Some of us even feel that we can no longer tell what's real and what's fake. Two years ago, the Photoshop Battles subreddit was the state of the art in image faking, and Photoshop experts were the wizards of the field; new techniques have changed things quite a bit since.

You've probably heard of some cutting-edge forgery methods that seriously threaten our perception of what is real and what is fake: the deepfake technique allows planting any face in any video, and various re-enactment techniques let you move any face however you want (make it change expressions, talk, etc.). And this is only the beginning.

Deep fake — planting Hillary Clinton on her impersonator
Re-enactment — making a face talk

Recently, I became very involved in this field when I started working with a great startup named Cyabra. Cyabra is an anti-fake-news and anti-bot startup, and as such, one of its tasks is telling fake images apart from real ones, a task known professionally as image forensics.

As you can expect, this is a challenging task. As in many cybersecurity and fraud-detection tasks, the protectors always seem to be one step behind the forgers. In image forgery, the situation is even more complicated, since new forging techniques are emerging on a daily basis.

So what is the right way to address this task? In every security task, the protector has to consider all known threats, as well as unknown threats the attacker may surprise them with.

Consider securing a house from burglars: you know a burglar can break in through the front door, a window, the back door, etc., so you put locks on all these entrances. But you should also install a motion detector, so that if the burglar enters through an unknown breach, you'll still be able to spot him.

In cybersecurity, and specifically in digital image forensics, this means you have to address all known forgery methods with the highest accuracy possible, along with some generalizable anomaly-detection method. And since, as mentioned, image forgery techniques are booming, the latter methods are becoming even more important.

What image tampering techniques are there?

First, let's discuss what we are dealing with.

Photo forgery is almost as old as photography itself. In the pictures below, you can see a tampered photograph from the 19th century:

More examples can be found in the digital forensics bible, a book by Hany Farid.

“Manual” forgery — Photoshopping

With the emergence of digital photography, image tampering became easier and more common. The most common methods are known as “splicing” and “copy-move”, or, as most people call them, “photoshopping”.

These methods involve taking a part of one image and planting it into another image (splicing) or into a different place in the same image (copy-move). To make the result look more realistic, forgers also do some digital retouching. The results may be very hard to tell apart from genuine images with the naked eye.

Retouching itself is also a tampering method, especially in fashion photos.

As mentioned, photoshopping is not always easy to detect, but using methods that will be discussed later, researchers were on top of things. However, technology made things much harder (or easier, depending on which side you are on). In an ironic twist of fate, deep learning, which became the leading method for image classification, also enabled a new, groundbreaking kind of forgery: the generative model.

Generative model-based methods

Since the emergence of generative adversarial networks (GANs) in 2014, it was clear that image forgery would never be the same. The ability of an algorithm to create an image (e.g. a face) literally from scratch was both amazing and frightening.

GANs results throughout the years (none of the above is a real person)

In his groundbreaking 2014 work, Ian Goodfellow showed, as a proof of concept, that it is possible to create realistic faces at a small scale (28 x 28). The concept quickly became reality: at the end of 2018, Nvidia researchers came up with StyleGAN, which can create hyper-realistic, high-resolution faces. But this is far from the only take on the GAN: researchers keep introducing tweaks that create new kinds of fake images. We can only imagine what the future holds for this technique, but meanwhile, let's look at a far-from-exhaustive list of other variants:

CycleGAN

In 2017, two amazing works came out of Alexei Efros's lab: pix2pix and CycleGAN. You can read about them in my previous post.

Both works allowed “copying” images from one domain to another: converting horses to zebras, dogs to cats and more.

Fake video — reenactment

Nicolas Cage as Marlon Brando in The Godfather

Apart from creating fake images, deep learning techniques went further and allowed creating fake videos.

The real tipping point for the field was (as on many other occasions) not technological but cultural. Around January 2018, high-quality fake videos started appearing on Reddit. Many of them planted Nicolas Cage's face into different scenes, but not all were SFW. This drew immediate media attention (along with apocalyptic prophecies) to the field.

But deepfakes are far from alone: there has recently been a surge in such techniques, which keep getting more realistic and easier to train. They range from the toy “face swap” app, through more serious work such as Face2Face, deepfakes and Synthesizing Obama, to Samsung's recent “few-shot talking heads”.

Currently, the most impressive work in this field (things are moving fast) is Deep Video Portraits. In this work, researchers use a multi-step approach to “puppeteer” a full face. They first extract face features such as pose, expression, eye coordinates and more from the source and target videos, then use an encoder-decoder to generate a fake video, allowing control not only of mouth movements and facial expressions but of head movements as well.

Deep video portrait

Digital forensics

Since this post is not about image generation but about detecting generated images, now that we've seen some forging techniques, let's examine what the defensive team has to offer. As briefly discussed earlier, digital forensic methods can be divided into three kinds, and a good detection operation should use a combination of all three:

  1. Feature-based: exploiting a kind of artifact that appears in one or more types of forgery. Mostly applicable to classic methods.
  2. Supervised learning: using deep learning classifiers (mostly CNNs) to learn certain types of fake images. Applicable to classic methods as well as to generative models, e.g. GANs.
  3. Unsupervised/universal: an attempt to capture some essence of a genuine image in order to detect new kinds of forgeries (that the model hasn't seen before). It can be seen as a kind of anomaly detection.
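The locks-plus-motion-detector idea from the burglar example maps naturally onto a detection pipeline that combines these kinds. Below is a minimal sketch, assuming each detector is a callable returning a "fake" score in [0, 1]; the detector names, the `combined_check` function and both thresholds are hypothetical, not taken from any specific system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical detector signature: takes an image, returns a "fake" score in [0, 1].
Detector = Callable[[Any], float]


@dataclass
class Verdict:
    is_fake: bool
    reason: str


def combined_check(
    image: Any,
    known_detectors: Dict[str, Detector],
    anomaly_detector: Detector,
    known_threshold: float = 0.5,
    anomaly_threshold: float = 0.9,
) -> Verdict:
    # "Locks": feature-based and supervised detectors, one per known forgery type.
    for name, detector in known_detectors.items():
        if detector(image) >= known_threshold:
            return Verdict(True, f"known forgery: {name}")
    # "Motion detector": a universal / anomaly-style score for unseen forgery types.
    if anomaly_detector(image) >= anomaly_threshold:
        return Verdict(True, "anomaly")
    return Verdict(False, "no detector fired")


# Usage with stand-in detectors (lambdas returning fixed scores).
verdict = combined_check(
    "some_image.jpg",
    {"splicing": lambda im: 0.2, "gan": lambda im: 0.7},
    anomaly_detector=lambda im: 0.1,
)
print(verdict)  # Verdict(is_fake=True, reason='known forgery: gan')
```

Checking the known-forgery detectors first gives a specific explanation when one of them fires. The higher anomaly threshold reflects an assumption that a universal detector is noisier and should only raise an alarm when it is quite sure.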

All of the above have their advantages and disadvantages, but as generative methods have become more realistic, kinds 2 and 3 have become more prominent.

The tampering techniques we've seen in the previous part are all (or mostly) publicly available, courtesy of the deep learning community. However, it doesn't have to stay this way forever: it is quite likely that in the coming years, different companies and regimes will have their own secret techniques.

Feature-based

In 2004, Hany Farid and Alin Popescu published the first work on identifying fake images using digital artifacts. Digital cameras leave various artifacts stemming from the camera hardware, its software, or the compression technique, and these are specific to each image. It was therefore only a matter of time until a digital method emerged to expose fake photos. Farid and Popescu used the traces of a specific camera filter, the color filter array (CFA), to identify fake parts of images.
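To make the CFA idea concrete: in a demosaiced image, the interpolated pixels are (near-)exact functions of their neighbors, and content spliced in from elsewhere breaks that periodic correlation. Popescu and Farid estimate these correlations with an EM algorithm; the sketch below is a much simpler stand-in that assumes a single green channel, a checkerboard Bayer layout and plain bilinear interpolation. The function names are hypothetical.

```python
import numpy as np


def _four_neighbour_mean(g):
    # Mean of the up/down/left/right neighbours; reflect padding keeps lattice parity.
    p = np.pad(g, 1, mode="reflect")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0


def demosaic_green(scene):
    """Simulate a camera: green is sampled where (row + col) is even (assumed
    Bayer layout) and the missing pixels are bilinearly interpolated."""
    g = np.asarray(scene, dtype=float).copy()
    rows, cols = np.indices(g.shape)
    missing = (rows + cols) % 2 == 1
    g[missing] = _four_neighbour_mean(g)[missing]
    return g


def cfa_score_map(g, block=16):
    """Per-block score: ~0 where one lattice is perfectly predicted from its
    neighbours (intact CFA interpolation), closer to 1 where it is not."""
    e = np.abs(g - _four_neighbour_mean(g))
    rows, cols = np.indices(g.shape)
    odd = (rows + cols) % 2 == 1
    nb_r, nb_c = g.shape[0] // block, g.shape[1] // block
    scores = np.zeros((nb_r, nb_c))
    for i in range(nb_r):
        for j in range(nb_c):
            sl = (slice(i * block, (i + 1) * block), slice(j * block, (j + 1) * block))
            e_blk, odd_blk = e[sl], odd[sl]
            e_odd, e_even = e_blk[odd_blk].mean(), e_blk[~odd_blk].mean()
            scores[i, j] = min(e_odd, e_even) / (max(e_odd, e_even) + 1e-12)
    return scores


rng = np.random.default_rng(0)
camera_image = demosaic_green(rng.normal(128, 30, (64, 64)))
forged = camera_image.copy()
forged[16:32, 16:32] = rng.normal(128, 30, (16, 16))  # pasted, non-CFA content
scores = cfa_score_map(forged)
print(scores.round(2))  # block (1, 1) stands out; untouched blocks score 0
```

Real cameras use more elaborate demosaicing and then compress the image, so the residual is never exactly zero in practice; that is why the original work models the correlations probabilistically rather than testing for exact equality.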

Since then, many more handcrafted techniques have emerged:

Using JPEG “signature”

JPEG is the most common image compression format in digital media. You can read about its specifics here. In short, every image has its own encoding, which optimizes the file size. It is possible to exploit the differences between the encodings of different images to detect forged (spliced) images. Here is an example of such work.

Splicing artifacts caused by JPEG compression
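One concrete JPEG "signature" is the 8x8 block grid: after compression, DCT coefficients computed on the original grid sit on multiples of the quantization step, so a patch spliced in with a shifted grid gives itself away. The toy sketch below assumes a grayscale image, a single uniform step `q` for all coefficients (real JPEG uses a per-frequency table), a known `q`, and no rounding back to 8-bit pixels (which in practice blurs the signal). It illustrates the principle only and is not the method of the cited work; all function names are hypothetical.

```python
import numpy as np


def dct_matrix(n=8):
    # Orthonormal DCT-II basis: coefficients = C @ block @ C.T, inverse = C.T @ coef @ C.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def jpeg_like(img, q=10.0, offset=(0, 0)):
    """Crude JPEG stand-in: 8x8 block DCT, one uniform quantization step q,
    inverse DCT. `offset` shifts where the block grid starts."""
    C = dct_matrix()
    out = np.asarray(img, dtype=float).copy()
    h, w = out.shape
    for r in range(offset[0], h - 7, 8):
        for c in range(offset[1], w - 7, 8):
            coef = C @ out[r:r + 8, c:c + 8] @ C.T
            out[r:r + 8, c:c + 8] = C.T @ (np.round(coef / q) * q) @ C
    return out


def grid_mismatch(img, q, offset):
    # Mean distance of DCT coefficients (in units of q) from the nearest multiple of q,
    # using blocks that start at `offset`: ~0 on the grid the image was compressed on.
    C = dct_matrix()
    h, w = img.shape
    devs = []
    for r in range(offset[0], h - 7, 8):
        for c in range(offset[1], w - 7, 8):
            coef = C @ img[r:r + 8, c:c + 8] @ C.T / q
            devs.append(np.abs(coef - np.round(coef)).mean())
    return float(np.mean(devs))


def estimate_grid_offset(img, q):
    candidates = [(r, c) for r in range(8) for c in range(8)]
    return min(candidates, key=lambda off: grid_mismatch(img, q, off))


rng = np.random.default_rng(0)
original = jpeg_like(rng.normal(128, 30, (96, 96)), q=10.0, offset=(0, 0))
donor = jpeg_like(rng.normal(128, 30, (96, 96)), q=10.0, offset=(3, 5))
forged = original.copy()
forged[24:56, 24:56] = donor[24:56, 24:56]  # spliced patch with a shifted grid
print(estimate_grid_offset(forged[64:96, 64:96], q=10.0))  # (0, 0)
print(estimate_grid_offset(forged[24:56, 24:56], q=10.0))  # (3, 5)
```

Scanning windows across the image and comparing their estimated grid offsets then points at regions whose compression history differs from the rest.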

Camera artifacts

As mentioned, digital cameras also leave their own artifacts, stemming from structures in the hardware or software. Each camera manufacturer, model or software version may have its own signature. Comparing such features, or the camera noise, across different parts of an image may result in a good classifier.
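Full camera-fingerprint methods need a reference pattern for each camera. A simpler cousin of the camera-noise idea only checks whether the noise level is consistent across the image, since a patch from another camera or processing chain often carries different noise. A minimal sketch, assuming a 3x3 mean filter as a crude denoiser; the block size, the `factor` threshold and the function names are hypothetical choices.

```python
import numpy as np


def noise_residual(img):
    # High-pass residual: image minus its 3x3 local mean (a crude denoiser).
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    p = np.pad(img, 1, mode="reflect")
    local_mean = sum(p[dr:dr + h, dc:dc + w] for dr in range(3) for dc in range(3)) / 9.0
    return img - local_mean


def noise_level_map(img, block=16):
    r = noise_residual(img)
    h, w = r.shape
    return np.array([[r[i:i + block, j:j + block].std()
                      for j in range(0, w - block + 1, block)]
                     for i in range(0, h - block + 1, block)])


def inconsistent_blocks(img, block=16, factor=2.0):
    """Flag blocks whose noise level differs from the image median by more than `factor`."""
    levels = noise_level_map(img, block)
    med = np.median(levels)
    return (levels > factor * med) | (levels < med / factor)


rng = np.random.default_rng(0)
photo = 100 + rng.normal(0, 2, (64, 64))                 # one camera's noise level
photo[16:32, 16:32] = 100 + rng.normal(0, 8, (16, 16))   # region from a noisier source
print(inconsistent_blocks(photo).astype(int))
```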

Photography artifacts

By the same logic, tampering with an image may distort the natural conditions of the photograph. A numeric measure of such a feature (e.g. the lighting or lens aberration) may also be useful for forgery detection.

Generative model artifacts

Current generative models also suffer from known artifacts, some of them even visible, such as various asymmetries, a strangely shaped mouth, asymmetric eyes and more. These artifacts are exploited in this work, which uses classic computer vision to separate fake images from real ones. However, considering the progress of generative models, these artifacts won't necessarily exist in tomorrow's forgeries.

Asymmetric eyes in GAN face
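As an illustration of the eye-asymmetry cue, here is a minimal sketch that compares the mean colors of the two irises. It assumes the iris regions have already been located (e.g. with a facial-landmark detector, not shown); the 30-unit threshold and the function names are hypothetical, and the cited work's actual features and classifier may differ.

```python
import numpy as np


def mean_color(patch):
    # Average RGB color of an H x W x 3 patch.
    return np.asarray(patch, dtype=float).reshape(-1, 3).mean(axis=0)


def eye_color_distance(left_iris, right_iris):
    return float(np.linalg.norm(mean_color(left_iris) - mean_color(right_iris)))


def mismatched_eyes(left_iris, right_iris, threshold=30.0):
    """Hypothetical rule: flag a face whose irises differ in mean color by > threshold."""
    return eye_color_distance(left_iris, right_iris) > threshold


brown = np.full((5, 5, 3), [100, 60, 30])
blue = np.full((5, 5, 3), [60, 100, 160])
print(mismatched_eyes(brown, blue))       # True  (distance ~141.8)
print(mismatched_eyes(brown, brown + 5))  # False (distance ~8.7)
```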

All the above methods are great and clever; however, once exposed, they can be circumvented by forgers. This is where machine learning comes in: being somewhat of a black box, it can learn to recognize fake images without the specifics of what it learned being known to forgers.

Supervised deep learning

With the rise of deep learning, it was only natural that researchers would start using deep networks to detect fake images.

The intuitive approach is to collect images from different classes of fakes and train classifiers and detectors on them. Let's examine some high-profile works.

Universal Image Manipulation Detection Using a New conv layer

In this work, researchers design a special convolutional layer which, by putting a constraint on its filters, is intended to capture manipulations rather than the image's semantic content. Tested on different retouching methods such as median filtering, Gaussian blurring and more, the method reached >95% accuracy. The work is intended to be universal; however, its design arguably limits it to photoshopping and similar tampering, not GANs and the like.
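As I understand the constraint in this work (Bayar and Stamm), after each training step every first-layer filter is projected so that its center weight is -1 and its remaining weights sum to 1. The filter then predicts each pixel from its neighbors and outputs the prediction error, so smooth image content is suppressed. A minimal sketch of that projection and of what it does to flat content; `constrain_filter` and `correlate_valid` are hypothetical names, and the training loop is not shown.

```python
import numpy as np


def constrain_filter(w):
    """Project a k x k filter (k odd) onto the assumed constraint:
    centre weight = -1, remaining weights sum to 1, so the whole filter sums to 0."""
    w = np.array(w, dtype=float)
    c = w.shape[0] // 2
    w[c, c] = 0.0
    total = w.sum()
    if abs(total) < 1e-12:
        raise ValueError("off-centre weights sum to zero; cannot normalise")
    w /= total
    w[c, c] = -1.0
    return w


def correlate_valid(img, w):
    # Plain 2-D cross-correlation (what a conv layer computes), 'valid' borders only.
    k = w.shape[0]
    h, wd = img.shape
    out = np.zeros((h - k + 1, wd - k + 1))
    for dr in range(k):
        for dc in range(k):
            out += w[dr, dc] * img[dr:dr + h - k + 1, dc:dc + wd - k + 1]
    return out


rng = np.random.default_rng(1)
w = constrain_filter(rng.uniform(0.1, 1.0, (5, 5)))
flat = np.full((10, 10), 7.0)
print(w[2, 2])                                   # -1.0
print(np.allclose(correlate_valid(flat, w), 0))  # True: flat content is suppressed
```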

MesoNet

MesoNet is a work focused on perhaps the most painful problem: tampering with faces in videos, specifically Face2Face and deepfakes (see above). Since a digital video is essentially a sequence of images, the researchers address this task with a deep network and reach good results using fairly standard networks. All in all, the work is not very special, apart from being one of the first to address this task.
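Treating a video as a sequence of images still leaves the question of how to turn per-frame predictions into a verdict for the whole clip. One simple option is to average the per-frame "fake" probabilities and threshold the mean. This is an assumption for illustration, not necessarily the paper's exact procedure, and the function names and 0.5 threshold are hypothetical.

```python
from statistics import mean
from typing import Iterable


def video_fake_score(frame_scores: Iterable[float]) -> float:
    """Average per-frame 'fake' probabilities into one score for the whole clip."""
    scores = list(frame_scores)
    if not scores:
        raise ValueError("need at least one frame score")
    return mean(scores)


def is_fake_video(frame_scores: Iterable[float], threshold: float = 0.5) -> bool:
    return video_fake_score(frame_scores) >= threshold


# Hypothetical per-frame outputs of a frame-level classifier on two clips.
print(is_fake_video([0.9, 0.7, 0.4]))  # True  (mean ~0.67)
print(is_fake_video([0.1, 0.6, 0.2]))  # False (mean 0.3)
```

Averaging smooths out individual frames where the forgery happens to look convincing, at the cost of diluting short tampered segments inside long genuine clips.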

GAN experiment

As we've seen earlier, the GANs just keep coming. It is not very hard to train a model that tells real images apart from those of a specific GAN. But it is also impractical to train a model for each GAN.

Some researchers were optimistic enough to try to generalize a classification model to GANs other than the one it was trained on. In other words, they trained a deep network on one type of GAN (PG-GAN, the predecessor of StyleGAN) and tried to run inference on others (DC-GAN, WGAN).

The results, as expected, showed no generalization whatsoever, even with the preprocessing the researchers applied.

There are many more similar methods, but to generalize, meaning to identify forgeries never seen before, research needs to become more creative.

Universal methods

As we know, deep learning is not limited to naive classifiers. There are many kinds of unsupervised, self-supervised and semi-supervised models that can handle small amounts of data, few-shot settings and other tasks.

Let’s see what kind of ideas are used to address the problem of “universal fake images”.

Self-consistency

A good example of such a method can be found in the work Fighting Fake News: Image Splice Detection via Learned Self-Consistency. This article comes from the lab of Alexei Efros, who is best known for his work on self-supervised techniques, and it shares some elements with his earlier works. The researchers use a 4-step workflow in which they:

  1. Learn to predict the EXIF* metadata of an image.
  2. Slice the image into many patches and compare the predicted EXIF values of each pair.
  3. Classify patches whose EXIF values don't match as coming from different images, in which case the image is fake.
  4. Use these classifications to produce a heat map of the transplanted regions.
A detection of “spliced” Keanu Reeves

But what is EXIF? EXIF (Exchangeable Image File Format) is a metadata standard for digital images (and some sound files as well), a kind of metadata signature of the file. An image's EXIF should include the camera (or scanner) model, the original image size, photo attributes (flash, exposure time) and more.

Clearly, not all online photos have intact EXIF data, especially not the fraudulent ones. That's exactly why the researchers chose to predict the EXIF of each image or patch. This makes the task essentially self-supervised, since there is a plethora of online images with readily available EXIF to learn from; to be exact, 400K images were used to train this model.
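Steps 2 to 4 of the workflow can be sketched as follows, assuming we already have a predicted EXIF guess for every patch. In the actual paper a network learns pairwise consistency directly from pixels; here the per-patch predictions are hypothetical inputs, consistency is simply the fraction of agreeing fields, and the 0.5 threshold and function names are assumptions.

```python
from itertools import combinations
from typing import Dict, List

# Hypothetical per-patch predictions, e.g. {"model": "X100", "flash": "off"}.
ExifGuess = Dict[str, str]


def pair_consistency(a: ExifGuess, b: ExifGuess) -> float:
    """Fraction of shared EXIF fields on which two patches agree."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)


def consistency_heatmap(patches: List[ExifGuess]) -> List[float]:
    """Mean consistency of each patch with all others; low values mark suspect patches."""
    n = len(patches)
    if n < 2:
        raise ValueError("need at least two patches")
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        c = pair_consistency(patches[i], patches[j])
        totals[i] += c
        totals[j] += c
    return [t / (n - 1) for t in totals]


def looks_spliced(patches: List[ExifGuess], threshold: float = 0.5) -> bool:
    return min(consistency_heatmap(patches)) < threshold


camera = {"model": "X100", "flash": "off", "exposure": "1/60"}
donor = {"model": "P30", "flash": "on", "exposure": "1/250"}
patches = [camera, camera, camera, donor]
print(consistency_heatmap(patches))  # last patch scores 0.0, the others 2/3
print(looks_spliced(patches))        # True
```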

This model reached good results on photoshopped images, but surprisingly, it also had some success on GAN-generated images.

Forensic transfer

Luisa Verdoliva is an Italian researcher who, along with her team, has taken some interesting shots at generalizing image forensics. In this work, they train a somewhat different model that will hopefully generalize better. They use an autoencoder, a network that “shrinks” an image into a vector and then reconstructs the image from that vector. The vector is also trained to act as the classifier that determines whether the image is real or fake:

A scheme of the forensic transfer autoEncoder
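As I read the ForensicTransfer paper, the latent vector is split into a "real" part and a "fake" part, and the decision comes from which part is more strongly activated. A minimal sketch of that decision rule only (the encoder, decoder and training losses are not shown); which half counts as "real" is an assumed convention, and `latent_decision` is a hypothetical name.

```python
import numpy as np


def latent_decision(z):
    """Classify from a latent vector split into two halves: the first half is
    assumed to be the 'real' subspace and the second the 'fake' subspace.
    The half with the larger L1 activation wins."""
    z = np.asarray(z, dtype=float).ravel()
    if z.size % 2:
        raise ValueError("latent size must be even")
    half = z.size // 2
    act_real = np.abs(z[:half]).sum()
    act_fake = np.abs(z[half:]).sum()
    label = "fake" if act_fake > act_real else "real"
    confidence = max(act_real, act_fake) / (act_real + act_fake + 1e-12)
    return label, float(confidence)


print(latent_decision([3.0, 2.0, 0.1, 0.0]))   # ('real', ~0.98)
print(latent_decision([0.0, 0.0, -1.0, 2.0]))  # ('fake', ~1.0)
```

Because the decision lives in the latent space, adapting to a new dataset can mean fine-tuning the encoder on a handful of examples, which fits the transfer experiments described next.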

They also experiment with transfer learning: they train the network on dataset A, fine-tune it on a small subset of dataset B and then run inference on dataset B.

They repeat this on a few datasets, back and forth, and reach reasonable results (75–85% accuracy), which are better than those of other networks (some of which are discussed in the supervised learning part above).

ForensicTransfer — an example transfer learning results

Noise print

Another unsupervised approach from the same team, Noiseprint, is similar to self-consistency: it tries to predict PRNU noise (a specific type of camera noise) and compare it between image patches. It reports state-of-the-art results on multiple datasets (an average Matthews correlation coefficient of 0.4, compared to 0.33 for self-consistency).

A set of predictions: noise print on the right. EXIF-SC is self-consistency.
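The 0.4 and 0.33 figures are Matthews correlation coefficients (MCC). For localization, predictions and ground truth are typically binary masks (tampered vs. pristine pixel), and MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from −1 to 1, with 0 meaning no better than chance. A minimal sketch on flattened masks; how exactly the papers threshold and average the score per image is not covered here.

```python
from math import sqrt
from typing import Sequence


def matthews_corrcoef(pred: Sequence[int], truth: Sequence[int]) -> float:
    """MCC for binary labels (1 = tampered pixel, 0 = pristine)."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal is empty (e.g. an all-zero prediction).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


truth = [1, 1, 0, 0, 0, 0]  # flattened ground-truth tampering mask
pred = [1, 0, 1, 0, 0, 0]   # flattened predicted mask
print(matthews_corrcoef(pred, truth))  # 0.25
```

Unlike plain pixel accuracy, MCC is not inflated by the large pristine background, which matters when the tampered region covers only a small part of the image.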

The deep fake spin

Considering all the above, it seems that universal methods have to address generative fakes more aggressively. And they do: some researchers have taken the chance and attempted to create a kind of general GAN hunter, that is, a model able to identify a GAN-generated image without being trained specifically on its kind. Let's look at a few of them:

Learning to Detect Fake Face Images in the Wild

In a somewhat hasty article (only 4 pages), researchers use uncoupled pairs of images to train a deep network to classify each pair as same or different (real and real, fake and fake, real and fake). This somewhat naive approach reaches reasonable results on different GANs, although training took place on all of them.
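The pairwise training data for such a same/different network can be built by enumerating image pairs and labeling each pair by whether both images share a class. A minimal sketch of that pair construction only (the network itself is not shown); `make_pairs` and the image ids are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def make_pairs(images: List[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """images: (image_id, label) tuples with label 'real' or 'fake'.
    Returns (id_a, id_b, same) with same = 1 when both images share a label."""
    return [(a, b, int(label_a == label_b))
            for (a, label_a), (b, label_b) in combinations(images, 2)]


pairs = make_pairs([("r1", "real"), ("r2", "real"), ("f1", "fake")])
print(pairs)  # [('r1', 'r2', 1), ('r1', 'f1', 0), ('r2', 'f1', 0)]
```

The number of pairs grows quadratically with the dataset, so in practice one would sample pairs rather than enumerate all of them.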

Do GANs leave artificial fingerprints?

In this paper, Luisa Verdoliva uses her favorite noise print (PRNU) to try to characterize a few different GAN models (PG-GAN, CycleGAN) with different training datasets. The authors succeed and show that every GAN (down to the training-set level) indeed has its own noise fingerprint, similar to what a camera has. Unfortunately, the usefulness of this method for detecting fake images is mostly theoretical, at least for now.
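The fingerprint idea borrows the PRNU recipe: extract a noise residual from each image, average the residuals of many images from the same source so that content cancels out, then attribute a new image to the fingerprint its residual correlates with most. A minimal sketch, assuming a 3x3 mean filter in place of a proper denoiser and synthetic "GAN outputs" made of random content plus a fixed per-GAN pattern; all names and data here are hypothetical.

```python
import numpy as np


def residual(img):
    # Noise residual: image minus a 3x3 local mean (stand-in for a proper denoiser).
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    p = np.pad(img, 1, mode="reflect")
    return img - sum(p[dr:dr + h, dc:dc + w] for dr in range(3) for dc in range(3)) / 9.0


def fingerprint(images):
    # Averaging many residuals cancels image content and keeps the shared pattern.
    return np.mean([residual(im) for im in images], axis=0)


def ncc(a, b):
    # Normalized cross-correlation between two arrays of the same shape.
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))


def attribute(img, fingerprints):
    """Return the name of the fingerprint most correlated with the image's residual."""
    r = residual(img)
    return max(fingerprints, key=lambda name: ncc(r, fingerprints[name]))


rng = np.random.default_rng(0)
pattern = {"gan_a": rng.normal(0, 3, (64, 64)), "gan_b": rng.normal(0, 3, (64, 64))}


def fake_image(name):
    # Hypothetical "GAN output": random content plus that GAN's fixed pattern.
    return rng.normal(128, 10, (64, 64)) + pattern[name]


fps = {name: fingerprint([fake_image(name) for _ in range(50)]) for name in pattern}
print(attribute(fake_image("gan_b"), fps))  # gan_b
```

Attribution only works for GANs whose fingerprint has already been estimated, which is one reason the practical value for detecting unseen fakes remains limited.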

Taxonomy of work

Eventually, we can cross the forging methods with the detection methods and fit most works into one (or more) of the resulting boxes:

At Cyabra, we face many of the above challenges, so we follow a similar strategy: we use supervised methods to detect known forgeries, along with tweaking and testing in the hope of generalizing to other forgeries.

Perhaps surprisingly, we have found that some of the general or semi-general methods can be unexpectedly effective at generalization. For example, we found that the self-consistency work (with some tweaking) can be effective for classifying GAN-created images.

Summary

So this is it: if you've reached this point, you have successfully traversed the bubbly field of image and video forensics.

As we've seen above, most of the work focuses on methods for classifying specific types of tampering, while the general methods are new and still sparse.

It is clear that this field will soon cease to be the narrow domain of researchers and enthusiasts and will start to concern ordinary people, who would like to regain their ability to tell what's true and what's fake.

The arms race is not going to stop any time soon, but we should expect to see the forensic people get their act together and come up with some better methods to fight back the forgers.

* notable works in supervised learning.

I hope you've enjoyed reading this review! Feel free to follow me, and check out my website: www.shibumi-ai.com