Facebook’s Red Team

The original article was published by Aarsh Kariya on Artificial Intelligence on Medium.


Photo courtesy of Alex Haney on Unsplash

Instagram encourages its billion or so users to add filters to their photos to make them more shareable. In February 2019, some Instagram users began editing their photos with a different audience in mind: Facebook’s automated porn filters.

Facebook depends heavily on moderation powered by AI. Some users found they could sneak past Instagram’s filters by overlaying patterns such as grids or dots on rule-breaking displays of skin, which meant more work for Facebook’s human content reviewers.

Facebook’s AI engineers responded by training their system to recognize banned images overlaid with such patterns, but the fix was short-lived: users adapted by switching to different patterns. Manohar Paluri, who leads work on computer vision at Facebook, and his team eventually tamed the problem of AI-evading nudity by adding another machine learning system that checks photos for patterns such as grids and tries to edit them out by emulating nearby pixels.
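Facebook hasn’t published the details of that pipeline, but detecting an overlay and then filling the masked pixels in from their surroundings is essentially image inpainting. Below is a minimal sketch of the second step only, assuming a binary mask of the detected grid already exists and using OpenCV’s generic Telea inpainting rather than anything Facebook-specific:

```python
import cv2
import numpy as np

def remove_overlay(image_bgr: np.ndarray, overlay_mask: np.ndarray) -> np.ndarray:
    """Fill in pixels flagged as overlay (grid/dot pattern) by interpolating
    from the surrounding, unmasked pixels.

    image_bgr:    H x W x 3 uint8 image.
    overlay_mask: H x W uint8 mask, nonzero where an overlay was detected
                  (assumed here to come from a separate detector).
    """
    # Telea inpainting estimates each masked pixel from nearby known pixels,
    # which approximates the "emulate nearby pixels" behaviour described above.
    return cv2.inpaint(image_bgr, overlay_mask,
                       inpaintRadius=3, flags=cv2.INPAINT_TELEA)

# Toy usage: synthesize a grid overlay on a flat gray image, then remove it.
img = np.full((256, 256, 3), 127, dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)
mask[::16, :] = 255   # horizontal grid lines
mask[:, ::16] = 255   # vertical grid lines
img[mask > 0] = 255   # draw the overlay onto the image
cleaned = remove_overlay(img, mask)
```

The hard part in practice is the detector that produces the mask; the fill-in step shown here is comparatively simple.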

This cat-and-mouse incident helped prompt Facebook a few months later to create an AI Red Team to better understand the vulnerabilities and blind spots of its AI systems.

The work of protecting AI systems bears similarities to conventional computer security. Facebook’s AI red team gets its name from a term for exercises in which hackers working for an organization probe its defenses by role-playing as attackers. They know that any fixes they deploy may be side-stepped as their adversaries come up with new tricks and attacks.

The growing investment in AI security mirrors how Facebook, Google, and others are also thinking harder about the ethical consequences of deploying AI. Both problems have roots in the fact that, despite its usefulness, existing AI technology is narrow and inflexible; it can’t adapt to unforeseen circumstances the way people can.

A growing library of machine learning research papers documents tricks like altering just a few pixels in a photo to make AI software hallucinate and detect objects that are not present. One study showed that a Google image-recognition service could be fooled into categorizing a rifle as a helicopter; another 3D-printed objects with a multifaceted shape that made them invisible to the lidar software of a prototype self-driving car from China’s Baidu. Other attacks include “data poisoning,” in which an adversary alters the data used to train a machine learning algorithm in order to compromise its performance.
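The pixel-level attacks in that literature are often generated with gradient-based methods such as the fast gradient sign method (FGSM). The studies above used their own techniques, but a generic FGSM sketch in PyTorch, assuming any differentiable classifier and a toy stand-in model, illustrates how small the perturbation can be:

```python
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, x: torch.Tensor, label: torch.Tensor,
                 epsilon: float = 0.03) -> torch.Tensor:
    """Return an adversarially perturbed copy of x using the fast gradient
    sign method: nudge every pixel by +/- epsilon in the direction that
    increases the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), label)
    loss.backward()
    # One signed gradient step, then clamp back to the valid pixel range.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

# Toy usage with a stand-in linear classifier (not any production model):
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)           # a random "photo"
label = torch.tensor([3])                  # its supposed class
adversarial = fgsm_perturb(model, image, label)
print((adversarial - image).abs().max())   # perturbation bounded by epsilon
```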

Facebook’s AI red team is led by Cristian Canton, a computer-vision expert who joined the company in 2017 and ran a group working on image-moderation filters. He was proud of his team’s work on AI systems that detect banned content such as child pornography and violence, but he began to wonder how robust they really were.

In 2018, Canton organized a risk-a-thon in which people from across Facebook spent three days competing to find the most striking way to trip up those systems. Some teams found weaknesses that Canton says convinced him the company needed to make its AI systems more robust.

One team at the contest showed that using different languages within a post could befuddle Facebook’s automated hate-speech filters. A second team discovered the attack used in early 2019 to spread porn on Instagram, but at the time it wasn’t considered an immediate priority to fix. “We forecast the future,” Canton says. “That inspired me that this should be my day job.”

The red team’s weightiest project aims to better understand deepfakes, imagery generated using AI that looks like it was captured with a camera. The results show that preventing AI trickery isn’t easy.

Facebook’s AI red team launched a project called the Deepfakes Detection Challenge to spur advances in detecting AI-generated videos. It paid 4,000 actors to star in videos featuring a variety of genders, skin tones, and ages. After Facebook engineers turned some of the clips into deepfakes by swapping people’s faces around, developers were challenged to create software that could spot the simulacra.
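Facebook has not described how the winning detectors work, but a common baseline in the challenge was a per-frame binary classifier applied to frames sampled from each clip. The sketch below is purely illustrative of that setup; the tiny architecture and the function names are hypothetical, and real entries used much larger pretrained backbones:

```python
import torch
import torch.nn as nn

class FrameDeepfakeClassifier(nn.Module):
    """Tiny illustrative CNN that scores a single video frame as
    real (logit < 0) or fake (logit > 0)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, H, W) -> one logit per frame
        return self.head(self.features(frames).flatten(1)).squeeze(1)

def score_video(model: nn.Module, sampled_frames: torch.Tensor) -> float:
    """Average per-frame fake probabilities into one score for the clip."""
    with torch.no_grad():
        return torch.sigmoid(model(sampled_frames)).mean().item()

model = FrameDeepfakeClassifier()
frames = torch.rand(8, 3, 224, 224)   # 8 frames sampled from one clip
print(score_video(model, frames))     # estimated probability the clip is fake
```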

The results, released last month, show that the best algorithm could spot deepfakes that were not in Facebook’s collection only 65 percent of the time. That suggests Facebook isn’t likely to be able to reliably detect deepfakes anytime soon. “It’s a really hard problem, and it’s not solved,” Canton says.

Canton’s team is now examining the robustness of Facebook’s misinformation detectors and political ad classifiers. “We’re trying to think very broadly about the pressing problems in the upcoming elections,” he says.

Biological intelligence will still be needed, since adversaries will keep inventing new tricks; the human in the loop is still going to be an important component.