Breaking Machine Learning With Adversarial Examples

Source: Deep Learning on Medium

Machine learning is at the forefront of AI. With applications to computer vision, natural language processing, and more, ML has enormous implications for the future of tech! However, as our reliance on ML increases, so do our concerns about ML security.

I’m not talking about a robot uprising, but rather something much more realistic — the threat of Adversarial Examples.

What are Adversarial Examples?

In short, Adversarial Examples are model inputs that are specifically designed to fool ML models (e.g. neural networks). What’s scary about this is that adversarial examples are nearly identical to their real-life counterparts: by adding a small amount of “Adversarial Noise” to a source image, an adversarial example can be made indistinguishable from the unaltered image to the human eye!

Figure: Adversarial example of a panda (right) misclassified as a gibbon

So how do we generate Adversarial Examples?

While many different methods can be used to generate Adversarial Examples, I’ll be focusing on one called the Fast Gradient Sign Method (FGSM) for this article. For the sake of simplicity, I’ll walk through how this method can be used to generate an Adversarial Example for an Image Classification task.

FGSM Equation
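
Written out, the targeted form of the update described in this section would look like the following. This is a reconstruction: the symbols $J$ (loss function), $\theta$ (model parameters), $x$ (input image) and $x_{adv}$ (adversarial image) are notation I’m assuming here.

$$x_{adv} = x - \epsilon \cdot \operatorname{sign}\left(\nabla_x J(\theta, x, y_{target})\right)$$

Since each entry of the sign term is $-1$, $0$ or $+1$, a single step changes every pixel by at most $\epsilon$, i.e. $\lVert x_{adv} - x \rVert_\infty \le \epsilon$.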

Let’s take a second to break down the FGSM equation into a few steps.

  1. The loss function is computed against a target label y_target: the class I want my model to misclassify my input image as.
  2. An input image is fed into my model, and the loss between the model’s prediction and y_target is computed with the loss function mentioned above.
  3. The gradient of my loss function with respect to my input image is computed.
  4. To restrict how much I change my image, I take the sign of every value in my gradient (which has the same shape as the input image) and multiply it by the hyperparameter/constant epsilon (usually a small number, to limit image change).
  5. Subtract the expression above from my input image. The gradient of the loss function points in the direction that increases the loss, so subtracting moves the image in the direction that decreases the loss against y_target, i.e. toward being classified as the target.

This update is then repeated multiple times (strictly speaking, an iterative variant of FGSM) until the model classifies the image as the target label.

Code snippet for the FGSM equation (keras backend)
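
The steps above can also be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Keras code referenced in the caption: the model is a toy linear softmax classifier whose gradient can be derived by hand, and `WEIGHTS`, `predict`, `predicted_class`, `loss_gradient_wrt_input` and `targeted_fgsm` are all hypothetical names made up for this example.

```python
import math

# Hypothetical toy model: 2 classes, 4 "pixels". Class 0 likes bright pixels
# on the left, class 1 likes bright pixels on the right.
WEIGHTS = [
    [1.0, 1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0, 1.0],
]


def predict(weights, x):
    """Class probabilities of a linear softmax classifier."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_class(weights, x):
    """Index of the most probable class."""
    probs = predict(weights, x)
    return probs.index(max(probs))


def loss_gradient_wrt_input(weights, x, target):
    """Gradient of the cross-entropy loss against `target` w.r.t. the input.

    For a linear softmax model this is W^T (p - onehot(target)).
    """
    probs = predict(weights, x)
    errors = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    return [
        sum(errors[k] * weights[k][i] for k in range(len(weights)))
        for i in range(len(x))
    ]


def sign(value):
    """Return -1, 0 or 1 depending on the sign of `value`."""
    return (value > 0) - (value < 0)


def targeted_fgsm(weights, x, target, epsilon=0.03, max_steps=100):
    """Repeat the FGSM step until the model predicts `target` (or we give up)."""
    x_adv = list(x)
    for _ in range(max_steps):
        if predicted_class(weights, x_adv) == target:
            break
        grad = loss_gradient_wrt_input(weights, x_adv, target)
        # Subtract epsilon * sign(gradient), then clip to the valid
        # pixel range [0, 1] (the clipping is an assumption of this sketch).
        x_adv = [
            min(1.0, max(0.0, xi - epsilon * sign(g)))
            for xi, g in zip(x_adv, grad)
        ]
    return x_adv


image = [0.6, 0.6, 0.4, 0.4]
adversarial = targeted_fgsm(WEIGHTS, image, target=1)
print("original class:   ", predicted_class(WEIGHTS, image))
print("adversarial class:", predicted_class(WEIGHTS, adversarial))
print("adversarial image:", adversarial)
```

With a real neural network, the gradient in `loss_gradient_wrt_input` would come from the framework’s automatic differentiation rather than a hand-derived formula; the rest of the loop stays the same in spirit.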

Using this technique, I was able to create an adversarial example of a dog, where I successfully tricked the MobileNet image classifier into misclassifying my image as a frog!

Admittedly, the adversarial example in this case has more visible differences from the original image, but this can be improved with additional fine-tuning.

Adversarial Attacks, and their Implications

While I’ve mainly focused on the effects of adversarial examples on image classifiers, adversarial examples can be used to cause massive damage in a lot of different scenarios. Take self-driving cars as an example. If a stop sign were modified in some way (stickers, paint, etc.) so that it is recognized incorrectly, an autonomous car might fail to stop, with severe consequences.

Furthermore, adversarial examples work outside of computer vision. Should they be applied to fields such as natural language processing, an input sentence might be recognized as something entirely different (imagine the consequences of that!).

In short, adversarial attacks pose a very real threat to AI security. Research on defenses is ongoing (e.g. generative defense networks, input transformations), but there is still no single solution against adversarial attacks. Ultimately, while it’s important to look forward to the future of AI development, we also need to stay aware of the potential problems it brings.
