How to Systematically Fool an Image Recognition Neural Network


and why it matters… a lot

Convolutional neural networks (CNNs) form the basis of image recognition, which is undoubtedly one of the most important applications of deep learning. Unfortunately, much of the research in deep learning is done under the ‘perfect-world’ constraints of benchmark datasets, in pursuit of a few percentage points of accuracy. Thus, we’ve developed architectures that work tremendously well on benchmarks but not necessarily in the real world.

Adversarial examples or inputs (think adversary: enemy) are indistinguishable from regular images to the human eye, but can completely fool a variety of image recognition architectures. There are clearly many unsettling and dangerous implications of adversarial inputs being deployed, especially as AI is given more power to make decisions for itself.

Thus, it is important to understand methods of systematically producing adversarial inputs so that we can defend against them: ethical hacking, applied to deep learning.

One simple approach to systematically generating adversarial inputs is the ‘fast gradient sign method’ (FGSM), introduced by Goodfellow et al. in the paper ‘Explaining and Harnessing Adversarial Examples’.

Consider:

  • an input vector x (this is where the input information is — the image — but think of it as a one-dimensional list)
  • an adversarial input x-hat (same shape as x, but with altered values)
  • a perturbation vector η (‘eta’), which is added to the input vector to produce the adversarial input vector

In order to perform element-by-element multiplication and summing (e.g. [1,2,3] × [1,2,3] = 1+4+9 = 14), we multiply the transpose of the first vector by the second vector. This will be referred to as the ‘weighted sum’.
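
This ‘weighted sum’ is simply the dot product of the two vectors. A minimal NumPy sketch of the operation:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

# Element-by-element multiplication, then summing: 1 + 4 + 9 = 14.
weighted_sum = np.dot(a, b)  # equivalent to a.T @ b for 1-D vectors
print(weighted_sum)          # 14
```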

We have two goals here we must both achieve to generate an adversarial input:

  • We want to maximize the difference between the weighted sum of the original input vector and the weighted sum of the perturbed (altered) adversarial input. This will shift the activations and throw off the model’s decision-making process.
  • We want to make each individual value of the perturbation vector η as small as possible, such that the overall image appears unaltered to the human eye.

The solution introduced by Goodfellow et al. is two-pronged — and quite clever for a few reasons.

η is set to sign(w), where the sign function returns -1 for negative values and 1 for positive values (0 for 0). If a weight is negative, it is multiplied by negative one to contribute positively to the sum; if a weight is positive, it is multiplied by positive one and left unchanged.

For example, if the weight vector were [3,-5,7], η would be [1,-1,1]. The weighted sum is 3+5+7=15. Note that performing this operation essentially flips the negatives into positives and leaves the positives alone (in effect, the abs() function). Every term is therefore as large as it can be, giving the highest possible weighted sum achievable when each element of η lies within the interval [-1, 1].
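
A quick NumPy check of this example, using the same numbers as above:

```python
import numpy as np

w = np.array([3, -5, 7])   # weight vector
eta = np.sign(w)           # perturbation direction: [ 1, -1,  1]

# Each term of the dot product becomes |w_j|, so the sum is as large
# as it can be for any eta whose elements lie in [-1, 1].
print(np.dot(w, eta))      # 3 + 5 + 7 = 15
print(np.abs(w).sum())     # 15, the same value
```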

Consider some ‘images’ below. Although they are represented two-dimensionally, think of them just as one-dimensional vectors.

[Image created by author.]

The end sum is 10, which is a large departure from the original output, -7. Surely, this will screw up the network’s predictions.

This achieves the goal of making large changes, but it isn’t very discreet at all. After all, our image changes noticeably when we perturb it:

[Image created by author.]

Remember that our earlier representation of the final sum as w(x) + w(η), where w(·) is the weighted sum and η is the perturbation vector, is really an expansion of w(x + η). We want to change each pixel’s value only slightly: while the total effect must be maximized, each element of η must be small enough to be unnoticeable.
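
Since the weighted sum is linear, that identity holds exactly. A quick numeric check (the vectors here are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)    # placeholder weights
x = rng.normal(size=5)    # placeholder input
eta = np.sign(w)          # perturbation

# w(x + eta) expands to w(x) + w(eta) by linearity.
print(np.dot(w, x + eta))
print(np.dot(w, x) + np.dot(w, eta))  # identical value
```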

In the actual production of an adversarial input, pixel number j is defined as the jth value of x plus the jth value of η. The notation first introduced takes a bit of a shortcut to demonstrate the purpose of η, which is to heavily increase the collective sum, not necessarily individual pixel values.

Each element of η is fairly large: +1 or -1, which makes a big impact on properly scaled data. To solve this, we multiply each element of η by ϵ (epsilon), where ϵ is the smallest numerical change a sensor can detect (or smaller). For 8-bit color there are 256 intensity levels, and hence ϵ = 1/255 on data scaled to [0, 1].

Since ϵ is ‘undetectable’ (or just barely so), it should make no difference visually to the image. However, each change is built — following the sign function — such that the change in weighted sum is maximized.

Hence, we add -ϵ or +ϵ to each element of the input vector: a change small enough to be undetectable, but constructed with the sign function such that the shift in the weighted sum is maximized.
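
Putting the pieces together, here is a sketch of constructing the adversarial input in the linear case (the pixel and weight values below are made up for illustration):

```python
import numpy as np

epsilon = 1 / 255                 # one 8-bit quantization step
w = np.array([3.0, -5.0, 7.0])    # illustrative weights
x = np.array([0.2, 0.6, 0.4])     # illustrative pixels, scaled to [0, 1]

eta = epsilon * np.sign(w)        # each element is +epsilon or -epsilon
x_hat = x + eta                   # pixel j of x-hat is x[j] + eta[j]

print(np.max(np.abs(x_hat - x)))        # 1/255: visually undetectable
print(np.dot(w, x_hat) - np.dot(w, x))  # shift = epsilon * sum(|w|)
```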

Many small components can add up to be quite large, especially if they are constructed in a smart way.

Let’s consider the effect of this on our previous example with ϵ = 0.2. We are able to make a difference of 3 units, moving the sum to -4.

[Image created by author.]

This is quite substantial, especially considering how small a change the perturbation vector makes to the original input vector.

[Image created by author.]

If the weight vector has n dimensions and the average absolute value of an element is m, then the activation value will grow by ϵnm. In high-dimensional images (say 256 by 256 by 3), the value of n is 196,608. m and ϵ can be very small, yet there will still be a substantial effect on the output.
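
You can verify the ϵnm growth directly: with η = ϵ·sign(w), the shift in the weighted sum is exactly ϵ times the sum of absolute weights, i.e. ϵnm. The weights below are randomly generated placeholders:

```python
import numpy as np

n = 256 * 256 * 3                   # dimensionality of a 256x256x3 image
epsilon = 1 / 255
rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=n)  # placeholder weights, small in magnitude

m = np.abs(w).mean()                     # average absolute weight
shift = np.dot(w, epsilon * np.sign(w))  # change in the weighted sum

print(shift)            # a large shift despite tiny epsilon and m...
print(epsilon * n * m)  # ...equal to epsilon * n * m (up to rounding)
```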

This method is very fast, since it only ever changes inputs by +ϵ or -ϵ, but it does so in a way effective enough to completely fool the neural network.
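
In a real network, the role of w is played by the gradient of the loss with respect to the input. Below is a minimal PyTorch sketch of the fast gradient sign method under that formulation; `model` is assumed to be a pretrained classifier taking inputs scaled to [0, 1], and the function name is my own:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=1 / 255):
    """Perturb x by epsilon * sign of the loss gradient w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)  # y holds the true class labels
    loss.backward()
    # Step each pixel by +/- epsilon in the direction that increases the loss.
    x_adv = x + epsilon * x.grad.sign()
    # Keep pixel values inside the valid [0, 1] range.
    return x_adv.clamp(0, 1).detach()
```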