How I could hypothetically bypass Account Moderation: Untargeted Fast Gradient Sign Method on a Deep CNN


Here is the general roadmap of my article:

  • 1] What the heck is an adversarial attack (for ML)?
  • 2] What is the Fast Gradient Sign Method (FGSM)?
  • 3] Is the Age and Gender classifier legit …? (spoiler Yes)
  • 4] How to transform your picture to fool the classifier?

Note: If you’re also interested in the topic and have ideas on how to develop the project further, contact me — I’ll gladly give you my full attention.

Here is my GitHub for a potential collab!

Alright, let’s get going.

1] What the heck is an adversarial attack?

A formal definition to start with is always better, so here you go! (Wikipedia source…)

Adversarial machine learning is a technique employed in the field of machine learning which attempts to fool models through malicious input.[1][2] This technique can be applied for a variety of reasons, the most common being to attack or cause a malfunction in standard machine learning models.

Let’s refine the definition to build some intuition about adversarial attacks in Machine Learning. In our situation, “malicious inputs” are images that still show the actual label but are crafted to “fool” the classifier, i.e. to induce it to predict something else. In other words, by modifying the input we shift the prediction from the correct label to an incorrect one.

Magritte — La trahison des images — 1928

(TRANSLATION) This is not a smoking pipe

Unexpectedly, after some research and reading a few papers, I realized that the weaknesses of Machine Learning models are quite significant, and adversarial attacks are, to some extent, proof of that. The trade-off becomes even more obvious: the (approximate) linearity that makes models easy to train is itself an exploitable flaw.

2] What is the Fast Gradient Sign Method (FGSM)?

The Fast Gradient Sign Method (untargeted) is an adversarial attack first published at ICLR 2015 (“Explaining and Harnessing Adversarial Examples”) by Ian Goodfellow, Jonathon Shlens, and Christian Szegedy.

It exploits a general flaw related to the numerical precision of images. Most images are stored with 8 bits per channel, i.e. integer values from 0 to 255, but when fed into the neural network they are converted to higher-precision floating-point values (in effect, each pixel gets far more representable values than the original 256). The trick is therefore to shift the pixel values enough to damage the network’s prediction, while keeping the shift below a certain threshold so the image looks unchanged when rendered back.
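To make that “invisible shift” concrete, here is a toy NumPy sketch of my own (not from the article): a perturbation smaller than half a quantization step survives in the float tensor the network sees, yet disappears entirely when the image is rendered back to 8 bits. In practice FGSM uses a slightly larger, barely visible epsilon.

```python
import numpy as np

rng = np.random.default_rng(0)

pixels_uint8 = rng.integers(1, 255, size=(4, 4), dtype=np.uint8)  # "stored" image, 8 bits per pixel
pixels_float = pixels_uint8.astype(np.float32) / 255.0            # what the network actually consumes

# Shift every pixel by less than half a quantization step (0.5 / 255)
eta = (0.4 / 255.0) * rng.choice([-1.0, 1.0], size=(4, 4))
perturbed_float = np.clip(pixels_float + eta, 0.0, 1.0)

rendered = np.round(perturbed_float * 255.0).astype(np.uint8)      # back to 8 bits for display
print(np.array_equal(rendered, pixels_uint8))                      # True: rendered image is unchanged
print(np.abs(perturbed_float - pixels_float).max() > 0)            # True: the network's input did change
```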

So, to sum up, we are allowed a small shift, which is a vector (since we are modifying every pixel). But how should we choose that shift — or at least its direction?
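Jumping slightly ahead: the direction FGSM ultimately picks is the sign of the gradient of the loss with respect to the input pixels. Here is a minimal TensorFlow sketch of that idea; the model, loss, and epsilon are my stand-ins (the article’s actual age/gender classifier isn’t shown), so read it as an illustration rather than the author’s code.

```python
import tensorflow as tf

# Stand-in classifier: a generic pretrained Keras model, used purely for illustration.
model = tf.keras.applications.MobileNetV2(weights="imagenet")
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def fgsm_perturbation(image, true_label, epsilon=0.01):
    """Untargeted FGSM: return epsilon * sign(dJ/dx), the shift that increases the loss."""
    image = tf.convert_to_tensor(image)          # shape (1, 224, 224, 3), already preprocessed
    label = tf.convert_to_tensor([true_label])   # integer class index
    with tf.GradientTape() as tape:
        tape.watch(image)                        # differentiate w.r.t. the *input*, not the weights
        prediction = model(image)
        loss = loss_fn(label, prediction)
    gradient = tape.gradient(loss, image)        # dJ/dx
    return epsilon * tf.sign(gradient)

# adversarial = tf.clip_by_value(image + fgsm_perturbation(image, true_label), -1.0, 1.0)
```

Taking the sign, rather than the raw gradient, is what keeps every pixel’s shift inside the same max-norm budget epsilon. The reasoning behind this choice is exactly what the linear-model argument below builds up to.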

Let’s start with a simple model: a linear model.

A linear model can ultimately be summed up by the following expression.
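The expression itself is not rendered here, so I reconstruct it following the linear-model argument of the FGSM paper; treat the exact notation as my assumption.

```latex
% Linear (affine) model: x is the flattened input image, w the weight vector, b the bias
f(x) = w^{\top} x + b
```

The paper’s argument then looks at how a small perturbation \eta changes this output — namely by w^{\top}\eta — which is exactly the quantity the attack tries to maximize under the max-norm constraint.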