AI Safety

What we really need to do is make sure that life continues into the future. It’s best to try to prevent a negative circumstance from occurring than to wait for it to occur and then be reactive. -Elon Musk

As a ’90s kid, I was fascinated by sci-fi movies. Back then the genre was like “a young child with a head full of ambitions”. Movies such as The Matrix 1 and 2 (not three), the whole Terminator series, the whole Transformers series, X-Men, Independence Day and many more were popular among us. Because of them I learned about robots, and I kept thinking about how I would fight them if they ever showed up in real life.

Now when I look back at these movies, the first thing I notice is how crappy the VFX were, and the second is a question: can we create characters like these with the sheer strength of AI?

Then in 2015 Avengers: Age of Ultron came out, which I thought was a mildly interesting movie. But it brought the same question back to my mind: can we create characters like these with the sheer strength of AI? Can we create Ultron?

By now I have sensed a pattern across all these movies: the villains are all either created by humans or come into being because of some human intervention.

But in the end these are all still movies. And as we all know,

Movies are just an illusion of people’s minds

So does that mean we can’t create villains like the T-X in Terminator, Megatron in Transformers or Ultron in Avengers?

Why not? The one thing all these villains have in common, the thing that makes them dangerous in the first place, is that they have the capabilities of artificial general intelligence.

And in recent decades we have seen steady progress in AI. In some narrow tasks it has already gone beyond human limits.

Take the case of OpenAI Five, an AI-based Dota 2 team that has beaten some of the most skilled Dota players.

Or take the story in which Facebook developers created an AI that talked to another AI in a shorthand that even the developers could not readily follow. That is still a long way from true artificial general intelligence, but it shows how a system can behave in ways its creators did not plan for.

As the story was widely told, the developers then shut the whole operation down, fearing that it had gotten out of hand. But what exactly were they afraid of? What could have happened if they had let the program run as it was? And who would have stopped them?

Unfortunately, right now we have no answer. There is no such thing as AI rules in our constitution, as our lawmakers still can’t get their heads around AI.

AI Safety

As this paper beautifully explains…

AI safety is the collective term for the practices we should follow to avoid accidents in machine learning systems, that is, unintended and harmful behavior that may emerge from poor design of real-world AI systems.

It is this general problem-solving ability that we have in mind when we talk about artificial general intelligence or smarter-than-human AI. AI systems may come to surpass humans in science and engineering abilities without being particularly human-like in any other respect: artificial intelligence need not imply artificial consciousness, for example, or artificial emotions. Instead, we have in mind the capacity to model real-world environments well and identify a variety of ways to put those environments into new states.

To learn more about this you can refer to this video, which clearly explains the power of AI with the help of an example.

OK, for now I will move on to one of the interesting papers that I think is a perfect example of why we need to develop AI safety.

If you don’t know about neural networks, please refer here first; the rest will be much easier to follow.

Adversarial Attack

Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake.

Let’s take an example

Let’s say I have created an image classifier that names an object based on its image. So if we give it this as the input image

Then we will get the output “Panda”, which is correct.

But what if I told you that just by adding some specific noise to the image, I can fool the classifier into thinking it is a different object? Such as

This is a classic example of an adversarial attack.
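To make the idea of "specific noise" a bit more concrete, here is a minimal sketch. It assumes a toy linear classifier instead of a real neural network, and the weights, pixel values and the "gibbon" label are all made-up numbers and names for illustration. The trick it shows is nudging every pixel a little in the direction that hurts the model's score the most, which is the idea behind the fast gradient sign method.

```python
# Toy linear "classifier": a positive score means "panda", otherwise "gibbon".
# Weights, pixels and labels are hypothetical values chosen for illustration.

def sign(v):
    return (v > 0) - (v < 0)

def score(weights, pixels):
    return sum(w * p for w, p in zip(weights, pixels))

def predict(weights, pixels):
    return "panda" if score(weights, pixels) > 0 else "gibbon"

def fgsm_perturb(weights, pixels, epsilon):
    # For a linear model the gradient of the score with respect to the input
    # is just the weight vector, so we move each pixel by epsilon against
    # the sign of its weight to push the score down.
    return [p - epsilon * sign(w) for w, p in zip(weights, pixels)]

weights = [0.5, -0.3, 0.8, -0.6, 0.2, -0.4, 0.7, -0.1]
image = [0.6, 0.4, 0.5, 0.3, 0.5, 0.4, 0.5, 0.5]

adversarial = fgsm_perturb(weights, image, epsilon=0.2)

print(predict(weights, image))        # label for the clean image
print(predict(weights, adversarial))  # label after the perturbation
```

With only eight pixels the step has to be fairly large to flip the label. A real image has many thousands of pixels, so a much smaller change per pixel can add up to the same effect, which is why the noise can be nearly invisible to us.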

Adversarial examples have the potential to be dangerous. For example, attackers could target autonomous vehicles by using stickers or paint to create an adversarial stop sign that the vehicle would interpret as a ‘yield’ or other sign, as discussed in Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples.

Reinforcement learning agents can also be manipulated by adversarial examples, according to the research in Adversarial Attacks on Neural Network Policies. The research shows that widely-used RL algorithms, such as DQN, TRPO, and A3C, are vulnerable to adversarial inputs. These can lead to degraded performance even in the presence of perturbations too subtle to be perceived by a human, causing an agent to move a pong paddle down when it should go up, or interfering with its ability to spot enemies in Seaquest.

When we think about the study of AI safety, we usually think about some of the most difficult problems in that field — how can we ensure that sophisticated reinforcement learning agents that are significantly more intelligent than human beings behave in ways that their designers intended?

Adversarial examples show us that even simple modern algorithms, for both supervised and reinforcement learning, can already behave in surprising ways that we do not intend.

Adversarial examples are hard to defend against because it is difficult to construct a theoretical model of the adversarial example crafting process. Adversarial examples are solutions to an optimization problem that is non-linear and non-convex for many ML models, including neural networks. Because we don’t have good theoretical tools for describing the solutions to these complicated optimization problems, it is very hard to make any kind of theoretical argument that a defense will rule out a set of adversarial examples.

Adversarial examples are also hard to defend against because they require machine learning models to produce good outputs for every possible input. Most of the time, machine learning models work very well but only work on a very small amount of all the many possible inputs they might encounter.

Adversarial examples show that many modern machine learning algorithms can be broken in surprising ways. These failures of machine learning demonstrate that even simple algorithms can behave very differently from what their designers intend.

Okay, let me ask you a question:

So how many pixels do we need to change to fool our neural network?

Unfortunately, the answer is one.

This paper shows that many neural network image classifiers can be fooled, for a sizeable share of images, by changing just one pixel of the image.

By changing only one pixel in an image that depicts a horse, the AI will be 99.9% sure that we are seeing a frog. A ship can also be disguised as a car or amusingly, almost anything can be seen as an airplane.

So how can we perform such an attack? These neural networks typically don’t output a class directly, but a set of confidence values, one per class. What does this mean exactly?

The confidence values denote how sure the network is that we see a labrador or a tiger cat. To come to a decision, we usually look at all of these confidence values and choose the object type that has the highest confidence. Now clearly, we have to know which pixel position to choose and what color it should be to perform a successful attack. We can do this by making a bunch of random changes to the image and checking how well each of them decreases the network’s confidence in the correct class.
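Here is a small sketch of what those confidence values look like in code. It assumes the network's raw outputs are turned into confidences with a softmax, which is the usual choice but not something this post's sources spell out, and the labels and raw scores are made up.

```python
import math

def softmax(logits):
    # Subtract the maximum first so math.exp never overflows.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical labels and raw network outputs, for illustration only.
labels = ["labrador", "tiger cat", "frog"]
logits = [2.0, 1.0, 0.1]

confidences = softmax(logits)
predicted = labels[confidences.index(max(confidences))]

for label, conf in zip(labels, confidences):
    print(f"{label}: {conf:.3f}")
print("prediction:", predicted)
```

The attack never needs the prediction to become sensible. It only needs the confidence of the correct class to drop below that of some other class.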

After this, we filter out the bad ones and continue our search around the most promising candidates. This process we refer to as differential evolution, and if we perform it properly, in the end, the confidence value for the correct class will be so low that a different class will take over. If this happens, the network has been defeated.
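The search described above can be sketched roughly as follows. This is a simplified illustration and not the paper's actual setup: the "network" is a hypothetical single logistic unit on a 4x4 grayscale image, and the population size, number of generations and scale factor `f` are arbitrary choices of mine. Each candidate is a (row, column, value) triple that overwrites exactly one pixel, new candidates are built from differences between existing ones, and a candidate is replaced only when its child lowers the confidence in the correct class.

```python
import math
import random

SIZE = 4  # toy 4x4 grayscale image, flattened to 16 values in [0, 1]

def horse_confidence(pixels):
    # Hypothetical stand-in for a real network: one logistic unit.
    logit = -23.0 + sum(3.0 * p for p in pixels)
    return 1.0 / (1.0 + math.exp(-logit))

def apply_candidate(pixels, candidate):
    # A candidate (row, col, value) overwrites exactly one pixel.
    row, col, value = candidate
    r = min(SIZE - 1, max(0, int(row)))
    c = min(SIZE - 1, max(0, int(col)))
    changed = list(pixels)
    changed[r * SIZE + c] = min(1.0, max(0.0, value))
    return changed

def one_pixel_attack(pixels, pop_size=20, generations=30, f=0.5, seed=0):
    rng = random.Random(seed)

    def fitness(candidate):
        # Lower is better: confidence left in the correct class.
        return horse_confidence(apply_candidate(pixels, candidate))

    population = [
        (rng.uniform(0, SIZE), rng.uniform(0, SIZE), rng.random())
        for _ in range(pop_size)
    ]
    scores = [fitness(c) for c in population]
    history = [min(scores)]  # best confidence found so far, per generation

    for _ in range(generations):
        for i in range(pop_size):
            others = [p for j, p in enumerate(population) if j != i]
            a, b, c = rng.sample(others, 3)
            # Differential mutation: a base member plus a scaled difference.
            trial = tuple(a[k] + f * (b[k] - c[k]) for k in range(3))
            trial_score = fitness(trial)
            # Greedy selection: the child replaces its parent only if it
            # is at least as good at lowering the confidence.
            if trial_score <= scores[i]:
                population[i], scores[i] = trial, trial_score
        history.append(min(scores))

    best = population[scores.index(min(scores))]
    return apply_candidate(pixels, best), history

image = [0.5] * (SIZE * SIZE)
adversarial, history = one_pixel_attack(image)

print("confidence before:", horse_confidence(image))
print("confidence after: ", horse_confidence(adversarial))
```

Notice that the search only ever calls `horse_confidence`. It never looks at the weights inside the model, which is a big part of why this kind of attack is so worrying.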

Now, note that this also means that we need access to the confidence values the network outputs, although not to the weights inside it. There is also plenty of research on training more robust neural networks that can withstand as many adversarial changes to the inputs as possible.

Second example.


Deepfakes are videos in which the subject is face-swapped using machine-learning algorithms. The practice was created by Redditor Deepfakes, who launched a dedicated subreddit to share the videos in November 2017. In January 2018, the FakeApp desktop application was released as a tool for creating the digitally altered videos.

Beyond face-swapping, other tools could soon allow formerly difficult-to-fake media to be manipulated. Using such tools, someone has produced audio of President Donald Trump speaking Mandarin. Using generative adversarial networks, essentially a “cat and mouse” game between two competing AIs, Nvidia was able to develop a grid of fake celebrity faces.

It’s basically done using a machine learning algorithm. It takes a data set of lots of pictures of one person’s face, so hundreds of pictures of, say, Carrie Fisher, and then a video to put that onto. It runs these two together in the algorithm and what comes out after hours or days is what looks like that person in that video.

What has changed recently, as some have mentioned, is the arrival of new artificial intelligence-enabled algorithms that can take a lot of data, bypass much of the manual process and the need for technical facilities, and make this technology accessible to many users who could not afford these kinds of technical setups.

One of the challenges with all of these approaches is that once you have a system that can detect a fake, then you can train your system that creates fakes to counter that system. And so as long as there’s access to that system for detection, you can just get better and better at sort of getting past it. So I don’t see that as sort of a super long term solution. I mean, it’s a cat and mouse game.

This is just the tip of the iceberg of the problems we are facing right now due to the lack of AI safety. As time goes by, we will gain the know-how to build a true intelligence that could outperform humans in every way, and no one will be there to stop it.

Final Thoughts

The problem of AI safety is not going to solve itself, and it’s not easy to solve. Most importantly, we have to solve it before we solve general AI.

Sooner or later we will get general AI, and when we do, we should also have safety measures for it.

Thanks for spending your precious time to read my blog. I heartily appreciate that.

If you liked this post, show your response by clapping for it and sharing it on Twitter, as I really think topics like this should reach every individual.

If you have any issues or doubts or suggestions, just write in the comments below.

Source: Deep Learning on Medium