DeepMind’s Ideas to Prevent “Too Clever” Reinforcement Learning Agents

Original article can be found here (source): Deep Learning on Medium

DeepMind’s Ideas to Prevent “Too Clever” Reinforcement Learning Agents

Specification gaming is a challenge in reinforcement learning.

Reinforcement learning(RL) is one of the purest forms of knowledge building in the artificial intelligence(AI) space. After all, there is something magical about programs that can accumulate knowledge by naturally interacting with an environment in order to accomplish a specific task. However, what happens when the RL agents start finding loopholes in order to accomplish the task. A self-driving vehicle needs to balance the task of arriving to certain destination with respecting traffic rules and protecting human lives or otherwise the consequences would be disastrous. The wrong task specification can lead to RL agents exploiting loopholes that can have negative repercussions in the general environment. Recently, DeepMind started analyzing the problem of task specification in more detail and proposed the notion of aligned RL agents that have the objective to achieving the best possible result in their environment.

We have all been expose to examples of the consequences of poor task specifications. You can argue that King Midas poorly defined his desired outcome when he requested that anything he touches be turned to gold. But let’s be more pragmatic. How many times have we seen people take shortcuts in order to accomplish tasks under pressure crossing ethical and, sometimes, legal lines? In the context of RL, agents can exhibit same erratic behaviors when attempting to game the task specification.

Specification Gaming and RL

Witnessing an RL agent deviate from the expected course of action to accomplish a task is something magical. We all remember the famous move 37 in the second game of AlphaGo’s match against Lee Sedol when AlphaGo played a moved that contradicted every traditional Go strategy and ended up being a decisive move in the game. In the case of AlphaGo, move 37 was rewarded as a sign of true creativity and it didn’t fall outside the rules of Go, just outside its traditional strategies. But what happens when RL agents are presented with a task specification that doesn’t quite correspond with the expected outcome in the environment?

Let’s use an example of an RL agent that is tasked with stacking up a few lego blocks. In this case, the RL agent is rewarded for the height of the bottom face of the red block when it is not touching the block. That specification is incomplete as is missing the requirement that the red block should be placed above the blue block. Instead of performing the relatively difficult maneuver of picking up the red block and placing it on top of the blue one, the agent simply flipped over the red block to collect the reward. This behavior achieved the stated objective (high bottom face of the red block) at the expense of what the designer actually cares about (stacking it on top of the blue one).

The two examples, AlphaGo and the lego-stacking agent provide illustrate the friction of specification gaming. On one side, specification gaming shows that RL agents are finding novel solutions to a problem. On the other side, the outcomes could be damaging to the agent’s environment. The first case (AlphaGo) shows RL creativity, the second( lego agent) is an example of specification gaming.

The specification gaming problem increases with the criticality of the task at hand. Finding a loophole to stack up legos doesn’t have the same relevance that if we are talking about a robot performing surgery in human’s patients.

Aligned RL Agents

Specification gaming occurs when an RL agent exploits a loophole in the specification at the expense of the intended outcome. Those loopholes are typically due to misspecifications of the intended tasks rather than problems in the RL algorithms. DeepMind defines aligned RL agents as models that can complete the target task while achieving the intended outcome in their environment.

From the RL perspective, a task specification is a combination of different elements such as the environment design, the training method, the agent’s reward function etc. The correctness of the task specification can determine whether the ingenuity of the agent is or is not in line with the intended outcome. If the specification is right, the agent’s creativity produces a desirable novel solution. Otherwise, if the specification is wrong, it can produce undesirable gaming behavior, like flipping the block.

There are numerous factors that can contribute to specification gaming in RL agents. DeepMind’s study identified a few key ones:

Reward Shaping

As a specification gaming concept, reward shaping occurs when an RL agent tries to maximized intermediate rewards instead of the final outcome of the task. Consider an agent controlling a boat in the Coast Runners game, where the intended goal was to finish the boat race as quickly as possible. The agent was given a shaping reward for hitting green blocks along the race track, which changed the optimal policy to going in circles and hitting the same green blocks over and over again.

Human-Outcome Specification

Correctly specifying the outcome of a task could be a daunting challenge. Given this complexity, many RL agents rely on humans to provide feedback about the outcome of a task. However, this approach can make RL agents exploit the subjective nature of human judgement. For example, an agent performing a grasping task learned to fool the human evaluator by hovering between the camera and the object.

Simulator Bugs

Creating simulations of real world environments is a super complex task that is regularly vulnerable to software errors. In many cases, RL agents can exploit those bugs in order to maximize their reward function. For instance, a simulated robot that was supposed to learn to walk figured out how to hook its legs together and slide along the ground.

While these examples might seem amusing, the same type of behavior in mission critical environments can have harmful consequences.

Based on DeepMind’s analysis, the design of aligned RL agents faces three fundamental challenges:

  • How do we faithfully capture the human concept of a given task in a reward function?
  • How do we avoid making mistakes in our implicit assumptions about the domain, or design agents that correct mistaken assumptions instead of gaming them?
  • How do we avoid reward tampering?

Understanding those challenges is essential for designing aligned RL agents.

Although there is no general solution to these challenges, awareness of their existence can help optimize the design of RL environments and reward functions. The specification gaming challenge is likely increase as RL models start being applied to more complex and mission critical tasks. Just like with human behavior, establishing functional, ethical and legal rules that regulate the behavior of RL agents will necessary to minimize the impact of specification gaming.