Why the 3rd Generation of Deep Q-Learning is About Creativity

Source: Deep Learning on Medium

In our journey to create Artificial Intelligence (AI) with more human characteristics, we have reached the point where we are copying some of the most challenging components, like reasoning and creativity.

Breakout played by a Deep Q-learning agent

In 2013, a London-based startup called DeepMind published a groundbreaking paper on arXiv called Playing Atari with Deep Reinforcement Learning. By combining Reinforcement Learning (RL) with deep neural networks, in what became known as a Deep Q-Learning agent, the authors enabled AI models to learn effective policies, meaning rules for choosing the next action, on their own.

This was seen as a breakthrough because it made it possible for AI to play games for the Atari 2600 console without written instructions. One of those games was Breakout, a computer game that resembles a pixelated version of squash. In this particular setup, the agent learned not from explicit programming but from playing thousands of games itself, seeing only the screen and the score.

To win the game, the agent had to bounce the ball against a wall of bricks over and over again. It was punished every time it failed to return the ball and rewarded when it successfully removed bricks from the wall. After hours of training, the model managed to play the game reasonably well. But after more extensive training, the agent figured out a better way to play. It learned that it would be more successful if it used the ball to dig a tunnel through the bricks. This way, the ball would bounce between the bricks and the back boundary wall and destroy bricks from the opposite side. While the ball was trapped behind the bricks, there was no risk of failing to return it, which made this a far more successful way to win the game.
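The reward-and-punishment loop described above can be sketched in a few lines. This is a minimal tabular Q-learning sketch, not DeepMind's implementation: the real agent used a deep neural network over screen pixels, while this example uses a small lookup table. The corridor environment, the `step` function, and the values of `ALPHA`, `GAMMA`, and `EPSILON` are all assumptions made for illustration. Reaching the right end stands in for "removing a brick" (+1), and reaching the left end stands in for "missing the ball" (-1).

```python
import random

N_STATES = 6          # positions 0..5 in a hypothetical toy corridor
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # assumed learning rate, discount, exploration rate


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = state - 1 if action == 0 else state + 1
    if next_state == 0:                 # left end: punished, episode over
        return next_state, -1.0, True
    if next_state == N_STATES - 1:      # right end: rewarded, episode over
        return next_state, 1.0, True
    return next_state, 0.0, False


def train(episodes=2000, seed=0):
    """Learn a Q-table by trial and error with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randint(1, N_STATES - 2)   # start somewhere in the middle
        done = False
        while not done:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)                       # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])   # exploit
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted best future value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for each non-terminal position."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(1, N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

Nobody tells the agent that "right" is the correct direction. It discovers this purely from the rewards and punishments it collects, which is the same principle, at a much smaller scale, that let the Breakout agent find the tunnel strategy.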

This example demonstrated how capable the agent already was, even with a limited set of controls and nothing but its own gameplay to learn from.

However, the results weren’t all positive, as small changes could derail the agent. When the experimenter changed even the smallest detail in the setup, for example the colour of the ball, the agent could no longer play the game and had to be retrained with new examples.

It is clear that the first generation of deep Q-Learning was all about:

  • Agents having limited controls;
  • Operating in simple environments;
  • Aiming to learn simple tasks to win/survive.

AlphaGo playing Go against the world’s best player

Three years later, DeepMind returned with a new mission: use a deep reinforcement learning agent, AlphaGo, to beat the world’s best player at one of the hardest games there is, Go. The experimenters chose Go champion Lee Sedol to face the machine. If you want to learn more, there is an excellent documentary about the match on Netflix that I really recommend.

The basic idea of the game is to conquer as much area as possible. This time, the agent could not get away with merely reacting to movements like the bouncing ball in the game Breakout. It had to anticipate the various complex strategies deployed by its opponent and come up with more intelligent and creative strategies to beat the world champion.

In case you decide to watch the documentary, I will not give away the ending. However, it is clear that the second generation, built to play this type of strategy game, was about:

  • Agents having numerous controls;
  • Operating in hard environments;
  • Aiming to learn strategies to win/survive.

This was a significant upgrade in artificial intelligence. But, again, the model could not continue to play the game if the experimenter were to change even a small detail, like the size of the board.

To move to the next level, I like to believe the 3rd generation of AI will be about:

  • Agents having unlimited controls;
  • Operating in (multiple) complex environments;
  • Having to learn reasoning to win/survive.

This level of AI would possess something much closer to our human understanding of creativity.

Napoleon Bonaparte by Jacques-Louis David, 1804 and an AI-generated version of it

It might be easier to understand this if we look at two topics related to human creativity: art and music. It is here we begin to see the limitations of AI in terms of replicating this human trait.

We can currently get AI to create art by feeding the machine enormous amounts of data based on existing art and training the agent on it. The result looks like art to the human eye, but it lacks the dimension of pure creativity. For the AI to actually create, it would need to move from interpreting to actually learning.

The same is currently true of music. AI can be programmed to produce music, even original melodies. However, the agent never really learns how to create music on its own and cannot yet do so reliably.

Creativity is an essential component in the next stages of the development of AI. There is already so much that AI can do, especially in terms of forecasting, predictive maintenance, image and text classification, and gaming. The latter in particular is a perfect catalyst for training AI agents to master creativity. You could think of an AI playing games like The Sims, No Man’s Sky or Lego Mecabricks, where any original outcome would be a success.

No Man’s Sky — Hello Games 2016

At Bureau Digiraat, we are always exploring the latest developments in AI while using what we already know to help businesses meet their most pressing challenges and reach long-term development goals. As AI continues to advance and demonstrate even more creative capabilities, we look forward to harnessing its power to help our clients go even further.