This Area of Game Theory can Help Reinforcement Learning to Scale to Infinite Agents

Source: Deep Learning on Medium

This Area of Game Theory can Help Reinforcement Learning to Scale to Infinite Agents

Mean Field Games provide a strong foundation for multi-agent reinforcement learning systems.

Reinforcement learning is one of the most popular areas of research in deep learning nowadays. Part of the popularity of reinforcement learning is due to the fact that is one of the learning methods that resembles human cognition the closets. In reinforcement learning scenarios and agent learns organically by taking actions on an environment and receiving specific rewards. A little less known discipline called multi-agent reinforcement learning(MARL) focuses on reinforcement learning scenarios involving a large number of agents. Typically, MARL scenarios suffer from a scalability challenges in which its complexity increases linearly with the number of agents in the environment. Last year, two different research papers [one by researchers at the Georgia Institute of Technology and one by a team from deep learning startup Prowler.io ] have proposed to tackle this challenge using unconventional techniques from the game theory world.

Single-Agent vs. Discrete Multi-Agent vs. Infinite Multi-Agent Reinforcement Learning

Many of the most famous successes in reinforcement learning such as AlphaGo have been based on single-agent environments in which there is only one artificial intelligence(AI) program interacting with the environment. In those single-agents reinforcement learning(SARL) scenarios, the agent’s reward function is simply based on the combination of actions and the state of the environment. Now consider scenarios such as multi-player games involving more than one agent. Those scenarios are known as multi-agent reinforcement learning(MARL) and result even more challenging as the reward function of a specific agent might be influenced by the action of other agents in the environment.

MARL scenarios have enjoyed their share of success in the last few months with AI powerhouses like OpenAI building a system that can beat Dota2 and DeepMind doing the same on the Quake III game. However, in both scenarios the MARL environment only involved a small number of agents. Until now, MARL methods have struggled when applied on scenarios involving a large number of agents. In that sense, one of the biggest questions shadowing MARL is whether it can be proven effective in scenarios that trend towards an infinite number of agents.

MARL scenarios with infinite agents are all around us. Think about stock market dynamics in which the action of a trader can be influenced by a large number of other traders or macro-economic events. Similarly, many modern economic problems in areas such as trade or monetary policy can be modeled as MARL environments with a large number of agents. The complexity of MARL scenarios with infinite agents has a very simple mathematical explanation. The solution to multi-agent games is typically modeled using the Nash-Equilibrium famously depicted in the movie A Beautiful Mind. However, the computational complexity of the Nash-Equilibrium scales linearly with the number of agents in the environment making it nonviable for MARL scenarios with infinite agents.

Enter Mean Field Games

Mean field games(MFG) is the field of game theory that models problems using a large number of non-cooperative, rational agents. This revolutionary model has been studied in depth by mathematicians and applied to describe complex multi-agent dynamic systems such as Mexican waves, stock markets and smart grids. However, MFG have remained mostly a theoretical exercise. While, in theory, MFG by itself could describe the behaviors of large population systems, the models can require handling non-linear partial differential equations that are often unsolvable. Fortunately, MARL doesn’t have that problem as it doesn’t require exact equations.

MFG and MARL

The combination of MFG and MARL is one of those cases in which two unsolvable factors produce a solvable equation 😉. MARL can effectively operate using inexact probabilistic models but it results impractical in environments with infinite agents. MFG can effectively model behaviors in large populations of agents but often produces unsolvable equations. What happens if we combine the two?

The two research papers referenced above proposed different techniques of applying MFG to MARL scenarios. In both cases, the research showed that MFG methods can drastically reduce the complexity of MARL scenarios with large number of agents. For instance, MFG can model the behavior of agents in a MARL scenarios as a probabilistic density function as it assumes that all agents have similar reward functions( all traders in the stock market are focused on maximizing the return per trade). That simplification makes MARL scenarios with large number of agents computationally viable. Instead of agents responding to the actions of other agents individually, each agent now performs its actions in response to the mass which jointly represents the collection of states for all agents.

The Prowler.io research team conducted several experiments combining MFG and MARL. One of the experiments was based on the famous spatial congestion(SC) games in which N agents, given some initial position, each agent chooses an action in order to move to a desired location which is a terminal state. Certain areas are more desirable to occupy than others, however the agents are averse to occupying crowded areas. Agents receive the largest rewards for occupying parts that are both desirable and have relatively low concentrations of agents.

Applying MFG to this scenario showed that the reward function tends to stabilize after about 2000 episodes regardless of the parameter’s configuration.

A surprising result of the previous experiment is that MFG also seems to influence RL agents to optimize for long-term planning. For instance, in the SC game, the agents learn that by taking a shortcut (traversing horizontally) to meet the objects, they can increase their overall rewards. To behave in this way, the agents must first incur costs with low rewards as they traverse a horizontal path that doesn’t coincide with the path of the objects. In this sense, the agents demonstrate planning by forgoing immediate rewards in favor of taking a path that maximizes long-term rewards.

Modeling MARL scenarios using MFG methods is still a purely theoretical exercise and hasn’t bee applied in practice. However, the initial research showed an incredible potential to finally break through what is considered by many the biggest limitation of MARL: operating at large scale with infinite agents.