New Game Theory Innovations that are Influencing Reinforcement Learning

Source: Deep Learning on Medium

Game theory plays a fundamental role in modern artificial intelligence (AI) solutions. Specifically, deep reinforcement learning (DRL) is an area of AI that has embraced game theory as a first-class citizen. From single-agent programs to complex multi-agent DRL environments, game dynamics are present across the lifecycle of AI programs. The fascinating thing is that the rapid evolution of DRL has also triggered a renewed interest in game theory research.

The relationship between game theory and DRL is easy to see. DRL agents learn by regularly interacting with an environment and, in the case of multi-agent DRL, with other agents. Incorporating incentives into DRL environments is a very effective way to influence the learning of agents. While most DRL models are still based on traditional game theory concepts such as the Nash equilibrium or zero-sum games, there are new methods that are steadily becoming an important element of AI programs. Let’s explore three new game theory trends that are making inroads into DRL research.

Mean Field Games

Mean Field Games (MFG) are a relatively new area in the game theory space. MFG theory was developed in 2006 as part of a series of independent papers published by Minyi Huang, Roland Malhamé and Peter Caines in Montreal, and by Jean-Michel Lasry and Fields medalist Pierre-Louis Lions in Paris. Conceptually, MFG comprises methods and techniques to study differential games with a large population of rational players. These agents have preferences not only about their own state (e.g., wealth, capital) but also about the distribution of the remaining individuals in the population. MFG theory studies generalized Nash equilibria for these systems.

A classic example of MFG is how fish in a school swim in the same direction in a coordinated manner. Theoretically, this phenomenon is hard to explain, but it has its roots in the fact that each fish reacts to the behavior of the closest group. More specifically, each fish does not care about each of the other fish individually; rather, it cares about how the fish nearby, as a mass, globally move. If we translate that into mathematical terms, the reaction of an individual fish to the mass is described by the Hamilton-Jacobi-Bellman equation, while the aggregation of the actions of the fish, which determines the motion of the mass, corresponds to the Fokker-Planck-Kolmogorov equation. Mean field game theory is the coupling of these two equations.
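For reference, a common way to write this coupled system (following the Lasry-Lions formulation; notation and sign conventions vary across the literature) is as a pair of PDEs for the value function u of a representative agent and the population density m, one running backward and one forward in time:

```latex
% Hamilton-Jacobi-Bellman (backward in time): one agent's optimal reaction to the mass
-\partial_t u - \nu \Delta u + H(x, \nabla u) = f(x, m)

% Fokker-Planck-Kolmogorov (forward in time): evolution of the population density
\partial_t m - \nu \Delta m - \operatorname{div}\!\big( m \, \nabla_p H(x, \nabla u) \big) = 0

% Coupling conditions: initial density m_0 and terminal cost g
m(0, x) = m_0(x), \qquad u(T, x) = g\big(x, m(T)\big)
```

Here H is the Hamiltonian encoding each agent’s control problem, ν is a diffusion coefficient, and f is a running cost that depends on the density m, which is what couples the two equations.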

From the DRL standpoint, MFG plays an interesting role in large-scale environments with a large number of agents. Until now, DRL methods have proven impractical in environments with a near-infinite number of agents, given that they require operating with inexact probabilistic models. MFG is an interesting approach to modeling those DRL environments. AI research startup Prowler recently did some work evaluating MFG in large, multi-agent DRL environments.
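To make this concrete, here is a minimal sketch in the spirit of mean-field reinforcement learning (e.g., the mean-field Q-learning of Yang et al., 2018, which the article does not describe; the tabular setup and all names below are illustrative assumptions). Instead of conditioning on every neighbor’s action, each agent conditions its Q-function on the empirical mean action of its neighborhood, collapsing an N-agent interaction into a two-body one:

```python
import numpy as np

n_states, n_actions, n_mean_bins = 10, 4, 5   # assumed problem sizes
alpha, gamma = 0.1, 0.95                      # learning rate, discount factor

# Q[s, a, m]: value of action a in state s when the (discretized)
# mean action of the neighbors falls into bin m.
Q = np.zeros((n_states, n_actions, n_mean_bins))

def mean_action_bin(neighbor_actions):
    """Discretize the empirical mean of the neighbors' actions into a bin index."""
    mean = np.mean(neighbor_actions) / (n_actions - 1)   # normalize to [0, 1]
    return min(int(mean * n_mean_bins), n_mean_bins - 1)

def mf_q_update(s, a, neighbor_actions, reward, s_next, next_neighbor_actions):
    """One mean-field Q-learning step for a single agent."""
    m, m_next = mean_action_bin(neighbor_actions), mean_action_bin(next_neighbor_actions)
    # Bootstrap against the best response to the next mean action.
    target = reward + gamma * Q[s_next, :, m_next].max()
    Q[s, a, m] += alpha * (target - Q[s, a, m])
```

The key point is that the Q-table grows with the number of mean-action bins, not with the number of agents, which is what makes the mean-field approximation attractive at scale.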

Stochastic Games

Stochastic games date back to the 1950s, when they were introduced by Nobel Prize-winning economist Lloyd Shapley. Conceptually, stochastic games are played by a finite number of players on a finite state space; in each state, each player chooses one of finitely many actions, and the resulting profile of actions determines a reward for each player and a probability distribution on successor states.
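In symbols (following the standard textbook presentation rather than anything in the article itself), a finite stochastic game is a tuple

```latex
G = \big( N, \; S, \; \{A^i\}_{i \in N}, \; P, \; \{r^i\}_{i \in N} \big)
```

where N is the finite set of players, S the finite state space, A^i the action set of player i, r^i : S × A^1 × ⋯ × A^n → ℝ the reward function of player i, and P(s′ | s, a^1, …, a^n) the probability of moving to state s′ when the action profile (a^1, …, a^n) is played in state s. A Markov decision process is the special case with a single player.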

A classic form of stochastic game is the dining philosophers problem, in which n + 1 philosophers (n ≥ 1) sit at a round table with a bowl of rice in the middle. Between any two philosophers who sit next to each other lies a chopstick that can be accessed by both of them. Since the table is round, there are as many chopsticks as there are philosophers. To eat from the bowl, a philosopher needs to acquire both of the chopsticks he has access to; hence, if one philosopher eats, his two neighbors cannot eat at the same time. The life of a philosopher is rather simple and consists of thinking and eating; to survive, a philosopher needs to think and eat again and again. The task is to design a protocol that allows all of the philosophers to survive.
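A minimal sketch of one well-known deadlock-free protocol, resource ordering, in which every philosopher always picks up the lower-numbered chopstick first (the article does not prescribe a specific protocol, so this is an illustrative choice):

```python
import threading
import time

N = 5  # number of philosophers and chopsticks; illustrative value
chopsticks = [threading.Lock() for _ in range(N)]

def philosopher(i, rounds=3):
    """Alternate thinking and eating, acquiring chopsticks in a fixed global order."""
    left, right = i, (i + 1) % N
    first, second = min(left, right), max(left, right)  # global order breaks the deadlock cycle
    for _ in range(rounds):
        time.sleep(0.01)                  # think
        with chopsticks[first]:           # lower-numbered chopstick first...
            with chopsticks[second]:      # ...then the higher-numbered one
                print(f"philosopher {i} eats")

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because no philosopher ever holds a higher-numbered chopstick while waiting for a lower-numbered one, a circular wait, and hence deadlock, is impossible, so every philosopher eventually eats.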

Stochastic games are already being used in DRL solutions for multi-player games. In many multi-player environments, teams of AI agents need to evaluate how to collaborate with and compete against one another in order to maximize positive outcomes, a tension closely related to the exploration-exploitation dilemma. Building stochastic game dynamics into DRL agents is an efficient way to balance the exploration and exploitation capabilities of those agents. DeepMind’s work on mastering Quake III incorporates some of these stochastic game concepts.
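For readers unfamiliar with that dilemma, the simplest textbook device for trading off the two is ε-greedy action selection (a generic technique, not something specific to the Quake III work):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))   # explore: uniformly random action
    return int(np.argmax(q_values))               # exploit: current greedy action

# Example: with value estimates [1.0, 3.5, 2.1], the agent usually picks action 1.
print(epsilon_greedy(np.array([1.0, 3.5, 2.1])))
```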

Evolutionary Games

Evolutionary Game Theory (EGT) draws inspiration from the Darwinian theory of evolution. The origins of EGT can be traced back to 1973, when John Maynard Smith and George R. Price formalized contests as strategies, along with mathematical criteria for predicting the outcomes of competing strategies. Conceptually, EGT is the application of game theory concepts to situations in which a population of agents with diverse strategies interacts over time to arrive at a stable solution through an evolutionary process of selection and replication. The main idea behind EGT is that many behaviors involve the interaction of multiple agents in a population, and the success of any one agent depends on how its strategy interacts with those of the others. Whereas classical game theory focuses on static strategies that do not change over time, EGT studies how strategies evolve and which kinds of dynamic strategies are most successful in this evolutionary process.
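The evolution of strategies in EGT is most commonly modeled with the replicator equation, a standard EGT tool that the article does not write out: the population share x_i of strategy i grows exactly when that strategy earns above-average fitness,

```latex
\dot{x}_i = x_i \big( f_i(x) - \bar{f}(x) \big), \qquad \bar{f}(x) = \sum_j x_j f_j(x)
```

where f_i(x) is the fitness of strategy i in a population with strategy distribution x.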

A classic example of EGT is the Hawk-Dove game, which models a contest between a hawk and a dove over a shareable resource. In the game, each contestant follows exactly one of the two strategies described below:

· Hawk: Initiate aggressive behavior, not stopping until injured or until one’s opponent backs down.

· Dove: Retreat immediately if one’s opponent initiates aggressive behavior.

If we assume that (1) whenever two individuals both initiate aggressive behavior, conflict eventually results and the two individuals are equally likely to be injured, (2) the cost of the conflict reduces individual fitness by some constant value C, (3) when a Hawk meets a Dove, the Dove immediately retreats and the Hawk obtains the resource, and (4) when two Doves meet, the resource is shared equally between them, then the fitness payoffs for the Hawk-Dove game can be summarized as follows (writing V for the value of the contested resource):
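· Hawk vs. Hawk: each receives (V − C)/2 on average

· Hawk vs. Dove: the Hawk receives V; the Dove receives 0

· Dove vs. Dove: each receives V/2

When C > V, neither pure strategy is stable, and the evolutionarily stable state is a mixed population with a hawk share of V/C.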

EGT seems to have been designed with DRL environments in mind. In multi-agent DRL environments, agents regularly modify their strategies by interacting with each other, and EGT is an effective way to model those interactions. Recently, OpenAI showed some of these dynamics when training agents to play a game of hide-and-seek.
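Tying these pieces together, here is a small simulation of the Hawk-Dove game under the replicator dynamics shown earlier (an illustrative sketch; the parameter values are arbitrary assumptions). With C > V, the hawk share converges to the mixed equilibrium V/C:

```python
import numpy as np

V, C = 2.0, 4.0   # assumed resource value and conflict cost (C > V)

# Payoff matrix for the row player; rows and columns are [Hawk, Dove].
A = np.array([[(V - C) / 2, V],
              [0.0,         V / 2]])

x = np.array([0.9, 0.1])   # initial population shares [hawk, dove]
dt = 0.1                   # Euler step size

for _ in range(2000):
    f = A @ x                      # fitness of each strategy against the population
    f_bar = x @ f                  # average population fitness
    x = x + dt * x * (f - f_bar)   # Euler step of the replicator equation
    x = np.clip(x, 0.0, None)
    x = x / x.sum()                # keep x a valid probability distribution

print(f"hawk share ≈ {x[0]:.3f} (predicted mixed equilibrium: V/C = {V / C:.3f})")
```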