The Ultimate Meta-Learning Challenge: OpenAI Teaches Agents to Learn Concepts from Experience

Source: Deep Learning on Medium


The ability to generalize concepts from experience is one of the foundational hallmarks of human intelligence. From infancy, we constantly formulate new concepts based on limited experiences which, over time, become the foundation of our knowledge across different subjects. Many of the mysteries of human intelligence, such as creative problem solving or abstract reasoning, have their roots in abstract concept formulation. Is it possible to recreate this neuroscientific miracle in artificial intelligence (AI) agents? Recently, researchers from OpenAI published a paper proposing a technique for concept learning based on a deep learning method known as energy functions.

Imagine a baby getting his first glimpse of visual concepts such as “red” or “square”, temporal concepts like “slow” or “before”, or social concepts like “sad” or “aggressive”. In any given environment, we have the ability to identify concepts, create new ones, and compose existing concepts into new ones without limit. A “brown, wooden table” is a concept that combines visual, spatial, and physical concepts. From the cognitive standpoint, composability allows us to rapidly generalize new concepts from existing ones, making learning a continuous activity. To simulate concept learning in AI systems, we must transition from models that acquire knowledge statically during training to models in which knowledge is created throughout the lifecycle of an agent. From that perspective, OpenAI formulates the problem of identifying and creating concepts as an optimization, meta-learning operation performed during the execution lifetime of an agent. To address that challenge, OpenAI researchers resorted to a relatively little-known generative method known as energy functions.

Energy Functions for Concept Learning

Energy-Based Models are a class of deep learning algorithms that focus on capturing dependencies between variables by associating a scalar energy to each configuration of the variables. Inference, i.e., making a prediction or decision, consists in setting the value of observed variables and finding values of the remaining variables that minimize the energy. Learning consists in finding an energy function that associates low energies to correct values of the remaining variables, and higher energies to incorrect values.
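To make the inference step concrete, here is a minimal sketch in Python. The energy function, the candidate grid, and the names `energy` and `infer` are all assumptions made up for illustration; in a real energy-based model the energy would be a learned network rather than a hand-written formula.

```python
import numpy as np

def energy(x, y):
    """Hypothetical energy: low when y is close to 2*x + 1, high otherwise."""
    return (y - (2.0 * x + 1.0)) ** 2

def infer(x, candidates):
    """Inference: fix the observed x and return the candidate y with the lowest energy."""
    energies = np.array([energy(x, y) for y in candidates])
    return float(candidates[int(np.argmin(energies))])

# Candidate values for the unobserved variable y (an assumed grid, step 0.1).
candidates = np.linspace(-10.0, 10.0, 201)
y_hat = infer(3.0, candidates)  # the minimizer lies near 2*3 + 1 = 7
```

Learning, in this picture, would mean adjusting the energy function itself so that correct (x, y) pairs end up with lower energy than incorrect ones.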

Energy functions work by encoding a preference over states of the world, which allows an agent with different available actions to learn a policy that works in different contexts. From a knowledge standpoint, the real-time policy learning capability of energy functions is the foundation of a conceptual understanding of simple things. Leveraging this idea, OpenAI formulates an energy function for each concept as a combination of the following parameters:

  • The state of the world the model observes (x).
  • An attention mask (a) over entities in that state.
  • A continuous-valued vector (w), used as conditioning, that specifies the concept for which energy is being calculated.

In the OpenAI model, states of the world describe entities and their spatial positions. Attention masks represent the model’s focus on specific entities. The concept vector w selects which concept the energy is being evaluated for. The energy model outputs a single non-negative number indicating whether the concept is satisfied (when energy is zero) or not (when energy is high).
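The shape of such an energy function can be sketched with a toy example. Everything below is an assumption for illustration: the hand-written energy stands in for the learned neural network, and the concept encoded by w (“the attended entities sit at location w”) is hypothetical rather than one taken from the paper.

```python
import numpy as np

def concept_energy(x, a, w):
    """Toy energy E(x, a, w) >= 0 for the hypothetical concept
    'the attended entities sit at location w'.

    x: (n, 2) array of entity positions (the state of the world)
    a: (n,) attention mask with values in [0, 1]
    w: (2,) concept vector, read here as a target location
    """
    sq_dist = np.sum((x - w) ** 2, axis=1)  # squared distance of each entity to w
    return float(np.sum(a * sq_dist))       # only attended entities contribute

x = np.array([[1.0, 1.0], [4.0, 0.0]])
w = np.array([1.0, 1.0])

# Attending to the entity at (1, 1): the concept is satisfied, energy is zero.
e_satisfied = concept_energy(x, np.array([1.0, 0.0]), w)
# Attending to the entity at (4, 0): the concept is violated, energy is high.
e_violated = concept_energy(x, np.array([0.0, 1.0]), w)
```

The same state x yields a low or a high energy depending on where the attention mask points, which is what lets one function serve many (x, a) combinations for a given concept.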

Let’s illustrate these ideas with an example that tries to generate and identify the “square” concept in a given spatial environment. Given the initial state x0 and an attention mask a, a square consisting of the entities in a is formed via optimization over the next state x1. Similarly, given a state x, the entities comprising a square are found by optimization over the attention mask a.
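Both directions of this optimization can be sketched with plain gradient descent. The code below is a simplified illustration under stated assumptions: it uses a hand-written toy energy (attended entities should sit at a target location w) instead of a learned network or a real “square” concept, and the function names, learning rates, step counts, and softmax parameterization of the mask are all hypothetical choices.

```python
import numpy as np

def sq_dist(x, w):
    """Squared distance of every entity in x to the target location w."""
    return np.sum((x - w) ** 2, axis=1)

def energy(x, a, w):
    """Toy stand-in for a learned energy: attended entities should sit at w."""
    return float(np.sum(a * sq_dist(x, w)))

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def generate(x0, a, w, lr=0.1, steps=200):
    """Generation: hold a and w fixed, descend the energy over the state x."""
    x = x0.copy()
    for _ in range(steps):
        grad = 2.0 * a[:, None] * (x - w)  # dE/dx for this toy energy
        x -= lr * grad
    return x

def identify(x, w, lr=0.5, steps=200):
    """Identification: hold x and w fixed, descend the energy over the mask a."""
    logits = np.zeros(len(x))              # mask parameterized through a softmax
    d = sq_dist(x, w)
    for _ in range(steps):
        a = softmax(logits)
        grad = a * (d - np.sum(a * d))     # dE/dlogits through the softmax
        logits -= lr * grad
    return softmax(logits)

w = np.array([1.0, 1.0])

# Generation: move the attended entity (index 0) until the concept holds.
x0 = np.array([[0.0, 0.0], [4.0, 0.0]])
a_fixed = np.array([1.0, 0.0])
x1 = generate(x0, a_fixed, w)

# Identification: find which entity of a given state satisfies the concept.
x_obs = np.array([[1.0, 1.0], [4.0, 0.0]])
a_found = identify(x_obs, w)
```

In this sketch the unattended entity never moves during generation, and during identification the mask concentrates on the entity that already satisfies the concept. The same energy drives both procedures, which mirrors the idea of using one function for both generating and identifying a concept.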

One of the top contributions of the energy functions model is that it allows the identification and generation of concepts using a single network. After an initial concept inference, the network tries to either identify a learned concept or generate a new one.

Learning Concepts in the Real World

OpenAI tested energy function models in a series of lab environments designed to determine how well agents could learn or generate concrete concepts such as quantity (one, two, three…), proximity (far, close), spatial relationships (triangle, square), spatial placement (north, south) and many others. In one experiment, for instance, the model is given demonstrations that contain entities of a particular color either joining together or forming a line, triangle, or square. Taking that data as input, energy functions are used to generate similar spatial relationships over novel arrangements.

One of the most fascinating results of the OpenAI experiments was that concepts can be transferred between environments. In one demonstration, a simulated robot learns how to move its arm between two points using proximity concepts learned in a 2D environment.

Concept generation and identification is one of the hallmarks of human intelligence. Given how little we understand these ideas from a neuroscientific standpoint, recreating concept learning in AI systems is nothing short of a monumental challenge. Despite the simplicity of the tests, it is encouraging to see OpenAI making inroads in this area.