Original article was published by Prabhat Nagarajan on Deep Learning on Medium
Users can write their own training loops and environments by querying the agent’s act method and executing the returned action in the environment. However, we recommend PFRL’s easily extensible experiments module, which handles agent-environment interactions for PFRL agents following the OpenAI Gym Env API (the API that most popular deep RL environments follow). The experiments module provides a number of standard modes for training, evaluating, and logging experiments. Some helpful features include:
- Tracking agent performance statistics
- Scheduling evaluations and allowing users to specify separate training and evaluation environments
- Managing model saving
The experiments training function takes as input an agent and an OpenAI Gym environment (or a batch of environments), queries the agent for an action (or actions), and executes it (or them) in the environment(s). Essentially, these experiment utilities simplify the management of training and evaluation loops and environment interactions for different types of agents (e.g. AsyncAgent) without additional effort from the user.
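To make the query-then-execute loop concrete, here is a minimal sketch in plain Python. It does not use PFRL itself: CoinFlipEnv and CopyAgent are hypothetical stand-ins written for this illustration, the environment assumes the classic Gym convention in which step returns (observation, reward, done, info), and the observe hook is an assumption of the sketch rather than something taken from the text above.

```python
import random


class CoinFlipEnv:
    """Toy stand-in for a Gym-style environment (hypothetical, not part of PFRL).

    The observation is a coin face (0 or 1). The agent earns reward 1.0
    when its action matches the face. Episodes last `episode_len` steps.
    """

    def __init__(self, episode_len=5, seed=0):
        self.episode_len = episode_len
        self.rng = random.Random(seed)
        self.t = 0
        self.face = 0

    def reset(self):
        self.t = 0
        self.face = self.rng.randint(0, 1)
        return self.face

    def step(self, action):
        reward = 1.0 if action == self.face else 0.0
        self.t += 1
        self.face = self.rng.randint(0, 1)
        done = self.t >= self.episode_len
        return self.face, reward, done, {}


class CopyAgent:
    """Toy stand-in for an agent (hypothetical): it copies the observation."""

    def __init__(self):
        self.total_reward = 0.0

    def act(self, obs):
        return obs

    def observe(self, obs, reward, done, reset):
        # A learning agent would store the transition and update itself here.
        self.total_reward += reward


def run_episodes(agent, env, n_episodes):
    """Query the agent for an action, execute it, and collect episode returns."""
    returns = []
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        episode_return = 0.0
        while not done:
            action = agent.act(obs)
            obs, reward, done, _ = env.step(action)
            agent.observe(obs, reward, done, False)
            episode_return += reward
        returns.append(episode_return)
    return returns


agent = CopyAgent()
returns = run_episodes(agent, CoinFlipEnv(episode_len=5), n_episodes=3)
print(returns)
```

Because the toy agent always matches the coin, each five-step episode returns 5.0. A utility like the experiments module wraps this kind of loop and adds the statistics tracking, scheduled evaluation, and model saving listed above.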
An Agent implementation specifies the learning update rules of an algorithm, but the user has a large amount of flexibility in parametrizing an agent with PFRL’s many building blocks, shown in light orange in the diagram at the beginning of this section. The parametrization can be modified in several ways to suit the user’s needs. Some examples include:
- Explorers: Users can parametrize their agents with one of several explorers, such as ε-greedy exploration or Boltzmann exploration.
- Network architectures: PFRL supports any PyTorch Module, which can be chosen by the user and passed to the agent. PFRL also provides several predefined architectures (i.e. PyTorch networks) that are useful for RL, such as dueling network architectures and certain recurrent architectures. PFRL also supports noisy networks: users can easily convert a network to use noisy exploration (see the Rainbow example below).
- Replay Buffers: If supported by the agent, users can pass a prioritized replay buffer to the agent to use Prioritized Experience Replay. Users can also use N-step returns in their agent by passing an N-step replay buffer to the agent (see the Rainbow example below).
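The two exploration strategies named in the list can be sketched in a few lines of plain Python. This is an illustration of the techniques only, not PFRL's explorer classes; the function names and the temperature parameter are assumptions made for the sketch.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; a lower temperature favors the best action more."""
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(q_values, temperature, rng):
    """Sample an action in proportion to its softmax probability."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
probs = boltzmann_probs(q_values, temperature=1.0)
sampled = boltzmann(q_values, temperature=1.0, rng=rng)
```

With epsilon set to 0.0 the first function always returns the greedy action (index 1 here), while Boltzmann exploration keeps some probability on every action and assigns the most to the highest-valued one.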
PFRL users can choose among the library’s several algorithms and combine them with a multitude of features, which is key for modern deep RL research. For users developing a new algorithm as an Agent, it is advisable to restrict the Agent class implementation to the algorithm’s unique update rules and action-selection procedure. The remainder of the implementation can exist outside of the agent, allowing for flexible parametrization.
One example that highlights the flexibility of PFRL agents is our implementation of Rainbow. Rainbow is an algorithm developed for Atari games that combines six independent improvements into a single agent: CategoricalDQN + Double updates + Dueling Architecture + Noisy networks + Prioritized Experience Replay + N-step target updates.
In PFRL, this is implemented quite simply as a combination of an Agent with different parametrizations in about a dozen lines.
First, we create a Distributional Q-function with a Dueling architecture (+ Dueling).
Second, we convert the network into a noisy network (+ Dueling + Noisy networks).
Third, we initialize a Prioritized replay buffer with 3-step transitions for 3-step updates (+ Dueling + Noisy networks + Prioritized Experience Replay + N-step target updates).
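To illustrate what a 3-step transition contains, the sketch below collapses consecutive 1-step transitions into a single N-step transition in plain Python. It shows the idea only and is not PFRL's replay buffer code; the tuple layout and the function name are assumptions made for this example.

```python
def n_step_transition(steps, gamma):
    """Collapse consecutive 1-step transitions into one n-step transition.

    `steps` is a list of (state, action, reward, next_state, done) tuples in
    time order. The result starts at the first state and action, carries the
    discounted sum of rewards, and ends at the last next_state reached.
    """
    state, action = steps[0][0], steps[0][1]
    discounted_reward = 0.0
    next_state, done = steps[0][3], steps[0][4]
    for k, (_, _, reward, s_next, is_done) in enumerate(steps):
        discounted_reward += (gamma ** k) * reward
        next_state, done = s_next, is_done
        if is_done:
            break  # do not accumulate rewards past the end of an episode
    return state, action, discounted_reward, next_state, done


steps = [
    ("s0", 0, 1.0, "s1", False),
    ("s1", 1, 2.0, "s2", False),
    ("s2", 0, 4.0, "s3", False),
]
transition = n_step_transition(steps, gamma=0.5)

# Same data, but the episode ends after the second step.
truncated = n_step_transition(
    [steps[0], ("s1", 1, 2.0, "s2", True), steps[2]], gamma=0.5
)
```

With gamma = 0.5 the three rewards contribute 1.0 + 0.5 × 2.0 + 0.25 × 4.0 = 3.0, and the transition runs from s0 to s3. An N-step target then bootstraps from the value of that final state, discounted by gamma raised to the number of steps taken.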
Finally, we pass these parameters into a CategoricalDoubleDQN agent, which is a CategoricalDQN agent that performs double updates (CategoricalDQN + Double updates + Dueling Architecture + Noisy networks + Prioritized Experience Replay + N-step target updates).
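As a rough illustration of what "double updates" means, the sketch below computes a double-update target for scalar Q-values: the online network selects the next action and the target network evaluates it. This is a simplification made for the example. A categorical agent works with return distributions rather than scalars, and the function name and arguments here are hypothetical, not PFRL's API.

```python
def double_q_target(reward, gamma, done, online_next_q, target_next_q):
    """Scalar double-update target.

    The action is selected with the online network's Q-values and
    evaluated with the target network's Q-values.
    """
    if done:
        return reward  # no bootstrapping from a terminal state
    best_action = max(range(len(online_next_q)), key=lambda a: online_next_q[a])
    return reward + gamma * target_next_q[best_action]


online_next_q = [1.0, 3.0, 2.0]  # online network prefers action 1
target_next_q = [5.0, 0.5, 4.0]  # target network alone would prefer action 0
target = double_q_target(1.0, 0.5, False, online_next_q, target_next_q)
terminal_target = double_q_target(1.0, 0.5, True, online_next_q, target_next_q)
```

Here the target is 1.0 + 0.5 × 0.5 = 1.25, whereas taking the maximum over the target network's own values would give 1.0 + 0.5 × 5.0 = 3.5. Decoupling selection from evaluation in this way is intended to reduce overestimation of action values.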
For the full script that reproduces the Rainbow paper’s results, see here.
PFN is co-organizing the 2020 NeurIPS MineRL competition with an organizing committee consisting of members from CMU, AIcrowd, OpenAI, DeepMind, and Microsoft Research. In this competition, participants need to develop a sample-efficient RL agent that obtains a diamond in Minecraft within a limited training time while leveraging human demonstrations. To help competitors get started, PFN will provide the following baseline agents implemented in PFRL: Rainbow, SQIL, DQfD, and Prioritized Dueling Double DQN.
Looking forward, we have several exciting features and algorithms planned for PFRL, including Hindsight Experience Replay, Munchausen DQN, and a large zoo of pretrained models for our 9 reproduced algorithms. We hope that users find PFRL to be useful, and we look forward to community contributions to PFRL!
For more information on PFRL, check out the following links: