TensorLayer Team Released Reinforcement Learning Algorithm Baseline — RLzoo

Source: Deep Learning on Medium


Reinforcement learning optimizes an agent's policy through a reward function, and deep reinforcement learning applies deep neural networks within reinforcement learning algorithms. Deep reinforcement learning has attracted growing attention from the research community and industry due to its scalability. Its applications range from simple image-based Atari games to highly sophisticated games such as StarCraft, as well as board and card games such as Go and Texas Hold'em. It is also gradually being adopted by researchers in the field of robot control.

Recently, to enable industry to make better use of cutting-edge reinforcement learning algorithms, the TensorLayer Reinforcement Learning Team has released RLzoo, a complete library of reinforcement learning baseline algorithms aimed at industrial use. TensorLayer is an extension library built on TensorFlow that provides better support for basic neural network construction and diverse neural network applications. RLzoo is the first comprehensive open source algorithm library built on TensorLayer 2.0 and TensorFlow 2.0 since the release of TensorFlow 2.0. The library currently supports OpenAI Gym, the DeepMind Control Suite, and other large-scale simulation environments such as the robot learning environment RLBench.

Figure: supported environments in RLzoo.

It is worth noting that this project follows the academic version of the reinforcement learning algorithm library previously released by the TensorLayer reinforcement learning team, and is oriented toward industrial needs. The earlier academic tutorial library demonstrated the major reinforcement learning algorithms in a compact, clear structure and was lightweight and fast to adapt to new learning environments; RLzoo, by contrast, is structured for large-scale benchmark tests. It is much simpler to use: only a few lines of code are needed to run very complex algorithms, making it easier for researchers and engineers to propose and test algorithms.

Figure: common interface of RLzoo (example: Soft Actor-Critic algorithm for CartPole-v0 environment).
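To make the "few lines of code" idea concrete, here is a minimal, self-contained sketch of a unified training interface of the kind described above. The names (`call_default_params`, a registry of algorithm classes, a `learn` method) are illustrative stand-ins, not RLzoo's actual API; a real run would use the classes and environment builders shipped in the repository.

```python
# Illustrative sketch of a unified RL training interface.
# All names here are hypothetical stand-ins for the pattern the
# article describes, not RLzoo's real classes.

class DummyEnv:
    """Stand-in for a wrapped learning environment."""
    observation_dim, action_dim = 4, 2

ALGORITHMS = {}  # name -> algorithm class, so one entry point runs any algorithm

def register(name):
    def deco(cls):
        ALGORITHMS[name] = cls
        return cls
    return deco

@register('SAC')
class SAC:
    def __init__(self, obs_dim, act_dim, lr=3e-4):
        self.obs_dim, self.act_dim, self.lr = obs_dim, act_dim, lr
        self.trained = False

    def learn(self, env, train_episodes=100):
        self.trained = True  # a real implementation would run SAC updates here
        return self

def call_default_params(env, alg_name):
    # One place for per-algorithm, per-environment default settings.
    alg_params = dict(obs_dim=env.observation_dim, act_dim=env.action_dim)
    learn_params = dict(train_episodes=100)
    return alg_params, learn_params

def run(alg_name, env):
    alg_params, learn_params = call_default_params(env, alg_name)
    agent = ALGORITHMS[alg_name](**alg_params)
    return agent.learn(env, **learn_params)

agent = run('SAC', DummyEnv())
```

The design point is that the caller never touches algorithm-specific setup: swapping `'SAC'` for another registered name is the only change needed.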

The contributors are from Imperial College London, Peking University and the Chinese Academy of Sciences, and include Zihan Ding, Hao Dong, Tianyang Yu, Yanhua Huang and Hongming Zhang.

Link of RLzoo: https://github.com/tensorlayer/RLzoo

Link of RL tutorials: https://github.com/tensorlayer/tensorlayer/tree/master/examples/reinforcement_learning

Slack group: https://app.slack.com/client/T5SHUUKNJ/D5SJDERU7

TensorLayer 2.0 is an open source library based on TensorFlow 2.0 that provides high-level APIs wrapping neural network layers and various applications, and it will support more underlying computing engines in the future. TensorFlow 2.0's eager execution mode and the removal of the Session make the network construction process more flexible and simple, and TensorLayer 2.0 supports both static and dynamic network construction, covering the entire development workflow to suit diverse research and industrial projects.

Based on TensorLayer, RLzoo implements basic policy and value networks to support a variety of widely used reinforcement learning algorithms. In the common functions provided by RLzoo, the policy and value networks adaptively adjust their input and output ports according to the dimensions and types of the state space and action space, so they can be deployed to various environments for training more conveniently. For example, for image input in Atari games, the network provided by RLzoo automatically selects a convolutional neural network module for preprocessing, extracting low-dimensional features as input to subsequent networks. Likewise, for discrete or continuous action outputs, RLzoo automatically selects the corresponding output port: for stochastic policies over continuous actions it provides a diagonal Gaussian distribution, and for stochastic policies over discrete actions it provides a categorical distribution.
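The adaptive input/output selection described above can be sketched as follows. This is an illustrative outline of the idea only, with hypothetical helper names; RLzoo's actual network builders differ in detail.

```python
# Sketch of adaptive network construction: pick a preprocessor from the
# observation shape, and a policy head from the action-space type.
# Function names and the (kind, size) action-space tuple are illustrative.

def choose_preprocessor(obs_shape):
    # Rank-3 observations (height, width, channels) are treated as
    # images, e.g. Atari frames, and routed through a CNN extractor.
    return 'cnn' if len(obs_shape) == 3 else 'mlp'

def choose_policy_head(action_space):
    kind, size = action_space  # e.g. ('discrete', 6) or ('continuous', 3)
    if kind == 'discrete':
        # Stochastic discrete policy: categorical distribution over actions.
        return {'distribution': 'categorical', 'outputs': size}
    # Stochastic continuous policy: diagonal Gaussian, so the network
    # outputs a mean and a log-std per action dimension.
    return {'distribution': 'diag_gaussian', 'outputs': 2 * size}

atari_net = choose_preprocessor((84, 84, 4))       # image input -> 'cnn'
cartpole_head = choose_policy_head(('discrete', 2))
reacher_head = choose_policy_head(('continuous', 3))
```

With this kind of dispatch, the same algorithm code can train on CartPole (two discrete actions) or a continuous-control task without the user wiring the network ports by hand.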

The RLzoo open source project aims to let users flexibly configure complex reinforcement learning algorithms. The learning parameters, neural network structures and optimizers of the various algorithms can be easily selected and replaced, maximizing convenience for academic and industrial use. TensorLayer gives RLzoo a flexible way to build networks, making reinforcement learning algorithms easier to implement. In the future, the RLzoo team will open source benchmark results and selected optimal parameters for the existing reinforcement learning algorithms in various learning environments, to enable fairer algorithm comparisons.

The open source RLzoo includes the algorithms Deep Q-Network (DQN), Double DQN, Dueling DQN, Prioritized Experience Replay (PER), Policy Gradient (PG), Actor-Critic (AC), Asynchronous Advantage Actor-Critic (A3C), Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), Distributed PPO (DPPO) and Trust Region Policy Optimization (TRPO). The team will continue to add new algorithms and learning environments, and welcomes your feedback and contributions.
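As a concrete instance of the simplest algorithm in the list, here is a minimal vanilla policy-gradient (REINFORCE) sketch on a toy two-armed bandit, written in plain Python. It is illustrative only and unrelated to RLzoo's actual implementation: the policy is a softmax over two logits, and each update moves the logits along `reward * grad log pi(action)`.

```python
import math
import random

# Toy two-armed bandit: arm 1 always pays reward 1, arm 0 pays 0.
# REINFORCE should drive the softmax policy toward arm 1.
random.seed(0)
logits = [0.0, 0.0]
lr = 0.1

def softmax(x):
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

for _ in range(500):
    probs = softmax(logits)
    action = 0 if random.random() < probs[0] else 1
    reward = 1.0 if action == 1 else 0.0
    # grad of log pi(action) w.r.t. logits is one_hot(action) - probs
    for i in range(2):
        grad = (1.0 if i == action else 0.0) - probs[i]
        logits[i] += lr * reward * grad

final_probs = softmax(logits)  # probability of arm 1 ends up close to 1
```

Real tasks replace the logits with a neural network and the bandit with episode rollouts, but the update rule is the same one the deep PG/A3C/PPO family builds on.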

In this open source benchmark library, the supported environments include OpenAI Gym (Atari, Classic Control, Box2D, MuJoCo, Robotics), the DeepMind Control Suite and RLBench. The main algorithms and environments are as follows:

Figure: descriptions for algorithms and environments supported in RLzoo.