The second edition of my Deep RL book

Source: Deep Learning on Medium

The book “Deep Reinforcement Learning Hands-On” was published in June 2018 and got a warm welcome (56 ratings on Amazon, 4.3 out of 5 stars; the code repository on GitHub has 1.2K stars):

Half a year ago I started working on a second edition of the book, and two weeks ago it was finally published:

In this post I’m going to give a quick overview of what has changed since the first edition.

There are two classes of changes:

  • Six new chapters with new practical examples and extra Deep RL topics.
  • Fixes for reported mistakes, software version updates, and other minor tweaks.

Let’s start with the new content.

The second edition is 54% longer (785 pages versus 510), mostly because of the new chapters:

  • Chapter 9: Ways to speed up RL. Covers engineering ways to make RL code faster. When you’re doing lots of RL training, performance has an important influence on the final result. One important aspect of performance is the efficiency of the code and the efficient use of computing resources. This chapter takes DQN code that solves Pong in 1.5 hours and shows different ways to make it faster (without tweaking the underlying RL method). The final version reaches the same score after only 31 minutes of training.
  • Chapter 15: The TextWorld Environment. This chapter describes the TextWorld environment from Microsoft Research, which brings RL to interactive fiction games. It extends the previous chapter, “Training chatbots with RL”, with respect to NLP usage for RL agents.
  • Chapter 18: RL in Robotics. This chapter continues the topic of continuous control started in chapter 17. But here, I’m trying to step beyond the emulated environments into the physical world. To give you a feeling of a “real hardware project”, I describe the step-by-step process of building a simple four-legged robot based on a cheap microcontroller platform, a bunch of sensors, and small servo motors. The controlling policy is trained in an emulator and then transferred to the real hardware. The total cost of the platform is less than $100, which makes it affordable for enthusiasts and researchers. No soldering skills are required, which is also a plus :).
  • Chapter 21: Advanced exploration. In this chapter we’ll talk about the importance of efficient exploration and cover recent findings in this area. Most of the book uses the simple epsilon-greedy exploration approach, which is not very efficient in complex environments. In this chapter, we’ll use Mountain Car, which is an old but still quite challenging “classical control” problem.
  • Chapter 24: RL in Discrete optimisation. Another new direction of RL application, showcased on the Rubik’s cube, a well-known but still non-trivial example of a discrete optimisation problem.
  • Chapter 25: Multi-agent RL. This chapter gives an overview of a recent branch of RL that deals with situations where multiple agents communicate with each other. There is a multitude of new applications in this setup (robot swarms, energy grid optimisation, board games, etc.). Here we just introduce the specifics of MARL using MAgent, a very efficient and flexible multi-agent environment.
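
For readers who haven’t met the epsilon-greedy approach mentioned in the Chapter 21 description, here is a minimal sketch of the idea. This is not code from the book: the function name `epsilon_greedy` and the toy Q-values are made up for illustration, and only the Python standard library is used.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, greedy otherwise.

    q_values: list of estimated action values (hypothetical toy input).
    epsilon:  exploration probability in [0, 1].
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the largest estimated value
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.1, 0.5, 0.2]
# With epsilon=0.0 the agent never explores and always takes the greedy action
print(epsilon_greedy(q_values, epsilon=0.0))
```

Because the random action is chosen uniformly, the agent explores without any notion of which states it has already seen, which is one reason this scheme can struggle in environments where rewards are hard to reach.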

Besides the new chapters, which extend the 18 chapters of the first edition, multiple improvements were made in the new edition:

  • reported mistakes (not that many) were fixed
  • plots were updated to make them more readable in the printed version
  • code snippet formatting was fixed to make the snippets prettier (in the printed book the maximum length of one code line is ~70 characters, which needs to be taken into account to avoid ugly line wraps)
  • all the examples were updated to work with PyTorch 1.3 and the latest OpenAI Gym. To make the code examples smaller, PyTorch Ignite was used in about half of the examples in the book.

As before, the book starts with a gentle introduction to RL problems using simple environments, so the prerequisites are familiarity with Python and the basics of machine learning. But I believe even experienced RL researchers and practitioners will be able to find something new and interesting.

Code examples are a very important part of the book. There are 35k lines of Python code in more than 300 files, illustrating a wide variety of RL and Deep RL methods, including value-based (Value iteration, Q-learning, DQN, Categorical DQN and other extensions), policy-based (Cross-Entropy Method, REINFORCE, A2C, A3C, DDPG, D4PG, PPO, TRPO and SAC), and non-gradient methods (Evolution Strategies and Genetic Algorithm). Besides the method illustrations, there are several medium-sized projects showing how RL can be applied to practical tasks: stock trading, chatbots, interactive fiction games, web navigation, board games, etc. All the examples are freely available on GitHub:

Thanks for reading this! I hope you’ll enjoy the book as much as I’ve enjoyed writing it!

Amazon link, once again: