Resources for Deep Reinforcement Learning

This is a collection of resources for deep reinforcement learning, organized into the following sections: Books, Surveys and Reports, Courses, Tutorials and Talks, Conferences, Journals and Workshops, Blogs, and Benchmarks and Testbeds. This blog is very long, with lots of resources.

There are excellent invited talks, tutorials, and workshops at recent conferences, like NIPS, ICML, ICLR, ACL, CVPR, AAAI, and IJCAI. Many of them are not included here.

If I had to pick three study materials, I would pick these three survey papers:

  • LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521:436–444.
  • Jordan, M. I. and Mitchell, T. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255–260.
  • Littman, M. L. (2015). Reinforcement learning improves behaviour from evaluative feedback. Nature, 521:445–451.

This blog is based on Deep Reinforcement Learning: An Overview, with updates. These resources cover reinforcement learning core elements, important mechanisms, and applications, as in the overview, and also include topics in deep learning, reinforcement learning, machine learning, and AI. I am actively updating the overview and plan to publish it at the end of January 2019. I compiled this blog for flexible updates, and will present a small number of these resources in the updated overview.

Please stay tuned.


Books

Reinforcement Learning

  • Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd Edition). MIT Press. The definitive and intuitive reinforcement learning book. Accompanying Python code.
  • Szepesvári, C. (2010). Algorithms for Reinforcement Learning. Morgan & Claypool.
  • Bertsekas, D. P. (2012). Dynamic programming and optimal control (Vol. II, 4th Edition: Approximate Dynamic Programming). Athena Scientific.
  • Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific.
  • Powell, W. B. (2011). Approximate Dynamic Programming: Solving the curses of dimensionality (2nd Edition). John Wiley and Sons.
  • Wiering, M. and van Otterlo, M., editors (2012). Reinforcement Learning: State-of-the-Art. Springer.
  • Puterman, M. L. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience.
  • Lattimore, T. and Szepesvári, C. (2018). Bandit Algorithms. Cambridge University Press.

Deep Learning

  • Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.

Machine Learning

  • Bishop, C. (2011). Pattern Recognition and Machine Learning. Springer.
  • Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
  • Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. The MIT Press.
  • Zhou, Z.-H. (2016). Machine Learning (in Chinese). Tsinghua University Press, Beijing, China.
  • Mitchell, T. (1997). Machine Learning. McGraw Hill.
  • James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer.
  • Kuhn, M. and Johnson, K. (2013). Applied Predictive Modeling. Springer.
  • Provost, F. and Fawcett, T. (2013). Data Science for Business. O’Reilly.
  • Simeone, O. (2017). A Brief Introduction to Machine Learning for Engineers. ArXiv.
  • Vapnik, V. N. (1998). Statistical Learning Theory. Wiley.
  • Haykin, S. (2008). Neural Networks and Learning Machines (third edition). Prentice Hall.

Natural Language Processing (NLP)

  • Jurafsky, D. and Martin, J. H. (2017). Speech and Language Processing (3rd ed. draft). Prentice Hall.
  • Goldberg, Y. (2017). Neural Network Methods for Natural Language Processing. Morgan & Claypool.
  • Deng, L. and Liu, Y., editors (2018). Deep Learning in Natural Language Processing. Springer.

Semi-supervised Learning

  • Zhu, X. and Goldberg, A. B. (2009). Introduction to semi-supervised learning. Morgan & Claypool.

Continual Learning

  • Chen, Z. and Liu, B. (2016). Lifelong Machine Learning. Morgan & Claypool.

Game Theory

  • Leyton-Brown, K. and Shoham, Y. (2008). Essentials of Game Theory: A Concise, Multidisciplinary Introduction. Morgan & Claypool.


Finance

  • Hull, J. C. Options, Futures and Other Derivatives. Prentice Hall.


Transportation

  • Bazzan, A. L. and Klügl, F. (2014). Introduction to Intelligent Systems in Traffic and Transportation. Morgan & Claypool.

Artificial Intelligence

  • Russell, S. and Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd edition). Pearson.

Surveys and Reports

Reinforcement Learning

  • Littman, M. L. (2015). Reinforcement learning improves behaviour from evaluative feedback. Nature, 521:445–451.
  • Kaelbling, L. P., Littman, M. L., and Moore, A. (1996). Reinforcement learning: A survey. JAIR, 4:237–285.
  • Li, Y. (2017). Deep Reinforcement Learning: An Overview. ArXiv.
  • Levine, S. (2018). Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review. ArXiv.
  • Recht, B. (2018). A Tour of Reinforcement Learning: The View from Continuous Control. ArXiv.
  • Geramifard, A., Walsh, T. J., Tellex, S., Chowdhary, G., Roy, N., and How, J. P. (2013). A tutorial on linear function approximators for dynamic programming and reinforcement learning. Foundations and Trends® in Machine Learning, 6(4):375–451.
  • Grondman, I., Busoniu, L., Lopes, G. A., and Babuška, R. (2012). A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6):1291–1307.
  • Roijers, D. M., Vamplew, P., Whiteson, S., and Dazeley, R. (2013). A survey of multi-objective sequential decision-making. JAIR, 48:67–113.

Deep Learning

  • LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521:436–444.
  • Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., and Liao, Q. (2017). Why and when can deep-but not shallow-networks avoid the curse of dimensionality: a review. International Journal of Automation and Computing, 14(5):503–519.
  • Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A review and new perspectives. TPAMI, 35(8):1798–1828.
  • Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1):1–127.
  • Deng, L. and Dong, Y. (2014). Deep learning: Methods and applications. Foundations and Trends® in Signal Processing, 7(3–4):197–387.
  • Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61:85–117.
  • Wang, H. and Raj, B. (2017). On the Origin of Deep Learning. ArXiv.
  • Sze, V., Chen, Y.-H., Yang, T.-J., and Emer, J. (2017). Efficient Processing of Deep Neural Networks: A Tutorial and Survey. ArXiv.

Machine Learning

  • Jordan, M. I. and Mitchell, T. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255–260.
  • Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87.
  • Bottou, L., Curtis, F. E., and Nocedal, J. (2018). Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311.
  • Ng, A. (2018). Machine Learning Yearning (draft).
  • Zinkevich, M. (2017). Rules of Machine Learning: Best Practices for ML Engineering.
  • Smith, L. N. (2017). Best Practices for Applying Deep Learning to Novel Applications. ArXiv.
  • Andrieu, C., de Freitas, N., Doucet, A., and Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine Learning, 50(1–2):5–43.


Exploration

  • Li, L. (2012). Sample complexity bounds of exploration. In Wiering, M. and van Otterlo, M., editors, Reinforcement Learning: State-of-the-Art, pages 175–204. Springer.

Transfer Learning

  • Taylor, M. E. and Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. JMLR, 10:1633–1685.
  • Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359.
  • Weiss, K., Khoshgoftaar, T. M., and Wang, D. (2016). A survey of transfer learning. Journal of Big Data, 3(9).

Multi-task Learning

  • Zhang, Y. and Yang, Q. (2018). An overview of multi-task learning. National Science Review, 5:30–43.
  • Ruder, S. (2017). An Overview of Multi-Task Learning in Deep Neural Networks. ArXiv.

Neural Architecture Search

  • Elsken, T., Metzen, J. H., and Hutter, F. (2018). Neural Architecture Search: A Survey. ArXiv.

Successor Representation

  • Gershman, S. J. (2018). The successor representation: Its computational logic and neural substrates. Journal of Neuroscience, 38(33):7193–7200.

Bayesian RL

  • Ghavamzadeh, M., Mannor, S., Pineau, J., and Tamar, A. (2015). Bayesian reinforcement learning: a survey. Foundations and Trends in Machine Learning, 8(5–6):359–483.

Monte Carlo tree search (MCTS)

  • Browne, C., Powley, E., Whitehouse, D., Lucas, S., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., and Colton, S. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43.
  • Gelly, S., Schoenauer, M., Sebag, M., Teytaud, O., Kocsis, L., Silver, D., and Szepesvári, C. (2012). The grand challenge of computer go: Monte carlo tree search and extensions. Communications of the ACM, 55(3):106–113.

Attention and Memory

Intrinsic Motivation

  • Barto, A. (2013). Intrinsic motivation and reinforcement learning. In Baldassarre, G. and Mirolli, M., editors, Intrinsically Motivated Learning in Natural and Artificial Systems. Springer, Berlin, Heidelberg.
  • Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2(3):230–247.
  • Oudeyer, P.-Y. and Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics, 1(6).

Evolution Strategy

  • Hansen, N. (2016). The CMA Evolution Strategy: A Tutorial. ArXiv.


Robotics

  • Kober, J., Bagnell, J. A., and Peters, J. (2013). Reinforcement learning in robotics: A survey. International Journal of Robotics Research, 32(11):1238–1278.
  • Deisenroth, M. P., Neumann, G., and Peters, J. (2013). A survey on policy search for robotics. Foundations and Trends in Robotics, 2:1–142.
  • Argall, B. D., Chernova, S., Veloso, M., and Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5):469–483.

Natural Language Processing (NLP)

  • Hirschberg, J. and Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245):261–266.
  • Cho, K. (2015). Natural Language Understanding with Distributed Representation. ArXiv.
  • Young, T., Hazarika, D., Poria, S., and Cambria, E. (2017). Recent Trends in Deep Learning Based Natural Language Processing. ArXiv.

Dialogue Systems

  • Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., and Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6):82–97.
  • Deng, L. and Li, X. (2013). Machine learning paradigms for speech recognition: An overview. IEEE Transactions on Audio, Speech, and Language Processing, 21(5):1060–1089.
  • Gao, J., Galley, M., and Li, L. (2018). Neural approaches to Conversational AI. Foundations and Trends in Information Retrieval. To appear.
  • He, X. and Deng, L. (2013). Speech-centric information processing: An optimization-oriented approach. Proceedings of the IEEE, 101(5):1116–1135.
  • Young, S., Gašić, M., Thomson, B., and Williams, J. D. (2013). POMDP-based statistical spoken dialogue systems: a review. Proceedings of the IEEE, 101(5):1160–1179.

Computer Vision

  • Zhang, Q. and Zhu, S.-C. (2018). Visual interpretability for deep learning: a survey. Frontiers of Information Technology & Electronic Engineering, 19(1):27–39.
  • Bohg, J., Hausman, K., Sankaran, B., Brock, O., Kragic, D., Schaal, S., and Sukhatme, G. S. (2017). Interactive perception: Leveraging action in perception and perception in action. IEEE Transactions on Robotics, 33(6):1273–1291.


Healthcare

  • Chakraborty, B. and Murphy, S. A. (2014). Dynamic treatment regimes. Annual Review of Statistics and Its Application, 1:447–464.


Smart Grid

  • Anderson, R. N., Boulanger, A., Powell, W. B., and Scott, W. (2011). Adaptive stochastic control for the smart grid. Proceedings of the IEEE, 99(6):1098–1115.

Collection of Applications

AI Safety

  • Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., and Mané, D. (2016). Concrete Problems in AI Safety. ArXiv.
  • Garcìa, J. and Fernàndez, F. (2015). A comprehensive survey on safe reinforcement learning. JMLR, 16:1437–1480.


Courses

Reinforcement Learning

Deep Learning

Machine Learning


Computer Vision




Tutorials and Talks

Reinforcement Learning

Deep Learning


Computer Vision


Finance & Economics




Conferences, Journals and Workshops

  • NIPS: Neural Information Processing Systems
  • ICML: International Conference on Machine Learning
  • ICLR: International Conference on Learning Representations
  • RLDM: Multidisciplinary Conference on Reinforcement Learning and Decision Making
  • EWRL: European Workshop on Reinforcement Learning
  • Deep Reinforcement Learning Workshop, NIPS 2018, 2017 (Symposium), 2016, 2015; IJCAI 2016
  • AI Frontiers Conference
  • Nature Machine Intelligence, Science Robotics
  • Nature May 2015, Science July 2015, survey papers on machine learning/AI
  • Science, July 7, 2017 issue, The Cyberscientist, a special issue about AI


Benchmarks and Testbeds

I list some RL testbeds below. Common testbeds for general RL algorithms are Atari games, e.g., in the Arcade Learning Environment (ALE), for discrete control, and simulated robots, e.g., using MuJoCo in OpenAI Gym, for continuous control.

  • The Arcade Learning Environment (ALE) is a framework composed of Atari 2600 games to develop and evaluate AI agents.
  • OpenAI Gym is a toolkit for the development of RL algorithms, consisting of environments, e.g., Atari games and simulated robots, and a site for the comparison and reproduction of results. OpenAI Gym has the following environments: algorithmic, Atari, Box2D, classic control, MuJoCo, robotics, and toy text.
  • MuJoCo, Multi-Joint dynamics with Contact, a physics engine.
  • DeepMind Control Suite
  • DeepMind Lab, DeepMind's first-person 3D game platform
  • DeepMind PySC2 — StarCraft II Learning Environment
  • Dopamine, a TensorFlow-based RL framework from Google AI.
  • David Churchill, CommandCenter: StarCraft 2 AI Bot
  • ELF, an extensive, lightweight, and flexible platform for RL research;
    ELF OpenGo: a reimplementation of AlphaGo Zero/AlphaZero using ELF.
  • FAIR TorchCraft is a library for Real-Time Strategy (RTS) games such as StarCraft: Brood War.
  • Ray RLlib: A Composable and Scalable Reinforcement Learning Library
  • ParlAI is a framework for dialogue research, implemented in Python, open-sourced by Facebook.
  • Natural language decathlon (decaNLP), an NLP benchmark suitable for multitask, transfer, and continual learning.
  • Project Malmo, from Microsoft, is an AI research and experimentation platform built on top of Minecraft.
  • Twitter open-sources torch-twrl, a framework for RL development.
  • ViZDoom is a Doom-based AI research platform for visual RL.
  • Baidu Apollo Project, an open-source self-driving platform
  • TORCS is a car racing simulator.
  • CoQA, a large-scale dataset for building conversational QA systems
  • WebNav Challenge for Wikipedia links navigation
  • Psychlab: A Psychology Laboratory for Deep RL Agents
  • RL-Glue is language-independent software for RL experiments.
  • RLPy is a value-function-based reinforcement learning framework for education and research.
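
Most of these testbeds expose a Gym-style reset/step interface. As a minimal sketch of that agent-environment loop, here is a tiny self-contained example; CoinFlipEnv is a hypothetical toy environment invented for illustration, and the real OpenAI Gym API (gym.make, and the exact return values of reset and step) differs across versions:

```python
import random

class CoinFlipEnv:
    """A toy environment with a Gym-style interface (reset/step).

    Observation: number of heads seen so far; episode ends after a
    fixed number of flips. Action 0 guesses tails, action 1 guesses
    heads; reward is 1 for a correct guess, 0 otherwise.
    """

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.heads = 0
        return self.heads  # initial observation

    def step(self, action):
        flip = self.rng.randint(0, 1)  # 1 means heads
        reward = 1.0 if action == flip else 0.0
        self.heads += flip
        self.t += 1
        done = self.t >= self.episode_length
        return self.heads, reward, done, {}  # obs, reward, done, info

def run_episode(env, policy):
    """The standard agent-environment interaction loop."""
    obs, total_reward, done = env.reset(), 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward

if __name__ == "__main__":
    env = CoinFlipEnv()
    ret = run_episode(env, policy=lambda obs: random.randint(0, 1))
    print(f"episode return: {ret}")
```

The same while-not-done loop drives experiments on ALE, MuJoCo, and the other testbeds listed above; only the environment and the policy change.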

Source: Deep Learning on Medium