This is a collection of resources for deep reinforcement learning, organized into the following sections: **Books**, **Surveys and Reports**, **Courses**, **Tutorials and Talks**, **Conferences, Journals and Workshops**, **Blogs**, and **Benchmarks and Testbeds**. This blog is very long, with lots of resources.

There are excellent invited talks, tutorials, and workshops at recent conferences such as NIPS, ICML, ICLR, ACL, CVPR, AAAI, and IJCAI. Many of them are not included here.

If I were to pick three study materials:

- David Silver, Reinforcement Learning, 2015. Slides. Video.
- Sergey Levine, UC Berkeley CS 294: Deep Reinforcement Learning
- Sutton, R. S. and Barto, A. G. (2018). *Reinforcement Learning: An Introduction (2nd Edition)*. MIT Press.

If I were to pick three survey papers:

- LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. *Nature*, 521:436–444.
- Jordan, M. I. and Mitchell, T. (2015). Machine learning: Trends, perspectives, and prospects. *Science*, 349(6245):255–260.
- Littman, M. L. (2015). Reinforcement learning improves behaviour from evaluative feedback. *Nature*, 521:445–451.

This blog is based on **Deep Reinforcement Learning: An Overview**, with updates. These resources cover reinforcement learning core elements, important mechanisms, and applications, as in the overview, and also include topics in deep learning, reinforcement learning, machine learning, and AI. I am actively updating the overview and plan to publish it at the end of January 2019. I compile this blog for flexible updates, and will present a small number of these resources in the updated overview.

Please stay tuned.

#### Books

Reinforcement Learning

- Sutton, R. S. and Barto, A. G. (2018). *Reinforcement Learning: An Introduction (2nd Edition)*. MIT Press. The definitive and intuitive reinforcement learning book. Accompanying Python code.
- Szepesvári, C. (2010). *Algorithms for Reinforcement Learning*. Morgan & Claypool.
- Bertsekas, D. P. (2012). *Dynamic Programming and Optimal Control (Vol. II, 4th Edition: Approximate Dynamic Programming)*. Athena Scientific.
- Bertsekas, D. P. and Tsitsiklis, J. N. (1996). *Neuro-Dynamic Programming*. Athena Scientific.
- Powell, W. B. (2011). *Approximate Dynamic Programming: Solving the Curses of Dimensionality (2nd Edition)*. John Wiley and Sons.
- Wiering, M. and van Otterlo, M., editors (2012). *Reinforcement Learning: State-of-the-Art*. Springer.
- Puterman, M. L. (2005). *Markov Decision Processes: Discrete Stochastic Dynamic Programming*. Wiley-Interscience.
- Lattimore, T. and Szepesvári, C. (2018). *Bandit Algorithms*. Cambridge University Press.

Deep Learning

- Goodfellow, I., Bengio, Y., and Courville, A. (2016). *Deep Learning*. MIT Press.

Machine Learning

- Bishop, C. (2011). *Pattern Recognition and Machine Learning*. Springer.
- Hastie, T., Tibshirani, R., and Friedman, J. (2009). *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*. Springer.
- Murphy, K. P. (2012). *Machine Learning: A Probabilistic Perspective*. The MIT Press.
- Zhou, Z.-H. (2016). *Machine Learning (in Chinese)*. Tsinghua University Press, Beijing, China.
- Mitchell, T. (1997). *Machine Learning*. McGraw Hill.
- James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). *An Introduction to Statistical Learning with Applications in R*. Springer.
- Kuhn, M. and Johnson, K. (2013). *Applied Predictive Modeling*. Springer.
- Provost, F. and Fawcett, T. (2013). *Data Science for Business*. O’Reilly.
- Simeone, O. (2017). A Brief Introduction to Machine Learning for Engineers. *ArXiv*.
- Vapnik, V. N. (1998). *Statistical Learning Theory*. Wiley.
- Haykin, S. (2008). *Neural Networks and Learning Machines (Third Edition)*. Prentice Hall.

Natural Language Processing (NLP)

- Jurafsky, D. and Martin, J. H. (2017). *Speech and Language Processing (3rd ed. draft)*. Prentice Hall.
- Goldberg, Y. (2017). *Neural Network Methods for Natural Language Processing*. Morgan & Claypool.
- Deng, L. and Liu, Y., editors (2018). *Deep Learning in Natural Language Processing*. Springer.

Semi-supervised Learning

- Zhu, X. and Goldberg, A. B. (2009). *Introduction to Semi-Supervised Learning*. Morgan & Claypool.

Continual Learning

- Chen, Z. and Liu, B. (2016). *Lifelong Machine Learning*. Morgan & Claypool.

Game Theory

- Leyton-Brown, K. and Shoham, Y. (2008). *Essentials of Game Theory: A Concise, Multidisciplinary Introduction*. Morgan & Claypool.

Finance

- Hull, J. C. *Options, Futures and Other Derivatives*. Prentice Hall.

Transportation

- Bazzan, A. L. and Klügl, F. (2014). *Introduction to Intelligent Systems in Traffic and Transportation*. Morgan & Claypool.

Artificial Intelligence

- Russell, S. and Norvig, P. (2009). *Artificial Intelligence: A Modern Approach (3rd Edition)*. Pearson.

#### Surveys and Reports

Reinforcement Learning

- Littman, M. L. (2015). Reinforcement learning improves behaviour from evaluative feedback. *Nature*, 521:445–451.
- Kaelbling, L. P., Littman, M. L., and Moore, A. (1996). Reinforcement learning: A survey. *JAIR*, 4:237–285.
- Li, Y. (2017). Deep Reinforcement Learning: An Overview. *ArXiv*.
- Levine, S. (2018). Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review. *ArXiv*.
- Recht, B. (2018). A Tour of Reinforcement Learning: The View from Continuous Control. *ArXiv*.
- Geramifard, A., Walsh, T. J., Tellex, S., Chowdhary, G., Roy, N., and How, J. P. (2013). A tutorial on linear function approximators for dynamic programming and reinforcement learning. *Foundations and Trends® in Machine Learning*, 6(4):375–451.
- Grondman, I., Busoniu, L., Lopes, G. A., and Babuška, R. (2012). A survey of actor-critic reinforcement learning: Standard and natural policy gradients. *IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)*, 42(6):1291–1307.
- Roijers, D. M., Vamplew, P., Whiteson, S., and Dazeley, R. (2013). A survey of multi-objective sequential decision-making. *JAIR*, 48:67–113.

Deep Learning

- LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. *Nature*, 521:436–444.
- Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., and Liao, Q. (2017). Why and when can deep-but not shallow-networks avoid the curse of dimensionality: a review. *International Journal of Automation and Computing*, 14(5):503–519.
- Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A review and new perspectives. *TPAMI*, 35(8):1798–1828.
- Bengio, Y. (2009). Learning deep architectures for AI. *Foundations and Trends® in Machine Learning*, 2(1):1–127.
- Deng, L. and Dong, Y. (2014). Deep learning: Methods and applications. *Foundations and Trends® in Signal Processing*, 7(3–4):197–387.
- Schmidhuber, J. (2015). Deep learning in neural networks: An overview. *Neural Networks*, 61:85–117.
- Wang, H. and Raj, B. (2017). On the Origin of Deep Learning. *ArXiv*.
- Sze, V., Chen, Y.-H., Yang, T.-J., and Emer, J. (2017). Efficient Processing of Deep Neural Networks: A Tutorial and Survey. *ArXiv*.

Machine Learning

- Jordan, M. I. and Mitchell, T. (2015). Machine learning: Trends, perspectives, and prospects. *Science*, 349(6245):255–260.
- Domingos, P. (2012). A few useful things to know about machine learning. *Communications of the ACM*, 55(10):78–87.
- Bottou, L., Curtis, F. E., and Nocedal, J. (2018). Optimization methods for large-scale machine learning. *SIAM Review*, 60(2):223–311.
- Ng, A. (2018). *Machine Learning Yearning (draft)*. deeplearning.ai.
- Zinkevich, M. (2017). *Rules of Machine Learning: Best Practices for ML Engineering*.
- Smith, L. N. (2017). Best Practices for Applying Deep Learning to Novel Applications. *ArXiv*.
- Andrieu, C., de Freitas, N., Doucet, A., and Jordan, M. I. (2003). An introduction to MCMC for machine learning. *Machine Learning*, 50(1–2):5–43.

Exploration

- Li, L. (2012). Sample complexity bounds of exploration. In Wiering, M. and van Otterlo, M., editors, *Reinforcement Learning: State-of-the-Art*, pages 175–204. Springer-Verlag Berlin Heidelberg.

Transfer Learning

- Taylor, M. E. and Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. *JMLR*, 10:1633–1685.
- Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. *IEEE Transactions on Knowledge and Data Engineering*, 22(10):1345–1359.
- Weiss, K., Khoshgoftaar, T. M., and Wang, D. (2016). A survey of transfer learning. *Journal of Big Data*, 3(9).

Multi-task Learning

- Zhang, Y. and Yang, Q. (2018). An overview of multi-task learning. *National Science Review*, 5:30–43.
- Ruder, S. (2017). An Overview of Multi-Task Learning in Deep Neural Networks. *ArXiv*.

Neural Architecture Search

- Elsken, T., Hendrik Metzen, J., and Hutter, F. (2018). Neural Architecture Search: A Survey. *ArXiv*.

Successor Representation

- Gershman, S. J. (2018). The successor representation: Its computational logic and neural substrates. *Journal of Neuroscience*, 38(33):7193–7200.

Bayesian RL

- Ghavamzadeh, M., Mannor, S., Pineau, J., and Tamar, A. (2015). Bayesian reinforcement learning: a survey. *Foundations and Trends® in Machine Learning*, 8(5–6):359–483.

Monte Carlo tree search (MCTS)

- Browne, C., Powley, E., Whitehouse, D., Lucas, S., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., and Colton, S. (2012). A survey of Monte Carlo tree search methods. *IEEE Transactions on Computational Intelligence and AI in Games*, 4(1):1–43.
- Gelly, S., Schoenauer, M., Sebag, M., Teytaud, O., Kocsis, L., Silver, D., and Szepesvári, C. (2012). The grand challenge of computer Go: Monte Carlo tree search and extensions. *Communications of the ACM*, 55(3):106–113.

Attention and Memory

- Olah, C. and Carter, S. (2016). Attention and augmented recurrent neural networks. *Distill*.
- Denny Britz, Attention and Memory in Deep Learning and NLP

Intrinsic Motivation

- Barto, A. (2013). Intrinsic motivation and reinforcement learning. In Baldassarre, G. and Mirolli, M., editors, *Intrinsically Motivated Learning in Natural and Artificial Systems*. Springer, Berlin, Heidelberg.
- Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). *IEEE Transactions on Autonomous Mental Development*, 2(3):230–247.
- Oudeyer, P.-Y. and Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. *Frontiers in Neurorobotics*, 1(6).

Evolution Strategy

- Hansen, N. (2016). The CMA Evolution Strategy: A Tutorial. *ArXiv*.

Robotics

- Kober, J., Bagnell, J. A., and Peters, J. (2013). Reinforcement learning in robotics: A survey. *International Journal of Robotics Research*, 32(11):1238–1278.
- Deisenroth, M. P., Neumann, G., and Peters, J. (2013). A survey on policy search for robotics. *Foundations and Trends® in Robotics*, 2:1–142.
- Argall, B. D., Chernova, S., Veloso, M., and Browning, B. (2009). A survey of robot learning from demonstration. *Robotics and Autonomous Systems*, 57(5):469–483.

Natural Language Processing (NLP)

- Hirschberg, J. and Manning, C. D. (2015). Advances in natural language processing. *Science*, 349(6245):261–266.
- Cho, K. (2015). Natural Language Understanding with Distributed Representation. *ArXiv*.
- Young, T., Hazarika, D., Poria, S., and Cambria, E. (2017). Recent Trends in Deep Learning Based Natural Language Processing. *ArXiv*.

Dialogue Systems

- Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., and Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition. *IEEE Signal Processing Magazine*, 29(6):82–97.
- Deng, L. and Li, X. (2013). Machine learning paradigms for speech recognition: An overview. *IEEE Transactions on Audio, Speech, and Language Processing*, 21(5):1060–1089.
- Gao, J., Galley, M., and Li, L. (2018). Neural approaches to Conversational AI. *Foundations and Trends® in Information Retrieval*. To appear.
- He, X. and Deng, L. (2013). Speech-centric information processing: An optimization-oriented approach. *Proceedings of the IEEE*, 101(5):1116–1135.
- Young, S., Gašić, M., Thomson, B., and Williams, J. D. (2013). POMDP-based statistical spoken dialogue systems: a review. *Proceedings of the IEEE*, 101(5):1160–1179.

Computer Vision

- Zhang, Q. and Zhu, S.-C. (2018). Visual interpretability for deep learning: a survey. *Frontiers of Information Technology & Electronic Engineering*, 19(1):27–39.
- Bohg, J., Hausman, K., Sankaran, B., Brock, O., Kragic, D., Schaal, S., and Sukhatme, G. S. (2017). Interactive perception: Leveraging action in perception and perception in action. *IEEE Transactions on Robotics*, 33(6):1273–1291.

Healthcare

- Chakraborty, B. and Murphy, S. A. (2014). Dynamic treatment regimes. *Annual Review of Statistics and Its Application*, 1:447–464.

Energy

- Anderson, R. N., Boulanger, A., Powell, W. B., and Scott, W. (2011). Adaptive stochastic control for the smart grid. *Proceedings of the IEEE*, 99(6):1098–1115.

Collection of Applications

- Csaba Szepesvári, RLApplications.bib
- Satinder Singh, Successes of Reinforcement Learning

AI Safety

- Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., and Mané, D. (2016). Concrete Problems in AI Safety. *ArXiv*.
- García, J. and Fernández, F. (2015). A comprehensive survey on safe reinforcement learning. *JMLR*, 16:1437–1480.

#### Courses

Reinforcement Learning

- David Silver, Reinforcement Learning, 2015. Slides. Video.
- Sergey Levine, UC Berkeley CS 294: Deep Reinforcement Learning
- Richard Sutton, Reinforcement Learning, 2016.
- Katerina Fragkiadaki, Ruslan Salakhutdinov, Deep Reinforcement Learning and Control, Spring 2017
- Emma Brunskill, CS234: Reinforcement Learning
- Charles Isbell, Michael Littman and Chris Pryby, Udacity: Reinforcement Learning
- Emo Todorov, Intelligent control through learning and optimization

Deep Learning

- Andrew Ng and Kian Katanforoosh, Stanford CS230: Deep Learning
- Andrew Ng, Deep Learning Specialization
- Jeremy Howard, Practical Deep Learning For Coders
- Nando de Freitas, Deep Learning Lectures
- David Donoho, Hatef Monajemi, and Vardan Papyan, Stanford STATS 385, Theories of Deep Learning

Machine Learning

- Andrew Ng, Machine Learning

Robotics

- Pieter Abbeel, Advanced Robotics, Fall 2015
- Abdeslam Boularias, Robot Learning Seminar
- MIT 6.S094: Deep Learning for Self-Driving Cars

Computer Vision

- Fei-Fei Li, Justin Johnson, and Serena Yeung, CS231n: Convolutional Neural Networks for Visual Recognition

NLP

- Richard Socher, CS224d: Deep Learning for Natural Language Processing
- Brendan Shillingford, Yannis Assael, Chris Dyer, Oxford Deep NLP 2017 course

Healthcare

- David Sontag, Machine Learning for Healthcare

AI

- UC Berkeley CS188 Intro to AI
- Andrew Critch and Stuart Russell, UC Berkeley CS 294–149: Safety and Control for Artificial General Intelligence

#### Tutorials and Talks

Reinforcement Learning

- Rich Sutton, Introduction to Reinforcement Learning with Function Approximation
- Rich Sutton, Temporal Difference Learning
- Andrew Barto, A history of reinforcement learning
- Deep Reinforcement Learning, David Silver, Pieter Abbeel, Sergey Levine and Chelsea Finn
- Benjamin Recht, Optimization Perspectives on Learning to Control
- John Schulman, The Nuts and Bolts of Deep Reinforcement Learning Research
- Joelle Pineau, Introduction to Reinforcement Learning
- Deep Learning and Reinforcement Learning Summer School, 2018, 2017
- Deep Learning Summer School, 2016, 2015
- Yisong Yue and Hoang M. Le

Deep Learning

- Andrew Ng, Nuts and Bolts of Building Applications using Deep Learning
- Christopher Manning and Russ Salakhutdinov, Introductory Overview Lecture, The Deep Learning Revolution, JSM 2018 Tutorial
- Sanjeev Arora, ICML 2018 Tutorial on Toward Theoretical Understanding of Deep Learning
- Simons Institute Interactive Learning Workshop, 2017
- Simons Institute Representation Learning Workshop, 2017
- Simons Institute Computational Challenges in Machine Learning Workshop, 2017
- Yann LeCun, Learning world models: The next step towards AI.
- Yoshua Bengio, From deep learning of disentangled representations to higher-level cognition
- Joshua Tenenbaum, Building machines that learn & think like people
- Michael I. Jordan, SysML 2018: Perspectives and Challenges

Robotics

- Pieter Abbeel, Deep learning for robotics, NIPS 2017 Invited Talk

Computer Vision

- Jitendra Malik, IJCAI 2018 Research Excellence Award talk
- Nick Rhinehart, Paul Vernaza, and Kris Kitan, Inverse reinforcement learning for computer vision, CVPR 2018 Tutorial

NLP

- Jianfeng Gao, Michel Galley, and Lihong Li, Neural approaches to Conversational AI. ACL 2018 Tutorial.
- William Wang, Jiwei Li, and Xiaodong He, Deep reinforcement learning for NLP. ACL 2018 Tutorial.

Finance & Economics

- Sendhil Mullainathan, Machine Learning and Prediction in Economics and Finance, AFA 2017 Lecture

Healthcare

- Yan Liu and Jimeng Sun, Deep Learning Models for Health Care — Challenges and Solutions, ICML 2017 Tutorial

Education

- Curtis G. Northcutt, Artificial Intelligence in Online Education

Security

#### Conferences, Journals and Workshops

- NIPS: Neural Information Processing Systems
- ICML: International Conference on Machine Learning
- ICLR: International Conference on Learning Representation
- RLDM: Multidisciplinary Conference on Reinforcement Learning and Decision Making
- EWRL: European Workshop on Reinforcement Learning
- Deep Reinforcement Learning Workshop, NIPS 2018, 2017 (Symposium), 2016, 2015; IJCAI 2016
- AAAI, IJCAI, ACL, EMNLP, NAACL, CVPR, ICCV, ECCV, ICRA, IROS, RSS, SIGDIAL, KDD, SIGIR, WWW, etc.
- AI Frontiers Conference
- JMLR, MLJ, AIJ, JAIR, TPAMI, etc
- Nature Machine Intelligence, Science Robotics
- Nature May 2015, Science July 2015, survey papers on machine learning/AI
- Science, July 7, 2017 issue, The Cyberscientist, a special issue about AI
- http://distill.pub

#### Blogs

- http://rodneybrooks.com/blog/
- DeepMind Blog
- Google Research Blog
- The Google Brain Team — Looking Back on 2017(1,2), 2016
- Berkeley AI Research Blog
- OpenAI Blog
- Facebook AI Research (FAIR) Blog
- Bandit algorithms
- David Abel, ICML 2018 notes, NIPS 2017 notes
- Denny Britz, AI and Deep Learning in 2017 — A Year in Review
- Denny Britz, Learning Reinforcement Learning (with Code, Exercises and Solutions)
- Andrej Karpathy, Deep Reinforcement Learning: Pong from Pixels
- Lilian Weng, A (Long) Peek into Reinforcement Learning
- Alexander Irpan, Deep Reinforcement Learning Doesn’t Work Yet (Note: The title is wrong.)
- Matthew Rahtz, Lessons Learned Reproducing a Deep Reinforcement Learning Paper
- Junling Hu, Reinforcement learning explained — learning to act based on long-term payoffs
- Li Deng, How deep reinforcement learning can help chatbots
- Deep Learning
- Reinforcement Learning

#### Benchmarks and Testbeds

I list some RL testbeds below. Common testbeds for general RL algorithms are Atari games, e.g., in the Arcade Learning Environment (ALE), for discrete control, and simulated robots, e.g., using MuJoCo in OpenAI Gym, for continuous control.

- The Arcade Learning Environment (ALE) is a framework composed of Atari 2600 games to develop and evaluate AI agents.
- OpenAI Gym is a toolkit for developing RL algorithms, consisting of environments, e.g., Atari games and simulated robots, and a site for comparing and reproducing results. OpenAI Gym has the following environment groups: algorithmic, Atari, Box2D, classic control, MuJoCo, robotics, and toy text.
- MuJoCo, Multi-Joint dynamics with Contact, a physics engine.
- DeepMind Control Suite
- DeepMind Lab, DeepMind first-person 3D game platform
- DeepMind PySC2, the StarCraft II Learning Environment
- Dopamine, a TensorFlow-based RL framework from Google AI.
- David Churchill, CommandCenter: StarCraft 2 AI Bot
- ELF, an extensive, lightweight and flexible platform for RL research; ELF OpenGo is a reimplementation of AlphaGoZero/AlphaZero using ELF.
- FAIR TorchCraft is a library for Real-Time Strategy (RTS) games such as StarCraft: Brood War.
- Ray RLlib: A Composable and Scalable Reinforcement Learning Library
- ParlAI is a framework for dialogue research, implemented in Python, open-sourced by Facebook.
- Natural language decathlon (decaNLP), an NLP benchmark suitable for multitask, transfer, and continual learning.
- Project Malmo, from Microsoft, is an AI research and experimentation platform built on top of Minecraft.
- Twitter open-sources torch-twrl, a framework for RL development.
- ViZDoom is a Doom-based AI research platform for visual RL.
- Baidu Apollo Project, an open-source self-driving platform
- TORCS is a car racing simulator.
- CoQA, a large-scale dataset for building conversational QA systems
- WebNav Challenge for Wikipedia links navigation
- Psychlab: A Psychology Laboratory for Deep RL Agents
- RLGlue is a language-independent software for RL experiments.
- RLPy is a value-function-based reinforcement learning framework for education and research.
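Most of these testbeds expose the agent–environment loop popularized by OpenAI Gym: the environment provides `reset()` and `step(action)`, and an algorithm interacts with it episode by episode. Below is a minimal, self-contained sketch of that loop, using a toy chain environment invented here for illustration (it is not part of any toolkit above) and tabular Q-learning against it:

```python
import random

random.seed(0)  # for reproducibility

class ChainEnv:
    """Toy 5-state chain MDP with a Gym-style reset()/step() interface.

    Actions: 0 = left, 1 = right. Reaching the rightmost state yields
    reward 1 and ends the episode. Invented for illustration only.
    """
    N_STATES = 5
    N_ACTIONS = 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, self.N_STATES - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.N_STATES - 1
        return self.state, (1.0 if done else 0.0), done, {}

def q_learning(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning against any environment with the interface above."""
    q = [[0.0] * env.N_ACTIONS for _ in range(env.N_STATES)]

    def greedy(s):  # argmax over actions, with random tie-breaking
        best = max(q[s])
        return random.choice([a for a, v in enumerate(q[s]) if v == best])

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.randrange(env.N_ACTIONS)
            else:
                action = greedy(state)
            next_state, reward, done, _ = env.step(action)
            # TD update toward reward + discounted best next-state value
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning(ChainEnv())
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(4)]
print(policy)  # learned greedy policy moves right in every non-terminal state
```

Real testbeds differ mainly in the richness of their observations and dynamics: the ALE returns raw screen frames as states and MuJoCo environments return continuous state vectors, but the interaction loop is the same.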

Source: Deep Learning on Medium