Reinforcement Learning Resource List
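Artificial intelligence is one of the most exciting technologies of the 21st century. Its goal is to create human-like intelligence, and human intelligence comprises perception, decision-making, and cognition (from intuition to reasoning, planning, consciousness, and so on). Perception addresses the "what", and here deep learning has already surpassed human-level performance on some tasks. Decision-making addresses the "how", and here reinforcement learning has achieved some success in domains such as games and robotics. Cognition addresses the "why", and here knowledge graphs, causal reasoning, and continual learning are active research areas. Reinforcement learning solves sequential decision-making problems by learning from feedback, and this list therefore treats it as an essential key on the path to artificial general intelligence.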
Courses and Videos
Reinforcement Learning by David Silver (2015) [homepage] [youtube] [bilibili]
CS 188: Introduction to Artificial Intelligence [Fall 2012-Spring 2014] [Fall 2018] [Summer 2019] [Spring 2020]
CS 294: Deep Reinforcement Learning by Sergey Levine [Fall 2015] [Spring 2017] [Fall 2017] [Fall 2018]
CS 285: Deep Reinforcement Learning [Fall 2019] [youtube]
Advanced Deep Learning & Reinforcement Learning by DeepMind & UCL [youtube2018]
Deep Reinforcement Learning and Control [Spring 2017]
CS234: Reinforcement Learning [Winter 2019] [youtube]
Deep RL Bootcamp [August 2017]
Deep Reinforcement Learning by Hung-yi Lee (李宏毅) [Spring 2018] [youtube2018]
Reinforcement Learning by Morvan Zhou (莫煩) [homepage]
Books
Reinforcement Learning: An Introduction (1st Edition, 1998) [homepage]
Reinforcement Learning: An Introduction (2nd Edition, 2018) [homepage] [bookdraft2018jan1] [2018] [Python Code] [Chinese translation]
Hands-On Reinforcement Learning With Python (2018) [homepage]
Reinforcement Learning: With Open AI, TensorFlow and Keras Using Python (2018) [homepage]
Algorithms for Reinforcement Learning (2010) [download]
Neural Networks and Deep Learning (《神經網絡與深度學習》) [download]
Code
ShangtongZhang/Python Implementation of Reinforcement Learning: An Introduction (2nd Edition) [github]
JuliaReinforcementLearning/ReinforcementLearningAnIntroduction.jl [github]
berkeleydeeprlcourse [github]
tensorlayer/RLzoo [github]
rlcode/reinforcement-learning [github]
MorvanZhou/Reinforcement-learning-with-tensorflow [github]
dennybritz/reinforcement-learning [github]
p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch [github]
Tutorials
OpenAI Spinning Up [English] [Chinese]
Talks
Rich Sutton, 2015, Introduction to Reinforcement Learning with Function Approximation
Andrew Barto, 2018, A history of reinforcement learning
David Silver, Principles of Deep RL
Benjamin Recht, 2018, Optimization Perspectives on Learning to Control
John Schulman, 2017, The Nuts and Bolts of Deep Reinforcement Learning Research
Joelle Pineau, Introduction to Reinforcement Learning
Deep Learning and Reinforcement Learning Summer School, 2018, 2017
Deep Learning Summer School, 2016, 2015
Yisong Yue and Hoang M. Le, Imitation Learning, ICML 2018 Tutorial
Surveys
Li, Y. (2017). Deep Reinforcement Learning: An Overview. ArXiv. [paper]
Littman, M. L. (2015). Reinforcement learning improves behaviour from evaluative feedback. Nature, 521:445–451. [paper]
Kaelbling, L., Littman, M., and Moore, A. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285. [paper]
Algorithms
(1) Reinforcement Learning
- Q-learning
Learning from Delayed Rewards (Watkins 1989) [paper]
- REINFORCE
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (Williams 1992) [paper] [ML]
- SARSA
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding (Sutton 1996) [paper] [NIPS]
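As a concrete illustration of the classic algorithms listed above, here is a minimal sketch of tabular Q-learning. It is not taken from any of the listed papers or repositories. The toy chain environment, the `step` and `greedy_action` helpers, and all hyperparameter values are assumptions chosen only to keep the example self-contained.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a 1-D chain
# of N_STATES cells. The agent starts in cell 0. Reaching the last cell gives
# reward 1 and ends the episode; every other step gives reward 0.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy_action(q_row, rng):
    """Pick an action with the highest Q-value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action,
            # regardless of which action the agent takes next (off-policy).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy policy for the non-terminal cells.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)  # → [1, 1, 1, 1, 1]
```

In this sketch the learned greedy policy moves right in every non-terminal cell. SARSA differs only in the target: it would bootstrap from the Q-value of the action actually selected in `next_state` instead of the maximum.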
(2) Deep Reinforcement Learning
- DQN
Playing Atari with Deep Reinforcement Learning (Mnih et al. 2013) [arxiv]
- DDQN
Deep Reinforcement Learning with Double Q-learning (van Hasselt et al. 2015) [arxiv] [AAAI]
- TRPO
Trust Region Policy Optimization (Schulman et al. 2015) [arxiv] [ICML]
- H-DQN
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation (Kulkarni et al. 2016) [arxiv] [NIPS]
- PER
Prioritized Experience Replay (Schaul et al. 2016) [arxiv] [ICLR]
- Dueling DDQN
Dueling Network Architectures for Deep Reinforcement Learning (Wang et al. 2016) [arxiv] [ICML]
- DDPG
Continuous Control With Deep Reinforcement Learning (Lillicrap et al. 2016) [arxiv] [ICLR]
- A2C/A3C
Asynchronous Methods for Deep Reinforcement Learning (Mnih et al. 2016) [arxiv] [ICML]
- SNN-HRL
Stochastic Neural Networks For Hierarchical Reinforcement Learning (Florensa et al. 2017) [arxiv] [ICLR]
- PPO
Proximal Policy Optimization Algorithms (Schulman et al. 2017) [arxiv]
- HER
Hindsight Experience Replay (Andrychowicz et al. 2017) [arxiv] [NIPS]
- TD3
Addressing Function Approximation Error in Actor-Critic Methods (Fujimoto et al. 2018) [arxiv] [ICML]
- DIAYN
Diversity is All You Need: Learning Skills Without a Reward Function (Eysenbach et al. 2018) [arxiv] [ICLR]
- HIRO
Data-Efficient Hierarchical Reinforcement Learning (Nachum et al. 2018) [arxiv] [NIPS]
- SAC
Soft Actor-Critic Algorithms and Applications (Haarnoja et al. 2019) [arxiv]
- SAC-Discrete
Soft Actor-Critic For Discrete Action Settings (Christodoulou 2019) [arxiv]
- TQC
Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics (Kuznetsov et al. 2020) [arxiv] [ICML]
Environments
Cart Pole
Mountain Car
OpenAI Gym
Google Dopamine 2.0
Emo Todorov MuJoCo
General-purpose grid-world environment class
Frameworks
OpenAI Baselines
Baidu PARL
DeepMind OpenSpiel
Researchers
Richard S. Sutton [homepage]
David Silver [homepage]
Pieter Abbeel [homepage]
Sergey Levine [homepage]
Hung-yi Lee (李宏毅) [homepage]
Conferences/Journals
Conferences: AAAI, NIPS, ICML, ICLR, IJCAI, AAMAS, IROS, and others.
Journals: AI, JMLR, JAIR, Machine Learning, JAAMAS, and others.
Research Institutions
OpenAI
DeepMind
Berkeley Artificial Intelligence Research (BAIR) Lab
Blogs
Keavnn’Blog
Medium : Reinforcement Learning
StackOverflow : Reinforcement Learning
[BEST Reinforcement Learning (RL) Books Update till Jan 2021]
[Introduction to Deep Reinforcement Learning]
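Zhihu
Reinforcement Learning Knowledge Lecture Hall (強化學習知識大講堂)
Intelligent Unit (智能單元)
Reinforcement Learning (強化學習)
WeChat Official Accounts
Deep Reinforcement Learning Lab (深度強化學習實驗室)
Deep Learning Technology Frontier (深度學習技術前沿)
AI Technology Review (AI科技評論)
AI Era (新智元)
Other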
kmario23/deep-learning-drizzle [github] [webpage]
Mr.Jk.Zhang [CSDN]