Munchausen Reinforcement Learning

06/12/2020

Munchausen Reinforcement Learning

Nino Vieillard, Olivier Pietquin, Matthieu Geist

Keywords:

Abstract Paper Similar Papers

Abstract: Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most algorithms, based on temporal differences, replace the true value of a transiting state by their current estimate of this value. Yet, another estimate could be leveraged to bootstrap RL: the current policy. Our core contribution stands in a very simple idea: adding the scaled log-policy to the immediate reward. We show that, by slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with the state-of-the-art Rainbow on Atari games, without making use of distributional RL, n-step returns or prioritized replay. To demonstrate the versatility of this idea, we also use it together with an Implicit Quantile Network (IQN). The resulting agent outperforms Rainbow on Atari, installing a new State of the Art with very little modifications to the original algorithm. To add to this empirical study, we provide strong theoretical insights on what happens under the hood -- implicit Kullback-Leibler regularization and increase of the action-gap.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2020

Non-Crossing Quantile Regression for Distributional Reinforcement Learning

Fan Zhou, Jianing Wang, Xingdong Feng

Keywords Paper

0

0

0

0

3:11

18/07/2021

Generalizable Episodic Memory for Deep Reinforcement Learning

Hao Hu, Jianing Ye, Guangxiang Zhu and
Zhizhou Ren, Chongjie Zhang

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

4:51

06/12/2021

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Gal Dalal, Assaf Hallak, Steven Dalton and
iuri frosio, Shie Mannor, Gal Chechik

Keywords Paper

theory, reinforcement learning and planning

1

0

0

1

15:00

06/12/2021

On the Estimation Bias in Double Q-Learning

Zhizhou Ren, Guangxiang Zhu, Hao Hu and
Beining Han, Jianglun Chen, Chongjie Zhang

Keywords Paper

0

0

0

0

14:45

19/08/2021

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan and
David Hsu, Nanning Zheng

Keywords Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

0

0

0

0

13:14

12/07/2020

ConQUR: Mitigating Delusional Bias in Deep Q-Learning

DiJia Su, Jayden Ooi, Tyler Lu and
Dale Schuurmans, Craig Boutilier

Keywords Paper

Reinforcement Learning - General

0

0

0

0

15:04

26/04/2020

Disagreement-Regularized Imitation Learning

Kiante Brantley, Wen Sun, Mikael Henaff

Keywords Paper

imitation learning, reinforcement learning, uncertainty

0

0

0

0

4:53

03/05/2021

Return-Based Contrastive Representation Learning for Reinforcement Learning

Guoqing Liu, Chuheng Zhang, Li Zhao and
Tao Qin, Jinhua Zhu, Li Jian, Nenghai Yu, Tie-Yan Liu

Keywords Paper

reinforcement learning, auxiliary task, contrastive learning, representation learning

0

0

0

0

5:20

06/12/2021

A Minimalist Approach to Offline Reinforcement Learning

Scott Fujimoto, Shixiang (Shane) Gu

Keywords Paper

reinforcement learning and planning, generative model

1

0

0

0

8:31

12/07/2020

An Optimistic Perspective on Offline Deep Reinforcement Learning

Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

Keywords Paper

Reinforcement Learning - Deep RL

0

0

0

0

13:54

26/04/2020

Fast Task Inference with Variational Intrinsic Successor Features

Steven Hansen, Will Dabney, Andre Barreto and
David Warde-Farley, Tom Van de Wiele, Volodymyr Mnih

Keywords Paper

Reinforcement Learning, Variational Intrinsic Control, Successor Features

0

0

0

0

14:47

18/07/2021

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

Dhruv Malik, Aldo Pacchiano, Vishwak Srinivasan, Yuanzhi Li

Keywords Paper

Deep Learning, Deep Learning, Efficient Training Methods; Deep Learning, Optimization for Deep Networks, Theory, RL, Decisions and Control Theory

0

0

0

0

5:11

03/05/2021

Data-Efficient Reinforcement Learning with Self-Predictive Representations

Max Schwarzer, Ankesh Anand, Rishab Goel and
R Devon Hjelm, Aaron Courville, Philip Bachman

Keywords Paper

Representation Learning, Self-Supervised Learning, Reinforcement Learning, Sample Efficiency

0

0

0

1

10:04

18/07/2021

Convex Regularization in Monte-Carlo Tree Search

Tuan Q Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

4:52

06/12/2021

Decision Transformer: Reinforcement Learning via Sequence Modeling

Lili Chen, Kevin Lu, Aravind Rajeswaran and
Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch

Keywords Paper

deep learning, reinforcement learning and planning, transformers, generative model

0

0

0

0

6:51

02/02/2021

DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning

Mohammadhosein Hasanbeig, Natasha Yogananda Jeppu, Alessandro Abate and
Tom Melham, Daniel Kroening

Keywords Paper

0

0

0

0

15:45

18/07/2021

Ensemble Bootstrapping for Q-Learning

Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:17

06/12/2021

Deep Reinforcement Learning at the Edge of the Statistical Precipice

Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro and
Aaron Courville, Marc Bellemare

Keywords Paper

reinforcement learning and planning

0

0

0

0

19:36

06/12/2020

An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay

Scott Fujimoto, David Meger, Doina Precup

Keywords Paper

0

0

0

0

2:53

06/12/2021

Online and Offline Reinforcement Learning by Planning with a Learned Model

Julian Schrittwieser, Thomas Hubert, Amol Mandhane and
Mohammadamin Barekatain, Ioannis Antonoglou, David Silver

Keywords Paper

deep learning, reinforcement learning and planning

0

0

0

0

13:52

02/02/2021

Distributional Reinforcement Learning via Moment Matching

Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh

Keywords Paper

0

0

0

0

20:01

04/07/2020

Generative Semantic Hashing Enhanced via Boltzmann Machines

Lin Zheng, Qinliang Su, Dinghan Shen, Changyou Chen

Keywords Paper

Generative Hashing, large-scale retrieval, training, Boltzmann Machines

0

0

0

0

11:26

14/06/2020

Seeing without Looking: Contextual Rescoring of Object Detections for AP Maximization

Lourenço V. Pato, Renato Negrinho, Pedro M. Q. Aguiar

Keywords Paper

object detection, context, rescoring, average precision, non-maximum suppression

0

0

0

0

1:00

18/07/2021

Learning and Planning in Average-Reward Markov Decision Processes

Yi Wan, Abhishek Naik, Richard Sutton

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:05

02/02/2021

Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

Shangtong Zhang, Bo Liu, Shimon Whiteson

Keywords Paper

0

0

0

0

17:22

03/05/2021

RMSprop converges with proper hyper-parameter

Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun

Keywords Paper

convergence, hyperparameter, RMSprop

0

0

0

0

10:12

18/07/2021

Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective

Florin Gogianu, Tudor Berariu, Mihaela Rosca and
Claudia Clopath, Lucian Busoniu, Razvan Pascanu

Keywords Paper

Reinforcement Learning and Planning, Deep RL

0

0

0

0

5:04

06/12/2020

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Kaiqing Zhang, Sham Kakade, Tamer Basar, Lin Yang

Keywords Paper

0

0

0

0

3:25

06/12/2020

Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

Yijie Guo, Jongwook Choi, Marcin Moczulski and
Shengyu Feng, Samy Bengio, Mohammad Norouzi, Honglak Lee

Keywords Paper

0

0

1

1

3:30

06/12/2021

Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning

Jongjin Park, Younggyo Seo, Chang Liu and
Li Zhao, Tao Qin, Jinwoo Shin, Tie-Yan Liu

Keywords Paper

reinforcement learning and planning, causality

0

0

0

0

12:12

03/05/2021

FairBatch: Batch Selection for Model Fairness

Yuji Roh, Kangwook Lee, Steven Whang, Changho Suh

Keywords Paper

bilevel optimization, batch selection, model fairness

0

0

0

0

5:04

26/08/2020

Towards Competitive N-gram Smoothing

Moein Falahatgar, Mesrob Ohannessian, Alon Orlitsky, Venkatadheeraj Pichapati

Keywords Paper

0

0

0

0

17:51

18/07/2021

Sample-Optimal PAC Learning of Halfspaces with Malicious Noise

Jie Shen

Keywords Paper

Theory, Computational Learning Theory

0

0

0

0

4:37

18/07/2021

A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation

Scott Fujimoto, David Meger, Doina Precup

Keywords Paper

Reinforcement Learning and Planning, Deep RL

1

0

0

0

3:50

26/04/2020

Combining Q-Learning and Search with Amortized Value Estimates

Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez and
Tobias Pfaff, Theophane Weber, Lars Buesing, Peter W. Battaglia

Keywords Paper

model-based RL, Q-learning, MCTS, search

0

0

0

0

4:37

18/07/2021

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama

Keywords Paper

Optimization, Stochastic Optimization

0

0

0

0

5:17

19/08/2021

Discrete Multiple Kernel k-means

Rong Wang, Jitao Lu, Yihang Lu and
Feiping Nie, Xuelong Li

Keywords Paper

Machine Learning, Clustering, Kernel Methods, Multi-instance; Multi-label; Multi-view learning

0

0

0

0

15:04

06/12/2020

Discovering Reinforcement Learning Algorithms

Junhyuk Oh, Matteo Hessel, Wojciech Czarnecki and
Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver

Keywords Paper

0

0

0

0

3:21

03/05/2021

Taming GANs with Lookahead-Minmax

Tatjana Chavdarova, Matteo Pagliardini, Sebastian Stich and
François Fleuret, Martin Jaggi

Keywords Paper

Generative Adversarial Networks, Minmax

0

0

0

0

5:25

06/12/2020

Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning

Nino Vieillard, Tadashi Kozuno, Bruno Scherrer and
Olivier Pietquin, Remi Munos, Matthieu Geist

Keywords Paper

0

0

0

0

3:25