Ensemble Bootstrapping for Q-Learning

18/07/2021

Ensemble Bootstrapping for Q-Learning

Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

Keywords: Reinforcement Learning and Planning

Abstract Paper Similar Papers

Abstract: Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by utilizing two estimators, yet results in an under-estimation bias. Similar to over-estimation in Q-learning, in certain scenarios, the under-estimation bias may degrade performance. In this work, we introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL), a natural extension of Double-Q-learning to ensembles. We analyze our method both theoretically and empirically. Theoretically, we prove that EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables. Empirically, we show that there exist domains where both over and under-estimation result in sub-optimal performance. Finally, We demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms for a suite of ATARI games.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at ICML 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2021

On the Estimation Bias in Double Q-Learning

Zhizhou Ren, Guangxiang Zhu, Hao Hu and
Beining Han, Jianglun Chen, Chongjie Zhang

Keywords Paper

0

0

0

0

14:45

26/04/2020

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

Keywords Paper

reinforcement learning, bias and variance reduction

0

0

0

0

4:27

18/07/2021

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama

Keywords Paper

Optimization, Stochastic Optimization

0

0

0

0

5:17

02/02/2021

Self-correcting Q-learning

Rong Zhu, Mattia Rigotti

Keywords Paper

0

0

0

0

15:22

02/02/2021

Distributional Reinforcement Learning via Moment Matching

Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh

Keywords Paper

0

0

0

0

20:01

06/12/2021

Bellman-consistent Pessimism for Offline Reinforcement Learning

Tengyang Xie, Ching-An Cheng, Nan Jiang and
Paul Mineiro, Alekh Agarwal

Keywords Paper

theory, reinforcement learning and planning, robustness

0

0

0

0

17:42

26/04/2020

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson and
Christian Tjandraatmadja, Craig Boutilier

Keywords Paper

Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

0

0

0

0

5:36

12/07/2020

Learning Near Optimal Policies with Low Inherent Bellman Error

Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

Keywords Paper

Reinforcement Learning - Theory

0

0

0

0

14:22

12/07/2020

Representations for Stable Off-Policy Reinforcement Learning

Dibya Ghosh, Marc Bellemare

Keywords Paper

Reinforcement Learning - General

0

0

0

0

14:38

19/08/2021

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan and
David Hsu, Nanning Zheng

Keywords Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

0

0

0

0

13:14

06/12/2021

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Keywords Paper

deep learning, reinforcement learning and planning

1

0

0

0

13:50

06/12/2021

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Andrea Zanette, Martin J Wainwright, Emma Brunskill

Keywords Paper

reinforcement learning and planning

0

0

0

0

14:28

12/07/2020

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Keywords Paper

Reinforcement Learning - Theory

0

0

0

0

14:20

06/12/2021

Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems

Suhas Kowshik, Dheeraj Nagaraj, Prateek Jain, Praneeth Netrapalli

Keywords Paper

theory

0

0

0

0

14:43

13/04/2021

A contraction approach to model-based reinforcement learning

Ting-Han Fan, Peter Ramadge

Keywords Paper

0

0

0

0

3:01

14/06/2020

Towards Discriminability and Diversity: Batch Nuclear-Norm Maximization Under Label Insufficient Situations

Shuhao Cui, Shuhui Wang, Junbao Zhuo and
Liang Li, Qingming Huang, Qi Tian

Keywords Paper

discriminability, diversity, nuclear-norm, domain adaptation, open-set

0

0

0

0

4:56

18/07/2021

Monotonic Robust Policy Optimization with Model Discrepancy

yuankun jiang, Chenglin Li, Wenrui Dai and
Junni Zou, Hongkai Xiong

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:17

06/12/2021

Boosting with Multiple Sources

Corinna Cortes, Mehryar Mohri, Dmitry Storcheus, Ananda Theertha Suresh

Keywords Paper

federated learning

0

0

0

0

13:17

26/08/2020

On Minimax Optimality of GANs for Robust Mean Estimation

Kaiwen Wu, Gavin Weiguang Ding, Ruitong Huang, Yaoliang Yu

Keywords Paper

0

0

0

0

12:54

06/12/2020

Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning

Nino Vieillard, Tadashi Kozuno, Bruno Scherrer and
Olivier Pietquin, Remi Munos, Matthieu Geist

Keywords Paper

0

0

0

0

3:25

12/07/2020

Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala

Keywords Paper

Deep Learning - Theory

0

0

0

0

15:11

18/07/2021

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

Yaqi Duan, Chi Jin, Zhiyuan Li

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

5:18

06/12/2021

Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

Max Ryabinin, Andrey Malinin, Mark Gales

Keywords Paper

machine learning

0

0

0

0

12:36

06/12/2021

Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback

Hang Wang, Sen Lin, Junshan Zhang

Keywords Paper

0

0

0

0

11:19

06/12/2020

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Dheeraj Nagaraj, Xian Wu, Guy Bresler and
Prateek Jain, Praneeth Netrapalli

Keywords Paper

0

0

0

0

3:34

18/07/2021

A Distribution-dependent Analysis of Meta Learning

Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvari

Keywords Paper

Theory, Statistical Learning Theory

0

0

0

0

5:06

04/07/2020

Perturbation Based Learning for Structured NLP tasks with Application to Dependency Parsing

Amichay Doitch, Ram Yazdi, Tamir Hazan, Roi Reichart

Keywords Paper

Structured tasks, Dependency Parsing, NLP, sampling

0

0

0

0

10:53

06/12/2020

GAIT-prop: A biologically plausible learning rule derived from backpropagation of error

Nasir Ahmad, Marcel A. J. van Gerven, Luca Ambrogioni

Keywords Paper

0

0

0

0

3:09

06/12/2020

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Aviral Kumar, Abhishek Gupta, Sergey Levine

Keywords Paper

0

0

0

0

3:25

06/12/2021

Conservative Offline Distributional Reinforcement Learning

Yecheng Ma, Dinesh Jayaraman, Osbert Bastani

Keywords Paper

reinforcement learning and planning

1

0

0

0

13:54

06/12/2021

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

Keywords Paper

0

0

0

0

14:56

06/12/2020

Autoencoders that don't overfit towards the Identity

Harald Steck

Keywords Paper

0

0

0

0

3:22

06/12/2021

Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization

Guoqiang Wu, Chongxuan LI, Kun Xu, Jun Zhu

Keywords Paper

theory, machine learning

0

0

0

0

9:29

18/07/2021

Emphatic Algorithms for Deep Reinforcement Learning

Ray Jiang, Tom Zahavy, Zhongwen Xu and
Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt

Keywords Paper

Reinforcement Learning and Planning, Deep RL

1

0

0

0

5:21

26/04/2020

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Jian Li, Xuanyuan Luo, Mingda Qiao

Keywords Paper

learning theory, generalization, nonconvex learning, stochastic gradient descent, Langevin dynamics

0

0

0

0

4:50

06/12/2020

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang

Keywords Paper

0

0

0

0

3:16

12/07/2020

Gradient Temporal-Difference Learning with Regularized Corrections

Sina Ghiassian, Andrew Patterson, Shivam Garg and
Dhawal Gutpa, Adam White, Martha White

Keywords Paper

Reinforcement Learning - General

0

0

0

0

10:56

06/12/2020

Bayes Consistency vs. H-Consistency: The Interplay between Surrogate Loss Functions and the Scoring Function Class

Mingyuan Zhang, Shivani Agarwal

Keywords Paper

0

0

0

0

3:19

06/12/2020

Trust the Model When It Is Confident: Masked Model-based Actor-Critic

Feiyang Pan, Jia He, Dandan Tu, Qing He

Keywords Paper

0

0

0

0

2:57

06/12/2021

Consistency Regularization for Variational Auto-Encoders

Samarth Sinha, Adji Bousso Dieng

Keywords Paper

deep learning, machine learning, self-supervised learning, generative model, contrastive learning, representation learning

0

0

0

0

10:52