Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Abstract: Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value. Algorithms have been proposed to reduce overestimation bias, but we lack an understanding of how bias interacts with performance, and the extent to which existing algorithms mitigate bias. In this paper, we 1) highlight that the effect of overestimation bias on learning efficiency is environment-dependent; 2) propose a generalization of Q-learning, called \emph{Maxmin Q-learning}, which provides a parameter to flexibly control bias; 3) show theoretically that there exists a parameter choice for Maxmin Q-learning that leads to unbiased estimation with a lower approximation variance than Q-learning; and 4) prove the convergence of our algorithm in the tabular case, as well as convergence of several previous Q-learning variants, using a novel Generalized Q-learning framework. We empirically verify that our algorithm better controls estimation bias in toy environments, and that it achieves superior performance on several benchmark problems.

02/02/2021

Algorithms, Adversarial Learning, Applications, Computer Vision; Deep Learning, Adversarial Networks; Deep Learning, Generative Models, Reinforcement Learning and Planning, Deep RL

5:20

18/07/2021

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

Comments

Similar Papers

Self-correcting Q-learning

Rong Zhu, Mattia Rigotti

Keywords Abstract Paper

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin and Danil Karpushkin, Dmitry Vetrov

Keywords Abstract Paper

deep learning, optimization

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson and Christian Tjandraatmadja, Craig Boutilier

Keywords Abstract Paper

Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

Ensemble Bootstrapping for Q-Learning

Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

Keywords Abstract Paper

Reinforcement Learning and Planning

Accelerating Safe Reinforcement Learning with Constraint-mismatched Baseline Policies

Jimmy Yang, Justinian Rosca, Karthik Narasimhan, Peter Ramadge

Keywords Abstract Paper

Algorithms, Adversarial Learning, Applications, Computer Vision; Deep Learning, Adversarial Networks; Deep Learning, Generative Models, Reinforcement Learning and Planning, Deep RL

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama

Keywords Abstract Paper

Optimization, Stochastic Optimization

Modeling the Second Player in Distributionally Robust Optimization

Paul Michel, Tatsunori Hashimoto, Graham Neubig

Keywords Abstract Paper

adversarial learning, deep learning, robustness, distributionally robust optimization

Agnostic Learning with Multiple Objectives

Corinna Cortes, Mehryar Mohri, Javier Gonzalvo, Dmitry Storcheus

Keywords Abstract Paper

A Distribution-dependent Analysis of Meta Learning

Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvari

Keywords Abstract Paper

Theory, Statistical Learning Theory

Modeling and Optimization Trade-off in Meta-learning

Katelyn Gao, Ozan Sener

Keywords Abstract Paper

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Aviral Kumar, Abhishek Gupta, Sergey Levine

Keywords Abstract Paper

Theoretical bounds on estimation error for meta-learning

James Lucas, Mengye Ren, Irene Raissa KAMENI KAMENI and Toniann Pitassi, Richard Zemel

Keywords Abstract Paper

meta learning, minimax risk, few-shot, lower bounds, learning theory

Adaptive Discretization for Model-Based Reinforcement Learning

Sean Sinclair, Tianyu Wang, Gauri Jain and Sid Banerjee, Christina Yu

Keywords Abstract Paper

A Theoretical Framework for Target Propagation

Alexander Meulemans, Francesco Carzaniga, Johan Suykens and João Sacramento, Benjamin F. Grewe

Keywords Abstract Paper

An Exact Characterization of the Generalization Error for the Gibbs Algorithm

Gholamali Aminian, Yuheng Bu, Laura Toni and Miguel Rodrigues, Gregory Wornell

Keywords Abstract Paper

Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation

Junhong Shen, Lin F. Yang

Keywords Abstract Paper

Adversarially Robust Low Dimensional Representations

Pranjal Awasthi, Vaggos Chatziafratis, Xue Chen, Aravindan Vijayaraghavan

Keywords Abstract Paper

Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties

Jakob Lindinger, David Reeb, Christoph Lippert, Barbara Rakitsch

Keywords Abstract Paper

Regularized Softmax Deep Multi-Agent Q-Learning

Ling Pan, Tabish Rashid, Bei Peng and Longbo Huang, Shimon Whiteson

Keywords Abstract Paper

reinforcement learning and planning

Learning the truth from only one side of the story

Heinrich Jiang, Qijia Jiang, Aldo Pacchiano

Keywords Abstract Paper

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura and Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang

Keywords Abstract Paper

Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

Yuping Luo, Huazhe Xu, Tengyu Ma

Keywords Abstract Paper

imitation learning, model-based imitation learning, model-based RL, behavior cloning, covariate shift

Keywords Paper

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin and
Danil Karpushkin, Dmitry Vetrov

Keywords Paper

Moonkyung Ryu, Yinlam Chow, Ross Anderson and
Christian Tjandraatmadja, Craig Boutilier

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

James Lucas, Mengye Ren, Irene Raissa KAMENI KAMENI and
Toniann Pitassi, Richard Zemel

Keywords Paper

Sean Sinclair, Tianyu Wang, Gauri Jain and
Sid Banerjee, Christina Yu

Keywords Paper

Alexander Meulemans, Francesco Carzaniga, Johan Suykens and
João Sacramento, Benjamin F. Grewe

Keywords Paper

Gholamali Aminian, Yuheng Bu, Laura Toni and
Miguel Rodrigues, Gregory Wornell

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Ling Pan, Tabish Rashid, Bei Peng and
Longbo Huang, Shimon Whiteson

Keywords Paper

Keywords Paper

Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura and
Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang

Keywords Paper

Keywords Paper

Zhizhou Ren, Guangxiang Zhu, Hao Hu and
Beining Han, Jianglun Chen, Chongjie Zhang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Johannes von Oswald, Dominic Zhao, Seijin Kobayashi and
Simon Schug, Massimo Caccia, Nicolas Zucchet, João Sacramento

Keywords Paper

Keywords Paper

Keywords Paper

Guang Zhao, Edward Dougherty, Byung-Jun Yoon and
Francis J. Alexander, Xiaoning Qian

Keywords Paper

Keywords Paper

Gen Li, Changxiao Cai, Yuxin Chen and
Yuantao Gu, Yuting Wei, Yuejie Chi

Keywords Paper

Nicolas Papernot, Abhradeep Thakurta, Shuang Song and
Steve Chien, Úlfar Erlingsson

Keywords Paper

Keywords Paper