Finite-Time Analysis for Double Q-learning

Abstract: Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed in~\citet{hasselt2010double} overcomes such an overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice. However, the theoretical understanding of double Q-learning is rather limited. So far only the asymptotic convergence has been established, which does not characterize how fast the algorithm converges. In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum by taking $\tilde{\Omega}\left(\left( \frac{1}{(1-\gamma)^6\epsilon^2}\right)^{\frac{1}{\omega}} +\left(\frac{1}{1-\gamma}\right)^{\frac{1}{1-\omega}}\right)$ iterations, where $\omega\in(0,1)$ is the decay parameter of the learning rate, and $\gamma$ is the discount factor. Our analysis develops novel techniques to derive finite-time bounds on the difference between two inter-connected stochastic processes, which is new to the literature of stochastic approximation.

06/12/2021

Finite-Time Analysis for Double Q-learning

Huaqing Xiong, Lin Zhao, Yingbin Liang, Wei Zhang

Comments

Similar Papers

Faster Non-asymptotic Convergence for Double Q-learning

Lin Zhao, Huaqing Xiong, Yingbin Liang

Keywords Abstract Paper

theory, reinforcement learning and planning

Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning

Gen Li, Changxiao Cai, Yuxin Chen and Yuantao Gu, Yuting Wei, Yuejie Chi

Keywords Abstract Paper

Reinforcement Learning and Planning

Self-correcting Q-learning

Rong Zhu, Mattia Rigotti

Keywords Abstract Paper

A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

Pan Xu, Quanquan Gu

Keywords Abstract Paper

Reinforcement Learning - General

Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback

Yihan Du, Yuko Kuroki, Wei Chen

Keywords Abstract Paper

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

Keywords Abstract Paper

reinforcement learning, bias and variance reduction

Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods

Taiji Suzuki, Akiyama Shunta

Keywords Abstract Paper

local Rademacher complexity, minimax optimal rate, Excess risk, linear estimator, kernel method, fast learning rate

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

Keywords Abstract Paper

Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

Taira Tsuchiya, Junya Honda, Masashi Sugiyama

Keywords Abstract Paper

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Gen Li, Yuting Wei, Yuejie Chi and Yuantao Gu, Yuxin Chen

Keywords Abstract Paper

Fourier Sparse Leverage Scores and Approximate Kernel Learning

Tamas Erdelyi, Cameron Musco, Christopher Musco

Keywords Abstract Paper

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson and Christian Tjandraatmadja, Craig Boutilier

Keywords Abstract Paper

Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang and Mengdi Wang, Michael Jordan

Keywords Abstract Paper

Towards Better Generalization of Adaptive Gradient Methods

Yingxue Zhou, Belhal Karimi, Jinxing Yu and Zhiqiang Xu, Ping Li

Keywords Abstract Paper

Task-Robust Model-Agnostic Meta-Learning

Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

Keywords Abstract Paper

Optimal Sketching for Trace Estimation

Shuli Jiang, Hai Pham, David Woodruff, Richard Zhang

Keywords Abstract Paper

machine learning

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtarik

Keywords Abstract Paper

Optimization

Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge

Keywords Abstract Paper

Theory, Computational Learning Theory

Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters

Kaiyi Ji, Jason Lee, Yingbin Liang, H. Vincent Poor

Keywords Abstract Paper

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama

Keywords Abstract Paper

Optimization, Stochastic Optimization

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou, Jiafan He, Quanquan Gu

Keywords Paper

Gen Li, Changxiao Cai, Yuxin Chen and
Yuantao Gu, Yuting Wei, Yuejie Chi

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Gen Li, Yuting Wei, Yuejie Chi and
Yuantao Gu, Yuxin Chen

Keywords Paper

Keywords Paper

Moonkyung Ryu, Yinlam Chow, Ross Anderson and
Christian Tjandraatmadja, Craig Boutilier

Keywords Paper

Zhuoran Yang, Chi Jin, Zhaoran Wang and
Mengdi Wang, Michael Jordan

Keywords Paper

Yingxue Zhou, Belhal Karimi, Jinxing Yu and
Zhiqiang Xu, Ping Li

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Tadashi Kozuno, Yunhao Tang, Mark Rowland and
Remi Munos, Steven Kapturowski, Will Dabney, Michal Valko, Dave Abel

Keywords Paper

Keywords Paper

Zhengqing Zhou, Zhengyuan Zhou, Qinxun Bai and
Linhai Qiu, Jose Blanchet, Peter Glynn

Keywords Paper

Gen Li, Yuxin Chen, Yuejie Chi and
Yuantao Gu, Yuting Wei

Keywords Paper

Ilias Diakonikolas, Daniel Kane, Daniel Kongsgaard and
Jerry Li, Kevin Tian

Keywords Paper

Keywords Paper

Keywords Paper

Mingrui Zhang, Zebang Shen, Aryan Mokhtari and
Hamed Hassani, Amin Karbasi

Keywords Paper

Gen Li, Yuting Wei, Yuejie Chi and
Yuantao Gu, Yuxin Chen

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

Weichao Mao, Kaiqing Zhang, Ruihao Zhu and
David Simchi-Levi, Tamer Basar

Keywords Paper