13/04/2021

Sample complexity bounds for two timescale value-based reinforcement learning algorithms

Tengyu Xu, Yingbin Liang

Keywords:

Abstract: Two timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms. In the policy evaluation setting, it can model the linear and nonlinear temporal difference learning with gradient correction (TDC) algorithms as linear SA and nonlinear SA, respectively. In the policy optimization setting, two timescale nonlinear SA can also model the greedy gradient-Q (Greedy-GQ) algorithm. Previous studies analyzed the non-asymptotic convergence of linear TDC and Greedy-GQ in the Markovian setting with a single-sample update at each iteration, while for the nonlinear TDC algorithm only asymptotic convergence has been established. In this paper, we study the non-asymptotic convergence rate of two timescale linear and nonlinear TDC and of Greedy-GQ under Markovian sampling and with mini-batch data for each update. For linear TDC, we provide a novel non-asymptotic analysis that achieves a sample complexity of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$. For nonlinear TDC and Greedy-GQ, we show that both algorithms attain an $\epsilon$-accurate stationary solution with sample complexity $\mathcal{O}(\epsilon^{-2})$. This is the first non-asymptotic convergence result for nonlinear TDC, and our result for Greedy-GQ improves on the previous result order-wise by a factor of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$.
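To make the two timescale structure concrete, below is a minimal sketch of a mini-batch linear TDC update of the kind the abstract refers to. It is not the authors' code; the feature map phi, the step sizes alpha (slow, for theta) and beta (fast, for w), and the batch format are illustrative assumptions.

```python
# Hypothetical mini-batch linear TDC sketch (illustration only, not the paper's implementation).
# theta: value-function parameters (slow timescale); w: gradient-correction term (fast timescale).
import numpy as np

def tdc_minibatch_update(theta, w, batch, phi, gamma, alpha, beta):
    """One mini-batch TDC update.

    batch : list of (s, r, s_next) transitions sampled along the Markov chain
    phi   : feature map, phi(s) -> np.ndarray of shape (d,)
    gamma : discount factor
    alpha : slow (theta) step size; beta : fast (w) step size
    """
    g_theta = np.zeros_like(theta)
    g_w = np.zeros_like(w)
    for s, r, s_next in batch:
        f, f_next = phi(s), phi(s_next)
        delta = r + gamma * theta @ f_next - theta @ f   # TD error
        g_theta += delta * f - gamma * (f @ w) * f_next  # TD direction with gradient correction
        g_w += (delta - f @ w) * f                       # fast-timescale tracking update
    m = len(batch)
    theta_new = theta + alpha * g_theta / m
    w_new = w + beta * g_w / m
    return theta_new, w_new
```

Averaging the update directions over a mini-batch (rather than using a single sample per iteration) is what distinguishes the setting analyzed here from the single-sample Markovian analyses mentioned above.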

The talk and the paper were published at the AISTATS 2021 virtual conference.

