Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation

06/12/2021

Yue Wang, Shaofeng Zou, Yi Zhou

Keywords: reinforcement learning and planning

Abstract: Temporal-difference learning with gradient correction (TDC) is a two time-scale algorithm for policy evaluation in reinforcement learning. It was originally proposed with linear function approximation and was later extended to general smooth function approximation. Its asymptotic convergence in the on-policy setting with general smooth function approximation was established in [Bhatnagar et al., 2009]; however, a non-asymptotic convergence analysis remains open because of several challenges: the non-linear, two time-scale update structure, the non-convex objective function, and the projection onto a time-varying tangent plane. In this paper, we develop novel techniques to address these challenges and explicitly characterize the non-asymptotic error bound for the general off-policy setting with i.i.d. or Markovian samples, showing that it converges as fast as $\mathcal O(1/\sqrt T)$ (up to a factor of $\mathcal O(\log T)$). Our approach can be applied to a wide range of value-based reinforcement learning algorithms with general smooth function approximation.
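The abstract describes TDC only in words. As a rough illustration, the sketch below implements the standard linear-function-approximation form of TDC (the setting the abstract says the algorithm was first proposed in), with an optional importance ratio `rho` for off-policy samples. It does not implement the general smooth (nonlinear) version analyzed in the paper, which also involves the projection onto a time-varying tangent plane. The two time-scale structure appears as two step sizes: the auxiliary weights `w` move on the faster scale `beta`, while the value parameters `theta` move on the slower scale `alpha`. The helper names (`dot`, `tdc_step`, `run_demo`), the 3-state Markov reward process, and all step sizes are hypothetical choices made for this example.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def tdc_step(theta, w, phi, phi_next, reward, gamma, alpha, beta, rho=1.0):
    """One linear TDC update (sketch). rho is the importance ratio
    pi(a|s) / b(a|s); rho = 1 is the on-policy case. Returns (theta, w)."""
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
    phi_w = dot(phi, w)  # fast-scale estimate of the expected TD error at phi
    # Slow time-scale (step alpha): TD direction minus the gradient-correction
    # term gamma * phi_next * (phi^T w).
    new_theta = [t + alpha * rho * (delta * f - gamma * fn * phi_w)
                 for t, f, fn in zip(theta, phi, phi_next)]
    # Fast time-scale (step beta): LMS step regressing rho * delta onto phi.
    new_w = [wi + beta * (rho * delta - phi_w) * f for wi, f in zip(w, phi)]
    return new_theta, new_w


def run_demo(steps=100_000, gamma=0.5, alpha=0.005, beta=0.05, seed=0):
    """Hypothetical 3-state Markov reward process: from state s, stay with
    probability 0.5, otherwise move to (s + 1) % 3; the reward depends on s.
    Returns (averaged TDC estimate, exact value function)."""
    rng = random.Random(seed)
    rewards = [1.0, 0.0, -1.0]
    # One-hot (tabular) features, so the TDC fixed point is the true value.
    features = [[1.0 if i == s else 0.0 for i in range(3)] for s in range(3)]
    theta, w = [0.0] * 3, [0.0] * 3
    avg, count = [0.0] * 3, 0
    s = 0
    for t in range(steps):
        s_next = s if rng.random() < 0.5 else (s + 1) % 3
        theta, w = tdc_step(theta, w, features[s], features[s_next],
                            rewards[s], gamma, alpha, beta)
        if t >= steps // 2:  # running (Polyak) average over the second half
            count += 1
            avg = [a + (x - a) / count for a, x in zip(avg, theta)]
        s = s_next
    # Exact values by iterating V <- r + gamma * P V (a contraction for gamma < 1).
    v = [0.0] * 3
    for _ in range(200):
        v = [rewards[i] + gamma * 0.5 * (v[i] + v[(i + 1) % 3]) for i in range(3)]
    return avg, v
```

Calling `estimate, exact = run_demo()` returns the averaged TDC estimate and the exact values of the toy chain; with tabular features the two should be close. Constant step sizes with iterate averaging are used only to keep the demo short, and the sketch makes no claim about matching the step-size conditions used in the paper's analysis.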

The talk and the paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, Asuman Ozdaglar

Keywords: theory, optimization, reinforcement learning and planning, meta learning

13/04/2021

Sample complexity bounds for two timescale value-based reinforcement learning algorithms

Tengyu Xu, Yingbin Liang

09/07/2020

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

Maksim Kaledin, Eric Moulines, Alexey Naumov, Vladislav Tadic, Hoi-To Wai

Keywords: Stochastic optimization, Reinforcement learning

18/07/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

Keywords: Theory, RL, Decisions and Control Theory

26/04/2020

Reanalysis of Variance Reduced Temporal Difference Learning

Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang

Keywords: Reinforcement Learning, TD learning, Markovian sample, Variance Reduction

18/07/2021

Temporal Difference Learning as Gradient Splitting

Rui Liu, Alex Olshevsky

Keywords: Theory, RL, Decisions and Control Theory

06/12/2020

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

12/07/2020

Momentum-Based Policy Gradient Methods

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Keywords: Reinforcement Learning - General

02/02/2021

On Convergence of Gradient Expected Sarsa(λ)

Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan

06/12/2020

Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning

Nathan Kallus, Angela Zhou

26/08/2020

Finite-Time Error Bounds for Biased Stochastic Approximation with Applications to Q-Learning

Gang Wang, Georgios B. Giannakis

06/12/2021

Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning

Xin Zhang, Zhuqing Liu, Jia Liu, Zhengyuan Zhu, Songtao Lu

Keywords: theory, optimization, reinforcement learning and planning

26/04/2020

Geometric Insights into the Convergence of Nonlinear TD Learning

David Brandfonbrener, Joan Bruna

Keywords: TD, nonlinear, convergence, value estimation, reinforcement learning

06/12/2021

Online Robust Reinforcement Learning with Model Uncertainty

Yue Wang, Shaofeng Zou

Keywords: reinforcement learning and planning, robustness

06/12/2021

Generalization Guarantee of SGD for Pairwise Learning

Yunwen Lei, Mingrui Liu, Yiming Ying

Keywords: optimization, machine learning

09/07/2020

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

Alekh Agarwal, Sham Kakade, Jason Lee, Gaurav Mahajan

Keywords: Reinforcement learning, Non-convex optimization

26/08/2020

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare

06/12/2020

The Wasserstein Proximal Gradient Algorithm

Adil Salim, Anna Korba, Giulia Luise

18/07/2021

Bilevel Optimization: Convergence Analysis and Enhanced Design

Kaiyi Ji, Junjie Yang, Yingbin Liang

Keywords: Optimization, Non-Convex Optimization

13/04/2021

Logistic Q-Learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

02/02/2021

Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation

Bo Pang, Zhong-Ping Jiang

26/08/2020

On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms

Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar

13/04/2021

A study of condition numbers for first-order optimization

Charles Guille-Escuret, Manuela Girotti, Baptiste Goujaud, Ioannis Mitliagkas

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords: optimization, reinforcement learning and planning, bandits

26/08/2020

Linear Convergence of Adaptive Stochastic Gradient Descent

Yuege Xie, Xiaoxia Wu, Rachel Ward

03/05/2021

The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods

Wei Tao, Sheng Long, Gaowei Wu, Qing Tao

Keywords: optimal convergence, convex optimization, momentum methods, Deep learning, adaptive heavy-ball methods

02/02/2021

Loop Estimator for Discounted Values in Markov Reward Processes

Falcon Z. Dai, Matthew R. Walter

03/05/2021

What are the Statistical Limits of Offline RL with Linear Function Approximation?

Ruosong Wang, Dean Foster, Sham M Kakade

Keywords: batch reinforcement learning, representation, function approximation, lower bound

18/07/2021

Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

Kishan Panaganti, Dileep Kalathil

Keywords: Theory, RL, Decisions and Control Theory

26/04/2020

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Pan Xu, Felicia Gao, Quanquan Gu

Keywords: Policy Gradient, Reinforcement Learning, Sample Efficiency

04/08/2021

Convergence rates and approximation results for SGD and its continuous-time counterpart

Xavier Fontaine, Valentin De Bortoli, Alain Durmus

03/05/2021

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

Zuyue Fu, Zhuoran Yang, Zhaoran Wang

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords: Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

26/08/2020

Alternating Minimization Converges Super-Linearly for Mixed Linear Regression

Avishek Ghosh, Ramchandran Kannan

03/05/2021

Optimism in Reinforcement Learning with Generalized Linear Function Approximation

Yining Wang, Ruosong Wang, Simon Du, Akshay Krishnamurthy

Keywords: reinforcement learning, theory, exploration, function approximation, provable sample efficiency, regret analysis, optimism

06/12/2021

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, Murat Erdogdu

Keywords: optimization, machine learning

06/12/2021

Slice Sampling Reparameterization Gradients

David M Zoltowski, Diana Cai, Ryan Adams

Keywords: optimization, machine learning, generative model

06/12/2021

Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization

Clement Gehring, Kenji Kawaguchi, Jiaoyang Huang, Leslie Kaelbling

Keywords: theory, deep learning, optimization, reinforcement learning and planning

06/12/2021

Loss function based second-order Jensen inequality and its application to particle variational inference

Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

Keywords: optimization, generative model

02/02/2021

Distribution Adaptive INT8 Quantization for Training CNNs

Kang Zhao, Sida Huang, Pan Pan, Yinghan Li, Yingya Zhang, Zhenyu Gu, Yinghui Xu
