03/05/2021

Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity

Shaocong Ma, Ziyi Chen, Yi Zhou, Shaofeng Zou

Keywords: Machine Learning, Reinforcement Learning, Optimization

Abstract: Greedy-GQ is a value-based reinforcement learning (RL) algorithm for optimal control. Recently, a finite-time analysis of Greedy-GQ was developed under linear function approximation and Markovian sampling, and the algorithm was shown to achieve an $\epsilon$-stationary point with a sample complexity on the order of $\mathcal{O}(\epsilon^{-3})$. Such a high sample complexity is due to the large variance induced by the Markovian samples. In this paper, we propose a variance-reduced Greedy-GQ (VR-Greedy-GQ) algorithm for off-policy optimal control. In particular, the algorithm applies the SVRG-based variance reduction scheme to reduce the stochastic variance of the two time-scale updates. We study the finite-time convergence of VR-Greedy-GQ under linear function approximation and Markovian sampling and show that the algorithm achieves a much smaller bias and variance error than the original Greedy-GQ. In particular, we prove that VR-Greedy-GQ achieves an improved sample complexity on the order of $\mathcal{O}(\epsilon^{-2})$. We further compare the performance of VR-Greedy-GQ with that of Greedy-GQ in various RL experiments to corroborate our theoretical findings.
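To give a concrete sense of the SVRG-based scheme the abstract describes, here is a minimal sketch of applying an SVRG-style correction to both time-scale updates. Everything in it is an assumption for illustration: the function `pseudo_grads` is a simplified stand-in for the Greedy-GQ update directions under linear features, the step sizes `alpha` and `beta` are arbitrary, and the inner loop samples i.i.d. from a fixed batch, whereas the paper's analysis handles Markovian sampling. This is not the authors' exact VR-Greedy-GQ algorithm.

```python
import numpy as np

def pseudo_grads(theta, w, sample):
    # Hypothetical per-sample update directions for the two time-scale
    # iterates (theta, w) under linear function approximation; a simplified
    # stand-in for the Greedy-GQ pseudo-gradients, not the paper's exact form.
    phi, reward, phi_next, gamma = sample
    delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
    g_theta = delta * phi - gamma * (phi @ w) * phi_next     # main (slow) direction
    g_w = (delta - phi @ w) * phi                            # auxiliary (fast) direction
    return g_theta, g_w

def vr_greedy_gq_sketch(samples, dim, alpha=0.01, beta=0.05,
                        epochs=10, inner_steps=100, seed=0):
    # SVRG-style variance reduction on both time scales: each epoch anchors
    # a full-batch pseudo-gradient at a reference point, and every inner
    # step corrects the stochastic direction using that anchor.
    rng = np.random.default_rng(seed)
    theta, w = np.zeros(dim), np.zeros(dim)
    for _ in range(epochs):
        theta_ref, w_ref = theta.copy(), w.copy()
        refs = [pseudo_grads(theta_ref, w_ref, s) for s in samples]
        G_theta = np.mean([r[0] for r in refs], axis=0)  # full-batch anchor (theta)
        G_w = np.mean([r[1] for r in refs], axis=0)      # full-batch anchor (w)
        for _ in range(inner_steps):
            i = rng.integers(len(samples))
            g_theta, g_w = pseudo_grads(theta, w, samples[i])
            r_theta, r_w = refs[i]
            # Variance-reduced direction: stochastic gradient minus its value
            # at the reference point, plus the full-batch anchor.
            theta = theta + alpha * (g_theta - r_theta + G_theta)
            w = w + beta * (g_w - r_w + G_w)
    return theta, w
```

Caching the reference-point gradients in `refs` keeps each inner step at constant per-sample cost; the anchoring term is what shrinks the stochastic variance of the updates, which is the mechanism behind the improvement from $\mathcal{O}(\epsilon^{-3})$ to $\mathcal{O}(\epsilon^{-2})$ sample complexity discussed in the abstract.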

Published at ICLR 2021 (virtual conference).
