An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

Abstract: In this paper, we revisit and improve the convergence analysis of policy gradient (PG) and natural PG (NPG) methods, and of their variance-reduced variants, under general smooth policy parametrizations. More specifically, assuming that the Fisher information matrix of the policy is positive definite: i) we show that a state-of-the-art variance-reduced PG method, which was previously only shown to converge to stationary points, converges to the globally optimal value up to an inherent function approximation error due to the policy parametrization; ii) we show that NPG enjoys a lower sample complexity; iii) we propose SRVR-NPG, which incorporates variance reduction into the NPG update. Our improvements follow from the observation that the convergence analyses of (variance-reduced) PG and NPG methods can improve each other: the stationary convergence analysis of PG can be applied to NPG as well, and the global convergence analysis of NPG can help to establish the global convergence of (variance-reduced) PG methods. Our analysis carefully integrates the advantages of these two lines of work. Thanks to this improvement, we also make variance reduction for NPG possible for the first time, with both global convergence and an efficient finite-sample complexity.
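To make the distinction between the PG and NPG updates concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical one-step problem with three actions, a log-linear policy over made-up features `PHI`, and made-up rewards `REWARD`. Gradients are computed exactly rather than sampled, so it shows only the Fisher preconditioning that separates NPG from PG, and it does not show variance reduction or SRVR-NPG. The features are chosen so that the Fisher information matrix is positive definite, matching the assumption in the abstract.

```python
import numpy as np

# Hypothetical toy problem (an assumption, not from the paper):
# a one-step decision problem with 3 actions and 2-dimensional features.
PHI = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # one feature row per action
REWARD = np.array([1.0, 0.0, 0.5])                    # reward of each action


def policy(theta):
    """Log-linear policy: pi_theta(a) is proportional to exp(theta . phi(a))."""
    logits = PHI @ theta
    logits -= logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def scores(theta):
    """Row a holds the score function grad_theta log pi_theta(a)."""
    p = policy(theta)
    return PHI - p @ PHI


def pg_direction(theta):
    """Exact policy gradient of the expected reward J(theta)."""
    p = policy(theta)
    return (p * REWARD) @ scores(theta)


def fisher(theta):
    """Fisher information matrix E_a[score(a) score(a)^T]."""
    p = policy(theta)
    s = scores(theta)
    return s.T @ (p[:, None] * s)


def npg_direction(theta):
    """Natural policy gradient: the PG direction preconditioned by F^{-1}."""
    return np.linalg.solve(fisher(theta), pg_direction(theta))


def value(theta):
    """Expected reward under pi_theta."""
    return float(policy(theta) @ REWARD)


theta0 = np.zeros(2)
step = 0.1  # assumed step size
theta_pg = theta0 + step * pg_direction(theta0)
theta_npg = theta0 + step * npg_direction(theta0)
```

In this toy setting both updates increase the expected reward from the same starting point. The NPG step differs only in that it rescales the gradient by the inverse Fisher matrix, which is why positive definiteness of that matrix matters.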

06/12/2020

Yanli Liu, Kaiqing Zhang, Tamer Basar, Wotao Yin

Similar Papers

A Simple and Efficient Smoothing Method for Faster Optimization and Local Exploration

Kevin Scaman, Ludovic DOS SANTOS, Merwan Barlier, Igor Colin

Logistic q-learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

On the Global Convergence Rates of Softmax Policy Gradient Methods

Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

Reinforcement Learning - Theory

Explicit regularization of stochastic gradient methods through duality

Anant Raj, Francis Bach

From Importance Sampling to Doubly Robust Policy Gradient

Jiawei Huang, Nan Jiang

Reinforcement Learning - Theory

Settling the Variance of Multi-Agent Policy Gradients

Jakub Grudzien Kuba, Muning Wen, Linghui Meng, Shangding Gu, Haifeng Zhang, David Mguni, Jun Wang, Yaodong Yang

deep learning, reinforcement learning and planning

Direct loss minimization for sparse gaussian processes

Yadi Wei, Rishit Sheth, Roni Khardon

An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning

Blake Woodworth, Nathan Srebro

optimization

Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction

Radu Alexandru Dragomir, Mathieu Even, Hadrien Hendrikx

Optimization, Convex Optimization

The Performance Analysis of Generalized Margin Maximizers on Separable Data

Fariborz Salehi, Ehsan Abbasi, Babak Hassibi

Learning Theory

Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

Long Yang, Qian Zheng, Gang Pan

Temporal Difference Learning as Gradient Splitting

Rui Liu, Alex Olshevsky

Theory, RL, Decisions and Control Theory

A novel variational form of the Schatten-$p$ quasi-norm

Paris Giampouras, Rene Vidal, Athanasios Rontogiannis, Benjamin Haeffele

Efficient constrained sampling via the mirror-Langevin algorithm

Kwangjun Ahn, Sinho Chewi

optimization, generative model, optimal transport

Fast convergence of stochastic subgradient method under interpolation

Huang Fang, Zhenan Fan, Michael Friedlander

interpolation, stochastic subgradient method, convergence analysis, Optimization

PID Accelerated Value Iteration Algorithm

Amir-massoud Farahmand, Mohammad Ghavamzadeh

Reinforcement Learning and Planning

Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective

Vu Nguyen, Vaden Masrani, Rob Brekelmans, Michael A Osborne, Frank Wood

On Density Estimation with Diffusion Models

Diederik Kingma, Tim Salimans, Ben Poole, Jonathan Ho

optimization, generative model

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

Generalized Doubly Reparameterized Gradient Estimators

Matthias Bauer, Andriy Mnih

Probabilistic Methods, Approximate Inference

Convergence rates and approximation results for SGD and its continuous-time counterpart

Xavier Fontaine, Valentin De Bortoli, Alain Durmus

Greed Meets Sparsity: Understanding and Improving Greedy Coordinate Descent for Sparse Optimization
