12/07/2020

On the Global Convergence Rates of Softmax Policy Gradient Methods

Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

Keywords: Reinforcement Learning - Theory

Abstract: We make three contributions toward better understanding policy gradient methods. First, we show that with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization. This result significantly improves recent asymptotic convergence results. The analysis relies on two findings: that the softmax policy gradient satisfies a Łojasiewicz inequality, and that the minimum probability of an optimal action during optimization can be bounded in terms of its initial value. Second, we analyze entropy regularized policy gradient and show that in the one-state (bandit) case it enjoys a linear convergence rate $O(e^{-t})$, while for general MDPs we prove that it converges at an $O(1/t)$ rate. This result resolves an open question in the recent literature. A key insight is that the entropy regularized gradient update behaves similarly to the contraction operator in value learning, with a contraction factor that depends on the current policy. Finally, combining the above two results and additional lower bound results, we explain how entropy regularization improves policy optimization, even with the true gradient, from the perspective of convergence rate. These results provide a theoretical understanding of the impact of entropy and corroborate existing empirical studies.
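As a quick illustration of the two updates the abstract compares, the sketch below runs exact-gradient softmax policy gradient and its entropy-regularized variant on a small synthetic bandit (the one-state case). The reward vector, step size, temperature, and iteration count are illustrative assumptions, not values from the paper; the gradient expressions are the standard exact gradients of the expected reward and of the entropy-regularized objective under a softmax parametrization.

import numpy as np

# Hypothetical 3-armed bandit; rewards, step size, and temperature are
# assumptions chosen for the demo, not constants from the paper.
r = np.array([1.0, 0.8, 0.1])        # true mean rewards (assumed)
theta = np.zeros(3)                  # logits for vanilla softmax PG
theta_ent = np.zeros(3)              # logits for the entropy-regularized run
eta, tau, T = 0.4, 0.2, 2000         # step size, temperature, iterations

def softmax(z):
    z = z - z.max()                  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

for t in range(T):
    # Vanilla softmax policy gradient with the *true* gradient:
    #   dJ/dtheta_a = pi_a * (r_a - pi^T r)
    pi = softmax(theta)
    theta += eta * pi * (r - pi @ r)

    # Entropy-regularized policy gradient (objective pi^T r + tau * H(pi)):
    #   dJ~/dtheta_a = pi_a * ((r_a - tau*log pi_a) - pi^T (r - tau*log pi))
    pi_e = softmax(theta_ent)
    soft_r = r - tau * np.log(pi_e)
    theta_ent += eta * pi_e * (soft_r - pi_e @ soft_r)

# Per the abstract: the vanilla run closes its sub-optimality gap at an O(1/t)
# rate, while the regularized run converges linearly, but to the soft-optimal
# policy softmax(r / tau) rather than the greedy one.
pi_soft_star = softmax(r / tau)
print("vanilla sub-optimality gap:", r.max() - softmax(theta) @ r)
print("entropy-reg L1 distance to soft-optimal policy:",
      np.abs(softmax(theta_ent) - pi_soft_star).sum())

Printing both quantities over time (rather than only at the end) makes the rate difference visible: the first gap decays roughly like 1/t, while the second distance decays geometrically, matching the abstract's comparison.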

Talk and the respective paper are published at the ICML 2020 virtual conference.
