Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

26/04/2020

Pan Xu, Felicia Gao, Quanquan Gu

Keywords: Policy Gradient, Reinforcement Learning, Sample Efficiency

Abstract: Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires $O(1/\epsilon^{3/2})$ episodes (where the $O(\cdot)$ notation hides constant factors) to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\boldsymbol{\theta})$ (i.e., $\boldsymbol{\theta}$ such that $\|\nabla J(\boldsymbol{\theta})\|_2^2\leq\epsilon$). This sample complexity improves the existing result $O(1/\epsilon^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. In addition, we propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms.
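
As a quick check of the improvement factor stated in the abstract (a worked calculation added for illustration, not taken from the paper), dividing the earlier bound by the new one gives

$$\frac{1/\epsilon^{5/3}}{1/\epsilon^{3/2}} = \epsilon^{3/2 - 5/3} = \epsilon^{-1/6} = \frac{1}{\epsilon^{1/6}}.$$

For instance, at $\epsilon = 10^{-6}$ the two bounds scale as $10^{10}$ and $10^{9}$ episodes respectively, so, ignoring the constants hidden by the $O(\cdot)$ notation, the ratio is $10$.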

The talk and the respective paper were published at the ICLR 2020 virtual conference.

Similar Papers

12/07/2020

Momentum-Based Policy Gradient Methods

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Keywords: Reinforcement Learning - General

13:28

06/12/2020

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen

3:09

06/12/2021

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai

Keywords: theory, optimization, reinforcement learning and planning

10:57

26/08/2020

A Reduction from Reinforcement Learning to No-Regret Online Learning

Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon

14:33

06/12/2021

Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning

Xin Zhang, Zhuqing Liu, Jia Liu, Zhengyuan Zhu, Songtao Lu

Keywords: theory, optimization, reinforcement learning and planning

14:54

06/12/2021

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Ming Yin, Yu Bai, Yu-Xiang Wang

Keywords: theory, optimization, reinforcement learning and planning

8:57

18/07/2021

Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity

Zhang Zihan, Yuan Zhou, Xiangyang Ji

Keywords: Reinforcement Learning and Planning

5:03

18/07/2021

UCB Momentum Q-learning: Correcting the bias without forgetting

Pierre MENARD, Omar Darwiche Domingues, Xuedong Shang, Michal Valko

Keywords: Theory, RL, Decisions and Control Theory

16:10

18/07/2021

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou, Jiafan He, Quanquan Gu

Keywords: Reinforcement Learning and Planning

5:20

06/12/2020

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan

3:42

06/12/2021

Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

Weitong ZHANG, Dongruo Zhou, Quanquan Gu

Keywords: reinforcement learning and planning

11:53

06/12/2020

Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen

3:06

06/12/2020

Deep Inverse Q-learning with Constraints

Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

3:14

18/07/2021

Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning

Gen Li, Changxiao Cai, Yuxin Chen, Yuantao Gu, Yuting Wei, Yuejie Chi

Keywords: Reinforcement Learning and Planning

4:49

06/12/2020

Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

Qiwen Cui, Lin Yang

Keywords: Algorithms -> Semi-Supervised Learning; Deep Learning -> Deep Autoencoders; Deep Learning -> Generative Models; Probabilistic Methods -> Variational Inference

3:25

06/12/2021

A Max-Min Entropy Framework for Reinforcement Learning

Seungyul Han, Youngchul Sung

Keywords: optimization, reinforcement learning and planning

14:35

26/04/2020

Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

Yuanhao Wang, Kefan Dong, Xiaoyu Chen, Liwei Wang

Keywords: theory, reinforcement learning, sample complexity

3:25

02/02/2021

Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback

Yihan Du, Yuko Kuroki, Wei Chen

17:13

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords: optimization, reinforcement learning and planning, bandits

15:17

09/07/2020

Provably Efficient Reinforcement Learning with Linear Function Approximation

Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael Jordan

Keywords: Reinforcement learning

13:04

26/04/2020

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Keywords: Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

5:36

18/07/2021

Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs

Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Basar

Keywords: Theory, RL, Decisions and Control Theory

4:12

12/07/2020

Tightening Exploration in Upper Confidence Reinforcement Learning

Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

Keywords: Reinforcement Learning - General

16:14

06/12/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords: theory, reinforcement learning and planning

13:47

06/12/2020

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

Tengyu Xu, Zhe Wang, Yingbin Liang

3:12

06/12/2021

Nearly Horizon-Free Offline Reinforcement Learning

Tongzheng Ren, Jialian Li, Bo Dai, Simon Du, Sujay Sanghavi

Keywords: theory, optimization, reinforcement learning and planning

8:44

18/07/2021

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtarik

Keywords: Optimization

11:53

18/07/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords: Theory, RL, Decisions and Control Theory

4:23

18/07/2021

Near Optimal Reward-Free Reinforcement Learning

Zhang Zihan, Simon Du, Xiangyang Ji

Keywords: Theory, RL, Decisions and Control Theory

17:08

26/04/2020

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords: Sample-efficient reinforcement learning, off-policy learning

5:43

06/12/2020

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Kaiqing Zhang, Sham Kakade, Tamer Basar, Lin Yang

3:25

02/02/2021

Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation

Bo Pang, Zhong-Ping Jiang

20:01

04/08/2021

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture MDPs

Dongruo Zhou, Quanquan Gu, Csaba Szepesvari

16:33

06/12/2020

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang

3:22

06/12/2021

An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap

Yuanhao Wang, Ruosong Wang, Sham Kakade

Keywords: theory, reinforcement learning and planning, generative model

15:01

03/05/2021

Optimism in Reinforcement Learning with Generalized Linear Function Approximation

Yining Wang, Ruosong Wang, Simon Du, Akshay Krishnamurthy

Keywords: reinforcement learning, theory, exploration, function approximation, provable sample efficiency, regret analysis, optimism

4:51

06/12/2020

Geometric Exploration for Online Control

Orestis Plevrakis, Elad Hazan

3:21

18/07/2021

Model-based Reinforcement Learning for Continuous Control with Posterior Sampling

Ying Fan, Yifei Ming

Keywords: Reinforcement Learning and Planning

18:34

13/04/2021

Logistic Q-Learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

2:44

06/12/2020

Generalization Bound of Gradient Descent for Non-Convex Metric Learning

Mingzhi Dong, Xiaochen Yang, Rui Zhu, Yujiang Wang, Jing-Hao Xue

3:18