Characterizing the Gap Between Actor-Critic and Policy Gradient

18/07/2021

Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans

Keywords: Reinforcement Learning and Planning

Abstract: Actor-critic (AC) methods are ubiquitous in reinforcement learning. Although it is understood that AC methods are closely related to policy gradient (PG), their precise connection has not been fully characterized previously. In this paper, we explain the gap between AC and PG methods by identifying the exact adjustment to the AC objective/gradient that recovers the true policy gradient of the cumulative reward objective (PG). Furthermore, by viewing the AC method as a two-player Stackelberg game between the actor and critic, we show that the Stackelberg policy gradient can be recovered as a special case of our more general analysis. Based on these results, we develop practical algorithms, Residual Actor-Critic and Stackelberg Actor-Critic, for estimating the correction between AC and PG and use these to modify the standard AC algorithm. Experiments on popular tabular and continuous environments show the proposed corrections can improve both the sample efficiency and final performance of existing AC methods.
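
To make the notion of a gap concrete, here is a minimal sketch under strong simplifying assumptions: a single-state (bandit) problem with a softmax policy, where the actor's gradient is computed from an imperfect critic. It is not the paper's Residual Actor-Critic or Stackelberg Actor-Critic algorithm. The names `q_true`, `q_hat`, `theta`, and `expected_value_gradient`, and all the numbers, are hypothetical and chosen only for illustration. In this toy case the difference between the true policy gradient and the critic-based gradient is exactly the gradient computed from the critic's residual error.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_value_gradient(logits, values):
    """Gradient of sum_a pi(a) * values[a] w.r.t. the logits, values held fixed.

    For a softmax policy, d pi(a) / d logit_k = pi(a) * (1[a == k] - pi(k)),
    so the k-th gradient entry is pi(k) * (values[k] - sum_a pi(a) * values[a]).
    """
    pi = softmax(logits)
    baseline = sum(p * v for p, v in zip(pi, values))
    return [p * (v - baseline) for p, v in zip(pi, values)]


# Illustrative (assumed) numbers: three actions, one state.
theta = [0.2, -0.1, 0.5]   # policy logits (actor parameters)
q_true = [1.0, 0.0, 2.0]   # true action values
q_hat = [0.8, 0.3, 1.5]    # imperfect critic estimates

pg_grad = expected_value_gradient(theta, q_true)  # true policy gradient
ac_grad = expected_value_gradient(theta, q_hat)   # gradient seen by the actor
residual = [q - qh for q, qh in zip(q_true, q_hat)]
correction = expected_value_gradient(theta, residual)

print("policy gradient:", pg_grad)
print("actor-critic   :", ac_grad)
print("correction     :", correction)
```

Adding `correction` to `ac_grad` recovers `pg_grad` here because the gradient is linear in the action values. The multi-step setting treated in the paper is more involved than this one-state illustration.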

The talk and the corresponding paper were published at the ICML 2021 virtual conference.

Similar Papers

18/07/2021

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

Ilya Kostrikov, Rob Fergus, Jonathan Tompson, Ofir Nachum

Keywords: Reinforcement Learning and Planning, Deep RL

4:49

06/12/2021

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords: theory, optimization, reinforcement learning and planning, active learning

11:42

06/12/2021

FACMAC: Factored Multi-Agent Centralised Policy Gradients

Bei Peng, Tabish Rashid, Christian Schroeder de Witt, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Boehmer, Shimon Whiteson

Keywords: reinforcement learning and planning

14:15

19/08/2021

Variational Model-based Policy Optimization

Yinlam Chow, Brandon Cui, Moonkyung Ryu, Mohammad Ghavamzadeh

Keywords: Machine Learning, Reinforcement Learning

15:31

06/12/2021

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

Keywords: reinforcement learning and planning, robustness, representation learning

12:24

18/07/2021

On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

Yiming Zhang, Keith Ross

Keywords: Reinforcement Learning and Planning

5:14

18/07/2021

Taylor Expansion of Discount Factors

Yunhao Tang, Mark Rowland, Remi Munos, Michal Valko

Keywords: Reinforcement Learning and Planning

5:22

18/07/2021

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords: Reinforcement Learning and Planning, Multi-Agent RL

5:16

18/07/2021

On Proximal Policy Optimization's Heavy-tailed Gradients

Saurabh Garg, Joshua Zhanson, Emilio Parisotto, Adarsh Prasad, Zico Kolter, Zachary Lipton, Sivaraman Balakrishnan, Russ Salakhutdinov, Pradeep Ravikumar

Keywords: Reinforcement Learning and Planning, Deep RL

5:34

18/07/2021

Phasic Policy Gradient

Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman

Keywords: Reinforcement Learning and Planning, Deep RL

20:40

18/07/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

Keywords: Theory, RL, Decisions and Control Theory

4:23

19/08/2021

Average-Reward Reinforcement Learning with Trust Region Methods

Xiaoteng Ma, Xiaohang Tang, Li Xia, Jun Yang, Qianchuan Zhao

Keywords: Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Markov Decision Processes

14:41

06/12/2020

MOReL: Model-Based Offline Reinforcement Learning

Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims

3:23

12/07/2020

Structured Policy Iteration for Linear Quadratic Regulator

Youngsuk Park, Ryan Rossi, Zheng Wen, Gang Wu, Handong Zhao

Keywords: Reinforcement Learning - General

16:08

13/04/2021

Logistic Q-Learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

2:44

12/07/2020

Learning Fair Policies in Multi-Objective (Deep) Reinforcement Learning with Average and Discounted Rewards

Umer Siddique, Paul Weng, Matthieu Zimmer

Keywords: Reinforcement Learning - Deep RL

15:17

06/12/2021

Explicable Reward Design for Reinforcement Learning Agents

Rati Devidze, Goran Radanovic, Parameswaran Kamalaruban, Adish Singla

Keywords: optimization, reinforcement learning and planning, interpretability

4:10

06/12/2020

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow

3:16

03/05/2021

Learning Value Functions in Deep Policy Gradients using Residual Variance

Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

4:49

18/07/2021

Representation Matters: Offline Pretraining for Sequential Decision Making

Mengjiao Yang, Ofir Nachum

Keywords: Reinforcement Learning and Planning

5:06

03/05/2021

Discovering a set of policies for the worst case reward

Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh

10:33

26/04/2020

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Keywords: Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

5:36

06/12/2020

How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

Pierluca D'Oro, Wojciech Jaśkowski

3:09

06/12/2020

Off-Policy Evaluation and Learning for External Validity under a Covariate Shift

Masatoshi Uehara, Masahiro Kato, Shota Yasui

3:06

18/07/2021

Recomposing the Reinforcement Learning Building Blocks with Hypernetworks

Elad Sarafian, Shai Keynan, Sarit Kraus

Keywords: Reinforcement Learning and Planning, Deep RL

5:16

18/07/2021

Dynamic Balancing for Model Selection in Bandits and RL

Ashok Cutkosky, Christoph Dann, Abhimanyu Das, Claudio Gentile, Aldo Pacchiano, Manish Purohit

Keywords: Reinforcement Learning and Planning, Bandits

5:18

06/12/2020

The Value Equivalence Principle for Model-Based Reinforcement Learning

Christopher Grimm, Andre Barreto, Satinder Singh, David Silver

3:19

06/12/2021

An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning

Tianpei Yang, Weixun Wang, Hongyao Tang, Jianye Hao, Zhaopeng Meng, Hangyu Mao, Dong Li, Wulong Liu, Yingfeng Chen, Yujing Hu, Changjie Fan, Chengwei Zhang

Keywords: reinforcement learning and planning, transfer learning

15:21

12/07/2020

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Keywords: Reinforcement Learning - Theory

14:20

06/12/2020

An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search

Kyunghyun Lee, Byeong-Uk Lee, Ukcheol Shin, In So Kweon

3:19

14/06/2020

Selective Transfer With Reinforced Transfer Network for Partial Domain Adaptation

Zhihong Chen, Chao Chen, Zhaowei Cheng, Boyuan Jiang, Ke Fang, Xinyu Jin

Keywords: partial domain adaptation, selective transfer, pixel-level information, reconstruct error, reinforcement learning

1:01

25/07/2020

Sequential recommendation with self-attentive multi-adversarial network

Ruiyang Ren, Zhaoyang Liu, Yaliang Li, Wayne Xin Zhao, Hui Wang, Bolin Ding, Ji-Rong Wen

Keywords: sequential recommendation, adversarial training, self-attentive mechanism

15:12

06/12/2021

Risk-Aware Transfer in Reinforcement Learning using Successor Features

Michael Gimelfarb, Andre Barreto, Scott Sanner, Chi-Guhn Lee

Keywords: reinforcement learning and planning, representation learning, transfer learning

12:06

26/04/2020

Implementation Matters in Deep RL: A Case Study on PPO and TRPO

Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

Keywords: deep policy gradient methods, deep reinforcement learning, TRPO, PPO

20:41

06/12/2021

Bridging Explicit and Implicit Deep Generative Models via Neural Stein Estimators

Qitian Wu, Rui Gao, Hongyuan Zha

Keywords: generative model

12:51

06/12/2021

Settling the Variance of Multi-Agent Policy Gradients

Jakub Grudzien Kuba, Muning Wen, Linghui Meng, Shangding Gu, Haifeng Zhang, David Mguni, Jun Wang, Yaodong Yang

Keywords: deep learning, reinforcement learning and planning

13:12

06/12/2020

Cooperative Heterogeneous Deep Reinforcement Learning

Han Zheng, Pengfei Wei, Jing Jiang, Guodong Long, Qinghua Lu, Chengqi Zhang

3:08

26/08/2020

A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning

Nhan Pham, Lam Nguyen, Dzung Phan, Phuong Ha Nguyen, Marten van Dijk, Quoc Tran-Dinh

15:49

06/12/2020

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

3:11

06/12/2020

Softmax Deep Double Deterministic Policy Gradients

Ling Pan, Qingpeng Cai, Longbo Huang

3:23