How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

06/12/2020

Pierluca D'Oro, Wojciech Jaśkowski

Abstract: Deterministic-policy actor-critic algorithms for continuous control improve the actor by plugging its actions into the critic and ascending the action-value gradient, which is obtained by chaining the actor's Jacobian matrix with the gradient of the critic with respect to input actions. However, instead of gradients, the critic is, typically, only trained to accurately predict expected returns, which, on their own, are useless for policy optimization. In this paper, we propose MAGE, a model-based actor-critic algorithm, grounded in the theory of policy gradients, which explicitly learns the action-value gradient. MAGE backpropagates through the learned dynamics to compute gradient targets in temporal difference learning, leading to a critic tailored for policy improvement. On a set of MuJoCo continuous-control tasks, we demonstrate the efficiency of the algorithm in comparison to model-free and model-based state-of-the-art baselines.
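To make the abstract's first sentence concrete, the following minimal sketch ascends the action-value gradient on a one-dimensional toy problem. Everything in it is an illustrative assumption rather than the authors' implementation: the linear actor `a = theta * s`, the hand-written quadratic `critic` standing in for a learned Q-function, and the helper names `actor_grad` and `train`. It does not implement MAGE; it only shows the chain rule that multiplies the critic's action-gradient by the actor's Jacobian.

```python
# Toy deterministic policy gradient (hypothetical setup, not the paper's code).

def actor(theta, s):
    """Deterministic linear policy: a = theta * s."""
    return theta * s


def critic(s, a):
    """Stand-in for a learned critic Q(s, a); it peaks at a = 2 * s."""
    return -(a - 2.0 * s) ** 2


def critic_action_grad(s, a):
    """Analytic dQ/da of the toy critic above."""
    return -2.0 * (a - 2.0 * s)


def actor_grad(theta, s):
    """Chain rule: dQ(s, pi(s))/dtheta = dQ/da * da/dtheta, where da/dtheta = s."""
    a = actor(theta, s)
    return critic_action_grad(s, a) * s


def train(theta=0.0, states=(0.5, 1.0, 1.5), lr=0.1, steps=200):
    """Gradient ascent on the mean critic value over a fixed batch of states."""
    for _ in range(steps):
        grad = sum(actor_grad(theta, s) for s in states) / len(states)
        theta += lr * grad
    return theta


theta_star = train()
print(theta_star)  # approaches 2.0, the slope that maximizes the toy critic
```

The update reads only `critic_action_grad`, never the critic's value. A critic whose values are accurate but whose action-gradients are poor would therefore mislead the actor, which is the concern the abstract raises.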

The talk and the corresponding paper were published at the NeurIPS 2020 virtual conference.

Similar Papers

18/07/2021

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

Ilya Kostrikov, Rob Fergus, Jonathan Tompson, Ofir Nachum

Keywords: Reinforcement Learning and Planning, Deep RL

4:49

03/05/2021

Learning Value Functions in Deep Policy Gradients using Residual Variance

Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

4:49

16/11/2020

Safe Policy Learning for Continuous Control

Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Dueñez-Guzman, Mohammad Ghavamzadeh

5:20

06/12/2021

FACMAC: Factored Multi-Agent Centralised Policy Gradients

Bei Peng, Tabish Rashid, Christian Schroeder de Witt, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Boehmer, Shimon Whiteson

Keywords: reinforcement learning and planning

14:15

06/12/2021

Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning

Kai Wang, Sanket Shah, Haipeng Chen, Andrew Perrault, Finale Doshi-Velez, Milind Tambe

Keywords: deep learning, optimization, reinforcement learning and planning

14:52

06/12/2021

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

Keywords: reinforcement learning and planning, robustness, representation learning

12:24

06/12/2020

Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy Hospedales

3:12

03/05/2021

Parameter-Based Value Functions

Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber

Keywords: Off-Policy Reinforcement Learning, Reinforcement Learning

2:45

06/12/2020

Model-based Adversarial Meta-Reinforcement Learning

Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma

3:31

06/12/2020

Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems

Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar

3:25

18/07/2021

Characterizing the Gap Between Actor-Critic and Policy Gradient

Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans

Keywords: Reinforcement Learning and Planning

4:54

06/12/2021

Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic

Yufeng Zhang, Siyu Chen, Zhuoran Yang, Michael Jordan, Zhaoran Wang

Keywords: deep learning, optimization, reinforcement learning and planning, representation learning, optimal transport

7:25

06/12/2020

Inverse Reinforcement Learning from a Gradient-based Learner

Giorgia Ramponi, Gianluca Drappo, Marcello Restelli

2:42

06/12/2020

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow

3:16

03/05/2021

Control-Aware Representations for Model-based Reinforcement Learning

Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh

4:57

06/12/2021

Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies

Tim Seyde, Igor Gilitschenski, Wilko Schwarting, Bartolomeo Stellato, Martin Riedmiller, Markus Wulfmeier, Daniela Rus

Keywords: reinforcement learning and planning

6:48

06/12/2020

An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell, Josiah Hanna, Peter Stone

3:22

18/07/2021

On Proximal Policy Optimization's Heavy-tailed Gradients

Saurabh Garg, Joshua Zhanson, Emilio Parisotto, Adarsh Prasad, Zico Kolter, Zachary Lipton, Sivaraman Balakrishnan, Russ Salakhutdinov, Pradeep Ravikumar

Keywords: Reinforcement Learning and Planning, Deep RL

5:34

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords: Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

5:03

13/04/2021

Learning to defend by learning to attack

Haoming Jiang, Zhehui Chen, Yuyang Shi, Bo Dai, Tuo Zhao

2:58

02/02/2021

Addressing Action Oscillations through Learning Policy Inertia

Chen Chen, Hongyao Tang, Jianye Hao, Wulong Liu, Zhaopeng Meng

14:57

18/07/2021

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords: Reinforcement Learning and Planning, Multi-Agent RL

5:16

06/12/2021

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords: deep learning, optimization, reinforcement learning and planning

12:35

06/12/2021

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords: theory, optimization, reinforcement learning and planning, active learning

11:42

06/12/2021

Neural Algorithmic Reasoners are Implicit Planners

Andreea-Ioana Deac, Petar Veličković, Ognjen Milinkovic, Pierre-Luc Bacon, Jian Tang, Mladen Nikolic

Keywords: deep learning, reinforcement learning and planning, self-supervised learning, generative model, graph learning

13:10

06/12/2021

An online passive-aggressive algorithm for difference-of-squares classification

Lawrence Saul

Keywords: machine learning, online learning

14:00

16/11/2020

Tolerance-Guided Policy Learning for Adaptable and Transferrable Delicate Industrial Insertion

Boshen Niu, Chenxi Wang, Changliu Liu

5:36

03/05/2021

Batch Reinforcement Learning Through Continuation Method

Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed H. Chi, Honglak Lee, Minmin Chen

Keywords: batch reinforcement learning, relaxed regularization, continuation method

5:34

04/07/2020

Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation

Xinting Huang, Jianzhong Qi, Yu Sun, Rui Zhang

Keywords: Semi-Supervised Learning, generalization function, Stochastic Estimation, Dialogue optimization

11:31

12/07/2020

Tuning-free Plug-and-Play Proximal Algorithm for Inverse Imaging Problems

Kaixuan Wei, Angelica I Aviles-Rivero, Jingwei Liang, Ying Fu, Carola-Bibiane Schönlieb, Hua Huang

Keywords: Deep Learning - Algorithms

11:48

02/02/2021

Right for Better Reasons: Training Differentiable Models by Constraining their Influence Functions

Xiaoting Shao, Arseny Skryagin, Wolfgang Stammer, Patrick Schramowski, Kristian Kersting

19:08

03/05/2021

Robust Reinforcement Learning on State Observations with Learned Optimal Adversary

Huan Zhang, Hongge Chen, Duane S Boning, Cho-Jui Hsieh

Keywords: reinforcement learning, robustness, adversarial attacks, adversarial defense

5:14

18/07/2021

OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

Jongmin Lee, Wonseok Jeon, Byung-Jun Lee, Joelle Pineau, Kee-Eung Kim

Keywords: Reinforcement Learning and Planning

5:15

13/04/2021

Logistic Q-Learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

2:44

06/12/2021

Time-series Generation by Contrastive Imitation

Daniel Jarrett, Ioana Bica, Mihaela van der Schaar

Keywords: generative model

8:47

02/02/2021

Explaining Neural Matrix Factorization with Gradient Rollback

Carolin Lawrence, Timo Sztyler, Mathias Niepert

16:47

06/12/2020

Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

Michael Janner, Igor Mordatch, Sergey Levine

3:16

18/07/2021

Variational Empowerment as Representation Learning for Goal-Conditioned Reinforcement Learning

Jongwook Choi, Archit Sharma, Honglak Lee, Sergey Levine, Shixiang Gu

Keywords: Neuroscience and Cognitive Science, Neuroscience, Reinforcement Learning and Planning, Algorithms, Representation Learning; Algorithms, Sparse Coding and Dimensionality Expansion; Applications, Matrix and Ten

5:16

23/08/2020

AutoFIS: Automatic feature interaction selection in factorization models for click-through rate prediction

Bin Liu, Chenxu Zhu, Guilin Li, Weinan Zhang, Jincai Lai, Ruiming Tang, Xiuqiang He, Zhenguo Li, Yong Yu

Keywords: feature selection, neural architecture search, recommendation, factorization machine

19:23

02/02/2021

Cascade Network with Guided Loss and Hybrid Attention for Finding Good Correspondences

Zhi Chen, Fan Yang, Wenbing Tao

17:32