Learning Value Functions in Deep Policy Gradients using Residual Variance

Abstract: Policy gradient algorithms have proven successful in diverse decision-making and control tasks. However, these methods suffer from high sample complexity and instability. In this paper, we address these challenges with a different approach to training the critic in the actor-critic framework. Our work builds on recent studies indicating that traditional actor-critic algorithms do not succeed in fitting the true value function, which calls for a better objective for the critic. In our method, the critic uses a new state-value (resp. state-action-value) function approximation that learns the value of the states (resp. state-action pairs) relative to their mean value, rather than the absolute value as in conventional actor-critic. We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms. Furthermore, we validate our method in tasks with sparse rewards, where we provide experimental evidence and theoretical insights.

06/12/2021

Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

Similar Papers

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

reinforcement learning and planning, robustness, representation learning

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

deep learning, optimization, reinforcement learning and planning

Off-Policy Imitation Learning from Observations

Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou

Tactical Optimism and Pessimism for Deep Reinforcement Learning

Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael Jordan

reinforcement learning and planning, bandits

The Advantage of Conditional Meta-Learning for Biased Regularization and Fine Tuning

Giulia Denevi, Massimiliano Pontil, Carlo Ciliberto

Adversarially Guided Actor-Critic

Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Making Sense of Reinforcement Learning and Probabilistic Inference

Brendan O'Donoghue, Ian Osband, Catalin Ionescu

Reinforcement learning, Bayesian inference, Exploration

Adversarial Robustness of Supervised Sparse Coding

Jeremias Sulam, Ramchandran Muthukumar, Raman Arora

On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

Kaiqing Zhang, Bin Hu, Tamer Basar

Model-based Adversarial Meta-Reinforcement Learning

Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots

reinforcement learning, model-predictive control

Monotonic Robust Policy Optimization with Model Discrepancy

Yuankun Jiang, Chenglin Li, Wenrui Dai, Junni Zou, Hongkai Xiong

Reinforcement Learning and Planning

A Distribution-dependent Analysis of Meta Learning

Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvari

Theory, Statistical Learning Theory

Modeling the Second Player in Distributionally Robust Optimization

Paul Michel, Tatsunori Hashimoto, Graham Neubig

adversarial learning, deep learning, robustness, distributionally robust optimization

Adversarially robust estimate and risk analysis in linear regression

Yue Xing, Ruizhi Zhang, Guang Cheng

Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

Nicklas Hansen, Hao Su, Xiaolong Wang

reinforcement learning and planning, transformers

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Andrea Zanette, Martin J Wainwright, Emma Brunskill

reinforcement learning and planning

Modeling and Optimization Trade-off in Meta-learning

Katelyn Gao, Ozan Sener

Parameter-Based Value Functions

Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber

Off-Policy Reinforcement Learning, Reinforcement Learning

Trade-offs and Guarantees of Adversarial Representation Learning for Information Obfuscation

Han Zhao, Jianfeng Chi, Yuan Tian, Geoffrey Gordon

Online model selection for reinforcement learning with function approximation

Jonathan Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

Kai Wang, Sanket Shah, Haipeng Chen, Andrew Perrault, Finale Doshi-Velez, Milind Tambe

Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed H. Chi, Honglak Lee, Minmin Chen

Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R Wright, Amy R Greenwald, Michael Bowling

Mohammad Mehrabi, Adel Javanmard, Ryan A. Rossi, Anup Rao, Tung Mai

Fan Bao, Guoqiang Wu, Chongxuan Li, Jun Zhu, Bo Zhang

Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Dibya Ghosh, Abhishek Gupta, Ashwin D Reddy, Justin Fu, Coline M Devin, Ben Eysenbach, Sergey Levine
