Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

Abstract: Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions. This is challenging from an implementation perspective, as repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates. In this work, we provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation. Our framework interprets a number of prior approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates. This framework also opens the door to a new family of estimates, which can be easily implemented with auto-differentiation libraries, and lead to performance gains in practice.

06/12/2021

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

Yunhao Tang, Tadashi Kozuno, Mark Rowland, Remi Munos, Michal Valko

Comments

Similar Papers

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords Abstract Paper

theory, optimization, reinforcement learning and planning, active learning

Mandoline: Model Evaluation under Distribution Shift

Mayee Chen, Karan Goel, Nimit Sohoni and Fait Poms, Kayvon Fatahalian, Christopher Re

Keywords Abstract Paper

Algorithms, Others

Variational Model-based Policy Optimization

Yinlam Chow, Brandon Cui, Moonkyung Ryu, Mohammad Ghavamzadeh

Keywords Abstract Paper

Machine Learning, Reinforcement Learning

Set Prediction without Imposing Structure as Conditional Density Estimation

David W Zhang, Gertjan J Burghouts, Cees G Snoek

Keywords Abstract Paper

energy based models, set prediction

Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

Tanmay Gangwani, Jian Peng, Yuan Zhou

Keywords Abstract Paper

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Anuj Mahajan, Mikayel Samvelyan, Lei Mao and Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords Abstract Paper

Reinforcement Learning and Planning, Multi-Agent RL

Learning to Score Behaviors for Guided Policy Optimization

Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang and Krzysztof Choromanski, Anna Choromanska, Michael Jordan

Keywords Abstract Paper

Reinforcement Learning - General

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Keywords Abstract Paper

Reinforcement Learning - Theory

Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

Guodong Zhang, Kyle Hsu, Jianing Li and Chelsea Finn, Roger Grosse

Keywords Abstract Paper

optimization, generative model

Discrete Action On-Policy Learning with Action-Value Critic

Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou

Keywords Abstract Paper

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

Keywords Abstract Paper

reinforcement learning and planning, robustness, representation learning

Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method

Qi Zhou, Yufei Kuang, Zherui Qiu and Houqiang Li, Jie Wang

Keywords Abstract Paper

Structured Dropout Variational Inference for Bayesian Neural Networks

Son Nguyen, Duong Nguyen, Khai Nguyen and Khoat Than, Hung Bui, Nhat Ho

Keywords Abstract Paper

deep learning, generative model

Using Random Effects to Account for High-Cardinality Categorical Features and Repeated Measures in Deep Neural Networks

Giora Simchoni, Saharon Rosset

Keywords Abstract Paper

deep learning, machine learning, vision

Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization

Kaidi Cao, Yining Chen, Junwei Lu and Nikos Arechiga, Adrien Gaidon, Tengyu Ma

Keywords Abstract Paper

imbalanced learning, noise robust learning, deep learning

Control Variates for Slate Off-Policy Evaluation

Nikos Vlassis, Ashok Chandrashekar, Fernando Amat, Nathan Kallus

Keywords Abstract Paper

optimization, bandits

SOPE: Spectrum of Off-Policy Estimators

Christina Yuan, Yash Chandak, Stephen Giguere and Philip S. Thomas, Scott Niekum

Keywords Abstract Paper

reinforcement learning and planning

Loss function based second-order Jensen inequality and its application to particle variational inference

Futoshi Futami, Tomoharu Iwata, naonori ueda and Issei Sato, Masashi Sugiyama

Keywords Abstract Paper

optimization, generative model

GaussianPath:A Bayesian Multi-Hop Reasoning Framework for Knowledge Graph Reasoning

Guojia Wan, Bo Du

Keywords Abstract Paper

CCA-flow: Deep multi-view subspace learning with inverse autoregressive flow

Jia He, Feiyang Pan, Fuzhen Zhuang, Qing He

Keywords Abstract Paper

The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

Keywords Paper

Mayee Chen, Karan Goel, Nimit Sohoni and
Fait Poms, Kayvon Fatahalian, Christopher Re

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Anuj Mahajan, Mikayel Samvelyan, Lei Mao and
Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords Paper

Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang and
Krzysztof Choromanski, Anna Choromanska, Michael Jordan

Keywords Paper

Keywords Paper

Guodong Zhang, Kyle Hsu, Jianing Li and
Chelsea Finn, Roger Grosse

Keywords Paper

Keywords Paper

Keywords Paper

Qi Zhou, Yufei Kuang, Zherui Qiu and
Houqiang Li, Jie Wang

Keywords Paper

Son Nguyen, Duong Nguyen, Khai Nguyen and
Khoat Than, Hung Bui, Nhat Ho

Keywords Paper

Keywords Paper

Kaidi Cao, Yining Chen, Junwei Lu and
Nikos Arechiga, Adrien Gaidon, Tengyu Ma

Keywords Paper

Keywords Paper

Christina Yuan, Yash Chandak, Stephen Giguere and
Philip S. Thomas, Scott Niekum

Keywords Paper

Futoshi Futami, Tomoharu Iwata, naonori ueda and
Issei Sato, Masashi Sugiyama

Keywords Paper

Keywords Paper

Keywords Paper

Will Dabney, André Barreto, Mark Rowland and
Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Guy Lorberbom, Chris J. Maddison, Nicolas Heess and
Tamir Hazan, Daniel Tarlow

Keywords Paper

Ba-Hien Tran, Simone Rossi, Dimitrios Milios and
Pietro Michiardi, Edwin Bonilla, Maurizio Filippone

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper