Scalable Bayesian Inverse Reinforcement Learning

Abstract: Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the inverse reinforcement learning problem. Unfortunately, current methods generally do not scale well beyond the small tabular setting because they need an inner-loop MDP solver, and even the non-Bayesian methods that do scale often require extensive interaction with the environment to perform well, making them inappropriate for high-stakes or costly applications such as healthcare. In this paper we introduce our method, Approximate Variational Reward Imitation Learning (AVRIL), which addresses both of these issues by taking a variational approach to the latent reward: it jointly learns, in a completely offline manner, an approximate posterior distribution over the reward that scales to arbitrarily complicated state spaces, alongside an appropriate policy. Applying our method to real medical data alongside classic control simulations, we demonstrate Bayesian reward inference in environments beyond the scope of current methods, as well as task performance competitive with focused offline imitation learning algorithms.
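The abstract describes learning an approximate posterior over the reward jointly with a policy, offline, through a variational objective. The pure-Python sketch below shows one way an objective of that kind could be assembled in a small tabular case. It is an illustration under our own assumptions, not the AVRIL objective as stated on this page: the Boltzmann (softmax) policy over a Q-table, the diagonal Gaussian q(R) with a standard normal prior, the TD-style term that ties Q to the reward, and the names `variational_reward_objective`, `gamma` and `lam` are all hypothetical choices.

```python
import math

def log_softmax(row):
    # Numerically stable log-softmax over one row of Q-values
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def gaussian_log_pdf(x, mu, log_std):
    # log N(x; mu, sigma^2) with sigma = exp(log_std)
    var = math.exp(2 * log_std)
    return -0.5 * (math.log(2 * math.pi) + 2 * log_std + (x - mu) ** 2 / var)

def kl_to_standard_normal(mu, log_std):
    # KL( N(mu, sigma^2) || N(0, 1) ) for one scalar reward entry
    var = math.exp(2 * log_std)
    return 0.5 * (var + mu ** 2 - 1.0 - 2 * log_std)

def variational_reward_objective(q, r_mu, r_log_std, transitions,
                                 gamma=0.99, lam=1.0):
    """Hypothetical ELBO-style objective (to be maximised), tabular case.

    q, r_mu, r_log_std: nested lists indexed [state][action]
    transitions: list of (s, a, s_next, a_next) from offline demonstrations
    (terminal transitions are not treated specially in this sketch)
    """
    log_lik, consistency = 0.0, 0.0
    for s, a, s_next, a_next in transitions:
        # (a) demonstrated action under a Boltzmann policy pi(a|s) ∝ exp Q(s, a)
        log_lik += log_softmax(q[s])[a]
        # (b) reward implied by Q via a TD difference, scored under q(R(s, a))
        td_reward = q[s][a] - gamma * q[s_next][a_next]
        consistency += gaussian_log_pdf(td_reward, r_mu[s][a], r_log_std[s][a])
    # (c) KL from q(R) to the N(0, 1) prior, summed over every (s, a) entry
    kl = sum(kl_to_standard_normal(m, ls)
             for mu_row, ls_row in zip(r_mu, r_log_std)
             for m, ls in zip(mu_row, ls_row))
    return log_lik + lam * consistency - kl

# Usage: 3 states, 2 actions, all-zero tables and two demo transitions
n_states, n_actions = 3, 2
zeros = [[0.0] * n_actions for _ in range(n_states)]
demos = [(0, 1, 1, 0), (1, 0, 0, 1)]
value = variational_reward_objective(zeros, zeros, zeros, demos)
# value equals -2*log(2) - log(2*pi), about -3.224
```

In practice the tables would be replaced by neural networks and the objective maximised with automatic differentiation, which is what would let such an approach scale beyond the tabular setting; that choice is likewise an assumption of this sketch.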

13/04/2021

Alex Chan, Mihaela van der Schaar

Similar Papers

Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders

Andrew Bennett, Nathan Kallus, Lihong Li, Ali Mousavi

Learning "What-if" Explanations for Sequential Decision-Making

Ioana Bica, Dan Jarrett, Alihan Hüyük, Mihaela van der Schaar

Keywords: counterfactuals, preference learning, explaining decision-making

Iteratively-Refined Interactive 3D Medical Image Segmentation With Multi-Agent Reinforcement Learning

Xuan Liao, Wenhao Li, Qisen Xu, Xiangfeng Wang, Bo Jin, Xiaoyun Zhang, Yanfeng Wang, Ya Zhang

Keywords: medical image segmentation, interactive image segmentation, reinforcement learning

Learning to search efficiently for causally near-optimal treatments

Samuel Håkansson, Viktor Lindblom, Omer Gottesman, Fredrik Johansson

Keywords: Algorithms -> Online Learning; Reinforcement Learning and Planning -> Reinforcement Learning

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

Is Pessimism Provably Efficient for Offline RL?

Ying Jin, Zhuoran Yang, Zhaoran Wang

Keywords: Reinforcement Learning and Planning, Others

Regret minimization for causal inference on large treatment space

Akira Tanimoto, Tomoya Sakai, Takashi Takenouchi, Hisashi Kashima

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Zaynah Javed, Daniel Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca Dragan, Ken Goldberg

Keywords: Social Aspects of Machine Learning, AI Safety

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data

Lingxiao Wang, Zhuoran Yang, Zhaoran Wang

Keywords: deep learning, reinforcement learning and planning, causality

Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies

Nathan Kallus, Masatoshi Uehara

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Sebastian Curi, Felix Berkenkamp, Andreas Krause

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords: deep learning, optimization, reinforcement learning and planning

Information Theoretic Counterfactual Learning from Missing-Not-At-Random Feedback

Zifeng Wang, Xi Chen, Rui Wen, Shao-Lun Huang, Ercan E Kuruoglu, Yefeng Zheng

Off-Policy Imitation Learning from Observations

Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds

Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu

Keywords: Reinforcement Learning, Off Policy Evaluation, Non-asymptotic Confidence Intervals

FAR: A General Framework for Attributional Robustness

Adam Ivankay, Ivan Girardi, Chiara Marchiori, Pascal Frossard

Keywords: robustness, attribution robustness, adversarial attacks, explainability, attribution maps

A Class of Algorithms for General Instrumental Variable Models

Niki Kilbertus, Matt Kusner, Ricardo Silva

Identifying through Flows for Recovering Latent Representations

Shen Li, Bryan Hooi, Gim Hee Lee

Keywords: Representation learning, identifiable generative models, nonlinear-ICA

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

Keywords: reinforcement learning and planning, robustness, representation learning

Disentangling Human Error from Ground Truth in Segmentation of Medical Images

Le Zhang, Ryu Tanno, Moucheng Xu, Chen Jin, Joseph Jacob, Olga Ciccarelli, Frederik Barkhof, Daniel Alexander

Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

Sungryull Sohn, Sungtae Lee, Jongwook Choi, Harm van Seijen, Mehdi Fatemi, Honglak Lee

Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J Smola

Yu Wang, Jingyang Lin, Jingjing Zou, Yingwei Pan, Ting Yao, Tao Mei

Ehsan Adeli, Qingyu Zhao, Adolf Pfefferbaum, Edith V. Sullivan, Li Fei-Fei, Juan Carlos Niebles, Kilian M. Pohl

Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Celi, Emma Brunskill, Finale Doshi-Velez

Sandamal Weerasinghe, Tamas Abraham, Tansu Alpcan, Sarah M. Erfani, Christopher Leckie, Benjamin I. P. Rubinstein