Bayesian Bellman Operators

06/12/2021

Mattie Fellows, Kristian Hartikainen, Shimon Whiteson

Keywords: reinforcement learning and planning

Abstract: We introduce a novel perspective on Bayesian reinforcement learning (RL): whereas existing approaches infer a posterior over the transition distribution or Q-function, we characterise the uncertainty in the Bellman operator. Our Bayesian Bellman operator (BBO) framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions. In this paper, we use BBO to provide a rigorous theoretical analysis of model-free Bayesian RL to better understand its relationship to established frequentist RL methodologies. We prove that Bayesian solutions are consistent with frequentist RL solutions, even when approximate inference is used, and derive conditions under which convergence properties hold. Empirically, we demonstrate that algorithms derived from the BBO framework have sophisticated deep exploration properties that enable them to solve continuous control tasks at which state-of-the-art regularised actor-critic algorithms fail catastrophically.
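
The abstract's central point is that a bootstrapped target r + gamma * v(s') depends on the value estimate itself, so the data inform the Bellman backup rather than the value function directly. The minimal Python sketch below may help make that concrete. It is not the authors' BBO algorithm: it uses a plain nonparametric bootstrap as a stand-in for a posterior, and the single state-action pair, its transition probabilities, its rewards, and all function names are hypothetical choices made for illustration.

```python
import random
import statistics

GAMMA = 0.9  # discount factor (assumed for illustration)


def sample_transition(rng):
    """Draw (reward, next_state) for one hypothetical state-action pair."""
    next_state = 1 if rng.random() < 0.7 else 0
    reward = 1.0 if next_state == 1 else 0.0
    return reward, next_state


def empirical_backup(transitions, v):
    """Monte Carlo estimate of the backup E[r + gamma * v(s')] for fixed v."""
    return statistics.fmean([r + GAMMA * v[s_next] for r, s_next in transitions])


def bootstrap_backups(transitions, v, n_resamples, rng):
    """Recompute the backup on datasets resampled with replacement."""
    n = len(transitions)
    return [
        empirical_backup([transitions[rng.randrange(n)] for _ in range(n)], v)
        for _ in range(n_resamples)
    ]


rng = random.Random(0)
data = [sample_transition(rng) for _ in range(200)]
v = [0.0, 2.0]  # fixed, arbitrary value estimate for the two next states

# Exact backup under the assumed dynamics: 0.7 * (1 + 0.9 * 2) = 1.96
true_backup = 0.7 * (1.0 + GAMMA * v[1]) + 0.3 * (0.0 + GAMMA * v[0])

backups = bootstrap_backups(data, v, n_resamples=500, rng=rng)
spread = statistics.pstdev(backups)  # variability of the estimated backup
```

Here `spread` summarises how much the estimated backup for a fixed `v` varies across resampled datasets. That variability belongs to the operator estimate, not to `v`, and it would be expected to shrink as more transitions are observed.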

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

03/05/2021

Bayesian Context Aggregation for Neural Processes

Michael Volpp, Fabian Flürenbrock, Lukas Grossberger, Christian Daniel, Gerhard Neumann

Keywords: Neural Processes, Multi-task Learning, Deep Sets, Meta Learning, Latent Variable Models, Aggregation Methods

5:04

12/07/2020

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Keywords: Reinforcement Learning - Theory

14:20

18/07/2021

Continuous-time Model-based Reinforcement Learning

Cagatay Yildiz, Markus Heinonen, Harri Lähdesmäki

Keywords: Reinforcement Learning and Planning

5:00

06/12/2021

Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning

Sheng Zhang, Zhe Zhang, Siva Theja Maguluri

Keywords: theory, reinforcement learning and planning

10:20

12/07/2020

Representations for Stable Off-Policy Reinforcement Learning

Dibya Ghosh, Marc Bellemare

Keywords: Reinforcement Learning - General

14:38

03/05/2021

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Lanqing Li, Rui Yang, Dijun Luo

Keywords: distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

6:21

06/12/2021

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

Dibya Ghosh, Jad Rahme, Aviral Kumar, Amy Zhang, Ryan Adams, Sergey Levine

Keywords: reinforcement learning and planning

15:17

06/12/2021

Online Robust Reinforcement Learning with Model Uncertainty

Yue Wang, Shaofeng Zou

Keywords: reinforcement learning and planning, robustness

14:45

06/12/2020

On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

Kaiqing Zhang, Bin Hu, Tamer Basar

3:22

26/04/2020

Robust Reinforcement Learning for Continuous Control with Model Misspecification

Daniel J. Mankowitz, Nir Levine, Rae Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Yuanyuan Shi, Jackie Kay, Todd Hester, Timothy Mann, Martin Riedmiller

Keywords: reinforcement learning, robustness

5:24

18/07/2021

Proximal Causal Learning with Kernels: Two-Stage Estimation and Moment Restriction

Afsaneh Mastouri, Yuchen Zhu, Limor Gultchin, Anna Korba, Ricardo Silva, Matt J. Kusner, Arthur Gretton, Krikamol Muandet

Keywords: Algorithms, Kernel Methods

5:10

18/07/2021

Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning

Sebastian Curi, Ilija Bogunovic, Andreas Krause

Keywords: Reinforcement Learning and Planning

4:41

06/12/2021

Stability and Generalization of Bilevel Programming in Hyperparameter Optimization

Fan Bao, Guoqiang Wu, Chongxuan LI, Jun Zhu, Bo Zhang

Keywords: optimization

8:58

06/12/2021

Bellman-consistent Pessimism for Offline Reinforcement Learning

Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

Keywords: theory, reinforcement learning and planning, robustness

17:42

06/12/2021

Fast Algorithms for $L_\infty$-constrained S-rectangular Robust MDPs

Bahram Behzadian, Marek Petrik, Chin Pang Ho

Keywords: optimization, reinforcement learning and planning

6:14

12/07/2020

Learning Near Optimal Policies with Low Inherent Bellman Error

Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

Keywords: Reinforcement Learning - Theory

14:22

18/07/2021

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel

Keywords: Reinforcement Learning and Planning, Deep RL

4:47

06/12/2021

When False Positive is Intolerant: End-to-End Optimization with Low FPR for Multipartite Ranking

Peisong Wen, Qianqian Xu, Zhiyong Yang, Yuan He, Qingming Huang

Keywords: deep learning, optimization, machine learning, vision

7:00

13/04/2021

Optimizing percentile criterion using robust MDPs

Bahram Behzadian, Reazul Hasan Russel, Marek Petrik, Chin Pang Ho

3:00

06/12/2020

CoinDICE: Off-Policy Confidence Interval Estimation

Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvari, Dale Schuurmans

3:21

06/12/2021

Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang

Keywords: reinforcement learning and planning

7:22

06/12/2020

Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning

Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Remi Munos, Matthieu Geist

3:25

26/08/2020

Finite-Time Error Bounds for Biased Stochastic Approximation with Applications to Q-Learning

Gang Wang, Georgios B. Giannakis

14:03

06/12/2021

Structured Dropout Variational Inference for Bayesian Neural Networks

Son Nguyen, Duong Nguyen, Khai Nguyen, Khoat Than, Hung Bui, Nhat Ho

Keywords: deep learning, generative model

11:28

02/02/2021

Symmetric Component Caching for Model Counting on Combinatorial Instances

Timothy van Bremen, Vincent Derkinderen, Shubham Sharma, Subhajit Roy, Kuldeep S. Meel

17:34

06/12/2021

Hyperparameter Optimization Is Deceiving Us, and How to Stop It

A. Feder Cooper, Yucheng Lu, Jessica Forde, Christopher De Sa

Keywords: optimization, machine learning

11:55

18/07/2021

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

Yaqi Duan, Chi Jin, Zhiyuan Li

Keywords: Theory, RL, Decisions and Control Theory

5:18

26/04/2020

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Keywords: Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

5:36

06/12/2021

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Keywords: deep learning, reinforcement learning and planning

13:50

13/04/2021

Latent derivative Bayesian last layer networks

Joe Watson, Jihao Andreas Lin, Pascal Klink, Joni Pajarinen, Jan Peters

3:05

18/07/2021

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords: Reinforcement Learning and Planning, Multi-Agent RL

5:16

06/12/2020

Steady State Analysis of Episodic Reinforcement Learning

Huang Bojun

3:20

06/12/2020

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang

3:16

06/12/2021

Conservative Offline Distributional Reinforcement Learning

Yecheng Ma, Dinesh Jayaraman, Osbert Bastani

Keywords: reinforcement learning and planning

13:54

18/07/2021

Dynamic Balancing for Model Selection in Bandits and RL

Ashok Cutkosky, Christoph Dann, Abhimanyu Das, Claudio Gentile, Aldo Pacchiano, Manish Purohit

Keywords: Reinforcement Learning and Planning, Bandits

5:18

06/12/2020

Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization

Sam Daulton, Max Balandat, Eytan Bakshy

3:20

06/12/2021

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Andrea Zanette, Martin J Wainwright, Emma Brunskill

Keywords: reinforcement learning and planning

14:28

26/08/2020

Mixed Strategies for Robust Optimization of Unknown Objectives

Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

14:13

19/01/2020

Proving Expected Sensitivity of Probabilistic Programs with Randomized Variable-Dependent Termination Time

Peixin Wang, Hongfei Fu, Krishnendu Chatterjee, Yuxin Deng, Ming Xu

Keywords: Martingales, Expected Sensitivity, Probabilistic Programs

21:04

13/04/2021

Adversarially robust estimate and risk analysis in linear regression

Yue Xing, Ruizhi Zhang, Guang Cheng

3:03