18/07/2021

Off-Belief Learning

Hengyuan Hu, Adam Lerer, Brandon Cui, Luis Pineda, Noam Brown, Jakob Foerster

Keywords: Reinforcement Learning and Planning

Abstract: The standard problem setting in Dec-POMDPs is self-play, where the goal is to find a set of policies that play optimally together. Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions, and thus fail when paired with humans or independently trained agents at test time. To address this, we present off-belief learning (OBL). At each timestep, OBL agents follow a policy $\pi_1$ that is optimized assuming past actions were taken by a given, fixed policy ($\pi_0$), but assuming that future actions will be taken by $\pi_1$. When $\pi_0$ is uniform random, OBL converges to an optimal policy that does not rely on inferences based on other agents' behavior (an optimal grounded policy). OBL can be iterated in a hierarchy, where the optimal policy from one level becomes the input to the next, thereby introducing multi-level cognitive reasoning in a controlled manner. Unlike existing approaches, which may converge to any equilibrium policy, OBL converges to a unique policy, making it suitable for zero-shot coordination (ZSC). OBL can be scaled to high-dimensional settings with a fictitious transition mechanism and shows strong performance in both a toy setting and the benchmark human-AI & ZSC problem, Hanabi.
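As a rough illustration (the notation below is a sketch and is not taken verbatim from the paper), the value that OBL optimizes can be written by letting $B_{\pi_0}(\tau^i_t)$ denote the belief over full trajectories $\tau_t$ consistent with player $i$'s action-observation history $\tau^i_t$, under the assumption that all past actions were taken by the fixed policy $\pi_0$; future actions, from $t$ onward, are taken by $\pi_1$ itself:

$$V_{\pi_0 \to \pi_1}(\tau^i_t) \;=\; \mathbb{E}_{\tau_t \sim B_{\pi_0}(\tau^i_t)}\, \mathbb{E}_{\pi_1}\!\Big[\sum_{t' \ge t} r_{t'} \,\Big|\, \tau_t\Big].$$

Under this sketch, when $\pi_0$ is uniform random the belief conveys no information about other agents' private observations beyond what is grounded in the environment dynamics, which is why the resulting $\pi_1$ is an optimal grounded policy; reusing $\pi_1$ as the next level's $\pi_0$ gives the multi-level hierarchy described above.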

[Embedded video: the talk and the respective paper were published at the ICML 2021 virtual conference.]
