The Importance of Pessimism in Fixed-Dataset Policy Optimization

03/05/2021

Jacob Buckman, Carles Gelada, Marc G Bellemare

Keywords: reinforcement learning, offline reinforcement learning, deep learning

Abstract: We study worst-case guarantees on the expected return of fixed-dataset policy optimization algorithms. Our core contribution is a unified conceptual and mathematical framework for the study of algorithms in this regime. This analysis reveals that for naive approaches, the possibility of erroneous value overestimation leads to a difficult-to-satisfy requirement: in order to guarantee that we select a policy which is near-optimal, we may need the dataset to be informative of the value of every policy. To avoid this, algorithms can follow the pessimism principle, which states that we should choose the policy which acts optimally in the worst possible world. We show why pessimistic algorithms can achieve good performance even when the dataset is not informative of every policy, and derive families of algorithms which follow this principle. These theoretical findings are validated by experiments on a tabular gridworld, and deep learning experiments on four MinAtar environments.
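
The contrast the abstract draws between naive and pessimistic selection can be illustrated with a minimal single-state (bandit) sketch. This is not the authors' algorithm: the dataset values, the function names, and the penalty form `alpha / sqrt(n)` are all assumptions chosen for illustration. The naive rule trusts the empirical mean, so an action seen only once can win through an overestimated value; the pessimistic rule ranks actions by a lower bound that shrinks toward the mean only as the data becomes informative.

```python
import math

def naive_choice(means, counts):
    """Pick the observed action with the highest empirical mean reward."""
    seen = [a for a in range(len(means)) if counts[a] > 0]
    return max(seen, key=lambda a: means[a])

def pessimistic_choice(means, counts, alpha=1.0):
    """Pick the action with the highest lower bound on its value.

    The bound mean - alpha / sqrt(n) is an assumed, Hoeffding-style
    penalty used only for illustration; unobserved actions get -inf.
    """
    def lower_bound(a):
        if counts[a] == 0:
            return float("-inf")
        return means[a] - alpha / math.sqrt(counts[a])
    return max(range(len(means)), key=lower_bound)

# Hypothetical fixed dataset: empirical mean reward and sample count per action.
means = [0.6, 0.9, 0.5]
counts = [100, 1, 50]

naive = naive_choice(means, counts)          # action 1: high mean, one sample
careful = pessimistic_choice(means, counts)  # action 0: well-supported estimate
print(naive, careful)
```

Under these assumed numbers the naive rule selects the action observed once, while the pessimistic rule selects the action whose value is supported by 100 samples.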

The talk and the respective paper were published at the ICLR 2021 virtual conference.

Similar Papers

06/12/2021

Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization

Guoqiang Wu, Chongxuan Li, Kun Xu, Jun Zhu

Keywords: theory, machine learning

9:29

18/07/2021

Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization

Wes Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux

Keywords: Reinforcement Learning and Planning

5:13

12/07/2020

No-Regret and Incentive-Compatible Online Learning

Rupert Freeman, David Pennock, Chara Podimata, Jennifer Wortman Vaughan

Keywords: Learning Theory

13:48

13/04/2021

Experimental design for regret minimization in linear bandits

Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

3:05

18/07/2021

Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement Learning and Planning

11:21

18/07/2021

Monotonic Robust Policy Optimization with Model Discrepancy

Yuankun Jiang, Chenglin Li, Wenrui Dai, Junni Zou, Hongkai Xiong

Keywords: Reinforcement Learning and Planning

5:17

06/12/2020

Adapting to Misspecification in Contextual Bandits

Dylan Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert

3:05

04/08/2021

Adversarially Robust Low Dimensional Representations

Pranjal Awasthi, Vaggos Chatziafratis, Xue Chen, Aravindan Vijayaraghavan

20:19

18/07/2021

Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning

Tadashi Kozuno, Yunhao Tang, Mark Rowland, Remi Munos, Steven Kapturowski, Will Dabney, Michal Valko, Dave Abel

Keywords: Reinforcement Learning and Planning

4:56

02/02/2021

Data-driven Competitive Algorithms for Online Knapsack and Set Cover

Ali Zeynali, Bo Sun, Mohammad Hajiesmaili, Adam Wierman

16:26

26/08/2020

Mixed Strategies for Robust Optimization of Unknown Objectives

Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

14:13

02/02/2021

Stable Adversarial Learning under Distributional Shifts

Jiashuo Liu, Zheyan Shen, Peng Cui, Linjun Zhou, Kun Kuang, Bo Li, Yishi Lin

14:30

06/12/2021

Misspecified Gaussian Process Bandit Optimization

Ilija Bogunovic, Andreas Krause

Keywords: optimization, bandits, kernel methods

11:41

06/12/2020

Trade-offs and Guarantees of Adversarial Representation Learning for Information Obfuscation

Han Zhao, Jianfeng Chi, Yuan Tian, Geoffrey Gordon

3:17

06/12/2021

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Keywords: deep learning, reinforcement learning and planning

13:50

06/12/2020

Temporal Variability in Implicit Online Learning

Nicolò Campolongo, Francesco Orabona

3:11

06/12/2020

Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits

Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida

3:24

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords: Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

5:03

03/05/2021

Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds

Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu

Keywords: Reinforcement Learning, Off Policy Evaluation, Non-asymptotic Confidence Intervals

4:26

18/07/2021

Robust Learning-Augmented Caching: An Experimental Study

Jakub Chłędowski, Adam Polak, Bartosz Szabucki, Konrad Zolna

Keywords: Applications

4:52

18/07/2021

The Power of Adaptivity for Stochastic Submodular Cover

Rohan Ghuge, Anupam Gupta, Viswanath Nagarajan

Keywords: Optimization, Stochastic Optimization

16:47

06/12/2020

Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games

Arun Suggala, Praneeth Netrapalli

3:29

13/04/2021

Online model selection for reinforcement learning with function approximation

Jonathan Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

3:15

06/12/2020

Secretary and Online Matching Problems with Machine Learned Advice

Antonios Antoniadis, Themis Gouleakis, Pieter Kleer, Pavel Kolev

3:27

18/07/2021

A Distribution-dependent Analysis of Meta Learning

Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvari

Keywords: Theory, Statistical Learning Theory

5:06

06/12/2021

Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

Aurelien Bibaut, Nathan Kallus, Maria Dimakopoulou, Antoine Chambaz, Mark van der Laan

Keywords: theory, reinforcement learning and planning, machine learning, bandits

16:07

06/12/2021

Hyperparameter Optimization Is Deceiving Us, and How to Stop It

A. Feder Cooper, Yucheng Lu, Jessica Forde, Christopher De Sa

Keywords: optimization, machine learning

11:55

06/12/2020

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Fei Feng, Ruosong Wang, Wotao Yin, Simon Du, Lin Yang

Keywords: Reinforcement Learning and Planning -> Decision and Control, Probabilistic Methods -> Gaussian Processes

3:11

06/12/2020

Online Bayesian Persuasion

Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Nicola Gatti

3:00

03/05/2021

Learning Value Functions in Deep Policy Gradients using Residual Variance

Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

4:49

12/07/2020

Online Learning with Imperfect Hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Keywords: Online Learning, Active Learning, and Bandits

13:17

06/12/2020

Decisions, Counterfactual Explanations and Strategic Behavior

Stratis Tsirtsis, Manuel Gomez Rodriguez

3:24

06/12/2021

USCO-Solver: Solving Undetermined Stochastic Combinatorial Optimization Problems

Guangmo Tong

Keywords: optimization

15:00

06/12/2020

Preference-based Reinforcement Learning with Finite-Time Guarantees

Yichong Xu, Ruosong Wang, Lin Yang, Aarti Singh, Artur Dubrawski

3:04

04/08/2021

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

20:29

02/02/2021

Computing Quantal Stackelberg Equilibrium in Extensive-Form Games

Jakub Černý, Viliam Lisý, Branislav Bošanský, Bo An

15:01

18/07/2021

Conservative Objective Models for Effective Offline Model-Based Optimization

Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine

Keywords: Deep Learning

5:06

06/12/2021

The balancing principle for parameter choice in distance-regularized domain adaptation

Werner Zellinger, Natalia Shepeleva, Marius-Constantin Dinu, Hamid Eghbal-zadeh, Hoan Duc Nguyen, Bernhard Nessler, Sergei Pereverzyev, Bernhard A. Moser

Keywords: domain adaptation

12:47

06/12/2020

On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

Kaiqing Zhang, Bin Hu, Tamer Basar

3:22

18/07/2021

When All We Need is a Piece of the Pie: A Generic Framework for Optimizing Two-way Partial AUC

Zhiyong Yang, Qianqian Xu, Shilong Bao, Yuan He, Xiaochun Cao, Qingming Huang

Keywords: Algorithms, Supervised Learning

15:48