Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

06/12/2021

Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

Siyuan Zhang, Nan Jiang

Keywords: reinforcement learning and planning

Abstract Paper Similar Papers

Abstract: How to select between policies and value functions produced by different training algorithms in offline reinforcement learning (RL)---which is crucial for hyperparameter tuning---is an important open question. Existing approaches based on off-policy evaluation (OPE) often require additional function approximation and hence hyperparameters, creating a chicken-and-egg situation. In this paper, we design hyperparameter-free algorithms for policy selection based on BVFT [XJ21], a recent theoretical advance in value-function selection, and demonstrate their effectiveness in discrete-action benchmarks such as Atari. To address performance degradation due to poor critics in continuous-action domains, we further combine BVFT with OPE to get the best of both worlds, and obtain a hyperparameter-tuning method for $Q$-function based OPE with theoretical guarantees as a side product.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

18/07/2021

A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation

Scott Fujimoto, David Meger, Doina Precup

Keywords Paper

Reinforcement Learning and Planning, Deep RL

1

0

0

0

3:50

06/12/2020

On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

Kaiqing Zhang, Bin Hu, Tamer Basar

Keywords Paper

0

0

0

0

3:22

02/02/2021

Bayesian Distributional Policy Gradients

Luchen Li, A. Aldo Faisal

Keywords Paper

1

0

0

0

18:06

06/12/2020

Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Tabish Rashid, Gregory Farquhar, Bei Peng, Shimon Whiteson

Keywords Paper

0

0

0

0

2:40

18/07/2021

Monotonic Robust Policy Optimization with Model Discrepancy

yuankun jiang, Chenglin Li, Wenrui Dai and
Junni Zou, Hongkai Xiong

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:17

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords Paper

Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

0

0

0

0

5:03

26/04/2020

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson and
Christian Tjandraatmadja, Craig Boutilier

Keywords Paper

Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

0

0

0

0

5:36

18/07/2021

Emphatic Algorithms for Deep Reinforcement Learning

Ray Jiang, Tom Zahavy, Zhongwen Xu and
Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt

Keywords Paper

Reinforcement Learning and Planning, Deep RL

1

0

0

0

5:21

03/05/2021

Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds

Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu

Keywords Paper

Reinforcement Learnings, Off Policy Evaluation, Non-asymptotic Confidence Intervals

0

0

0

0

4:26

18/07/2021

Representation Matters: Offline Pretraining for Sequential Decision Making

Mengjiao Yang, Ofir Nachum

Keywords Paper

Reinforcement Learning and Planning

1

0

0

0

5:06

03/05/2021

Parameter-Based Value Functions

Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber

Keywords Paper

Off-Policy Reinforcement Learning, Reinforcement Learning

0

0

0

0

2:45

06/12/2021

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords Paper

theory, optimization, reinforcement learning and planning, active learning

0

0

0

0

11:42

18/07/2021

FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning

Tianhao Zhang, 岳珩李, Chen Wang and
Guangming Xie, Zongqing Lu

Keywords Paper

Reinforcement Learning and Planning, Multi-Agent RL

0

0

0

0

3:53

06/12/2021

Conservative Offline Distributional Reinforcement Learning

Yecheng Ma, Dinesh Jayaraman, Osbert Bastani

Keywords Paper

reinforcement learning and planning

1

0

0

0

13:54

06/12/2021

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Andrea Zanette, Martin J Wainwright, Emma Brunskill

Keywords Paper

reinforcement learning and planning

0

0

0

0

14:28

06/12/2020

Policy Improvement via Imitation of Multiple Oracles

Ching-An Cheng, Andrey Kolobov, Alekh Agarwal

Keywords Paper

0

0

0

0

3:12

18/07/2021

Data-efficient Hindsight Off-policy Option Learning

Markus Wulfmeier, Dushyant Rao, Roland Hafner and
Thomas Lampe, Abbas Abdolmaleki, Tim Hertweck, Michael Neunert, Dhruva Tirumala Bukkapatnam, Noah Siegel, Nicolas Heess, Martin Riedmiller

Keywords Paper

Reinforcement Learning and Planning, Deep RL

0

0

0

0

5:09

02/02/2021

DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning

Mohammadhosein Hasanbeig, Natasha Yogananda Jeppu, Alessandro Abate and
Tom Melham, Daniel Kroening

Keywords Paper

0

0

0

0

15:45

02/02/2021

Value-Decomposition Multi-Agent Actor-Critics

Jianyu Su, Stephen Adams, Peter Beling

Keywords Paper

0

0

0

0

19:21

06/12/2020

RL Unplugged: A Collection of Benchmarks for Offline Reinforcement Learning

CAGLAR Gulcehre, Ziyu Wang, Alexander Novikov and
Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas

Keywords Paper

0

0

0

0

3:25

18/07/2021

Directional Bias Amplification

Angelina Wang, Olga Russakovsky

Keywords Paper

Reinforcement Learning and Planning, Exploration, Algorithms, Bandit Algorithms; Reinforcement Learning and Planning, Reinforcement Learning; Theory, Learning Theory, Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

0

0

0

0

5:54

06/12/2020

Adversarial Distributional Training for Robust Deep Learning

Yinpeng Dong, Zhijie Deng, Tianyu Pang and
Jun Zhu, Hang Su

Keywords Paper

1

0

0

1

3:22

18/07/2021

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Anuj Mahajan, Mikayel Samvelyan, Lei Mao and
Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords Paper

Reinforcement Learning and Planning, Multi-Agent RL

0

0

0

0

5:16

06/12/2020

Multi-task Batch Reinforcement Learning with Metric Learning

Jiachen Li, Quan Vuong, Shuang Liu and
Minghua Liu, Kamil Ciosek, Henrik Christensen, Hao Su

Keywords Paper

Algorithms -> Multitask and Transfer Learning; Algorithms -> Representation Learning; Data, Challenges, Implementations, and So, Applications -> Natural Language Processing

0

0

0

0

3:15

06/12/2020

Model-based Adversarial Meta-Reinforcement Learning

Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma

Keywords Paper

0

0

0

0

3:31

06/12/2021

Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies

Ron Dorfman, Idan Shenfeld, Aviv Tamar

Keywords Paper

reinforcement learning and planning

0

0

0

0

14:44

06/12/2020

Conservative Q-Learning for Offline Reinforcement Learning

Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

Keywords Paper

Algorithms -> Representation Learning; Algorithms -> Structured Prediction; Applications -> Computational Biology and Bioinform, Deep Learning -> Embedding Approaches

0

0

0

0

3:16

26/04/2020

Exploring Model-based Planning with Policy Networks

Tingwu Wang, Jimmy Ba

Keywords Paper

reinforcement learning, model-based reinforcement learning, planning

0

0

0

0

5:01

02/02/2021

GaussianPath:A Bayesian Multi-Hop Reasoning Framework for Knowledge Graph Reasoning

Guojia Wan, Bo Du

Keywords Paper

0

0

0

0

13:52

06/12/2021

Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies

Tim Seyde, Igor Gilitschenski, Wilko Schwarting and
Bartolomeo Stellato, Martin Riedmiller, Markus Wulfmeier, Daniela Rus

Keywords Paper

reinforcement learning and planning

0

0

0

0

6:48

06/12/2020

Preference-based Reinforcement Learning with Finite-Time Guarantees

Yichong Xu, Ruosong Wang, Lin Yang and
Aarti Singh, Artur Dubrawski

Keywords Paper

0

0

0

0

3:04

06/12/2021

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Lulu Zheng, Jiarui Chen, Jianhao Wang and
Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang

Keywords Paper

reinforcement learning and planning

0

0

0

0

12:25

12/07/2020

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Yaodong Yang, Jianye Hao, Guangyong Chen and
Hongyao Tang, Yingfeng Chen, Yujing Hu, Changjie Fan, Zhongyu Wei

Keywords Paper

Planning, Control, and Multiagent Learning

0

0

0

0

6:42

26/08/2020

Why Non-myopic Bayesian Optimization is Promising and How Far Should We Look-ahead? A Study via Rollout

Xubo Yue, Raed AL Kontar

Keywords Paper

0

0

0

0

13:38

06/12/2020

Value-driven Hindsight Modelling

Arthur Guez, Fabio Viola, Theophane Weber and
Lars Buesing, Steven Kapturowski, Doina Precup, David Silver, Nicolas Heess

Keywords Paper

1

0

0

0

3:20

18/07/2021

Convex Regularization in Monte-Carlo Tree Search

Tuan Q Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

4:52

06/12/2020

Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh

Keywords Paper

0

0

0

0

3:22

18/07/2021

Off-Belief Learning

Hengyuan Hu, Adam Lerer, Brandon Cui and
Luis Pineda, Noam Brown, Jakob Foerster

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:10

02/02/2021

Addressing Action Oscillations through Learning Policy Inertia

Chen Chen, Hongyao Tang, Jianye Hao and
Wulong Liu, Zhaopeng Meng

Keywords Paper

0

0

0

0

14:57

02/02/2021

The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

Will Dabney, André Barreto, Mark Rowland and
Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

Keywords Paper

0

0

0

0

20:06