Active Offline Policy Selection

06/12/2021

Ksenia Konyushkova, Yutian Chen, Thomas Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas

Keywords: optimization, reinforcement learning and planning, active learning

Abstract: This paper addresses the problem of policy selection in domains with abundant logged data but a restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and recommendation domains, among others. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of policies using only logged data. However, there is still a large gap between evaluation by OPE and full online evaluation in the real environment, and large amounts of online interaction are often not possible in practice. To overcome this problem, we introduce active offline policy selection, a novel sequential decision approach that combines logged data with online interaction to identify the best policy. This approach uses OPE estimates to warm start the online evaluation. Then, to use the limited environment interactions wisely, we decide which policy to evaluate next using a Bayesian optimization method with a kernel function that represents policy similarity. We use multiple benchmarks with a large number of candidate policies to show that the proposed approach improves upon state-of-the-art OPE estimates and pure online policy evaluation.
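The loop described in the abstract (warm start from OPE, then choose the next policy to evaluate with Bayesian optimization over a policy-similarity kernel) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a Gaussian process over policy values whose prior mean is the OPE estimate, an upper-confidence-bound acquisition rule, and a synthetic RBF kernel over made-up one-dimensional policy features. The names `gp_posterior`, `active_policy_selection`, `evaluate`, and all numeric values are hypothetical.

```python
import numpy as np


def gp_posterior(prior_mean, kernel, obs_idx, obs_returns, noise_var):
    """Posterior mean and variance of every policy's value under a GP prior."""
    prior_var = np.diag(kernel).copy()
    if not obs_idx:
        return prior_mean.copy(), prior_var
    idx = np.asarray(obs_idx)
    y = np.asarray(obs_returns, dtype=float)
    # Covariance among observed returns; repeat visits to one policy are allowed.
    k_oo = kernel[np.ix_(idx, idx)] + noise_var * np.eye(len(idx))
    # Covariance between all candidate policies and the observations.
    k_ao = kernel[:, idx]
    mean = prior_mean + k_ao @ np.linalg.solve(k_oo, y - prior_mean[idx])
    reduction = np.einsum("ij,ji->i", k_ao, np.linalg.solve(k_oo, k_ao.T))
    return mean, np.maximum(prior_var - reduction, 0.0)


def active_policy_selection(ope_estimates, kernel, evaluate, budget, noise_var, beta=2.0):
    """Spend `budget` online episodes, then return the policy with the best posterior mean."""
    prior_mean = np.asarray(ope_estimates, dtype=float)  # OPE warm start (assumed prior mean)
    obs_idx, obs_returns = [], []
    for _ in range(budget):
        mean, var = gp_posterior(prior_mean, kernel, obs_idx, obs_returns, noise_var)
        # Upper confidence bound: an assumed acquisition rule for this sketch.
        k = int(np.argmax(mean + beta * np.sqrt(var)))
        obs_idx.append(k)
        obs_returns.append(evaluate(k))  # one noisy online episode of policy k
    mean, _ = gp_posterior(prior_mean, kernel, obs_idx, obs_returns, noise_var)
    return int(np.argmax(mean)), mean


# Synthetic setup: five candidate policies with misleading OPE estimates.
rng = np.random.default_rng(0)
true_values = np.array([0.0, 0.2, 0.4, 0.9, 0.5])    # unknown to the selector
ope_estimates = np.array([0.5, 0.3, 0.6, 0.4, 0.5])  # biased offline estimates
features = np.arange(5, dtype=float)                 # stand-in for policy behaviour
kernel = np.exp(-0.5 * (features[:, None] - features[None, :]) ** 2)
noise_std = 0.05


def evaluate(k):
    """Simulated environment: the true value of policy k plus Gaussian noise."""
    return true_values[k] + noise_std * rng.normal()


best, posterior_mean = active_policy_selection(
    ope_estimates, kernel, evaluate, budget=20, noise_var=noise_std ** 2
)
print("selected policy:", best)
```

In this toy setup the kernel lets each online episode also update the estimates of similar policies, which is the reason a similarity kernel can save interactions when there are many candidates.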

The talk and the accompanying paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

03/05/2021

Benchmarks for Deep Off-Policy Evaluation

Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Sherry Yang, Michael Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Paine

Keywords: reinforcement learning, benchmarks, off-policy evaluation

13/04/2021

Non-stationary off-policy optimization

Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords: Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

02/02/2021

Data-driven Competitive Algorithms for Online Knapsack and Set Cover

Ali Zeynali, Bo Sun, Mohammad Hajiesmaili, Adam Wierman

06/12/2020

Off-Policy Imitation Learning from Observations

Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou

06/12/2020

Leveraging Predictions in Smoothed Online Convex Optimization via Gradient-based Algorithms

Yingying Li, Na Li

Keywords: Deep Learning -> Generative Models, Deep Learning -> Attention Models

22/09/2020

Exploring clustering of bandits for online recommendation system

Liu Yang, Bo Liu, Leyu Lin, Feng Xia, Kai Chen, Qiang Yang

Keywords: online learning, cluster-of-bandit, recommendation system

03/08/2020

Semi-bandit Optimization in the Dispersed Setting

Travis Dick, Wesley Pegden, Maria-Florina Balcan

14/06/2020

Projection & Probability-Driven Black-Box Attack

Jie Li, Rongrong Ji, Hong Liu, Jianzhuang Liu, Bineng Zhong, Cheng Deng, Qi Tian

Keywords: adversarial example, black-box attack, projection matrix, compressed sensing, random walk, low-frequency perturbation

18/07/2021

Learning Online Algorithms with Distributional Advice

Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Ali Vakilian, Nikos Zarifis

Keywords: Algorithms

25/07/2020

Accelerated convergence for counterfactual learning to rank

Rolf Jagerman, Maarten Rijke

Keywords: unbiased learning, counterfactual learning, learning to rank

18/07/2021

EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Gu

Keywords: Reinforcement Learning and Planning

06/12/2020

High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization

Qing Feng, Ben Letham, Hongzi Mao, Eytan Bakshy

06/12/2021

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data

Lingxiao Wang, Zhuoran Yang, Zhaoran Wang

Keywords: deep learning, reinforcement learning and planning, causality

06/12/2020

Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games

Arun Suggala, Praneeth Netrapalli

18/07/2021

Is Pessimism Provably Efficient for Offline RL?

Ying Jin, Zhuoran Yang, Zhaoran Wang

Keywords: Reinforcement Learning and Planning, Others

26/04/2020

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson

Keywords: Meta-Learning, Bayesian Reinforcement Learning, BAMDPs, Deep Reinforcement Learning

26/08/2020

Truly Batch Model-Free Inverse Reinforcement Learning about Multiple Intentions

Giorgia Ramponi, Amarildo Likmeta, Alberto Maria Metelli, Andrea Tirinzoni, Marcello Restelli

13/04/2021

Experimental design for regret minimization in linear bandits

Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

06/12/2021

Efficient Training of Retrieval Models using Negative Cache

Erik Lindgren, Sashank Reddi, Ruiqi Guo, Sanjiv Kumar

Keywords: deep learning, machine learning

06/12/2021

PartialFed: Cross-Domain Personalized Federated Learning via Partial Initialization

Benyuan Sun, Hongxing Huo, Yi Yang, Bo Bai

Keywords: machine learning, privacy, federated learning

25/07/2020

Automated embedding size search in deep recommender systems

Haochen Liu, Xiangyu Zhao, Chong Wang, Xiaobing Liu, Jiliang Tang

Keywords: embedding, recommender system, AutoML

03/05/2021

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu

Keywords: Model-based RL, deployment-efficiency, offline RL, Reinforcement Learning

12/07/2020

Customizing ML Predictions for Online Algorithms

Keerti Anand, Rong Ge, Debmalya Panigrahi

Keywords: Optimization - General

02/02/2021

A Primal-Dual Online Algorithm for Online Matching Problem in Dynamic Environments

Yu-Hang Zhou, Peng Hu, Chen Liang, Huan Xu, Guangda Huzhang, Yinfu Feng, Qing Da, Xinshang Wang, An-Xiang Zeng

03/08/2020

Brief announcement: Deterministic lower bound for dynamic balanced graph partitioning

Maciej Pacut, Mahmoud Parham, Stefan Schmid

Keywords: online algorithms, graph partitioning, self-adjusting networks

06/12/2021

Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs

Thomas Spooner, Nelson Vadori, Sumitra Ganesh

Keywords: bandits

18/07/2021

Online Selection Problems against Constrained Adversary

Zhihao Jiang, Pinyan Lu, Zhihao Gavin Tang, Yuhao Zhang

Keywords: Algorithms

18/07/2021

Decision-Making Under Selective Labels: Optimal Finite-Domain Policies and Beyond

Dennis Wei

Keywords: Applications, Computer Vision, Deep Learning, Adversarial Networks; Deep Learning, Generative Models, Social Aspects of Machine Learning

02/02/2021

Policy Optimization as Online Learning with Mediator Feedback

Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli

03/05/2021

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

Keywords: Deep learning, Locally supervised training

06/12/2020

Critic Regularized Regression

Ziyu Wang, Alexander Novikov, Konrad Zolna, Josh Merel, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

02/02/2021

Projection-free Online Learning in Dynamic Environments

Yuanyu Wan, Bo Xue, Lijun Zhang

04/08/2021

Adaptivity in Adaptive Submodularity

Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni

02/02/2021

Online DR-Submodular Maximization: Minimizing Regret and Constraint Violation

Prasanna Raut, Omid Sadeghi, Maryam Fazel

06/12/2020

Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Yuval Emek, Ron Lavi, Rad Niazadeh, Yangguang Shi

06/12/2021

Online Adaptation to Label Distribution Shift

Ruihan Wu, Chuan Guo, Yi Su, Kilian Weinberger

Keywords: optimization, machine learning, online learning

03/05/2021

A Panda? No, It's a Sloth: Slowdown Attacks on Adaptive Multi-Exit Neural Network Inference

Sanghyun Hong, Yigitcan Kaya, Ionut-Vlad Modoranu, Tudor Dumitras

Keywords: efficient inference, adversarial examples, input-adaptive multi-exit neural networks, Slowdown attacks

12/07/2020

Min-Max Optimization without Gradients: Convergence and Applications to Black-Box Evasion and Poisoning Attacks

Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, Una-May O'Reilly

Keywords: Optimization - Non-convex

14/06/2020

Adaptive Hierarchical Down-Sampling for Point Cloud Classification

Ehsan Nezhadarya, Ehsan Taghavi, Ryan Razani, Bingbing Liu, Jun Luo

Keywords: critical points layer, pooling layer, graph neural networks, point cloud, down-sampling
