Optimal Off-Policy Evaluation from Multiple Logging Policies

18/07/2021

Nathan Kallus, Yuta Saito, Masatoshi Uehara

Keywords: Probabilistic Methods, Causal Inference

Abstract: We study off-policy evaluation (OPE) from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling. Previous work noted that in this setting the ordering of the variances of different importance sampling estimators is instance-dependent, which raises a dilemma as to which importance sampling weights to use. In this paper, we resolve this dilemma by finding the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one. In particular, we establish the efficiency bound under stratified sampling and propose an estimator achieving this bound when given consistent $q$-estimates. To guard against misspecification of $q$-functions, we also provide a way to choose the control variate in a hypothesis class to minimize variance. Extensive experiments demonstrate the benefits of our methods in efficiently leveraging the stratified sampling of off-policy data from multiple loggers.
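To make the setting in the abstract concrete, here is a minimal sketch of three standard estimators on a toy context-free bandit logged by two policies with fixed sample sizes. It is an illustration under stated assumptions, not the authors' implementation: the function names (`simulate_logs`, `naive_is`, `balanced_is`, `mixture_dr`), the toy policies, and the Bernoulli reward model are all hypothetical. The third function is a generic doubly-robust-style estimator that uses a $q$-estimate as a control variate with mixture weights; whether it matches the paper's efficient estimator should be checked against the paper itself.

```python
import numpy as np


def simulate_logs(logging_policies, sizes, mean_rewards, rng):
    """Stratified sampling: logger k contributes exactly sizes[k] rounds."""
    actions, rewards, logger_ids = [], [], []
    for k, (pi_k, n_k) in enumerate(zip(logging_policies, sizes)):
        a = rng.choice(len(pi_k), size=n_k, p=pi_k)
        r = rng.binomial(1, mean_rewards[a]).astype(float)  # Bernoulli rewards (assumption)
        actions.append(a)
        rewards.append(r)
        logger_ids.append(np.full(n_k, k))
    return np.concatenate(actions), np.concatenate(rewards), np.concatenate(logger_ids)


def mixture_policy(sizes, logging_policies):
    """Size-weighted average of the logging policies."""
    return (sizes / sizes.sum()) @ logging_policies


def naive_is(actions, rewards, logger_ids, logging_policies, target):
    """Importance sampling with the weights of the logger that produced each round."""
    w = target[actions] / logging_policies[logger_ids, actions]
    return float(np.mean(w * rewards))


def balanced_is(actions, rewards, sizes, logging_policies, target):
    """Importance sampling with weights computed against the mixture policy."""
    mix = mixture_policy(sizes, logging_policies)
    w = target[actions] / mix[actions]
    return float(np.mean(w * rewards))


def mixture_dr(actions, rewards, sizes, logging_policies, target, q_hat):
    """Mixture-weighted estimator with q_hat as a control variate (doubly-robust style)."""
    mix = mixture_policy(sizes, logging_policies)
    w = target[actions] / mix[actions]
    return float(np.mean(w * (rewards - q_hat[actions])) + target @ q_hat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logging_policies = np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])  # toy loggers
    sizes = np.array([5000, 5000])                                   # fixed per-logger sizes
    mean_rewards = np.array([0.2, 0.5, 0.8])
    target = np.array([0.1, 0.2, 0.7])

    a, r, k = simulate_logs(logging_policies, sizes, mean_rewards, rng)
    print("true value :", float(target @ mean_rewards))
    print("naive IS   :", naive_is(a, r, k, logging_policies, target))
    print("balanced IS:", balanced_is(a, r, sizes, logging_policies, target))
    # Here q_hat is set to the true mean rewards purely for illustration.
    print("mixture DR :", mixture_dr(a, r, sizes, logging_policies, target, mean_rewards))
```

All three estimators are unbiased in this toy setup. They differ only in variance, and which of the two plain importance sampling variants has lower variance depends on the instance, which is the dilemma the abstract describes.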

The talk and the corresponding paper were published at the ICML 2021 virtual conference.

Similar Papers

26/08/2020

Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning

Ming Yin, Yu-Xiang Wang

14:17

06/12/2021

Control Variates for Slate Off-Policy Evaluation

Nikos Vlassis, Ashok Chandrashekar, Fernando Amat, Nathan Kallus

Keywords: optimization, bandits

12:25

13/04/2021

Near-optimal provable uniform convergence in offline policy evaluation for reinforcement learning

Ming Yin, Yu Bai, Yu-Xiang Wang

3:09

02/02/2021

Learning from eXtreme Bandit Feedback

Romain Lopez, Inderjit S. Dhillon, Michael I. Jordan

19:29

06/12/2021

The Adaptive Doubly Robust Estimator and a Paradox Concerning Logging Policy

Masahiro Kato, Kenichiro McAlinn, Shota Yasui

Keywords: machine learning, causality

14:41

18/07/2021

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvari, Mengdi Wang

Keywords: Theory, RL, Decisions and Control Theory

5:18

18/07/2021

OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

Jongmin Lee, Wonseok Jeon, Byung-Jun Lee, Joelle Pineau, Kee-Eung Kim

Keywords: Reinforcement Learning and Planning

5:15

06/12/2020

Fair regression with Wasserstein barycenters

Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Luca Oneto, Massimiliano Pontil

3:12

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

14:58

06/12/2021

Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration

Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang

Keywords: reinforcement learning and planning

13:40

26/08/2020

A Theoretical Case Study of Structured Variational Inference for Community Detection

Mingzhang Yin, Y. X. Rachel Wang, Purnamrita Sarkar

10:54

06/12/2020

On ranking via sorting by estimated expected utility

Clement Calauzenes, Nicolas Usunier

Keywords: Optimization -> Convex Optimization, Optimization -> Stochastic Optimization

3:23

06/12/2021

Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks

Rong Zhu, Mattia Rigotti

Keywords: theory, deep learning, reinforcement learning and planning, bandits

8:45

26/08/2020

Balanced Off-Policy Evaluation in General Action Spaces

Arjun Sondhi, David Arbour, Drew Dimmery

12:36

26/08/2020

Why Non-myopic Bayesian Optimization is Promising and How Far Should We Look-ahead? A Study via Rollout

Xubo Yue, Raed AL Kontar

13:38

06/12/2020

Adaptive Sampling for Stochastic Risk-Averse Learning

Sebastian Curi, Kfir Y. Levy, Stefanie Jegelka, Andreas Krause

3:13

04/08/2021

Benign Overfitting of Constant-Stepsize SGD for Linear Regression

Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham Kakade

18:27

02/02/2021

Online Optimal Control with Affine Constraints

Yingying Li, Subhro Das, Na Li

19:35

26/08/2020

A Robust Univariate Mean Estimator is All You Need

Adarsh Prasad, Sivaraman Balakrishnan, Pradeep Ravikumar

13:59

18/07/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords: Theory, RL, Decisions and Control Theory

4:23

06/12/2020

Leveraging Predictions in Smoothed Online Convex Optimization via Gradient-based Algorithms

Yingying Li, Na Li

Keywords: Deep Learning -> Generative Models, Deep Learning -> Attention Models

3:19

06/12/2021

The balancing principle for parameter choice in distance-regularized domain adaptation

Werner Zellinger, Natalia Shepeleva, Marius-Constantin Dinu, Hamid Eghbal-zadeh, Hoan Duc Nguyen, Bernhard Nessler, Sergei Pereverzyev, Bernhard A. Moser

Keywords: domain adaptation

12:47

06/12/2021

Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs

Thomas Spooner, Nelson Vadori, Sumitra Ganesh

Keywords: bandits

14:40

12/07/2020

Implicit Class-Conditioned Domain Alignment for Unsupervised Domain Adaptation

Xiang Jiang, Qicheng Lao, Stan Matwin, Mohammad Havaei

Keywords: Deep Learning - Algorithms

14:47

13/04/2021

On multilevel Monte Carlo unbiased gradient estimation for deep latent variable models

Yuyang Shi, Rob Cornish

3:06

06/12/2020

The Advantage of Conditional Meta-Learning for Biased Regularization and Fine Tuning

Giulia Denevi, Massimiliano Pontil, Carlo Ciliberto

3:24

18/07/2021

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

Sajad Khodadadian, Zaiwei Chen, Siva Maguluri

Keywords: Reinforcement Learning and Planning

5:05

18/07/2021

Offline Reinforcement Learning with Fisher Divergence Critic Regularization

Ilya Kostrikov, Rob Fergus, Jonathan Tompson, Ofir Nachum

Keywords: Reinforcement Learning and Planning, Deep RL

4:49

06/12/2021

Variational Bayesian Optimistic Sampling

Brendan O'Donoghue, Tor Lattimore

Keywords: optimization, reinforcement learning and planning, generative model, bandits, online learning

15:13

02/02/2021

Variance Penalized On-Policy and Off-Policy Actor-Critic

Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

17:58

26/04/2020

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords: Sample-efficient reinforcement learning, off-policy learning

5:43

06/12/2021

Nonuniform Negative Sampling and Log Odds Correction with Rare Events Data

HaiYing Wang, Aonan Zhang, Chong Wang

14:58

06/12/2021

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords: deep learning, optimization, reinforcement learning and planning

12:35

06/12/2020

Fair regression via plug-in estimator and recalibration with statistical guarantees

Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Luca Oneto, Massimiliano Pontil

3:16

06/12/2021

Auditing Black-Box Prediction Models for Data Minimization Compliance

Bashir Rastegarpanah, Krishna Gummadi, Mark Crovella

Keywords: reinforcement learning and planning, bandits, privacy

14:40

06/12/2021

Neural Algorithmic Reasoners are Implicit Planners

Andreea-Ioana Deac, Petar Veličković, Ognjen Milinkovic, Pierre-Luc Bacon, Jian Tang, Mladen Nikolic

Keywords: deep learning, reinforcement learning and planning, self-supervised learning, generative model, graph learning

13:10

22/09/2020

Doubly robust estimator for ranking metrics with post-click conversions

Yuta Saito

Keywords: inverse propensity score, post-click conversions, ranking metrics, selection bias, doubly robust

3:19

26/08/2020

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

Kenji Kawaguchi, Haihao Lu

14:10

18/07/2021

Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization

Wes Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux

Keywords: Reinforcement Learning and Planning

5:13

25/07/2020

Joint item recommendation and attribute inference: An adaptive graph convolutional network approach

Le Wu, Yonghui Yang, Kun Zhang, Richang Hong, Yanjie Fu, Meng Wang

Keywords: graph convolutional networks, collaborative filtering, attribute inference

14:14