Control Variates for Slate Off-Policy Evaluation

06/12/2021

Nikos Vlassis, Ashok Chandrashekar, Fernando Amat, Nathan Kallus

Keywords: optimization, bandits

Abstract: We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions, often termed slates. The problem is common to recommender systems and user-interface optimization, and it is particularly challenging because of the combinatorially large action space. Swaminathan et al. (2017) proposed the pseudoinverse (PI) estimator under the assumption that the conditional mean rewards are additive in actions. Using control variates, we consider a large class of unbiased estimators that includes, as special cases, the PI estimator and (asymptotically) its self-normalized variant. By optimizing over this class, we obtain new estimators with risk improvement guarantees over both the PI and the self-normalized PI estimators. Experiments with real-world recommender data as well as synthetic data validate these improvements in practice.
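To make the abstract's terms concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes a logging policy that factorizes across the K slots of a slate. In that case, as we understand it, the PI weight of a logged slate reduces to the sum of per-slot probability ratios minus (K - 1), a quantity whose mean under the logging policy is one. That mean-one property is what lets (G - 1) serve as a control variate. The function names, the single scalar coefficient `beta`, and the synthetic data are hypothetical illustrations; the paper optimizes over a larger class of estimators.

```python
import numpy as np


def pi_weights(target_probs, logging_probs):
    """PI weight G_i of each logged slate (assumes a slot-factored logging policy).

    target_probs[i, k]: marginal probability that the target policy places the
    logged action in slot k of slate i; logging_probs[i, k]: the same quantity
    for the logging policy. Both arrays have shape (n, K).
    """
    ratios = target_probs / logging_probs       # per-slot importance ratios
    num_slots = ratios.shape[1]
    return ratios.sum(axis=1) - num_slots + 1   # G_i = sum_k ratio_ik - (K - 1)


def pi_estimate(rewards, g):
    """Plain PI estimate: sample mean of reward times PI weight."""
    return np.mean(rewards * g)


def self_normalized_pi_estimate(rewards, g):
    """Self-normalized PI: divide by the realized sum of weights instead of n."""
    return np.sum(rewards * g) / np.sum(g)


def control_variate_estimate(rewards, g, beta=None):
    """PI estimate corrected by the mean-zero control variate (g - 1).

    With a fixed beta the estimate stays unbiased, since E[g] = 1 under the
    logging policy. With beta=None, beta is set to the variance-minimizing value
    Cov(r * g, g) / Var(g) estimated from the same sample, so the estimator is
    then only asymptotically unbiased.
    """
    y = rewards * g
    if beta is None:
        cov = np.cov(y, g)                      # 2x2 sample covariance matrix
        beta = cov[0, 1] / cov[1, 1]
    return np.mean(y) - beta * (np.mean(g) - 1.0)


# Usage on hypothetical synthetic data with additive rewards.
rng = np.random.default_rng(0)
n, K, M = 200_000, 3, 4                         # slates, slots, actions per slot
logging_slot = np.full(M, 1.0 / M)              # uniform logging policy in each slot
target_slot = np.array([0.7, 0.1, 0.1, 0.1])    # target policy, same in each slot
phi = rng.uniform(0.0, 1.0, size=(K, M))        # mean reward of action a in slot k
actions = rng.integers(0, M, size=(n, K))       # slates drawn from the logging policy
rewards = phi[np.arange(K), actions].sum(axis=1) + rng.normal(0.0, 0.1, size=n)

g = pi_weights(target_slot[actions], logging_slot[actions])
true_value = (phi * target_slot).sum()          # sum_k E_target[phi_k(a)]
print("true:", true_value)
print("PI:", pi_estimate(rewards, g))
print("SN-PI:", self_normalized_pi_estimate(rewards, g))
print("CV:", control_variate_estimate(rewards, g))
```

Setting `beta=0.0` recovers the plain PI estimate. To first order in the mean weight, the self-normalized estimate behaves like choosing `beta` equal to the true value, which is one way to see why a variance-minimizing `beta` is asymptotically no worse than either.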

The talk and the accompanying paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

03/05/2021

Tilted Empirical Risk Minimization

Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith

Keywords: fairness, label noise robustness, models of learning and generalization, exponential tilting

18/07/2021

Characterizing Fairness Over the Set of Good Models Under Selective Labels

Amanda Coston, Ashesh Rambachan, Alexandra Chouldechova

Keywords: Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

18/07/2021

Active Feature Acquisition with Generative Surrogate Models

Yang Li, Junier Oliva

Keywords: Deep Learning, Generative Models, Applications, Computational Biology and Bioinformatics, Reinforcement Learning and Planning, Deep RL

06/12/2021

Design of Experiments for Stochastic Contextual Linear Bandits

Andrea Zanette, Kefan Dong, Jonathan N Lee, Emma Brunskill

Keywords: reinforcement learning and planning, bandits

06/12/2020

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow

06/12/2021

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords: theory, optimization, reinforcement learning and planning, active learning

19/08/2021

Variational Model-based Policy Optimization

Yinlam Chow, Brandon Cui, Moonkyung Ryu, Mohammad Ghavamzadeh

Keywords: Machine Learning, Reinforcement Learning

14/09/2020

Counterfactual Propagation for Semi-Supervised Individual Treatment Effect Estimation

Shonosuke Harada, Hisashi Kashima

Keywords: causal inference, treatment effect estimation, semi-supervised learning

06/12/2020

High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization

Qing Feng, Ben Letham, Hongzi Mao, Eytan Bakshy

18/07/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

Keywords: Theory, RL, Decisions and Control Theory

26/08/2020

Why Non-myopic Bayesian Optimization is Promising and How Far Should We Look-ahead? A Study via Rollout

Xubo Yue, Raed AL Kontar

06/12/2021

Risk Bounds and Calibration for a Smart Predict-then-Optimize Method

Heyuan Liu, Paul Grigas

Keywords: theory, optimization, machine learning

06/12/2021

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

Keywords: reinforcement learning and planning, robustness, representation learning

06/12/2021

Structured Dropout Variational Inference for Bayesian Neural Networks

Son Nguyen, Duong Nguyen, Khai Nguyen, Khoat Than, Hung Bui, Nhat Ho

Keywords: deep learning, generative model

12/07/2020

Exploration Through Bias: Revisiting Biased Maximum Likelihood Estimation in Stochastic Multi-Armed Bandits

Xi Liu, Ping-Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. Kumar

Keywords: Online Learning, Active Learning, and Bandits

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords: Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

26/04/2020

Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies

Xinyun Chen, Lu Wang, Yizhe Hang, Heng Ge, Hongyuan Zha

Keywords: off-policy policy evaluation, multiple importance sampling, kernel method, variance reduction

06/12/2020

Adaptive Sampling for Stochastic Risk-Averse Learning

Sebastian Curi, Kfir Y. Levy, Stefanie Jegelka, Andreas Krause

06/12/2020

Fair Multiple Decision Making Through Soft Interventions

Yaowei Hu, Yongkai Wu, Lu Zhang, Xintao Wu

Keywords: Algorithms -> Relational Learning; Applications -> Network Analysis; Deep Learning -> Attention Models; Deep Learning -> Recurr, Deep Learning -> Generative Models

12/07/2020

Convex Calibrated Surrogates for the Multi-Label F-Measure

Mingyuan Zhang, Harish Guruprasad Ramaswamy, Shivani Agarwal

Keywords: Supervised Learning

20/08/2020

Raising Expectations: Automating Expected Cost Analysis with Types

Di Wang, David M. Kahn, Jan Hoffmann

Keywords: resource-aware type system, expected execution cost, analysis of probabilistic programs

06/12/2021

Collaborative Uncertainty in Multi-Agent Trajectory Forecasting

Bohan Tang, Yiqi Zhong, Ulrich Neumann, Gang Wang, Siheng Chen, Ya Zhang

Keywords: deep learning

19/01/2020

Proving Expected Sensitivity of Probabilistic Programs with Randomized Variable-Dependent Termination Time

Peixin Wang, Hongfei Fu, Krishnendu Chatterjee, Yuxin Deng, Ming Xu

Keywords: Martingales, Expected Sensitivity, Probabilistic Programs

06/12/2021

Risk-Aware Transfer in Reinforcement Learning using Successor Features

Michael Gimelfarb, Andre Barreto, Scott Sanner, Chi-Guhn Lee

Keywords: reinforcement learning and planning, representation learning, transfer learning

06/12/2021

Asymptotically Exact Error Characterization of Offline Policy Evaluation with Misspecified Linear Models

Kohei Miyaguchi

Keywords: reinforcement learning and planning

06/12/2021

Inverse Optimal Control Adapted to the Noise Characteristics of the Human Sensorimotor System

Matthias Schultheis, Dominik Straub, Constantin Rothkopf

06/12/2021

Loss function based second-order Jensen inequality and its application to particle variational inference

Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

Keywords: optimization, generative model

18/07/2021

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvari, Mengdi Wang

Keywords: Theory, RL, Decisions and Control Theory

18/07/2021

Testing Group Fairness via Optimal Transport Projections

Nian Si, Karthyek Murthy, Jose Blanchet, Viet Anh Nguyen

Keywords: Theory, Algorithms, Sparsity and Compressed Sensing; Optimization; Theory, Regularization, Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

06/12/2021

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords: deep learning, optimization, reinforcement learning and planning

13/04/2021

Linear models are robust optimal under strategic behavior

Wei Tang, Chien-Ju Ho, Yang Liu

26/08/2020

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

Kenji Kawaguchi, Haihao Lu

12/07/2020

Optimization and Analysis of the pAp@k Metric for Recommender Systems

Gaurush Hiranandani, Warut Vijitbenjaronk, Sanmi Koyejo, Prateek Jain

Keywords: Learning Theory

18/07/2021

DORO: Distributional and Outlier Robust Optimization

Runtian Zhai, Chen Dan, Zico Kolter, Pradeep Ravikumar

Keywords: Probabilistic Methods, Robust statistics

03/05/2021

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots

Keywords: reinforcement learning, model-predictive control

06/12/2020

The Advantage of Conditional Meta-Learning for Biased Regularization and Fine Tuning

Giulia Denevi, Massimiliano Pontil, Carlo Ciliberto

12/07/2020

Uniform Convergence of Rank-weighted Learning

Liu Leqi, Justin Khim, Adarsh Prasad, Pradeep Ravikumar

Keywords: Learning Theory

06/12/2021

Automated Dynamic Mechanism Design

Hanrui Zhang, Vincent Conitzer

02/02/2021

Fairness in Forecasting and Learning Linear Dynamical Systems

Quan Zhou, Jakub Marecek, Robert N. Shorten

04/08/2021

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

Dylan Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu
