Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning

06/12/2021

Alberto Maria Metelli, Alessio Russo, Marcello Restelli

Keywords: bandits

Abstract: Importance Sampling (IS) is a widely used building block for a large variety of off-policy estimation and learning algorithms. However, empirical and theoretical studies have progressively shown that vanilla IS leads to poor estimations whenever the behavioral and target policies are too dissimilar. In this paper, we analyze the theoretical properties of the IS estimator by deriving a novel anticoncentration bound that formalizes the intuition behind its undesired behavior. Then, we propose a new class of IS transformations, based on the notion of power mean. To the best of our knowledge, the resulting estimator is the first to achieve, under certain conditions, two key properties: (i) it displays a subgaussian concentration rate; (ii) it preserves the differentiability in the target distribution. Finally, we provide numerical simulations on both synthetic examples and contextual bandits, in comparison with off-policy evaluation and learning baselines.
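To make the idea of a power-mean transformation of importance weights concrete, here is a minimal sketch. The abstract does not state the formula, so the specific form used here is an assumption: the corrected weight is taken to be the weighted power mean, with exponent `s`, between the vanilla weight `w` and 1, with mixing parameter `lam`. The function names, the two-action bandit, and all numbers are hypothetical and chosen only for illustration.

```python
import random


def power_mean_weight(w, lam, s=-1.0):
    """Weighted power mean (exponent s != 0) between a weight w > 0 and 1.

    lam = 0 returns w unchanged (vanilla IS); lam = 1 returns 1.
    This specific form is an assumption of the sketch.
    """
    return ((1.0 - lam) * w ** s + lam) ** (1.0 / s)


def is_estimate(rewards, weights, lam=0.0, s=-1.0):
    """Sample mean of (corrected weight * reward) over logged samples."""
    corrected = [power_mean_weight(w, lam, s) for w in weights]
    return sum(c * r for c, r in zip(corrected, rewards)) / len(rewards)


def simulate(n, lam, seed=0):
    """Toy two-action bandit (hypothetical numbers) with dissimilar policies."""
    rng = random.Random(seed)
    behavior = [0.9, 0.1]   # logging policy: mostly action 0
    target = [0.1, 0.9]     # policy to evaluate: mostly action 1
    reward = [0.0, 1.0]     # deterministic reward per action
    actions = [0 if rng.random() < behavior[0] else 1 for _ in range(n)]
    weights = [target[a] / behavior[a] for a in actions]
    rewards = [reward[a] for a in actions]
    return is_estimate(rewards, weights, lam=lam)


if __name__ == "__main__":
    # The true value under the target policy is 0.9 in this toy setup.
    for lam in (0.0, 0.1, 0.5):
        print(lam, simulate(n=50, lam=lam))
```

With `s = -1` the corrected weight reduces to `w / ((1 - lam) + lam * w)`, which is bounded by `1 / lam` and is a smooth function of `w`. Hard truncation of the form `min(w, M)` is also bounded but is not differentiable at `w = M`, which is why a smooth correction of this kind is compatible with gradient-based off-policy learning.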

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD

Rémi Bardenet, Subhroshekhar Ghosh, Meixia Lin

Keywords: optimization, machine learning

14:51

18/07/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

Keywords: Theory, RL, Decisions and Control Theory

4:23

06/12/2020

High-recall causal discovery for autocorrelated time series with latent confounders

Andreas Gerhardus, Jakob Runge

3:22

06/12/2021

Loss function based second-order Jensen inequality and its application to particle variational inference

Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

Keywords: optimization, generative model

14:09

18/07/2021

Proximal Causal Learning with Kernels: Two-Stage Estimation and Moment Restriction

Afsaneh Mastouri, Yuchen Zhu, Limor Gultchin, Anna Korba, Ricardo Silva, Matt J. Kusner, Arthur Gretton, Krikamol Muandet

Keywords: Algorithms, Kernel Methods

5:10

18/07/2021

A Distribution-dependent Analysis of Meta Learning

Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvari

Keywords: Theory, Statistical Learning Theory

5:06

06/12/2021

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Andrea Zanette, Martin J Wainwright, Emma Brunskill

Keywords: reinforcement learning and planning

14:28

06/12/2021

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords: deep learning, optimization, reinforcement learning and planning

12:35

06/12/2021

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords: theory, optimization, reinforcement learning and planning, active learning

11:42

06/12/2021

Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective

Zhengzhuo Xu, Zenghao Chai, Chun Yuan

Keywords: theory, machine learning

4:23

06/12/2021

Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks

Rong Zhu, Mattia Rigotti

Keywords: theory, deep learning, reinforcement learning and planning, bandits

8:45

06/12/2021

Control Variates for Slate Off-Policy Evaluation

Nikos Vlassis, Ashok Chandrashekar, Fernando Amat, Nathan Kallus

Keywords: optimization, bandits

12:25

12/07/2020

Implicit Class-Conditioned Domain Alignment for Unsupervised Domain Adaptation

Xiang Jiang, Qicheng Lao, Stan Matwin, Mohammad Havaei

Keywords: Deep Learning - Algorithms

14:47

26/08/2020

Variational Autoencoders and Nonlinear ICA: A Unifying Framework

Ilyes Khemakhem, Diederik Kingma, Ricardo Monti, Aapo Hyvarinen

14:20

06/12/2021

Learning with Labeling Induced Abstentions

Kareem Amin, Giulia DeSalvo, Afshin Rostamizadeh

Keywords: machine learning, active learning

11:22

06/12/2020

The Advantage of Conditional Meta-Learning for Biased Regularization and Fine Tuning

Giulia Denevi, Massimiliano Pontil, Carlo Ciliberto

3:24

18/07/2021

DriftSurf: Stable-State / Reactive-State Learning under Concept Drift

Ashraf Tahmasbi, Ellango Jothimurugesan, Srikanta Tirthapura, Phil Gibbons

Keywords: Algorithms, Online Learning Algorithms

5:07

06/12/2020

Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift

Remi Tachet des Combes, Han Zhao, Yu-Xiang Wang, Geoffrey Gordon

3:19

04/08/2021

Benign Overfitting of Constant-Stepsize SGD for Linear Regression

Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham Kakade

18:27

06/12/2020

Goal-directed Generation of Discrete Structures with Conditional Generative Models

Amina Mollaysa, Brooks Paige, Alexandros Kalousis

3:10

06/12/2021

Overparameterization Improves Robustness to Covariate Shift in High Dimensions

Nilesh Tripuraneni, Ben Adlam, Jeffrey Pennington

Keywords: theory, deep learning, machine learning, robustness

15:11

18/07/2021

EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Gu

Keywords: Reinforcement Learning and Planning

5:54

13/04/2021

Comparing the value of labeled and unlabeled data in method-of-moments latent variable estimation

Mayee Chen, Benjamin Cohen-Wang, Stephen Mussmann, Frederic Sala, Christopher Re

3:04

19/08/2021

Partial Multi-Label Optimal Margin Distribution Machine

Nan Cao, Teng Zhang, Hai Jin

Keywords: Machine Learning, Classification, Multi-instance; Multi-label; Multi-view learning, Weakly Supervised Learning

11:43

06/12/2021

Learning to Select Exogenous Events for Marked Temporal Point Process

Ping Zhang, Rishabh Iyer, Ashish Tendulkar, Gaurav Aggarwal, Abir De

12:27

18/07/2021

Active Learning of Continuous-time Bayesian Networks through Interventions

Dominik Linzner, Heinz Koeppl

Keywords: Probabilistic Methods, Graphical Models

5:07

03/05/2021

Learning Value Functions in Deep Policy Gradients using Residual Variance

Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

4:49

06/12/2021

Bayesian Adaptation for Covariate Shift

Aurick Zhou, Sergey Levine

Keywords: deep learning, machine learning, robustness, vision, domain adaptation

8:21

06/12/2021

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell

Keywords: theory, reinforcement learning and planning, bandits

12:21

06/12/2021

Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems

Suhas Kowshik, Dheeraj Nagaraj, Prateek Jain, Praneeth Netrapalli

Keywords: theory

14:43

06/12/2021

ReLU Regression with Massart Noise

Ilias Diakonikolas, Jong Ho Park, Christos Tzamos

11:59

06/12/2020

The Value Equivalence Principle for Model-Based Reinforcement Learning

Christopher Grimm, Andre Barreto, Satinder Singh, David Silver

3:19

18/07/2021

Generalised Lipschitz Regularisation Equals Distributional Robustness

Zac Cranko, Zhan Shi, Xinhua Zhang, Richard Nock, Simon Kornblith

Keywords: Algorithms, Kernel Methods

5:18

07/06/2020

Influence Maximization Using Influence and Susceptibility Embeddings

George Panagopoulos, Fragkiskos D. Malliaros, Michalis Vazirgiannis

Keywords: cascades, influences, learning, networks, nodes, political, relationships, representation learning, representations, spread, terms, traditional, viral marketing

9:57

06/12/2021

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin, Dmitry Vetrov

Keywords: deep learning, optimization

14:26

06/12/2021

Recovering Latent Causal Factor for Generalization to Distributional Shifts

Xinwei Sun, Botong Wu, Xiangyu Zheng, Chang Liu, Wei Chen, Tao Qin, Tie-Yan Liu

Keywords: domain adaptation

13:35

03/08/2020

Ordering Variables for Weighted Model Integration

Vincent Derkinderen, Evert Heylen, Pedro Zuidberg Dos Martires, Samuel Kolb, Luc Raedt

7:55

06/12/2020

Minimax Estimation of Conditional Moment Models

Nishanth Dikkala, Greg Lewis, Lester Mackey, Vasilis Syrgkanis

3:04

06/12/2020

Off-Policy Imitation Learning from Observations

Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou

3:24

26/04/2020

Intensity-Free Learning of Temporal Point Processes

Oleksandr Shchur, Marin Biloš, Stephan Günnemann

Keywords: Temporal point process, neural density estimation

4:32