Off-Policy Confidence Sequences

18/07/2021

Nikos Karampatziakis, Paul Mineiro, Aaditya Ramdas

Keywords: Reinforcement Learning and Planning, Bandits

Abstract: We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting. These confidence sequences are based on recent ideas from martingale analysis and are non-asymptotic, non-parametric, and valid at arbitrary stopping times. We provide algorithms for computing these confidence sequences that strike a good balance between computational and statistical efficiency. We empirically demonstrate the tightness of our approach in terms of failure probability and width and apply it to the "gated deployment" problem of safely upgrading a production contextual bandit system.
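
To make the notion of a confidence sequence concrete, here is a minimal Python sketch on a toy two-action problem. It is not the construction from the paper: instead of the martingale-based bounds the abstract describes, it uses a deliberately crude Hoeffding bound combined with a union bound over time, which holds at every step simultaneously but is much wider. The function names, the toy logging and target policies, and the bound `w_max` on the importance weights are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy problem (not from the paper): two actions, Bernoulli rewards.
REWARD_MEAN = (0.3, 0.7)    # mean reward of each action
LOGGING_PROB = (0.5, 0.5)   # logging policy: uniform over the two actions
TARGET_PROB = (0.1, 0.9)    # target policy to be evaluated off-policy
W_MAX = max(p / q for p, q in zip(TARGET_PROB, LOGGING_PROB))
TRUE_VALUE = sum(p * m for p, m in zip(TARGET_PROB, REWARD_MEAN))


def simulate_weighted_rewards(n, seed=0):
    """Draw n importance-weighted rewards w_t * r_t under the logging policy."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        action = rng.randrange(2)  # uniform draw, matching LOGGING_PROB
        reward = 1.0 if rng.random() < REWARD_MEAN[action] else 0.0
        weight = TARGET_PROB[action] / LOGGING_PROB[action]
        out.append(weight * reward)
    return out


def hoeffding_confidence_sequence(weighted_rewards, w_max, delta=0.05):
    """Yield (t, lower, upper) after each observation.

    Assumes each weighted reward lies in [0, w_max] and the true value
    lies in [0, 1]. With probability at least 1 - delta, every interval
    contains the true value, so stopping at a data-dependent time is safe.
    """
    total = 0.0
    lower, upper = 0.0, 1.0
    for t, z in enumerate(weighted_rewards, start=1):
        total += z
        # Spend delta_t = 6 * delta / (pi^2 * t^2) at step t; these sum to delta.
        delta_t = 6.0 * delta / (math.pi ** 2 * t ** 2)
        radius = w_max * math.sqrt(math.log(2.0 / delta_t) / (2.0 * t))
        mean = total / t
        # A running intersection keeps the intervals nested without losing coverage.
        lower = max(lower, mean - radius)
        upper = min(upper, mean + radius)
        yield t, lower, upper
```

Under these assumptions, iterating over `hoeffding_confidence_sequence(simulate_weighted_rewards(5000), W_MAX)` gives an interval after every logged interaction. A gated-deployment check could stop as soon as the lower bound exceeds the value of the current production policy.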

The talk and the accompanying paper were published at the ICML 2021 virtual conference.

Similar Papers

13/04/2021

Critical parameters for scalable distributed learning with large batches and asynchronous updates

Sebastian Stich, Amirkeivan Mohtashami, Martin Jaggi

3:00

26/04/2020

GenDICE: Generalized Offline Estimation of Stationary Values

Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans

Keywords: Off-policy Policy Evaluation, Reinforcement Learning, Stationary Distribution Correction Estimation, Fenchel Dual

15:37

06/12/2021

Tactical Optimism and Pessimism for Deep Reinforcement Learning

Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael Jordan

Keywords: reinforcement learning and planning, bandits

6:30

26/08/2020

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare

15:15

09/07/2020

Noise-tolerant, Reliable Active Classification with Comparison Queries

Max Hopkins, Shachar Lovett, Daniel Kane, Gaurav Mahajan

Keywords: Active learning, Classification, Learning with algebraic or combinatorial structure, PAC learning

15:23

18/07/2021

Stochastic Sign Descent Methods: New Algorithms and Better Theory

Mher Safaryan, Peter Richtarik

Keywords: Optimization, Distributed and Parallel Optimization

5:12

04/11/2020

Serving DNNs like Clockwork: Performance Predictability from the Bottom Up

Arpan Gujarati, Reza Karimi, Safya Alzayat, Wei Hao, Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace

20:17

06/12/2021

Neural Algorithmic Reasoners are Implicit Planners

Andreea-Ioana Deac, Petar Veličković, Ognjen Milinkovic, Pierre-Luc Bacon, Jian Tang, Mladen Nikolic

Keywords: deep learning, reinforcement learning and planning, self-supervised learning, generative model, graph learning

13:10

06/12/2021

Linear Convergence in Federated Learning: Tackling Client Heterogeneity and Sparse Gradients

Aritra Mitra, Rayana Jaafar, George J. Pappas, Hamed Hassani

Keywords: optimization, federated learning

14:43

06/12/2021

A Minimalist Approach to Offline Reinforcement Learning

Scott Fujimoto, Shixiang (Shane) Gu

Keywords: reinforcement learning and planning, generative model

8:31

06/12/2020

Task-Robust Model-Agnostic Meta-Learning

Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

3:17

19/08/2021

Verifying Reinforcement Learning up to Infinity

Edoardo Bacci, Mirco Giacobbe, David Parker

Keywords: Machine Learning, Deep Reinforcement Learning, Validation and Verification, Learning in Robotics

14:57

26/08/2020

A nonasymptotic law of iterated logarithm for general M-estimators

Nicolas Schreuder, Victor-Emmanuel Brunel, Arnak Dalalyan

13:39

18/07/2021

Learning from History for Byzantine Robust Optimization

Praneeth Karimireddy, Lie He, Martin Jaggi

Keywords: Optimization, Non-Convex Optimization

5:01

03/05/2021

Better Fine-Tuning by Reducing Representational Collapse

Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta

Keywords: nlp, glue, representational learning, finetuning

5:06

18/07/2021

Robust Unsupervised Learning via L-statistic Minimization

Andreas Maurer, Daniela Angela Parletta, Andrea Paudice, Massimiliano Pontil

Keywords: Theory, Statistical Learning Theory

5:03

12/07/2020

AdaScale SGD: A User-Friendly Algorithm for Distributed Training

Tyler Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin

Keywords: Optimization - Large Scale, Parallel and Distributed

14:22

13/04/2021

Approximately solving mean field games via entropy-regularized deep reinforcement learning

Kai Cui, Heinz Koeppl

3:01

06/12/2020

Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective

Vu Nguyen, Vaden Masrani, Rob Brekelmans, Michael A Osborne, Frank Wood

3:23

12/07/2020

Fast Deterministic CUR Matrix Decomposition with Accuracy Assurance

Yasutoshi Ida, Sekitoshi Kanai, Yasuhiro Fujiwara, Tomoharu Iwata, Koh Takeuchi, Hisashi Kashima

Keywords: Optimization - General

12:24

06/12/2021

Efficient Generalization with Distributionally Robust Learning

Soumyadip Ghosh, Mark Squillante, Ebisa Wollega

Keywords: optimization, machine learning

14:57

03/05/2021

Acting in Delayed Environments with Non-Stationary Markov Policies

Esther Derman, Gal Dalal, Shie Mannor

Keywords: reinforcement learning, delay

5:07

06/12/2021

BooVAE: Boosting Approach for Continual Learning of VAE

Evgenii Egorov, Anna Kuzina, Evgeny Burnaev

Keywords: self-supervised learning, generative model, continual learning

8:54

06/12/2020

The Statistical Complexity of Early-Stopped Mirror Descent

Tomas Vaskevicius, Varun Kanade, Patrick Rebeschini

Keywords: Algorithms; Algorithms -> Regression; Algorithms -> Similarity and Distance Learning; Optimization -> Combinatorial Optimization, Optimization

3:21

06/12/2020

On Warm-Starting Neural Network Training

Jordan Ash, Ryan Adams

2:30

05/12/2020

Towards a better understanding of label smoothing in neural machine translation

Yingbo Gao, Weiyue Wang, Christian Herold, Zijian Yang, Hermann Ney

13:37

06/12/2021

Bayesian Optimization of Function Networks

Raul Astudillo, Peter Frazier

Keywords: optimization, reinforcement learning and planning, kernel methods

15:14

12/07/2020

On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent

Scott Pesme, Aymeric Dieuleveut, Nicolas Flammarion

Keywords: Optimization - Convex

15:20

03/05/2021

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Lanqing Li, Rui Yang, Dijun Luo

Keywords: distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

6:21

03/05/2021

Reset-Free Lifelong Learning with Skill-Space Planning

Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Keywords: reinforcement learning, lifelong, reset-free

4:53

09/07/2020

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

Maksim Kaledin, Eric Moulines, Alexey Naumov, Vladislav Tadic, Hoi-To Wai

Keywords: Stochastic optimization, Reinforcement learning

12:29

06/12/2021

Backward-Compatible Prediction Updates: A Probabilistic Approach

Frederik Träuble, Julius von Kügelgen, Matthäus Kleindessner, Francesco Locatello, Bernhard Schölkopf, Peter Gehler

Keywords: machine learning, vision

14:45

06/12/2020

Neural Non-Rigid Tracking

Aljaz Bozic, Pablo Palafox, Michael Zollhöfer, Angela Dai, Justus Thies, Matthias Niessner

3:22

12/07/2020

Tightening Exploration in Upper Confidence Reinforcement Learning

Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

Keywords: Reinforcement Learning - General

16:14

06/12/2021

Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints

Maura Pintor, Fabio Roli, Wieland Brendel, Battista Biggio

Keywords: optimization, machine learning, robustness, adversarial robustness and security, vision

11:35

06/12/2021

Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee

Xiaofeng Fan, Yining Ma, Zhongxiang Dai, Wei Jing, Cheston Tan, Bryan Kian Hsiang Low

Keywords: optimization, reinforcement learning and planning, adversarial robustness and security, federated learning

13:35

03/05/2021

Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control

Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

Keywords: Deep Reinforcement Learning, Regularization, Continuous Control, Policy Optimization

8:45

02/02/2021

Infinite Gaussian Mixture Modeling with an Improved Estimation of the Number of Clusters

Avi Matza, Yuval Bistritz

20:14

19/08/2021

Contrastive Losses and Solution Caching for Predict-and-Optimize

Maxime Mulamba, Jayanta Mandi, Michelangelo Diligenti, Michele Lombardi, Victor Bucarey, Tias Guns

Keywords: Machine Learning, Neuro-Symbolic Methods, Structured Prediction, Constraint Optimization

12:10

06/12/2021

Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning

Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang (Shane) Gu

Keywords: reinforcement learning and planning

10:00