From Optimality to Robustness: Adaptive Re-Sampling Strategies in Stochastic Bandits

06/12/2021

Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard

Keywords: reinforcement learning and planning, robustness, bandits

Abstract: The stochastic multi-armed bandit problem has been extensively studied under standard assumptions on the arms' distributions (e.g., bounded with known support, or belonging to an exponential family). These assumptions suit many real-world problems, but they sometimes require knowledge (on tails, for instance) that may not be precisely accessible to the practitioner, raising the question of the robustness of bandit algorithms to model misspecification. In this paper we study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations and a data-dependent exploration bonus. We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded, and logarithmic regret for semi-bounded distributions with a mild quantile condition. We also show that a simple tuning achieves robustness with respect to a large class of unbounded distributions, at the cost of slightly worse than logarithmic asymptotic regret. We finally provide numerical experiments showing the merits of DS in a decision-making problem on synthetic agriculture data.
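The abstract describes the DS index only in words, so the following is a minimal Python sketch of one way a pairwise comparison with Dirichlet re-sampling and an exploration bonus could look. It is an illustration under stated assumptions, not the authors' implementation: the function names (`dirichlet_index`, `duel`, `example_bonus`), the parameter `rho`, and the specific form of the bonus are all hypothetical, and the paper should be consulted for the actual definitions and tuning.

```python
import numpy as np


def dirichlet_index(observations, bonus, rng):
    """Re-sampled mean of an arm.

    The observations are augmented with one bonus pseudo-observation and
    averaged with weights drawn from a uniform Dirichlet distribution.
    """
    values = np.append(np.asarray(observations, dtype=float), bonus)
    weights = rng.dirichlet(np.ones(values.size))  # non-negative, sums to 1
    return float(weights @ values)


def example_bonus(challenger_obs, leader_mean, rho=2.0):
    """Hypothetical data-dependent bonus (an assumption for illustration).

    It grows with the average shortfall of the challenger's observations
    below the leader's empirical mean.
    """
    obs = np.asarray(challenger_obs, dtype=float)
    shortfall = np.maximum(leader_mean - obs, 0.0)
    return leader_mean + rho * float(np.mean(shortfall))


def duel(challenger_obs, leader_obs, bonus_fn, rng):
    """Pairwise comparison: does the challenger win against the leader?"""
    leader_mean = float(np.mean(leader_obs))
    # A challenger whose plain empirical mean is already at least as good wins.
    if float(np.mean(challenger_obs)) >= leader_mean:
        return True
    # Otherwise compare its randomized, bonus-augmented index to the leader.
    bonus = bonus_fn(challenger_obs, leader_mean)
    return dirichlet_index(challenger_obs, bonus, rng) >= leader_mean


rng = np.random.default_rng(0)
leader = rng.normal(1.0, 1.0, size=200)      # arm with many samples
challenger = rng.normal(0.8, 1.0, size=10)   # arm with few samples
wins = sum(duel(challenger, leader, example_bonus, rng) for _ in range(1000))
print(f"challenger won {wins} of 1000 randomized duels")
```

Because the Dirichlet weights are redrawn at every comparison, an arm with few observations has a noisier index and so keeps a chance of winning a duel, which is the exploration mechanism this sketch is meant to convey.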

The talk and the respective paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

04/08/2021

Benign Overfitting of Constant-Stepsize SGD for Linear Regression

Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham Kakade

18:27

06/12/2021

Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks

Rong Zhu, Mattia Rigotti

Keywords: theory, deep learning, reinforcement learning and planning, bandits

8:45

06/12/2020

Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi

3:23

13/04/2021

Reinforcement learning in parametric MDPs with exponential families

Sayak Ray Chowdhury, Aditya Gopalan, Odalric-Ambrym Maillard

3:22

13/04/2021

Stochastic bandits with linear constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

3:02

13/04/2021

Contextual blocking bandits

Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

2:47

02/02/2021

Disposable Linear Bandits for Online Recommendations

Melda Korkut, Andrew Li

17:20

13/04/2021

On multilevel Monte Carlo unbiased gradient estimation for deep latent variable models

Yuyang Shi, Rob Cornish

3:06

06/12/2021

Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification

Clémence Réda, Andrea Tirinzoni, Rémy Degenne

Keywords: theory, reinforcement learning and planning, bandits

14:14

13/04/2021

Low-rank generalized linear bandit problems

Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

2:49

26/08/2020

A Robust Univariate Mean Estimator is All You Need

Adarsh Prasad, Sivaraman Balakrishnan, Pradeep Ravikumar

13:59

04/08/2021

Parameter-Free Multi-Armed Bandit Algorithms with Hybrid Data-Dependent Regret Bounds

Shinji Ito

15:29

06/12/2020

Adaptive Sampling for Stochastic Risk-Averse Learning

Sebastian Curi, Kfir Y. Levy, Stefanie Jegelka, Andreas Krause

3:13

06/12/2020

High-Dimensional Sparse Linear Bandits

Botao Hao, Tor Lattimore, Mengdi Wang

2:54

09/07/2020

Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

Yichun Hu, Nathan Kallus, Xiaojie Mao

Keywords: Bandit problems

14:35

06/12/2021

Variational Bayesian Optimistic Sampling

Brendan O'Donoghue, Tor Lattimore

Keywords: optimization, reinforcement learning and planning, generative model, bandits, online learning

15:13

26/08/2020

Distributionally Robust Formulation and Model Selection for the Graphical Lasso

Pedro Cisneros, Alexander Petersen, Sang-Yun Oh

14:08

19/08/2021

Neural Regret-Matching for Distributed Constraint Optimization Problems

Yanchen Deng, Runsheng Yu, Xinrun Wang, Bo An

Keywords: Agent-based and Multi-agent Systems, Coordination and Cooperation, Constraint Optimization, Distributed Constraints

9:34

12/07/2020

On Thompson Sampling with Langevin Algorithms

Eric Mazumdar, Aldo Pacchiano, Yian Ma, Michael Jordan, Peter Bartlett

Keywords: Online Learning, Active Learning, and Bandits

15:33

26/08/2020

Thompson Sampling for Linearly Constrained Bandits

Vidit Saxena, Joakim Jalden, Joseph Gonzalez

13:06

18/07/2021

Optimal Thompson Sampling strategies for support-aware CVaR bandits

Dorian Baudry, Romain Gautron, Emilie Kaufmann, Odalric-Ambrym Maillard

Keywords: Probabilistic Methods, Algorithms, Uncertainty Estimation; Applications; Probabilistic Methods, MCMC, Reinforcement Learning and Planning, Bandits

5:32

06/12/2020

Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits

Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida

3:24

06/12/2020

Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

Pierre Perrault, Etienne Boursier, Michal Valko, Vianney Perchet

3:22

12/07/2020

Exploration Through Bias: Revisiting Biased Maximum Likelihood Estimation in Stochastic Multi-Armed Bandits

Xi Liu, Ping-Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. Kumar

Keywords: Online Learning, Active Learning, and Bandits

14:46

13/04/2021

Smooth bandit optimization: Generalization to Hölder space

Yusha Liu, Yining Wang, Aarti Singh

2:52

12/07/2020

Optimistic Policy Optimization with Bandit Feedback

Lior Shani, Yonathan Efroni, Aviv Rosenberg, Shie Mannor

Keywords: Reinforcement Learning - Theory

14:07

12/07/2020

Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis

Vidyashankar Sivakumar, Steven Wu, Arindam Banerjee

Keywords: Online Learning, Active Learning, and Bandits

17:56

06/12/2021

A unified framework for bandit multiple testing

Ziyu Xu, Ruodu Wang, Aaditya Ramdas

Keywords: theory, reinforcement learning and planning, bandits

13:39

26/08/2020

Sublinear Optimal Policy Value Estimation in Contextual Bandits

Weihao Kong, Emma Brunskill, Gregory Valiant

12:11

06/12/2020

Geometric Exploration for Online Control

Orestis Plevrakis, Elad Hazan

3:21

06/12/2021

Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs

Thomas Spooner, Nelson Vadori, Sumitra Ganesh

Keywords: bandits

14:40

06/12/2021

Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD

Rémi Bardenet, Subhroshekhar Ghosh, Meixia Lin

Keywords: optimization, machine learning

14:51

18/07/2021

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

Yaqi Duan, Chi Jin, Zhiyuan Li

Keywords: Theory, RL, Decisions and Control Theory

5:18

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

14:58

06/12/2020

Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies

Nathan Kallus, Masatoshi Uehara

3:11

06/12/2021

Off-Policy Risk Assessment in Contextual Bandits

Audrey Huang, Liu Leqi, Zachary Lipton, Kamyar Azizzadenesheli

Keywords: robustness, bandits

15:06

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

3:18

13/04/2021

Fundamental limits of ridge-regularized empirical risk minimization in high dimensions

Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

3:33

06/12/2020

Flexible mean field variational inference using mixtures of non-overlapping exponential families

Jeffrey Spence

2:23

02/02/2021

Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes

Marc Rigter, Bruno Lacerda, Nick Hawes

19:16