Lenient Regret for Multi-Armed Bandits

02/02/2021

Nadav Merlis, Shie Mannor

Abstract: We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took. While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results. For example, in large problems, or when the interaction with the environment is brief, finding an optimal arm is infeasible, and regret-minimizing algorithms tend to over-explore. To overcome this issue, algorithms for such settings should instead focus on playing near-optimal arms. To this end, we suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some ε. We then present a variant of the Thompson Sampling (TS) algorithm, called ε-TS, and prove its asymptotic optimality in terms of the lenient regret. Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of ε-TS is bounded by a constant. Finally, we show that ε-TS can be applied to improve the performance when the agent knows a lower bound of the suboptimality gaps.
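The difference between standard regret and a lenient criterion that ignores gaps smaller than ε can be sketched in a few lines. This is only an illustration under stated assumptions: the two gap functions below (`hinge_gap` and `threshold_gap`) are plausible ways to "ignore" small gaps and are not taken from the paper, and all function names, arm means, and the pull sequence are hypothetical.

```python
from typing import Callable, Optional, Sequence


def hinge_gap(delta: float, eps: float) -> float:
    """Assumed form: penalize only the part of the gap that exceeds eps."""
    return max(delta - eps, 0.0)


def threshold_gap(delta: float, eps: float) -> float:
    """Assumed form: penalize the full gap, but only when it exceeds eps."""
    return delta if delta > eps else 0.0


def cumulative_regret(
    means: Sequence[float],
    pulls: Sequence[int],
    gap_fn: Optional[Callable[[float, float], float]] = None,
    eps: float = 0.0,
) -> float:
    """Sum per-round penalties for a sequence of pulled arm indices.

    With gap_fn=None this is the standard (pseudo-)regret; otherwise each
    suboptimality gap is passed through gap_fn before being accumulated.
    """
    best = max(means)
    total = 0.0
    for arm in pulls:
        delta = best - means[arm]
        total += delta if gap_fn is None else gap_fn(delta, eps)
    return total


# Hypothetical instance: arm 1 is within eps = 0.1 of the best arm.
means = [0.9, 0.85, 0.5]
pulls = [0, 1, 1, 2, 1, 0]

standard = cumulative_regret(means, pulls)
lenient_threshold = cumulative_regret(means, pulls, threshold_gap, eps=0.1)
lenient_hinge = cumulative_regret(means, pulls, hinge_gap, eps=0.1)

print(standard, lenient_threshold, lenient_hinge)
```

In this toy instance the standard regret charges every pull of the near-optimal arm 1, while both lenient variants charge only the pull of arm 2, whose gap exceeds ε.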

Video of the talk: https://slideslive.com/38949121

The talk and the corresponding paper were published at the AAAI 2021 virtual conference.

Similar Papers

26/08/2020

Thompson Sampling for Linearly Constrained Bandits

Vidit Saxena, Joakim Jalden, Joseph Gonzalez

13:06

06/12/2021

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris, Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

Keywords: bandits

12:07

06/12/2021

Stochastic bandits with groups of similar arms

Fabien Pesquerel, Hassan Saber, Odalric-Ambrym Maillard

Keywords: optimization, generative model, bandits

13:22

06/12/2020

Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi

3:23

04/08/2021

Parameter-Free Multi-Armed Bandit Algorithms with Hybrid Data-Dependent Regret Bounds

Shinji Ito

15:29

04/08/2021

Adaptive Discretization for Adversarial Lipschitz Bandits

Chara Podimata, Alex Slivkins

18:13

06/12/2021

Doubly Robust Thompson Sampling with Linear Payoffs

Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

Keywords: bandits

14:18

18/07/2021

Beyond $\log^2(T)$ regret for decentralized bandits in matching markets

Soumya Basu, Karthik Abinav Sankararaman, Abishek Sankararaman

Keywords: Reinforcement Learning and Planning, Bandits

5:11

09/07/2020

Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

Yichun Hu, Nathan Kallus, Xiaojie Mao

Keywords: Bandit problems

14:35

13/04/2021

Low-rank generalized linear bandit problems

Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

2:49

13/04/2021

Bandit algorithms: Letting go of logarithmic regret for statistical robustness

Kumar Ashutosh, Jayakrishnan Nair, Anmol Kagrecha, Krishna Jagannathan

3:14

09/07/2020

Tight Lower Bounds for Combinatorial Multi-Armed Bandits

Nadav Merlis, Shie Mannor

Keywords: Bandit problems, Learning with algebraic or combinatorial structure

14:00

06/12/2020

Adversarial Blocking Bandits

Nicholas Bishop, Hau Chan, Debmalya Mandal, Long Tran-Thanh

3:09

12/07/2020

Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards

Aadirupa Saha, Pierre Gaillard, Michal Valko

Keywords: Online Learning, Active Learning, and Bandits

15:20

12/07/2020

Improved Optimistic Algorithms for Logistic Bandits

Louis Faury, Marc Abeille, Clément Calauzènes, Olivier Fercoq

Keywords: Online Learning, Active Learning, and Bandits

15:22

26/08/2020

Budget-Constrained Bandits over General Cost and Reward Distributions

Semih Cayci, Atilla Eryilmaz, R Srikant

10:40

06/12/2021

Online Multi-Armed Bandits with Adaptive Inference

Maria Dimakopoulou, Zhimei Ren, Zhengyuan Zhou

Keywords: theory, reinforcement learning and planning, bandits, online learning, causality

17:11

06/12/2020

On Regret with Multiple Best Arms

Yinglun Zhu, Robert Nowak

3:22

26/08/2020

Adaptive Exploration in Linear Contextual Bandit

Botao Hao, Tor Lattimore, Csaba Szepesvari

14:29

13/04/2021

Smooth bandit optimization: Generalization to Hölder space

Yusha Liu, Yining Wang, Aarti Singh

2:52

13/04/2021

Stochastic linear bandits robust to adversarial attacks

Ilija Bogunovic, Arpan Losalka, Andreas Krause, Jonathan Scarlett

3:01

09/07/2020

Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits

Chloé Rouyer, Yevgeny Seldin

Keywords: Bandit problems, Online learning

15:30

18/07/2021

Adapting to misspecification in contextual bandits with offline regression oracles

Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey

Keywords: Reinforcement Learning and Planning, Bandits

4:17

06/12/2020

Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards

Kyungjae Lee, Hongjun Yang, Sungbin Lim, Songhwai Oh

3:26

02/02/2021

Disposable Linear Bandits for Online Recommendations

Melda Korkut, Andrew Li

17:20

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

14:58

13/04/2021

Corralling stochastic bandit algorithms

Raman Arora, Teodor Vanislavov Marinov, Mehryar Mohri

2:37

02/02/2021

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

Siwei Wang, Haoyun Wang, Longbo Huang

19:29

18/07/2021

Probabilistic Sequential Shrinking: A Best Arm Identification Algorithm for Stochastic Bandits with Corruptions

Zixin Zhong, Wang Chi Cheung, Vincent Tan

Keywords: Reinforcement Learning and Planning, Bandits

4:54

04/08/2021

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

20:29

06/12/2020

Finite Continuum-Armed Bandits

Solenne Gaucher

Keywords: Algorithms -> Density Estimation; Algorithms -> Unsupervised Learning; Applications -> Computer Vision, Deep Learning -> Generative Models

3:18

06/12/2021

On the Suboptimality of Thompson Sampling in High Dimensions

Raymond Zhang, Richard Combes

Keywords: reinforcement learning and planning, bandits

10:23

13/04/2021

Instance-wise minimax-optimal algorithms for logistic bandits

Marc Abeille, Louis Faury, Clément Calauzènes

3:06

13/04/2021

Stochastic bandits with linear constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

3:02

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

3:18

12/07/2020

Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis

Vidyashankar Sivakumar, Steven Wu, Arindam Banerjee

Keywords: Online Learning, Active Learning, and Bandits

17:56

02/02/2021

Reinforcement Learning with Trajectory Feedback

Yonathan Efroni, Nadav Merlis, Shie Mannor

14:17

06/12/2020

Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits

Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida

3:24

13/04/2021

Tractable contextual bandits beyond realizability

Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey

2:51

04/08/2021

Regret Minimization in Heavy-Tailed Bandits

Shubhada Agrawal, Sandeep K Juneja, Wouter M Koolen

17:35