Top-k eXtreme Contextual Bandits with Arm Hierarchy

18/07/2021

Rajat Sen, Alexander Rakhlin, Lexing Ying, Rahul Kidambi, Dean Foster, Daniel Hill, Inderjit Dhillon

Keywords: Reinforcement Learning and Planning, Bandits

Abstract: Motivated by modern applications, such as online advertisement and recommender systems, we study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous, and the learner is allowed to select $k$ arms and observe all or some of the rewards for the chosen arms. We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy for selecting multiple arms. We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|F|T)})$, where $A$ is the total number of arms and $F$ is the class containing the regression function, while only requiring $\tilde{O}(A)$ computation per time step. In the extreme setting, where the total number of arms can be in the millions, we propose a practically-motivated arm hierarchy model that induces a certain structure in mean rewards to ensure statistical and computational efficiency. The hierarchical structure allows for an exponential reduction in the number of relevant arms for each context, thus resulting in a regret guarantee of $O(k\sqrt{(\log A-k+1)T \log (|F|T)})$. Finally, we implement our algorithm using a hierarchical linear function class and show superior performance with respect to well-known benchmarks on simulated bandit feedback experiments using extreme multi-label classification datasets. On a dataset with three million arms, our reduction scheme has an average inference time of only 7.9 milliseconds, which is a 100x improvement.
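The Inverse Gap Weighting strategy mentioned in the abstract can be illustrated with a minimal sketch of its standard single-arm form. This is not the paper's top-$k$ algorithm or its hierarchical reduction. The function name `igw_probabilities`, the exploration parameter `gamma`, and the example reward estimates are assumptions made for illustration only. Each non-greedy arm receives probability inversely proportional to its estimated gap from the greedy arm, and the greedy arm receives the remaining mass.

```python
import numpy as np


def igw_probabilities(predicted_rewards, gamma):
    """Single-arm Inverse Gap Weighting (illustrative sketch).

    predicted_rewards: estimated mean reward of each arm for one context.
    gamma: hypothetical exploration parameter; larger values explore less.
    """
    y = np.asarray(predicted_rewards, dtype=float)
    num_arms = y.size
    best = int(np.argmax(y))

    # Gap of every arm to the greedy arm (zero for the greedy arm itself).
    gaps = y[best] - y

    # Non-greedy arms: probability shrinks as the estimated gap grows.
    probs = 1.0 / (num_arms + gamma * gaps)

    # The greedy arm takes whatever probability mass is left over.
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()
    return probs


# Example with made-up reward estimates for three arms.
probs = igw_probabilities([0.9, 0.5, 0.1], gamma=10.0)
arm = np.random.default_rng(0).choice(len(probs), p=probs)
```

Because each non-greedy arm gets at most $1/A$, the greedy arm always keeps at least $1/A$ of the mass, so the result is a valid distribution for any non-negative `gamma`.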

The talk and the corresponding paper were published at the ICML 2021 virtual conference.

Similar Papers

06/12/2021

Bandits with many optimal arms

Rianne de Heide, James Cheshire, Pierre Ménard, Alexandra Carpentier

Keywords: bandits

12:23

06/12/2020

On Regret with Multiple Best Arms

Yinglun Zhu, Robert Nowak

3:22

26/08/2020

Contextual Combinatorial Volatile Multi-armed Bandit with Adaptive Discretization

Andi Nika, Sepehr Elahi, Cem Tekin

13:12

06/12/2021

Stochastic bandits with groups of similar arms.

Fabien Pesquerel, Hassan Saber, Odalric-Ambrym Maillard

Keywords: optimization, generative model, bandits

13:22

02/02/2021

Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions

Kei Takemura, Shinji Ito, Daisuke Hatano, Hanna Sumita, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

14:16

06/12/2020

Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits

Siwei Wang, Longbo Huang, John C. S. Lui

3:19

06/12/2020

Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi

3:23

06/12/2021

Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination

Arpan Mukherjee, Ali Tajer, Pin-Yu Chen, Payel Das

Keywords: theory, bandits

15:07

18/07/2021

Problem Dependent View on Structured Thresholding Bandit Problems

James Cheshire, Pierre Ménard, Alexandra Carpentier

Keywords: Algorithms, Online Learning, Algorithms, Bandit Algorithms, Reinforcement Learning and Planning, Bandits

4:49

12/07/2020

Structure Adaptive Algorithms for Stochastic Bandits

Rémy Degenne, Han Shao, Wouter Koolen

Keywords: Online Learning, Active Learning, and Bandits

16:05

04/08/2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

Liyu Chen, Haipeng Luo, Chen-Yu Wei

14:48

18/07/2021

Dynamic Planning and Learning under Recovering Rewards

David Simchi-Levi, Zeyu Zheng, Feng Zhu

Keywords: Reinforcement Learning and Planning, Bandits

4:53

06/12/2021

Doubly Robust Thompson Sampling with Linear Payoffs

Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

Keywords: bandits

14:18

06/12/2020

Choice Bandits

Arpit Agarwal, Nicholas Johnson, Shivani Agarwal

3:24

12/07/2020

The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation

Zhe Feng, David Parkes, Haifeng Xu

Keywords: Learning Theory

12:47

09/07/2020

Estimating Principal Components under Adversarial Perturbations

Pranjal Awasthi, Xue Chen, Aravindan Vijayaraghavan

Keywords: Unsupervised and semi-supervised learning, Adversarial learning and robustness

15:40

06/12/2021

Online Multi-Armed Bandits with Adaptive Inference

Maria Dimakopoulou, Zhimei Ren, Zhengyuan Zhou

Keywords: theory, reinforcement learning and planning, bandits, online learning, causality

17:11

06/12/2020

An Optimal Elimination Algorithm for Learning a Best Arm

Avinatan Hassidim, Ron Kupfer, Yaron Singer

3:23

06/12/2021

List-Decodable Mean Estimation in Nearly-PCA Time

Ilias Diakonikolas, Daniel Kane, Daniel Kongsgaard, Jerry Li, Kevin Tian

Keywords: theory, clustering

14:21

06/12/2021

Recurrent Submodular Welfare and Matroid Blocking Semi-Bandits

Orestis Papadigenopoulos, Constantine Caramanis

Keywords: bandits

12:28

09/07/2020

Efficient and robust algorithms for adversarial linear contextual bandits

Gergely Neu, Julia Olkhovskaya

Keywords: Bandit problems, Online learning

9:53

06/12/2021

Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization

Arnab Maiti, Vishakha Patil, Arindam Khan

Keywords: theory, bandits

14:33

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

14:58

26/08/2020

Stochastic Bandits with Delay-Dependent Payoffs

Leonardo Cella, Nicolò Cesa-Bianchi

14:50

26/08/2020

A Novel Confidence-Based Algorithm for Structured Bandits

Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli

12:17

09/07/2020

The Influence of Shape Constraints on the Thresholding Bandit Problem

James Cheshire, Pierre Menard, Alexandra Carpentier

Keywords: Bandit problems, Convex optimization

14:51

06/12/2020

From Finite to Countable-Armed Bandits

Anand Kalvit, Assaf Zeevi

Keywords: Theory -> Control Theory

3:15

18/07/2021

Best Model Identification: A Rested Bandit Formulation

Leonardo Cella, Massimiliano Pontil, Claudio Gentile

Keywords: Reinforcement Learning and Planning, Bandits

5:05

18/07/2021

Beyond $\log^2(T)$ regret for decentralized bandits in matching markets

Soumya Basu, Karthik Abinav Sankararaman, Abishek Sankararaman

Keywords: Reinforcement Learning and Planning, Bandits

5:11

13/04/2021

Contextual blocking bandits

Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

2:47

18/07/2021

Mind the Box: $l_1$-APGD for Sparse Adversarial Attacks on Image Classifiers

Francesco Croce, Matthias Hein

Keywords: Algorithms, Adversarial Examples

4:46

18/07/2021

Adapting to Delays and Data in Adversarial Multi-Armed Bandits

András György, Pooria Joulani

Keywords: Deep Learning, Attention Models, Applications, Time Series Analysis; Deep Learning, Predictive Models, Reinforcement Learning and Planning, Bandits

6:18

04/08/2021

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

20:29

19/08/2021

Optimal Algorithms for Range Searching over Multi-Armed Bandits

Siddharth Barman, Ramakrishnan Krishnamurthy, Saladi Rahul

Keywords: Machine Learning, Online Learning

14:43

12/07/2020

Doubly robust off-policy evaluation with shrinkage

Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudik

Keywords: Online Learning, Active Learning, and Bandits

15:08

06/12/2020

Fast Adversarial Robustness Certification of Nearest Prototype Classifiers for Arbitrary Seminorms

Sascha Saralajew, Lars Holdijk, Thomas Villmann

3:23

06/12/2020

An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits

Julian Katz-Samuels, Lalit Jain, Zohar Karnin, Kevin Jamieson

3:20

04/08/2021

Regret Minimization in Heavy-Tailed Bandits

Shubhada Agrawal, Sandeep K Juneja, Wouter M Koolen

17:35

06/12/2021

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Gen Li, Laixi Shi, Yuxin Chen, Yuantao Gu, Yuejie Chi

Keywords: theory, reinforcement learning and planning

15:32

06/12/2021

Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints

Maura Pintor, Fabio Roli, Wieland Brendel, Battista Biggio

Keywords: optimization, machine learning, robustness, adversarial robustness and security, vision

11:35