Thompson Sampling for Bandits with Clustered Arms

Abstract: We propose algorithms based on a multi-level Thompson sampling scheme for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, that exploiting a given cluster structure can significantly reduce regret and computational cost compared with standard Thompson sampling. For the stochastic multi-armed bandit, we give upper bounds on the expected cumulative regret that show how it depends on the quality of the clustering. Finally, an empirical evaluation shows that our algorithms perform well compared with previously proposed algorithms for bandits with clustered arms.
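
To make the idea of a multi-level Thompson sampling scheme concrete, the following is a minimal sketch of one possible two-level variant for the non-contextual case. It is an illustration under stated assumptions, not necessarily the algorithm analysed in the paper: rewards are assumed to be Bernoulli, every cluster and every arm gets an independent Beta(1, 1) prior, and the cluster posterior is updated with the reward of whichever arm was pulled inside it. The function name `clustered_thompson_sampling` and its parameters are hypothetical.

```python
import random


def clustered_thompson_sampling(clusters, horizon, seed=0):
    """Two-level Thompson sampling sketch (assumed Bernoulli rewards).

    clusters: list of clusters, each a list of true arm means in [0, 1]
              (a hypothetical simulated instance, not data from the paper).
    Returns the number of pulls per arm, grouped by cluster.
    """
    rng = random.Random(seed)
    # Beta posteriors stored as [alpha, beta], starting from Beta(1, 1).
    cluster_post = [[1, 1] for _ in clusters]
    arm_post = [[[1, 1] for _ in arms] for arms in clusters]
    pulls = [[0] * len(arms) for arms in clusters]

    for _ in range(horizon):
        # Level 1: draw one sample per cluster and pick the largest.
        c = max(range(len(clusters)),
                key=lambda i: rng.betavariate(*cluster_post[i]))
        # Level 2: draw one sample per arm in that cluster only.
        a = max(range(len(clusters[c])),
                key=lambda j: rng.betavariate(*arm_post[c][j]))

        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < clusters[c][a] else 0

        # Update both the cluster-level and the arm-level posterior.
        for post in (cluster_post[c], arm_post[c][a]):
            post[0] += reward
            post[1] += 1 - reward
        pulls[c][a] += 1

    return pulls


if __name__ == "__main__":
    counts = clustered_thompson_sampling([[0.9, 0.6], [0.2, 0.1]], horizon=3000)
    print(counts)
```

In this sketch each round samples from one posterior per cluster plus one per arm in the chosen cluster, rather than one per arm overall, which is one way the computational saving mentioned in the abstract could arise when clusters are large.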

06/12/2020

Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

Similar Papers

Latent Bandits Revisited

Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier

Regret Analysis of Bandit Problems with Causal Background Knowledge

Yangyi Lu, Amirhossein Meisami, Ambuj Tewari, William Yan

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization

Baihe Huang, Kaixuan Huang, Sham Kakade, Jason Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

Keywords: theory, deep learning, optimization, generative model, bandits

Exploration Through Bias: Revisiting Biased Maximum Likelihood Estimation in Stochastic Multi-Armed Bandits

Xi Liu, Ping-Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. Kumar

Keywords: Online Learning, Active Learning, and Bandits

Bias-Robust Bayesian Optimization via Dueling Bandits

Johannes Kirschner, Andreas Krause

Keywords: Reinforcement Learning and Planning, Bandits

Fairness of Exposure in Stochastic Bandits

Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims

Keywords: Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

Adaptive Discretization for Adversarial Lipschitz Bandits

Chara Podimata, Alex Slivkins

Information Directed Sampling for Sparse Linear Bandits

Botao Hao, Tor Lattimore, Wei Deng

Keywords: bandits

Stochastic bandits with linear constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

Asymptotically Optimal Information-Directed Sampling

Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvari

Bandits with Knapsacks beyond the Worst Case

Karthik Abinav Sankararaman, Aleksandrs Slivkins

Keywords: theory, bandits, online learning

Thompson Sampling Algorithms for Mean-Variance Bandits

Qiuyu Zhu, Vincent Tan

Keywords: Online Learning, Active Learning, and Bandits

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

Dylan Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

Learning from eXtreme Bandit Feedback

Romain Lopez, Inderjit S. Dhillon, Michael I. Jordan

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

Thompson Sampling for Linearly Constrained Bandits

Vidit Saxena, Joakim Jalden, Joseph Gonzalez

Neural Thompson Sampling

Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

Keywords: Thompson sampling, Contextual Bandits, Deep Learning

Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

Vrettos Moulos

Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits

Shinji Ito

Keywords: bandits

Structure Adaptive Algorithms for Stochastic Bandits

Rémy Degenne, Han Shao, Wouter Koolen

Keywords: Online Learning, Active Learning, and Bandits

Lenient Regret for Multi-Armed Bandits

Nadav Merlis, Shie Mannor

Corralling stochastic bandit algorithms

Raman Arora, Teodor Vanislavov Marinov, Mehryar Mohri
