December 6, 2021

Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models

Runzhe Wan, Lin Ge, Rui Song

Keywords: meta learning, bandits, transfer learning

Abstract: How to explore efficiently is a central problem in multi-armed bandits. In this paper, we introduce the metadata-based multi-task bandit problem, where the agent needs to solve a large number of related multi-armed bandit tasks and can leverage some task-specific features (i.e., metadata) to share knowledge across tasks. As a general framework, we propose to capture task relations through the lens of Bayesian hierarchical models, upon which a Thompson sampling algorithm is designed to efficiently learn task relations, share information, and minimize the cumulative regret. Two concrete examples, for Gaussian bandits and Bernoulli bandits, are carefully analyzed. The Bayes regret bound for Gaussian bandits clearly demonstrates the benefit of information sharing under our algorithm. The proposed method is further supported by extensive experiments.
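The abstract describes the mechanism only at a high level, so here is a minimal sketch of what metadata-based hierarchical Thompson sampling for Gaussian bandits can look like. The specific linear-Gaussian hierarchy, the conjugate update, and all names (HierTS, sigma1, etc.) are our illustrative assumptions, not the paper's exact algorithm or notation.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed model (for illustration only): task i carries metadata
    # x_i in R^d, arm a of task i has a latent mean
    #   theta_{i,a} ~ N(x_i @ beta_a, sigma1**2),
    # and pulling it yields reward r ~ N(theta_{i,a}, sigma**2).
    # Sharing information across tasks amounts to learning each beta_a jointly.
    d, n_arms = 3, 2
    sigma, sigma1 = 1.0, 0.5

    class HierTS:
        """Thompson sampling over a conjugate Gaussian hierarchy (sketch)."""

        def __init__(self):
            # N(0, I) prior on each beta_a, stored in information form:
            # posterior precision Lambda_a and vector b_a, so that
            # beta_a | data ~ N(Lambda_a^{-1} b_a, Lambda_a^{-1}).
            self.Lambda = [np.eye(d) for _ in range(n_arms)]
            self.b = [np.zeros(d) for _ in range(n_arms)]

        def act(self, x):
            # Sample beta_a from its posterior, then a task-level mean
            # from the hierarchy it induces; play the largest draw.
            draws = []
            for a in range(n_arms):
                cov = np.linalg.inv(self.Lambda[a])
                beta = rng.multivariate_normal(cov @ self.b[a], cov)
                draws.append(rng.normal(x @ beta, sigma1))
            return int(np.argmax(draws))

        def update(self, x, a, r):
            # Simplification: marginally, r | beta_a ~ N(x @ beta_a,
            # sigma1**2 + sigma**2), so a standard Bayesian
            # linear-regression update applies. (This ignores that
            # repeated pulls of one task-arm pair share one theta_{i,a}.)
            v = sigma1 ** 2 + sigma ** 2
            self.Lambda[a] = self.Lambda[a] + np.outer(x, x) / v
            self.b[a] = self.b[a] + r * x / v

In use, one agent is shared across all tasks: at each step, call act(x_i) with the metadata of the current task, observe the reward, and call update(x_i, a, r), so that every observation refines the arm-level regression beta_a that new tasks start from.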

The talk and the paper were published at the NeurIPS 2021 virtual conference.
