Multitask bandit learning through heterogeneous feedback aggregation

Abstract: In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the ε-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players are similar but not necessarily identical. We develop an upper confidence bound-based algorithm, RobustAgg(ε), that adaptively aggregates rewards collected by different players. In the setting where an upper bound on the pairwise dissimilarities of reward distributions between players is known, we achieve instance-dependent regret guarantees that depend on the amenability of information sharing across players. We complement these upper bounds with nearly matching lower bounds. In the setting where pairwise dissimilarities are unknown, we provide a lower bound, as well as an algorithm that trades off minimax regret guarantees for adaptivity to unknown similarity structure.
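
To make the aggregation idea in the abstract concrete, here is a minimal Python sketch. It is not the paper's RobustAgg algorithm: the abstract does not give its index or constants, so everything below is an illustrative assumption. The sketch assumes Bernoulli rewards, a known bound `epsilon` on how far any two players' mean rewards for the same arm can differ, and a standard Hoeffding-style bonus. For each player and arm it forms two upper confidence bounds, one from the player's own samples and one from the samples pooled over all players (widened by `epsilon` to cover the possible bias), and uses the smaller of the two. The names `ucb_index` and `simulate` are hypothetical.

```python
import math
import random


def ucb_index(own_sum, own_n, pooled_sum, pooled_n, epsilon, t):
    """Illustrative index for one (player, arm) pair; not the paper's formula."""
    if own_n == 0:
        return math.inf  # force each player to try every arm once
    log_t = math.log(max(t, 2))
    # Bound built from this player's own samples only (no bias term).
    own_ucb = own_sum / own_n + math.sqrt(2.0 * log_t / own_n)
    # Bound built from all players' samples: narrower width, but the pooled
    # mean may be off by up to epsilon (assumed dissimilarity bound).
    pooled_ucb = pooled_sum / pooled_n + math.sqrt(2.0 * log_t / pooled_n) + epsilon
    # Under the assumption, both upper-bound the true mean, so take the tighter.
    return min(own_ucb, pooled_ucb)


def simulate(means, epsilon, horizon, seed=0):
    """Run all players for `horizon` rounds; means[p][i] is a Bernoulli mean.

    Returns counts[p][i], the number of times player p pulled arm i.
    """
    rng = random.Random(seed)
    n_players, n_arms = len(means), len(means[0])
    sums = [[0.0] * n_arms for _ in range(n_players)]
    counts = [[0] * n_arms for _ in range(n_players)]
    for t in range(1, horizon + 1):
        for p in range(n_players):
            def index(i):
                pooled_sum = sum(sums[q][i] for q in range(n_players))
                pooled_n = sum(counts[q][i] for q in range(n_players))
                return ucb_index(sums[p][i], counts[p][i],
                                 pooled_sum, pooled_n, epsilon, t)
            arm = max(range(n_arms), key=index)
            reward = 1.0 if rng.random() < means[p][arm] else 0.0
            sums[p][arm] += reward
            counts[p][arm] += 1
    return counts


if __name__ == "__main__":
    # Two players whose arm means differ by at most 0.05 (made-up values).
    pulls = simulate([[0.9, 0.1], [0.88, 0.12]], epsilon=0.05, horizon=500)
    print(pulls)
```

When `epsilon` is small relative to an arm's gap, the pooled bound tends to be the tighter one and players effectively share exploration; when `epsilon` is large, the index falls back to the player's own bound. This sketch carries no regret guarantee.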

06/12/2020

Zhi Wang, Chicheng Zhang, Manish Kumar Singh, Laurel Riek, Kamalika Chaudhuri

Similar Papers

Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

Vrettos Moulos

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris, Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

bandits

Contextual blocking bandits

Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

Siwei Wang, Haoyun Wang, Longbo Huang

Tight Lower Bounds for Combinatorial Multi-Armed Bandits

Nadav Merlis, Shie Mannor

Bandit problems, Learning with algebraic or combinatorial structure

Kernel Methods for Cooperative Multi-Agent Learning with Delays

Abhimanyu Dubey, Alex 'Sandy' Pentland

Planning, Control, and Multiagent Learning

Fairness of Exposure in Stochastic Bandits

Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims

Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

Corralling stochastic bandit algorithms

Raman Arora, Teodor Vanislavov Marinov, Mehryar Mohri

Learning Strategy-Aware Linear Classifiers

Yiling Chen, Yang Liu, Chara Podimata

DORB: Dynamically Optimizing Multiple Rewards with Bandits

Ramakanth Pasunuru, Han Guo, Mohit Bansal

language tasks, optimization rewards, nlg tasks, question generation

Incentivized Bandit Learning with Self-Reinforcing User Preferences

Tianchen Zhou, Jia Liu, Chaosheng Dong, Jingyuan Deng

Reinforcement Learning and Planning, Bandits

Dynamic Planning and Learning under Recovering Rewards

David Simchi-Levi, Zeyu Zheng, Feng Zhu

Reinforcement Learning and Planning, Bandits

Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward

Xiong Wang, Riheng Jia

Machine Learning, Online Learning, Algorithmic Game Theory, Multi-agent Learning

Latent Bandits Revisited

Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier

Stochastic bandits with groups of similar arms

Fabien Pesquerel, Hassan SABER, Odalric-Ambrym Maillard

optimization, generative model, bandits

Continuous Mean-Covariance Bandits

Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

bandits

A Gang of Adversarial Bandits

Mark Herbster, Stephen Pasteris, Fabio Vitale, Massimiliano Pontil

generative model, bandits, online learning

Budget-Constrained Bandits over General Cost and Reward Distributions

Semih Cayci, Atilla Eryilmaz, R Srikant

Stochastic bandits with linear constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

Recurrent Submodular Welfare and Matroid Blocking Semi-Bandits

Orestis Papadigenopoulos, Constantine Caramanis

bandits

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

theory, reinforcement learning and planning
