Thompson Sampling for Linearly Constrained Bandits

Abstract: We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under a probabilistic linear constraint. For a few real-world instances of this problem, constrained extensions of the well-known Thompson Sampling (TS) heuristic have recently been proposed. However, finite-time analysis of constrained TS is challenging; as a result, only O( sqrt( T ) ) bounds on the cumulative reward loss (i.e., the regret) are available. In this paper, we describe LinConTS, a TS-based algorithm for bandits that place a linear constraint on the probability of earning a reward in every round. We show that for LinConTS, the regret as well as the cumulative constraint violations are upper bounded by O( log ( T ) ). We develop a proof technique that relies on careful analysis of the dual problem and combine it with recent theoretical work on unconstrained TS. Through numerical experiments on two real-world datasets, we demonstrate that LinConTS outperforms an asymptotically optimal upper confidence bound (UCB) scheme in terms of simultaneously minimizing the regret and the violation.

09/07/2020

Reinforcement Learning and Planning -> Markov Decision Processes; Reinforcement Learning and Planning -> Reinforcement Learning, Reinforcement Learning and Planning

3:03

06/12/2020

Thompson Sampling for Linearly Constrained Bandits

Vidit Saxena, Joakim Jalden, Joseph Gonzalez

Comments

Similar Papers

Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

YICHUN HU, Nathan Kallus, Xiaojie Mao

Keywords Abstract Paper

Bandit problems,

Stochastic bandits with groups of similar arms.

Fabien Pesquerel, Hassan SABER, Odalric-Ambrym Maillard

Keywords Abstract Paper

optimization, generative model, bandits

Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards

Kyungjae Lee, Hongjun Yang, Sungbin Lim, Songhwai Oh

Keywords Abstract Paper

Stochastic bandits with linear constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

Keywords Abstract Paper

Lenient Regret for Multi-Armed Bandits

Nadav Merlis, Shie Mannor

Keywords Abstract Paper

Smooth bandit optimization: Generalization to holder space

Yusha Liu, Yining Wang, Aarti Singh

Keywords Abstract Paper

Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards

Aadirupa Saha, Pierre Gaillard, Michal Valko

Keywords Abstract Paper

Online Learning, Active Learning, and Bandits

Low-rank generalized linear bandit problems

Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

Keywords Abstract Paper

Locally Differentially Private (Contextual) Bandits Learning

Kai Zheng, Tianle Cai, Weiran Huang and Zhenguo Li, Liwei Wang

Keywords Abstract Paper

Reinforcement Learning and Planning -> Markov Decision Processes; Reinforcement Learning and Planning -> Reinforcement Learning, Reinforcement Learning and Planning

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Keywords Abstract Paper

Parameter-Free Multi-Armed Bandit Algorithms with Hybrid Data-Dependent Regret Bounds

Shinji Ito

Keywords Abstract Paper

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris and Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

Keywords Abstract Paper

bandits

Budget-Constrained Bandits over General Cost and Reward Distributions

Semih Cayci, Atilla Eryilmaz, R Srikant

Keywords Abstract Paper

Tight Lower Bounds for Combinatorial Multi-Armed Bandits

Nadav Merlis, Shie Mannor

Keywords Abstract Paper

Bandit problems, Learning with algebraic or combinatorial structure

Doubly Robust Thompson Sampling with Linear Payoffs

Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

Keywords Abstract Paper

bandits

Adaptive Discretization for Adversarial Lipschitz Bandits

Chara Podimata, Alex Slivkins

Keywords Abstract Paper

Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi

Keywords Abstract Paper

Near-Optimal Representation Learning for Linear Bandits and Linear RL

Jiachen Hu, Xiaoyu Chen, Chi Jin and Lihong Li, Liwei Wang

Keywords Abstract Paper

Theory, Online Learning Theory

Neural Regret-Matching for Distributed Constraint Optimization Problems

Yanchen Deng, Runsheng Yu, Xinrun Wang, Bo An

Keywords Abstract Paper

Agent-based and Multi-agent Systems, Coordination and Cooperation, Constraint Optimization, Distributed Constraints

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

Siwei Wang, Haoyun Wang, Longbo Huang

Keywords Abstract Paper

Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection

Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano and Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

Keywords Abstract Paper

reinforcement learning and planning

Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits

Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida

Keywords Abstract Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Kai Zheng, Tianle Cai, Weiran Huang and
Zhenguo Li, Liwei Wang

Keywords Paper

Keywords Paper

Keywords Paper

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris and
Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Jiachen Hu, Xiaoyu Chen, Chi Jin and
Lihong Li, Liwei Wang

Keywords Paper

Keywords Paper

Keywords Paper

Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano and
Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

Keywords Paper

Keywords Paper

Baihe Huang, Kaixuan Huang, Sham Kakade and
Jason Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper