Parallelizing Thompson Sampling

06/12/2021

Amin Karbasi, Vahab Mirrokni, Mohammad Shadravan

Keywords: reinforcement learning and planning, bandits

Abstract: How can we exploit information parallelism in online decision-making problems while efficiently balancing the exploration-exploitation trade-off? In this paper, we introduce a batch Thompson Sampling framework for two canonical online decision-making problems with partial feedback, namely, the stochastic multi-armed bandit and the linear contextual bandit. Over a time horizon $T$, our batch Thompson Sampling policy achieves the same (asymptotic) regret bound as a fully sequential one while carrying out only $O(\log T)$ batch queries. To achieve this exponential reduction, i.e., reducing the number of interactions from $T$ to $O(\log T)$, our batch policy dynamically determines the duration of each batch in order to balance the exploration-exploitation trade-off. We also demonstrate experimentally that dynamic batch allocation outperforms natural baselines.
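To make the idea of dynamically sized batches concrete, here is a minimal Python sketch for a Bernoulli bandit with Beta(1, 1) priors. The abstract does not spell out the batching rule, so the doubling rule used here (a batch ends once some arm's pull count reaches its next power of two) is an assumption made for illustration, and the function name `batch_thompson_sampling` and its parameters are hypothetical. It should not be read as the authors' exact algorithm.

```python
import random


def batch_thompson_sampling(means, horizon, seed=0):
    """Thompson Sampling whose posteriors are refreshed only at batch ends.

    Assumed batching rule (illustrative, not taken from the paper text):
    a batch ends as soon as some arm's total pull count reaches that arm's
    next power-of-two threshold.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    alpha = [1] * n_arms      # Beta posterior parameters as of the last batch query
    beta = [1] * n_arms
    pulls = [0] * n_arms      # total pulls per arm, including unrevealed ones
    threshold = [1] * n_arms  # pull count at which each arm triggers a batch query
    pending = []              # (arm, reward) pairs collected but not yet revealed
    batches = 0
    total_reward = 0

    for _ in range(horizon):
        # Sample from the (possibly stale) posteriors and play the best sample.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        pending.append((arm, reward))
        pulls[arm] += 1
        total_reward += reward

        # End the batch when the played arm hits its doubling threshold.
        if pulls[arm] >= threshold[arm]:
            for a, r in pending:
                alpha[a] += r
                beta[a] += 1 - r
            pending.clear()
            threshold[arm] *= 2
            batches += 1

    return batches, pulls, total_reward
```

For example, `batch_thompson_sampling([0.2, 0.5, 0.8], horizon=10_000)` returns the number of batch queries, the per-arm pull counts, and the total reward. Under this assumed rule each arm can trigger at most $\lfloor \log_2 T \rfloor + 1$ queries, so the number of batches is at most $K(\lfloor \log_2 T \rfloor + 1)$ for $K$ arms, compared with $T$ posterior updates in the fully sequential policy.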

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

Batched Thompson Sampling

Cem Kalkanli, Ayfer Ozgur

Keywords: reinforcement learning and planning, bandits

11:03

12/07/2020

Dual Mirror Descent for Online Allocation Problems

Haihao Lu, Santiago Balseiro, Vahab Mirrokni

Keywords: Online Learning, Active Learning, and Bandits

15:14

26/08/2020

Online Continuous DR-Submodular Maximization with Long-Term Budget Constraints

Omid Sadeghi, Maryam Fazel

13:32

06/12/2020

Leveraging Predictions in Smoothed Online Convex Optimization via Gradient-based Algorithms

Yingying Li, Na Li

Keywords: Deep Learning -> Generative Models, Deep Learning -> Attention Models

3:19

04/08/2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

Liyu Chen, Haipeng Luo, Chen-Yu Wei

14:48

02/02/2021

A Primal-Dual Online Algorithm for Online Matching Problem in Dynamic Environments

Yu-Hang Zhou, Peng Hu, Chen Liang, Huan Xu, Guangda Huzhang, Yinfu Feng, Qing Da, Xinshang Wang, An-Xiang Zeng

18:32

09/07/2020

Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

Yichun Hu, Nathan Kallus, Xiaojie Mao

Keywords: Bandit problems

14:35

19/08/2021

Neural Regret-Matching for Distributed Constraint Optimization Problems

Yanchen Deng, Runsheng Yu, Xinrun Wang, Bo An

Keywords: Agent-based and Multi-agent Systems, Coordination and Cooperation, Constraint Optimization, Distributed Constraints

9:34

06/12/2021

Variational Bayesian Optimistic Sampling

Brendan O'Donoghue, Tor Lattimore

Keywords: optimization, reinforcement learning and planning, generative model, bandits, online learning

15:13

09/07/2020

Finite Regret and Cycles with Fixed Step-Size via Alternating Gradient Descent-Ascent

James P Bailey, Gauthier Gidel, Georgios Piliouras

Keywords: Economics, game theory, and incentives, Online learning

15:38

06/12/2021

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Gen Li, Laixi Shi, Yuxin Chen, Yuantao Gu, Yuejie Chi

Keywords: theory, reinforcement learning and planning

15:32

18/07/2021

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvari, Mengdi Wang

Keywords: Theory, Statistical Learning Theory

5:20

06/12/2021

Efficient Online Estimation of Causal Effects by Deciding What to Observe

Shantanu Gupta, Zachary Lipton, David Childers

Keywords: reinforcement learning and planning, graph learning, causality

14:18

18/07/2021

Regularized Online Allocation Problems: Fairness and Beyond

Santiago Balseiro, Haihao Lu, Vahab Mirrokni

Keywords: Algorithms, Online Learning Algorithms

5:23

13/04/2021

An efficient algorithm for generalized linear bandit: Online stochastic gradient descent and Thompson sampling

Qin Ding, Cho-Jui Hsieh, James Sharpnack

3:03

12/07/2020

Distributed Online Optimization over a Heterogeneous Network

Nima Eshraghi, Ben Liang

Keywords: Optimization - Convex

13:05

06/12/2020

Delay and Cooperation in Nonstochastic Linear Bandits

Shinji Ito, Daisuke Hatano, Hanna Sumita, Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi

3:19

18/07/2021

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits

Tianyuan Jin, Jing Tang, Pan Xu, Keke Huang, Xiaokui Xiao, Quanquan Gu

Keywords: Reinforcement Learning and Planning, Bandits

5:19

06/12/2020

A Bandit Learning Algorithm and Applications to Auction Design

Kim Thang Nguyen

2:43

18/07/2021

Non-Exponentially Weighted Aggregation: Regret Bounds for Unbounded Loss Functions

Pierre Alquier

Keywords: Probabilistic Methods, Bayesian Methods

4:35

06/12/2020

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow

3:16

08/07/2020

Space-efficient Query Evaluation over Probabilistic Event Streams

Rajeev Alur, Yu Chen, Kishor Jothimurugan, Sanjeev Khanna

Keywords: Query processing over streams, Streaming algorithms, Probabilistic streams

22:51

08/07/2020

The Online Min-Sum Set Cover Problem

Dimitris Fotakis, Loukas Kavouras, Grigorios Koumoutsos, Stratis Skoulakis, Manolis Vardas

Keywords: Online Algorithms, Competitive Analysis, Min-Sum Set Cover

25:10

04/08/2021

Adaptivity in Adaptive Submodularity

Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni

13:54

22/06/2020

Non-adaptive adaptive sampling on turnstile streams

Sepideh Mahabadi, Ilya Razenshteyn, David P. Woodruff, Samson Zhou

Keywords: volume maximization, determinantal point processes, computational geometry, streaming algorithms

25:07

06/12/2021

Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes

Cristopher Salvi, Maud Lemercier, Chong Liu, Blanka Horvath, Theodoros Damoulas, Terry Lyons

Keywords: machine learning, graph learning, causality

15:02

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

14:58

26/08/2020

Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning

Ming Yin, Yu-Xiang Wang

14:17

04/08/2021

Regret Minimization in Heavy-Tailed Bandits

Shubhada Agrawal, Sandeep K Juneja, Wouter M Koolen

17:35

15/11/2020

Testing Consensus Implementations using Communication Closure

Cezara Drăgoi, Constantin Enea, Burcu Kulahcioglu Ozkan, Rupak Majumdar, Filip Niksic

Keywords: Distributed consensus, Communication closure, Randomized testing

15:19

03/05/2021

Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity

Shaocong Ma, Ziyi Chen, Yi Zhou, Shaofeng Zou

Keywords: Machine Learning, Reinforcement Learning, Optimization

2:59

06/12/2021

Information Directed Sampling for Sparse Linear Bandits

Botao Hao, Tor Lattimore, Wei Deng

Keywords: bandits

10:32

18/07/2021

Joint Online Learning and Decision-making via Dual Mirror Descent

Alfonso Lobos Ruiz, Paul Grigas, Zheng Wen

Keywords: Deep Learning, Generative Models, Applications, Computer Vision; Applications, Visual Scene Analysis and Interpretation; Deep Learning, Adversarial Network, Algorithms, Online Learning Algorithms

5:15

06/12/2021

Streaming Linear System Identification with Reverse Experience Replay

Suhas Kowshik, Dheeraj Nagaraj, Prateek Jain, Praneeth Netrapalli

Keywords: optimization, reinforcement learning and planning

14:17

03/05/2021

Learning to Make Decisions via Submodular Regularization

Ayya Alieva, Aiden Aceves, Jialin Song, Stephen Mayo, Yisong Yue, Yuxin Chen

5:53

06/07/2020

Towards multi-sequence MR image recovery from undersampled k-space data

Cheng Peng, Wei-An Lin, Rama Chellappa, S. Kevin Zhou

5:00

06/12/2021

Online Variational Filtering and Parameter Learning

Andrew Campbell, Yuyang Shi, Thomas Rainforth, Arnaud Doucet

Keywords: generative model, online learning

20:00

06/12/2021

Learning-to-learn non-convex piecewise-Lipschitz functions

Maria-Florina Balcan, Mikhail Khodak, Dravyansh Sharma, Ameet S Talwalkar

Keywords: optimization, machine learning, robustness, meta learning, online learning

14:13

06/12/2020

Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization

Jianyu Wang, Qinghua Liu, Hao Liang, Gauri Joshi, H. Vincent Poor

3:14

02/02/2021

Online Optimal Control with Affine Constraints

Yingying Li, Subhro Das, Na Li

19:35