Information Directed Sampling for Linear Partial Monitoring

Abstract: Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well-known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure. IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game. Moreover, we prove lower bounds that classify the minimax regret of all finite games into four possible regimes. IDS achieves the optimal rate in all cases up to logarithmic factors, without tuning any hyper-parameters. We further extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.

06/12/2021

Johannes Kirschner, Tor Lattimore, Andreas Krause

Similar Papers

Continuous Mean-Covariance Bandits

Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

bandits

Thompson Sampling Algorithms for Mean-Variance Bandits

Qiuyu Zhu, Vincent Tan

Online Learning, Active Learning, and Bandits

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games

Yu Bai, Chi Jin, Huan Wang, Caiming Xiong

theory, reinforcement learning and planning, bandits

Bias-Robust Bayesian Optimization via Dueling Bandits

Johannes Kirschner, Andreas Krause

Reinforcement Learning and Planning, Bandits

OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

Niladri Chatterji, Vidya Muthukumar, Peter Bartlett

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

Dylan Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

Adaptive Discretization for Adversarial Lipschitz Bandits

Chara Podimata, Alex Slivkins

Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games

Gabriele Farina, Tuomas Sandholm

One More Step Towards Reality: Cooperative Bandits with Imperfect Communication

Udari Madhushani, Abhimanyu Dubey, Naomi Leonard, Alex Pentland

bandits

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James Wright, Michael Bowling, Amy Greenwald

Theory, Game Theory and Computational Economics

Exploration Through Bias: Revisiting Biased Maximum Likelihood Estimation in Stochastic Multi-Armed Bandits

Xi Liu, Ping-Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. Kumar

Online Learning, Active Learning, and Bandits

Contextual blocking bandits

Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

Asymptotically Optimal Information-Directed Sampling

Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvari

Control Variates for Slate Off-Policy Evaluation

Nikos Vlassis, Ashok Chandrashekar, Fernando Amat, Nathan Kallus

optimization, bandits

Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits

Reda Ouhamma, Rémy Degenne, Vianney Perchet, Pierre Gaillard

bandits, online learning

Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect

Priyank Agrawal, Theja Tulabandhula

Learning from eXtreme Bandit Feedback

Romain Lopez, Inderjit S. Dhillon, Michael I. Jordan

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

meta learning, bandits

Adaptive Sampling for Stochastic Risk-Averse Learning

Sebastian Curi, Kfir Y. Levy, Stefanie Jegelka, Andreas Krause

Stochastic linear bandits robust to adversarial attacks

Ilija Bogunovic, Arpan Losalka, Andreas Krause, Jonathan Scarlett

Latent Bandits Revisited

Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier
