Model Selection in Contextual Stochastic Bandit Problems

06/12/2020

Model Selection in Contextual Stochastic Bandit Problems

Aldo Pacchiano, My Phan, Yasin Abbasi Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

Keywords:

Abstract Paper Similar Papers

Abstract: We study bandit model selection in stochastic environments. Our approach relies on a master algorithm that selects between candidate base algorithms. We develop a master-base algorithm abstraction that can work with general classes of base algorithms and different type of adversarial master algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee. We show through a lower bound that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically. Using our techniques, we address model selection in a variety of problems such as misspecified linear contextual bandits \citep{lattimore2019learning}, linear bandit with unknown dimension \citep{Foster-Krishnamurthy-Luo-2019} and reinforcement learning with unknown feature maps. Our algorithm requires the knowledge of the optimal base regret to adjust the master learning rate. We show that without such prior knowledge any master can suffer a regret larger than the optimal base regret.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

13/04/2021

Low-rank generalized linear bandit problems

Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

Keywords Paper

0

0

0

0

2:49

13/04/2021

Smooth bandit optimization: Generalization to holder space

Yusha Liu, Yining Wang, Aarti Singh

Keywords Paper

0

0

0

0

2:52

06/12/2020

A Bandit Learning Algorithm and Applications to Auction Design

Kim Thang Nguyen

Keywords Paper

0

0

0

0

2:43

04/08/2021

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

Keywords Paper

0

0

0

0

20:29

18/07/2021

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou, Jiafan He, Quanquan Gu

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:20

09/07/2020

Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

YICHUN HU, Nathan Kallus, Xiaojie Mao

Keywords Paper

Bandit problems,

0

0

0

0

14:35

03/05/2021

Impact of Representation Learning in Linear Bandits

Jiaqi Yang, Wei Hu, Jason Lee, Simon Du

Keywords Paper

multi-task learning, representation learning, linear bandits

0

0

0

0

5:40

13/04/2021

Online model selection for reinforcement learning with function approximation

Jonathan Lee, Aldo Pacchiano, Vidya Muthukumar and
Weihao Kong, Emma Brunskill

Keywords Paper

0

0

0

0

3:15

18/07/2021

Adversarial Combinatorial Bandits with General Non-linear Reward Functions

Yanjun Han, Yining Wang, Xi Chen

Keywords Paper

Applications, Computer Vision, Applications, Computational Photography, Theory, Online Learning Theory

0

0

0

0

5:21

06/12/2020

Delay and Cooperation in Nonstochastic Linear Bandits

Shinji Ito, Daisuke Hatano, Hanna Sumita and
Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi

Keywords Paper

0

0

0

0

3:19

19/08/2021

Neural Regret-Matching for Distributed Constraint Optimization Problems

Yanchen Deng, Runsheng Yu, Xinrun Wang, Bo An

Keywords Paper

Agent-based and Multi-agent Systems, Coordination and Cooperation, Constraint Optimization, Distributed Constraints

0

0

0

0

9:34

04/08/2021

Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap

Haike Xu, Tengyu Ma, Simon Du

Keywords Paper

0

0

0

0

10:42

04/08/2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

Liyu Chen, Haipeng Luo, Chen-Yu Wei

Keywords Paper

0

0

0

0

14:48

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Keywords Paper

0

0

0

0

3:18

06/12/2021

Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints

Tianhao Wang, Dongruo Zhou, Quanquan Gu

Keywords Paper

reinforcement learning and planning

0

0

0

0

13:12

06/12/2021

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris and
Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

Keywords Paper

bandits

0

0

0

0

12:07

18/07/2021

Model-based Reinforcement Learning for Continuous Control with Posterior Sampling

Ying Fan, Yifei Ming

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

18:34

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

meta learning, bandits

0

0

0

0

14:58

18/07/2021

Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization

Wes Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:13

18/07/2021

Adapting to misspecification in contextual bandits with offline regression oracles

Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey

Keywords Paper

Reinforcement Learning and Planning, Bandits

0

0

0

0

4:17

26/08/2020

OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

Niladri Chatterji, Vidya Muthukumar, Peter Bartlett

Keywords Paper

0

0

0

0

8:20

03/05/2021

Learning to Make Decisions via Submodular Regularization

Ayya Alieva, Aiden Aceves, Jialin Song and
Stephen Mayo, Yisong Yue, Yuxin Chen

Keywords Paper

0

0

0

0

5:53

02/02/2021

Multinomial Logit Contextual Bandits: Provable Optimality and Practicality

Min-hwan Oh, Garud Iyengar

Keywords Paper

0

0

0

0

15:47

04/08/2021

Non-stationary Reinforcement Learning without Prior Knowledge: an Optimal Black-box Approach

Chen-Yu Wei, Haipeng Luo

Keywords Paper

0

0

0

0

17:25

04/08/2021

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

Dylan Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

Keywords Paper

0

0

0

0

16:53

18/07/2021

Dynamic Balancing for Model Selection in Bandits and RL

Ashok Cutkosky, Christoph Dann, Abhimanyu Das and
Claudio Gentile, Aldo Pacchiano, Manish Purohit

Keywords Paper

Reinforcement Learning and Planning, Bandits

0

0

0

0

5:18

06/12/2021

On Optimal Robustness to Adversarial Corruption in Online Decision Problems

Shinji Ito

Keywords Paper

robustness, adversarial robustness and security, bandits

0

0

0

0

5:41

06/12/2020

Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits

Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida

Keywords Paper

0

0

0

0

3:24

13/04/2021

Combinatorial gaussian process bandits with probabilistically triggered arms

Ilker Demirel, Cem Tekin

Keywords Paper

0

0

0

0

3:01

04/08/2021

Online Markov Decision Processes with Aggregate Bandit Feedback

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

Keywords Paper

0

0

0

0

13:07

06/12/2021

Doubly Robust Thompson Sampling with Linear Payoffs

Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

Keywords Paper

bandits

0

0

0

0

14:18

06/12/2020

Locally Differentially Private (Contextual) Bandits Learning

Kai Zheng, Tianle Cai, Weiran Huang and
Zhenguo Li, Liwei Wang

Keywords Paper

Reinforcement Learning and Planning -> Markov Decision Processes; Reinforcement Learning and Planning -> Reinforcement Learning, Reinforcement Learning and Planning

0

0

0

0

3:03

13/04/2021

Bandit algorithms: Letting go of logarithmic regret for statistical robustness

Kumar Ashutosh, Jayakrishnan Nair, Anmol Kagrecha, Krishna Jagannathan

Keywords Paper

0

0

0

0

3:14

18/07/2021

Adversarial Dueling Bandits

Aadirupa Saha, Tomer Koren, Yishay Mansour

Keywords Paper

Algorithms, Ranking and Preference Learning

0

0

0

0

5:58

12/07/2020

Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis

Vidyashankar Sivakumar, Steven Wu, Arindam Banerjee

Keywords Paper

Online Learning, Active Learning, and Bandits

0

0

0

0

17:56

06/12/2021

Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits

Shinji Ito

Keywords Paper

bandits

0

0

0

0

10:49

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords Paper

optimization, reinforcement learning and planning, bandits

0

0

0

0

15:17

18/07/2021

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

20:03

18/07/2021

Near-Optimal Representation Learning for Linear Bandits and Linear RL

Jiachen Hu, Xiaoyu Chen, Chi Jin and
Lihong Li, Liwei Wang

Keywords Paper

Theory, Online Learning Theory

0

0

0

0

5:13

06/12/2020

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Tiancheng Jin, Haipeng Luo

Keywords Paper

0

0

0

0

3:39