The Pareto Frontier of model selection for general Contextual Bandits

06/12/2021

Teodor Vanislavov Marinov, Julian Zimmert

Keywords: bandits

Abstract: Recent progress in model selection raises the question of the fundamental limits of these techniques. Model selection for general contextual bandits with nested policy classes has received particular scrutiny, resulting in a COLT 2020 open problem. The problem asks whether it is possible to simultaneously obtain the optimal single-algorithm guarantees over all policies in a nested sequence of policy classes, or, failing that, whether this is possible with a trade-off $\alpha\in[\frac{1}{2},1)$ between the complexity term and time: $\ln(|\Pi_m|)^{1-\alpha}T^\alpha$. We give a disappointing answer to this question: even in the purely stochastic regime, the desired results are unobtainable. We present a Pareto frontier of upper and lower bounds that match up to logarithmic factors, thereby proving that an increase in the complexity term $\ln(|\Pi_m|)$ independent of $T$ is unavoidable for general policy classes. As a side result, we also resolve a COLT 2016 open problem concerning second-order bounds in full-information games.
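
To make the trade-off term in the abstract concrete, the following minimal sketch evaluates $\ln(|\Pi_m|)^{1-\alpha}T^\alpha$ for a few values. It is only an illustration: the function name `tradeoff_bound` and the example sizes are assumptions introduced here, constants, logarithmic factors, and any dependence on the number of actions are dropped, and nothing in it reproduces the paper's upper or lower bounds.

```python
import math


def tradeoff_bound(num_policies: int, horizon: int, alpha: float) -> float:
    """Evaluate ln(|Pi_m|)^(1 - alpha) * T^alpha.

    Hypothetical helper for illustration only; constants and
    logarithmic factors are ignored.
    """
    if not 0.5 <= alpha < 1.0:
        raise ValueError("alpha must lie in [1/2, 1)")
    return math.log(num_policies) ** (1.0 - alpha) * horizon ** alpha


if __name__ == "__main__":
    horizon = 10_000  # assumed number of rounds T
    for num_policies in (10, 10**6):  # assumed policy class sizes |Pi_m|
        for alpha in (0.5, 0.75):
            value = tradeoff_bound(num_policies, horizon, alpha)
            print(f"|Pi_m|={num_policies}, alpha={alpha}: {value:.1f}")
```

At $\alpha=\frac{1}{2}$ the expression reduces to $\sqrt{\ln(|\Pi_m|)\,T}$; larger $\alpha$ weakens the dependence on the class size at the cost of a worse dependence on $T$.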

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2020

Is Long Horizon RL More Difficult Than Short Horizon RL?

Ruosong Wang, Simon Du, Lin Yang, Sham Kakade

3:20

13/04/2021

Contextual blocking bandits

Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

2:47

18/07/2021

Dynamic Planning and Learning under Recovering Rewards

David Simchi-Levi, Zeyu Zheng, Feng Zhu

Keywords: Reinforcement Learning and Planning, Bandits

4:53

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords: optimization, reinforcement learning and planning, bandits

15:17

18/07/2021

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits

Tianyuan Jin, Jing Tang, Pan Xu, Keke Huang, Xiaokui Xiao, Quanquan Gu

Keywords: Reinforcement Learning and Planning, Bandits

5:19

04/08/2021

Adaptivity in Adaptive Submodularity

Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni

13:54

18/07/2021

Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement Learning and Planning

11:21

19/08/2021

Fast Pareto Optimization for Subset Selection with Dynamic Cost Constraints

Chao Bian, Chao Qian, Frank Neumann, Yang Yu

Keywords: Machine Learning, Evolutionary Learning, Heuristic Search, Heuristic Search and Machine Learning

13:05

06/12/2021

Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning

Sheng Zhang, Zhe Zhang, Siva Theja Maguluri

Keywords: theory, reinforcement learning and planning

10:20

06/12/2021

Learning to Select Exogenous Events for Marked Temporal Point Process

Ping Zhang, Rishabh Iyer, Ashish Tendulkar, Gaurav Aggarwal, Abir De

12:27

13/04/2021

Stability and risk bounds of iterative hard thresholding

Xiaotong Yuan, Ping Li

3:08

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

14:58

06/12/2021

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Gen Li, Laixi Shi, Yuxin Chen, Yuantao Gu, Yuejie Chi

Keywords: theory, reinforcement learning and planning

15:32

18/07/2021

Finding the Stochastic Shortest Path with Low Regret: the Adversarial Cost and Unknown Transition Case

Liyu Chen, Haipeng Luo

Keywords: Theory, RL, Decisions and Control Theory

5:08

06/12/2021

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Ming Yin, Yu Bai, Yu-Xiang Wang

Keywords: theory, optimization, reinforcement learning and planning

8:57

26/08/2020

Frequentist Regret Bounds for Randomized Least-Squares Value Iteration

Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric

12:45

04/08/2021

Softmax Policy Gradient Methods Can Take Exponential Time to Converge

Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen

15:15

09/07/2020

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

Alekh Agarwal, Sham Kakade, Jason Lee, Gaurav Mahajan

Keywords: Reinforcement learning, Non-convex optimization

11:00

18/07/2021

Combinatorial Blocking Bandits with Stochastic Delays

Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, Sanjay Shakkottai

Keywords: Reinforcement Learning and Planning, Bandits

5:12

12/07/2020

Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound

Lin Yang, Mengdi Wang

Keywords: Reinforcement Learning - Theory

15:14

06/12/2020

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

3:17

18/07/2021

Off-Belief Learning

Hengyuan Hu, Adam Lerer, Brandon Cui, Luis Pineda, Noam Brown, Jakob Foerster

Keywords: Reinforcement Learning and Planning

5:10

18/07/2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords: Algorithms, Multitask and Transfer Learning; Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks; Theory, RL, Decisions and Control Theory

4:49

06/12/2020

Hitting the High Notes: Subset Selection for Maximizing Expected Order Statistics

Aranyak Mehta, Uri Nadav, Alexandros Psomas, Aviad Rubinstein

3:23

06/12/2021

Bandits with many optimal arms

Rianne de Heide, James Cheshire, Pierre Ménard, Alexandra Carpentier

Keywords: bandits

12:23

04/08/2021

Generalizing Complex Hypotheses on Product Distributions: Auctions, Prophet Inequalities, and Pandora's Problem

Chenghao Guo, Zhiyi Huang, Zhihao Gavin Tang, Xinzhi Zhang

13:12

13/04/2021

Reinforcement learning in parametric MDPs with exponential families

Sayak Ray Chowdhury, Aditya Gopalan, Odalric-Ambrym Maillard

3:22

18/07/2021

On the Optimality of Batch Policy Optimization Algorithms

Chenjun Xiao, Yifan Wu, Jincheng Mei, Bo Dai, Tor Lattimore, Lihong Li, Csaba Szepesvari, Dale Schuurmans

Keywords: Reinforcement Learning and Planning

5:03

06/12/2020

Adapting to Misspecification in Contextual Bandits

Dylan Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert

3:05

02/02/2021

Computing Quantal Stackelberg Equilibrium in Extensive-Form Games

Jakub Černý, Viliam Lisý, Branislav Bošanský, Bo An

15:01

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords: Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

5:03

06/12/2020

A Single Recipe for Online Submodular Maximization with Adversarial or Stochastic Constraints

Omid Sadeghi, Prasanna Raut, Maryam Fazel

3:18

22/06/2020

The one-way communication complexity of submodular maximization with applications to streaming and robustness

Moran Feldman, Ashkan Norouzi-Fard, Ola Svensson, Rico Zenklusen

Keywords: Submodular Maximization, Approximation Algorithms, Robustness, Streaming, Communication Complexity

24:58

23/08/2020

Diverse rule sets

Guangyi Zhang, Aristides Gionis

Keywords: sampling, classifier, pattern mining, rule learning, diversification, rule sets

9:41

03/05/2021

Acting in Delayed Environments with Non-Stationary Markov Policies

Esther Derman, Gal Dalal, Shie Mannor

Keywords: reinforcement learning, delay

5:07

04/08/2021

Online Learning with Simple Predictors and a Combinatorial Characterization of Minimax in 0/1 Games

Steve Hanneke, Roi Livni, Shay Moran

18:07

08/07/2020

Space-efficient Query Evaluation over Probabilistic Event Streams

Rajeev Alur, Yu Chen, Kishor Jothimurugan, Sanjeev Khanna

Keywords: Query processing over streams, Streaming algorithms, Probabilistic streams

22:51

09/07/2020

Locally Private Hypothesis Selection

Sivakanth Gopi, Gautam Kamath, Janardhan D Kulkarni, Aleksandar Nikolov, Steven Wu, Huanyu Zhang

Keywords: Privacy, fairness, Distribution learning/testing

14:58

03/05/2021

Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds

Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu

Keywords: Reinforcement Learning, Off Policy Evaluation, Non-asymptotic Confidence Intervals

4:26

18/07/2021

Pareto GAN: Extending the Representational Power of GANs to Heavy-Tailed Distributions

Todd Huster, Jeremy Cohen, Zinan Lin, Kevin Chan, Charles Kamhoua, Nandi O. Leslie, Cho-Yu Chiang, Vyas Sekar

Keywords: Deep Learning, Adversarial Networks

5:11