Taking a hint: How to leverage loss predictors in contextual bandits?

09/07/2020

Taking a hint: How to leverage loss predictors in contextual bandits?

Chen-Yu Wei, Haipeng Luo, Alekh Agarwal

Keywords: Bandit problems, Online learning

Abstract Paper Similar Papers

Abstract: We initiate the study of learning in contextual bandits with the help of loss predictors. The main question we address is whether one can improve over the minimax regret $\mathcal{O}(\sqrt{T})$ for learning over $T$ rounds, when the total error of the predicted losses relative to the realized losses, denoted as $\mathcal{E} \leq T$, is relatively small. We provide a complete answer to this question, with upper and lower bounds for various settings: adversarial and stochastic environments, known and unknown $\mathcal{E}$, and single and multiple predictors. We show several surprising results, such as 1) the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}T^\frac{1}{4}\})$ when $\mathcal{E}$ is known, in contrast to the standard and better bound $\mathcal{O}(\sqrt{\mathcal{E}})$ for non-contextual problems (such as multi-armed bandits); 2) the same bound cannot be achieved if $\mathcal{E}$ is unknown, but as a remedy, $\mathcal{O}(\sqrt{\mathcal{E}}T^\frac{1}{3})$ is achievable; 3) with $M$ predictors, a linear dependence on $M$ is necessary, even though logarithmic dependence is possible for non-contextual problems.\n\nWe also develop several novel algorithmic techniques to achieve matching upper bounds, including 1) a key \emph{action remapping} technique for optimal regret with known $\mathcal{E}$, 2) computationally efficient implementation of Catoni's robust mean estimator via an ERM oracle in the stochastic setting with optimal regret, 3) an underestimator for $\mathcal{E}$ via estimating the histogram with bins of exponentially increasing size for the stochastic setting with unknown $\mathcal{E}$, and 4) a self-referential scheme for learning with multiple predictors, all of which might be of independent interest.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at COLT 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

04/08/2021

Non-stationary Reinforcement Learning without Prior Knowledge: an Optimal Black-box Approach

Chen-Yu Wei, Haipeng Luo

Keywords Paper

0

0

0

0

17:25

18/07/2021

Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization

Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain

Keywords Paper

Optimization, Convex Optimization

0

0

0

0

6:19

04/08/2021

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

Keywords Paper

0

0

0

0

20:29

06/12/2020

Making Non-Stochastic Control (Almost) as Easy as Stochastic

Max Simchowitz

Keywords Paper

0

0

0

0

2:51

12/07/2020

Naive Exploration is Optimal for Online LQR

Max Simchowitz, Dylan Foster

Keywords Paper

Reinforcement Learning - Theory

0

0

0

0

15:12

06/12/2021

Bandit Phase Retrieval

Tor Lattimore, Botao Hao

Keywords Paper

bandits

0

0

0

0

14:14

04/08/2021

Optimal Dynamic Regret in Exp-Concave Online Learning

Dheeraj Baby, Yu-Xiang Wang

Keywords Paper

1

1

0

1

16:30

06/12/2021

Optimal Order Simple Regret for Gaussian Process Bandits

Sattar Vakili, Nacime Bouziani, Sepehr Jalali and
Alberto Bernacchia, Da-shan Shiu

Keywords Paper

optimization, reinforcement learning and planning, bandits, kernel methods

0

0

0

0

11:05

13/04/2021

On information gain and regret bounds in gaussian process bandits

Sattar Vakili, Kia Khezeli, Victor Picheny

Keywords Paper

0

0

0

0

2:52

13/04/2021

Combinatorial gaussian process bandits with probabilistically triggered arms

Ilker Demirel, Cem Tekin

Keywords Paper

0

0

0

0

3:01

06/12/2021

Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP

Zihan Zhang, Jiaqi Yang, Xiangyang Ji, Simon Du

Keywords Paper

theory, reinforcement learning and planning, bandits

0

0

0

0

15:14

06/12/2021

Calibration and Consistency of Adversarial Surrogate Losses

Pranjal Awasthi, Natalie Frank, Anqi Mao and
Mehryar Mohri, Yutao Zhong

Keywords Paper

theory, optimization, machine learning, robustness, adversarial robustness and security

0

0

0

0

13:30

06/12/2020

Adaptive Online Estimation of Piecewise Polynomial Trends

Dheeraj Baby, Yu-Xiang Wang

Keywords Paper

1

1

0

1

3:12

06/12/2021

List-Decodable Mean Estimation in Nearly-PCA Time

Ilias Diakonikolas, Daniel Kane, Daniel Kongsgaard and
Jerry Li, Kevin Tian

Keywords Paper

theory, clustering

0

0

0

0

14:21

04/08/2021

Exponential Weights Algorithms for Selective Learning

Mingda Qiao, Gregory Valiant

Keywords Paper

0

0

0

0

12:52

18/07/2021

Private Stochastic Convex Optimization: Optimal Rates in L1 Geometry

Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

Keywords Paper

Deep Learning, Algorithms, Multitask and Transfer Learning; Algorithms, Online Learning, Social Aspects of Machine Learning, Privacy, Anonymity, and Security

0

0

0

0

17:27

06/12/2021

Logarithmic Regret from Sublinear Hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Keywords Paper

optimization, online learning

0

0

0

0

14:36

06/12/2020

Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition

Lin Chen, Qian Yu, Hannah Lawrence, Amin Karbasi

Keywords Paper

0

0

0

0

3:19

09/07/2020

Logistic Regression Regret: What’s the Catch?

Gil I Shamir

Keywords Paper

Online learning, Convex optimization, Information theory, Regression

0

0

0

0

16:03

06/12/2021

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Gen Li, Laixi Shi, Yuxin Chen and
Yuantao Gu, Yuejie Chi

Keywords Paper

theory, reinforcement learning and planning

0

0

0

0

15:32

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords Paper

optimization, reinforcement learning and planning, bandits

0

0

0

0

15:17

06/12/2020

Adapting to Misspecification in Contextual Bandits

Dylan Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert

Keywords Paper

0

0

0

0

3:05

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Keywords Paper

0

0

0

0

3:18

18/07/2021

Sample Complexity of Robust Linear Classification on Separated Data

Robi Bhattacharjee, Somesh Jha, Kamalika Chaudhuri

Keywords Paper

Theory, Statistical Learning Theory

0

0

0

0

5:26

09/07/2020

Estimating Principal Components under Adversarial Perturbations

Pranjal Awasthi, Xue Chen, Aravindan Vijayaraghavan

Keywords Paper

Unsupervised and semi-supervised learning, Adversarial learning and robustness

0

0

0

0

15:40

18/07/2021

Adversarial Dueling Bandits

Aadirupa Saha, Tomer Koren, Yishay Mansour

Keywords Paper

Algorithms, Ranking and Preference Learning

0

0

0

0

5:58

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

meta learning, bandits

0

0

0

0

14:58

18/07/2021

Near-Optimal Representation Learning for Linear Bandits and Linear RL

Jiachen Hu, Xiaoyu Chen, Chi Jin and
Lihong Li, Liwei Wang

Keywords Paper

Theory, Online Learning Theory

0

0

0

0

5:13

06/12/2021

How Tight Can PAC-Bayes be in the Small Data Regime?

Andrew Foong, Wessel Bruinsma, David Burt, Richard Turner

Keywords Paper

theory, machine learning

0

0

0

0

11:43

03/08/2020

A Simple Online Algorithm for Competing with Dynamic Comparators

Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou

Keywords Paper

0

0

0

0

7:44

06/12/2021

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

Keywords Paper

0

0

0

0

14:56

13/04/2021

Instance-wise minimax-optimal algorithms for logistic bandits

Marc Abeille, Louis Faury, Clement Calauzenes

Keywords Paper

0

0

0

0

3:06

06/12/2021

Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization

Guoqiang Wu, Chongxuan LI, Kun Xu, Jun Zhu

Keywords Paper

theory, machine learning

0

0

0

0

9:29

03/05/2021

Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods

Taiji Suzuki, Akiyama Shunta

Keywords Paper

local Rademacher complexity, minimax optimal rate, Excess risk, linear estimator, kernel method, fast learning rate

0

0

0

0

10:13

13/04/2021

Low-rank generalized linear bandit problems

Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

Keywords Paper

0

0

0

0

2:49

06/12/2020

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Tiancheng Jin, Haipeng Luo

Keywords Paper

0

0

0

0

3:39

02/02/2021

Projection-free Online Learning in Dynamic Environments

Yuanyu Wan, Bo Xue, Lijun Zhang

Keywords Paper

0

0

0

0

15:41

06/12/2020

Locally Differentially Private (Contextual) Bandits Learning

Kai Zheng, Tianle Cai, Weiran Huang and
Zhenguo Li, Liwei Wang

Keywords Paper

Reinforcement Learning and Planning -> Markov Decision Processes; Reinforcement Learning and Planning -> Reinforcement Learning, Reinforcement Learning and Planning

0

0

0

0

3:03

06/12/2020

Adaptive Sampling for Stochastic Risk-Averse Learning

Sebastian Curi, Kfir Y. Levy, Stefanie Jegelka, Andreas Krause

Keywords Paper

0

0

0

0

3:13

04/08/2021

Online Learning with Simple Predictors and a Combinatorial Characterization of Minimax in 0/1 Games

Steve Hanneke, Roi Livni, Shay Moran

Keywords Paper

0

0

0

0

18:07