Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

06/12/2020

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Tiancheng Jin, Haipeng Luo

Keywords:

Abstract Paper Similar Papers

Abstract: This work studies the problem of learning episodic Markov Decision Processes with known transition and bandit feedback. We develop the first algorithm with a ``best-of-both-worlds'' guarantee: it achieves O(log T) regret when the losses are stochastic, and simultaneously enjoys worst-case robustness with \tilde{O}(\sqrt{T}) regret even when the losses are adversarial, where T is the number of episodes. More generally, it achieves \tilde{O}(\sqrt{C}) regret in an intermediate setting where the losses are corrupted by a total amount of C. Our algorithm is based on the Follow-the-Regularized-Leader method from Zimin and Neu (2013), with a novel hybrid regularizer inspired by recent works of Zimmert et al. (2019a, 2019b) for the special case of multi-armed bandits. Crucially, our regularizer admits a non-diagonal Hessian with a highly complicated inverse. Analyzing such a regularizer and deriving a particular self-bounding regret guarantee is our key technical contribution and might be of independent interest.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

04/08/2021

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

Keywords Paper

0

0

0

0

20:29

04/08/2021

Online Markov Decision Processes with Aggregate Bandit Feedback

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

Keywords Paper

0

0

0

0

13:07

09/07/2020

Root-n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

Kefan Dong, Jian Peng, Yining Wang, Yuan Zhou

Keywords Paper

Reinforcement learning,

0

0

0

0

12:46

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

meta learning, bandits

0

0

0

0

14:58

12/07/2020

Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition

Chi Jin, Tiancheng Jin, Haipeng Luo and
Suvrit Sra, Tiancheng Yu

Keywords Paper

Reinforcement Learning - Theory

0

0

0

0

15:14

06/12/2020

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang

Keywords Paper

0

0

0

0

3:16

18/07/2021

On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

Xu Cai, Jonathan Scarlett

Keywords Paper

Applications, Natural Language Processing, Applications, Network Analysis, Reinforcement Learning and Planning, Bandits

0

0

0

0

4:19

04/08/2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

Liyu Chen, Haipeng Luo, Chen-Yu Wei

Keywords Paper

0

0

0

0

14:48

18/07/2021

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously

Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei and
Mengxiao Zhang, Xiaojin Zhang

Keywords Paper

Reinforcement Learning and Planning, Bandits

0

0

0

0

5:12

13/04/2021

Low-rank generalized linear bandit problems

Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

Keywords Paper

0

0

0

0

2:49

04/08/2021

Online Learning with Simple Predictors and a Combinatorial Characterization of Minimax in 0/1 Games

Steve Hanneke, Roi Livni, Shay Moran

Keywords Paper

0

0

0

0

18:07

06/12/2021

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

Tiancheng Jin, Longbo Huang, Haipeng Luo

Keywords Paper

reinforcement learning and planning, online learning

0

0

0

0

19:08

06/12/2021

Learning-to-learn non-convex piecewise-Lipschitz functions

Maria-Florina Balcan, Mikhail Khodak, Dravyansh Sharma, Ameet S Talwalkar

Keywords Paper

optimization, machine learning, robustness, meta learning, online learning

0

0

0

0

14:13

04/08/2021

Parameter-Free Multi-Armed Bandit Algorithms with Hybrid Data-Dependent Regret Bounds

Shinji Ito

Keywords Paper

0

0

0

0

15:29

06/12/2021

Neural Active Learning with Performance Guarantees

Zhilei Wang, Pranjal Awasthi, Christoph Dann and
Ayush Sekhari, Claudio Gentile

Keywords Paper

deep learning, active learning

0

0

0

0

10:43

09/07/2020

Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits

Chloé Rouyer , Yevgeny Seldin

Keywords Paper

Bandit problems, Online learning

0

0

0

0

15:30

06/12/2020

Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits

Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida

Keywords Paper

0

0

0

0

3:24

06/12/2021

On Optimal Robustness to Adversarial Corruption in Online Decision Problems

Shinji Ito

Keywords Paper

robustness, adversarial robustness and security, bandits

0

0

0

0

5:41

04/08/2021

Optimal Dynamic Regret in Exp-Concave Online Learning

Dheeraj Baby, Yu-Xiang Wang

Keywords Paper

1

1

0

1

16:30

06/12/2020

Locally Differentially Private (Contextual) Bandits Learning

Kai Zheng, Tianle Cai, Weiran Huang and
Zhenguo Li, Liwei Wang

Keywords Paper

Reinforcement Learning and Planning -> Markov Decision Processes; Reinforcement Learning and Planning -> Reinforcement Learning, Reinforcement Learning and Planning

0

0

0

0

3:03

06/12/2020

Exploiting the Surrogate Gap in Online Multiclass Classification

Dirk van der Hoeven

Keywords Paper

0

0

0

0

3:24

03/05/2021

Impact of Representation Learning in Linear Bandits

Jiaqi Yang, Wei Hu, Jason Lee, Simon Du

Keywords Paper

multi-task learning, representation learning, linear bandits

0

0

0

0

5:40

04/08/2021

Asymptotically Optimal Information-Directed Sampling

Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvari

Keywords Paper

0

0

0

0

15:01

06/12/2020

Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games

Arun Suggala, Praneeth Netrapalli

Keywords Paper

1

1

0

0

3:29

09/07/2020

Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

YICHUN HU, Nathan Kallus, Xiaojie Mao

Keywords Paper

Bandit problems,

0

0

0

0

14:35

02/02/2021

Decentralized Multi-Agent Linear Bandits with Safety Constraints

Sanae Amani, Christos Thrampoulidis

Keywords Paper

0

0

0

0

19:13

06/12/2020

Delay and Cooperation in Nonstochastic Linear Bandits

Shinji Ito, Daisuke Hatano, Hanna Sumita and
Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi

Keywords Paper

0

0

0

0

3:19

06/12/2021

Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits

Shinji Ito

Keywords Paper

bandits

0

0

0

0

10:49

06/12/2020

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

Keywords Paper

0

0

0

0

3:13

06/12/2021

Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs

Han Zhong, Jiayi Huang, Lin Yang, Liwei Wang

Keywords Paper

machine learning, bandits

0

0

0

0

3:25

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Keywords Paper

0

0

0

0

3:18

06/12/2021

Recurrent Submodular Welfare and Matroid Blocking Semi-Bandits

Orestis Papadigenopoulos, Constantine Caramanis

Keywords Paper

bandits

0

0

0

0

12:28

02/02/2021

Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games

Gabriele Farina, Tuomas Sandholm

Keywords Paper

0

0

0

0

17:09

18/07/2021

A Distribution-dependent Analysis of Meta Learning

Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvari

Keywords Paper

Theory, Statistical Learning Theory

0

0

0

0

5:06

04/08/2021

Adaptive Discretization for Adversarial Lipschitz Bandits

Chara Podimata, Alex Slivkins

Keywords Paper

0

0

0

0

18:13

26/08/2020

Thompson Sampling for Linearly Constrained Bandits

Vidit Saxena, Joakim Jalden, Joseph Gonzalez

Keywords Paper

0

0

0

0

13:06

18/07/2021

Adapting to Delays and Data in Adversarial Multi-Armed Bandits

András György, Pooria Joulani

Keywords Paper

Deep Learning, Attention Models, Applications, Time Series Analysis; Deep Learning, Predictive Models, Reinforcement Learning and Planning, Bandits

0

0

0

0

6:18

09/07/2020

Taking a hint: How to leverage loss predictors in contextual bandits?

Chen-Yu Wei, Haipeng Luo, Alekh Agarwal

Keywords Paper

Bandit problems, Online learning

0

0

0

0

14:35

06/12/2020

Model Selection in Contextual Stochastic Bandit Problems

Aldo Pacchiano, My Phan, Yasin Abbasi Yadkori and
Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

Keywords Paper

0

0

0

0

3:22

13/04/2021

Online k-means clustering

Vincent Cohen-Addad, Benjamin Guedj, Varun Kanade, Guy Rom

Keywords Paper

0

0

0

0

2:52