Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

06/12/2021

Jiafan He, Dongruo Zhou, Quanquan Gu

Keywords: reinforcement learning and planning

Abstract: We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting. We propose a model-based algorithm named UCBVI-$\gamma$, which is based on the \emph{optimism in the face of uncertainty principle} and the Bernstein-type bonus. We show that UCBVI-$\gamma$ achieves an $\tilde{O}\big({\sqrt{SAT}}/{(1-\gamma)^{1.5}}\big)$ regret, where $S$ is the number of states, $A$ is the number of actions, $\gamma$ is the discount factor and $T$ is the number of steps. In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least $\tilde{\Omega}\big({\sqrt{SAT}}/{(1-\gamma)^{1.5}}\big)$. Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI-$\gamma$ is nearly minimax optimal for discounted MDPs.
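To make the ingredients named in the abstract concrete, here is a minimal sketch of optimistic value iteration on an empirical tabular model with a generic Bernstein-style bonus. It is not the authors' UCBVI-$\gamma$: the function name `optimistic_q_values`, the bonus constants, the confidence parameter `delta`, and the assumption of known rewards in $[0, 1]$ are all illustrative, since the abstract does not specify the exact bonus or update schedule.

```python
import numpy as np

def optimistic_q_values(counts, reward, gamma, delta=0.05, n_iters=200):
    """Optimistic value iteration on an empirical tabular model (illustrative).

    counts: array (S, A, S) of observed transition counts N(s, a, s').
    reward: array (S, A) of rewards, assumed known and in [0, 1].
    Returns Q of shape (S, A), clipped to at most 1 / (1 - gamma).
    """
    S, A, _ = counts.shape
    v_max = 1.0 / (1.0 - gamma)
    n_sa = np.maximum(counts.sum(axis=2), 1)          # avoid division by zero
    p_hat = counts / n_sa[:, :, None]                 # empirical transitions
    log_term = np.log(S * A / delta)                  # hypothetical constant

    q = np.full((S, A), v_max)                        # optimistic initialization
    for _ in range(n_iters):
        v = q.max(axis=1)                             # greedy values, shape (S,)
        mean_next = p_hat @ v                         # empirical mean of V(s')
        var_next = p_hat @ (v ** 2) - mean_next ** 2  # empirical variance of V(s')
        var_next = np.maximum(var_next, 0.0)
        # Bernstein-style bonus: variance term plus a lower-order range term.
        bonus = np.sqrt(2.0 * var_next * log_term / n_sa) + v_max * log_term / n_sa
        q = np.minimum(reward + gamma * (mean_next + bonus), v_max)
    return q
```

The variance-dependent term is what distinguishes a Bernstein-type bonus from a Hoeffding-type one, which would use the full range $1/(1-\gamma)$ in place of the empirical standard deviation. As the visit counts grow, the bonus shrinks and the optimistic values approach those of the empirical model.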

The talk and the respective paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

13/04/2021

Improved exploration in factored average-reward MDPs

Mohammad Sadegh Talebi, Anders Jonsson, Odalric Maillard

3:00

18/07/2021

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou, Jiafan He, Quanquan Gu

Keywords: Reinforcement Learning and Planning

5:20

18/07/2021

Model-based Reinforcement Learning for Continuous Control with Posterior Sampling

Ying Fan, Yifei Ming

Keywords: Reinforcement Learning and Planning

18:34

06/12/2021

Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints

Tianhao Wang, Dongruo Zhou, Quanquan Gu

Keywords: reinforcement learning and planning

13:12

06/12/2021

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords: theory, reinforcement learning and planning

14:22

06/12/2021

Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection

Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

Keywords: reinforcement learning and planning

13:32

06/12/2020

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie

3:13

02/02/2021

Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration

Priyank Agrawal, Jinglin Chen, Nan Jiang

20:04

06/12/2021

Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Bingyan Wang, Yuling Yan, Jianqing Fan

Keywords: theory, reinforcement learning and planning, generative model

7:34

06/12/2021

RL for Latent MDPs: Regret Guarantees and a Lower Bound

Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Keywords: reinforcement learning and planning

13:24

06/12/2021

Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP

Zihan Zhang, Jiaqi Yang, Xiangyang Ji, Simon Du

Keywords: theory, reinforcement learning and planning, bandits

15:14

13/04/2021

Smooth bandit optimization: Generalization to Hölder space

Yusha Liu, Yining Wang, Aarti Singh

2:52

04/08/2021

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture MDPs

Dongruo Zhou, Quanquan Gu, Csaba Szepesvari

16:33

04/08/2021

Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap

Haike Xu, Tengyu Ma, Simon Du

10:42

13/04/2021

Reinforcement learning in parametric MDPs with exponential families

Sayak Ray Chowdhury, Aditya Gopalan, Odalric-Ambrym Maillard

3:22

18/07/2021

Near-Optimal Representation Learning for Linear Bandits and Linear RL

Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang

Keywords: Theory, Online Learning Theory

5:13

04/08/2021

Sequential prediction under log-loss and misspecification

Meir Feder, Yury Polyanskiy

17:47

18/07/2021

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

Omar Darwiche Domingues, Pierre Menard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

Keywords: Theory, RL, Decisions and Control Theory

4:58

06/12/2020

Dynamic Regret of Policy Optimization in Non-Stationary Environments

Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

2:41

13/04/2021

Q-learning with logarithmic regret

Kunhe Yang, Lin Yang, Simon Du

3:25

18/07/2021

On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP

Tianhao Wu, Yunchang Yang, Simon Du, Liwei Wang

Keywords: Theory, RL, Decisions and Control Theory

5:08

18/07/2021

Adversarial Combinatorial Bandits with General Non-linear Reward Functions

Yanjun Han, Yining Wang, Xi Chen

Keywords: Applications, Computer Vision, Applications, Computational Photography, Theory, Online Learning Theory

5:21

18/07/2021

Adversarial Dueling Bandits

Aadirupa Saha, Tomer Koren, Yishay Mansour

Keywords: Algorithms, Ranking and Preference Learning

5:58

26/08/2020

Budget-Constrained Bandits over General Cost and Reward Distributions

Semih Cayci, Atilla Eryilmaz, R Srikant

10:40

18/07/2021

Logarithmic Regret for Reinforcement Learning with Linear Function Approximation

Jiafan He, Dongruo Zhou, Quanquan Gu

Keywords: Algorithms, Classification, Deep Learning, CNN Architectures; Deep Learning, Visualization or Exposition Techniques for Deep Networks, Theory, RL, Decisions and Control Theory

5:15

18/07/2021

Randomized Exploration in Reinforcement Learning with General Value Function Approximation

Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin Yang

Keywords: Theory, RL, Decisions and Control Theory

5:22

13/04/2021

Bandit algorithms: Letting go of logarithmic regret for statistical robustness

Kumar Ashutosh, Jayakrishnan Nair, Anmol Kagrecha, Krishna Jagannathan

3:14

13/04/2021

Combinatorial Gaussian process bandits with probabilistically triggered arms

Ilker Demirel, Cem Tekin

3:01

12/07/2020

Near-optimal Regret Bounds for Stochastic Shortest Path

Aviv Rosenberg, Alon Cohen, Yishay Mansour, Haim Kaplan

Keywords: Online Learning, Active Learning, and Bandits

15:57

06/12/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords: theory, reinforcement learning and planning

13:47

13/04/2021

Low-rank generalized linear bandit problems

Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

2:49

06/12/2021

An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

Xin Liu, Bin Li, Pengyi Shi, Lei Ying

Keywords: optimization, bandits

12:44

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

3:18

09/07/2020

Logistic Regression Regret: What’s the Catch?

Gil I Shamir

Keywords: Online learning, Convex optimization, Information theory, Regression

16:03

04/08/2021

Fast Rates for the Regret of Offline Reinforcement Learning

Yichun Hu, Nathan Kallus, Masatoshi Uehara

17:53

06/12/2021

Variational Bayesian Reinforcement Learning with Regret Bounds

Brendan O'Donoghue

Keywords: theory, reinforcement learning and planning, generative model

14:40

26/08/2020

Frequentist Regret Bounds for Randomized Least-Squares Value Iteration

Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric

12:45

06/12/2021

Optimal Order Simple Regret for Gaussian Process Bandits

Sattar Vakili, Nacime Bouziani, Sepehr Jalali, Alberto Bernacchia, Da-shan Shiu

Keywords: optimization, reinforcement learning and planning, bandits, kernel methods

11:05

06/12/2020

Geometric Exploration for Online Control

Orestis Plevrakis, Elad Hazan

3:21

06/12/2020

Adversarial Blocking Bandits

Nicholas Bishop, Hau Chan, Debmalya Mandal, Long Tran-Thanh

3:09