Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

06/12/2021

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Ming Yin, Yu Bai, Yu-Xiang Wang

Keywords: theory, optimization, reinforcement learning and planning

Abstract Paper Similar Papers

Abstract: We consider the problem of offline reinforcement learning (RL) --- a well-motivated setting of RL that aims at policy optimization using only historical data. Despite its wide applicability, theoretical understandings of offline RL, such as its optimal sample complexity, remain largely open even in basic settings such as \emph{tabular} Markov Decision Processes (MDPs). In this paper, we propose \emph{Off-Policy Double Variance Reduction} (OPDVR), a new variance reduction-based algorithm for offline RL. Our main result shows that OPDVR provably identifies an $\epsilon$-optimal policy with $\widetilde{O}(H^2/d_m\epsilon^2)$ episodes of offline data in the finite-horizon \emph{stationary transition} setting, where $H$ is the horizon length and $d_m$ is the minimal marginal state-action distribution induced by the behavior policy. This improves over the best-known upper bound by a factor of $H$. Moreover, we establish an information-theoretic lower bound of $\Omega(H^2/d_m\epsilon^2)$ which certifies that OPDVR is optimal up to logarithmic factors. Lastly, we show that OPDVR also achieves rate-optimal sample complexity under alternative settings such as the finite-horizon MDPs with non-stationary transitions and the infinite horizon MDPs with discounted rewards.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords Paper

optimization, reinforcement learning and planning, bandits

0

0

0

0

15:17

26/08/2020

A Reduction from Reinforcement Learning to No-Regret Online Learning

Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon

Keywords Paper

0

0

0

0

14:33

06/12/2020

Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

Simon Du, Jason Lee, Gaurav Mahajan, Ruosong Wang

Keywords Paper

0

0

0

0

1:56

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords Paper

Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

0

0

0

0

5:03

06/12/2021

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Tengyang Xie, Nan Jiang, Huan Wang and
Caiming Xiong, Yu Bai

Keywords Paper

theory, optimization, reinforcement learning and planning

1

0

0

0

10:57

06/12/2021

Nearly Horizon-Free Offline Reinforcement Learning

Tongzheng Ren, Jialian Li, Bo Dai and
Simon Du, Sujay Sanghavi

Keywords Paper

theory, optimization, reinforcement learning and planning

0

0

0

0

8:44

06/12/2020

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang and
Mengdi Wang, Michael Jordan

Keywords Paper

0

0

0

0

3:42

09/07/2020

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

Alekh Agarwal, Sham Kakade, Jason Lee, Gaurav Mahajan

Keywords Paper

Reinforcement learning, Non-convex optimization

0

0

0

0

11:00

04/08/2021

Adaptivity in Adaptive Submodularity

Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni

Keywords Paper

0

0

0

0

13:54

09/07/2020

Provably Efficient Reinforcement Learning with Linear Function Approximation

Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael Jordan

Keywords Paper

Reinforcement learning,

0

0

0

0

13:04

06/12/2020

Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

Qiwen Cui, Lin Yang

Keywords Paper

Algorithms -> Semi-Supervised Learning; Deep Learning -> Deep Autoencoders; Deep Learning -> Generative Models, Probabilistic Methods -> Variational Inference

0

0

0

0

3:25

03/05/2021

What are the Statistical Limits of Offline RL with Linear Function Approximation?

Ruosong Wang, Dean Foster, Sham M Kakade

Keywords Paper

batch reinforcement learning, representation, function approximation, lower bound

0

0

0

0

9:02

06/12/2020

Is Long Horizon RL More Difficult Than Short Horizon RL?

Ruosong Wang, Simon Du, Lin Yang, Sham Kakade

Keywords Paper

0

0

0

0

3:20

06/12/2021

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Gen Li, Laixi Shi, Yuxin Chen and
Yuantao Gu, Yuejie Chi

Keywords Paper

theory, reinforcement learning and planning

0

0

0

0

15:32

26/04/2020

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Pan Xu, Felicia Gao, Quanquan Gu

Keywords Paper

Policy Gradient, Reinforcement Learning, Sample Efficiency

0

0

0

0

4:40

18/07/2021

EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Gu

Keywords Paper

Reinforcement Learning and Planning

0

0

0

1

5:54

06/12/2021

Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings

Ming Yin, Yu-Xiang Wang

Keywords Paper

theory, reinforcement learning and planning

0

0

0

0

8:46

06/12/2021

An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap

Yuanhao Wang, Ruosong Wang, Sham Kakade

Keywords Paper

theory, reinforcement learning and planning, generative model

0

0

0

0

15:01

04/08/2021

Softmax Policy Gradient Methods Can Take Exponential Time to Converge

Gen Li, Yuting Wei, Yuejie Chi and
Yuantao Gu, Yuxin Chen

Keywords Paper

0

0

0

0

15:15

06/12/2021

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Keywords Paper

deep learning, reinforcement learning and planning

1

0

0

0

13:50

06/12/2021

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

Keywords Paper

0

0

0

0

14:56

13/04/2021

Reinforcement learning in parametric MDPs with exponential families

Sayak Ray Chowdhury, Aditya Gopalan, Odalric-Ambrym Maillard

Keywords Paper

0

0

0

0

3:22

26/08/2020

Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning

Ming Yin, Yu-Xiang Wang

Keywords Paper

0

0

0

0

14:17

04/08/2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

Liyu Chen, Haipeng Luo, Chen-Yu Wei

Keywords Paper

0

0

0

0

14:48

03/05/2021

Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds

Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu

Keywords Paper

Reinforcement Learnings, Off Policy Evaluation, Non-asymptotic Confidence Intervals

0

0

0

0

4:26

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

meta learning, bandits

0

0

0

0

14:58

06/12/2020

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Kaiqing Zhang, Sham Kakade, Tamer Basar, Lin Yang

Keywords Paper

0

0

0

0

3:25

06/12/2020

Geometric Exploration for Online Control

Orestis Plevrakis, Elad Hazan

Keywords Paper

0

0

0

0

3:21

12/07/2020

Accelerated Stochastic Gradient-free and Projection-free Methods

Feihu Huang, Lue Tao, Songcan Chen

Keywords Paper

Optimization - Non-convex

0

0

0

0

13:05

26/04/2020

Reanalysis of Variance Reduced Temporal Difference Learning

Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang

Keywords Paper

Reinforcement Learning, TD learning, Markovian sample, Variance Reduction

0

0

0

0

4:29

12/07/2020

Momentum-Based Policy Gradient Methods

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Keywords Paper

Reinforcement Learning - General

0

0

0

0

13:28

26/08/2020

Learning piecewise Lipschitz functions in changing environments

Dravyansh Sharma, Maria-Florina Balcan, Travis Dick

Keywords Paper

0

0

0

0

14:57

06/12/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du and
Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords Paper

theory, reinforcement learning and planning

0

0

0

0

13:47

06/12/2020

Dynamic Regret of Policy Optimization in Non-Stationary Environments

Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

Keywords Paper

0

0

0

0

2:41

18/07/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

4:23

04/08/2021

Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap

Haike Xu, Tengyu Ma, Simon Du

Keywords Paper

0

0

0

0

10:42

18/07/2021

Is Pessimism Provably Efficient for Offline RL?

Ying Jin, Zhuoran Yang, Zhaoran Wang

Keywords Paper

Reinforcement Learning and Planning, Others

0

0

0

0

5:17

18/07/2021

Private Stochastic Convex Optimization: Optimal Rates in L1 Geometry

Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

Keywords Paper

Deep Learning, Algorithms, Multitask and Transfer Learning; Algorithms, Online Learning, Social Aspects of Machine Learning, Privacy, Anonymity, and Security

0

0

0

0

17:27

18/07/2021

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou, Jiafan He, Quanquan Gu

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:20

26/04/2020

Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

Yuanhao Wang, Kefan Dong, Xiaoyu Chen, Liwei Wang

Keywords Paper

theory, reinforcement learning, sample complexity

0

0

0

0

3:25