What are the Statistical Limits of Offline RL with Linear Function Approximation?

03/05/2021

What are the Statistical Limits of Offline RL with Linear Function Approximation?

Ruosong Wang, Dean Foster, Sham M Kakade

Keywords: batch reinforcement learning, representation, function approximation, lower bound

Abstract Paper Similar Papers

Abstract: Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning coupled with function approximation methods (to deal with the curse of dimensionality) can provide a means to help alleviate the excessive sample complexity burden in modern sequential decision making problems. However, the extent to which this broader approach can be effective is not well understood, where the literature largely consists of sufficient conditions. This work focuses on the basic question of what are necessary representational and distributional conditions that permit provable sample-efficient offline reinforcement learning. Perhaps surprisingly, our main result shows that even if: i) we have realizability in that the true value function of \emph{every} policy is linear in a given set of features and 2) our off-policy data has good coverage over all features (under a strong spectral condition), any algorithm still (information-theoretically) requires a number of offline samples that is exponential in the problem horizon to non-trivially estimate the value of \emph{any} given policy. Our results highlight that sample-efficient offline policy evaluation is not possible unless significantly stronger conditions hold; such conditions include either having low distribution shift (where the offline data distribution is close to the distribution of the policy to be evaluated) or significantly stronger representational conditions (beyond realizability).

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at ICLR 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2021

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Keywords Paper

deep learning, reinforcement learning and planning

1

0

0

0

13:50

18/07/2021

Instabilities of Offline RL with Pre-Trained Neural Representation

Ruosong Wang, Yifan Wu, Russ Salakhutdinov, Sham Kakade

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:19

18/07/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

4:23

06/12/2021

Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration

Yu Wang, Jingyang Lin, Jingjing Zou and
Yingwei Pan, Ting Yao, Tao Mei

Keywords Paper

self-supervised learning, contrastive learning

0

0

0

0

12:26

06/12/2021

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov and
Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords Paper

deep learning, optimization, reinforcement learning and planning

0

0

0

0

12:35

06/12/2020

MOPO: Model-based Offline Policy Optimization

Tianhe (Kevin) Yu, Garrett Thomas, Lantao Yu and
Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Keywords Paper

0

0

0

0

3:30

13/04/2021

Finite-sample regret bound for distributionally robust offline tabular reinforcement learning

Zhengqing Zhou, Zhengyuan Zhou, Qinxun Bai and
Linhai Qiu, Jose Blanchet, Peter Glynn

Keywords Paper

0

0

0

0

3:02

06/12/2020

Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning

Nathan Kallus, Angela Zhou

Keywords Paper

0

0

0

0

4:51

06/12/2021

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, Murat Erdogdu

Keywords Paper

optimization, machine learning

0

0

0

0

10:21

06/12/2020

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Gen Li, Yuting Wei, Yuejie Chi and
Yuantao Gu, Yuxin Chen

Keywords Paper

0

0

0

0

3:09

26/04/2020

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson and
Christian Tjandraatmadja, Craig Boutilier

Keywords Paper

Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

0

0

0

0

5:36

06/12/2021

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Ming Yin, Yu Bai, Yu-Xiang Wang

Keywords Paper

theory, optimization, reinforcement learning and planning

0

0

0

0

8:57

06/12/2020

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

Keywords Paper

0

0

0

0

3:11

06/12/2020

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Keywords Paper

0

0

0

0

3:17

06/12/2021

An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap

Yuanhao Wang, Ruosong Wang, Sham Kakade

Keywords Paper

theory, reinforcement learning and planning, generative model

0

0

0

0

15:01

26/04/2020

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang

Keywords Paper

reinforcement learning, function approximation, lower bound, representation

0

0

0

0

4:55

09/07/2020

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

Alekh Agarwal, Sham Kakade, Jason Lee, Gaurav Mahajan

Keywords Paper

Reinforcement learning, Non-convex optimization

0

0

0

0

11:00

12/07/2020

Momentum-Based Policy Gradient Methods

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Keywords Paper

Reinforcement Learning - General

0

0

0

0

13:28

06/12/2021

Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD

Rémi Bardenet, Subhroshekhar Ghosh, Meixia LIN

Keywords Paper

optimization, machine learning

0

0

0

0

14:51

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

meta learning, bandits

0

0

0

0

14:58

06/12/2020

Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

Aaron Sonabend, Junwei Lu, Leo Anthony Celi and
Tianxi Cai, Peter Szolovits

Keywords Paper

0

0

0

0

3:15

06/12/2021

Consistent Non-Parametric Methods for Maximizing Robustness

Robi Bhattacharjee, Kamalika Chaudhuri

Keywords Paper

robustness, adversarial robustness and security

0

0

0

0

15:25

02/02/2021

Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation

Bo Pang, Zhong-Ping Jiang

Keywords Paper

0

0

0

0

20:01

18/07/2021

Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization

Wes Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:13

26/04/2020

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords Paper

Sample-efficient reinforcement learning, off-policy learning.

0

0

0

0

5:43

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords Paper

Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

0

0

0

0

5:03

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords Paper

optimization, reinforcement learning and planning, bandits

0

0

0

0

15:17

18/07/2021

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Anuj Mahajan, Mikayel Samvelyan, Lei Mao and
Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords Paper

Reinforcement Learning and Planning, Multi-Agent RL

0

0

0

0

5:16

18/07/2021

Is Pessimism Provably Efficient for Offline RL?

Ying Jin, Zhuoran Yang, Zhaoran Wang

Keywords Paper

Reinforcement Learning and Planning, Others

0

0

0

0

5:17

06/12/2021

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Paria Rashidinejad, Banghua Zhu, Cong Ma and
Jiantao Jiao, Stuart Russell

Keywords Paper

theory, reinforcement learning and planning, bandits

0

0

0

0

12:21

06/12/2020

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

Andrea Zanette, Alessandro Lazaric, Mykel J Kochenderfer, Emma Brunskill

Keywords Paper

0

0

0

0

3:11

12/07/2020

Representations for Stable Off-Policy Reinforcement Learning

Dibya Ghosh, Marc Bellemare

Keywords Paper

Reinforcement Learning - General

0

0

0

0

14:38

26/04/2020

Identifying through Flows for Recovering Latent Representations

Shen Li, Bryan Hooi, Gim Hee Lee

Keywords Paper

Representation learning, identifiable generative models, nonlinear-ICA

0

0

0

0

5:11

18/07/2021

OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

Jongmin Lee, Wonseok Jeon, Byung-Jun Lee and
Joelle Pineau, Kee-Eung Kim

Keywords Paper

Reinforcement Learning and Planning

1

0

0

1

5:15

06/12/2021

Multi-Label Learning with Pairwise Relevance Ordering

Ming-Kun Xie, Sheng-Jun Huang

Keywords Paper

machine learning

0

0

0

0

3:56

06/12/2021

Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations

Ayush Sekhari, Christoph Dann, Mehryar Mohri and
Yishay Mansour, Karthik Sridharan

Keywords Paper

theory, reinforcement learning and planning

0

0

0

0

11:22

03/05/2021

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Lanqing Li, Rui Yang, Dijun Luo

Keywords Paper

distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

1

0

0

0

6:21

06/12/2021

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

Keywords Paper

0

0

0

0

14:56

06/12/2020

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

Keywords Paper

0

0

0

0

3:13

06/12/2021

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

Dylan J Foster, Akshay Krishnamurthy

Keywords Paper

theory, reinforcement learning and planning, bandits, online learning

0

0

0

0

19:34