12/07/2020

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

Yaqi Duan, Zeyu Jia, Mengdi Wang

Keywords: Learning Theory

Abstract: This paper studies the statistical theory of batch data reinforcement learning with function approximation. Consider the off-policy evaluation problem: estimating the cumulative value of a new target policy from logged history generated by unknown behavior policies. We study a regression-based fitted Q iteration method and show that it is equivalent to a model-based method that estimates a conditional mean embedding of the transition operator. We prove that this method is information-theoretically optimal and has nearly minimal estimation error. In particular, by leveraging the contraction property of Markov processes and martingale concentration, we establish a finite-sample, instance-dependent error upper bound and a nearly matching minimax lower bound. The policy evaluation error depends sharply on a restricted chi-square divergence, taken over the function class, between the long-term distribution of the target policy and the distribution of the past data. This restricted chi-square divergence is both instance-dependent and function-class-dependent, and it characterizes the statistical limit of off-policy evaluation. Further, we provide an easily computable confidence bound for the policy evaluator, which may be useful for optimistic planning and safe policy improvement.
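
The following is a minimal sketch of regression-based fitted Q evaluation with linear function approximation, in the spirit of the method the abstract describes. The feature map, dataset format, and target policy representation below are illustrative assumptions for a tabular toy setting, not the authors' code or experimental setup.

```python
import numpy as np

def fqe_linear(transitions, features, target_policy,
               gamma=0.9, n_iters=100, reg=1e-3):
    """Fitted Q evaluation (FQE) with linear features.

    transitions:   list of (s, a, r, s_next) tuples logged by behavior policies.
    features:      array of shape (n_states, n_actions, d), the feature map phi(s, a).
    target_policy: array of shape (n_states, n_actions), pi(a | s).
    Returns a weight vector w such that Q_pi(s, a) is approximated by phi(s, a)^T w.
    """
    d = features.shape[-1]
    w = np.zeros(d)

    # Design matrix and (ridge-regularized) Gram matrix are fixed across iterations.
    phi = np.array([features[s, a] for (s, a, _, _) in transitions])  # shape (n, d)
    gram = phi.T @ phi + reg * np.eye(d)

    for _ in range(n_iters):
        # Regression targets: r + gamma * E_{a' ~ pi(.|s')}[ phi(s', a')^T w ]
        targets = np.array([
            r + gamma * target_policy[s_next] @ (features[s_next] @ w)
            for (_, _, r, s_next) in transitions
        ])
        # Least-squares update of the linear Q-function parameters.
        w = np.linalg.solve(gram, phi.T @ targets)
    return w
```

Given the fitted weights, the policy value would be estimated by averaging phi(s0, a0)^T w over the initial state distribution and a0 drawn from the target policy; the paper's confidence bound is built around this same linear estimator.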

Published at ICML 2020 (virtual conference).
