04/08/2021

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation

Andrea Zanette, Ching-An Cheng, Alekh Agarwal

Keywords:

Abstract: Policy optimization methods are popular reinforcement learning (RL) algorithms, because their incremental and on-policy nature makes them more stable than their value-based counterparts. However, the same properties also make them slow to converge and sample inefficient: the on-policy nature is not amenable to data reuse, and the incremental updates result in a large iteration complexity before a near-optimal policy is discovered. These empirical findings are mirrored in theory in the recent work of~\citet{agarwal2020pc}, which provides a policy optimization method, PC-PG, that provably finds near-optimal policies in the linear MDP model and is robust to model misspecification, but suffers from an extremely poor sample complexity compared with value-based techniques. In this paper, we propose a new algorithm, COPOE, that overcomes the poor sample complexity of PC-PG while retaining the robustness to model misspecification. COPOE makes several important algorithmic enhancements to PC-PG, such as enabling data reuse, and uses more refined analysis techniques, which we expect to be more broadly applicable to the design of new RL algorithms. The result is an improvement in sample complexity from $\widetilde{O}(1/\epsilon^{11})$ for PC-PG to $\widetilde{O}(1/\epsilon^3)$ for COPOE, nearly bridging the gap with value-based techniques.
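
For readers less familiar with the setting, the sketch below is a minimal, generic on-policy policy-gradient step (a REINFORCE-style update with a softmax policy that is linear in the features), run on an invented toy MDP. It is not COPOE or PC-PG; it only illustrates why a purely on-policy scheme discards its rollouts after every incremental update, which is the inefficiency that COPOE's data reuse targets. All names, dimensions, and constants are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): a small random MDP with a
# feature map phi(s, a), used to illustrate one generic on-policy
# policy-gradient update with linear function approximation.
rng = np.random.default_rng(0)
n_states, n_actions, d, horizon = 5, 3, 4, 10

phi = rng.normal(size=(n_states, n_actions, d))                    # features phi(s, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # transition kernel
r = rng.uniform(size=(n_states, n_actions))                        # rewards in [0, 1]

def policy(theta, s):
    """Softmax policy whose logits are linear in the features."""
    logits = phi[s] @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

def sample_episode(theta):
    """Roll out one on-policy episode; return (state, action, reward-to-go) triples."""
    s, traj, rewards = 0, [], []
    for _ in range(horizon):
        a = rng.choice(n_actions, p=policy(theta, s))
        traj.append((s, a))
        rewards.append(r[s, a])
        s = rng.choice(n_states, p=P[s, a])
    rtg = np.cumsum(rewards[::-1])[::-1]          # reward-to-go at each step
    return [(s, a, g) for (s, a), g in zip(traj, rtg)]

def reinforce_step(theta, lr=0.1, n_episodes=20):
    """One incremental update; the fresh rollouts are used once and then discarded."""
    grad = np.zeros(d)
    for _ in range(n_episodes):
        for s, a, g in sample_episode(theta):
            pi = policy(theta, s)
            # grad log pi(a|s) = phi(s, a) - E_{a'~pi}[phi(s, a')]
            grad += g * (phi[s, a] - pi @ phi[s])
    return theta + lr * grad / n_episodes

theta = np.zeros(d)
for _ in range(50):
    theta = reinforce_step(theta)   # every iteration needs new on-policy samples
```

In this on-policy template, every policy update requires a fresh batch of rollouts, so the total number of samples scales with the number of incremental updates; enabling data reuse across updates is one of the algorithmic changes the abstract credits for COPOE's improved sample complexity.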

The talk and paper were presented at the COLT 2021 virtual conference.
