Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

Abstract: In this paper, we provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on Importance Sampling. In particular, we show that the algorithm converges to a global optimal policy with a sample complexity of $\mathcal{O}(\epsilon^{-3}\log^2(1/\epsilon))$ under an appropriate choice of stepsizes. In order to overcome the issue of large variance due to Importance Sampling, we propose the $Q$-trace algorithm for the critic, which is inspired by the V-trace algorithm (Espeholt et al., 2018). This enables us to explicitly control the bias and variance, and characterize the trade-off between them. As an advantage of off-policy sampling, a major feature of our result is that we do not need any additional assumptions, beyond the ergodicity of the Markov chain induced by the behavior policy.

06/12/2021

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

sajad khodadadian, Zaiwei Chen, Siva Maguluri

Comments

Similar Papers

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Abstract Paper

meta learning, bandits

Universal guarantees for decision tree induction via a higher-order splitting criterion

Guy Blanc, Neha Gupta, Jane Lange, Li-Yang Tan

Keywords Abstract Paper

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Towards a Unified Information-Theoretic Framework for Generalization

Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Dan Roy

Keywords Abstract Paper

graph learning

Slice Sampling Reparameterization Gradients

David M Zoltowski, Diana Cai, Ryan Adams

Keywords Abstract Paper

optimization, machine learning, generative model

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Keywords Abstract Paper

deep learning, reinforcement learning and planning

vqSGD: Vector quantized stochastic gradient descent

Venkata Gandikota, Daniel Kane, Raj Kumar Maity, Arya Mazumdar

Keywords Abstract Paper

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Wei Deng, Guang Lin, Faming Liang

Keywords Abstract Paper

Ordering Variables for Weighted Model Integration

Vincent Derkinderen, Evert Heylen, Pedro Zuidberg Dos Martires and Samuel Kolb, Luc Raedt

Keywords Abstract Paper

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang and Zhaoran Wang, Mihailo Jovanovic

Keywords Abstract Paper

Generalizing Complex Hypotheses on Product Distributions: Auctions, Prophet Inequalities, and Pandora's Problem

Chenghao Guo, Zhiyi Huang, Zhihao Gavin Tang, Xinzhi Zhang

Keywords Abstract Paper

Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes

Dongsheng Ding, Kaiqing Zhang, Tamer Basar, Mihailo Jovanovic

Keywords Abstract Paper

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

Dylan J Foster, Akshay Krishnamurthy

Keywords Abstract Paper

theory, reinforcement learning and planning, bandits, online learning

Lenient Regret and Good-Action Identification in Gaussian Process Bandits

Xu Cai, Selwyn Gomes, Jonathan Scarlett

Keywords Abstract Paper

Probabilistic Methods, Gaussian Processes and Bayesian non-parametrics

Consistent Structured Prediction with Max-Min Margin Markov Networks

Alex Nowak, Francis Bach, Alessandro Rudi

Keywords Abstract Paper

Sequential, Network, and Time-Series Modeling

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang and Mengdi Wang, Michael Jordan

Keywords Abstract Paper

First-Order Methods for Wasserstein Distributionally Robust MDP

Julien Grand-Clement, Christian Kroer

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Fair regression with Wasserstein barycenters

Evgenii Chzhen, Christophe Denis, Mohamed Hebiri and Luca Oneto, Massimiliano Pontil

Keywords Abstract Paper

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

Alekh Agarwal, Sham Kakade, Jason Lee, Gaurav Mahajan

Keywords Abstract Paper

Reinforcement learning, Non-convex optimization

Variational Bayesian Optimistic Sampling

Brendan O'Donoghue, Tor Lattimore

Keywords Abstract Paper

optimization, reinforcement learning and planning, generative model, bandits, online learning

Online Robust Reinforcement Learning with Model Uncertainty

Yue Wang, Shaofeng Zou

Keywords Abstract Paper

reinforcement learning and planning, robustness

An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Vincent Derkinderen, Evert Heylen, Pedro Zuidberg Dos Martires and
Samuel Kolb, Luc Raedt

Keywords Paper

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang and
Zhaoran Wang, Mihailo Jovanovic

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Zhuoran Yang, Chi Jin, Zhaoran Wang and
Mengdi Wang, Michael Jordan

Keywords Paper

Keywords Paper

Evgenii Chzhen, Christophe Denis, Mohamed Hebiri and
Luca Oneto, Massimiliano Pontil

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Saeed Soori, Konstantin Mishchenko, Aryan Mokhtari and
Maryam Mehri Dehnavi, Mert Gurbuzbalaban

Keywords Paper

Jongmin Lee, Wonseok Jeon, Byung-Jun Lee and
Joelle Pineau, Kee-Eung Kim

Keywords Paper

Shengjia Zhao, Michael Kim, Roshni Sahoo and
Tengyu Ma, Stefano Ermon

Keywords Paper

Gen Li, Laixi Shi, Yuxin Chen and
Yuantao Gu, Yuejie Chi

Keywords Paper

Keywords Paper

Difan Zou, Jingfeng Wu, Vladimir Braverman and
Quanquan Gu, Sham Kakade

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper