Loop Estimator for Discounted Values in Markov Reward Processes

02/02/2021

Loop Estimator for Discounted Values in Markov Reward Processes

Falcon Z. Dai, Matthew R. Walter

Keywords:

Abstract Paper Similar Papers

Abstract: At the working heart of policy iteration algorithms commonly used and studied in the discounted setting of reinforcement learning, the policy evaluation step estimates the value of states with samples from a Markov reward process induced by following a Markov policy in a Markov decision process. We propose a simple and efficient estimator called loop estimator that exploits the regenerative structure of Markov reward processes without explicitly estimating a full model. Our method enjoys a space complexity of O(1) when estimating the value of a single positive recurrent state s unlike TD with O(S) or model-based methods with O(S^2). Moreover, the regenerative structure enables us to show, without relying on the generative model approach, that the estimator has an instance-dependent convergence rate of O~(\sqrt{\tau_s/T}) over steps T on a single sample path, where \tau_s is the maximal expected hitting time to state s. In preliminary numerical experiments, the loop estimator outperforms model-free methods, such as TD(k), and is competitive with the model-based estimator.

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38949140

(Link will open in new window)

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at AAAI 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

13/04/2021

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang and
Zhaoran Wang, Mihailo Jovanovic

Keywords Paper

0

0

0

0

3:07

06/12/2021

Nearly Horizon-Free Offline Reinforcement Learning

Tongzheng Ren, Jialian Li, Bo Dai and
Simon Du, Sujay Sanghavi

Keywords Paper

theory, optimization, reinforcement learning and planning

0

0

0

0

8:44

06/12/2021

Online Robust Reinforcement Learning with Model Uncertainty

Yue Wang, Shaofeng Zou

Keywords Paper

reinforcement learning and planning, robustness

0

0

0

0

14:45

02/02/2021

Variance Penalized On-Policy and Off-Policy Actor-Critic

Arushi Jain, Gandharv Patil, Ayush Jain and
Khimya Khetarpal, Doina Precup

Keywords Paper

0

0

0

0

17:58

09/07/2020

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

Maksim Kaledin, Eric Moulines, Alexey Naumov and
Vladislav Tadic, Hoi-To Wai

Keywords Paper

Stochastic optimization, Reinforcement learning

0

0

0

0

12:29

06/12/2020

On Reward-Free Reinforcement Learning with Linear Function Approximation

Ruosong Wang, Simon Du, Lin Yang, Russ Salakhutdinov

Keywords Paper

0

0

0

0

3:12

12/07/2020

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

Yaqi Duan, Zeyu Jia, Mengdi Wang

Keywords Paper

Learning Theory

0

0

0

0

14:10

06/12/2020

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Gen Li, Yuting Wei, Yuejie Chi and
Yuantao Gu, Yuxin Chen

Keywords Paper

0

0

0

0

3:09

18/07/2021

Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

Kishan Panaganti, Dileep Kalathil

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

5:15

06/12/2021

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

Keywords Paper

deep learning, reinforcement learning and planning, representation learning

0

0

0

0

14:52

12/07/2020

Momentum-Based Policy Gradient Methods

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Keywords Paper

Reinforcement Learning - General

0

0

0

0

13:28

26/04/2020

Reanalysis of Variance Reduced Temporal Difference Learning

Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang

Keywords Paper

Reinforcement Learning, TD learning, Markovian sample, Variance Reduction

0

0

0

0

4:29

02/02/2021

An Efficient Algorithm for Deep Stochastic Contextual Bandits

Tan Zhu, Guannan Liang, Chunjiang Zhu and
Haining Li, Jinbo Bi

Keywords Paper

0

0

0

0

14:36

06/12/2021

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Keywords Paper

deep learning, reinforcement learning and planning

1

0

0

0

13:50

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

meta learning, bandits

0

0

0

0

14:58

06/12/2021

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Ming Yin, Yu Bai, Yu-Xiang Wang

Keywords Paper

theory, optimization, reinforcement learning and planning

0

0

0

0

8:57

13/04/2021

Sample complexity bounds for two timescale value-based reinforcement learning algorithms

Tengyu Xu, Yingbin Liang

Keywords Paper

0

0

0

0

2:57

06/12/2021

Slice Sampling Reparameterization Gradients

David M Zoltowski, Diana Cai, Ryan Adams

Keywords Paper

optimization, machine learning, generative model

0

0

0

0

14:43

06/12/2021

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Keywords Paper

reinforcement learning and planning, generative model

0

0

0

0

13:11

18/07/2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

4:23

26/04/2020

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords Paper

Sample-efficient reinforcement learning, off-policy learning.

0

0

0

0

5:43

18/07/2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords Paper

Algorithms, Multitask and Transfer Learning, Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks;, Theory, RL, Decisions and Control Theory

0

0

0

0

4:49

12/07/2020

Learning with Good Feature Representations in Bandits and in RL with a Generative Model

Gellért Weisz, Tor Lattimore, Csaba Szepesvari

Keywords Paper

Reinforcement Learning - Theory

0

0

0

0

15:20

06/12/2020

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Dheeraj Nagaraj, Xian Wu, Guy Bresler and
Prateek Jain, Praneeth Netrapalli

Keywords Paper

0

0

0

0

3:34

02/02/2021

Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation

Bo Pang, Zhong-Ping Jiang

Keywords Paper

0

0

0

0

20:01

06/12/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du and
Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords Paper

theory, reinforcement learning and planning

0

0

0

0

13:47

18/07/2021

Temporal Difference Learning as Gradient Splitting

Rui Liu, Alex Olshevsky

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

15:21

09/07/2020

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

Alekh Agarwal, Sham Kakade, Jason Lee, Gaurav Mahajan

Keywords Paper

Reinforcement learning, Non-convex optimization

0

0

0

0

11:00

06/12/2020

Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

Qiwen Cui, Lin Yang

Keywords Paper

Algorithms -> Semi-Supervised Learning; Deep Learning -> Deep Autoencoders; Deep Learning -> Generative Models, Probabilistic Methods -> Variational Inference

0

0

0

0

3:25

06/12/2020

Robust Multi-Agent Reinforcement Learning with Model Uncertainty

Kaiqing Zhang, TAO SUN, Yunzhe Tao and
Sahika Genc, Sunil Mallya, Tamer Basar

Keywords Paper

0

0

0

0

3:11

06/12/2020

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang and
Mengdi Wang, Michael Jordan

Keywords Paper

0

0

0

0

3:42

06/12/2021

Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation

Yue Wang, Shaofeng Zou, Yi Zhou

Keywords Paper

reinforcement learning and planning

0

0

0

0

14:28

06/12/2020

Deep Inverse Q-learning with Constraints

Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

Keywords Paper

0

0

0

0

3:14

18/07/2021

Model-based Reinforcement Learning for Continuous Control with Posterior Sampling

Ying Fan, Yifei Ming

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

18:34

02/02/2021

Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

Long Yang, Qian Zheng, Gang Pan

Keywords Paper

0

0

0

0

14:41

06/12/2021

Identifiability in inverse reinforcement learning

Haoyang Cao, Samuel Cohen, Lukasz Szpruch

Keywords Paper

reinforcement learning and planning

0

0

0

0

15:07

06/12/2020

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Kaiqing Zhang, Sham Kakade, Tamer Basar, Lin Yang

Keywords Paper

0

0

0

0

3:25

18/07/2021

Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs

Weichao Mao, Kaiqing Zhang, Ruihao Zhu and
David Simchi-Levi, Tamer Basar

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

4:12

04/08/2021

Adaptivity in Adaptive Submodularity

Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni

Keywords Paper

0

0

0

0

13:54

06/12/2020

Preference-based Reinforcement Learning with Finite-Time Guarantees

Yichong Xu, Ruosong Wang, Lin Yang and
Aarti Singh, Artur Dubrawski

Keywords Paper

0

0

0

0

3:04