Learning and Planning in Average-Reward Markov Decision Processes

Abstract: We introduce learning and planning algorithms for average-reward MDPs, including 1) the first general proven-convergent off-policy model-free control algorithm without reference states, 2) the first proven-convergent off-policy model-free prediction algorithm, and 3) the first off-policy learning algorithm that converges to the actual value function rather than to the value function plus an offset. All of our algorithms are based on using the temporal-difference error rather than the conventional error when updating the estimate of the average reward. Our proof techniques are a slight generalization of those by Abounadi, Bertsekas, and Borkar (2001). In experiments with an Access-Control Queuing Task, we show some of the difficulties that can arise when using methods that rely on reference states and argue that our new algorithms are significantly easier to use.

12/07/2020

Learning and Planning in Average-Reward Markov Decision Processes

Yi Wan, Abhishek Naik, Richard Sutton

Comments

Similar Papers

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Chen-Yu Wei, Mehdi Jafarnia, Haipeng Luo and Hiteshi Sharma, Rahul Jain

Keywords Abstract Paper

Reinforcement Learning - Theory

PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration

Yuda Song, Wen Sun

Keywords Abstract Paper

Reinforcement Learning and Planning

EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Gu

Keywords Abstract Paper

Reinforcement Learning and Planning

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

Yifang Chen, Simon Du, Kevin Jamieson

Keywords Abstract Paper

, Optimization, Non-Convex Optimization, Theory, Online Learning Theory

Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning

Yiqin Yang, Xiaoteng Ma, Li Chenghao and Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, Qianchuan Zhao

Keywords Abstract Paper

reinforcement learning and planning

Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation

Nathan Kallus, Masatoshi Uehara

Keywords Abstract Paper

Reinforcement Learning - Theory

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords Abstract Paper

Sample-efficient reinforcement learning, off-policy learning.

Adaptive approximate policy iteration

Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori and Pooria Joulani, Csaba Szepesvari

Keywords Abstract Paper

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

Andrea Zanette, Alessandro Lazaric, Mykel J Kochenderfer, Emma Brunskill

Keywords Abstract Paper

MOPO: Model-based Offline Policy Optimization

Tianhe (Kevin) Yu, Garrett Thomas, Lantao Yu and Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Keywords Abstract Paper

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

Keywords Abstract Paper

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Keywords Abstract Paper

reinforcement learning and planning, generative model

A Local Temporal Difference Code for Distributional Reinforcement Learning

Pablo Tano, Peter Dayan, Alexandre Pouget

Keywords Abstract Paper

A Distribution-dependent Analysis of Meta Learning

Mikhail Konobeev, Ilja Kuzborskij, Csaba Szepesvari

Keywords Abstract Paper

Theory, Statistical Learning Theory

Learning the truth from only one side of the story

Heinrich Jiang, Qijia Jiang, Aldo Pacchiano

Keywords Abstract Paper

Query Training: Learning a Worse Model to Infer Better Marginals in Undirected Graphical Models with Hidden Variables

Miguel Lázaro-Gredilla, Wolfgang Lehrach, Nishad Gothoskar and Guangyao Zhou, Antoine Dedieu, Dileep George

Keywords Abstract Paper

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Paria Rashidinejad, Banghua Zhu, Cong Ma and Jiantao Jiao, Stuart Russell

Keywords Abstract Paper

theory, reinforcement learning and planning, bandits

OpenMatch: Open-Set Semi-supervised Learning with Open-set Consistency Regularization

Kuniaki Saito, Donghyun Kim, Kate Saenko

Keywords Abstract Paper

semi-supervised learning

Latent Bandits Revisited

Joey Hong, Branislav Kveton, Manzil Zaheer and Yinlam Chow, Amr Ahmed, Craig Boutilier

Keywords Abstract Paper

Differentiable Meta-Learning of Bandit Policies

Craig Boutilier, Chih-wei Hsu, Branislav Kveton and Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

Keywords Abstract Paper

Structured Prediction with Partial Labelling through the Infimum Loss

Vivien Cabannnes, Francis Bach, Alessandro Rudi

Keywords Abstract Paper

Sequential, Network, and Time-Series Modeling

Adaptive Discretization for Model-Based Reinforcement Learning

Chen-Yu Wei, Mehdi Jafarnia, Haipeng Luo and
Hiteshi Sharma, Rahul Jain

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Yiqin Yang, Xiaoteng Ma, Li Chenghao and
Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, Qianchuan Zhao

Keywords Paper

Keywords Paper

Keywords Paper

Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori and
Pooria Joulani, Csaba Szepesvari

Keywords Paper

Keywords Paper

Tianhe (Kevin) Yu, Garrett Thomas, Lantao Yu and
Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Miguel Lázaro-Gredilla, Wolfgang Lehrach, Nishad Gothoskar and
Guangyao Zhou, Antoine Dedieu, Dileep George

Keywords Paper

Paria Rashidinejad, Banghua Zhu, Cong Ma and
Jiantao Jiao, Stuart Russell

Keywords Paper

Keywords Paper

Joey Hong, Branislav Kveton, Manzil Zaheer and
Yinlam Chow, Amr Ahmed, Craig Boutilier

Keywords Paper

Craig Boutilier, Chih-wei Hsu, Branislav Kveton and
Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

Keywords Paper

Keywords Paper

Sean Sinclair, Tianyu Wang, Gauri Jain and
Sid Banerjee, Christina Yu

Keywords Paper

Keywords Paper

Qing Li, Siyuan Huang, Yining Hong and
Yixin Chen, Ying Nian Wu, Song-Chun Zhu

Keywords Paper

Fei Feng, Ruosong Wang, Wotao Yin and
Simon Du, Lin Yang

Keywords Paper

Yeong-Dae Kwon, Jinho Choo, Byoungjip Kim and
Iljoo Yoon, Youngjune Gwon, Seungjai Min

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Sungryull Sohn, Sungtae Lee, Jongwook Choi and
Harm van Seijen, Mehdi Fatemi, Honglak Lee

Keywords Paper

Keywords Paper

Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano and
Michael Arbel, Michael Jordan

Keywords Paper

Keywords Paper

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

Andreea-Ioana Deac, Petar Veličković, Ognjen Milinkovic and
Pierre-Luc Bacon, Jian Tang, Mladen Nikolic

Keywords Paper