Average-Reward Reinforcement Learning with Trust Region Methods

Abstract: Most of reinforcement learning algorithms optimize the discounted criterion which is beneficial to accelerate the convergence and reduce the variance of estimates. Although the discounted criterion is appropriate for certain tasks such as financial related problems, many engineering problems treat future rewards equally and prefer a long-run average criterion. In this paper, we study the reinforcement learning problem with the long-run average criterion. Firstly, we develop a unified trust region theory with discounted and average criteria. With the average criterion, a novel performance bound within the trust region is derived with the Perturbation Analysis (PA) theory. Secondly, we propose a practical algorithm named Average Policy Optimization (APO), which improves the value estimation with a novel technique named Average Value Constraint. To the best of our knowledge, our work is the first one to study the trust region approach with the average criterion and it complements the framework of reinforcement learning beyond the discounted criterion. Finally, experiments are conducted in the continuous control environment MuJoCo. In most tasks, APO performs better than the discounted PPO, which demonstrates the effectiveness of our approach.

12/07/2020

Average-Reward Reinforcement Learning with Trust Region Methods

Xiaoteng Ma, Xiaohang Tang, Li Xia, Jun Yang, Qianchuan Zhao

Comments

Similar Papers

Tightening Exploration in Upper Confidence Reinforcement Learning

Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

Keywords Abstract Paper

Reinforcement Learning - General

Responsive Safety in Reinforcement Learning

Adam Stooke, Joshua Achiam, Pieter Abbeel

Keywords Abstract Paper

Reinforcement Learning - Deep RL

Implementation Matters in Deep RL: A Case Study on PPO and TRPO

Logan Engstrom, Andrew Ilyas, Shibani Santurkar and Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

Keywords Abstract Paper

deep policy gradient methods, deep reinforcement learning, trpo, ppo

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Learning Fair Policies in Multi-Objective (Deep) Reinforcement Learning with Average and Discounted Rewards

Umer Siddique, Paul Weng, Matthieu Zimmer

Keywords Abstract Paper

Reinforcement Learning - Deep RL

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords Abstract Paper

Sample-efficient reinforcement learning, off-policy learning.

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

Keywords Abstract Paper

Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks

Yuqian Jiang, Suda Bharadwaj, Bo Wu and Rishi Shah, Ufuk Topcu, Peter Stone

Keywords Abstract Paper

On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

Yiming Zhang, Keith Ross

Keywords Abstract Paper

Reinforcement Learning and Planning

Independence-aware Advantage Estimation

Pushi Zhang, Li Zhao, Guoqing Liu and Jiang Bian, Minlie Huang, Tao Qin, Tie-Yan Liu

Keywords Abstract Paper

Machine Learning, Reinforcement Learning, Deep Reinforcement Learning

Task-Robust Model-Agnostic Meta-Learning

Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

Keywords Abstract Paper

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson and Christian Tjandraatmadja, Craig Boutilier

Keywords Abstract Paper

Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

Sample Efficient Reinforcement Learning with REINFORCE

Junzi Zhang, Jongho Kim, Brendan O'Donoghue, Stephen Boyd

Keywords Abstract Paper

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan and David Hsu, Nanning Zheng

Keywords Abstract Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

Reinforcement Learning for Route Optimization with Robustness Guarantees

Tobias Jacobs, Francesco Alesiani, Gulcin Ermis

Keywords Abstract Paper

Machine Learning, Deep Reinforcement Learning, Planning under Uncertainty, Applications of Reinforcement Learning

Logistic q-learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

Keywords Abstract Paper

Explicable Reward Design for Reinforcement Learning Agents

Rati Devidze, Goran Radanovic, Parameswaran Kamalaruban, Adish Singla

Keywords Abstract Paper

optimization, reinforcement learning and planning, interpretability

Variance Penalized On-Policy and Off-Policy Actor-Critic

Arushi Jain, Gandharv Patil, Ayush Jain and Khimya Khetarpal, Doina Precup

Keywords Abstract Paper

Reinforcement Learning Based Multi-Agent Resilient Control: From Deep Neural Networks to an Adaptive Law

Jian Hou, Fangyuan Wang, Lili Wang, Zhiyong Chen

Keywords Abstract Paper

Tactical Optimism and Pessimism for Deep Reinforcement Learning

Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano and Michael Arbel, Michael Jordan

Keywords Abstract Paper

reinforcement learning and planning, bandits

Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning

Xin Zhang, Zhuqing Liu, Jia Liu and Zhengyuan Zhu, Songtao Lu

Keywords Abstract Paper

Keywords Paper

Keywords Paper

Logan Engstrom, Andrew Ilyas, Shibani Santurkar and
Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Yuqian Jiang, Suda Bharadwaj, Bo Wu and
Rishi Shah, Ufuk Topcu, Peter Stone

Keywords Paper

Keywords Paper

Pushi Zhang, Li Zhao, Guoqing Liu and
Jiang Bian, Minlie Huang, Tao Qin, Tie-Yan Liu

Keywords Paper

Keywords Paper

Moonkyung Ryu, Yinlam Chow, Ross Anderson and
Christian Tjandraatmadja, Craig Boutilier

Keywords Paper

Keywords Paper

Hanbo Zhang, Site Bai, Xuguang Lan and
David Hsu, Nanning Zheng

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Arushi Jain, Gandharv Patil, Ayush Jain and
Khimya Khetarpal, Doina Precup

Keywords Paper

Keywords Paper

Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano and
Michael Arbel, Michael Jordan

Keywords Paper

Xin Zhang, Zhuqing Liu, Jia Liu and
Zhengyuan Zhu, Songtao Lu

Keywords Paper

Wesley Suttle, Kaiqing Zhang, Zhuoran Yang and
Ji Liu, David N Kraemer

Keywords Paper

Keywords Paper

David Lindner, Matteo Turchetta, Sebastian Tschiatschek and
Kamil Ciosek, Andreas Krause

Keywords Paper

Guy Lorberbom, Chris J. Maddison, Nicolas Heess and
Tamir Hazan, Daniel Tarlow

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Somdeb Majumdar, Shauharda Khadka, Santiago Miret and
Stephen Mcaleer, Kagan Tumer

Keywords Paper

Keywords Paper

Keywords Paper

Gen Li, Yuting Wei, Yuejie Chi and
Yuantao Gu, Yuxin Chen

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper