Variational Policy Gradient Method for Reinforcement Learning with General Utilities

Abstract: In recent years, reinforcement learning systems with general goals beyond a cumulative sum of rewards have gained traction, for example in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov Decision Problems where the objective is a general utility function of the state-action occupancy measure, a setting that subsumes several of the aforementioned examples as special cases. Such generality invalidates the Bellman equation, so dynamic programming no longer applies; we therefore focus on direct policy search. Analogously to the Policy Gradient Theorem \cite{sutton2000policy} available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function. We develop a variational Monte Carlo gradient estimation algorithm that computes the policy gradient from sample paths. Further, we prove that the variational policy gradient scheme converges globally to the optimal policy for the general objective, and we establish a rate of convergence that matches or improves upon the rate available for RL with cumulative rewards.
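The identity behind this reduction can be checked on a small example. By the chain rule, the gradient of F(λ(θ)) equals the ordinary policy gradient for the cumulative reward r = ∇_λ F(λ(θ)), with r held fixed at the current occupancy measure. The paper obtains this gradient from sample paths through a saddle point problem involving the Fenchel dual of F, and the sketch below does not implement that estimator. It is a minimal Python/NumPy illustration assuming a known tabular MDP (a random transition tensor P and a uniform initial distribution mu), a softmax policy parameterized by theta, and the concave entropy utility F(λ) = -Σ λ log λ, chosen only for illustration. All function names are hypothetical. The sketch compares the exact policy gradient with pseudo-reward ∇F against a finite-difference gradient of F(λ(θ)).

```python
import numpy as np

def softmax_policy(theta):
    # pi[s, a] = exp(theta[s, a]) / sum_b exp(theta[s, b]), computed stably
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def occupancy(theta, P, mu, gamma):
    # Discounted occupancy lambda(s, a) = sum_t gamma^t Pr(s_t = s, a_t = a)
    pi = softmax_policy(theta)
    P_pi = np.einsum("sa,sat->st", pi, P)  # state transition matrix under pi
    # State occupancy d solves d = mu + gamma * P_pi^T d
    d = np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi.T, mu)
    return d[:, None] * pi, d, pi

def entropy_utility(lam):
    # Illustrative concave utility F(lambda) = -sum lambda log lambda
    return -np.sum(lam * np.log(lam))

def entropy_grad(lam):
    # dF/dlambda(s, a) = -(log lambda(s, a) + 1)
    return -(np.log(lam) + 1.0)

def policy_gradient(theta, P, mu, gamma, r):
    # Exact softmax policy gradient of <r, lambda(theta)> for a FIXED reward r
    _, d, pi = occupancy(theta, P, mu, gamma)
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.sum(pi * r, axis=1)
    V = np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi, r_pi)
    Q = r + gamma * np.einsum("sat,t->sa", P, V)
    return d[:, None] * pi * (Q - V[:, None])  # d(s) pi(a|s) A(s, a)

def utility_gradient(theta, P, mu, gamma):
    # Chain rule: freeze the pseudo-reward r = grad F at the current lambda
    lam, _, _ = occupancy(theta, P, mu, gamma)
    return policy_gradient(theta, P, mu, gamma, entropy_grad(lam))

rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
mu = np.full(S, 1.0 / S)
theta = rng.normal(size=(S, A))

def objective(th):
    return entropy_utility(occupancy(th, P, mu, gamma)[0])

g = utility_gradient(theta, P, mu, gamma)

# Central finite differences of F(lambda(theta)) for comparison
eps = 1e-6
fd = np.zeros_like(theta)
for idx in np.ndindex(theta.shape):
    step = np.zeros_like(theta)
    step[idx] = eps
    fd[idx] = (objective(theta + step) - objective(theta - step)) / (2 * eps)

err = np.max(np.abs(g - fd))
print(f"max |analytic - finite difference| = {err:.2e}")
```

The two gradients agree up to finite-difference error, which is the sense in which a general utility reduces, locally, to a cumulative-reward problem with reward ∇F(λ(θ)). In the sample-based setting λ(θ) is unknown, which is why the abstract's approach recovers the gradient variationally rather than evaluating ∇F directly as done here.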

06/12/2021

Junyu Zhang, Alec Koppel, Amrit Bedi, Csaba Szepesvari, Mengdi Wang

Similar Papers

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvari, Mengdi Wang

Keywords: theory, reinforcement learning and planning

Logistic Q-Learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow

Geometric Exploration for Online Control

Orestis Plevrakis, Elad Hazan

Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning

Xin Zhang, Zhuqing Liu, Jia Liu, Zhengyuan Zhu, Songtao Lu

Keywords: theory, optimization, reinforcement learning and planning

Reward is enough for convex MDPs

Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

Keywords: reinforcement learning and planning

Adaptive Submodular Maximization under Stochastic Item Costs

Srinivasan Parthasarathy

Keywords: Combinatorial optimization, Approximation algorithms, Non-convex optimization, Stochastic optimization

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Keywords: Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

Settling the Variance of Multi-Agent Policy Gradients

Jakub Grudzien Kuba, Muning Wen, Linghui Meng, Shangding Gu, Haifeng Zhang, David Mguni, Jun Wang, Yaodong Yang

Keywords: deep learning, reinforcement learning and planning

Risk-Aware Transfer in Reinforcement Learning using Successor Features

Michael Gimelfarb, Andre Barreto, Scott Sanner, Chi-Guhn Lee

Keywords: reinforcement learning and planning, representation learning, transfer learning

Explicable Reward Design for Reinforcement Learning Agents

Rati Devidze, Goran Radanovic, Parameswaran Kamalaruban, Adish Singla

Keywords: optimization, reinforcement learning and planning, interpretability

The Wasserstein Proximal Gradient Algorithm

Adil Salim, Anna Korba, Giulia Luise

On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

Yiming Zhang, Keith Ross

Keywords: Reinforcement Learning and Planning

Variational Model-based Policy Optimization

Yinlam Chow, Brandon Cui, Moonkyung Ryu, Mohammad Ghavamzadeh

Keywords: Machine Learning, Reinforcement Learning

A Reduction from Reinforcement Learning to No-Regret Online Learning

Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords: theory, optimization, reinforcement learning and planning, active learning

Structured Policy Iteration for Linear Quadratic Regulator

Youngsuk Park, Ryan Rossi, Zheng Wen, Gang Wu, Handong Zhao

Keywords: Reinforcement Learning - General

Reinforcement learning in parametric MDPs with exponential families

Sayak Ray Chowdhury, Aditya Gopalan, Odalric-Ambrym Maillard

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

On the Global Convergence Rates of Softmax Policy Gradient Methods

Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

Keywords: Reinforcement Learning - Theory

Variance-Aware Off-Policy Evaluation with Linear Function Approximation

Yifei Min, Tianhao Wang, Dongruo Zhou, Quanquan Gu
