Reinforcement learning for constrained Markov decision processes

13/04/2021

Ather Gattami, Qinbo Bai, Vaneet Aggarwal

Abstract: In this paper, we consider the problem of optimization and learning for constrained and multi-objective Markov decision processes, for both discounted rewards and expected average rewards. We formulate the problems as zero-sum games where one player (the agent) solves a Markov decision problem and its opponent solves a bandit optimization problem, which we here call Markov-Bandit games. We extend Q-learning to solve Markov-Bandit games and show that our new Q-learning algorithms converge to the optimal solutions of the zero-sum Markov-Bandit games, and hence converge to the optimal solutions of the constrained and multi-objective Markov decision problems. We provide numerical examples where we calculate the optimal policies and show by simulations that the algorithm converges to the calculated optimal policies. To the best of our knowledge, this is the first time Q-learning algorithms guarantee convergence to optimal stationary policies for the multi-objective Reinforcement Learning problem with discounted and expected average rewards, respectively.
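
The abstract's validation pattern (calculate the optimal policy, then check by simulation that Q-learning converges to it) can be illustrated with a minimal sketch. This is standard tabular Q-learning on a single-objective discounted MDP, not the Markov-Bandit algorithm of the paper. The two-state MDP, the names `NEXT`, `REWARD`, `GAMMA`, and the step-size and exploration choices are all hypothetical assumptions made for illustration.

```python
import random

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions,
# deterministic transitions NEXT[s][a] and rewards REWARD[s][a].
NEXT = [[0, 1], [0, 1]]
REWARD = [[0.0, 1.0], [2.0, 0.0]]
GAMMA = 0.9  # discount factor (assumed value)


def value_iteration(tol=1e-12):
    """Calculate Q* by iterating the Bellman optimality operator."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    while True:
        new_q = [[REWARD[s][a] + GAMMA * max(q[NEXT[s][a]]) for a in range(2)]
                 for s in range(2)]
        diff = max(abs(new_q[s][a] - q[s][a])
                   for s in range(2) for a in range(2))
        q = new_q
        if diff < tol:
            return q


def q_learning(steps=20000, alpha=0.5, seed=0):
    """Tabular Q-learning driven by a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)  # explore uniformly at random
        s_next = NEXT[s][a]
        target = REWARD[s][a] + GAMMA * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])  # temporal-difference update
        s = s_next
    return q


def greedy(q):
    """Return the greedy action in each state for a Q-table."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(2)]


if __name__ == "__main__":
    q_star = value_iteration()
    q_hat = q_learning()
    print("calculated policy:", greedy(q_star))
    print("learned policy:   ", greedy(q_hat))
```

In this toy problem both policies pick action 1 in state 0 and action 0 in state 1, so the simulated learner agrees with the calculated optimum. Handling constraints or several objectives, as the paper does, would require the game formulation described in the abstract, which this sketch does not attempt.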

The talk and the corresponding paper were published at the AISTATS 2021 virtual conference.

Similar Papers

26/04/2020

Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games

Zuyue Fu, Zhuoran Yang, Yongxin Chen, Zhaoran Wang

5:09

06/12/2020

The Mean-Squared Error of Double Q-Learning

Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant

3:24

18/07/2021

Learning While Playing in Mean-Field Games: Convergence and Optimality

Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca

Keywords: Applications, Privacy, Anonymity, and Security, Algorithms, Components Analysis (e.g., CCA, ICA, LDA, PCA), Reinforcement Learning and Planning, Multi-Agent RL

5:24

18/07/2021

Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport

Lewis Liu, Yufeng Zhang, Zhuoran Yang, Reza Babanezhad, Zhaoran Wang

Keywords: Theory, Statistical Learning Theory

5:20

19/08/2021

Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation

Yue Guan, Qifan Zhang, Panagiotis Tsiotras

Keywords: Machine Learning, Reinforcement Learning, Multi-agent Learning, Noncooperative Games

12:26

12/07/2020

Expert Learning through Generalized Inverse Multiobjective Optimization: Models, Insights and Algorithms

Chaosheng Dong, Bo Zeng

Keywords: Learning Theory

12:11

13/04/2021

Alternating direction method of multipliers for quantization

Tianjian Huang, Prajwal Singhania, Maziar Sanjabi, Pabitra Mitra, Meisam Razaviyayn

2:43

12/07/2020

Sequential Transfer in Reinforcement Learning with a Generative Model

Andrea Tirinzoni, Riccardo Poiani, Marcello Restelli

Keywords: Reinforcement Learning - General

10:54

18/07/2021

Mixed Nash Equilibria in the Adversarial Examples Game

Laurent Meunier, Meyer Scetbon, Rafael Pinot, Jamal Atif, Yann Chevaleyre

Keywords: Algorithms, Adversarial Examples

5:30

02/02/2021

An Efficient Algorithm for Deep Stochastic Contextual Bandits

Tan Zhu, Guannan Liang, Chunjiang Zhu, Haining Li, Jinbo Bi

14:36

18/07/2021

Provably Efficient Algorithms for Multi-Objective Competitive RL

Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

Keywords: Theory, RL, Decisions and Control Theory

17:04

26/08/2020

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

Kenji Kawaguchi, Haihao Lu

14:10

09/07/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement learning, Planning and control

15:16

26/08/2020

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

Aaron Sidford, Mengdi Wang, Lin Yang, Yinyu Ye

14:51

18/07/2021

Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

Kishan Panaganti, Dileep Kalathil

Keywords: Theory, RL, Decisions and Control Theory

5:15

19/08/2021

Reinforcement Learning for Route Optimization with Robustness Guarantees

Tobias Jacobs, Francesco Alesiani, Gulcin Ermis

Keywords: Machine Learning, Deep Reinforcement Learning, Planning under Uncertainty, Applications of Reinforcement Learning

13:04

02/02/2021

Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback

Yihan Du, Yuko Kuroki, Wei Chen

17:13

06/12/2020

A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization

Digvijay Boob, Qi Deng, Guanghui Lan, Yilin Wang

2:54

14/09/2020

End-to-End Learning for Prediction and Optimization with Gradient Boosting

Takuya Konishi, Takuro Fukunaga

Keywords: combinatorial optimization, boosting/ensemble methods

15:14

06/12/2020

Decentralized Langevin Dynamics for Bayesian Learning

Anjaly Parayil, He Bai, Jemin George, Prudhvi Gurram

3:17

12/07/2020

Stochastic Optimization for Non-convex Inf-Projection Problems

Yan Yan, Yi Xu, Lijun Zhang, Wang Xiaoyu, Tianbao Yang

Keywords: Optimization - Non-convex

14:13

06/12/2021

Generalization Guarantee of SGD for Pairwise Learning

Yunwen Lei, Mingrui Liu, Yiming Ying

Keywords: optimization, machine learning

14:30

18/07/2021

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls

Keywords: Probabilistic Methods, Causal Inference, Reinforcement Learning and Planning, Multi-Agent RL, Probabilistic Methods, Graphical Models

5:24

13/04/2021

SONIA: A symmetric blockwise truncated optimization algorithm

Majid Jahani, MohammadReza Nazari, Rachael Tappenden, Albert Berahas, Martin Takac

2:55

06/12/2020

Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications

Sarah Perrin, Julien Perolat, Mathieu Lauriere, Matthieu Geist, Romuald Elie, Olivier Pietquin

3:20

12/07/2020

On the Global Optimality of Model-Agnostic Meta-Learning

Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang

Keywords: Transfer, Multitask and Meta-learning

15:14

06/12/2021

Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis

Atsushi Nitanda, Denny Wu, Taiji Suzuki

Keywords: theory, deep learning, optimization

12:59

06/12/2021

Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

Nicolas Loizou, Hugo Berard, Gauthier Gidel, Ioannis Mitliagkas, Simon Lacoste-Julien

Keywords: optimization

15:44

04/08/2021

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo

18:24

02/02/2021

Computing Quantal Stackelberg Equilibrium in Extensive-Form Games

Jakub Černý, Viliam Lisý, Branislav Bošanský, Bo An

15:01

18/07/2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords: Algorithms, Multitask and Transfer Learning, Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks; Theory, RL, Decisions and Control Theory

4:49

20/07/2020

Deep Fictitious Play for Finding Markovian Nash Equilibrium in Multi-Agent Games

Jiequn Han, Ruimeng Hu

16:35

06/12/2020

Fair regression via plug-in estimator and recalibration with statistical guarantees

Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Luca Oneto, Massimiliano Pontil

3:16

18/07/2021

Adaptive Sampling for Best Policy Identification in Markov Decision Processes

Aymen Al Marjani, Alexandre Proutiere

Keywords: Theory, RL, Decisions and Control Theory

5:35

06/12/2021

Global Convergence to Local Minmax Equilibrium in Classes of Nonconvex Zero-Sum Games

Tanner Fiez, Lillian Ratliff, Eric Mazumdar, Evan Faulkner, Adhyyan Narang

Keywords: theory, optimization

15:13

04/08/2021

Online Markov Decision Processes with Aggregate Bandit Feedback

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

13:07

26/08/2020

Convergence Rates of Smooth Message Passing with Rounding in Entropy-Regularized MAP Inference

Jonathan Lee, Aldo Pacchiano, Michael Jordan

14:01

06/12/2020

Lower Bounds and Optimal Algorithms for Personalized Federated Learning

Filip Hanzely, Slavomír Hanzely, Samuel Horváth, Peter Richtarik

Keywords: Theory -> Learning Theory

3:24

06/12/2020

Preference learning along multiple criteria: A game-theoretic perspective

Kush Bhatia, Ashwin Pananjady, Peter Bartlett, Anca Dragan, Martin Wainwright

3:22

26/08/2020

Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Jun Sun, Gang Wang, Georgios B. Giannakis, Qinmin Yang, Zaiyue Yang

17:07