Decentralized Policy Gradient Descent Ascent for Safe Multi-Agent Reinforcement Learning

02/02/2021

Songtao Lu, Kaiqing Zhang, Tianyi Chen, Tamer Başar, Lior Horesh

Abstract: This paper deals with distributed reinforcement learning problems with safety constraints. In particular, we consider a team of agents that cooperate in a shared environment, where each agent has its individual reward function and safety constraints that involve all agents' joint actions. As such, the agents aim to maximize the team-average long-term return, subject to all the safety constraints. More intriguingly, no central controller is assumed to coordinate the agents, and both the rewards and constraints are only known to each agent locally/privately. Instead, the agents are connected by a peer-to-peer communication network to share information with their neighbors. In this work, we first formulate this problem as a distributed constrained Markov decision process (D-CMDP) with networked agents. Then, we propose a decentralized policy gradient (PG) method, Safe Dec-PG, to perform policy optimization based on this D-CMDP model over a network. Convergence guarantees, together with numerical results, showcase the superiority of the proposed algorithm. To the best of our knowledge, this is the first decentralized PG algorithm that accounts for the coupled safety constraints with a quantifiable convergence rate in multi-agent reinforcement learning. Finally, we emphasize that our algorithm is also novel in solving a class of decentralized stochastic nonconvex-concave minimax optimization problems, where both the algorithm design and corresponding theoretical analysis are of independent interest.
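
To make the link between the constrained problem and the minimax problem in the abstract concrete, here is a minimal sketch of one way such a problem can be written. The notation is an assumption introduced for illustration and is not taken from the paper: N agents, policy parameters \theta, agent i's discounted reward return J_i^r(\theta), agent i's discounted constraint return J_i^c(\theta), a threshold c_i, and a multiplier \lambda_i \ge 0 held by agent i. The paper's exact formulation may differ.

```latex
% Assumed notation (not from the paper): J_i^r, J_i^c, c_i, \lambda_i.
% Constrained team-average problem:
\max_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} J_i^{r}(\theta)
\quad \text{s.t.} \quad J_i^{c}(\theta) \ge c_i, \qquad i = 1, \dots, N.

% Lagrangian (primal-dual) form, written as a minimization over \theta:
\min_{\theta} \; \max_{\lambda \ge 0} \;
\frac{1}{N} \sum_{i=1}^{N}
\Big[ -J_i^{r}(\theta) - \lambda_i \big( J_i^{c}(\theta) - c_i \big) \Big].
```

Under these assumptions the objective is linear, hence concave, in \lambda and generally nonconvex in \theta, which matches the nonconvex-concave minimax class named in the abstract. Each summand uses only agent i's own reward and constraint, so a gradient descent step on \theta and an ascent step on \lambda_i can be computed locally, with neighbors exchanging information over the network.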

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38949266

The talk and the corresponding paper were published at the AAAI 2021 virtual conference.

Similar Papers

13/04/2021

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo Jovanovic

3:07

02/02/2021

Decentralized Multi-Agent Linear Bandits with Safety Constraints

Sanae Amani, Christos Thrampoulidis

19:13

12/07/2020

Kernel Methods for Cooperative Multi-Agent Learning with Delays

Abhimanyu Dubey, Alex 'Sandy' Pentland

Keywords: Planning, Control, and Multiagent Learning

12:57

06/12/2021

One More Step Towards Reality: Cooperative Bandits with Imperfect Communication

Udari Madhushani, Abhimanyu Dubey, Naomi Leonard, Alex Pentland

Keywords: bandits

15:01

16/11/2020

Safe Policy Learning for Continuous Control

Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Dueñez-Guzman, Mohammad Ghavamzadeh

5:20

18/07/2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords: Algorithms, Multitask and Transfer Learning; Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks; Theory, RL, Decisions and Control Theory

4:49

06/12/2020

Robust Multi-Agent Reinforcement Learning with Model Uncertainty

Kaiqing Zhang, Tao Sun, Yunzhe Tao, Sahika Genc, Sunil Mallya, Tamer Başar

3:11

06/12/2021

Multi-Agent Reinforcement Learning in Stochastic Networked Systems

Yiheng Lin, Guannan Qu, Longbo Huang, Adam Wierman

Keywords: reinforcement learning and planning, graph learning

11:20

18/07/2021

Online Submodular Resource Allocation with Applications to Rebalancing Shared Mobility Systems

Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, Maryam Kamgarpour

Keywords: Algorithms, Online Learning Algorithms

3:22

02/02/2021

Stable Adversarial Learning under Distributional Shifts

Jiashuo Liu, Zheyan Shen, Peng Cui, Linjun Zhou, Kun Kuang, Bo Li, Yishi Lin

14:30

26/08/2020

Truly Batch Model-Free Inverse Reinforcement Learning about Multiple Intentions

Giorgia Ramponi, Amarildo Likmeta, Alberto Maria Metelli, Andrea Tirinzoni, Marcello Restelli

9:41

26/08/2020

Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Jun Sun, Gang Wang, Georgios B. Giannakis, Qinmin Yang, Zaiyue Yang

17:07

09/07/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement learning, Planning and control

15:16

06/12/2020

Cooperative Multi-player Bandit Optimization

Ilai Bistritz, Nicholas Bambos

3:13

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

3:18

12/07/2020

Learning Efficient Multi-agent Communication: An Information Bottleneck Approach

Rundong Wang, Xu He, Runsheng Yu, Wei Qiu, Bo An, Zinovi Rabinovich

Keywords: Reinforcement Learning - Deep RL

8:41

19/08/2021

Model-Based Reinforcement Learning for Infinite-Horizon Discounted Constrained Markov Decision Processes

Aria HasanzadeZonuzy, Dileep Kalathil, Srinivas Shakkottai

Keywords: Machine Learning, Reinforcement Learning, Markov Decision Processes

13:26

06/12/2021

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords: theory, reinforcement learning and planning

14:22

02/02/2021

Reinforcement Learning Based Multi-Agent Resilient Control: From Deep Neural Networks to an Adaptive Law

Jian Hou, Fangyuan Wang, Lili Wang, Zhiyong Chen

15:48

19/08/2021

Altruism Design in Networked Public Goods Games

Sixie Yu, David Kempe, Yevgeniy Vorobeychik

Keywords: Agent-based and Multi-agent Systems, Algorithmic Game Theory, Noncooperative Games

13:51

06/12/2020

Online Bayesian Persuasion

Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Nicola Gatti

3:00

06/12/2020

Calibration of Shared Equilibria in General Sum Partially Observable Markov Games

Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso

3:18

12/07/2020

Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach

Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, Craig Boutilier

Keywords: Applications - Other

14:22

06/12/2021

Learning Equilibria in Matching Markets from Bandit Feedback

Meena Jagadeesan, Alexander Wei, Yixin Wang, Michael Jordan, Jacob Steinhardt

Keywords: bandits

15:04

06/12/2021

Optimality and Stability in Federated Learning: A Game-theoretic Approach

Kate Donahue, Jon Kleinberg

Keywords: theory, federated learning

12:30

18/07/2021

Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

Kishan Panaganti, Dileep Kalathil

Keywords: Theory, RL, Decisions and Control Theory

5:15

06/12/2021

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

Keywords: deep learning, reinforcement learning and planning, representation learning

14:52

12/07/2020

Optimizing Multiagent Cooperation via Policy Evolution and Shared Experiences

Somdeb Majumdar, Shauharda Khadka, Santiago Miret, Stephen McAleer, Kagan Tumer

Keywords: Reinforcement Learning - Deep RL

15:53

12/07/2020

Collaborative Machine Learning with Incentive-Aware Model Rewards

Rachael Hwee Ling Sim, Yehong Zhang, Bryan Kian Hsiang Low, Mun Choon Chan

Keywords: Fairness, Equity, Justice, and Safety

14:29

02/02/2021

An Efficient Algorithm for Deep Stochastic Contextual Bandits

Tan Zhu, Guannan Liang, Chunjiang Zhu, Haining Li, Jinbo Bi

14:36

06/12/2020

Multi-agent active perception with prediction rewards

Mikko Lauri, Frans Oliehoek

2:59

06/12/2021

Combinatorial Pure Exploration with Bottleneck Reward Function

Yihan Du, Yuko Kuroki, Wei Chen

Keywords: theory, reinforcement learning and planning, bandits

11:53

02/02/2021

Bounded Risk-Sensitive Markov Games: Forward Policy Design and Inverse Reward Learning with Iterative Reasoning and Cumulative Prospect Theory

Ran Tian, Liting Sun, Masayoshi Tomizuka

16:28

06/12/2021

Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games

Yu Bai, Chi Jin, Huan Wang, Caiming Xiong

Keywords: theory, reinforcement learning and planning, bandits

12:14

02/02/2021

WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Qisong Yang, Thiago D. Simão, Simon H Tindemans, Matthijs T. J. Spaan

17:28

06/12/2020

Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing

Arthur Delarue, Ross Anderson, Christian Tjandraatmadja

3:24

12/07/2020

Constrained Markov Decision Processes via Backward Value Functions

Harsh Satija, Philip Amortila, Joelle Pineau

Keywords: Reinforcement Learning - General

10:40

06/12/2021

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris, Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

Keywords: bandits

12:07

13/04/2021

Provably efficient actor-critic for risk-sensitive and robust adversarial RL: A linear-quadratic case

Yufeng Zhang, Zhuoran Yang, Zhaoran Wang

2:53

06/12/2021

Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs

Tao Liu, Ruida Zhou, Dileep Kalathil, Panganamala Kumar, Chao Tian

Keywords: reinforcement learning and planning

11:47