Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning

Abstract: Cooperative multi-agent reinforcement learning (MARL) has received increasing attention in recent years and has found many scientific and engineering applications. However, a key challenge arising from many cooperative MARL algorithm designs (e.g., the actor-critic framework) is the policy evaluation problem, which can only be conducted in a {\em decentralized} fashion. In this paper, we focus on decentralized MARL policy evaluation with nonlinear function approximation, which is often seen in deep MARL. We first show that the empirical decentralized MARL policy evaluation problem can be reformulated as a decentralized nonconvex-strongly-concave minimax saddle point problem. We then develop a decentralized gradient-based descent ascent algorithm called GT-GDA that enjoys a convergence rate of $\mathcal{O}(1/T)$. To further reduce the sample complexity, we propose two decentralized stochastic optimization algorithms called GT-SRVR and GT-SRVRI, which enhance GT-GDA by variance reduction techniques. We show that all algorithms all enjoy an $\mathcal{O}(1/T)$ convergence rate to a stationary point of the reformulated minimax problem. Moreover, the fast convergence rates of GT-SRVR and GT-SRVRI imply $\mathcal{O}(\epsilon^{-2})$ communication complexity and $\mathcal{O}(m\sqrt{n}\epsilon^{-2})$ sample complexity, where $m$ is the number of agents and $n$ is the length of trajectories. To our knowledge, this paper is the first work that achieves both $\mathcal{O}(\epsilon^{-2})$ sample complexity and $\mathcal{O}(\epsilon^{-2})$ communication complexity in decentralized policy evaluation for cooperative MARL. Our extensive experiments also corroborate the theoretical performance of our proposed decentralized policy evaluation algorithms.

06/12/2021

Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

5:03

26/08/2020

Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning

Xin Zhang, Zhuqing Liu, Jia Liu, Zhengyuan Zhu, Songtao Lu

Comments

Similar Papers

On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, Asuman Ozdaglar

Keywords Abstract Paper

theory, optimization, reinforcement learning and planning, meta learning

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords Abstract Paper

Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Jun Sun, Gang Wang, Georgios B. Giannakis and Qinmin Yang, Zaiyue Yang

Keywords Abstract Paper

Bilevel Optimization: Convergence Analysis and Enhanced Design

Kaiyi Ji, Junjie Yang, Yingbin LIANG

Keywords Abstract Paper

Optimization, Non-Convex Optimization

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Pan Xu, Felicia Gao, Quanquan Gu

Keywords Abstract Paper

Policy Gradient, Reinforcement Learning, Sample Efficiency

Momentum-Based Policy Gradient Methods

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Keywords Abstract Paper

Reinforcement Learning - General

Provably Efficient Reinforcement Learning with Linear Function Approximation

Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael Jordan

Keywords Abstract Paper

Reinforcement learning,

Multi-Agent Reinforcement Learning in Stochastic Networked Systems

Yiheng Lin, Guannan Qu, Longbo Huang, Adam Wierman

Keywords Abstract Paper

reinforcement learning and planning, graph learning

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Junyu Zhang, Chengzhuo Ni, zheng Yu and Csaba Szepesvari, Mengdi Wang

Keywords Abstract Paper

theory, reinforcement learning and planning

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

Keywords Abstract Paper

Structured Policy Iteration for Linear Quadratic Regulator

Youngsuk Park, Ryan Rossi, Zheng Wen and Gang Wu, Handong Zhao

Keywords Abstract Paper

Reinforcement Learning - General

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson and Christian Tjandraatmadja, Craig Boutilier

Keywords Abstract Paper

Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Tengyang Xie, Nan Jiang, Huan Wang and Caiming Xiong, Yu Bai

Keywords Abstract Paper

theory, optimization, reinforcement learning and planning

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang and Mengdi Wang, Michael Jordan

Keywords Abstract Paper

A Reduction from Reinforcement Learning to No-Regret Online Learning

Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon

Keywords Abstract Paper

A simpler approach to accelerated optimization: iterative averaging meets optimism

Pooria Joulani, Anant Raj, András György, Csaba Szepesvari

Keywords Abstract Paper

Online Learning, Active Learning, and Bandits

Leveraging Non-uniformity in First-order Non-convex Optimization

Jincheng Mei, Yue Gao, Bo Dai and Csaba Szepesvari, Dale Schuurmans

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords Abstract Paper

optimization, reinforcement learning and planning, bandits

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes

Alekh Agarwal, Sham Kakade, Jason Lee, Gaurav Mahajan

Keywords Abstract Paper

Reinforcement learning, Non-convex optimization

Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning

Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima and Yutaka Matsuo, Shixiang (Shane) Gu

Keywords Abstract Paper

reinforcement learning and planning

Keywords Paper

Keywords Paper

Jun Sun, Gang Wang, Georgios B. Giannakis and
Qinmin Yang, Zaiyue Yang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Junyu Zhang, Chengzhuo Ni, zheng Yu and
Csaba Szepesvari, Mengdi Wang

Keywords Paper

Keywords Paper

Youngsuk Park, Ryan Rossi, Zheng Wen and
Gang Wu, Handong Zhao

Keywords Paper

Moonkyung Ryu, Yinlam Chow, Ross Anderson and
Christian Tjandraatmadja, Craig Boutilier

Keywords Paper

Tengyang Xie, Nan Jiang, Huan Wang and
Caiming Xiong, Yu Bai

Keywords Paper

Zhuoran Yang, Chi Jin, Zhaoran Wang and
Mengdi Wang, Michael Jordan

Keywords Paper

Keywords Paper

Keywords Paper

Jincheng Mei, Yue Gao, Bo Dai and
Csaba Szepesvari, Dale Schuurmans

Keywords Paper

Keywords Paper

Keywords Paper

Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima and
Yutaka Matsuo, Shixiang (Shane) Gu

Keywords Paper

Keywords Paper

Jakub Grudzien Kuba, Muning Wen, Linghui Meng and
shangding gu, Haifeng Zhang, David Mguni, Jun Wang, Yaodong Yang

Keywords Paper

Nhan Pham, Lam Nguyen, Dzung Phan and
PHUONG HA NGUYEN, Marten van Dijk, Quoc Tran-Dinh

Keywords Paper

Xiaoteng Ma, Xiaohang Tang, Li Xia and
Jun Yang, Qianchuan Zhao

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Logan Engstrom, Andrew Ilyas, Shibani Santurkar and
Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

Keywords Paper

Keywords Paper

Keywords Paper

Guy Lorberbom, Chris J. Maddison, Nicolas Heess and
Tamir Hazan, Daniel Tarlow

Keywords Paper

Keywords Paper

Bei Peng, Tabish Rashid, Christian Schroeder de Witt and
Pierre-Alexandre Kamienny, Philip Torr, Wendelin Boehmer, Shimon Whiteson

Keywords Paper

Keywords Paper