Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

06/12/2020

Kaiqing Zhang, Sham Kakade, Tamer Basar, Lin Yang

Abstract: Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the cornerstones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and planning phases and avoids the non-stationarity problem that arises when all agents improve their policies simultaneously using samples. Though model-based MARL algorithms are intuitive and widely used, their sample complexity has received comparatively little study. In this paper, we address this fundamental open question. We study arguably the most basic MARL setting: two-player discounted zero-sum Markov games, given only access to a generative model of the state transitions. We show that model-based MARL achieves a sample complexity of $\tilde{\mathcal{O}}(|\mathcal{S}||\mathcal{A}||\mathcal{B}|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error, and the $\epsilon$-NE policies, where $\gamma$ is the discount factor, $\mathcal{S}$ denotes the state space, and $\mathcal{A}$ and $\mathcal{B}$ denote the action spaces of the two agents. We also show that this method is near-minimax optimal, with a tight dependence on $1-\gamma$ and $|\mathcal{S}|$, by providing a lower bound of $\Omega(|\mathcal{S}|(|\mathcal{A}|+|\mathcal{B}|)(1-\gamma)^{-3}\epsilon^{-2})$. Our results justify the efficiency of this simple model-based approach in the multi-agent RL setting.
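
To get a feel for how the two bounds in the abstract scale, the following minimal sketch evaluates both expressions with all constants and logarithmic factors dropped. The function names are hypothetical and not taken from the paper, and the returned numbers are order-of-magnitude scalings only, not actual sample counts.

```python
def _scale(gamma: float, epsilon: float) -> float:
    """Shared factor (1 - gamma)^-3 * epsilon^-2 appearing in both bounds."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    return 1.0 / ((1.0 - gamma) ** 3 * epsilon ** 2)


def upper_bound_order(n_states: int, n_actions_a: int, n_actions_b: int,
                      gamma: float, epsilon: float) -> float:
    """|S||A||B| (1 - gamma)^-3 eps^-2; constants and log factors omitted."""
    return n_states * n_actions_a * n_actions_b * _scale(gamma, epsilon)


def lower_bound_order(n_states: int, n_actions_a: int, n_actions_b: int,
                      gamma: float, epsilon: float) -> float:
    """|S|(|A| + |B|) (1 - gamma)^-3 eps^-2; constants omitted."""
    return n_states * (n_actions_a + n_actions_b) * _scale(gamma, epsilon)
```

The ratio of the two expressions is $|\mathcal{A}||\mathcal{B}|/(|\mathcal{A}|+|\mathcal{B}|)$, independent of $|\mathcal{S}|$, $\gamma$, and $\epsilon$. This matches the abstract's claim that the dependence on $1-\gamma$ and $|\mathcal{S}|$ is tight, while a gap remains in the action-space dependence.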

The talk and the corresponding paper were published at the NeurIPS 2020 virtual conference.

Similar Papers

18/07/2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords: Algorithms, Multitask and Transfer Learning, Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks; Theory, RL, Decisions and Control Theory

4:49

06/12/2020

Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

Qiwen Cui, Lin Yang

Keywords: Algorithms -> Semi-Supervised Learning; Deep Learning -> Deep Autoencoders; Deep Learning -> Generative Models; Probabilistic Methods -> Variational Inference

3:25

06/12/2021

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, Stuart Russell

Keywords: theory, reinforcement learning and planning, bandits

12:21

09/07/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement learning, Planning and control

15:16

06/12/2021

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai

Keywords: theory, optimization, reinforcement learning and planning

10:57

06/12/2020

On Reward-Free Reinforcement Learning with Linear Function Approximation

Ruosong Wang, Simon Du, Lin Yang, Russ Salakhutdinov

3:12

06/12/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords: theory, reinforcement learning and planning

13:47

06/12/2021

Nearly Horizon-Free Offline Reinforcement Learning

Tongzheng Ren, Jialian Li, Bo Dai, Simon Du, Sujay Sanghavi

Keywords: theory, optimization, reinforcement learning and planning

8:44

06/12/2020

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Dheeraj Nagaraj, Xian Wu, Guy Bresler, Prateek Jain, Praneeth Netrapalli

3:34

04/08/2021

Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon

Zihan Zhang, Xiangyang Ji, Simon Du

12:37

26/08/2020

A Reduction from Reinforcement Learning to No-Regret Online Learning

Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon

14:33

26/04/2020

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords: Sample-efficient reinforcement learning, off-policy learning

5:43

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

14:58

06/12/2021

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Ming Yin, Yu Bai, Yu-Xiang Wang

Keywords: theory, optimization, reinforcement learning and planning

8:57

18/07/2021

Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

Sungryull Sohn, Sungtae Lee, Jongwook Choi, Harm van Seijen, Mehdi Fatemi, Honglak Lee

Keywords: Reinforcement Learning and Planning, Deep RL

5:19

06/12/2021

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords: theory, reinforcement learning and planning

14:22

06/12/2020

POMO: Policy Optimization with Multiple Optima for Reinforcement Learning

Yeong-Dae Kwon, Jinho Choo, Byoungjip Kim, Iljoo Yoon, Youngjune Gwon, Seungjai Min

3:19

12/07/2020

Momentum-Based Policy Gradient Methods

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Keywords: Reinforcement Learning - General

13:28

18/07/2021

Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs

Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Basar

Keywords: Theory, RL, Decisions and Control Theory

4:12

12/07/2020

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Chen-Yu Wei, Mehdi Jafarnia, Haipeng Luo, Hiteshi Sharma, Rahul Jain

Keywords: Reinforcement Learning - Theory

13:40

06/12/2021

Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning

Xin Zhang, Zhuqing Liu, Jia Liu, Zhengyuan Zhu, Songtao Lu

Keywords: theory, optimization, reinforcement learning and planning

14:54

06/12/2021

Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Bingyan Wang, Yuling Yan, Jianqing Fan

Keywords: theory, reinforcement learning and planning, generative model

7:34

18/07/2021

Towards Tight Bounds on the Sample Complexity of Average-reward MDPs

Yujia Jin, Aaron Sidford

Keywords: Theory, RL, Decisions and Control Theory

5:05

26/04/2020

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Keywords: Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

5:36

06/12/2021

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

Keywords: deep learning, reinforcement learning and planning, representation learning

14:52

06/12/2020

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

3:13

18/07/2021

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou, Jiafan He, Quanquan Gu

Keywords: Reinforcement Learning and Planning

5:20

09/07/2020

Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal

Alekh Agarwal, Sham Kakade, Lin Yang

Keywords: Reinforcement learning, Sampling algorithms

15:13

26/04/2020

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Pan Xu, Felicia Gao, Quanquan Gu

Keywords: Policy Gradient, Reinforcement Learning, Sample Efficiency

4:40

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords: optimization, reinforcement learning and planning, bandits

15:17

06/12/2020

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan

3:42

26/08/2020

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

Aaron Sidford, Mengdi Wang, Lin Yang, Yinyu Ye

14:51

06/12/2020

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen

3:09

18/07/2021

Adaptive Sampling for Best Policy Identification in Markov Decision Processes

Aymen Al Marjani, Alexandre Proutiere

Keywords: Theory, RL, Decisions and Control Theory

5:35

13/04/2021

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo Jovanovic

3:07

06/12/2021

Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

Weitong Zhang, Dongruo Zhou, Quanquan Gu

Keywords: reinforcement learning and planning

11:53

09/07/2020

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

Maksim Kaledin, Eric Moulines, Alexey Naumov, Vladislav Tadic, Hoi-To Wai

Keywords: Stochastic optimization, Reinforcement learning

12:29

18/07/2021

EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Gu

Keywords: Reinforcement Learning and Planning

5:54

04/08/2021

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture MDPs

Dongruo Zhou, Quanquan Gu, Csaba Szepesvari

16:33

02/02/2021

Reinforcement Learning Based Multi-Agent Resilient Control: From Deep Neural Networks to an Adaptive Law

Jian Hou, Fangyuan Wang, Lili Wang, Zhiyong Chen

15:48