Near-Optimal Reinforcement Learning with Self-Play

06/12/2020

Yu Bai, Chi Jin, Tiancheng Yu

Keywords: Theory -> Regularization, Applications -> Fairness, Accountability, and Transparency

Abstract: This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games. We focus on self-play algorithms, which learn the optimal policy by playing against themselves without any direct supervision. In a tabular episodic Markov game with $S$ states, $A$ max-player actions, and $B$ min-player actions, the best existing algorithm for finding an approximate Nash equilibrium requires $\tilde{\mathcal{O}}(S^2AB)$ steps of game playing, when only highlighting the dependency on $(S,A,B)$. In contrast, the best existing lower bound scales as $\Omega(S(A+B))$ and has a significant gap from the upper bound. This paper closes this gap for the first time: we propose an optimistic variant of the Nash Q-learning algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new Nash V-learning algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$. The latter result matches the information-theoretic lower bound in all problem-dependent parameters except for a polynomial factor of the length of each episode. In addition, we present a computational hardness result for learning the best responses against a fixed opponent in Markov games, a learning objective different from finding the Nash equilibrium.
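To get a feel for the gap the abstract describes, the following minimal sketch evaluates only the leading (S, A, B) terms of each bound. The function name `leading_terms` and the dictionary keys are hypothetical labels chosen for illustration. Logarithmic factors, the episode length, and the accuracy parameter are all omitted, so the values are scaling terms rather than actual step counts.

```python
def leading_terms(S: int, A: int, B: int) -> dict:
    """Leading-order (S, A, B) dependence of the bounds named in the abstract.

    Assumption: log factors, episode length, and accuracy are dropped,
    so these numbers only illustrate relative scaling.
    """
    return {
        "prior_upper_bound": S**2 * A * B,  # best previous algorithm: S^2 * A * B
        "optimistic_nash_q": S * A * B,     # optimistic Nash Q-learning: S * A * B
        "nash_v": S * (A + B),              # Nash V-learning: S * (A + B)
        "lower_bound": S * (A + B),         # known lower bound: S * (A + B)
    }


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the paper.
    for name, value in leading_terms(S=100, A=10, B=10).items():
        print(f"{name:>18}: {value:,}")
```

With the illustrative sizes S = 100 and A = B = 10, the terms are 1,000,000 for the prior upper bound, 10,000 for optimistic Nash Q-learning, and 2,000 for Nash V-learning, which equals the lower-bound term.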

This is an embedded video. The talk and the respective paper were published at the NeurIPS 2020 virtual conference.

Similar Papers

12/07/2020

Provable Self-Play Algorithms for Competitive Reinforcement Learning

Yu Bai, Chi Jin

Keywords: Reinforcement Learning - Theory

14:28

18/07/2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords: Algorithms, Multitask and Transfer Learning, Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks; Theory, RL, Decisions and Control Theory

4:49

06/12/2021

Decentralized Q-learning in Zero-sum Markov Games

Muhammed Sayin, Kaiqing Zhang, David Leslie, Tamer Basar, Asuman Ozdaglar

Keywords: reinforcement learning and planning

15:07

18/07/2021

Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement Learning and Planning

11:21

09/07/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement learning, Planning and control

15:16

06/12/2020

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Stephen McAleer, J.B. Lanier, Roy Fox, Pierre Baldi

3:12

19/08/2021

Temporal Induced Self-Play for Stochastic Bayesian Games

Weizhe Chen, Zihan Zhou, Yi Wu, Fei Fang

Keywords: Agent-based and Multi-agent Systems, Multi-agent Learning, Applications of Reinforcement Learning

11:52

02/02/2021

Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games

Gabriele Farina, Tuomas Sandholm

17:09

04/08/2021

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo

18:24

06/12/2020

Preference-based Reinforcement Learning with Finite-Time Guarantees

Yichong Xu, Ruosong Wang, Lin Yang, Aarti Singh, Artur Dubrawski

3:04

06/12/2020

Independent Policy Gradient Methods for Competitive Reinforcement Learning

Constantinos Daskalakis, Dylan Foster, Noah Golowich

Keywords: Applications -> Web Applications and Internet Data; Theory -> Learning Theory, Probabilistic Methods -> Causal Inference

3:23

18/07/2021

Learning While Playing in Mean-Field Games: Convergence and Optimality

Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca

Keywords: Applications, Privacy, Anonymity, and Security, Algorithms, Components Analysis (e.g., CCA, ICA, LDA, PCA), Reinforcement Learning and Planning, Multi-Agent RL

5:24

06/12/2021

Reinforcement Learning in Reward-Mixing MDPs

Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Keywords: theory, reinforcement learning and planning

12:57

06/12/2021

Learning in two-player zero-sum partially observable Markov games with perfect recall

Tadashi Kozuno, Pierre Ménard, Remi Munos, Michal Valko

Keywords: reinforcement learning and planning, bandits, online learning

9:31

06/12/2021

Neural Auto-Curricula in Two-Player Zero-Sum Games

Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

Keywords: deep learning, optimization, reinforcement learning and planning, meta learning

14:46

12/07/2020

Implicit Learning Dynamics in Stackelberg Games: Equilibria Characterization, Convergence Analysis, and Empirical Study

Tanner Fiez, Benjamin Chasnov, Lillian Ratliff

Keywords: Learning Theory

15:14

12/07/2020

Near-optimal Regret Bounds for Stochastic Shortest Path

Aviv Rosenberg, Alon Cohen, Yishay Mansour, Haim Kaplan

Keywords: Online Learning, Active Learning, and Bandits

15:57

06/12/2020

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Kaiqing Zhang, Sham Kakade, Tamer Basar, Lin Yang

3:25

18/07/2021

A New Formalism, Method and Open Issues for Zero-Shot Coordination

Johannes Treutlein, Michael Dennis, Caspar Oesterheld, Jakob Foerster

Keywords: Reinforcement Learning and Planning, Multi-Agent RL

5:28

06/12/2021

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

Keywords: deep learning, reinforcement learning and planning, representation learning

14:52

06/12/2021

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Stefanos Leonardos, Georgios Piliouras, Kelly Spendlove

Keywords: reinforcement learning and planning

14:11

06/12/2020

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Fei Feng, Ruosong Wang, Wotao Yin, Simon Du, Lin Yang

Keywords: Reinforcement Learning and Planning -> Decision and Control, Probabilistic Methods -> Gaussian Processes

3:11

18/07/2021

Adaptive Sampling for Best Policy Identification in Markov Decision Processes

Aymen Al Marjani, Alexandre Proutiere

Keywords: Theory, RL, Decisions and Control Theory

5:35

26/04/2020

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Keywords: Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

5:36

12/07/2020

Invariant Risk Minimization Games

Kartik Ahuja, Karthikeyan Shanmugam, Kush Varshney, Amit Dhurandhar

Keywords: Causality

14:57

26/04/2020

Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information

Yichi Zhou, Jialian Li, Jun Zhu

12:55

06/12/2021

Global Convergence to Local Minmax Equilibrium in Classes of Nonconvex Zero-Sum Games

Tanner Fiez, Lillian Ratliff, Eric Mazumdar, Evan Faulkner, Adhyyan Narang

Keywords: theory, optimization

15:13

19/08/2021

Boosting Offline Reinforcement Learning with Residual Generative Modeling

Hua Wei, Deheng Ye, Zhao Liu, Hao Wu, Bo Yuan, Qiang Fu, Wei Yang, Zhenhui Li

Keywords: Machine Learning Applications, Applications of Reinforcement Learning, Game Playing, Reinforcement Learning

11:32

06/12/2021

Optimal Algorithms for Stochastic Contextual Preference Bandits

Aadirupa Saha

Keywords: bandits

16:00

12/07/2020

No-Regret Exploration in Goal-Oriented Reinforcement Learning

Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric

Keywords: Reinforcement Learning - General

11:14

06/12/2021

Combinatorial Pure Exploration with Bottleneck Reward Function

Yihan Du, Yuko Kuroki, Wei Chen

Keywords: theory, reinforcement learning and planning, bandits

11:53

03/05/2021

Iterative Empirical Game Solving via Single Policy Best Response

Max Smith, Thomas Anthony, Michael Wellman

Keywords: Reinforcement Learning, Multiagent Learning, Empirical Game Theory

8:49

18/07/2021

Breaking the Deadly Triad with a Target Network

Shangtong Zhang, Hengshuai Yao, Shimon Whiteson

Keywords: Theory

5:11

04/08/2021

Online Markov Decision Processes with Aggregate Bandit Feedback

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

13:07

06/12/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords: theory, reinforcement learning and planning

13:47

06/12/2020

Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games

Arun Suggala, Praneeth Netrapalli

3:29

06/12/2020

No-Regret Learning and Mixed Nash Equilibria: They Do Not Mix

Manolis Vlatakis-Gkaragkounis, Lampros Flokas, Thanasis Lianeas, Panayotis Mertikopoulos, Georgios Piliouras

Keywords: Algorithms -> Semi-Supervised Learning; Applications -> Computer Vision; Deep Learning, Applications -> Computational Photography

3:10

02/02/2021

Computing Quantal Stackelberg Equilibrium in Extensive-Form Games

Jakub Černý, Viliam Lisý, Branislav Bošanský, Bo An

15:01

06/12/2021

Minimax Regret for Stochastic Shortest Path

Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg

Keywords: reinforcement learning and planning, online learning

14:55

06/12/2021

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai

Keywords: theory, optimization, reinforcement learning and planning

10:57