Learning in two-player zero-sum partially observable Markov games with perfect recall

06/12/2021

Learning in two-player zero-sum partially observable Markov games with perfect recall

Tadashi Kozuno, Pierre Ménard, Remi Munos, Michal Valko

Keywords: reinforcement learning and planning, bandits, online learning

Abstract Paper Similar Papers

Abstract: We study the problem of learning a Nash equilibrium (NE) in an extensive game with imperfect information (EGII) through self-play. Precisely, we focus on two-player, zero-sum, episodic, tabular EGII under the \textit{perfect-recall} assumption where the only feedback is realizations of the game (bandit feedback). In particular the \textit{dynamics of the EGII is not known}---we can only access it by sampling or interacting with a game simulator. For this learning setting, we provide the Implicit Exploration Online Mirror Descent (IXOMD) algorithm. It is a model-free algorithm with a high-probability bound on convergence rate to the NE of order $1/\sqrt{T}$ where~$T$ is the number of played games. Moreover IXOMD is computationally efficient as it needs to perform the updates only along the sampled trajectory.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

13/04/2021

Reinforcement learning for mean field games with strategic complementarities

Kiyeob Lee, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai

Keywords Paper

0

0

0

0

2:57

12/07/2020

Implicit Learning Dynamics in Stackelberg Games: Equilibria Characterization, Convergence Analysis, and Empirical Study

Tanner Fiez, Benjamin Chasnov, Lillian Ratliff

Keywords Paper

Learning Theory

0

0

0

0

15:14

12/07/2020

Provable Self-Play Algorithms for Competitive Reinforcement Learning

Yu Bai, Chi Jin

Keywords Paper

Reinforcement Learning - Theory

0

0

0

0

14:28

06/12/2020

No-Regret Learning and Mixed Nash Equilibria: They Do Not Mix

Manolis Vlatakis-Gkaragkounis, Lampros Flokas, Thanasis Lianeas and
Panayotis Mertikopoulos, Georgios Piliouras

Keywords Paper

Algorithms -> Semi-Supervised Learning; Applications -> Computer Vision; Deep Learning, Applications -> Computational Photography

0

0

0

0

3:10

09/07/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords Paper

Reinforcement learning, Planning and control

0

0

0

0

15:16

18/07/2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords Paper

Algorithms, Multitask and Transfer Learning, Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks;, Theory, RL, Decisions and Control Theory

0

0

0

0

4:49

06/12/2020

Near-Optimal Reinforcement Learning with Self-Play

Yu Bai, Chi Jin, Tiancheng Yu

Keywords Paper

Theory -> Regularization, Applications -> Fairness, Accountability, and Transparency

0

0

0

0

3:33

06/12/2021

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Stefanos Leonardos, Georgios Piliouras, Kelly Spendlove

Keywords Paper

reinforcement learning and planning

0

0

0

0

14:11

06/12/2020

Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games

Arun Suggala, Praneeth Netrapalli

Keywords Paper

1

1

0

0

3:29

06/12/2020

Independent Policy Gradient Methods for Competitive Reinforcement Learning

Constantinos Daskalakis, Dylan Foster, Noah Golowich

Keywords Paper

Applications -> Web Applications and Internet Data; Theory -> Learning Theory, Probabilistic Methods -> Causal Inference

0

0

0

0

3:23

06/12/2021

XDO: A Double Oracle Algorithm for Extensive-Form Games

Stephen McAleer, JB Lanier, Kevin A Wang and
Pierre Baldi, Roy Fox

Keywords Paper

reinforcement learning and planning

0

0

0

0

14:51

19/08/2021

Temporal Induced Self-Play for Stochastic Bayesian Games

Weizhe Chen, Zihan Zhou, Yi Wu, Fei Fang

Keywords Paper

Agent-based and Multi-agent Systems, Multi-agent Learning, Applications of Reinforcement Learning

0

0

0

0

11:52

06/12/2021

Global Convergence to Local Minmax Equilibrium in Classes of Nonconvex Zero-Sum Games

Tanner Fiez, Lillian Ratliff, Eric Mazumdar and
Evan Faulkner, Adhyyan Narang

Keywords Paper

theory, optimization

0

0

0

0

15:13

04/08/2021

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo

Keywords Paper

0

0

0

0

18:24

06/12/2021

Decentralized Q-learning in Zero-sum Markov Games

Muhammed Sayin, Kaiqing Zhang, David Leslie and
Tamer Basar, Asuman Ozdaglar

Keywords Paper

reinforcement learning and planning

0

0

0

0

15:07

06/12/2021

Neural Auto-Curricula in Two-Player Zero-Sum Games

Xidong Feng, Oliver Slumbers, Ziyu Wan and
Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

Keywords Paper

deep learning, optimization, reinforcement learning and planning, meta learning

0

0

0

0

14:46

12/07/2020

Invariant Risk Minimization Games

Kartik Ahuja, Karthikeyan Shanmugam, Kush Varshney, Amit Dhurandhar

Keywords Paper

Causality

0

0

0

0

14:57

13/04/2021

A limited-capacity minimax theorem for non-convex games or: How i learned to stop worrying about mixed-nash and love neural nets

Gauthier Gidel, David Balduzzi, Wojciech Czarnecki and
Marta Garnelo, Yoram Bachrach

Keywords Paper

0

0

0

0

2:51

06/12/2021

Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games

Yu Bai, Chi Jin, Huan Wang, Caiming Xiong

Keywords Paper

theory, reinforcement learning and planning, bandits

0

0

0

0

12:14

02/02/2021

Convergence Analysis of No-Regret Bidding Algorithms in Repeated Auctions

Zhe Feng, Guru Guruganesh, Christopher Liaw and
Aranyak Mehta, Abhishek Sethi

Keywords Paper

0

0

0

0

20:14

18/07/2021

Learning in Nonzero-Sum Stochastic Games with Potentials

David Mguni, Yutong Wu, Yali Du and
Yaodong Yang, Ziyi Wang, M. Li, Ying Wen, Joel Jennings, Jun Wang

Keywords Paper

Theory, Game Theory and Computational Economics

0

0

0

0

5:36

12/07/2020

Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games

Darren Lin, Zhengyuan Zhou, Panayotis Mertikopoulos, Michael Jordan

Keywords Paper

Learning Theory

1

1

0

0

12:31

06/12/2020

Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications

Sarah Perrin, Julien Perolat, Mathieu Lauriere and
Matthieu Geist, Romuald Elie, Olivier Pietquin

Keywords Paper

0

0

0

0

3:20

04/08/2021

Online Markov Decision Processes with Aggregate Bandit Feedback

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

Keywords Paper

0

0

0

0

13:07

06/12/2020

Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Yuval Emek, Ron Lavi, Rad Niazadeh, Yangguang Shi

Keywords Paper

0

0

0

0

3:10

04/08/2021

Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information

Angeliki Giannou, Emmanouil Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos

Keywords Paper

0

0

0

0

16:33

04/08/2021

Learning in Matrix Games can be Arbitrarily Complex

Gabriel P Andrade, Rafael Frongillo, Georgios Piliouras

Keywords Paper

0

0

0

0

14:59

09/07/2020

New Potential-Based Bounds for Prediction with Expert Advice

Vladimir A Kobzar, Robert Kohn, Zhilei Wang

Keywords Paper

Online learning, Adversarial learning and robustness, Economics, game theory, and incentives

1

0

0

0

14:01

06/12/2021

Online Learning in Periodic Zero-Sum Games

Tanner Fiez, Ryann Sim, Stratis Skoulakis and
Georgios Piliouras, Lillian Ratliff

Keywords Paper

theory, robustness, online learning

0

0

0

0

12:44

18/07/2021

Learning While Playing in Mean-Field Games: Convergence and Optimality

Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca

Keywords Paper

Applications, Privacy, Anonymity, and Security, Algorithms, Components Analysis (e.g., CCA, ICA, LDA, PCA), Reinforcement Learning and Planning, Multi-Agent RL

0

0

0

0

5:24

02/02/2021

Evolutionary Game Theory Squared: Evolving Agents in Endogenously Evolving Zero-Sum Games

Stratis Skoulakis, Tanner Fiez, Ryann Sim and
Georgios Piliouras, Lillian Ratliff

Keywords Paper

0

0

0

0

20:14

18/07/2021

Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport

Lewis Liu, Yufeng Zhang, Zhuoran Yang and
Reza Babanezhad, Zhaoran Wang

Keywords Paper

Theory, Statistical Learning Theory

0

0

0

0

5:20

06/12/2021

Neural Active Learning with Performance Guarantees

Zhilei Wang, Pranjal Awasthi, Christoph Dann and
Ayush Sekhari, Claudio Gentile

Keywords Paper

deep learning, active learning

0

0

0

0

10:43

18/07/2021

Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations

Angeliki Kamoutsi, Goran Banjac, John Lygeros

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

5:29

06/12/2021

Online learning in MDPs with linear function approximation and bandit feedback.

Gergely Neu, Julia Olkhovskaya

Keywords Paper

reinforcement learning and planning, bandits, online learning

0

0

0

0

13:24

02/02/2021

Computing Quantal Stackelberg Equilibrium in Extensive-Form Games

Jakub Černý, Viliam Lisý, Branislav Bošanský, Bo An

Keywords Paper

0

0

0

0

15:01

26/04/2020

Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games

Zuyue Fu, Zhuoran Yang, Yongxin Chen, Zhaoran Wang

Keywords Paper

0

0

0

0

5:09

02/02/2021

Evolution Strategies for Approximate Solution of Bayesian Games

Zun Li, Michael P. Wellman

Keywords Paper

0

0

0

0

18:18

06/12/2021

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

Keywords Paper

deep learning, reinforcement learning and planning, representation learning

0

0

0

0

14:52

06/12/2021

Combinatorial Pure Exploration with Bottleneck Reward Function

Yihan Du, Yuko Kuroki, Wei Chen

Keywords Paper

theory, reinforcement learning and planning, bandits

0

0

0

0

11:53