Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

18/07/2021

Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement Learning and Planning

Abstract: While single-agent policy optimization in a fixed environment has attracted a lot of research attention recently in the reinforcement learning community, much less is known theoretically when there are multiple agents playing in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for two-player zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transition and single-controller transition. For both scenarios, we prove tight $\widetilde{\mathcal{O}}(\sqrt{T})$ regret bounds after $T$ steps in a two-agent competitive game scenario. The regret of each player is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\widetilde{\mathcal{O}}(\sqrt{T})$.
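
To make the fictitious play ingredient mentioned in the abstract concrete, here is a minimal sketch of classical fictitious play on a zero-sum matrix game. It is not the paper's algorithm: it has no Markov game dynamics, no unknown transitions, and no UCB bonus, and the names fictitious_play and duality_gap are hypothetical helpers introduced only for illustration. Each player simply best-responds to the opponent's empirical action counts.

```python
def fictitious_play(payoff, num_rounds):
    """Classical fictitious play on a zero-sum matrix game (illustrative only).

    payoff[i][j] is the row player's payoff and the column player's loss.
    Returns the empirical mixed strategies (x, y) of the two players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary first move for each player (an assumption of this sketch).
    row_counts[0] = 1
    col_counts[0] = 1
    for _ in range(num_rounds - 1):
        # Cumulative payoff of each row action against the column player's history.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        # Cumulative loss of each column action against the row player's history.
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        # Best responses; ties are broken by the lowest index.
        row_action = max(range(n_rows), key=row_values.__getitem__)
        col_action = min(range(n_cols), key=col_values.__getitem__)
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    x = [c / num_rounds for c in row_counts]
    y = [c / num_rounds for c in col_counts]
    return x, y


def duality_gap(payoff, x, y):
    """Best-response value against y minus best-response value against x."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols))
                   for i in range(n_rows))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row - best_col


if __name__ == "__main__":
    # Rock-paper-scissors, whose unique equilibrium is the uniform strategy.
    rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
    x, y = fictitious_play(rps, 20000)
    print(x, y, duality_gap(rps, x, y))
```

In this toy setting the empirical strategies approach the uniform equilibrium and the duality gap shrinks as the number of rounds grows. The duality gap is only a matrix-game analogue of the optimality gap discussed in the abstract, not the quantity the paper bounds.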

The talk and the corresponding paper were published at the ICML 2021 virtual conference.

Similar Papers

12/07/2020

Provable Self-Play Algorithms for Competitive Reinforcement Learning

Yu Bai, Chi Jin

Keywords: Reinforcement Learning - Theory

14:28

09/07/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement learning, Planning and control

15:16

06/12/2020

Near-Optimal Reinforcement Learning with Self-Play

Yu Bai, Chi Jin, Tiancheng Yu

Keywords: Theory -> Regularization, Applications -> Fairness, Accountability, and Transparency

3:33

06/12/2020

Learning Strategy-Aware Linear Classifiers

Yiling Chen, Yang Liu, Chara Podimata

3:15

02/02/2021

Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games

Gabriele Farina, Tuomas Sandholm

17:09

18/07/2021

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords: Algorithms, Multitask and Transfer Learning; Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks; Theory, RL, Decisions and Control Theory

4:49

06/12/2020

Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games

Arun Suggala, Praneeth Netrapalli

3:29

06/12/2020

Preference-based Reinforcement Learning with Finite-Time Guarantees

Yichong Xu, Ruosong Wang, Lin Yang, Aarti Singh, Artur Dubrawski

3:04

02/02/2021

Computing Quantal Stackelberg Equilibrium in Extensive-Form Games

Jakub Černý, Viliam Lisý, Branislav Bošanský, Bo An

15:01

12/07/2020

Near-optimal Regret Bounds for Stochastic Shortest Path

Aviv Rosenberg, Alon Cohen, Yishay Mansour, Haim Kaplan

Keywords: Online Learning, Active Learning, and Bandits

15:57

12/07/2020

Online Learning with Imperfect Hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Keywords: Online Learning, Active Learning, and Bandits

13:17

02/02/2021

Projection-free Online Learning in Dynamic Environments

Yuanyu Wan, Bo Xue, Lijun Zhang

15:41

03/05/2021

The Importance of Pessimism in Fixed-Dataset Policy Optimization

Jacob Buckman, Carles Gelada, Marc G Bellemare

Keywords: reinforcement learning, offline reinforcement learning, deep learning

6:54

18/07/2021

Online Learning in Unknown Markov Games

Yi Tian, Yuanhao Wang, Tiancheng Yu, Suvrit Sra

Keywords: Theory, RL, Decisions and Control Theory

5:13

06/12/2020

Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition

Lin Chen, Qian Yu, Hannah Lawrence, Amin Karbasi

3:19

06/12/2021

Optimal Algorithms for Stochastic Contextual Preference Bandits

Aadirupa Saha

Keywords: bandits

16:00

06/12/2020

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Stephen Mcaleer, J.B. Lanier, Roy Fox, Pierre Baldi

3:12

18/07/2021

Adversarial Dueling Bandits

Aadirupa Saha, Tomer Koren, Yishay Mansour

Keywords: Algorithms, Ranking and Preference Learning

5:58

04/08/2021

Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium

Yu-Guan Hsieh, Kimon Antonakopoulos, Panayotis Mertikopoulos

16:09

06/12/2021

Minimax Regret for Stochastic Shortest Path

Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg

Keywords: reinforcement learning and planning, online learning

14:55

03/08/2020

Randomized Exploration for Non-Stationary Stochastic Linear Bandits

Baekjin Kim, Ambuj Tewari

7:55

13/04/2021

Contextual blocking bandits

Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

2:47

04/08/2021

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

20:29

03/08/2020

Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

Seyed Mohammad Asghari, Yi Ouyang, Ashutosh Nayyar

7:49

26/04/2020

Double Neural Counterfactual Regret Minimization

Hui Li, Kailiang Hu, Shaohua Zhang, Yuan Qi, Le Song

Keywords: Counterfactual Regret Minimization, Imperfect Information game, Neural Strategy, Deep Learning, Robust Sampling

4:49

06/12/2020

Adaptive Online Estimation of Piecewise Polynomial Trends

Dheeraj Baby, Yu-Xiang Wang

3:12

12/07/2020

No-Regret Exploration in Goal-Oriented Reinforcement Learning

Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric

Keywords: Reinforcement Learning - General

11:14

19/08/2021

Temporal Induced Self-Play for Stochastic Bayesian Games

Weizhe Chen, Zihan Zhou, Yi Wu, Fei Fang

Keywords: Agent-based and Multi-agent Systems, Multi-agent Learning, Applications of Reinforcement Learning

11:52

06/12/2021

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords: theory, reinforcement learning and planning

14:22

06/12/2021

Decentralized Q-learning in Zero-sum Markov Games

Muhammed Sayin, Kaiqing Zhang, David Leslie, Tamer Basar, Asuman Ozdaglar

Keywords: reinforcement learning and planning

15:07

06/12/2021

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris, Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

Keywords: bandits

12:07

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

3:18

04/08/2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

Liyu Chen, Haipeng Luo, Chen-Yu Wei

14:48

12/07/2020

Naive Exploration is Optimal for Online LQR

Max Simchowitz, Dylan Foster

Keywords: Reinforcement Learning - Theory

15:12

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords: optimization, reinforcement learning and planning, bandits

15:17

06/12/2021

Logarithmic Regret from Sublinear Hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Keywords: optimization, online learning

14:36

02/02/2021

Policy Optimization as Online Learning with Mediator Feedback

Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli

16:44

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords: Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

5:03

04/08/2021

Online Markov Decision Processes with Aggregate Bandit Feedback

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

13:07

06/12/2020

Dynamic Regret of Convex and Smooth Functions

Peng Zhao, Yu-Jie Zhang, Lijun Zhang, Zhi-Hua Zhou

3:09