Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation

19/08/2021

Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation

Yue Guan, Qifan Zhang, Panagiotis Tsiotras

Keywords: Machine Learning, Reinforcement Learning, Multi-agent Learning, Noncooperative Games

Abstract Paper Similar Papers

Abstract: We explore the use of policy approximations to reduce the computational cost of learning Nash equilibria in zero-sum stochastic games. We propose a new Q-learning type algorithm that uses a sequence of entropy-regularized soft policies to approximate the Nash policy during the Q-function updates. We prove that under certain conditions, by updating the entropy regularization, the algorithm converges to a Nash equilibrium. We also demonstrate the proposed algorithm's ability to transfer previous training experiences, enabling the agents to adapt quickly to new environments. We provide a dynamic hyper-parameter scheduling scheme to further expedite convergence. Empirical results applied to a number of stochastic games verify that the proposed algorithm converges to the Nash equilibrium, while exhibiting a major speed-up over existing algorithms.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at IJCAI 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

18/07/2021

Learning While Playing in Mean-Field Games: Convergence and Optimality

Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca

Keywords Paper

Applications, Privacy, Anonymity, and Security, Algorithms, Components Analysis (e.g., CCA, ICA, LDA, PCA), Reinforcement Learning and Planning, Multi-Agent RL

0

0

0

0

5:24

26/04/2020

Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games

Zuyue Fu, Zhuoran Yang, Yongxin Chen, Zhaoran Wang

Keywords Paper

0

0

0

0

5:09

06/12/2020

Stochastic Optimization for Performative Prediction

Celestine Mendler-Dünner, Juan Perdomo, Tijana Zrnic, Moritz Hardt

Keywords Paper

0

0

0

0

3:17

18/07/2021

Provably Efficient Algorithms for Multi-Objective Competitive RL

Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

17:04

12/07/2020

Fast computation of Nash Equilibria in Imperfect Information Games

Remi Munos, Julien Perolat, Jean-Baptiste Lespiau and
Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls

Keywords Paper

Learning Theory

0

0

0

0

8:37

13/04/2021

Reinforcement learning for constrained markov decision processes

Ather Gattami, Qinbo Bai, Vaneet Aggarwal

Keywords Paper

0

0

0

0

3:08

03/05/2021

Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime

Andrea Agazzi, Jianfeng Lu

Keywords Paper

policy gradient, mean-field dynamics, entropy regularization, neural networks

0

0

0

0

5:09

18/07/2021

Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time

Weichen Wang, Jiequn Han, Zhuoran Yang, Zhaoran Wang

Keywords Paper

Theory

0

0

0

0

5:03

06/12/2021

Efficient Generalization with Distributionally Robust Learning

Soumyadip Ghosh, Mark Squillante, Ebisa Wollega

Keywords Paper

optimization, machine learning

0

0

0

0

14:57

12/07/2020

Sequential Transfer in Reinforcement Learning with a Generative Model

Andrea Tirinzoni, Riccardo Poiani, Marcello Restelli

Keywords Paper

Reinforcement Learning - General

0

0

0

0

10:54

18/07/2021

Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

Kishan Panaganti, Dileep Kalathil

Keywords Paper

Theory, RL, Decisions and Control Theory

0

0

0

0

5:15

13/04/2021

Logistic q-learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

Keywords Paper

0

0

0

0

2:44

06/12/2020

Deep Inverse Q-learning with Constraints

Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

Keywords Paper

0

0

0

0

3:14

06/12/2020

Sharper Generalization Bounds for Pairwise Learning

Yunwen Lei, Antoine Ledent, Marius Kloft

Keywords Paper

0

0

0

0

3:20

06/12/2020

Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Gen Li, Yuting Wei, Yuejie Chi and
Yuantao Gu, Yuxin Chen

Keywords Paper

0

0

0

0

3:06

16/11/2020

Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

Tanmay Gangwani, Jian Peng, Yuan Zhou

Keywords Paper

0

0

0

0

4:27

06/12/2021

Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation

Yue Wang, Shaofeng Zou, Yi Zhou

Keywords Paper

reinforcement learning and planning

0

0

0

0

14:28

02/02/2021

An Efficient Algorithm for Deep Stochastic Contextual Bandits

Tan Zhu, Guannan Liang, Chunjiang Zhu and
Haining Li, Jinbo Bi

Keywords Paper

0

0

0

0

14:36

04/08/2021

Adaptivity in Adaptive Submodularity

Hossein Esfandiari, Amin Karbasi, Vahab Mirrokni

Keywords Paper

0

0

0

0

13:54

03/08/2020

No-regret Exploration in Contextual Reinforcement Learning

Aditya Modi, Ambuj Tewari

Keywords Paper

0

0

0

0

8:19

06/12/2021

A Max-Min Entropy Framework for Reinforcement Learning

Seungyul Han, Youngchul Sung

Keywords Paper

optimization, reinforcement learning and planning

0

0

0

0

14:35

18/07/2021

A Regret Minimization Approach to Iterative Learning Control

Naman Agarwal, Elad Hazan, Anirudha Majumdar, Karan Singh

Keywords Paper

Reinforcement Learning and Planning, Planning and Control

0

0

0

0

5:13

26/08/2020

Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Jun Sun, Gang Wang, Georgios B. Giannakis and
Qinmin Yang, Zaiyue Yang

Keywords Paper

0

0

0

0

17:07

06/12/2020

Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

Jorge Mendez, Boyu Wang, Eric Eaton

Keywords Paper

0

0

0

0

3:24

06/12/2020

Submodular Meta-Learning

Arman Adibi, Aryan Mokhtari, Hamed Hassani

Keywords Paper

0

0

0

0

3:17

03/05/2021

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

Atsushi Nitanda, Taiji Suzuki

Keywords Paper

stochastic gradient descent, neural tangent kernel, over-parameterization, two-layer neural network

0

0

0

0

18:48

06/12/2020

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora

Keywords Paper

0

0

0

0

3:23

26/08/2020

Finite-Time Error Bounds for Biased Stochastic Approximation with Applications to Q-Learning

Gang Wang, Georgios B. Giannakis

Keywords Paper

0

0

0

0

14:03

12/07/2020

Provably Efficient Model-based Policy Adaptation

Yuda Song, Aditi Mavalankar, Wen Sun, Sicun Gao

Keywords Paper

Reinforcement Learning - Deep RL

0

0

0

0

15:49

13/04/2021

Learning infinite-horizon average-reward MDPs with linear function approximation

Chen-Yu Wei, Mehdi Jafarnia Jahromi, Haipeng Luo, Rahul Jain

Keywords Paper

0

0

0

0

3:23

06/12/2021

Differential Privacy Dynamics of Langevin Diffusion and Noisy Gradient Descent

Rishav Chourasia, Jiayuan Ye, Reza Shokri

Keywords Paper

optimization, privacy

0

0

0

0

14:55

03/05/2021

Entropic gradient descent algorithms and wide flat minima

Fabrizio Pittorino, Carlo Lucibello, Christoph Feinauer and
Gabriele Perugini, Carlo Baldassi, Elizaveta Demyanenko, Riccardo Zecchina

Keywords Paper

flat minima, belief-propagation, statistical physics, entropic algorithms

0

0

0

0

5:38

26/04/2020

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Pan Xu, Felicia Gao, Quanquan Gu

Keywords Paper

Policy Gradient, Reinforcement Learning, Sample Efficiency

0

0

0

0

4:40

19/08/2021

Reinforcement Learning for Route Optimization with Robustness Guarantees

Tobias Jacobs, Francesco Alesiani, Gulcin Ermis

Keywords Paper

Machine Learning, Deep Reinforcement Learning, Planning under Uncertainty, Applications of Reinforcement Learning

0

0

0

0

13:04

26/04/2020

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords Paper

Sample-efficient reinforcement learning, off-policy learning.

0

0

0

0

5:43

03/08/2020

Finite-Memory Near-Optimal Learning for Markov Decision Processes with Long-Run Average Reward

Jan Kretinsky, Fabian Michel, Lukas Michel, Guillermo Perez

Keywords Paper

0

0

0

0

8:00

12/07/2020

Learning to Score Behaviors for Guided Policy Optimization

Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang and
Krzysztof Choromanski, Anna Choromanska, Michael Jordan

Keywords Paper

Reinforcement Learning - General

0

0

0

0

14:10

12/07/2020

Tightening Exploration in Upper Confidence Reinforcement Learning

Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

Keywords Paper

Reinforcement Learning - General

0

0

0

0

16:14

26/08/2020

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare

Keywords Paper

0

0

0

0

15:15

19/08/2021

Learning in Markets: Greed Leads to Chaos but Following the Price is Right

Yun Kuen Cheung, Stefanos Leonardos, Georgios Piliouras

Keywords Paper

Agent-based and Multi-agent Systems, Economic Paradigms, Auctions and Market-Based Systems, Multi-agent Learning, Noncooperative Games

0

0

0

0

15:26