Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

Abstract: In this paper we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors. Given a stochastic game with discount factor $\gamma\in(0,1)$ we provide an algorithm that computes an $\epsilon$-optimal strategy with high-probability given $\tilde{O}((1 - \gamma)^{-3} \epsilon^{-2})$ samples from the transition function for each state-action-pair. Our algorithm runs in time nearly linear in the number of samples and uses space nearly linear in the number of state-action pairs. As stochastic games generalize Markov decision processes (MDPs) our runtime and sample complexities are optimal due to \cite{azar2013minimax}. We achieve our results by showing how to generalize a near-optimal Q-learning based algorithms for MDP, in particular \cite{sidford2018near}, to two-player strategy computation algorithms. This overcomes limitations of standard Q-learning and strategy iteration or alternating minimization based approaches and we hope will pave the way for future reinforcement learning results by facilitating the extension of MDP results to multi-agent settings with little loss.

09/07/2020

Algorithms, Multitask and Transfer Learning, Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks;, Theory, RL, Decisions and Control Theory

4:49

06/12/2020

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

Aaron Sidford, Mengdi Wang, Lin Yang, Yinyu Ye

Comments

Similar Papers

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords Abstract Paper

Reinforcement learning, Planning and control

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin

Keywords Abstract Paper

Algorithms, Multitask and Transfer Learning, Algorithms, Meta-Learning; Applications, Object Recognition; Data, Challenges, Implementations, and Software, Benchmarks;, Theory, RL, Decisions and Control Theory

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Kaiqing Zhang, Sham Kakade, Tamer Basar, Lin Yang

Keywords Abstract Paper

Solving K-MDPs

Jonathan Ferrer-Mestres, Thomas G. Dietterich, Olivier Buffet, Iadine Chadès

Keywords Abstract Paper

state abstraction, interpretability, MDP, computational sustainability

Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport

Lewis Liu, Yufeng Zhang, Zhuoran Yang and Reza Babanezhad, Zhaoran Wang

Keywords Abstract Paper

Theory, Statistical Learning Theory

Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

Nicolas Loizou, Hugo Berard, Gauthier Gidel and Ioannis Mitliagkas, Simon Lacoste-Julien

Keywords Abstract Paper

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

Keywords Abstract Paper

Towards Tight Bounds on the Sample Complexity of Average-reward MDPs

Yujia Jin, Aaron Sidford

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

Keywords Abstract Paper

On the Rate of Convergence of Regularized Learning in Games: From Bandits and Uncertainty to Optimism and Beyond

Angeliki Giannou, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos

Keywords Abstract Paper

optimization, bandits

Provably Efficient Algorithms for Multi-Objective Competitive RL

Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Adaptive Sampling for Best Policy Identification in Markov Decision Processes

Aymen Al Marjani, Alexandre Proutiere

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games

Waïss Azizian, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel

Keywords Abstract Paper

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang

Keywords Abstract Paper

Evolution Strategies for Approximate Solution of Bayesian Games

Zun Li, Michael P. Wellman

Keywords Abstract Paper

Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks

Dmitry Kovalev, Elnur Gasanov, Alexander Gasnikov, Peter Richtarik

Keywords Abstract Paper

Fourier Sparse Leverage Scores and Approximate Kernel Learning

Tamas Erdelyi, Cameron Musco, Christopher Musco

Keywords Abstract Paper

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Shicong Cen, Yuting Wei, Yuejie Chi

Keywords Abstract Paper

theory, optimization, reinforcement learning and planning

XDO: A Double Oracle Algorithm for Extensive-Form Games

Stephen McAleer, JB Lanier, Kevin A Wang and Pierre Baldi, Roy Fox

Keywords Abstract Paper

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

sajad khodadadian, Zaiwei Chen, Siva Maguluri

Keywords Abstract Paper

List-Decodable Mean Estimation in Nearly-PCA Time

Ilias Diakonikolas, Daniel Kane, Daniel Kongsgaard and Jerry Li, Kevin Tian

Keywords Abstract Paper

theory, clustering

Randomized Algorithms for Submodular Function Maximization with a $k$-System Constraint

Shuang Cui, Kai Han, Tianshuai Zhu and Jing Tang, Benwei Wu, He Huang

Keywords Abstract Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Lewis Liu, Yufeng Zhang, Zhuoran Yang and
Reza Babanezhad, Zhaoran Wang

Keywords Paper

Nicolas Loizou, Hugo Berard, Gauthier Gidel and
Ioannis Mitliagkas, Simon Lacoste-Julien

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Stephen McAleer, JB Lanier, Kevin A Wang and
Pierre Baldi, Roy Fox

Keywords Paper

Keywords Paper

Ilias Diakonikolas, Daniel Kane, Daniel Kongsgaard and
Jerry Li, Kevin Tian

Keywords Paper

Shuang Cui, Kai Han, Tianshuai Zhu and
Jing Tang, Benwei Wu, He Huang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Tanner Fiez, Lillian Ratliff, Eric Mazumdar and
Evan Faulkner, Adhyyan Narang

Keywords Paper

Tongzheng Ren, Jialian Li, Bo Dai and
Simon Du, Sujay Sanghavi

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Nicolas Loizou, Hugo Berard, Alexia Jolicoeur-Martineau and
Pascal Vincent, Simon Lacoste-Julien, Ioannis Mitliagkas

Keywords Paper

Kush Bhatia, Ashwin Pananjady, Peter Bartlett and
Anca Dragan, Martin Wainwright

Keywords Paper

Keywords Paper

Keywords Paper