Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

06/12/2020

Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong

Keywords: Algorithms -> Density Estimation; Algorithms -> Similarity and Distance Learning; Applications -> Computer Vision; Theory, Deep Learning

Abstract: The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.


The talk and the accompanying paper were published at the NeurIPS 2020 virtual conference.


Similar Papers

26/04/2020

Double Neural Counterfactual Regret Minimization

Hui Li, Kailiang Hu, Shaohua Zhang, Yuan Qi, Le Song

Keywords: Counterfactual Regret Minimization, Imperfect Information Game, Neural Strategy, Deep Learning, Robust Sampling

4:49

19/08/2021

Combining Tree Search and Action Prediction for State-of-the-Art Performance in DouDiZhu

Yunsheng Zhang, Dong Yan, Bei Shi, Haobo Fu, Qiang Fu, Hang Su, Jun Zhu, Ning Chen

Keywords: Machine Learning, Reinforcement Learning, Game Playing and Machine Learning

12:03

06/12/2021

Subgame solving without common knowledge

Brian Zhang, Tuomas Sandholm


14:40

06/12/2021

XDO: A Double Oracle Algorithm for Extensive-Form Games

Stephen McAleer, JB Lanier, Kevin A Wang, Pierre Baldi, Roy Fox

Keywords: reinforcement learning and planning

14:51

06/12/2021

Neural Auto-Curricula in Two-Player Zero-Sum Games

Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

Keywords: deep learning, optimization, reinforcement learning and planning, meta learning

14:46

12/07/2020

Sparsified Linear Programming for Zero-Sum Equilibrium Finding

Brian Zhang, Tuomas Sandholm

Keywords: Learning Theory

13:27

18/07/2021

DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning

Daochen Zha, Jingru Xie, Wenye Ma, Sheng Zhang, Xiangru Lian, Xia Hu, Ji Liu

Keywords: Reinforcement Learning and Planning, Deep RL

5:20

12/07/2020

Provable Self-Play Algorithms for Competitive Reinforcement Learning

Yu Bai, Chi Jin

Keywords: Reinforcement Learning - Theory

14:28

06/12/2020

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Stephen McAleer, J.B. Lanier, Roy Fox, Pierre Baldi


3:12

06/12/2021

Scalable Online Planning via Reinforcement Learning Fine-Tuning

Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown

Keywords: deep learning, reinforcement learning and planning

8:04

18/07/2021

SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II

Xiangjun Wang, Junxiao Song, Penghui Qi, Peng Peng, Zhenkun Tang, Wei Zhang, Weimin Li, Xiongjun Pi, Jujie He, Chao Gao, Haitao Long, Quan Yuan

Keywords: Reinforcement Learning and Planning, Deep RL

5:11

02/02/2021

Finding and Certifying (Near-)Optimal Strategies in Black-Box Extensive-Form Games

Brian Hu Zhang, Tuomas Sandholm


15:00

12/07/2020

Converging to Team-Maxmin Equilibria in Zero-Sum Multiplayer Games

Youzhi Zhang, Bo An

Keywords: Learning Theory

15:52

02/02/2021

Convergence Analysis of No-Regret Bidding Algorithms in Repeated Auctions

Zhe Feng, Guru Guruganesh, Christopher Liaw, Aranyak Mehta, Abhishek Sethi

20:14

13/04/2021

A limited-capacity minimax theorem for non-convex games or: How I learned to stop worrying about mixed-nash and love neural nets

Gauthier Gidel, David Balduzzi, Wojciech Czarnecki, Marta Garnelo, Yoram Bachrach

2:51

06/12/2020

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas Hudson, Nicolas Porcel, Marc Lanctot, Julien Perolat, Richard Everett, Satinder Singh, Thore Graepel, Yoram Bachrach

3:23

04/08/2021

Learning in Matrix Games can be Arbitrarily Complex

Gabriel P Andrade, Rafael Frongillo, Georgios Piliouras


14:59

06/12/2021

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Stefanos Leonardos, Georgios Piliouras, Kelly Spendlove

Keywords: reinforcement learning and planning

14:11

18/07/2021

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

Dhruv Malik, Aldo Pacchiano, Vishwak Srinivasan, Yuanzhi Li

Keywords: Deep Learning, Deep Learning, Efficient Training Methods; Deep Learning, Optimization for Deep Networks, Theory, RL, Decisions and Control Theory

5:11

06/12/2021

Exploiting Opponents Under Utility Constraints in Sequential Games

Martino Bernasconi-de-Luca, Federico Cacciamani, Simone Fioravanti, Nicola Gatti, Alberto Marchesi, Francesco Trovò

Keywords: online learning

12:59

06/12/2021

Learning in two-player zero-sum partially observable Markov games with perfect recall

Tadashi Kozuno, Pierre Ménard, Remi Munos, Michal Valko

Keywords: reinforcement learning and planning, bandits, online learning

9:31

06/12/2020

Small Nash Equilibrium Certificates in Very Large Games

Brian Zhang, Tuomas Sandholm


3:16

02/02/2021

Double Oracle Algorithm for Computing Equilibria in Continuous Games

Lukáš Adam, Rostislav Horčík, Tomáš Kasl, Tomáš Kroupa


20:40

03/05/2021

Taming GANs with Lookahead-Minmax

Tatjana Chavdarova, Matteo Pagliardini, Sebastian Stich, François Fleuret, Martin Jaggi

Keywords: Generative Adversarial Networks, Minmax

5:25

18/07/2021

Learning in Nonzero-Sum Stochastic Games with Potentials

David Mguni, Yutong Wu, Yali Du, Yaodong Yang, Ziyi Wang, M. Li, Ying Wen, Joel Jennings, Jun Wang

Keywords: Theory, Game Theory and Computational Economics

5:36

06/12/2020

Towards Playing Full MOBA Games with Deep Reinforcement Learning

Deheng Ye, Guibin Chen, Wen Zhang, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu, Yinyuting Yin, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu

3:22

06/12/2021

Learning Diverse Policies in MOBA Games via Macro-Goals

Yiming Gao, Bei Shi, Xueying Du, Liang Wang, Guangwei Chen, Zhenjie Lian, Fuhao Qiu, Guoan Han, Weixuan Wang, Deheng Ye, Qiang Fu, Wei Yang, Lanxiao Huang

Keywords: reinforcement learning and planning

6:49

12/07/2020

ConQUR: Mitigating Delusional Bias in Deep Q-Learning

DiJia Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier

Keywords: Reinforcement Learning - General

15:04

18/07/2021

Robust Learning-Augmented Caching: An Experimental Study

Jakub Chłędowski, Adam Polak, Bartosz Szabucki, Konrad Zolna

Keywords: Applications

4:52

06/12/2021

A Winning Hand: Compressing Deep Networks Can Improve Out-of-Distribution Robustness

James Diffenderfer, Brian Bartoldson, Shreya Chaganti, Jize Zhang, Bhavya Kailkhura

Keywords: deep learning, robustness

10:43

02/02/2021

Escaping Local Optima with Non-Elitist Evolutionary Algorithms

Duc-Cuong Dang, Anton Eremeev, Per Kristian Lehre


20:33

06/12/2021

No-Press Diplomacy from Scratch

Anton Bakhtin, David Wu, Adam Lerer, Noam Brown

Keywords: theory, reinforcement learning and planning

12:37

03/05/2021

Human-Level Performance in No-Press Diplomacy via Equilibrium Search

Jonathan Gray, Adam Lerer, Anton Bakhtin, Noam Brown

Keywords: reinforcement learning, game theory, no-regret learning, regret minimization, multi-agent systems

15:10

09/07/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement learning, Planning and control

15:16

19/08/2021

Temporal Induced Self-Play for Stochastic Bayesian Games

Weizhe Chen, Zihan Zhou, Yi Wu, Fei Fang

Keywords: Agent-based and Multi-agent Systems, Multi-agent Learning, Applications of Reinforcement Learning

11:52

18/07/2021

Modelling Behavioural Diversity for Learning in Open-Ended Games

Nicolas Perez-Nieves, Yaodong Yang, Oliver Slumbers, David Mguni, Ying Wen, Jun Wang

Keywords: Theory, Game Theory and Computational Economics

17:06

06/12/2021

Combinatorial Pure Exploration with Bottleneck Reward Function

Yihan Du, Yuko Kuroki, Wei Chen

Keywords: theory, reinforcement learning and planning, bandits

11:53

02/02/2021

Safe Search for Stackelberg Equilibria in Extensive-Form Games

Chun Kai Ling, Noam Brown


18:49

18/07/2021

Continuous Coordination As a Realistic Scenario for Lifelong Learning

Hadi Nekoei, Akilesh Badrinaaraayanan, Aaron Courville, Sarath Chandar

Keywords: Algorithms, Continual Learning

5:27

06/12/2020

Theory-Inspired Path-Regularized Differential Network Architecture Search

Pan Zhou, Caiming Xiong, Richard Socher, Steven Hoi


3:18