Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning

Abstract: In multi-agent reinforcement learning, discovering successful collective behaviors is challenging as it requires exploring a joint action space that grows exponentially with the number of agents. While the tractability of independent agent-wise exploration is appealing, this approach fails on tasks that require elaborate group strategies. We argue that coordinating the agents' policies can guide their exploration and we investigate techniques to promote such an inductive bias. We propose two policy regularization methods: TeamReg, which is based on inter-agent action predictability and CoachReg that relies on synchronized behavior selection. We evaluate each approach on four challenging continuous control tasks with sparse rewards that require varying levels of coordination as well as on the discrete action Google Research Football environment. Our experiments show improved performance across many cooperative multi-agent problems. Finally, we analyze the effects of our proposed methods on the policies that our agents learn and show that our methods successfully enforce the qualities that we propose as proxies for coordinated behaviors.

02/02/2021

NeuralAC: Learning Cooperation and Competition Effects for Match Outcome Prediction

Yin Gu, Qi Liu, Kai Zhang and
Zhenya Huang, Runze Wu, Jianrong Tao

Tim Seyde, Igor Gilitschenski, Wilko Schwarting and
Bartolomeo Stellato, Martin Riedmiller, Markus Wulfmeier, Daniela Rus

Multi-Agent Deep Reinforcement Learning, Deep Reinforcement Learning, Leader–Follower Markov Game, Expensive Coordination

3:55

06/12/2020

Abbas Abdolmaleki, Sandy Huang, Leonard Hasenclever and
Michael Neunert, Martina Zambelli, Murilo Martins, Francis Song, Nicolas Heess, Raia Hadsell, Martin Riedmiller

Zhe Liu, Lina Yao, Lei Bai and
Xianzhi Wang, Can Wang

Tianyu Pang, Xiao Yang, Yinpeng Dong and
Kun Xu, Jun Zhu, Hang Su

Vivek Veeriah, Tom Zahavy, Matteo Hessel and
Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh

David Mguni, Yutong Wu, Yali Du and
Yaodong Yang, Ziyi Wang, M. Li, Ying Wen, Joel Jennings, Jun Wang

CAGLAR Gulcehre, Ziyu Wang, Alexander Novikov and
Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas

Optimization, Non-Convex Optimization, Reinforcement Learning and Planning, Neuroscience and Cognitive Science, Reasoning; Optimization, Combinatorial Optimization; Reinforcement Learning and Plannin

5:24

02/02/2021