Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition

Abstract: Many studies in recent years have applied reinforcement learning to train a dialog policy, with promising results. One common approach is to employ a user simulator to obtain a large number of simulated user experiences for reinforcement learning algorithms. However, building a realistic user simulator is challenging. A rule-based simulator requires heavy domain expertise for complex tasks, while a data-driven simulator requires considerable data, and it remains unclear how to evaluate a simulator. To avoid explicitly building a user simulator beforehand, we propose Multi-Agent Dialog Policy Learning, which treats both the system and the user as dialog agents. The two agents interact with each other and are learned jointly. The method uses the actor-critic framework to facilitate pretraining and improve scalability. We also propose a Hybrid Value Network for role-aware reward decomposition, which integrates the role-specific domain knowledge of each agent in task-oriented dialog. Results show that our method can successfully build a system policy and a user policy simultaneously, and that the two agents can achieve a high task success rate through conversational interaction.
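
The role-aware reward decomposition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the decomposition, so the code below assumes, for illustration only, that each turn's reward splits into a system-specific part, a user-specific part, and a part shared by both agents, and that each agent's critic target is its own discounted return plus the shared one. The names `TurnRewards`, `discounted_return`, and `role_aware_returns` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TurnRewards:
    """Per-turn rewards, split by role (an assumed decomposition)."""
    system: float  # reward only the system agent is responsible for
    user: float    # reward only the user agent is responsible for
    shared: float  # reward both agents share, e.g. overall task success


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Discounted sum of rewards, measured from the first turn."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def role_aware_returns(turns: List[TurnRewards], gamma: float = 0.99) -> Dict[str, float]:
    """Critic targets per role: role-specific return plus the shared return."""
    system_part = discounted_return([t.system for t in turns], gamma)
    user_part = discounted_return([t.user for t in turns], gamma)
    shared_part = discounted_return([t.shared for t in turns], gamma)
    return {
        "system": system_part + shared_part,
        "user": user_part + shared_part,
    }


# Usage: a two-turn dialog in which the shared reward arrives at the end.
dialog = [TurnRewards(1.0, -1.0, 0.0), TurnRewards(0.0, 2.0, 10.0)]
targets = role_aware_returns(dialog, gamma=0.5)
```

Keeping the role-specific and shared terms separate means each agent is credited for its own behavior while both are still pushed toward the common goal. How the actual Hybrid Value Network estimates these quantities is not described in the abstract.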

03/05/2021

Dong Ki Kim, Miao Liu, Matthew Riemer, Chuangchuang Sun, Marwa Abdulhai, Golnaz Habibi, Sebastian Lopez-Cot, Gerald Tesauro, Jonathan How

Reinforcement Learning and Planning, Multi-Agent RL, Algorithms, Representation Learning, Algorithms, Relational Learning

5:20

01/07/2020

Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra

Partial Amortization, Model Predictive Control, Planning, Mutual Information, Skill Discovery, World Models, Model-Based Reinforcement Learning

5:10

14/09/2020

Machine Learning, Deep Reinforcement Learning, Transfer, Adaptation, Multi-task Learning, Approximate Probabilistic Inference, Bayesian Networks

12:09

26/04/2020

Agent-based and Multi-agent Systems, Multi-agent Learning, Knowledge Representation Languages, Logics for Knowledge Representation, Reasoning about Knowledge and Belief

9:57

03/05/2021

Domain Adaptation, Third-Person Imitation, Observational Imitation, Reinforcement Learning, Machine Learning, Mutual Information, Imitation Learning

4:51

03/05/2021

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Ossama Ahmed, Frederik Träuble, Anirudh Goyal, Alexander Neitz, Manuel Wuthrich, Yoshua Bengio, Bernhard Schoelkopf, Stefan Bauer

Abbas Abdolmaleki, Sandy Huang, Leonard Hasenclever, Michael Neunert, Martina Zambelli, Murilo Martins, Francis Song, Nicolas Heess, Raia Hadsell, Martin Riedmiller

Machine Learning, Transfer, Adaptation, Multi-task Learning, Reinforcement Learning, Incremental Learning, Learning in Robotics

11:02

02/02/2021