Towards Unified Dialogue System Evaluation: A Comprehensive Analysis of Current Evaluation Protocols

01/07/2020

Sarah E. Finch, Jinho D. Choi

Abstract: As conversational AI-based dialogue management has increasingly become a trending topic, the need for a standardized and reliable evaluation procedure grows even more pressing. The current state of affairs suggests various evaluation protocols to assess chat-oriented dialogue management systems, rendering it difficult to conduct fair comparative studies across different approaches and gain an insightful understanding of their values. To foster this research, a more robust evaluation protocol must be set in place. This paper presents a comprehensive synthesis of both automated and human evaluation methods on dialogue systems, identifying their shortcomings while accumulating evidence towards the most effective evaluation dimensions. A total of 20 papers from the last two years are surveyed to analyze three types of evaluation protocols: automated, static, and interactive. Finally, the evaluation dimensions used in these papers are compared against our expert evaluation on the system-user dialogue data collected from the Alexa Prize 2020.

Talk and the respective paper are published at the SIGDIAL 2020 virtual conference.

Similar Papers

01/07/2020

Sketch-Fill-A-R: A Persona-Grounded Chit-Chat Generation Framework

Michael Shum, Stephan Zheng, Wojciech Kryscinski, Caiming Xiong, Richard Socher

13:56

04/07/2020

Image-Chat: Engaging Grounded Conversations

Kurt Shuster, Samuel Humeau, Antoine Bordes, Jason Weston

Keywords: large-scale architectures, IGC task, neural architectures, image representations

11:35

16/11/2020

BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues

Hung Le, Doyen Sahoo, Nancy Chen, Steven C.H. Hoi

Keywords: video-grounded dialogues, high-resolution queries, video setting, bi-directional learning

11:05

04/07/2020

uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems

Tsuta Yuma, Naoki Yoshinaga, Masashi Toyoda

Keywords: Open-Domain Systems, uBLEU, Uncertainty-Aware Method, ΔBLEU

11:07

04/07/2020

Speaker Sensitive Response Evaluation Model

JinYeong Bak, Alice Oh

Keywords: Speaker Model, Automatic generation, open-domain generation, automatic models

10:40

08/12/2020

Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey

Samuel Louvan, Bernardo Magnini

14:44

04/07/2020

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

Hyounghun Kim, Zineng Tang, Mohit Bansal

Keywords: Dense-Caption Matching, Temporal VideoQA, answering questions, frame problem

10:56

16/11/2020

What is More Likely to Happen Next? Video-and-Language Future Event Prediction

Jie Lei, Licheng Yu, Tamara Berg, Mohit Bansal

Keywords: video-and-language prediction, AI models, VLEP, adversarial procedure

11:58

23/08/2020

Towards building an intelligent chatbot for customer service: Learning to respond at the appropriate time

Che Liu, Junfeng Jiang, Chao Xiong, Yi Yang, Jieping Ye

Keywords: customer service, triggering model, chatbot, self-supervised learning

10:34

16/11/2020

Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization

Jiaao Chen, Diyi Yang

Keywords: text summarization, NLP, summarizing text, human-humanmachine interaction

12:02

19/04/2021

I beg to differ: A study of constructive disagreement in online conversations

Christine De Kock, Andreas Vlachos

11:37

06/12/2020

How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods

Jeya Vikranth Jeyakumar, Joseph Noor, Yu-Hsi Cheng, Luis Garcia, Mani Srivastava

3:19

06/12/2021

Is Automated Topic Model Evaluation Broken? The Incoherence of Coherence

Alexander Hoyle, Pranav Goel, Andrew Hian-Cheong, Denis Peskov, Jordan Boyd-Graber, Philip Resnik

15:00

19/04/2021

Recipes for building an open-domain chatbot

Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Eric Michael Smith, Y-Lan Boureau, Jason Weston

11:33

04/07/2020

ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems

Qi Zhu, Zheng Zhang, Yan Fang, Xiang Li, Ryuichi Takanobu, Jinchao Li, Baolin Peng, Jianfeng Gao, Xiaoyan Zhu, Minlie Huang

Keywords: end-to-end evaluation, dialogue systems, error analysis, ConvLab-2

10:56

06/12/2020

Dynamic Fusion of Eye Movement Data and Verbal Narrations in Knowledge-rich Domains

Zhan Shaw, Qi Yu, Rui Li, Pengcheng Shi, Anne Haake

3:22

04/07/2020

Large Scale Multi-Actor Generative Dialog Modeling

Alex Boyd, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

Keywords: Large Modeling, generation, style matching, automatic evaluations

11:49

04/07/2020

Multi-Domain Dialogue Acts and Response Co-Generation

Kai Wang, Junfeng Tian, Rui Wang, Xiaojun Quan, Jianxing Yu

Keywords: Generating responses, task-oriented systems, response generation, automatic evaluations

10:01

19/08/2021

Mental Models of AI Agents in a Cooperative Game Setting (Extended Abstract)

Katy Ilonka Gero, Zahra Ashktorab, Casey Dugan, Qian Pan, James Johnson, Werner Geyer, Maria Ruiz, Sarah Miller, David R. Millen, Murray Campbell, Sadhana Kumaravel, Wei Zhang

Keywords: Humans and AI, Human-AI Collaboration, Human-Computer Interaction, Game Playing, Cognitive Modeling

15:08

19/08/2021

Reasoning-Based Learning of Interpretable ML Models

Alexey Ignatiev, Joao Marques-Silva, Nina Narodytska, Peter J. Stuckey

Keywords: Constraints and SAT, General

14:43

02/02/2021

Converse, Focus and Guess - Towards Multi-Document Driven Dialogue

Han Liu, Caixia Yuan, Xiaojie Wang, Yushu Yang, Huixing Jiang, Zhongyuan Wang

17:28

16/11/2020

F1 is Not Enough! Models and Evaluation Towards User-Centered Explainable Question Answering

Hendrik Schuff, Heike Adel, Ngoc Thang Vu

Keywords: reasoning process, user study, model selection, explainable systems

12:03

16/11/2020

Interpretable Multi-dataset Evaluation for Named Entity Recognition

Jinlan Fu, Pengfei Liu, Graham Neubig

Keywords: natural tasks, interpretable evaluation, named task, analysis tool

11:11

16/11/2020

Dialogue Response Ranking Training with Large-Scale Human Feedback Data

Xiang Gao, Yizhe Zhang, Michel Galley, Chris Brockett, Bill Dolan

Keywords: feedback prediction, ranking problem, predicting feedback, open-domain models

11:57

19/08/2021

A Survey on Response Selection for Retrieval-based Dialogues

Chongyang Tao, Jiazhan Feng, Rui Yan, Wei Wu, Daxin Jiang

Keywords: Natural language processing, General

11:44

04/07/2020

DoQA - Accessing Domain-Specific FAQs via Conversational QA

Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

Keywords: DoQA FAQs, conversational interfaces, information scenario, IR scenario

12:35

04/07/2020

Can You Put it All Together: Evaluating Conversational Agents' Ability to Blend Skills

Eric Michael Smith, Mary Williamson, Kurt Shuster, Jason Weston, Y-Lan Boureau

Keywords: conversational agent, open-domain agent, model schemes, multi-task training

11:39

19/08/2021

Automated Facilitation Support in Online Forum

Wen Gu

Keywords: Multidisciplinary Topics and Applications, Social Sciences, Knowledge-based Software Engineering, Reasoning about Knowledge and Belief

13:48

01/07/2020

Is this Dialogue Coherent? Learning from Dialogue Acts and Entities

Alessandra Cervone, Giuseppe Riccardi

10:58

07/06/2021

Machine Learning Explanations to Prevent Overtrust in Fake News Detection

Sina Mohseni, Fan Yang, Shiva Pentyala, Mengnan Du, Yi Liu, Nic Lupfer, Xia Hu, Shuiwang Ji, Eric Ragan

Keywords: Qualitative and quantitative studies of social media, Credibility of online content, Trust, reputation, recommendation systems, Human computer interaction, social media tools, navigation and visualization

8:01

25/07/2020

Investigating reference dependence effects on user search interaction and satisfaction: A behavioral economics perspective

Jiqun Liu, Fangyuan Han

Keywords: reference dependent effects, search satisfaction, behavioral economics, interactive information retrieval, user behavior

16:41

23/08/2020

Context-to-session matching: Utilizing whole session for response selection in information-seeking dialogue systems

Zhenxin Fu, Shaobo Cui, Mingyue Shang, Feng Ji, Dongyan Zhao, Haiqing Chen, Rui Yan

Keywords: text matching, graph attention network, response selection

13:33

16/11/2020

Cross Copy Network for Dialogue Generation

Changzhen Ji, Xin Zhou, Yating Zhang, Xiaozhong Liu, Changlong Sun, Conghui Zhu, Tiejun Zhao

Keywords: dialogue generation, model training, utterance generation, court debate

9:37

16/11/2020

Multi-hop Inference for Question-driven Summarization

Yang Deng, Wenxuan Zhang, Wai Lam

Keywords: question-driven summarization, question-driven method, multi-hop generator, multi-hop

13:22

04/07/2020

Dynamic Online Conversation Recommendation

Xingshan Zeng, Jing Li, Lu Wang, Zhiming Mao, Kam-Fai Wong

Keywords: Dynamic Recommendation, neural architecture, Trending topics, social users

11:36

08/12/2020

A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI

Angus Addlesee, Yanchao Yu, Arash Eshghi

13:04

06/12/2020

A Simple Language Model for Task-Oriented Dialogue

Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher

3:21

16/11/2020

Conversational Document Prediction to Assist Customer Care Agents

Jatin Ganhotra, Haggai Roitman, Doron Cohen, Nathaniel Mills, Chulaka Gunasekara, Yosi Mass, Sachindra Joshi, Luis Lastras, David Konopnicki

Keywords: customer conversations, predicting documents, customer agents, information models

6:38

26/10/2020

TLdR: Policy Summarization for Factored SSP Problems Using Temporal Abstractions

Sarath Sreedharan, Siddharth Srivastava, Subbarao Kambhampati

Keywords: Policy Summarization, SSP, Explanatory Dialogues, Landmarks

9:40

25/04/2020

Mental Models of AI Agents in a Cooperative Game Setting

Katy Gero, Zahra Ashktorab, Casey Dugan, Qian Pan, James Johnson, Werner Geyer, Maria Ruiz, Sarah Miller, David Millen, Murray Campbell, Sadhana Kumaravel, Wei Zhang

Keywords: artificial intelligence, mental models, conceptual models, games, word games, AI agents, think-aloud

15:05