Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation

Abstract: Open-domain dialogue generation has gained increasing attention in Natural Language Processing. Its evaluation requires a holistic approach. Human ratings are regarded as the gold standard, but because human evaluation is inefficient and costly, an automated substitute is highly desirable. In this paper, we propose holistic evaluation metrics that capture different aspects of open-domain dialogues. Our metrics consist of (1) GPT-2-based context coherence between sentences in a dialogue, (2) GPT-2-based fluency in phrasing, (3) n-gram-based diversity in responses to augmented queries, and (4) textual-entailment-inference-based logical self-consistency. The empirical validity of our metrics is demonstrated by strong correlations with human judgments. We open-source the code and relevant materials.
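
The n-gram diversity metric and the correlation with human judgments mentioned in the abstract can be illustrated with a small sketch. It assumes distinct-n (unique n-grams divided by total n-grams, pooled over several responses) as the diversity measure, whitespace tokenization, and Pearson correlation as the agreement statistic. The paper's exact formulations, including how queries are augmented and which correlation coefficients are reported, may differ; the functions `distinct_n` and `pearson` are hypothetical names, not part of the released code.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(responses: Sequence[str], n: int = 2) -> float:
    """Hypothetical diversity score: unique n-grams / total n-grams,
    pooled over a set of responses (e.g. responses to augmented
    variants of one query). Lowercased whitespace tokenization is an
    assumption, not the paper's tokenizer."""
    if n < 1:
        raise ValueError("n must be >= 1")
    pooled: List[Tuple[str, ...]] = []
    for response in responses:
        pooled.extend(ngrams(response.lower().split(), n))
    if not pooled:
        return 0.0
    return len(set(pooled)) / len(pooled)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between metric scores and human ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)
```

For example, `distinct_n(["i do not know"] * 3)` returns 1/3, since the three identical replies repeat the same three bigrams, while the three unrelated replies "i love hiking in the alps", "maybe try a new recipe" and "the museum opens at nine" score 1.0. Correlating per-system metric scores with human ratings via `pearson` yields a value in [-1, 1], where values near 1 indicate close agreement.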

Bo Pang, Erik Nijkamp, Wenjuan Han, Linqi Zhou, Yixian Liu, Kewei Tu

Similar Papers

Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

Hao Fei, Meishan Zhang, Donghong Ji

Keywords: Cross-Lingual Labeling, semantic labeling, natural understanding, model transferring

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Weizhu Chen, Jiawei Han

Keywords: consistency training, contrastive learning, data augmentation, natural language understanding

Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection

Hanjie Chen, Guangtao Zheng, Yangfeng Ji

Keywords: Text Classification, Generating explanations, natural processing, model prediction

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

Keywords: language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

Towards a decomposable metric for explainable evaluation of text generation from AMR

Juri Opitz, Anette Frank

Local Explanation of Dialogue Response Generation

Yi-Lin Tuan, Connor Pryor, Wenhu Chen, Lise Getoor, William Yang Wang

Keywords: machine learning

Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics

Nitika Mathur, Timothy Baldwin, Trevor Cohn

Keywords: judging metrics, assessment, pairwise ranking, thresholding

Logical Natural Language Generation from Open-Domain Tables

Wenhu Chen, Jianshu Chen, Yu Su, Zhiyu Chen, William Yang Wang

Keywords: Logical Generation, neural NLG, surface-level realizations, logical inference

Measuring Systematic Generalization in Neural Proof Generation with Transformers

Nicolas Gontier, Koustuv Sinha, Siva Reddy, Chris Pal

ReQue: A configurable workflow and dataset collection for query refinement

Mahtab Tamannaee, Hossein Fani, Fattane Zarrinkalam, Jamil Samouh, Samad Paydar, Ebrahim Bagheri

Keywords: gold standard dataset, query refinement, reproducibility

Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation

Ieva Staliūnaitė, Philip John Gorinski, Ignacio Iacobacci

Expanding, retrieving and infilling: Diversifying cross-domain question generation with flexible templates

Xiaojing Yu, Anxiao Jiang

A Deep Metric Learning Method for Biomedical Passage Retrieval

Andrés Rosso-Mateus, Fabio A. González, Manuel Montes-y-Gómez

Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback

Ahmed Elgohary, Saghar Hosseini, Ahmed Hassan Awadallah

Keywords: semantic correction, one-shot translation, correction task, Parser

Evaluating the Factual Consistency of Abstractive Text Summarization

Wojciech Kryscinski, Bryan McCann, Caiming Xiong, Richard Socher

Keywords: assessing algorithms, natural inference, fact checking, auxiliary tasks

Generating Senses and RoLes: An End-to-End Model for Dependency- and Span-based Semantic Role Labeling

Rexhina Blloshmi, Simone Conia, Rocco Tripodi, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Natural Language Generation

Towards automatically generating Questions under Discussion to link information and discourse structure

Kordula De Kuthy, Madeeswaran Kannan, Haemanth Santhi Ponnusamy, Detmar Meurers

Low-Resource Generation of Multi-hop Reasoning Questions

Jianxing Yu, Wei Liu, Shuang Qiu, Qinliang Su, Kai Wang, Xiaojun Quan, Jian Yin

Keywords: Low-Resource Questions, generating questions, machine comprehension, multi-hop model

Profile Consistency Identification for Open-domain Dialogue Agents

Haoyu Song, Yan Wang, Wei-Nan Zhang, Zhengyu Zhao, Ting Liu, Xiaojiang Liu

Keywords: attribute consistency, profile identification, dialogue agents, key-value model

uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems

Tsuta Yuma, Naoki Yoshinaga, Masashi Toyoda

Keywords: Open-Domain Systems, uBLEU, Uncertainty-Aware Method, ΔBLEU

Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model

Kosuke Takahashi, Katsuhito Sudoh, Satoshi Nakamura
