What do Models Learn from Question Answering Datasets?

16/11/2020

Priyanka Sen, Amir Saffari

Keywords: question answering, reading comprehension, bert-based models, question variations

Abstract: While models have reached superhuman performance on popular question answering (QA) datasets such as SQuAD, they have yet to outperform humans on the task of question answering itself. In this paper, we investigate if models are learning reading comprehension from QA datasets by evaluating BERT-based models across five datasets. We evaluate models on their generalizability to out-of-domain examples, responses to missing or incorrect data, and ability to handle question variations. We find that no single dataset is robust to all of our experiments and identify shortcomings in both datasets and evaluation methods. Following our analysis, we make recommendations for building future QA datasets that better evaluate the task of question answering through reading comprehension. We also release code to convert QA datasets to a shared format for easier experimentation at https://github.com/amazon-research/qa-dataset-converter

The talk and the corresponding paper were published at the EMNLP 2020 virtual conference.

Similar Papers

02/02/2021

What's the Best Place for an AI Conference, Vancouver or _______: Why Completing Comparative Questions is Difficult

Avishai Zagoury, Einat Minkov, Idan Szpektor, William W. Cohen

26/04/2020

Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension

Xinyun Chen, Chen Liang, Adams Wei Yu, Denny Zhou, Dawn Song, Quoc V. Le

Keywords: neural symbolic, reading comprehension, question answering

19/04/2021

Scalable evaluation and improvement of document set expansion via neural positive-unlabeled learning

Alon Jacovi, Gang Niu, Yoav Goldberg, Masashi Sugiyama

04/07/2020

Document Modeling with Graph Attention Networks for Multi-grained Machine Reading Comprehension

Bo Zheng, Haoyang Wen, Yaobo Liang, Nan Duan, Wanxiang Che, Daxin Jiang, Ming Zhou, Ting Liu

Keywords: Document Modeling, Multi-grained Comprehension, machine comprehension, Graph Networks

02/02/2021

EQG-RACE: Examination-Type Question Generation

Xin Jia, Wenjie Zhou, Xu Sun, Yunfang Wu

08/12/2020

HIT-SCIR at SemEval-2020 Task 5: Training Pre-trained Language Model with Pseudo-labeling Data for Counterfactuals Detection

Xiao Ding, Dingkui Hao, Yuewei Zhang, Kuo Liao, Zhongyang Li, Bing Qin, Ting Liu

02/02/2021

Retrospective Reader for Machine Reading Comprehension

Zhuosheng Zhang, Junjie Yang, Hai Zhao

04/07/2020

Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading

Yifan Gao, Chien-Sheng Wu, Shafiq Joty, Caiming Xiong, Richard Socher, Irwin King, Michael Lyu, Steven C.H. Hoi

Keywords: Conversational Reading, decision making, Explicit Tracker, Coarse-to-Fine Reasoning

02/02/2021

Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

Peng Shi, Patrick Ng, Zhiguo Wang, Henghui Zhu, Alexander Hanbo Li, Jun Wang, Cicero Nogueira dos Santos, Bing Xiang

05/12/2020

Vocabulary matters: A simple yet effective approach to paragraph-level question generation

Vishwajeet Kumar, Manish Joshi, Ganesh Ramakrishnan, Yuan-Fang Li

16/11/2020

PathQG: Neural Question Generation from Facts

Siyuan Wang, Zhongyu Wei, Zhihao Fan, Zengfeng Huang, Weijian Sun, Qi Zhang, Xuanjing Huang

Keywords: question generation, query learning, query-based generation, sequence problem

16/11/2020

What Does My QA Model Know? Devising Controlled Probes using Expert

Kyle Richardson, Ashish Sabharwal

Keywords: knowledge challenges, benchmark tasks, diagnostic tasks, taxonomic reasoning

02/02/2021

Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari

04/07/2020

STARC: Structured Annotations for Reading Comprehension

Yevgeni Berzak, Jonathan Malmaud, Roger Levy

Keywords: Reading Comprehension, Structured Comprehension, evaluation comprehension, SAT-like materials

14/06/2020

MetaIQA: Deep Meta-Learning for No-Reference Image Quality Assessment

Hancheng Zhu, Leida Li, Jinjian Wu, Weisheng Dong, Guangming Shi

Keywords: image quality assessment, convolutional neural networks, gradient optimization-based deep meta-learning, highly generalizable

16/11/2020

Learning from Task Descriptions

Orion Weller, Nicholas Lourie, Matt Gardner, Matthew Peters

Keywords: task-oriented evaluation, systematic generalization, machine systems, nlp systems

04/07/2020

Unsupervised FAQ Retrieval with Question Generation and BERT

Yosi Mass, Boaz Carmeli, Haggai Roitman, David Konopnicki

Keywords: Unsupervised Retrieval, Question Generation, Frequently retrieval, fully method

02/06/2020

VQuAnDa: Verbalization QUestion ANswering DAtaset

Endri Kacupaj, Hamid Zafar, Jens Lehmann, Maria Maleshkova

08/12/2020

Reinforced Multi-task Approach for Multi-hop Question Generation

Deepak Gupta, Hardik Chauhan, Ravi Tej Akella, Asif Ekbal, Pushpak Bhattacharyya

04/07/2020

FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization

Esin Durmus, He He, Mona Diab

Keywords: Faithfulness Assessment, Abstractive Summarization, evaluating summary, reading comprehension

06/12/2021

Introspective Distillation for Robust Question Answering

Yulei Niu, Hanwang Zhang

12/07/2020

Task Understanding from Confusing Multi-task Data

Xin Su, Yizhou Jiang, Shangqi Guo, Feng Chen

Keywords: General Machine Learning Techniques

06/12/2021

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Jixuan Wang, Kuan-Chieh Wang, Frank Rudzicz, Michael Brudno

Keywords: machine learning, transformers, meta learning, language, transfer learning

06/12/2020

Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning

Weili Nie, Zhiding Yu, Lei Mao, Ankit Patel, Yuke Zhu, Anima Anandkumar

06/12/2020

On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

Damien Teney, Ehsan Abbasnejad, Kushal Kafle, Robik Shrestha, Christopher Kanan, Anton van den Hengel

04/07/2020

How to Ask Good Questions? Try to Leverage Paraphrases

Xin Jia, Wenjie Zhou, Xu Sun, Yunfang Wu

Keywords: question generation (QG), sentence-level generation, diversity training, Paraphrases

19/08/2021

Cross-Domain Few-Shot Classification via Adversarial Task Augmentation

Haoqing Wang, Zhi-Hong Deng

Keywords: Computer Vision, Recognition, Adversarial Machine Learning, Deep Learning

06/12/2021

On Memorization in Probabilistic Deep Generative Models

Gerrit van den Burg, Chris Williams

Keywords: deep learning, self-supervised learning, generative model

06/12/2021

End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

Devendra Singh, Siva Reddy, Will Hamilton, Chris Dyer, Dani Yogatama

04/07/2020

Crossing Variational Autoencoders for Answer Retrieval

Wenhao Yu, Lingfei Wu, Qingkai Zeng, Shu Tao, Yu Deng, Meng Jiang

Keywords: Answer Retrieval, vector questions/answers, Question-answer alignment, SQuAD

06/12/2021

Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

Jannik Kossen, Neil Band, Clare Lyle, Aidan Gomez, Thomas Rainforth, Yarin Gal

Keywords: deep learning, transformers

16/11/2020

An Imitation Game for Learning Semantic Parsers from User Interaction

Ziyu Yao, Yiqi Tang, Wen-tau Yih, Huan Sun, Yu Su

Keywords: bootstrapping, fine-tuning parsers, theoretical analysis, text-to-sql problem

04/07/2020

Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset

Xiang Yue, Bernal Jimenez Gutierrez, Huan Sun

Keywords: Clinical Comprehension, Machine comprehension, annotation, question answering

26/04/2020

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Weihao Yu, Zihang Jiang, Yanfei Dong, Jiashi Feng

Keywords: reading comprehension, logical reasoning, natural language processing

19/04/2021

Do multi-hop question answering systems know how to answer the single-hop sub-questions?

Yixuan Tang, Hwee Tou Ng, Anthony Tung

04/07/2020

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords: Cross-Lingual Classification, sentiment classification, unsupervised system, classification

04/07/2020

Harvesting and Refining Question-Answer Pairs for Unsupervised QA

Zhongli Li, Wenhui Wang, Li Dong, Furu Wei, Ke Xu

Keywords: Unsupervised QA, Question Answering, Question QA, QA

16/11/2020

Dialogue Response Ranking Training with Large-Scale Human Feedback Data

Xiang Gao, Yizhe Zhang, Michel Galley, Chris Brockett, Bill Dolan

Keywords: feedback prediction, ranking problem, predicting feedback, open-domain models

16/11/2020

Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning

Hanlu Wu, Tengfei Ma, Lingfei Wu, Tariro Manyumwa, Shouling Ji

Keywords: summarization task, document system, rouge, unsupervised learning

25/07/2020

A pairwise probe for understanding BERT fine-tuning on machine reading comprehension

Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

Keywords: machine reading comprehension, pairwise, fine-tune, BERT