STARC: Structured Annotations for Reading Comprehension

Abstract: We present STARC (Structured Annotations for Reading Comprehension), a new annotation framework for assessing reading comprehension with multiple choice questions. Our framework introduces a principled structure for the answer choices and ties them to textual span annotations. The framework is implemented in OneStopQA, a new high-quality dataset for evaluation and analysis of reading comprehension in English. We use this dataset to demonstrate that STARC can be leveraged for a key new application for the development of SAT-like reading comprehension materials: automatic annotation quality probing via span ablation experiments. We further show that it enables in-depth analyses and comparisons between machine and human reading comprehension behavior, including error distributions and guessing ability. Our experiments also reveal that the standard multiple choice dataset in NLP, RACE, is limited in its ability to measure reading comprehension. 47% of its questions can be guessed by machines without accessing the passage, and 18% are unanimously judged by humans as not having a unique correct answer. OneStopQA provides an alternative test set for reading comprehension which alleviates these shortcomings and has a substantially higher human ceiling performance.

02/02/2021

Yevgeni Berzak, Jonathan Malmaud, Roger Levy

Similar Papers

Retrospective Reader for Machine Reading Comprehension

Zhuosheng Zhang, Junjie Yang, Hai Zhao

Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading

Yifan Gao, Chien-Sheng Wu, Shafiq Joty, Caiming Xiong, Richard Socher, Irwin King, Michael Lyu, Steven C.H. Hoi

Conversational Reading, decision making, Explicit Tracker, Coarse-to-Fine Reasoning

A Human Evaluation of AMR-to-English Generation Systems

Emma Manning, Shira Wein, Nathan Schneider

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Weihao Yu, Zihang Jiang, Yanfei Dong, Jiashi Feng

reading comprehension, logical reasoning, natural language processing

What do Models Learn from Question Answering Datasets?

Priyanka Sen, Amir Saffari

question answering, reading comprehension, bert-based models, question variations

Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention

Zhiqi Huang, Fenglin Liu, Xian Wu, Shen Ge, Helin Wang, Wei Fan, Yuexian Zou

A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers

Shen-yun Miao, Chao-Chun Liang, Keh-Yih Su

AI progress, English Solvers, MWP solvers, ASDiv

MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics

Anthony Chen, Gabriel Stanovsky, Sameer Singh, Matt Gardner

reading comprehension, generation problem, mocha, lerc

BARTScore: Evaluating Generated Text as Text Generation

Weizhe Yuan, Graham Neubig, Pengfei Liu

Machine Learning-Driven Language Assessment

Burr Settles, Masato Hagiwara, Geoffrey T. LaFlair

Machine Assessment, language assessments, natural processing, computer-adaptive testing

SCDE: Sentence Cloze Dataset with High Quality Distractors From Examinations

Xiang Kong, Varun Gangal, Eduard Hovy

SCDE, computational models, sentence prediction, joint solving

Unsupervised Quality Estimation for Neural Machine Translation

Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia

machine mt, real-world applications, qe, uncertainty quantification

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki, Haibo Ding, Graham Neubig

factual retrieval, language models, lms, probing methods

Generating similes effortlessly like a Pro: A Style Transfer Approach for Simile Generation

Tuhin Chakrabarty, Smaranda Muresan, Nanyun Peng

human imagination, simile generation, mapping properties, sequence model

Is the understanding of explicit discourse relations required in machine reading comprehension?

Yulong Wu, Viktor Schlegel, Riza Batista-Navarro

Read, attend, and exclude: Multi-choice reading comprehension by mimicking human reasoning process

Chenbin Zhang, Congjian Luo, Junyu Lu, Ao Liu, Bing Bai, Kun Bai, Zenglin Xu

multi-choice reading comprehension, answer exclusion, gated fusion, multi-rounds reasoning process

Event Extraction as Machine Reading Comprehension

Jian Liu, Yubo Chen, Kang Liu, Wei Bi, Xiaojiang Liu

event extraction, ee, information task, classification task

FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization

Esin Durmus, He He, Mona Diab

Faithfulness Assessment, Abstractive Summarization, evaluating summary, reading comprehension

Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

Peng Shi, Patrick Ng, Zhiguo Wang, Henghui Zhu, Alexander Hanbo Li, Jun Wang, Cicero Nogueira dos Santos, Bing Xiang

Multi-choice Relational Reasoning for Machine Reading Comprehension

Wuya Chen, Xiaojun Quan, Chunyu Kit, Zhengcheng Min, Jiahai Wang
