Training Question Answering Models From Synthetic Data

Abstract: Question and answer generation is a data augmentation method that aims to improve question answering (QA) models given the limited amount of human-labeled data. However, a considerable gap remains between synthetic and human-generated question-answer pairs. This work aims to narrow this gap by taking advantage of large language models, and explores several factors such as model size, quality of pretrained models, scale of data synthesized, and algorithmic choices. On the SQuAD1.1 question answering task, we achieve higher accuracy using solely synthetic questions and answers than when using the SQuAD1.1 training set questions alone. Without access to real Wikipedia data, we synthesize questions and answers from a synthetic text corpus generated by an 8.3-billion-parameter GPT-2 model and achieve 88.4 Exact Match (EM) and 93.9 F1 on the SQuAD1.1 dev set. We further apply our methodology to SQuAD2.0 and show a 2.8-point absolute gain in EM compared to prior work using synthetic data.
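
The EM and F1 figures above are the standard SQuAD metrics. The following Python sketch is an illustration only and is not the authors' evaluation code. It assumes the answer normalization used by the public SQuAD v1.1 evaluation script: lowercasing, stripping punctuation and the articles a/an/the, and collapsing whitespace. EM checks whether the normalized prediction equals a reference answer. F1 is the harmonic mean of token-level precision and recall. The helper names are hypothetical.

```python
import re
import string
from collections import Counter
from typing import List, Tuple

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score_example(prediction: str, ground_truths: List[str]) -> Tuple[float, float]:
    """SQuAD-style scoring: keep the best score over all reference answers."""
    em = max(exact_match(prediction, gt) for gt in ground_truths)
    f1 = max(f1_score(prediction, gt) for gt in ground_truths)
    return em, f1


if __name__ == "__main__":
    # Hypothetical example, not taken from the paper.
    print(score_example("Denver", ["Denver Broncos"]))
```

Each question receives the maximum score over its reference answers. Those per-question scores are averaged over the dev set and multiplied by 100 to give figures such as 88.4 EM. SQuAD2.0 adds unanswerable questions, and this sketch does not handle them.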

19/08/2021

Raul Puri, Ryan Spring, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

Similar Papers

ALaSca: an Automated approach for Large-Scale Lexical Substitution

Caterina Lacerra, Tommaso Pasini, Rocco Tripodi, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Resources and Evaluation

DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks

Bosheng Ding, Linlin Liu, Lidong Bing, Canasai Kruengkrai, Thien Hai Nguyen, Shafiq Joty, Luo Si, Chunyan Miao

Keywords: machine learning, generalization, low-resource tasks, named recognition

Unsupervised Question Decomposition for Question Answering

Ethan Perez, Patrick Lewis, Wen-tau Yih, Kyunghyun Cho, Douwe Kiela

Keywords: question qa, labeling questions, one-to-n transduction, qa

DoQA - Accessing Domain-Specific FAQs via Conversational QA

Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

Keywords: DoQA FAQs, conversational interfaces, information scenario, IR scenario

Progressively pretrained dense corpus index for open-domain question answering

Wenhan Xiong, Hong Wang, William Yang Wang

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki, Haibo Ding, Graham Neubig

Keywords: factual retrieval, language models, lms, probing methods

Event Extraction as Machine Reading Comprehension

Jian Liu, Yubo Chen, Kang Liu, Wei Bi, Xiaojiang Liu

Keywords: event extraction, ee, information task, classification task

C2C-GenDA: Cluster-to-Cluster Generation for Data Augmentation of Slot Filling

Yutai Hou, Sanyuan Chen, Wanxiang Che, Cheng Chen, Ting Liu

Partially-Aligned Data-to-Text Generation with Distant Supervision

Zihao Fu, Bei Shi, Wai Lam, Lidong Bing, Zhiyuan Liu

Keywords: data-to-text task, generation task, dataset problem, over-generation problem

Expanding, retrieving and infilling: Diversifying cross-domain question generation with flexible templates

Xiaojing Yu, Anxiao Jiang

Harvesting and Refining Question-Answer Pairs for Unsupervised QA

Zhongli Li, Wenhui Wang, Li Dong, Furu Wei, Ke Xu

Keywords: Unsupervised QA, Question Answering, Question QA, QA

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset

Angel Daza, Anette Frank

Keywords: generalization learning, multilingual learning, high-quality translation, srl

Do Explicit Alignments Robustly Improve Multilingual Encoders?

Shijie Wu, Mark Dredze

Keywords: multilingual, unsupervised encoders, cross-lingual representation, contrastive objective

Syntax-Aware Opinion Role Labeling with Dependency Graph Convolutional Networks

Bo Zhang, Yue Zhang, Rui Wang, Zhenghua Li, Min Zhang

Keywords: Syntax-Aware Labeling, Opinion labeling, ORL, opinion task

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Keywords: Deep Learning - Generative Models and Autoencoders

Code and Named Entity Recognition in StackOverflow

Jeniya Tabassum, Mounica Maddela, Wei Xu, Alan Ritter

Keywords: Named Recognition, computer domain, StackOverflow, NLP techniques

Domain Transfer based Data Augmentation for Neural Query Translation

Liang Yao, Baosong Yang, Haibo Zhang, Boxing Chen, Weihua Luo

Precise Task Formalization Matters in Winograd Schema Evaluations

Haokun Liu, William Huang, Dhara Mungra, Samuel R. Bowman

Keywords: task formalization, input specification, ablation, formalization decisions

Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task

Jindřich Libovický, Zdeněk Kasner, Jindřich Helcl, Ondřej Dušek

Bootstrapping Techniques for Polysynthetic Morphological Analysis

William Lane, Steven Bird
