Span Selection Pre-training for Question Answering

04/07/2020

Michael Glass, Alfio Gliozzo, Rishav Chakravarti, Anthony Ferritto, Lin Pan, G P Shrivatsa Bhargav, Dinesh Garg, Avi Sil

Keywords: Question Answering, language tasks, Next Prediction, pre-training task

Abstract: BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension to shift pre-training from memorization toward understanding. Span Selection Pre-Training (SSPT) poses cloze-like training instances, but rather than drawing the answer from the model’s parameters, the answer is selected from a relevant passage. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple Machine Reading Comprehension (MRC) datasets. Specifically, our proposed model obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also show significant impact on HotpotQA, improving answer prediction F1 by 4 points and supporting fact prediction F1 by 1 point, and outperforming the previous best system. Moreover, we show that our pre-training approach is particularly effective when training data is limited, improving the learning curve by a large amount.
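
To make the cloze-like setup in the abstract concrete, here is a minimal sketch of how one span-selection training instance might be built. It is an illustration under stated assumptions, not the authors' pipeline: the `[BLANK]` placeholder, the `SpanSelectionInstance` class, and the helper names are all hypothetical, and exact string matching stands in for whatever term selection and passage retrieval the paper actually uses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BLANK = "[BLANK]"  # hypothetical placeholder token; the source does not name one


@dataclass
class SpanSelectionInstance:
    query: str                            # sentence with the answer term blanked out
    passage: str                          # relevant passage that contains the answer
    answer: str                           # the term removed from the query
    answer_spans: List[Tuple[int, int]]   # character offsets (start, end) in passage


def find_spans(passage: str, answer: str) -> List[Tuple[int, int]]:
    """Return every exact (case-sensitive) character span of `answer` in `passage`."""
    spans = []
    start = passage.find(answer)
    while start != -1:
        spans.append((start, start + len(answer)))
        start = passage.find(answer, start + 1)
    return spans


def make_instance(sentence: str, answer: str, passage: str) -> Optional[SpanSelectionInstance]:
    """Blank `answer` out of `sentence`; return None if no usable instance exists."""
    if not answer or answer not in sentence:
        return None
    spans = find_spans(passage, answer)
    if not spans:
        # The answer must be selectable from the passage, not recalled from parameters.
        return None
    query = sentence.replace(answer, BLANK, 1)
    return SpanSelectionInstance(query, passage, answer, spans)


if __name__ == "__main__":
    # Toy data invented for illustration only.
    instance = make_instance(
        "The capital of France is Paris.",
        "Paris",
        "Paris is the largest city in France. Many tourists visit Paris every year.",
    )
    print(instance.query)
    print(instance.answer_spans)
```

A model pre-trained on such instances would be asked to point at one of the `answer_spans` in the passage given the blanked query, which is the same input and output shape as extractive question answering.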

The talk and the corresponding paper were published at the ACL 2020 virtual conference.

Similar Papers

26/04/2020

Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov

5:00

04/07/2020

Distilling Knowledge Learned in BERT for Text Generation

Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu

Keywords: Text Generation, language tasks, language generation, generation tasks

10:41

19/04/2021

Cross-lingual visual pre-training for multimodal machine translation

Ozan Caglayan, Menekse Kuyu, Mustafa Sercan Amac, Pranava Madhyastha, Erkut Erdem, Aykut Erdem, Lucia Specia

6:16

16/11/2020

Partially-Aligned Data-to-Text Generation with Distant Supervision

Zihao Fu, Bei Shi, Wai Lam, Lidong Bing, Zhiyuan Liu

Keywords: data-to-text task, generation task, dataset problem, over-generation problem

11:58

16/11/2020

Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting

Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan Yu

Keywords: pretraining, pretraining tasks, learning tasks, fine-tuning bert-large

10:52

26/04/2020

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Wei Wang, Bin Bi, Ming Yan, Chen Wu, Jiangnan Xia, Zuyi Bao, Liwei Peng, Luo Si

5:34

06/12/2020

MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

3:23

22/06/2020

How Context Affects Language Models' Factual Predictions

Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel

10:16

06/12/2020

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

3:20

03/05/2021

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Keywords: adversarial training, QA, NLI, BERT, information theory, adversarial robustness

5:21

25/07/2020

A pairwise probe for understanding BERT fine-tuning on machine reading comprehension

Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

Keywords: machine reading comprehension, pairwise, fine-tune, BERT

6:38

06/12/2021

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

Marie-Anne Lachaux, Baptiste Roziere, Marc Szafraniec, Guillaume Lample

Keywords: self-supervised learning

13:09

04/07/2020

Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?

Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman

Keywords: Intermediate-Task Learning, natural tasks, data-rich task, intermediate-task training

14:47

04/07/2020

Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

Hao Fei, Meishan Zhang, Donghong Ji

Keywords: Cross-Lingual Labeling, semantic labeling, natural understanding, model transferring

10:32

16/11/2020

Improving AMR Parsing with Sequence-to-Sequence Pre-training

Dongqin Xu, Junhui Li, Muhua Zhu, Min Zhang, Guodong Zhou

Keywords: abstract parsing, amr parsing, sequence-to-sequence parsing, machine translation

11:42

19/04/2021

ENPAR: Enhancing Entity and Entity Pair Representations for Joint Entity Relation Extraction

Yijun Wang, Changzhi Sun, Yuanbin Wu, Hao Zhou, Lei Li, Junchi Yan

7:23

04/07/2020

Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction

Masahiro Kaneko, Masato Mita, Shun Kiyono, Jun Suzuki, Kentaro Inui

Keywords: Grammatical Correction, GEC, Encoder-Decoder Models, Pre-trained Models

6:44

19/04/2021

Better neural machine translation by extracting linguistic information from BERT

Hassan S. Shavarani, Anoop Sarkar

12:15

04/07/2020

Syntactic Data Augmentation Increases Robustness to Inference Heuristics

Junghyun Min, R. Thomas McCoy, Dipanjan Das, Emily Pitler, Tal Linzen

Keywords: Syntactic Augmentation, natural inference, natural NLI, NLI

6:59

05/12/2020

Investigating learning dynamics of BERT fine-tuning

Yaru Hao, Li Dong, Furu Wei, Ke Xu

7:10

08/12/2020

Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models

Jheng-Hong Yang, Sheng-Chieh Lin, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

9:14

06/12/2020

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

Wangchunshu Zhou, Jinyi Hu, Hanlin Zhang, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang

3:22

16/11/2020

DagoBERT: Generating Derivational Morphology with a Pretrained Language Model

Valentin Hofmann, Janet Pierrehumbert, Hinrich Schütze

Keywords: full finetuning, derivation generation, pretrained models, plms

10:15

04/07/2020

Improving Disfluency Detection by Self-Training a Self-Attentive Model

Paria Jamshid Lou, Mark Johnson

Keywords: Disfluency Detection, joint parsing, Self-Attentive Model, Self-attentive parsers

12:37

19/08/2021

Automatically Paraphrasing via Sentence Reconstruction and Round-trip Translation

Zilu Guo, Zhongqiang Huang, Kenny Q. Zhu, Guandan Chen, Kaibo Zhang, Boxing Chen, Fei Huang

Keywords: Natural Language Processing, Machine Translation, Natural Language Generation, NLP Applications and Tools

13:53

16/11/2020

PRover: Proof Generation for Interpretable Reasoning over Rules

Swarnadeep Saha, Sayan Ghosh, Shashank Srivastava, Mohit Bansal

Keywords: inference, qa generation, generalization, qa task

11:30

26/04/2020

Incorporating BERT into Neural Machine Translation

Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tieyan Liu

Keywords: BERT, Neural Machine Translation

4:47

25/07/2020

Table search using a deep contextualized language model

Zhiyu Chen, Mohamed Trabelsi, Jeff Heflin, Yinan Xu, Brian D. Davison

Keywords: neural networks, pretrained language model, table search, information retrieval

12:33

12/07/2020

Stabilizing Transformers for Reinforcement Learning

Emilio Parisotto, Francis Song, Jack Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew Botvinick, Nicolas Heess, Raia Hadsell

Keywords: Reinforcement Learning - Deep RL

14:20

06/12/2020

Incorporating BERT into Parallel Sequence Decoding with Adapters

Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong Chen

3:17

02/02/2021

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding

Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li

15:25

03/05/2021

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen

Keywords: Position Encoding, Attention, Natural Language Processing, Language Model Pre-training, Transformer

6:06

16/11/2020

PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation

Xinyu Hua, Lu Wang

Keywords: generation, pre-trained transformers, content-controlled framework, pair

12:12

02/02/2021

*-CFQ: Analyzing the Scalability of Machine Learning on a Compositional Task

Dmitry Tsarkov, Tibor Tihon, Nathan Scales, Nikola Momchev, Danila Sinopalnikov, Nathanael Schärli

16:33

04/07/2020

How does BERT's attention change when you fine-tune? An analysis methodology and a case study in negation scope

Yiyun Zhao, Steven Bethard

Keywords: downstream task, NLP problems, knowledge-related tasks, downstream tasks

11:43

16/11/2020

Improving Grammatical Error Correction Models with Purpose-Built Adversarial Examples

Lihao Wang, Xiaoqing Zheng

Keywords: grammatical correction, sequence-to-sequence learning, neural networks, gec

11:40

16/11/2020

An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models

Lifu Tu, Garima Lalwani, Spandana Gella, He He

Keywords: generalization, natural inference, paraphrase identification, pre-trained models

11:55

02/02/2021

SALNet: Semi-supervised Few-Shot Text Classification with Attention-based Lexicon Construction

Ju-Hyoung Lee, Sang-Ki Ko, Yo-Sub Han

15:28

16/11/2020

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

Keywords: language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

12:07

04/07/2020

Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT

Zhiyong Wu, Yun Chen, Ben Kao, Qun Liu

Keywords: Analyzing BERT, linguistic tasks, dependency parsing, probing tasks

11:00