Progressively pretrained dense corpus index for open-domain question answering

19/04/2021

Wenhan Xiong, Hong Wang, William Yang Wang

Abstract: Commonly used information retrieval methods in open-domain question answering (QA) systems, such as TF-IDF, are insufficient to capture deep semantic matching that goes beyond lexical overlap. Some recent studies treat retrieval as maximum inner product search (MIPS) over dense question and paragraph representations, achieving promising results on several information-seeking QA datasets. However, pretraining the dense vector representations is highly resource-demanding; e.g., it requires a very large batch size and many training steps. In this work, we propose a sample-efficient method to pretrain the paragraph encoder. First, instead of using heuristically created pseudo question-paragraph pairs for pretraining, we use an existing pretrained sequence-to-sequence model to build a strong question generator that creates high-quality pretraining data. Second, we propose a simple progressive pretraining algorithm to ensure the existence of effective negative samples in each batch. Across three open-domain QA datasets, our method consistently outperforms a strong dense retrieval baseline that uses 6 times more computation for training. On two of the datasets, our method achieves an absolute improvement of more than 4 points in answer exact match.
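The abstract frames dense retrieval as maximum inner product search (MIPS) over question and paragraph vectors. It also ties training efficiency to having effective negatives in each batch. The sketch below illustrates those two generic pieces in plain Python. It assumes the vectors were already produced by some encoder. The function names `mips` and `in_batch_negative_loss` are hypothetical. This is not the authors' question generator or their progressive pretraining algorithm, whose details the abstract does not give.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product between two dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def mips(query: Vector, paragraphs: List[Vector], k: int = 1) -> List[Tuple[int, float]]:
    """Exact (brute-force) maximum inner product search.

    Scores every paragraph vector against the query and returns the
    top-k (index, score) pairs, highest score first.
    """
    scored = [(i, dot(query, p)) for i, p in enumerate(paragraphs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def in_batch_negative_loss(questions: List[Vector], paragraphs: List[Vector]) -> float:
    """Mean softmax cross-entropy with in-batch negatives (generic sketch).

    questions[i] is assumed to be paired with paragraphs[i]; every other
    paragraph in the batch acts as a negative for that question.
    """
    if len(questions) != len(paragraphs):
        raise ValueError("expected one positive paragraph per question")
    total = 0.0
    for i, q in enumerate(questions):
        logits = [dot(q, p) for p in paragraphs]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_z - logits[i]  # -log softmax probability of the positive
    return total / len(questions)


if __name__ == "__main__":
    qs = [[1.0, 0.0], [0.0, 1.0]]
    ps = [[0.9, 0.1], [0.2, 0.8]]
    print(mips(qs[0], ps, k=1))
    print(in_batch_negative_loss(qs, ps))
```

If every paragraph in a batch has the same vector, the loss equals log(batch size) no matter how good the encoder is. Such negatives carry no training signal, which is why the composition of each batch matters. The brute-force scan in `mips` is only for illustration; large indexes typically use an approximate MIPS library such as FAISS instead.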

The talk and the paper were published at the EACL 2021 virtual conference.

Similar Papers

26/04/2020

Pre-training Tasks for Embedding-based Large-scale Retrieval

Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar

Keywords: natural language processing, large-scale retrieval, unsupervised representation learning, paragraph-level pre-training, two-tower Transformer models

19/10/2020

Efficient neural query auto completion

Sida Wang, Weiwei Guo, Huiji Gao, Bo Long

Keywords: deep learning, query auto completion, neural language model

16/11/2020

Partially-Aligned Data-to-Text Generation with Distant Supervision

Zihao Fu, Bei Shi, Wai Lam, Lidong Bing, Zhiyuan Liu

Keywords: data-to-text task, generation task, dataset problem, over-generation problem

22/06/2020

Syntactic Question Abstraction and Retrieval for Data-Scarce Semantic Parsing

Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Minjoon Seo

Keywords: Semantic Parsing, NLIDB, WikiSQL, Question Answering, SQL, Information Retrieval

08/12/2020

Domain Transfer based Data Augmentation for Neural Query Translation

Liang Yao, Baosong Yang, Haibo Zhang, Boxing Chen, Weihua Luo

19/08/2021

Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention

Wei Suo, MengYang Sun, Peng Wang, Qi Wu

Keywords: Computer Vision, Language and Vision, Structural and Model-Based Approaches, Knowledge Representation and Reasoning

02/02/2021

HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation

Xiaoyang Lyu, Liang Liu, Mengmeng Wang, Xin Kong, Lina Liu, Yong Liu, Xinxin Chen, Yi Yuan

06/12/2021

Structured Reordering for Modeling Latent Alignments in Sequence Transduction

Bailin Wang, Mirella Lapata, Ivan Titov

Keywords: language

07/09/2020

BCaR: Beginner Classifier as Regularization Towards Generalizable Re-ID

Masato Tamura, Tomoaki Yoshinaga

Keywords: person re-identification, generalizable, soft label, knowledge distillation, Re-ID, domain generalization

16/11/2020

Knowledge-guided Open Attribute Value Extraction with Reinforcement Learning

Ye Liu, Sheng Zhang, Rui Song, Suo Feng, Yanghua Xiao

Keywords: open extraction, question-answering task, information system, kg

16/11/2020

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Trapit Bansal, Rishikesh Jha, Tsendsuren Munkhdalai, Andrew McCallum

Keywords: nlp applications, fine-tuning, meta-learning problem, supervised tasks

12/07/2020

PoKED: A Semi-Supervised System for Word Sense Disambiguation

Feng Wei

Keywords: Deep Learning - Generative Models and Autoencoders

19/04/2021

Enconter: Entity constrained progressive sequence generation via insertion-based transformer

Lee Hsun Hsieh, Yang-Yin Lee, Ee-Peng Lim

06/12/2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

Keywords: optimization, transformers, language

04/07/2020

Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering

Ming Yan, Hao Zhang, Di Jin, Joey Tianyi Zhou

Keywords: Multi-source Transfer, Low Answering, Multiple-choice answering, machine comprehension

19/04/2021

Generating syntactically controlled paraphrases without using annotated parallel pairs

Kuan-Hao Huang, Kai-Wei Chang

08/12/2020

A Deep Metric Learning Method for Biomedical Passage Retrieval

Andrés Rosso-Mateus, Fabio A. González, Manuel Montes-y-Gómez

16/11/2020

Gradient-guided Unsupervised Lexically Constrained Text Generation

Lei Sha

Keywords: lexically generation, real-world applications, lexically-constrained generation, unsupervised problem

18/07/2021

Self-Damaging Contrastive Learning

Ziyu Jiang, Tianlong Chen, Bobak Mortazavi, Zhangyang Wang

Keywords: Algorithms, Unsupervised Learning

02/02/2021

PREMERE: Meta-Reweighting via Self-Ensembling for Point-of-Interest Recommendation

Minseok Kim, Hwanjun Song, Doyoung Kim, Kijung Shin, Jae-Gil Lee

06/12/2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth

Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

04/07/2020

Coach: A Coarse-to-Fine Approach for Cross-domain Slot Filling

Zihan Liu, Genta Indra Winata, Peng Xu, Pascale Fung

Keywords: Cross-domain Filling, task-oriented systems, slot filling, data problem

06/12/2021

Environment Generation for Zero-Shot Compositional Reinforcement Learning

Izzeddin Gur, Natasha Jaques, Yingjie Miao, Jongwook Choi, Manoj Tiwari, Honglak Lee, Aleksandra Faust

Keywords: reinforcement learning and planning, robustness, graph learning

06/12/2021

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Jixuan Wang, Kuan-Chieh Wang, Frank Rudzicz, Michael Brudno

Keywords: machine learning, transformers, meta learning, language, transfer learning

06/12/2021

Automatic Unsupervised Outlier Model Selection

Yue Zhao, Ryan Rossi, Leman Akoglu

Keywords: machine learning, self-supervised learning, meta learning, clustering

02/02/2021

Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

Ramakanth Pasunuru, Asli Celikyilmaz, Michel Galley, Chenyan Xiong, Yizhe Zhang, Mohit Bansal, Jianfeng Gao

02/02/2021

Going Deeper With Directly-Trained Larger Spiking Neural Networks

Hanle Zheng, Yujie Wu, Lei Deng, Yifan Hu, Guoqi Li

04/07/2020

Paraphrase Generation by Learning How to Edit from Samples

Amirhossein Kazemnejad, Mohammadreza Salehi, Mahdieh Soleymani Baghshah

Keywords: Paraphrase Generation, Neural sequence, sequence generation, retrieval-based method

26/08/2020

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Yuxuan Song, Ning Miao, Hao Zhou, Lantao Yu, Mingxuan Wang, Lei Li

17/08/2020

NASOQ: Numerically accurate sparsity-oriented QP solver

Kazem Cheshmi, Danny M. Kaufman, Shoaib Kamil, Maryam Mehri Dehnavi

Keywords: indefinite factorization, numerical optimization, contact simulation, sparse row modification, mesh deformation, quadratic programming, sparse linear algebra

08/12/2020

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

02/02/2021

Meta-Transfer Learning for Low-Resource Abstractive Summarization

Yi-Syuan Chen, Hong-Han Shuai

02/02/2021

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer

04/07/2020

Using Context in Neural Machine Translation Training Objectives

Danielle Saunders, Felix Stahlberg, Bill Byrne

Keywords: Neural training, NMT training, document-level training, NMT objective

16/11/2020

Plug and Play Autoencoders for Conditional Text Generation

Florian Mai, Nikolaos Pappas, Ivan Montero, Noah A. Smith, James Henderson

Keywords: conditional tasks, style transfer, style tasks, text autoencoders

26/04/2020

Plug and Play Language Models: A Simple Approach to Controlled Text Generation

Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu

Keywords: controlled text generation, generative models, conditional generative models, language modeling, transformer

16/11/2020

The World is Not Binary: Learning to Rank with Grayscale Data for Dialogue Response Selection

Zibo Lin, Deng Cai, Yan Wang, Xiaojiang Liu, Haitao Zheng, Shuming Shi

Keywords: response selection, retrieval-based systems, learning-to-rank problem, learning-to-rank

02/02/2021

Improving the Efficiency and Effectiveness for BERT-based Entity Resolution

Bing Li, Yukai Miao, Yaoshu Wang, Yifang Sun, Wei Wang

16/11/2020

Coarse-to-Fine Query Focused Multi-Document Summarization

Yumo Xu, Mirella Lapata

Keywords: modeling interactions, query summarization, assembling summaries, question answering

19/04/2021

Retrieval, re-ranking and multi-task learning for knowledge-base question answering

Zhiguo Wang, Patrick Ng, Ramesh Nallapati, Bing Xiang
