A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation

04/07/2020

Jan Deriu, Katsiaryna Mlynchyk, Philippe Schläpfer, Alvaro Rodrigo, Dirk von Grünigen, Nicolas Kaiser, Kurt Stockinger, Eneko Agirre, Mark Cieliebak

Keywords: question answering, annotation, Inverse Annotation, intermediate representation

Abstract: In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database, called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of the tokens to the operations. Thus, we randomly generate OTs from a context-free grammar, and annotators just have to write the appropriate question and assign the tokens. We compare our corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases, to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our dataset is challenging and that the token alignment can be leveraged to significantly increase the performance.
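
The inverse annotation idea in the abstract (sample an operation tree first, then ask an annotator to write the matching question) can be illustrated with a minimal sketch. The grammar, the operation names (`done`, `projection`, `filter`, `join`, `count`, `get_table`), and the helper functions below are hypothetical and chosen only for illustration; the abstract does not specify the actual OT grammar, so this should not be read as the authors' implementation.

```python
import random

# Hypothetical toy grammar: each rule is [operation, child_symbol, ...].
# The real OT grammar is not given in the abstract.
GRAMMAR = {
    "QUERY": [["done", "OP"]],
    "OP": [
        ["projection", "OP"],
        ["filter", "OP"],
        ["join", "OP", "OP"],
        ["count", "OP"],
        ["get_table"],  # leaf rule: no further expansion
    ],
}


def generate(symbol, rng, depth=0, max_depth=4):
    """Randomly expand `symbol` into a tree of (operation, children) tuples."""
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        # Once the depth budget is used up, prefer leaf rules so that
        # generation is guaranteed to terminate.
        leaf_rules = [rule for rule in rules if len(rule) == 1]
        rules = leaf_rules or rules
    op, *children = rng.choice(rules)
    return (op, [generate(c, rng, depth + 1, max_depth) for c in children])


def tree_depth(node):
    """Number of nodes on the longest root-to-leaf path."""
    _, children = node
    return 1 + max((tree_depth(c) for c in children), default=0)


def leaves(node):
    """Operation names of all leaf nodes, left to right."""
    op, children = node
    if not children:
        return [op]
    return [leaf for c in children for leaf in leaves(c)]


def render(node, indent=0):
    """Indented text lines, e.g. for showing the tree to an annotator."""
    op, children = node
    lines = ["  " * indent + op]
    for child in children:
        lines.extend(render(child, indent + 1))
    return lines


if __name__ == "__main__":
    tree = generate("QUERY", random.Random(0))
    print("\n".join(render(tree)))
```

In such a setup, an annotator would see the rendered tree and write a question for it; the depth limit is one simple way to keep the sampled queries bounded in size.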

The talk and the respective paper were published at the ACL 2020 virtual conference.

Similar Papers

16/11/2020

PathQG: Neural Question Generation from Facts

Siyuan Wang, Zhongyu Wei, Zhihao Fan, Zengfeng Huang, Weijian Sun, Qi Zhang, Xuanjing Huang

Keywords: question generation, query learning, query-based generation, sequence problem

11:16

25/07/2020

Web table retrieval using multimodal deep learning

Roee Shraga, Haggai Roitman, Guy Feigenblat, Mustafa Cannim

Keywords: experimentation, multimodal deep-learning, table retrieval

14:08

26/04/2020

Variational Template Machine for Data-to-Text Generation

Rong Ye, Wenxian Shi, Hao Zhou, Zhongyu Wei, Lei Li

4:55

04/07/2020

TaPas: Weakly Supervised Table Parsing via Pre-training

Jonathan Herzig, Pawel Krzysztof Nowak, Thomas Müller, Francesco Piccinno, Julian Eisenschlos

Keywords: Weakly Parsing, semantic task, question tables, SQA

12:49

02/02/2021

Encoder-Decoder Based Unified Semantic Role Labeling with Label-Aware Syntax

Hao Fei, Fei Li, Bobo Li, Donghong Ji

16:10

03/05/2021

Autoregressive Entity Retrieval

Nicola De Cao, Gautier Izacard, Sebastian Riedel, Fabio Petroni

Keywords: constrained beam search, entity disambiguation, end-to-end entity linking, entity linking, autoregressive language model, document retrieval, entity retrieval

10:14

08/12/2020

A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing

Sanxing Chen, Aidan San, Xiaodong Liu, Yangfeng Ji

12:15

02/06/2020

Keyword Search over RDF Using Document-Centric Information Retrieval Systems

Giorgos Kadilierakis, Pavlos Fafalios, Panagiotis Papadakos, Yannis Tzitzikas

26:26

12/07/2020

Mapping natural-language problems to formal-language solutions using structured neural representations

Kezhen Chen, Qiuyuan Huang, Hamid Palangi, Paul Smolensky, Ken Forbus, Jianfeng Gao

Keywords: Applications - Neuroscience, Cognitive Science, Biology and Health

11:34

19/04/2021

Expanding, retrieving and infilling: Diversifying cross-domain question generation with flexible templates

Xiaojing Yu, Anxiao Jiang

11:40

04/07/2020

Photon: A Robust Cross-Domain Text-to-SQL System

Jichuan Zeng, Xi Victoria Lin, Steven C.H. Hoi, Richard Socher, Caiming Xiong, Michael Lyu, Irwin King

Keywords: natural communication, programming, Photon, Robust System

9:05

15/11/2020

Unifying Execution of Imperative Generators and Declarative Specifications

Pengyu Nie, Marinela Parovic, Zhiqiang Zang, Sarfraz Khurshid, Aleksandar Milicevic, Milos Gligoric

Keywords: Imperative generators, declarative specifications, Deuterium

14:47

26/04/2020

Neural Module Networks for Reasoning over Text

Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner

Keywords: question answering, compositionality, neural module networks, multi-step reasoning, reading comprehension

4:36

16/11/2020

A Predicate-Function-Argument Annotation of Natural Language for Open-Domain Information eXpression

Mingming Sun, Wenyue Hua, Zoey Liu, Xin Wang, Kangjie Zheng, Ping Li

Keywords: inference operations, oie algorithms, featured strategies, pipeline

11:47

18/07/2021

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Zhanpeng Zeng, Yunyang Xiong, Sathya Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh

Keywords: Applications, Natural Language Processing

5:16

19/01/2020

Program Synthesis by Type-Guided Abstraction Refinement

Zheng Guo, Michael B. James, David Justo, Jiaxiao Zhou, Ziteng Wang, Ranjit Jhala, Nadia Polikarpova

Keywords: Abstract Interpretation, Type Systems, Program Synthesis

20:30

06/12/2020

Finding the Homology of Decision Boundaries with Active Learning

Weizhi Li, Gautam Dasarathy, Karthi Natesan Ramamurthy, Visar Berisha

Keywords: Algorithms -> AutoML; Applications -> Fairness, Accountability, and Transparency; Optimization -> Stochastic Optimization; Algorithms -> Classification

3:27

19/08/2021

Cardinality Queries over DL-Lite Ontologies

Meghyn Bienvenu, Quentin Manière, Michaël Thomazo

Keywords: Knowledge Representation and Reasoning, Computational Complexity of Reasoning, Description Logics and Ontologies

15:02

04/07/2020

Semantic Graphs for Generating Deep Questions

Liangming Pan, Yuxi Xie, Yansong Feng, Tat-Seng Chua, Min-Yen Kan

Keywords: Generating Questions, Deep Generation, Deep DQG, reasoning

15:43

02/02/2021

HMS: A Hierarchical Solver with Dependency-Enhanced Understanding for Math Word Problem

Xin Lin, Zhenya Huang, Hongke Zhao, Enhong Chen, Qi Liu, Hao Wang, Shijin Wang

18:01

04/07/2020

Logical Natural Language Generation from Open-Domain Tables

Wenhu Chen, Jianshu Chen, Yu Su, Zhiyu Chen, William Yang Wang

Keywords: Logical Generation, neural NLG, surface-level realizations, logical inference

11:48

02/02/2021

Knowledge-Base Degrees of Inconsistency: Complexity and Counting

Johannes K. Fichte, Markus Hecher, Arne Meier

19:03

22/06/2020

Learning Relation Entailment with Structured and Textual Information

Zhengbao Jiang, Jun Araki, Donghan Yu, Ruohong Zhang, Wei Xu, Yiming Yang, Graham Neubig

Keywords: relation entailment, structured information, textual information

4:57

04/07/2020

A Novel Cascade Binary Tagging Framework for Relational Triple Extraction

Zhepei Wei, Jianlin Su, Yue Wang, Yuan Tian, Yi Chang

Keywords: Relational Extraction, large-scale construction, overlapping problem, relational task

11:05

02/06/2020

A Simple Method for Inducing Class Taxonomies in Knowledge Graphs

Marcin Pietrasik, Marek Reformat

18:09

19/04/2021

Structural encoding and pre-training matter: Adapting BERT for table-based fact verification

Rui Dong, David Smith

11:42

04/07/2020

Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward

Luyang Huang, Lingfei Wu, Lu Wang

Keywords: Knowledge Summarization, abstractive summarization, semantic interpretation, generation summaries

12:01

02/02/2021

Topology-Aware Correlations Between Relations for Inductive Link Prediction in Knowledge Graphs

Jiajun Chen, Huarui He, Feng Wu, Jie Wang

13:53

04/07/2020

Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation

Qiu Ran, Yankai Lin, Peng Li, Jie Zhou

Keywords: Non-Autoregressive Translation, Non-Autoregressive, inference process, multi-modality problem

8:34

06/12/2020

PyGlove: Symbolic Programming for Automated Machine Learning

Daiyi Peng, Xuanyi Dong, Esteban Real, Mingxing Tan, Yifeng Lu, Gabriel Bender, Hanxiao Liu, Adam Kraft, Chen Liang, Quoc V Le

3:17

06/12/2020

Strongly Incremental Constituency Parsing with Graph Neural Networks

Kaiyu Yang, Jia Deng

Keywords: Deep Learning -> Generative Models, Algorithms -> Large Scale Learning

3:19

02/02/2021

Curriculum-Meta Learning for Order-Robust Continual Relation Extraction

Tongtong Wu, Xuekai Li, Yuan-Fang Li, Gholamreza Haffari, Guilin Qi, Yujin Zhu, Guoqiang Xu

11:33

02/02/2021

Dynamic Hybrid Relation Exploration Network for Cross-Domain Context-Dependent Semantic Parsing

Binyuan Hui, Ruiying Geng, Qiyu Ren, Binhua Li, Yongbin Li, Jian Sun, Fei Huang, Luo Si, Pengfei Zhu, Xiaodan Zhu

14:22

02/02/2021

Rethinking Boundaries: End-To-End Recognition of Discontinuous Mentions with Pointer Networks

Hao Fei, Donghong Ji, Bobo Li, Yijiang Liu, Yafeng Ren, Fei Li

16:51

15/11/2020

DiffStream: Differential Output Testing for Stream Processing Programs

Konstantinos Kallas, Filip Niksic, Caleb Stanford, Rajeev Alur

Keywords: runtime verification, differential testing, stream processing

15:50

14/06/2020

Auto-Encoding Twin-Bottleneck Hashing

Yuming Shen, Jie Qin, Jiaxin Chen, Mengyang Yu, Li Liu, Fan Zhu, Fumin Shen, Ling Shao

Keywords: image hashing, data retrieval, unsupervised learning, graph neural networks

1:00

12/09/2020

Datalog Rewritability and Data Complexity of ALCHOIF with Closed Predicates

Tomasz Gogacz, Sanja Lukumbuzya, Magdalena Ortiz, Mantas Šimkus

Keywords: Description logics-General, Ontology-based data access, integration, and exchange-General, Logic programming, answer set programming, constraint logic programming-General

16:06

16/11/2020

QADiscourse - Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines

Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan

Keywords: natural understanding, predicting relations, discourse relations, question-and-answer pairs

11:22

04/07/2020

Benchmarking Multimodal Regex Synthesis with Complex Structures

Xi Ye, Qiaochu Chen, Isil Dillig, Greg Durrett

Keywords: Multimodal Synthesis, regular generation, regex tasks, StackOverflow

11:51

26/04/2020

TabFact: A Large-scale Dataset for Table-based Fact Verification

Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, William Yang Wang

Keywords: Fact Verification, Tabular Data, Symbolic Reasoning

5:49