SciREX: A Challenge Dataset for Document-Level Information Extraction

04/07/2020

Sarthak Jain, Madeleine van Zuylen, Hannaneh Hajishirzi, Iz Beltagy

Keywords: Document-Level Extraction, IE tasks, salient identification, document identification

Abstract: Extracting information from full documents is an important problem in many domains, but most previous work focuses on identifying relationships within a sentence or a paragraph. Creating a large-scale information extraction (IE) dataset at the document level is challenging, since annotating entities and their document-level relationships, which usually span multiple sentences or even sections, requires an understanding of the whole document. In this paper, we introduce SciREX, a document-level IE dataset that encompasses multiple IE tasks, including salient entity identification and document-level N-ary relation identification from scientific articles. We annotate our dataset by integrating automatic and human annotations, leveraging existing scientific knowledge resources. We develop a neural model as a strong baseline that extends previous state-of-the-art IE models to document-level IE. Analyzing the model performance shows a significant gap between human performance and current baselines, inviting the community to use our dataset as a challenge to develop document-level IE models. Our data and code are publicly available at https://github.com/allenai/SciREX.

The talk and the corresponding paper were published at the ACL 2020 virtual conference.

Similar Papers

02/02/2021

Enhancing Scientific Papers Summarization with Citation Graph

Chenxin An, Ming Zhong, Yiran Chen, Danqing Wang, Xipeng Qiu, Xuanjing Huang

13:40

08/12/2020

Provenance for Linguistic Corpora through Nanopublications

Timo Lek, Anna de Groot, Tobias Kuhn, Roser Morante

13:54

19/10/2020

Multimodal knowledge graph for deep learning papers and code

Amar Viswanathan Kannan, Dmitriy Fradkin, Ioannis Akrotirianakis, Tugba Kulahcioglu, Arquimedes Canedo, Aditi Roy, Shih-Yuan Yu, Malawade Arnav, Mohammad Abdullah Al Faruque

Keywords: multimodal information retrieval, scientific knowledge graphs, knowledge graphs, scientific knowledge graph exploration, deep learning

4:51

22/06/2020

Predicting Institution Hierarchies with Set-based Models

Derek Tam, Nicholas Monath, Ari Kobren, Andrew McCallum

Keywords: Hierarchies, Sets, Transformers, Institutions

4:45

02/02/2021

i-Algebra: Towards Interactive Interpretability of Deep Neural Networks

Xinyang Zhang, Ren Pang, Shouling Ji, Fenglong Ma, Ting Wang

18:38

03/05/2021

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Cong Hao, Yingyan Lin

Keywords: AutoML, Benchmark, Hardware-Aware Neural Architecture Search

11:02

16/11/2020

Neural Topic Modeling with Cycle-Consistent Adversarial Training

Xuemeng Hu, Rui Wang, Deyu Zhou, Yuxuan Xiong

Keywords: neural modeling, deep models, adversarial-neural model, adversarially network

9:57

19/08/2021

A Survey on Spoken Language Understanding: Recent Advances and New Frontiers

Libo Qin, Tianbao Xie, Wanxiang Che, Ting Liu

Keywords: Natural language processing, General

14:57

18/07/2021

Joining datasets via data augmentation in the label space for neural networks

Jake Zhao Zhao, Mingfeng Ou, Linji Xue, Yunkai Cui, Sai Wu, Gang Chen

Keywords: Deep Learning, Theory, Statistical Physics of Learning, Optimization, Non-Convex Optimization; Theory

5:14

16/11/2020

Learning from Context or Names? An Empirical Study on Neural Relation Extraction

Hao Peng, Tianyu Gao, Xu Han, Yankai Lin, Peng Li, Zhiyuan Liu, Maosong Sun, Jie Zhou

Keywords: relation benchmarks, re scenarios, neural models, re models

11:56

02/06/2020

Piveau: A Large-Scale Open Data Management Platform Based on Semantic Web Technologies

Fabian Kirstein, Kyriakos Stefanidis, Benjamin Dittwald, Simon Dutkowski, Sebastian Urbanek, Manfred Hauswirth

28:26

19/08/2021

Layer-Assisted Neural Topic Modeling over Document Networks

Yiming Wang, Ximing Li, Jihong Ouyang

Keywords: Machine Learning, Learning Graphical Models, Bayesian Networks, Graphical Models

12:18

12/07/2020

Retrieval Augmented Language Model Pre-Training

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Mingwei Chang

Keywords: Applications - Language, Speech and Dialog

14:44

06/12/2021

Generalized Shape Metrics on Neural Representations

Alex H Williams, Erin Kunz, Simon Kornblith, Scott Linderman

Keywords: deep learning, machine learning, generative model, representation learning

10:55

19/10/2020

Semantic search over structured data

Sainyam Galhotra, Udayan Khurana

Keywords: data lake, dataset search, semantic search

4:58

25/07/2020

SummPip: Unsupervised multi-document summarization with sentence graph compression

Jinming Zhao, Ming Liu, Longxiang Gao, Yuan Jin, Lan Du, He Zhao, He Zhang, Gholamreza Haffari

Keywords: summarization, cluster, sentence graph, text compression

9:47

16/11/2020

Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements

Yang Li, Gang Li, Luheng He, Jingjie Zheng, Hong Li, Zhiwei Guan

Keywords: mobile uis, automatically descriptions, widget captioning, multimodal task

11:16

02/02/2021

Dynamic Multi-Context Attention Networks for Citation Forecasting of Scientific Publications

Taoran Ji, Nathan Self, Kaiqun Fu, Zhiqian Chen, Naren Ramakrishnan, Chang-Tien Lu

17:58

02/02/2021

Meta-Transfer Learning for Low-Resource Abstractive Summarization

Yi-Syuan Chen, Hong-Han Shuai

19:10

02/02/2021

Author Homepage Discovery in CiteSeerX

Krutarth Patel, Cornelia Caragea, Doina Caragea, C. Lee Giles

16:27

16/11/2020

MAVEN: A Massive General Domain Event Detection Dataset

Xiaozhi Wang, Ziqi Wang, Xu Han, Wangyi Jiang, Rong Han, Zhiyuan Liu, Juanzi Li, Peng Li, Yankai Lin, Jie Zhou

Keywords: event detection, event, ed, identifying words

11:17

16/11/2020

Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach

Bowen Tan, Lianhui Qin, Eric Xing, Zhiting Hu

Keywords: aspect-based summarization, weak method, aspect scheme, supervision data

6:38

12/07/2020

PoKED: A Semi-Supervised System for Word Sense Disambiguation

Feng Wei

Keywords: Deep Learning - Generative Models and Autoencoders

15:39

19/04/2021

CD^2CR: Co-reference resolution across documents and domains

James Ravenscroft, Amanda Clare, Arie Cattan, Ido Dagan, Maria Liakata

11:22

08/12/2020

Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data

Ankit Arun, Soumya Batra, Vikas Bhardwaj, Ashwini Challa, Pinar Donmez, Peyman Heidari, Hakan Inan, Shashank Jain, Anuj Kumar, Shawn Mei, Karthik Mohan, Michael White

15:01

16/11/2020

Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation

Pei Zhang, Boxing Chen, Niyu Ge, Kai Fan

Keywords: document-level translation, document-level systems, context-aware architecture, transformer

6:36

12/09/2020

Plausible Reasoning about EL-Ontologies using Concept Interpolation

Yazmín Ibáñez-García, Víctor Gutiérrez-Basulto, Steven Schockaert

Keywords: Description logics-General, Commonsense reasoning-General, Knowledge representation languages-General, Concept formation, similarity-based reasoning-General

15:50

02/06/2020

Fostering Scientific Meta-analyses with Knowledge Graphs: A Case-Study

Ilaria Tiddi, Daniel Balliet, Annette ten Teije

31:24

04/07/2020

Machine Reading of Historical Events

Or Honovich, Lucas Torroba Hennigen, Omri Abend, Shay B. Cohen

Keywords: Machine Events, Machine reading, NLP, classification

12:01

02/06/2020

Investigating Software Usage in the Social Sciences: A Knowledge Graph Approach

David Schindler, Benjamin Zapilko, Frank Krüger

27:13

14/09/2020

Active Learning for Hierarchical Multi-Label Classification

Felipe Kenji Nakano, Ricardo Cerri, Vens Celin

15:42

19/04/2021

Metric-type identification for multi-level header numerical tables in scientific papers

Lya Hulliyyatus Suadaa, Hidetaka Kamigaito, Manabu Okumura, Hiroya Takamura

10:55

04/07/2020

The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain

Annemarie Friedrich, Heike Adel, Federico Tomazic, Johannes Hingerl, Renou Benteau, Anika Marusczyk, Lukas Lange

Keywords: Information Extraction, information task, materials science, slot tasks

11:04

26/04/2020

Multiplicative Interactions and Where to Find Them

Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu

Keywords: multiplicative interactions, hypernetworks, attention

5:34

01/07/2020

Beyond Domain APIs: Task-oriented Conversational Modeling with Unstructured Knowledge Access

Seokhwan Kim, Mihail Eric, Karthik Gopalakrishnan, Behnam Hedayatnia, Yang Liu, Dilek Hakkani-Tur

11:34

04/07/2020

One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases

Xingdi Yuan, Tong Wang, Rui Meng, Khushboo Thaker, Peter Brusilovsky, Daqing He, Adam Trischler

Keywords: modeling perspectives, variable-number generation, keyphrase tasks, neural models

12:08

02/02/2021

HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions

Shaobo Li, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Chengjie Sun, Zhenzhou Ji, Bingquan Liu

15:11

19/10/2020

Event-driven network for cross-modal retrieval

Zhixiong Zeng, Nan Xu, Wenji Mao

Keywords: cross-modal retrieval, event embedding, text representation

5:59

26/04/2020

Variational Template Machine for Data-to-Text Generation

Rong Ye, Wenxian Shi, Hao Zhou, Zhongyu Wei, Lei Li

4:55

19/10/2020

Neural relation extraction on wikipedia tables for augmenting knowledge graphs

Erin Macdonald, Denilson Barbosa

Keywords: information extraction, benchmarking, web tables

6:14