ParaSCI: A large scientific paraphrase dataset for longer paraphrase generation

19/04/2021

Qingxiu Dong, Xiaojun Wan, Yue Cao

Abstract: We propose ParaSCI, the first large-scale paraphrase dataset in the scientific field, comprising 33,981 paraphrase pairs from ACL (ParaSCI-ACL) and 316,063 pairs from arXiv (ParaSCI-arXiv). Drawing on the characteristics and common patterns of scientific papers, we construct this dataset through intra-paper and inter-paper methods, such as collecting citations to the same paper or aggregating definitions by scientific terms. To take advantage of partially paraphrased sentences, we propose PDBERT as a general paraphrase discovery method. The major advantages of the paraphrases in ParaSCI are their greater length and textual diversity, which make the dataset complementary to existing paraphrase datasets. ParaSCI obtains satisfactory results in human evaluation and on downstream tasks, especially long paraphrase generation.

The talk and the corresponding paper were published at the EACL 2021 virtual conference.

Similar Papers

19/08/2021

Automatically Paraphrasing via Sentence Reconstruction and Round-trip Translation

Zilu Guo, Zhongqiang Huang, Kenny Q. Zhu, Guandan Chen, Kaibo Zhang, Boxing Chen, Fei Huang

Keywords: Natural Language Processing, Machine Translation, Natural Language Generation, NLP Applications and Tools

13:53

08/12/2020

Exploring Cross-sentence Contexts for Named Entity Recognition with BERT

Jouni Luoma, Sampo Pyysalo

14:39

08/12/2020

SaSAKE: Syntax and Semantics Aware Keyphrase Extraction from Research Papers

T.y.s.s Santosh, Debarshi Kumar Sanyal, Plaban Kumar Bhowmick, Partha Pratim Das

15:36

08/12/2020

Aspect-based Document Similarity for Research Papers

Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm

14:50

02/02/2021

We Can Explain Your Research in Layman's Terms: Towards Automating Science Journalism at Scale

Rumen Dangovski, Michelle Shen, Dawson Byrd, Li Jing, Desislava Tsvetkova, Preslav Nakov, Marin Soljačić

19:47

19/04/2021

Globalizing BERT-based transformer architectures for long document summarization

Quentin Grail, Julien Perez, Eric Gaussier

11:53

16/11/2020

Better Highlighting: Creating Sub-Sentence Summary Highlights

Sangwoo Cho, Kaiqiang Song, Chen Li, Dong Yu, Hassan Foroosh, Fei Liu

Keywords: highlighting, summarization, abstractive summarizers, determinantal processes

12:02

04/07/2020

How to Ask Good Questions? Try to Leverage Paraphrases

Xin Jia, Wenjie Zhou, Xu Sun, Yunfang Wu

Keywords: question generation (QG), sentence-level generation, diversity training, Paraphrases

10:13

04/07/2020

Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

Hao Fei, Meishan Zhang, Donghong Ji

Keywords: Cross-Lingual Labeling, semantic labeling, natural understanding, model transferring

10:32

19/04/2021

Scientific discourse tagging for evidence extraction

Xiangci Li, Gully Burns, Nanyun Peng

10:48

22/11/2021

From Seq2Seq Recognition to Handwritten Word Embeddings

George Retsinas, Giorgos Sfikas, Christophoros Nikou, Petros Maragos

Keywords: keyword spotting, handwritten text recognition, sequence-to-sequence

2:59

04/07/2020

Neural CRF Model for Sentence Alignment in Text Simplification

Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, Wei Xu

Keywords: Sentence Alignment, Text Simplification, monolingual task, automatic evaluation

11:55

04/07/2020

Fact-based Text Editing

Hayate Iso, Chao Qiao, Hang Li

Keywords: Fact-based Editing, text task, text editing, automatically dataset

12:41

04/07/2020

Understanding Points of Correspondence between Sentences for Abstractive Summarization

Logan Lebanoff, John Muchovej, Franck Dernoncourt, Doo Soon Kim, Lidan Wang, Walter Chang, Fei Liu

Keywords: Abstractive Summarization, coreference resolution, summarization, cohesive devices

11:47

04/07/2020

Attend to Medical Ontologies: Content Selection for Clinical Abstractive Summarization

Sajad Sotudeh Gharebagh, Nazli Goharian, Ross Filice

Keywords: Content Selection, Clinical Summarization, text task, content problem

7:03

16/11/2020

Training Question Answering Models From Synthetic Data

Raul Puri, Ryan Spring, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

Keywords: question generation, squad task, em, data method

11:33

15/06/2020

Multi-modal synthesis of regular expressions

Qiaochu Chen, Xinyu Wang, Xi Ye, Greg Durrett, Isil Dillig

Keywords: Programming by Example, Programming by Natural Languages, Program Synthesis, Regular Expression

16:17

02/02/2021

Consecutive Decoding for Speech-to-text Translation

Qianqian Dong, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

14:20

05/12/2020

Exploiting WordNet synset and hypernym representations for answer selection

Weikang Li, Yunfang Wu

7:04

04/07/2020

SpanBERT: Improving Pre-training by Representing and Predicting Spans

Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy

Keywords: span tasks, question answering, coreference resolution, OntoNotes task

14:14

04/07/2020

Unsupervised Paraphrasing by Simulated Annealing

Xianggen Liu, Lili Mou, Fandong Meng, Hao Zhou, Jie Zhou, Sen Song

Keywords: Unsupervised Paraphrasing, paraphrase generation, optimization problem

11:36

19/04/2021

Communicative-function-based sentence classification for construction of an academic formulaic expression database

Kenichi Iwatsuki, Akiko Aizawa

9:17

16/11/2020

X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset

Angel Daza, Anette Frank

Keywords: generalization learning, multilingual learning, high-quality translation, srl

9:24

04/07/2020

“Who said it, and Why?” Provenance for Natural Language Claims

Yi Zhang, Zachary Ives, Dan Roth

Keywords: Natural Claims, generating content, publishing, provenance inference

12:22

08/12/2020

On the Helpfulness of Document Context to Sentence Simplification

Renliang Sun, Zhe Lin, Xiaojun Wan

14:21

22/06/2020

Using BibTeX to Automatically Generate Labeled Data for Citation Field Extraction

Dung Thai, Zhiyang Xu, Nicholas Monath, Boris Veytsman, Andrew McCallum

Keywords: sequence labeling, information extraction, auto-generated dataset

4:56

26/04/2020

Neural Machine Translation with Universal Visual Representation

Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Keywords: Neural Machine Translation, Visual Representation, Multimodal Machine Translation, Language Representation

4:50

01/07/2020

Interactive Extractive Search over Biomedical Corpora

Hillel Taub Tabib, Micah Shlain, Shoval Sadde, Dan Lahav, Matan Eyal, Yaara Cohen, Yoav Goldberg

10:01

16/11/2020

ToTTo: A Controlled Table-To-Text Generation Dataset

Ankur Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, Dipanjan Das

Keywords: controlled task, high-precision generation, totto, dataset process

11:53

04/07/2020

Facet-Aware Evaluation for Extractive Summarization

Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren, Jiawei Han

Keywords: Facet-Aware Evaluation, Extractive Summarization, fine-grained evaluation, comparative analysis

11:43

04/07/2020

Generalizing Natural Language Analysis through Span-relation Representations

Zhengbao Jiang, Wei Xu, Jun Araki, Graham Neubig

Keywords: Natural Analysis, Natural processing, dependency parsing, semantic labeling

8:30

04/07/2020

MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization

Canwen Xu, Jiaxin Pei, Hongtao Wu, Yiyu Liu, Chenliang Li

Keywords: Classification, Question Answering, Summarization, Natural Processing

7:00

26/08/2020

Better Long-Range Dependency By Bootstrapping A Mutual Information Regularizer

Yanshuai Cao, Peng Xu

15:00

19/04/2021

On robustness of neural semantic parsers

Shuo Huang, Zhuang Li, Lizhen Qu, Lei Pan

11:11

03/05/2021

On Learning Universal Representations Across Languages

Xiangpeng Wei, Rongxiang Weng, Yue Hu, Luxi Xing, Heng Yu, Weihua Luo

Keywords: hierarchical contrastive learning, cross-lingual pretraining, universal representation learning

3:51

07/06/2021

Tracking Knowledge Propagation Across Wikipedia Languages

Rodolfo Vieira Valentim, Giovanni Comarela, Souneil Park, Diego Sáez-Trumper

Keywords: Centrality/influence of social media publications and authors, Social network analysis, communities identification, expertise and authority discovery, Trend identification and tracking, time series forecasting, Measuring predictability of real world phenom

2:56

04/07/2020

Active Learning for Coreference Resolution using Discrete Annotation

Belinda Z. Li, Gabriel Stanovsky, Luke Zettlemoyer

Keywords: Coreference Resolution, active resolution, Active Learning, Discrete Annotation

6:36

08/12/2020

AutoMeTS: The Autocomplete for Medical Text Simplification

Hoang Van, David Kauchak, Gondy Leroy

13:29

08/12/2020

A Deep Metric Learning Method for Biomedical Passage Retrieval

Andrés Rosso-Mateus, Fabio A. González, Manuel Montes-y-Gómez

14:58

08/12/2020

Semantic Structural Decomposition for Neural Machine Translation

Elior Sulem, Omri Abend, Ari Rappoport

9:54