ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations

04/07/2020

Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia

Keywords: Tuning Models, rewriting transformations, automatic simplification, splitting

Abstract: In order to simplify a sentence, human editors perform multiple rewriting transformations: they split it into several shorter sentences, paraphrase words (i.e., replace complex words or phrases with simpler synonyms), reorder components, and/or delete information deemed unnecessary. Despite this varied range of possible text alterations, current models for automatic sentence simplification are evaluated using datasets that focus on a single transformation, such as lexical paraphrasing or splitting. This makes it impossible to understand the ability of simplification models in more realistic settings. To alleviate this limitation, this paper introduces ASSET, a new dataset for assessing sentence simplification in English. ASSET is a crowdsourced multi-reference corpus where each simplification was produced by executing several rewriting transformations. Through quantitative and qualitative experiments, we show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task. Furthermore, we motivate the need for developing better methods for automatic evaluation using ASSET, since we show that current popular metrics may not be suitable when multiple simplification transformations are performed.

The talk and the corresponding paper were published at the ACL 2020 virtual conference.

Similar Papers

16/11/2020

Zero-Shot Crosslingual Sentence Simplification

Jonathan Mallinson, Rico Sennrich, Mirella Lapata

Keywords: sentence simplification, translation, simplification, encoder-decoder models

10:34

16/11/2020

Small but Mighty: New Benchmarks for Split and Rephrase

Li Zhang, Huaiyu Zhu, Siddhartha Brahma, Yunyao Li

Keywords: text task, fine-grained evaluation, automatic process, rule-based model

6:58

16/11/2020

X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset

Angel Daza, Anette Frank

Keywords: generalization learning, multilingual learning, high-quality translation, srl

9:24

08/12/2020

Multi-Word Lexical Simplification

Piotr Przybyła, Matthew Shardlow

14:50

04/07/2020

Probing for Referential Information in Language Models

Ionut-Teodor Sorodoc, Kristina Gulordava, Gemma Boleda

Keywords: Probing, probe tasks, Language Models, LSTM architectures

11:31

16/11/2020

Reformulating Unsupervised Style Transfer as Paraphrase Generation

Kalpesh Krishna, John Wieting, Mohit Iyyer

Keywords: style transfer, attribute transfer, unsupervised transfer, paraphrase problem

11:46

04/07/2020

Syn-QG: Syntactic and Shallow Semantic Rules for Question Generation

Kaustubh Dhole, Christopher D. Manning

Keywords: Question Generation, syntactic transformation, crowd-sourced evaluations, generating questions

12:24

04/07/2020

A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

Raúl Vázquez, Alessandro Raganato, Mathias Creutz, Jörg Tiedemann

Keywords: Multilingual Translation, Neural translation, transfer learning, translation

14:05

01/07/2020

Towards Reversal-Based Textual Data Augmentation for NLI Problems with Opposable Classes

Alexey Tarasov

9:06

04/07/2020

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining

Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar

Keywords: Unsupervised Embeddings, Parallel Mining, multilingual embeddings, parallel tasks

11:30

04/07/2020

Neural Syntactic Preordering for Controlled Paraphrase Generation

Tanya Goyal, Greg Durrett

Keywords: Controlled Generation, Paraphrasing sentences, machine translation, Neural Preordering

11:37

19/08/2021

ALaSca: an Automated approach for Large-Scale Lexical Substitution

Caterina Lacerra, Tommaso Pasini, Rocco Tripodi, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Resources and Evaluation

14:27

19/04/2021

StructSum: Summarization via structured representations

Vidhisha Balachandran, Artidoro Pagnoni, Jay Yoon Lee, Dheeraj Rajagopal, Jaime Carbonell, Yulia Tsvetkov

6:32

06/12/2021

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li, Edward Lin, Tie-Yan Liu

13:44

03/05/2021

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Weizhu Chen, Jiawei Han

Keywords: consistency training, contrastive learning, data augmentation, natural language understanding

6:02

06/12/2021

Controlled Text Generation as Continuous Optimization with Multiple Constraints

Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov

Keywords: optimization

14:02

04/07/2020

Neural CRF Model for Sentence Alignment in Text Simplification

Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, Wei Xu

Keywords: Sentence Alignment, Text Simplification, monolingual task, automatic evaluation

11:55

04/07/2020

Enabling Language Models to Fill in the Blanks

Chris Donahue, Mina Lee, Percy Liang

Keywords: text infilling, predicting text, writing tools, language modeling

7:01

02/02/2021

Contextualized Rewriting for Text Summarization

Guangsheng Bao, Yue Zhang

17:38

04/07/2020

Conditional Augmentation for Aspect Term Extraction via Masked Sequence-to-Sequence Generation

Kun Li, Chengbo Chen, Xiaojun Quan, Qing Ling, Yan Song

Keywords: Conditional Augmentation, Aspect Extraction, sentiment analysis, data augmentation

11:30

02/02/2021

Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks

Maurício Gruppi, Pin-Yu Chen, Sibel Adali

19:35

04/07/2020

Multimodal Transformer for Multimodal Machine Translation

Shaowei Yao, Xiaojun Wan

Keywords: Multimodal MMT, Multimodal, MMT, representation images

5:11

02/02/2021

Object Relation Attention for Image Paragraph Captioning

Li-Chuan Yang, Chih-Yuan Yang, Jane Yung-jen Hsu

15:03

02/02/2021

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu

15:58

16/11/2020

Multilingual AMR-to-Text Generation

Angela Fan, Claire Gardent

Keywords: multilingual generation, cross-lingual embeddings, pretraining, multilingual models

12:06

04/07/2020

MMPE: A Multi-Modal Interface using Handwriting, Touch Reordering, and Speech Commands for Post-Editing Machine Translation

Nico Herbig, Santanu Pal, Tim Düwel, Kalliopi Meladaki, Mahsa Monshizadeh, Vladislav Hnatovskiy, Antonio Krüger, Josef van Genabith

Keywords: Post-Editing Translation, Post-Editing, translation, PE MT

11:52

01/07/2020

Supertagging with CCG primitives

Aditya Bhargava, Gerald Penn

5:00

03/05/2021

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

Keywords: representation learning, self-supervised learning, language models, theory, transfer learning, natural language processing, unsupervised learning

5:16

04/07/2020

Improving Adversarial Text Generation by Modeling the Distant Future

Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Dinghan Shen, Guoyin Wang, Zheng Wen, Lawrence Carin

Keywords: Adversarial Generation, long generation, next-word prediction, generator optimization

10:32

14/06/2020

Cascade EF-GAN: Progressive Facial Expression Editing With Local Focuses

Rongliang Wu, Gongjie Zhang, Shijian Lu, Tao Chen

Keywords: gan, facial expression editing, image synthesis

5:01

16/11/2020

Generationary or “How We Went beyond Word Sense Inventories and Learned to Gloss”

Michele Bevilacqua, Marco Maru, Roberto Navigli

Keywords: generative modeling, definition modeling, discriminative tasks, word disambiguation

11:49

25/07/2020

Attending to inter-sentential features in neural text classification

Billy Chiu, Sunil Kumar Sahu, Neha Sengupta, Derek Thomas, Mohammady Mahdy

Keywords: graph network, hybrid neural network, attention mechanism

6:41

08/12/2020

AutoMeTS: The Autocomplete for Medical Text Simplification

Hoang Van, David Kauchak, Gondy Leroy

13:29

16/11/2020

MODE-LSTM: A Parameter-efficient Recurrent Network with Multi-Scale for Sentence Classification

Qianli Ma, Zhenxi Lin, Jiangyue Yan, Zipeng Chen, Liuhong Yu

Keywords: sentence classification, extracting features, generalization, cnn models

10:35

12/07/2020

How recurrent networks implement contextual processing in sentiment analysis

Niru Maheswaranathan, David Sussillo

Keywords: Applications - Language, Speech and Dialog

14:01

08/12/2020

What Can We Learn from Noun Substitutions in Revision Histories?

Talita Anthonio, Michael Roth

15:00

08/12/2020

Corpus-based Identification of Verbs Participating in Verb Alternations Using Classification and Manual Annotation

Esther Seyffarth, Laura Kallmeyer

14:57

06/12/2021

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren

Keywords: machine learning, fairness, language

13:17

16/11/2020

Do Explicit Alignments Robustly Improve Multilingual Encoders?

Shijie Wu, Mark Dredze

Keywords: multilingual, unsupervised encoders, cross-lingual representation, contrastive objective

7:14

26/04/2020

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

Keywords: energy-based models, text generation

4:59