Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses

16/11/2020

Simon Flachs, Ophélie Lacroix, Helen Yannakoudakis, Marek Rei, Anders Søgaard

Keywords: gec applications, gec, gec systems, internal model

Abstract: Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which, however, cover only part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new GEC benchmark consisting of website text written by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays; we show that this low error density presents a challenge to state-of-the-art GEC systems. We demonstrate that one factor behind this is that systems cannot rely on a strong internal language model in low error density domains. We hope this work will facilitate the development of open-domain GEC models that generalize to different topics and genres.
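The abstract turns on error density: how many corrections a text needs relative to its length. As a rough illustration only, the sketch below assumes error density is measured as edits per source token, where edits are found by aligning each sentence with its corrected version using Python's standard `difflib`. This is an assumption for illustration, not necessarily how the CWEB authors count errors (GEC evaluation usually relies on dedicated edit-extraction tools), and the example sentences are made up.

```python
from difflib import SequenceMatcher


def count_edits(source: str, corrected: str) -> int:
    """Count contiguous token spans that differ between a sentence and its correction.

    Tokenisation is plain whitespace splitting (an assumption for illustration).
    """
    src, tgt = source.split(), corrected.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    # Every non-"equal" opcode (replace, delete, insert) is treated as one edit.
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


def error_density(pairs: list[tuple[str, str]]) -> float:
    """Edits per source token, pooled over (source, corrected) sentence pairs."""
    n_edits = sum(count_edits(src, tgt) for src, tgt in pairs)
    n_tokens = sum(len(src.split()) for src, _ in pairs)
    return n_edits / n_tokens if n_tokens else 0.0


# Made-up examples: learner-style text versus mostly clean website-style text.
learner = [
    ("She go to school yesterday .", "She went to school yesterday ."),
    ("I has a apple .", "I have an apple ."),
]
web = [
    ("The site was updated in 2019 .", "The site was updated in 2019 ."),
    ("Our team are happy to help .", "Our team is happy to help ."),
]

print(round(error_density(learner), 3))  # 0.182
print(round(error_density(web), 3))      # 0.071
```

On these toy sentences, the website-style pairs need about one edit per 14 tokens, against about one per 5.5 tokens in the learner-style pairs, and one of the two web sentences needs no correction at all.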

The talk and the paper were published at the EMNLP 2020 virtual conference.

Similar Papers

19/04/2021

PPT: Parsimonious parser transfer for unsupervised cross-lingual adaptation

Kemal Kurniawan, Lea Frermann, Philip Schulz, Trevor Cohn

04/07/2020

Learning Spoken Language Representations with Neural Lattice Language Modeling

Chao-Wei Huang, Yun-Nung Chen

Keywords: NLP tasks, spoken tasks, intent detection, Spoken Representations

04/07/2020

Diversifying Dialogue Generation with Non-Conversational Text

Hui Su, Xiaoyu Shen, Sanqiang Zhao, Zhou Xiao, Pengwei Hu, Randy Zhong, Cheng Niu, Jie Zhou

Keywords: Diversifying Generation, low-diversity problem, open-domain generation, dialogue generation

04/07/2020

LINSPECTOR: Multilingual Probing Tasks for Word Representations

Gözde Gül Sahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych

Keywords: Word Representations, NLP, classification tasks, probing tasks

16/11/2020

Design Challenges in Low-resource Cross-lingual Entity Linking

Xingyu Fu, Weijia Shi, Xiaodong Yu, Zian Zhao, Dan Roth

Keywords: cross-lingual linking, cross-lingual, xel, grounding entities

08/12/2020

Less is Better: A cognitively inspired unsupervised model for language segmentation

Jinbiao Yang, Stefan L. Frank, Antal van den Bosch

16/11/2020

Zero-Shot Crosslingual Sentence Simplification

Jonathan Mallinson, Rico Sennrich, Mirella Lapata

Keywords: sentence simplification, translation, simplification, encoder-decoder models

02/02/2021

Commonsense Knowledge Augmentation for Low-Resource Languages via Adversarial Learning

Bosung Kim, Juae Kim, Youngjoong Ko, Jungyun Seo

04/07/2020

Unsupervised Paraphasia Classification in Aphasic Speech

Sharan Pai, Nikhil Sachdeva, Prince Sachdeva, Rajiv Ratn Shah

Keywords: Unsupervised Classification, speech disorder, naming detection, treatment

26/04/2020

Cross-Lingual Ability of Multilingual BERT: An Empirical Study

Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth

Keywords: Cross-Lingual Learning, Multilingual BERT

19/04/2021

Changing the mind of transformers for topically-controllable language generation

Haw-Shiuan Chang, Jiaming Yuan, Mohit Iyyer, Andrew McCallum

08/12/2020

Semi-Supervised Cleansing of Web Argument Corpora

Jonas Dorsch, Henning Wachsmuth

06/12/2020

Language Models are Few-Shot Learners

Tom B Brown, Ben Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen M Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

16/11/2020

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki, Haibo Ding, Graham Neubig

Keywords: factual retrieval, language models, lms, probing methods

16/11/2020

Event Extraction as Machine Reading Comprehension

Jian Liu, Yubo Chen, Kang Liu, Wei Bi, Xiaojiang Liu

Keywords: event extraction, ee, information task, classification task

18/07/2021

Reasoning Over Virtual Knowledge Bases With Open Predicate Relations

Haitian Sun, Patrick Verga, Bhuwan Dhingra, Russ Salakhutdinov, William Cohen

Keywords: Applications, Natural Language Processing

16/11/2020

Improving Low Compute Language Modeling with In-Domain Embedding Initialisation

Charles Welch, Rada Mihalcea, Jonathan K. Kummerfeld

Keywords: nlp applications, language model, language research, byte-pair encoding

04/07/2020

BLEURT: Learning Robust Metrics for Text Generation

Thibault Sellam, Dipanjan Das, Ankur Parikh

Keywords: Learning Metrics, Text Generation, WMT task, pre-training scheme

05/12/2020

Mixed-lingual pre-training for cross-lingual summarization

Ruochen Xu, Chenguang Zhu, Yu Shi, Michael Zeng, Xuedong Huang

02/02/2021

SARG: A Novel Semi Autoregressive Generator for Multi-turn Incomplete Utterance Restoration

Mengzuo Huang, Feng Li, Wuhe Zou, Weidong Zhang

07/06/2020

Learning Cross-Lingual Word Embeddings from Twitter via Distant Supervision

Jose Camacho-Collados, Yerai Doval Mosquera, Eugenio Martínez-Cámara, Luis Espinosa-Anke, Francesco Barbieri, Steven Schockaert

Keywords: embedding spaces, embeddings, languages, learning, performance, representations, shared, spaces, texts, twitter, word embeddings, words

16/11/2020

MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale

Andreas Rücklé, Jonas Pfeiffer, Iryna Gurevych

Keywords: answer tasks, zero-shot transfer, text models, self-supervised training

02/02/2021

Bridging the Domain Gap: Improve Informal Language Translation via Counterfactual Domain Adaptation

Ke Wang, Guandan Chen, Zhongqiang Huang, Xiaojun Wan, Fei Huang

16/11/2020

Visually Grounded Compound PCFGs

Yanpeng Zhao, Ivan Titov

Keywords: exploiting groundings, language understanding, gradient estimates, fully-differentiable learning

19/08/2021

Knowledge-Aware Dialogue Generation via Hierarchical Infobox Accessing and Infobox-Dialogue Interaction Graph Network

Sixing Wu, Minghui Wang, Dawei Zhang, Yang Zhou, Ying Li, Zhonghai Wu

Keywords: Natural Language Processing, Dialogue, Natural Language Generation

16/11/2020

XL-AMR: Enabling Cross-Lingual AMR Parsing with Transfer Learning Techniques

Rexhina Blloshmi, Rocco Tripodi, Roberto Navigli

Keywords: encoding semantics, cross-lingual parsing, english parsing, amr

16/11/2020

Experience Grounds Language

Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian

Keywords: language research, tasks, linguistic communication, natural processing

08/12/2020

TableGPT: Few-shot Table-to-Text Generation with Table Structure Reconstruction and Content Matching

Heng Gong, Yawei Sun, Xiaocheng Feng, Bing Qin, Wei Bi, Xiaojiang Liu, Ting Liu

01/07/2020

Linguistic Features for Readability Assessment

Tovly Deutsch, Masoud Jasbi, Stuart Shieber

18/07/2021

Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation

Renjie Zheng, Junkun Chen, Mingbo Ma, Liang Huang

Keywords: Applications, Natural Language Processing

19/04/2021

Disfluency correction using unsupervised and semi-supervised learning

Nikhil Saini, Drumil Trivedi, Shreya Khare, Tejas Dhamecha, Preethi Jyothi, Samarth Bharadwaj, Pushpak Bhattacharyya

16/11/2020

Simulated multiple reference training improves low-resource machine translation

Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn

Keywords: machine mt, mt, simulated training, simulated

05/12/2020

Vocabulary matters: A simple yet effective approach to paragraph-level question generation

Vishwajeet Kumar, Manish Joshi, Ganesh Ramakrishnan, Yuan-Fang Li

04/07/2020

Hypernymy Detection for Low-Resource Languages via Meta Learning

Changlong Yu, Jialong Han, Haisong Zhang, Wilfred Ng

Keywords: Hypernymy Detection, lexical entailment, natural tasks, monolingual detection

02/02/2021

Multilingual Transfer Learning for QA using Translation as Data Augmentation

Mihaela Bornea, Lin Pan, Sara Rosenthal, Radu Florian, Avirup Sil

04/07/2020

MuTual: A Dataset for Multi-Turn Dialogue Reasoning

Leyang Cui, Yu Wu, Shujie Liu, Yue Zhang, Ming Zhou

Keywords: Multi-Turn Reasoning, conversation reasoning, conversation research, reasoning problems

04/07/2020

GPT-too: A Language-Model-First Approach for AMR-to-Text Generation

Manuel Mager, Ramón Fernandez Astudillo, Tahira Naseem, Md Arafat Sultan, Young-Suk Lee, Radu Florian, Salim Roukos

Keywords: AMR-to-Text Generation, GPT-too, Language-Model-First Approach, AMRs

08/12/2020

Automatic Learning of Modality Exclusivity Norms with Crosslingual Word Embeddings

Emmanuele Chersoni, Rong Xiang, Qin Lu, Chu-Ren Huang

04/07/2020

A Comprehensive Analysis of Preprocessing for Word Representation Learning in Affective Tasks

Nastaran Babanejad, Ameeta Agrawal, Aijun An, Manos Papagelis

Keywords: Word Learning, Affective Tasks, sentiment analysis, emotion classification

19/04/2021

Multilingual and cross-lingual document classification: A meta-learning approach

Niels Heijden, Helen Yannakoudakis, Pushkar Mishra, Ekaterina Shutova
