A Deep Generative Approach to Native Language Identification

Abstract: Native language identification (NLI), the task of identifying a person's native language (L1) from their writing in a second language (L2), is useful for a variety of purposes, including marketing, security, and educational applications. From a traditional machine learning perspective, NLI is usually framed as a multi-class classification task in which numerous designed features are combined to achieve state-of-the-art results. We introduce a deep generative language modelling (LM) approach to NLI, which consists of fine-tuning a separate GPT-2 model on texts written by authors with the same L1, and assigning a label to an unseen text according to which of these fine-tuned GPT-2 models yields the minimum LM loss. Our method outperforms traditional machine learning approaches and currently achieves the best results on the benchmark NLI datasets.
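The decision rule described in the abstract (one language model per L1, label chosen by minimum LM loss) can be illustrated with a minimal sketch. This is not the authors' implementation: to stay self-contained, an add-one-smoothed character bigram model stands in for each fine-tuned GPT-2, and the class name `CharBigramLM`, the helper functions, the labels "A" and "B", and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


class CharBigramLM:
    """Stand-in for a fine-tuned GPT-2: an add-one-smoothed character bigram LM."""

    def __init__(self, texts):
        self.bigrams = Counter()   # counts of (previous char, current char)
        self.contexts = Counter()  # counts of previous char
        self.vocab = set()
        for text in texts:
            padded = "^" + text    # "^" marks the start of a text
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def loss(self, text):
        """Average negative log-likelihood per character (the LM loss)."""
        padded = "^" + text
        v = len(self.vocab) + 1    # +1 reserves probability mass for unseen characters
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + v)
            total -= math.log(p)
        return total / max(len(text), 1)


def train_per_l1(corpus):
    """Fit one model per L1 label; corpus maps label -> list of training texts."""
    return {l1: CharBigramLM(texts) for l1, texts in corpus.items()}


def predict_l1(models, text):
    """Assign the label whose model gives the unseen text the lowest loss."""
    return min(models, key=lambda l1: models[l1].loss(text))


# Hypothetical toy data: two artificial "L1" groups with different character habits.
toy_corpus = {
    "A": ["aaaa abab aaab"],
    "B": ["zzzz zyzy zzzy"],
}
models = train_per_l1(toy_corpus)
print(predict_l1(models, "abaa"))
```

With a neural LM the structure would stay the same: `loss` would be the average token-level cross-entropy of the text under the model fine-tuned for that L1, and `predict_l1` would still take the argmin over labels.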

02/02/2021

Ehsan Lotfi, Ilia Markov, Walter Daelemans

Similar Papers

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu

Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data

Ankit Arun, Soumya Batra, Vikas Bhardwaj, Ashwini Challa, Pinar Donmez, Peyman Heidari, Hakan Inan, Shashank Jain, Anuj Kumar, Shawn Mei, Karthik Mohan, Michael White

Surface Realization Using Pretrained Language Models

Farhood Farahnak, Laya Rafiee, Leila Kosseim, Thomas Fevens

DeepMet: A Reading Comprehension Paradigm for Token-level Metaphor Detection

Chuandong Su, Fumiyo Fukumoto, Xiaoxi Huang, Jiyi Li, Rongbo Wang, Zhiqun Chen

A Probabilistic Formulation of Unsupervised Text Style Transfer

Junxian He, Xinyi Wang, Graham Neubig, Taylor Berg-Kirkpatrick

Keywords: unsupervised text style transfer, deep latent sequence model

A Deep Metric Learning Method for Biomedical Passage Retrieval

Andrés Rosso-Mateus, Fabio A. González, Manuel Montes-y-Gómez

A Latent Morphology Model for Open-Vocabulary Neural Machine Translation

Duygu Ataman, Wilker Aziz, Alexandra Birch

Keywords: neural machine translation, low-resource languages, latent-variable models

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

Keywords: language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

Automatic Word Association Norms (AWAN)

Jorge Reyes-Magaña, Gerardo Sierra Martínez, Gemma Bel-Enguix, Helena Gomez-Adorno

Native-like Expression Identification by Contrasting Native and Proficient Second Language Speakers

Oleksandr Harust, Yugo Murawaki, Sadao Kurohashi

MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer

Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder

Keywords: transfer, pre-training, cross transfer, named recognition

Enabling Language Models to Fill in the Blanks

Chris Donahue, Mina Lee, Percy Liang

Keywords: text infilling, predicting text, writing tools, language modeling

BARTScore: Evaluating Generated Text as Text Generation

Weizhe Yuan, Graham Neubig, Pengfei Liu

OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold

Mohamed Yousef, Tom E. Bishop

Keywords: text recognition, weakly supervised, handwriting recognition, convolutional neural network fully convolutional, ctc

Multilingual Transfer Learning for QA using Translation as Data Augmentation

Mihaela Bornea, Lin Pan, Sara Rosenthal, Radu Florian, Avirup Sil

Simulated multiple reference training improves low-resource machine translation

Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn

Keywords: machine mt, mt, simulated training, simulated

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords: Cross-Lingual Classification, sentiment classification, unsupervised system, classification

Language models for lexical inference in context

Martin Schmitt, Hinrich Schütze

On Learning Universal Representations Across Languages

Xiangpeng Wei, Rongxiang Weng, Yue Hu, Luxi Xing, Heng Yu, Weihua Luo

Keywords: hierarchical contrastive learning, cross-lingual pretraining, universal representation learning

Masked Language Model Scoring

Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff

Keywords: Masked Scoring, NLP tasks, domain adaptation, language scoring

Coding Textual Inputs Boosts the Accuracy of Neural Networks

Abdul Rafae Khan, Jia Xu, Weiwei Sun

Keywords: natural tasks, nlp, neural-network-based systems, machine translation

From Seq2Seq Recognition to Handwritten Word Embeddings

George Retsinas, Giorgos Sfikas, Christophoros Nikou, Petros Maragos
