Cross-Linguistic Syntactic Evaluation of Word Prediction Models

Abstract: A range of studies have concluded that neural word prediction models can distinguish grammatical from ungrammatical sentences with high accuracy. However, these studies are based primarily on monolingual evidence from English. To investigate how these models' ability to learn syntax varies by language, we introduce CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models. CLAMS includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian, generated from grammars we develop. We use CLAMS to evaluate LSTM language models as well as monolingual and multilingual BERT. Across languages, monolingual LSTMs achieved high accuracy on dependencies without attractors, and generally poor accuracy on agreement across object relative clauses. On other constructions, agreement accuracy was generally higher in languages with richer morphology. Multilingual models generally underperformed monolingual models. Multilingual BERT showed high syntactic accuracy on English, but noticeable deficiencies in other languages.
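The abstract describes judging a model by whether it prefers a grammatical sentence over a minimally different ungrammatical one, using agreement challenge sets generated from grammars. The sketch below illustrates that idea. It is not CLAMS's grammar or code. The toy lexicon, the two templates (simple agreement and agreement across an object relative clause), and the `nearest_noun_score` scorer are all hypothetical. In a real evaluation, `score` would be replaced by a model's (log-)probability for the sentence or for the verb in context.

```python
import itertools
from typing import Callable, List, Tuple

NUMBERS = ("sg", "pl")
IDX = {"sg": 0, "pl": 1}

# Hypothetical toy lexicon: each entry is a (singular, plural) pair.
SUBJECTS = [("the author", "the authors"), ("the pilot", "the pilots")]
ATTRACTORS = [("the guard", "the guards")]
MAIN_VERBS = [("laughs", "laugh"), ("smiles", "smile")]
RC_VERBS = [("likes", "like")]


def simple_agreement() -> List[Tuple[str, str]]:
    """(grammatical, ungrammatical) pairs such as 'the author laughs' vs. 'the author laugh'."""
    pairs = []
    for subj, verb, num in itertools.product(SUBJECTS, MAIN_VERBS, NUMBERS):
        i = IDX[num]
        pairs.append((f"{subj[i]} {verb[i]} .", f"{subj[i]} {verb[1 - i]} ."))
    return pairs


def object_relative_agreement() -> List[Tuple[str, str]]:
    """Agreement across an object relative clause whose subject (the attractor)
    may or may not match the head noun in number."""
    pairs = []
    for subj, attr, rc_verb, verb, num, attr_num in itertools.product(
        SUBJECTS, ATTRACTORS, RC_VERBS, MAIN_VERBS, NUMBERS, NUMBERS
    ):
        i, j = IDX[num], IDX[attr_num]
        # The relative-clause verb agrees with the attractor; the main verb with the head noun.
        prefix = f"{subj[i]} that {attr[j]} {rc_verb[j]}"
        pairs.append((f"{prefix} {verb[i]} .", f"{prefix} {verb[1 - i]} ."))
    return pairs


def accuracy(pairs: List[Tuple[str, str]], score: Callable[[str], float]) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher."""
    correct = sum(score(good) > score(bad) for good, bad in pairs)
    return correct / len(pairs)


# Hypothetical stand-in for a language model: a naive "agree with the nearest noun" heuristic.
SG_NOUNS = {pair[0].split()[-1] for pair in SUBJECTS + ATTRACTORS}
PL_NOUNS = {pair[1].split()[-1] for pair in SUBJECTS + ATTRACTORS}
SG_VERBS = {v[0] for v in MAIN_VERBS}


def nearest_noun_score(sentence: str) -> float:
    """Return 1.0 if the main verb agrees with the closest preceding noun, else 0.0."""
    tokens = sentence.split()
    verb = tokens[-2]  # the main verb sits just before the final period
    noun = next(t for t in reversed(tokens[:-2]) if t in SG_NOUNS | PL_NOUNS)
    return 1.0 if (verb in SG_VERBS) == (noun in SG_NOUNS) else 0.0


if __name__ == "__main__":
    print(accuracy(simple_agreement(), nearest_noun_score))           # 1.0
    print(accuracy(object_relative_agreement(), nearest_noun_score))  # 0.5
```

The nearest-noun heuristic is deliberately naive: it is perfect when there is no attractor, but it falls to chance once the relative-clause subject can mismatch the head noun. This is only a toy illustration of why agreement across object relative clauses is a harder test than agreement without attractors.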

04/07/2020

Aaron Mueller, Garrett Nicolai, Panayiota Petrou-Zeniou, Natalia Talmina, Tal Linzen

Similar Papers

Probing for Referential Information in Language Models

Ionut-Teodor Sorodoc, Kristina Gulordava, Gemma Boleda

Probing, probe tasks, Language Models, LSTM architectures

Encodings of Source Syntax: Similarities in NMT Representations Across Target Languages

Tyler A. Chang, Anna Rafferty

LINSPECTOR: Multilingual Probing Tasks for Word Representations

Gözde Gül Sahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych

Word Representations, NLP, classification tasks, probing tasks

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki, Haibo Ding, Graham Neubig

factual retrieval, language models, lms, probing methods

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Recurrent Neural Network Language Models Always Learn English-Like Relative Clause Attachment

Forrest Davis, Marten van Schijndel

production, Recurrent Always, language models, RNN LMs

Have We Solved The Hard Problem? It’s Not Easy! Contextual Lexical Contrast as a Means to Probe Neural Coherence

Wenqiang Lei, Yisong Miao, Runpeng Xie, Bonnie Webber, Meichun Liu, Tat-Seng Chua, Nancy F. Chen

BLiMP: The Benchmark of Linguistic Minimal Pairs for English

Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, Samuel Bowman

linguistic, blimp, lms, linguist-crafted templates

Zero-Shot Crosslingual Sentence Simplification

Jonathan Mallinson, Rico Sennrich, Mirella Lapata

sentence simplification, translation, simplification, encoder-decoder models

SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP

Katsuki Chousa, Masaaki Nagata, Masaaki Nishino

Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA

Ieva Staliūnaitė, Ignacio Iacobacci

nlp tasks, conversational task, semantic labeling, contextualized embeddings

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

Automatic Learning of Modality Exclusivity Norms with Crosslingual Word Embeddings

Emmanuele Chersoni, Rong Xiang, Qin Lu, Chu-Ren Huang

Text Classification by Contrastive Learning and Cross-lingual Data Augmentation for Alzheimer’s Disease Detection

Zhiqiang Guo, Zhaoci Liu, Zhenhua Ling, Shijin Wang, Lingjing Jin, Yunxia Li

X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset

Angel Daza, Anette Frank

generalization learning, multilingual learning, high-quality translation, srl

Visually Grounded Compound PCFGs

Yanpeng Zhao, Ivan Titov

exploiting groundings, language understanding, gradient estimates, fully-differentiable learning

Word Frequency Does Not Predict Grammatical Knowledge in Language Models

Charles Yu, Ryan Sie, Nicolas Tedeschi, Leon Bergen

reflexive anaphora, grammatical tasks, neural models, language models

An exploratory study on multilingual quality estimation

Shuo Sun, Marina Fomicheva, Frédéric Blain, Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzmán, Lucia Specia

Deep subjecthood: Higher-order grammatical features in multilingual BERT

Isabel Papadimitriou, Ethan A. Chi, Richard Futrell, Kyle Mahowald

COGS: A Compositional Generalization Challenge Based on Semantic Interpretation

Najoung Kim, Tal Linzen

compositional generalization, language architectures, cogs, lstms

Commonsense Knowledge Augmentation for Low-Resource Languages via Adversarial Learning

Bosung Kim, Juae Kim, Youngjoong Ko, Jungyun Seo

A Sentiment-annotated Dataset of English Causal Connectives

Marta Andersson, Murathan Kurfalı, Robert Östling
