Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology?

Abstract: This work investigates the most basic units that underlie contextualized word embeddings such as BERT: the so-called word-pieces (WPs). In Morphologically-Rich Languages (MRLs) that exhibit morphological fusion and non-concatenative morphology, the different units of meaning within a word may be fused or intertwined, and cannot be separated linearly. Therefore, when using word-pieces in MRLs, we must consider that: (1) a linear segmentation into sub-word units might not capture the full morphological complexity of words; and (2) representations that leave morphological knowledge on sub-word units inaccessible might negatively affect performance. Here we empirically examine the capacity of word-pieces to capture morphology by investigating the task of multi-tagging in Modern Hebrew, as a proxy for evaluating the underlying segmentation. Our results show that, while models trained to predict multi-tags for complete words outperform models tuned to predict the distinct tags of WPs, we can improve WP tag prediction by purposefully constraining the word-pieces to reflect their internal functions. We suggest that linguistically-informed word-piece schemes that make the morphological structure explicit might boost performance for MRLs.
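
To make the notion of a linear word-piece segmentation concrete, here is a minimal sketch of greedy longest-match-first segmentation in the style commonly used with BERT vocabularies. The function name `wordpiece_segment`, the toy vocabulary, and the English examples are illustrative assumptions; they are not taken from the paper, which studies Modern Hebrew.

```python
def wordpiece_segment(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first segmentation (illustrative sketch).

    Non-initial pieces carry the '##' continuation prefix.
    Returns [unk] if some part of the word cannot be matched.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen only for this illustration.
TOY_VOCAB = {"liv", "s", "##ing", "##ang"}

print(wordpiece_segment("living", TOY_VOCAB))  # ['liv', '##ing']
print(wordpiece_segment("sing", TOY_VOCAB))    # ['s', '##ing']
print(wordpiece_segment("sang", TOY_VOCAB))    # ['s', '##ang']
```

In this toy setting, the piece `##ing` in "living" happens to line up with a suffix, while the same piece in "sing" has no morphological function, and the tense contrast between "sing" and "sang" sits inside the stem, where no contiguous piece isolates it. This is only an English analogy for the fusion and non-concatenative phenomena the abstract describes.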

04/07/2020

01/07/2020

Stav Klein, Reut Tsarfaty

Similar Papers

Joint Diacritization, Lemmatization, Normalization, and Fine-Grained Morphological Tagging

Nasser Zalmout, Nizar Habash

Joint features, joint modeling, Lemmatization, Normalization

Have We Solved The Hard Problem? It’s Not Easy! Contextual Lexical Contrast as a Means to Probe Neural Coherence

Wenqiang Lei, Yisong Miao, Runpeng Xie, Bonnie Webber, Meichun Liu, Tat-Seng Chua, Nancy F. Chen

Why is penguin more similar to polar bear than to sea gull? Analyzing conceptual knowledge in distributional models

Pia Sommerauer

word embedding, distributional models, BERT, ELMo

On Position Embeddings in BERT

Wang Benyou, Lifeng Shang, Christina Lioma, Xin Jiang, Hao Yang, Qun Liu, Jakob Simonsen

pretrained language model, Position Embedding, BERT

Probing BERT in Hyperbolic Spaces

Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing

Sentiment, Syntax, Probe, BERT, Hyperbolic

Contextual and Non-Contextual Word Embeddings: an in-depth Linguistic Investigation

Alessio Miaschi, Felice Dell’Orletta

Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank

Eleftheria Briakou, Marine Carpuat

detecting content, cross-lingual nlp, machine problem, annotation

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

An Unsupervised Sentence Embedding Method by Mutual Information Maximization

Yan Zhang, Ruidan He, Zuozhu Liu, Kwan Hui Lim, Lidong Bing

sentence-pair tasks, clustering, semantic search, downstream tasks

Named entity recognition in multi-level contexts

Yubo Chen, Chuhan Wu, Tao Qi, Zhigang Yuan, Yongfeng Huang

Exploiting Semantic Relations for Fine-grained Entity Typing

Hongliang Dai, Yangqiu Song, Xin Li

Fine-grained Entity Typing, Hypernym Extraction, Semantic Role Labeling

Spying on Your Neighbors: Fine-grained Probing of Contextual Embeddings for Information about Surrounding Words

Josef Klafka, Allyson Ettinger

Fine-grained Embeddings, NLP tasks, probing tasks, encoding information

BERTScore: Evaluating Text Generation with BERT

Tianyi Zhang*, Varsha Kishore*, Felix Wu*, Kilian Q. Weinberger, Yoav Artzi

Metric, Evaluation, Contextual Embedding, Text Generation

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks

Maurício Gruppi, Pin-Yu Chen, Sibel Adali

A Multitask Learning Approach for Diacritic Restoration

Sawsan Alqahtani, Ajay Mishra, Mona Diab

Diacritic Restoration, computational processing, restoring diacritics, NLP problems

Multilingual Alignment of Contextual Word Representations

Steven Cao, Nikita Kitaev, Dan Klein

multilingual, natural language processing, embedding alignment, BERT, word embeddings, transfer

“talk to me with left, right, and angles”: Lexical entrainment in spoken Hebrew dialogue

Andreas Weise, Vered Silber-Varod, Anat Lerner, Julia Hirschberg, Rivka Levitan

A Latent Morphology Model for Open-Vocabulary Neural Machine Translation

Duygu Ataman, Wilker Aziz, Alexandra Birch

neural machine translation, low-resource languages, latent-variable models

A Bilingual Generative Transformer for Semantic Sentence Embedding

John Wieting, Graham Neubig, Taylor Berg-Kirkpatrick

source separation, semantic encoding, data distributions, unsupervised evaluations

PolyLM: Learning about polysemy through language modeling

Alan Ansell, Felipe Bravo-Marquez, Bernhard Pfahringer
