XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization

Abstract: The ability to correctly model distinct meanings of a word is crucial for the effectiveness of semantic representation techniques. However, most existing evaluation benchmarks for assessing this criterion are tied to sense inventories (usually WordNet), restricting their usage to a small subset of knowledge-based representation techniques. The Word-in-Context dataset (WiC) addresses the dependence on sense inventories by reformulating the standard disambiguation task as a binary classification problem; but, it is limited to the English language. We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages from varied language families and with different degrees of resource availability, opening room for evaluation scenarios such as zero-shot cross-lingual transfer. We perform a series of experiments to determine the reliability of the datasets and to set performance baselines for several recent contextualized multilingual models. Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance in the task of distinguishing different meanings of a word, even for distant languages. XL-WiC is available at https://pilehvar.github.io/xlwic/.

19/04/2021

XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization

Alessandro Raganato, Tommaso Pasini, Jose Camacho-Collados, Mohammad Taher Pilehvar

Comments

Similar Papers

Framing word sense disambiguation as a multi-label problem for model-agnostic knowledge integration

Simone Conia, Roberto Navigli

Keywords Abstract Paper

With More Contexts Comes Better Performance: Contextualized Sense Embeddings for All-Round Word Sense Disambiguation

Bianca Scarlini, Tommaso Pasini, Roberto Navigli

Keywords Abstract Paper

natural processing, english task, word-in-context task, contextualized embeddings

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti and Anna Korhonen, Goran Glavaš

Keywords Abstract Paper

Generating Senses and RoLes: An End-to-End Model for Dependency- and Span-based Semantic Role Labeling

Rexhina Blloshmi, Simone Conia, Rocco Tripodi, Roberto Navigli

Keywords Abstract Paper

Natural Language Processing, Natural Language Semantics, Natural Language Generation, Natural Language Processing

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

Yuwei Fang, Shuohang Wang, Zhe Gan and Siqi Sun, Jingjing Liu

Keywords Abstract Paper

Transferring Monolingual Model to Low-Resource Language: The Case of Tigrinya

Abrhalei Frezghi Tela, Abraham Woubie Zewoudie, Ville Hautamäki

Keywords Abstract Paper

natural tasks, NLP, downstream task, pre-training

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords Abstract Paper

Cross-Lingual Classification, sentiment classification, unsupervised system, classification

Semantics Altering Modifications for Evaluating Comprehension in Machine Reading

Viktor Schlegel, Goran Nenadic, Riza Batista-Navarro

Keywords Abstract Paper

Correlation-Guided Representation for Multi-Label Text Classification

Qian-Wen Zhang, Ximing Zhang, Zhao Yan and Ruifang Liu, Yunbo Cao, Min-Ling Zhang

Keywords Abstract Paper

Machine Learning, Multi-instance; Multi-label; Multi-view learning, Classification, Text Classification

On Learning Universal Representations Across Languages

Xiangpeng Wei, Rongxiang Weng, Yue Hu and Luxi Xing, Heng Yu, Weihua Luo

Keywords Abstract Paper

hierarchical contrastive learning, cross-lingual pretraining, universal representation learning

Multilingual Transfer Learning for QA using Translation as Data Augmentation

Mihaela Bornea, Lin Pan, Sara Rosenthal and Radu Florian, Avirup Sil

Keywords Abstract Paper

Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention

Zhiqi Huang, Fenglin Liu, Xian Wu and Shen Ge, Helin Wang, Wei Fan, Yuexian Zou

Keywords Abstract Paper

Generationary or “How We Went beyond Word Sense Inventories and Learned to Gloss”

Michele Bevilacqua, Marco Maru, Roberto Navigli

Keywords Abstract Paper

generative modeling, definition modeling, discriminative tasks, word disambiguation

Exemplification Modeling: Can You Give Me an Example, Please?

Edoardo Barba, Luigi Procopio, Caterina Lacerra and Tommaso Pasini, Roberto Navigli

Keywords Abstract Paper

Natural Language Processing, Natural Language Semantics, Resources and Evaluation

A Bilingual Generative Transformer for Semantic Sentence Embedding

John Wieting, Graham Neubig, Taylor Berg-Kirkpatrick

Keywords Abstract Paper

source separation, semantic encoding, data distributions, unsupervised evaluations

Generating Dialogue Responses from a Semantic Latent Space

Wei-Jen Ko, Avik Ray, Yilin Shen, Hongxia Jin

Keywords Abstract Paper

generation responses, regression task, open-domain models, end-to-end classification

Masked Language Model Scoring

Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff

Keywords Abstract Paper

Masked Scoring, NLP tasks, domain adaptation, language scoring

SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis

Jie Zhou, Junfeng Tian, Rui Wang and Yuanbin Wu, Wenming Xiao, Liang He

Keywords Abstract Paper

Cross-Lingual Ability of Multilingual BERT: An Empirical Study

Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth

Keywords Abstract Paper

Cross-Lingual Learning, Multilingual BERT

A Semantically Consistent and Syntactically Variational Encoder-Decoder Framework for Paraphrase Generation

Wenqing Chen, Jidong Tian, Liqiang Xiao and Hao He, Yaohui Jin

Keywords Abstract Paper

Automatic Learning of Modality Exclusivity Norms with Crosslingual Word Embeddings

Emmanuele Chersoni, Rong Xiang, Qin Lu, Chu-Ren Huang

Keywords Abstract Paper

On the Sentence Embeddings from Pre-trained Language Models

Keywords Paper

Keywords Paper

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti and
Anna Korhonen, Goran Glavaš

Keywords Paper

Keywords Paper

Yuwei Fang, Shuohang Wang, Zhe Gan and
Siqi Sun, Jingjing Liu

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Qian-Wen Zhang, Ximing Zhang, Zhao Yan and
Ruifang Liu, Yunbo Cao, Min-Ling Zhang

Keywords Paper

Xiangpeng Wei, Rongxiang Weng, Yue Hu and
Luxi Xing, Heng Yu, Weihua Luo

Keywords Paper

Mihaela Bornea, Lin Pan, Sara Rosenthal and
Radu Florian, Avirup Sil

Keywords Paper

Zhiqi Huang, Fenglin Liu, Xian Wu and
Shen Ge, Helin Wang, Wei Fan, Yuexian Zou

Keywords Paper

Keywords Paper

Edoardo Barba, Luigi Procopio, Caterina Lacerra and
Tommaso Pasini, Roberto Navigli

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Jie Zhou, Junfeng Tian, Rui Wang and
Yuanbin Wu, Wenming Xiao, Liang He

Keywords Paper

Keywords Paper

Wenqing Chen, Jidong Tian, Liqiang Xiao and
Hao He, Yaohui Jin

Keywords Paper

Keywords Paper

Bohan Li, Hao Zhou, Junxian He and
Mingxuan Wang, Yiming Yang, Lei Li

Keywords Paper

Keywords Paper

Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki and
Haibo Ding, Graham Neubig

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Haipeng Sun, Rui Wang, Kehai Chen and
Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

Keywords Paper

Wenqiang Lei, Yisong Miao, Runpeng Xie and
Bonnie Webber, Meichun Liu, Tat-Seng Chua, Nancy F. Chen

Keywords Paper

Arjun Akula, Spandana Gella, Yaser Al-Onaizan and
Song-Chun Zhu, Siva Reddy

Keywords Paper

Linyang Li, Ruotian Ma, Qipeng Guo and
Xiangyang Xue, Xipeng Qiu

Keywords Paper

Naoki Otani, Satoru Ozaki, Xingyuan Zhao and
Yucen Li, Micaelah St Johns, Lori Levin

Keywords Paper

Keywords Paper

Wentao Ma, Yiming Cui, Chenglei Si and
Ting Liu, Shijin Wang, Guoping Hu

Keywords Paper

Keywords Paper

Alexandre Tamborrino, Nicola Pellicanò, Baptiste Pannier and
Pascal Voitot, Louise Naudin

Keywords Paper