CluBERT: A Cluster-Based Approach for Learning Sense Distributions in Multiple Languages

04/07/2020

Tommaso Pasini, Federico Scozzafava, Bianca Scarlini

Keywords: English tasks, disambiguation, multilingual tasks, CluBERT

Abstract: Knowing the Most Frequent Sense (MFS) of a word has been proved to help Word Sense Disambiguation (WSD) models significantly. However, the scarcity of sense-annotated data makes it difficult to induce a reliable and high-coverage distribution of the meanings in a language vocabulary. To address this issue, in this paper we present CluBERT, an automatic and multilingual approach for inducing the distributions of word senses from a corpus of raw sentences. Our experiments show that CluBERT learns distributions over English senses that are of higher quality than those extracted by alternative approaches. When used to induce the MFS of a lemma, CluBERT attains state-of-the-art results on the English Word Sense Disambiguation tasks and helps to improve the disambiguation performance of two off-the-shelf WSD models. Moreover, our distributions also prove to be effective in other languages, beating all their alternatives for computing the MFS on the multilingual WSD tasks. We release our sense distributions in five different languages at https://github.com/SapienzaNLP/clubert.

The talk and the corresponding paper were published at the ACL 2020 virtual conference.

Similar Papers

04/07/2020

Moving Down the Long Tail of Word Sense Disambiguation with Gloss Informed Bi-encoders

Terra Blevins, Luke Zettlemoyer

Keywords: Word Disambiguation, Word WSD, WSD, sense disambiguation

11:18

26/04/2020

FreeLB: Enhanced Adversarial Training for Natural Language Understanding

Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu

5:26

19/08/2021

MultiMirror: Neural Cross-lingual Word Alignment for Multilingual Word Sense Disambiguation

Luigi Procopio, Edoardo Barba, Federico Martelli, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Resources and Evaluation

12:25

16/11/2020

Don't Neglect the Obvious: On the Role of Unambiguous Words in Word Sense Disambiguation

Daniel Loureiro, Jose Camacho-Collados

Keywords: word disambiguation, word, wsd, pre-trained models

7:12

19/04/2021

PolyLM: Learning about polysemy through language modeling

Alan Ansell, Felipe Bravo-Marquez, Bernhard Pfahringer

11:40

06/12/2021

PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Yi Ren, Jinglin Liu, Zhou Zhao

Keywords: generative model

10:15

16/11/2020

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov

Keywords: multilingual models, meta-learning algorithm, multilingual representations, negative interference

12:03

03/05/2021

Beyond Categorical Label Representations for Image Classification

Boyuan Chen, Yu Li, Sunand Raghupathi, Hod Lipson

Keywords: Representation Learning, Image Classification, Label Representation

3:26

16/11/2020

Sparse Text Generation

Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Keywords: story completion, dialogue generation, text generators, language models

11:27

08/12/2020

An analysis of language models for metaphor recognition

Arthur Neidlein, Philip Wiesenbach, Katja Markert

13:52

02/02/2021

Have We Solved The Hard Problem? It’s Not Easy! Contextual Lexical Contrast as a Means to Probe Neural Coherence

Wenqiang Lei, Yisong Miao, Runpeng Xie, Bonnie Webber, Meichun Liu, Tat-Seng Chua, Nancy F. Chen

18:55

03/05/2021

Variational Information Bottleneck for Effective Low-Resource Fine-Tuning

Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson

Keywords: variational information bottleneck, biases, robust, over-fitting, large-scale pre-trained language models, NLP, Transfer learning

5:07

04/07/2020

Addressing Posterior Collapse with Mutual Information for Improved Variational Neural Machine Translation

Arya D. McCarthy, Xian Li, Jiatao Gu, Ning Dong

Keywords: Variational Translation, posterior collapse, auxiliary task, uncertainty

11:00

16/11/2020

Generationary or “How We Went beyond Word Sense Inventories and Learned to Gloss”

Michele Bevilacqua, Marco Maru, Roberto Navigli

Keywords: generative modeling, definition modeling, discriminative tasks, word disambiguation

11:49

08/12/2020

Free the Plural: Unrestricted Split-Antecedent Anaphora Resolution

Juntao Yu, Nafise Sadat Moosavi, Silviu Paun, Massimo Poesio

14:33

14/06/2020

On Vocabulary Reliance in Scene Text Recognition

Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao

Keywords: scene text recognition, text spotting, document analysis, ocr, scene text detection, sequence recognition, language and vision

1:00

03/05/2021

Active Contrastive Learning of Audio-Visual Video Representations

Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

Keywords: video recognition, audio-visual representation, self-supervised learning, active learning, contrastive representation learning

5:22

14/06/2020

More Grounded Image Captioning by Distilling Image-Text Matching Model

Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang

Keywords: grounded image captioning, image-text matching, visual grounding, cross-task knowledge distillation

1:01

04/07/2020

Discrete Latent Variable Representations for Low-Resource Text Classification

Shuning Jin, Sam Wiseman, Karl Stratos, Karen Livescu

Keywords: Low-Resource Classification, Discrete Representations, discrete models, continuous representations

11:17

16/11/2020

With More Contexts Comes Better Performance: Contextualized Sense Embeddings for All-Round Word Sense Disambiguation

Bianca Scarlini, Tommaso Pasini, Roberto Navigli

Keywords: natural processing, english task, word-in-context task, contextualized embeddings

12:11

04/07/2020

Max-Margin Incremental CCG Parsing

Miloš Stanojević, Mark Steedman

Keywords: Incremental parsing, human processing, ASR, MT

11:39

07/09/2020

Learning Effectively from Noisy Supervision for Weakly Supervised Semantic Segmentation

Wenbin Xie, Qiaoqiao Wei, Zheng Li, Hui Zhang

Keywords: Semantic Segmentation, Weakly Supervised Semantic Segmentation, Self Attention

3:46

16/11/2020

Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses

Prathyusha Jwalapuram, Shafiq Joty, Youlin Shen

Keywords: pronoun translations, pronoun translation, neural training, backtranslation

11:37

04/07/2020

Feature Projection for Improved Text Classification

Qi Qin, Wenpeng Hu, Bing Liu

Keywords: Text Classification, classification, sentiment classification, Bert classification

10:57

03/05/2021

FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders

Pengyu Cheng, Weituo Hao, Siyang Yuan, Shijing Si, Lawrence Carin

Keywords: Mutual Information, Pretrained Text Encoders, Contrastive Learning, Fairness

4:43

16/11/2020

Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding

Samson Tan, Shafiq Joty, Lav Varshney, Min-Yen Kan

Keywords: comprehension, fine-tuning models, downstream tasks, nlp systems

10:22

04/07/2020

Interpreting Pretrained Contextualized Representations via Reductions to Static Embeddings

Rishi Bommasani, Kelly Davis, Claire Cardie

Keywords: Interpreting Representations, downstream applications, static embeddings, Pretrained Representations

12:07

16/11/2020

Visually Grounded Compound PCFGs

Yanpeng Zhao, Ivan Titov

Keywords: exploiting groundings, language understanding, gradient estimates, fully-differentiable learning

12:24

08/12/2020

Multi-task Learning of Spoken Language Understanding by Integrating N-Best Hypotheses with Hierarchical Attention

Mingda Li, Xinyue Liu, Weitong Ruan, Luca Soldaini, Wael Hamza, Chengwei Su

14:43

06/12/2021

TriBERT: Human-centric Audio-visual Representation Learning

Tanzila Rahman, Mengyu Yang, Leonid Sigal

Keywords: transformers, representation learning

13:54

08/12/2020

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

13:01

06/12/2020

Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, Quoc V Le

3:29

19/04/2021

Framing word sense disambiguation as a multi-label problem for model-agnostic knowledge integration

Simone Conia, Roberto Navigli

6:38

02/02/2021

Label Confusion Learning to Enhance Text Classification Models

Biyang Guo, Songqiao Han, Xiao Han, Hailiang Huang, Ting Lu

15:17

14/06/2020

Discriminative Multi-Modality Speech Recognition

Bo Xu, Cheng Lu, Yandong Guo, Jacob Wang

Keywords: multi-modal, audio-visual, speech recognition, lip reading, deep learning, eleatt-gru

1:01

02/02/2021

Analogy Training Multilingual Encoders

Nicolas Garneau, Mareike Hartmann, Anders Sandholm, Sebastian Ruder, Ivan Vulić, Anders Søgaard

14:03

02/02/2021

Conceptualized and Contextualized Gaussian Embedding

Chen Qian, Fuli Feng, Lijie Wen, Tat-Seng Chua

14:47

19/08/2021

Modelling General Properties of Nouns by Selectively Averaging Contextualised Embeddings

Na Li, Zied Bouraoui, Jose Camacho-Collados, Luis Espinosa-Anke, Qing Gu, Steven Schockaert

Keywords: Natural Language Processing, Natural Language Semantics

14:09

04/07/2020

How Does Selective Mechanism Improve Self-Attention Networks?

Xinwei Geng, Longyue Wang, Xing Wang, Bing Qin, Ting Liu, Zhaopeng Tu

Keywords: NLP tasks, natural inference, semantic labelling, machine translation

11:43

06/12/2021

How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?

Xinshuai Dong, Anh Tuan Luu, Min Lin, Shuicheng Yan, Hanwang Zhang

Keywords: robustness, adversarial robustness and security, language

10:26