Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

04/07/2020

Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg

Keywords: computational science, word embeddings, vector alignment, vector spaces

Abstract: The problem of comparing two bodies of text and searching for words that differ in their usage between them arises often in digital humanities and computational social science. This is commonly approached by training word embeddings on each corpus, aligning the vector spaces, and looking for words whose cosine distance in the aligned space is large. However, these methods often require extensive filtering of the vocabulary to perform well and, as we show in this work, yield unstable and hence less reliable results. We propose an alternative approach that does not use vector space alignment, and instead considers the neighbors of each word. The method is simple, interpretable and stable. We demonstrate its effectiveness in 9 different setups, considering different corpus splitting criteria (age, gender and profession of tweet authors, time of tweet) and different languages (English, French and Hebrew).
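The neighbor-based idea in the abstract can be illustrated with a minimal sketch. The abstract only says that the method "considers the neighbors of each word", so the details here are assumptions made for illustration: the score is taken to be k minus the number of shared top-k cosine neighbors, and the names `top_k_neighbors` and `usage_change_score`, the value of `k`, and the toy 2-dimensional embeddings are all hypothetical.

```python
import numpy as np


def top_k_neighbors(vectors, vocab, word, k):
    """Return the set of the k words closest to `word` by cosine similarity."""
    idx = vocab.index(word)
    # Normalize rows so that a dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # exclude the word itself from its own neighbors
    nearest = np.argsort(-sims)[:k]
    return {vocab[i] for i in nearest}


def usage_change_score(emb_a, emb_b, vocab, word, k=2):
    """Assumed score: k minus the neighbor overlap (higher means more change)."""
    neighbors_a = top_k_neighbors(emb_a, vocab, word, k)
    neighbors_b = top_k_neighbors(emb_b, vocab, word, k)
    return k - len(neighbors_a & neighbors_b)


# Toy data (hypothetical): the same vocabulary embedded separately per corpus.
# No alignment between the two spaces is needed, because only neighbor
# identities are compared, never the raw vectors across spaces.
vocab = ["bank", "money", "loan", "river", "shore"]
emb_a = np.array([[1.0, 0.1], [1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 1.0]])
emb_b = np.array([[0.1, 1.0], [1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.2, 1.0]])

scores = {w: usage_change_score(emb_a, emb_b, vocab, w, k=2) for w in vocab}
most_changed = max(scores, key=scores.get)
print(most_changed, scores)
```

In this toy setup "bank" sits next to "money" and "loan" in the first space and next to "river" and "shore" in the second, so it shares no neighbors across the two corpora. The neighbor sets themselves show which associations changed, which is one way to read the abstract's claim of interpretability.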

The talk and the respective paper were published at the ACL 2020 virtual conference.

Similar Papers

19/04/2021

Cross-lingual contextualized topic models with zero-shot learning

Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, Elisabetta Fersini

02/02/2021

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu

04/07/2020

Effectively Aligning and Filtering Parallel Corpora under Sparse Data Conditions

Steinþór Steingrímsson, Hrafn Loftsson, Andy Way

Keywords: Aligning Corpora, machine systems, data problem, alignment problem

02/02/2021

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu

02/02/2021

Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks

Maurício Gruppi, Pin-Yu Chen, Sibel Adali

04/07/2020

LINSPECTOR: Multilingual Probing Tasks for Word Representations

Gözde Gül Sahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych

Keywords: Word Representations, NLP, classification tasks, probing tasks

08/12/2020

Manifold Learning-based Word Representation Refinement Incorporating Global and Local Information

Wenyu Zhao, Dong Zhou, Lin Li, Jinjun Chen

04/07/2020

Towards Robustifying NLI Models Against Lexical Dataset Biases

Xiang Zhou, Mohit Bansal

Keywords: Natural Inference, data augmentation, Robustifying Models, deep models

05/12/2020

Massively multilingual document alignment with cross-lingual sentence-mover’s distance

Ahmed El-Kishky, Francisco Guzmán

03/05/2021

Active Contrastive Learning of Audio-Visual Video Representations

Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

Keywords: video recognition, audio-visual representation, self-supervised learning, active learning, contrastive representation learning

26/08/2020

Unsupervised Hierarchy Matching with Optimal Transport over Hyperbolic Spaces

David Alvarez-Melis, Youssef Mroueh, Tommi Jaakkola

04/07/2020

Jointly Learning to Align and Summarize for Neural Cross-Lingual Summarization

Yue Cao, Hui Liu, Xiaojun Wan

Keywords: Neural Summarization, Cross-lingual summarization, cross-lingual training, pipeline methods

14/06/2020

ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Mengting Xing, Zilong Fu, Yongdong Zhang

Keywords: scene text detection, arbitrary shapes, false-positive suppression, large scale variance

26/04/2020

A Probabilistic Formulation of Unsupervised Text Style Transfer

Junxian He, Xinyi Wang, Graham Neubig, Taylor Berg-Kirkpatrick

Keywords: unsupervised text style transfer, deep latent sequence model

04/07/2020

Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage

Ashish V. Thapliyal, Radu Soricut

Keywords: Cross-modal Generation, Web-scale Coverage, Cross-modal tasks, Pivot Stabilization

02/02/2021

Bigram and Unigram Based Text Attack via Adaptive Monotonic Heuristic Search

Xinghao Yang, Weifeng Liu, James Bailey, Dacheng Tao, Wei Liu

08/12/2020

AraBench: Benchmarking Dialectal Arabic-English Machine Translation

Hassan Sajjad, Ahmed Abdelali, Nadir Durrani, Fahim Dalvi

06/12/2021

CentripetalText: An Efficient Text Instance Representation for Scene Text Detection

Tao Sheng, Jie Chen, Zhouhui Lian

Keywords: robustness

03/05/2021

Filtered Inner Product Projection for Crosslingual Embedding Alignment

Vin Sachidananda, Ziyi Yang, Chenguang Zhu

Keywords: multilingual representations, natural language processing, word embeddings

07/06/2021

Discovering and Categorising Language Biases in Reddit

Xavier Ferrer, Tom Van Nuenen, Jose M. Such, Natalia Criado

Keywords: Qualitative and quantitative studies of social media, Social network analysis, communities identification, expertise and authority discovery, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analy

30/11/2020

Scale-Aware Polar Representation for Arbitrarily-Shaped Text Detection

Yanguang Bi, Zhiqiang Hu

14/06/2020

Visual Grounding in Video for Unsupervised Word Translation

Gunnar A. Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh, Lucas Smaira, Mateusz Malinowski, João Carreira, Phil Blunsom, Andrew Zisserman

Keywords: video, translation, multimodal learning, unsupervised learning, unsupervised translation, youtube, howto100m, multilingual, language, deep learning

19/08/2021

Text-based Person Search via Multi-Granularity Embedding Learning

Chengji Wang, Zhiming Luo, Yaojin Lin, Shaozi Li

Keywords: Computer Vision, Language and Vision, Recognition

08/12/2020

A Deep Metric Learning Method for Biomedical Passage Retrieval

Andrés Rosso-Mateus, Fabio A. González, Manuel Montes-y-Gómez

16/11/2020

MODE-LSTM: A Parameter-efficient Recurrent Network with Multi-Scale for Sentence Classification

Qianli Ma, Zhenxi Lin, Jiangyue Yan, Zipeng Chen, Liuhong Yu

Keywords: sentence classification, extracting features, generalization, cnn models

04/07/2020

BPE-Dropout: Simple and Effective Subword Regularization

Ivan Provilkov, Dmitrii Emelianenko, Elena Voita

Keywords: open problem, machine translation, subword segmentation, training

04/07/2020

RPD: A Distance Function Between Word Embeddings

Xuhui Zhou, Shujian Huang, Zaixiang Zheng

Keywords: RPD, Word Embeddings, training processes, Relative Distance

16/11/2020

Wasserstein Distance Regularized Sequence Representation for Text Matching in Asymmetrical Domains

Weijie Yu, Chen Xu, Jun Xu, Liang Pang, Xiaopeng Gao, Xiaozhao Wang, Ji-Rong Wen

Keywords: real-world practices, text matching, matching models, match method

04/07/2020

Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang, Ahmed Hassan Awadallah

Keywords: cross-lingual transfer, multilingual embeddings, NLP applications, bias analysis

08/12/2020

Informative Manual Evaluation of Machine Translation Output

Maja Popović

16/11/2020

Do Explicit Alignments Robustly Improve Multilingual Encoders?

Shijie Wu, Mark Dredze

Keywords: multilingual, unsupervised encoders, cross-lingual representation, contrastive objective

08/12/2020

Is it Great or Terrible? Preserving Sentiment in Neural Machine Translation of Arabic Reviews

Hadeel Saadany, Constantin Orasan

12/07/2020

Aligned Cross Entropy for Non-Autoregressive Machine Translation

Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy

Keywords: Applications - Language, Speech and Dialog

19/08/2021

Deep Unified Cross-Modality Hashing by Pairwise Data Alignment

Yimu Wang, Bo Xue, Quan Cheng, Yuhui Chen, Lijun Zhang

Keywords: Computer Vision, Recognition, Information Retrieval, Deep Learning

16/11/2020

Simultaneous Machine Translation with Visual Context

Ozan Caglayan, Julia Ive, Veneta Haralampieva, Pranava Madhyastha, Loïc Barrault, Lucia Specia

Keywords: simt, multimodal approaches, simt frameworks, visually-grounded models

04/07/2020

TransS-Driven Joint Learning Architecture for Implicit Discourse Relation Recognition

Ruifang He, Jian Wang, Fengyu Guo, Yugui Han

Keywords: Implicit Recognition, discourse understanding, TransS-Driven Architecture, multi-level encoder

14/06/2020

SwapText: Image Based Texts Transfer in Scenes

Qiangpeng Yang, Jun Huang, Wei Lin

Keywords: text style transfer, gan, image synthesis

04/07/2020

Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension

Fei Yuan, Linjun Shou, Xuanyu Bai, Ming Gong, Yaobo Liang, Nan Duan, Yan Fu, Daxin Jiang

Keywords: Multilingual Comprehension, multilingual MRC, MRC, sentence tasks

25/07/2020

Attending to inter-sentential features in neural text classification

Billy Chiu, Sunil Kumar Sahu, Neha Sengupta, Derek Thomas, Mohammady Mahdy

Keywords: graph network, hybrid neural network, attention mechanism

16/11/2020

Comparative Evaluation of Label-Agnostic Selection Bias in Multilingual Hate Speech Datasets

Nedjma Ousidhoum, Yangqiu Song, Dit-Yan Yeung

Keywords: classification, data process, topic models, selection bias