Comparative Evaluation of Label-Agnostic Selection Bias in Multilingual Hate Speech Datasets

16/11/2020

Nedjma Ousidhoum, Yangqiu Song, Dit-Yan Yeung

Keywords: classification, data process, topic models, selection bias

Abstract: Work on bias in hate speech typically aims to improve classification performance while relatively overlooking the quality of the data. We examine selection bias in hate speech in a language- and label-independent fashion. We first use topic models to discover latent semantics in eleven hate speech corpora; we then present two bias evaluation metrics based on the semantic similarity between topics and the search words frequently used to build corpora. We discuss the possibility of revising the data collection process by comparing datasets and analyzing contrastive case studies.
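
The abstract does not define the two metrics, so the following is only a minimal sketch of the general idea it describes: scoring how semantically close the words of discovered topics are to the search keywords used to collect a corpus. The function names (cosine, topic_keyword_similarity), the averaging scheme, and the toy two-dimensional vectors are all assumptions made for illustration; they are not the authors' metrics, and a real analysis would use topic words from a fitted topic model and pretrained (multilingual) word embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def topic_keyword_similarity(topics, keywords, embeddings):
    """Mean cosine similarity over all (topic word, keyword) pairs that have vectors.

    Hypothetical illustration only: a high value suggests that the topics stay
    close to the collection keywords, which may hint at selection bias.
    """
    scores = []
    for topic in topics:
        for word in topic:
            for kw in keywords:
                if word in embeddings and kw in embeddings:
                    scores.append(cosine(embeddings[word], embeddings[kw]))
    return sum(scores) / len(scores) if scores else 0.0

# Toy 2-d vectors (assumed values; real use needs pretrained embeddings).
toy_embeddings = {
    "refugee": [1.0, 0.0],
    "migrant": [1.0, 0.0],
    "football": [0.0, 1.0],
}
topics = [["migrant"], ["football"]]   # top words of two hypothetical topics
keywords = ["refugee"]                 # a hypothetical collection keyword

score = topic_keyword_similarity(topics, keywords, toy_embeddings)
print(score)
```

In this toy setup one topic coincides with the keyword direction and the other is orthogonal to it, so the mean similarity falls halfway between the two extremes. Comparing such scores across corpora is one way the dataset comparison mentioned in the abstract could be approached.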

The talk and the respective paper are published at the EMNLP 2020 virtual conference.

Similar Papers

04/07/2020

Contextualizing Hate Speech Classifiers with Post-hoc Explanation

Brendan Kennedy, Xisen Jin, Aida Mostafazadeh Davani, Morteza Dehghani, Xiang Ren

Keywords: Contextualizing Classifiers, Post-hoc Explanation, Hate classifiers, fine-tuned classifiers

07/06/2021

Discovering and Categorising Language Biases in Reddit

Xavier Ferrer, Tom Van Nuenen, Jose M. Such, Natalia Criado

Keywords: Qualitative and quantitative studies of social media, Social network analysis, communities identification, expertise and authority discovery, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analy

06/12/2021

Implicit Semantic Response Alignment for Partial Domain Adaptation

Wenxiao Xiao, Zhengming Ding, Hongfu Liu

Keywords: domain adaptation, transfer learning

19/08/2021

Bias Silhouette Analysis: Towards Assessing the Quality of Bias Metrics for Word Embedding Models

Maximilian Spliethöver, Henning Wachsmuth

Keywords: AI Ethics, Trust, Fairness, Fairness, Societal Impact of AI, Natural Language Processing

08/12/2020

Inflating Topic Relevance with Ideology: A Case Study of Political Ideology Bias in Social Topic Detection Models

Meiqi Guo, Rebecca Hwa, Yu-Ru Lin, Wen-Ting Chung

07/06/2021

Measuring Societal Biases from Text Corpora with Smoothed First-Order Co-occurrence

Navid Rekabsaz, Robert West, James Henderson, Allan Hanbury

Keywords: Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analyses of social media behavior, Text categorization, topic recognition, demographic/gender/age identification

07/06/2020

A Framework for Political Portmanteau Decomposition

Nabil Hossain, Minh Tran, Henry Kautz

Keywords: building, detection, hate speech, linguistic, political, spread, terms, traditional, words

25/07/2020

Sampling bias due to near-duplicates in learning to rank

Maik Fröbe, Janek Bevendorff, Jan Heinrich Reimer, Martin Potthast, Matthias Hagen

Keywords: near-duplicate-detection, selection bias, learning to rank, novelty principle

02/02/2021

On the Importance of Word Order Information in Cross-lingual Sequence Labeling

Zihan Liu, Genta I Winata, Samuel Cahyawijaya, Andrea Madotto, Zhaojiang Lin, Pascale Fung

14/06/2020

Instance Guided Proposal Network for Person Search

Wenkai Dong, Zhaoxiang Zhang, Chunfeng Song, Tieniu Tan

Keywords: person search, person detection and re-identification, cross-correlation layer, relation modeling

19/04/2021

Exploiting emojis for abusive language detection

Michael Wiegand, Josef Ruppenhofer

16/11/2020

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

Nikita Nangia, Clara Vania, Rasika Bhalerao, Samuel R. Bowman

Keywords: nlp tasks, pretrained models, masked models, mlms

19/04/2021

“laughing at you or with you”: The role of sarcasm in shaping the disagreement space

Debanjan Ghosh, Ritvik Shrivastava, Smaranda Muresan

03/08/2020

Adapting Text Embeddings for Causal Inference

Victor Veitch, Dhanya Sridhar, David Blei

19/04/2021

Implicitly abusive comparisons – a new dataset and linguistic analysis

Michael Wiegand, Maja Geulig, Josef Ruppenhofer

14/06/2020

Mitigating Bias in Face Recognition Using Skewness-Aware Reinforcement Learning

Mei Wang, Weihong Deng

Keywords: fairness, racial bias, face recognition, deep reinforcement learning, adaptive margin

01/07/2020

Challenges in Emotion Style Transfer: An Exploration with a Lexical Substitution Pipeline

David Helbig, Enrica Troiano, Roman Klinger

22/09/2020

TAFA: Two-headed attention fused autoencoder for context-aware recommendations

Jin Peng Zhou, Zhaoyue Cheng, Felipe Perez, Maksims Volkovs

Keywords: Deep Learning, Context-Aware Recommender Systems, Neural Attention Networks

22/09/2020

Cascading hybrid bandits: Online learning to rank for relevance and diversity

Chang Li, Haoyun Feng, Maarten Rijke

Keywords: recommender system, contextual bandits, Online learning to rank, result diversification

16/11/2020

Hate-Speech and Offensive Language Detection in Roman Urdu

Hammad Rizwan, Muhammad Haroon Shakeel, Asim Karim

Keywords: automatic detection, hate-speech detection, language models, transfer learning

19/04/2021

Machine translationese: Effects of algorithmic bias on linguistic complexity in machine translation

Eva Vanmassenhove, Dimitar Shterionov, Matthew Gwilliam

14/09/2020

A Deep Dive into Multilingual Hate Speech Classification

Sai Saketh Aluru, Binny Mathew, Punyajoy Saha, Animesh Mukherjee

Keywords: hate speech, multilingual, classification, bert, embeddings

22/09/2020

Unbiased ad click prediction for position-aware advertising systems

Bowen Yuan, Yaxu Liu, Jui-Yang Hsia, Zhenhua Dong, Chih-Jen Lin

Keywords: Counterfactual learning, CTR prediction, Selection bias

02/02/2021

Fairness-aware News Recommendation with Decomposed Adversarial Learning

Chuhan Wu, Fangzhao Wu, Xiting Wang, Yongfeng Huang, Xing Xie

02/02/2021

Uncovering Latent Biases in Text: Method and Application to Peer Review

Emaad Manzoor, Nihar B. Shah

22/11/2021

Feature and Label Embedding Spaces Matter in Addressing Image Classifier Bias

William Thong, Cees Snoek

Keywords: bias mitigation, model debiasing, fairness

02/02/2021

Generating Diversified Comments via Reader-Aware Topic Modeling and Saliency Detection

Wei Wang, Piji Li, Hai-Tao Zheng

02/02/2021

The Gap on Gap: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets

Vid Kocijan, Oana-Maria Camburu, Thomas Lukasiewicz

12/07/2020

Dual-Path Distillation: A Unified Framework to Improve Black-Box Attacks

Yonggang Zhang, Ya Li, Tongliang Liu, Xinmei Tian

Keywords: Adversarial Examples

01/07/2020

Demoting Racial Bias in Hate Speech Detection

Mengzhou Xia, Anjalie Field, Yulia Tsvetkov

04/07/2020

Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg

Keywords: computational science, word embeddings, vector alignment, vector spaces

16/11/2020

Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation

Emily Dinan, Angela Fan, Adina Williams, Jack Urbanek, Douwe Kiela, Jason Weston

Keywords: counterfactual augmentation, targeted collection, bias training, generative models

16/11/2020

Semantic Drift in Multilingual Representations

Lisa Beinborn, Rochelle Choenni

Keywords: multilingual representations, computational representations, representational analysis, analysis method

04/07/2020

Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang, Ahmed Hassan Awadallah

Keywords: cross-lingual transfer, multilingual embeddings, NLP applications, bias analysis

08/12/2020

Semi-Supervised Topic Modeling for Gender Bias Discovery in English and Swedish

Hannah Devinney, Jenny Björklund, Henrik Björklund

19/04/2021

A unified feature representation for lexical connotations

Emily Allaway, Kathleen McKeown

03/05/2021

Probing BERT in Hyperbolic Spaces

Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing

Keywords: Sentiment, Syntax, Probe, BERT, Hyperbolic

30/11/2020

Show, Conceive and Tell: Image Captioning with Prospective Linguistic Information

Yiqing Huang, Jiansheng Chen

19/08/2021

Hierarchical Modeling of Label Dependency and Label Noise in Fine-grained Entity Typing

Junshuang Wu, Richong Zhang, Yongyi Mao, Masoumeh Soflaei Shahrbabak, Jinpeng Huai

Keywords: Natural Language Processing, Information Extraction, Named Entities, NLP Applications and Tools

04/07/2020

OpusFilter: A Configurable Parallel Corpus Filtering Toolbox

Mikko Aulamo, Sami Virpioja, Jörg Tiedemann

Keywords: filtering corpora, Finnish-English task, data selection, domain adaptation