Measuring Societal Biases from Text Corpora with Smoothed First-Order Co-occurrence

Abstract: Text corpora are widely used resources for measuring societal biases and stereotypes. The common approach to measuring such biases from a corpus is to calculate the similarities between the embedding vector of a word (such as nurse) and the vectors of the representative words of the concepts of interest (such as genders). In this study, we show that, depending on what one aims to quantify as bias, this commonly used approach can introduce irrelevant concepts into bias measurement. We propose an alternative approach to bias measurement that uses the smoothed first-order co-occurrence relations between the word and the representative concept words, which we derive by reconstructing the co-occurrence estimates inherent in word embedding models. We compare these approaches in several experiments on measuring the gender bias of occupational words in an English Wikipedia corpus. With the first-order approach, the measured gender bias correlates more highly with the actual gender bias statistics of the U.S. job market than it does with the vector similarity-based approaches, on two collections and with a variety of word embedding models. The first-order approach also suggests a more severe bias towards females in a few specific occupations than the other approaches do.
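
The contrast between the two measurement styles described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy vectors and the function names `similarity_bias` and `first_order_bias` are hypothetical. The first-order score here uses the sigmoid of the word-context dot product, which is how a skip-gram model with negative sampling estimates co-occurrence. The exact smoothing used in the paper is not given in this abstract, so that choice is an assumption.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def similarity_bias(word_vec, female_vecs, male_vecs):
    """Vector-similarity style: mean cosine to female words minus
    mean cosine to male words, all taken from the word-vector space."""
    return (mean(cosine(word_vec, f) for f in female_vecs)
            - mean(cosine(word_vec, m) for m in male_vecs))

def first_order_bias(word_vec, female_ctx_vecs, male_ctx_vecs):
    """First-order style (assumed form): difference between the model's
    estimated co-occurrence of the word with female and male concept
    words, using sigmoid(word_vector . context_vector)."""
    return (mean(sigmoid(dot(word_vec, c)) for c in female_ctx_vecs)
            - mean(sigmoid(dot(word_vec, c)) for c in male_ctx_vecs))

# Hypothetical 2-dimensional vectors, for illustration only.
nurse = [0.9, 0.1]
female_words = [[0.8, 0.2], [0.7, 0.1]]      # word vectors, e.g. "she", "woman"
male_words = [[0.2, 0.8], [0.1, 0.9]]        # word vectors, e.g. "he", "man"
female_contexts = [[1.5, -0.5], [1.2, 0.0]]  # context vectors of the same words
male_contexts = [[-0.5, 1.5], [0.0, 1.2]]

print("similarity-based bias:", similarity_bias(nurse, female_words, male_words))
print("first-order bias:", first_order_bias(nurse, female_contexts, male_contexts))
```

In both functions a positive score means the word leans towards the female concept words and a negative score towards the male ones. The only difference is whether the comparison is made between word vectors or between a word vector and the context vectors of the concept words.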

Navid Rekabsaz, Robert West, James Henderson, Allan Hanbury

Similar Papers

Semi-Supervised Topic Modeling for Gender Bias Discovery in English and Swedish

Hannah Devinney, Jenny Björklund, Henrik Björklund

Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang, Ahmed Hassan Awadallah

cross-lingual transfer, multilingual embeddings, NLP applications, bias analysis

Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation

Emily Dinan, Angela Fan, Adina Williams, Jack Urbanek, Douwe Kiela, Jason Weston

counterfactual augmentation, targeted collection, bias training, generative models

Stereotype and skew: Quantifying gender bias in pre-trained and fine-tuned language models

Daniel Vassimon Manela, David Errington, Thomas Fisher, Boris Breugel, Pasquale Minervini

Machine translationese: Effects of algorithmic bias on linguistic complexity in machine translation

Eva Vanmassenhove, Dimitar Shterionov, Matthew Gwilliam

Discovering and Categorising Language Biases in Reddit

Xavier Ferrer, Tom Van Nuenen, Jose M. Such, Natalia Criado

Qualitative and quantitative studies of social media, Social network analysis, communities identification, expertise and authority discovery, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analy

Hurtful words: quantifying biases in clinical contextual word embeddings

Haoran Zhang, Amy X. Lu, Mohamed Abdalla, Matthew McDermott, Marzyeh Ghassemi

Applied computing, Life and medical sciences, Health informatics, Computing methodologies, Machine learning

Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models

Hannah Rose Kirk, Yennie Jun, Filippo Volpin, Haider Iqbal, Elias Benussi, Frederic Dreyer, Aleksandar Shtedritski, Yuki Asano

language

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

Nikita Nangia, Clara Vania, Rasika Bhalerao, Samuel R. Bowman

nlp tasks, pretrained models, masked models, mlms

Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation

Tianlu Wang, Xi Victoria Lin, Nazneen Fatema Rajani, Bryan McCann, Vicente Ordonez, Caiming Xiong

Tailoring Embeddings, Gender Mitigation, Double-Hard Debias, downstream models

The Gap on Gap: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets

Vid Kocijan, Oana-Maria Camburu, Thomas Lukasiewicz

Bias Silhouette Analysis: Towards Assessing the Quality of Bias Metrics for Word Embedding Models

Maximilian Spliethöver, Henning Wachsmuth

AI Ethics, Trust, Fairness, Societal Impact of AI, Natural Language Processing

The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures

Haim Dubossarsky, Ivan Vulić, Roi Reichart, Anna Korhonen

cross-lingual tasks, large-scale study, bli, parsing

Aspect-based Document Similarity for Research Papers

Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm

Comparative Evaluation of Label-Agnostic Selection Bias in Multilingual Hate Speech Datasets

Nedjma Ousidhoum, Yangqiu Song, Dit-Yan Yeung

classification, data process, topic models, selection bias

Multi-Dimensional Gender Bias Classification

Emily Dinan, Angela Fan, Ledell Wu, Jason Weston, Douwe Kiela, Adina Williams

detecting bias, machine models, nlp models, fine-grained framework

Assessing Polyseme Sense Similarity through Co-predication Acceptability and Contextualised Embedding Distance

Janosch Haber, Massimo Poesio

Towards Understanding and Mitigating Social Biases in Language Models

Paul Liang, Chiyu Wu, Louis-Philippe Morency, Russ Salakhutdinov

Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis

Michael Lepori

Robustness and reliability of gender bias assessment in word embeddings: The role of base pairs

Haiyang Zhang, Alison Sneyd, Mark Stevenson

Adapting Text Embeddings for Causal Inference

Victor Veitch, Dhanya Sridhar, David Blei

Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia

