16/11/2020

Detecting Independent Pronoun Bias with Partially-Synthetic Data Generation

Robert Munro, Alex (Carmen) Morrison

Keywords: measuring models, parsers, language models, machine learning models

Abstract: We report that state-of-the-art parsers consistently failed to identify "hers" and "theirs" as pronouns but identified the masculine equivalent "his". We find that the same biases exist in recent language models like BERT. While some of the bias comes from known sources, like training data with gender imbalances, we find that the bias is _amplified_ in the language models and that linguistic differences between English pronouns that are not inherently biased can become biases in some machine learning models. We introduce a new technique for measuring bias in models, using Bayesian approximations to generate partially-synthetic data from the model itself.
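
As a quick illustration of the parser behaviour described in the abstract, the sketch below probes an off-the-shelf parser with the same sentence frame for each independent pronoun and prints the part-of-speech tag it assigns. This is not the authors' method: spaCy, the en_core_web_sm model, and the template sentence are assumptions chosen only for the example.

```python
# Minimal probe (illustrative only, not the paper's technique):
# compare how a parser tags "his" vs. "hers" / "theirs" in an
# otherwise identical sentence.
import spacy

nlp = spacy.load("en_core_web_sm")

template = "The car is {}."          # same frame for every pronoun
pronouns = ["his", "hers", "theirs"]

for p in pronouns:
    doc = nlp(template.format(p))
    token = doc[-2]                  # the pronoun, just before the period
    print(f"{p!r:10} -> POS={token.pos_:6} TAG={token.tag_}")
```

A parser exhibiting the bias described above would tag "his" as a pronoun in this frame while assigning a different tag (for example, a noun tag) to "hers" or "theirs".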

The talk and the respective paper are published at the EMNLP 2020 virtual conference.
