Empirical Analysis of Multi-Task Learning for Reducing Identity Bias in Toxic Comment Detection

Abstract: With the recent rise of toxicity in online conversations on social media platforms, using modern machine learning algorithms for toxic comment detection has become a central focus of many online applications. Researchers and companies have developed a variety of shallow and deep learning models to identify toxicity in online conversations, reviews, or comments, with mixed success. However, these existing approaches have learned to incorrectly associate non-toxic comments that contain certain trigger words (e.g., gay, lesbian, black, muslim) with toxicity. In this paper, we evaluate dozens of state-of-the-art models with the specific focus of reducing model bias towards these commonly attacked identity groups. We propose a multi-task learning model with an attention layer that jointly learns to predict the toxicity of a comment as well as the identities present in the comment in order to reduce this bias. We then compare our model to an array of shallow and deep learning models using metrics designed specifically to test for unintended model bias within these identity groups.
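
The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss, so everything here is an assumption made for illustration: the function names, the use of binary cross-entropy for both tasks, and the hypothetical `identity_weight` hyperparameter that balances the auxiliary identity task against the main toxicity task. It is not the authors' implementation, and the attention layer is not modeled.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and one 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def joint_loss(tox_prob, tox_label, identity_probs, identity_labels,
               identity_weight=0.5):
    """Toxicity loss plus a weighted mean loss over identity labels.

    identity_weight is a hypothetical hyperparameter, not taken from the paper.
    """
    tox = binary_cross_entropy(tox_prob, tox_label)
    ident = sum(
        binary_cross_entropy(p, y)
        for p, y in zip(identity_probs, identity_labels)
    ) / len(identity_probs)
    return tox + identity_weight * ident


# Example: a non-toxic comment that mentions one identity group.
# The model is penalized both for a wrong toxicity score and for
# failing to recognize which identity is present.
loss = joint_loss(
    tox_prob=0.2, tox_label=0,
    identity_probs=[0.9, 0.1], identity_labels=[1, 0],
)
```

In this sketch, a shared encoder would feed both prediction heads, so the gradient from the identity term shapes the same representation used for the toxicity prediction.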

07/06/2020

behaviors, changes, humans, impact, learning, measures, performance, predictions, terms, toxic, toxicity

10:24

14/09/2020

algorithmic discrimination, algorithmic fairness, poisoning attacks, adversarial machine learning, machine learning security

12:06

19/04/2021

Team Oulu at SemEval-2020 Task 12: Multilingual Identification of Offensive Language, Type and Target of Twitter Post Using Translated Datasets

Psychological, personality-based and ethnographic studies of social media, Qualitative and quantitative studies of social media, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analyses of social

8:00

07/06/2020

cases, events, languages, learning, performance, sources, topic, traditional, traditional sources, twitter

10:12

02/02/2021

Sina Mohseni, Fan Yang, Shiva Pentyala, Mengnan Du, Yi Liu, Nic Lupfer, Xia Hu, Shuiwang Ji, Eric Ragan


Qualitative and quantitative studies of social media, Credibility of online content, Trust, reputation, recommendation systems, Human computer interaction, social media tools, navigation and visualization

8:01

19/08/2021

Multidisciplinary Topics and Applications, Security and Privacy, Classification, Mining Graphs, Semi Structured Data, Complex Data

13:28

05/01/2021

Sentiment and Emotion help Sarcasm? A Multi-task Learning Framework for Multi-Modal Sarcasm, Sentiment and Emotion Analysis

behaviors, cases, classification, classifiers, communities, detection, factors, large_scale, learning, linguistic, linguistic aspects, networks, performance, representations

9:53

07/06/2020

cancer, claims, deep learning, detection, learning, linguistic, misinformation, spread, texts, tweets, twitter

3:03

22/09/2020

matrix factorization, data poisoning, shilling attacks, recommender system, differential privacy, collaborative filtering

10:44