PS3: Partition-based Skew-Specialized Sampling for Batch Mode Active Learning in Imbalanced Text Data

Abstract: While social media has become a fixture of daily life, its steadily growing prominence also exacerbates the problem of hostile content and hate speech. These destructive phenomena call for automatic hate-speech detection, which faces two major challenges: i) the dynamic nature of online content, which causes significant data drift over time, and ii) high class skew, as hate speech represents a relatively small fraction of overall online content. The first challenge naturally calls for a batch mode active learning solution, which updates the detection system by querying human domain experts to annotate carefully selected batches of data instances. However, little prior work exists on batch mode active learning under high class skew, in particular for hate-speech detection. In this work, we propose a novel partition-based batch mode active learning framework to address this problem. Our framework follows the so-called screening approach, which pre-selects a subset of the most uncertain data items and then selects a representative set from this uncertainty space. To tackle the class-skew problem, we use a data-driven, skew-specialized cluster representation with a higher potential to “cherry pick” minority classes. In extensive experiments on multiple datasets with highly imbalanced class ratios, we demonstrate substantial improvements in G-Means and F1 measure over several baseline approaches.

Ricky Fajri, Samaneh Khoshrou, Robert Peharz, Mykola Pechenizkiy
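
The screening approach named in the abstract (pre-select the most uncertain items, then pick a representative set from them) can be illustrated with a minimal sketch. This is not the authors' PS3 implementation: the function names `uncertainty`, `kmeans`, and `select_batch` are hypothetical, binary entropy is an assumed uncertainty score, and plain k-means with farthest-first initialization stands in for the partitioning step. The skew-specialized cluster representation is not reproduced here; the sketch simply takes the item nearest each centroid.

```python
import math


def uncertainty(p_pos):
    """Binary entropy (in bits) of a predicted positive-class probability."""
    if p_pos <= 0.0 or p_pos >= 1.0:
        return 0.0
    return -(p_pos * math.log2(p_pos) + (1 - p_pos) * math.log2(1 - p_pos))


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Add the point that is farthest from all centroids chosen so far.
        farthest = max(points, key=lambda p: min(squared_distance(p, c)
                                                 for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def select_batch(features, probabilities, pool_size, batch_size):
    """Screen the most uncertain items, then take one item per cluster."""
    # Step 1 (screening): keep the pool_size items with the highest entropy.
    ranked = sorted(range(len(features)),
                    key=lambda i: uncertainty(probabilities[i]), reverse=True)
    screened = ranked[:pool_size]
    # Step 2 (partitioning): cluster the screened items into batch_size groups.
    points = [features[i] for i in screened]
    centroids, assignment = kmeans(points, batch_size)
    # Step 3 (representatives): pick the item nearest to each centroid.
    batch = []
    for c in range(batch_size):
        members = [i for i, a in zip(screened, assignment) if a == c]
        if members:
            batch.append(min(members, key=lambda i: squared_distance(
                features[i], centroids[c])))
    return sorted(batch)
```

Calling `select_batch(features, probabilities, pool_size, batch_size)` returns the indices of the items to send to annotators; it assumes `pool_size >= batch_size` and may return fewer than `batch_size` indices if a cluster ends up empty. A skew-aware variant would change step 3 so that the representative is biased toward likely minority-class items.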

Similar Papers

A Deep Dive into Multilingual Hate Speech Classification

Sai Saketh Aluru, Binny Mathew, Punyajoy Saha, Animesh Mukherjee

Keywords: hate speech, multilingual, classification, bert, embeddings

Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification

Caleb Ziems, Ymir Vigfusson, Fred Morstatter

Keywords: behaviors, cases, classification, classifiers, communities, detection, factors, large_scale, learning, linguistic, linguistic aspects, networks, performance, representations

Self-Damaging Contrastive Learning

Ziyu Jiang, Tianlong Chen, Bobak Mortazavi, Zhangyang Wang

Keywords: Algorithms, Unsupervised Learning

“are you kidding me?”: Detecting unpalatable questions on Reddit

Sunyam Bagga, Andrew Piper, Derek Ruths

Confusable Learning for Large-class Few-Shot Classification

Bingcong Li, Bo Han, Zhuowei Wang, Jing Jiang, Guodong Long

Keywords: large-class few-shot classification, meta-learning, confusion matrix

Detection of novel social bots by ensembles of specialized classifiers

Mohsen Sayyadiharikandeh, Onur Varol, Kai-Cheng Yang, Alessandro Flammini, Filippo Menczer

Keywords: social bots, recall, social media, machine learning, cross-domain

Backdoor Scanning for Deep Neural Networks through K-Arm Optimization

Guangyu Shen, Yingqi Liu, Guanhong Tao, Shengwei An, Qiuling Xu, Siyuan Cheng, Shiqing Ma, Xiangyu Zhang

Keywords: Social Aspects of Machine Learning, Privacy, Anonymity, and Security

“Call me sexist, but...” : Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples

Mattia Samory, Indira Sen, Julian Kohne, Fabian Flöck, Claudia Wagner

Keywords: Psychological, personality-based and ethnographic studies of social media, Qualitative and quantitative studies of social media, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analyses of social

Message Passing Adaptive Resonance Theory for Online Active Semi-supervised Learning

Taehyeong Kim, Injune Hwang, Hyundo Lee, Hyunseo Kim, Won-Seok Choi, Joseph Lim, Byoung-Tak Zhang

Keywords: Algorithms, Active Learning

Social Media Relevance Filtering Using Perplexity-Based Positive-Unlabelled Learning

Sunghwan Mac Kim, Stephen Wan, Cécile Paris, Andreas Duenser

Keywords: cases, events, languages, learning, performance, sources, topic, traditional, traditional sources, twitter

Improving Contrastive Learning on Imbalanced Data via Open-World Sampling

Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang

Keywords: contrastive learning

Policy-Driven Attack: Learning to Query for Hard-label Black-box Adversarial Examples

Ziang Yan, Yiwen Guo, Jian Liang, Changshui Zhang

Keywords: hard-label attack, adversarial attack, black-box attack, reinforcement learning

Abusive Language Detection in Heterogeneous Contexts: Dataset Collection and the Role of Supervised Attention

Hongyu Gong, Alberto Valido, Katherine M. Ingram, Giulia Fanti, Suma Bhat, Dorothy L. Espelage

Team Oulu at SemEval-2020 Task 12: Multilingual Identification of Offensive Language, Type and Target of Twitter Post Using Translated Datasets

Md Saroar Jahan

Joint Modelling of Emotion and Abusive Language Detection

Santhosh Rajamanickam, Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova

Keywords: Joint Detection, abuse detection, abusive detection, multi-task framework

Asynchronous Teacher Guided Bit-wise Hard Mining for Online Hashing

Sheng Jin, Qin Zhou, Hongxun Yao, Yao Liu, Xian-Sheng Hua

Empirical Analysis of Multi-Task Learning for Reducing Identity Bias in Toxic Comment Detection

Ameya Vaidya, Feng Mai, Yue Ning

Keywords: attention, bias, deep learning, detection, groups, identities, learning, sources, toxic, toxicity

Recommending Courses in MOOCs for Jobs: An Auto Weak Supervision Approach

Bowen Hao, Jing Zhang, Cuiping Li, Hong Chen, Hongzhi Yin

A Simple and Effective Self-Supervised Contrastive Learning Framework for Aspect Detection

Tian Shi, Liuqing Li, Ping Wang, Chandan K. Reddy

Rumor detection on Twitter using multiloss hierarchical BiLSTM with an attenuation factor

Yudianto Sujana, Jiawen Li, Hung-Yu Kao

Adapting Meta Knowledge with Heterogeneous Information Network for COVID-19 Themed Malicious Repository Detection

Yiyue Qian, Yiming Zhang, Yanfang Ye, Chuxu Zhang
