BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection

01/07/2020

Jihyung Moon, Won Ik Cho, Junbum Lee

Abstract: Toxic comments on online platforms are an unavoidable social issue under the cloak of anonymity. Hate speech detection has been actively studied for languages such as English, German, and Italian, for which manually labeled corpora have been released. In this work, we first present 9.4K manually labeled entertainment news comments for identifying Korean toxic speech, collected from a widely used online news platform in Korea. The comments are annotated for both social bias and hate speech, since the two aspects are correlated. The inter-annotator agreement, measured by Krippendorff’s alpha, is 0.492 for social bias and 0.496 for hate speech. We provide benchmarks using CharCNN, BiLSTM, and BERT, where BERT achieves the highest score on all tasks. The models generally perform better on bias identification, since hate speech detection is a more subjective task. Additionally, when BERT is trained with the bias label for hate speech detection, the prediction score increases, implying that bias and hate are intertwined. We make our dataset publicly available and open competitions with the corpus and benchmarks.
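For readers unfamiliar with the agreement measure quoted in the abstract, the following is a minimal sketch of Krippendorff's alpha for nominal labels. It assumes the nominal variant of the metric; the abstract does not say which variant the authors used. The function name `krippendorff_alpha_nominal` and the sample labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists; each inner list holds the labels that
    different annotators assigned to one item (missing labels are omitted).
    """
    # Coincidence matrix: every ordered pair of labels given to the same
    # item by different annotators contributes 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label value and the overall number of pairable labels.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more labels")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    return 1.0 - observed / expected


# Hypothetical annotations: three annotators, one item with a missing label.
ratings = [
    ["hate", "hate", "none"],
    ["none", "none", "none"],
    ["hate", "offensive", "hate"],
    ["none", "none"],
]
print(round(krippendorff_alpha_nominal(ratings), 3))
```

An alpha of 1 indicates perfect agreement and 0 indicates agreement at chance level, so the reported values of roughly 0.49 sit between the two.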

The talk and the corresponding paper were published at the ACL Workshops virtual conference.

Similar Papers

14/09/2020

A Deep Dive into Multilingual Hate Speech Classification

Sai Saketh Aluru, Binny Mathew, Punyajoy Saha, Animesh Mukherjee

Keywords: hate speech, multilingual, classification, bert, embeddings

14:20

05/12/2020

Toxic language detection in social media for Brazilian Portuguese: New dataset and multilingual analysis

João Augusto Leite, Diego Silva, Kalina Bontcheva, Carolina Scarton

14:38

01/07/2020

Sarcasm Identification and Detection in Conversion Context using BERT

Kalaivani A., Thenmozhi D.

5:17

04/07/2020

Contextualizing Hate Speech Classifiers with Post-hoc Explanation

Brendan Kennedy, Xisen Jin, Aida Mostafazadeh Davani, Morteza Dehghani, Xiang Ren

Keywords: Contextualizing Classifiers, Post-hoc Explanation, Hate classifiers, fine-tuned classifiers

7:09

01/07/2020

Demoting Racial Bias in Hate Speech Detection

Mengzhou Xia, Anjalie Field, Yulia Tsvetkov

12:41

16/11/2020

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

Nikita Nangia, Clara Vania, Rasika Bhalerao, Samuel R. Bowman

Keywords: nlp tasks, pretrained models, masked models, mlms

10:56

16/11/2020

HABERTOR: An Efficient and Effective Deep Hatespeech Detector

Thanh Tran, Yifan Hu, Changwei Hu, Kevin Yen, Fei Tan, Kyumin Lee, Se Rim Park

Keywords: downstream task, hatespeech classification, habertor model, bert model

11:46

01/07/2020

Sarcasm Detection using Context Separators in Online Discourse

Tanvi Dadu, Kartikey Pant

4:15

08/12/2020

Team Oulu at SemEval-2020 Task 12: Multilingual Identification of Offensive Language, Type and Target of Twitter Post Using Translated Datasets

Md Saroar Jahan

10:36

08/12/2020

Predicting Clickbait Strength in Online Social Media

Vijayasaradhi Indurthi, Bakhtiyar Syed, Manish Gupta, Vasudeva Varma

14:56

07/06/2021

It’s a Thin Line Between Love and Hate: Using the Echo in Modeling Dynamics of Racist Online Communities

Eyal Arviv, Simo Hanouna, Oren Tsur

Keywords: Social network analysis, communities identification, expertise and authority discovery, Text categorization, topic recognition, demographic/gender/age identification, Measuring predictability of real world phenomena based on social media, e.g., spanning po

8:01

19/10/2020

Representative negative instance generation for online ad targeting

Yuhan Quan, Jingtao Ding, Depeng Jin, Jianbo Yang, Xing Zhou, Yong Li

Keywords: feature matching, adversarial learning, ad targeting, negative sampling

6:30

02/02/2021

HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, Animesh Mukherjee

18:43

04/07/2020

Reasoning with Multimodal Sarcastic Tweets via Modeling Cross-Modality Contrast and Semantic Association

Nan Xu, Zhixiong Zeng, Wenji Mao

Keywords: Reasoning, sarcasm, multimodal detection, Sarcasm

10:57

01/07/2020

Applying Transformers and Aspect-based Sentiment Analysis approaches on Sarcasm Detection

Taha Shangipour ataei, Soroush Javdan, Behrouz Minaei-Bidgoli

4:41

25/07/2020

Think beyond the word: Understanding the implied textual meaning by digesting context, local, and noise

Guoxiu He, Zhe Gao, Zhuoren Jiang, Yangyang Kang, Changlong Sun, Xiaozhong Liu, Wei Lu

Keywords: deep neural networks, text classification, semantic representation, implied textual meaning

19:57

19/04/2021

“laughing at you or with you”: The role of sarcasm in shaping the disagreement space

Debanjan Ghosh, Ritvik Shrivastava, Smaranda Muresan

10:54

19/04/2021

From toxicity in online comments to incivility in American news: Proceed with caution

Anushree Hede, Oshin Agarwal, Linda Lu, Diana C. Mutz, Ani Nenkova

10:10

01/07/2020

C-Net: Contextual Network for Sarcasm Detection

Amit Kumar Jena, Aman Sinha, Rohit Agarwal

4:51

07/06/2021

Political Depolarization of News Articles Using Attribute-Aware Word Embeddings

Ruibo Liu, Lili Wang, Chenyan Jia, Soroush Vosoughi

Keywords: Qualitative and quantitative studies of social media, Trust, reputation, recommendation systems, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analyses of social media behavior, Measuring predi

6:25

06/12/2020

The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine

3:18

02/02/2021

Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure

Jin Chen, Tiezheng Ge, Gangwei Jiang, Zhiqiang Zhang, Defu Lian, Kai Zheng

16:45

19/04/2021

EmpathBERT: A BERT-based framework for demographic-aware empathy prediction

Bhanu Prakash Reddy Guda, Aparna Garimella, Niyati Chhaya

7:19

08/12/2020

Don’t Patronize Me! An Annotated Dataset with Patronizing and Condescending Language towards Vulnerable Communities

Carla Perez Almendros, Luis Espinosa Anke, Steven Schockaert

15:03

07/06/2021

Discovering and Categorising Language Biases in Reddit

Xavier Ferrer, Tom Van Nuenen, Jose M. Such, Natalia Criado

Keywords: Qualitative and quantitative studies of social media, Social network analysis, communities identification, expertise and authority discovery, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analy

8:03

02/02/2021

Humor Knowledge Enriched Transformer for Understanding Multimodal Humor

Md Kamrul Hasan, Sangwu Lee, Wasifur Rahman, Amir Zadeh, Rada Mihalcea, Louis-Philippe Morency, Ehsan Hoque

19:02

04/07/2020

R^3: Reverse, Retrieve, and Rank for Sarcasm Generation with Commonsense Knowledge

Tuhin Chakrabarty, Debanjan Ghosh, Smaranda Muresan, Nanyun Peng

Keywords: Sarcasm Generation, unsupervised approach, retrieve-and-edit framework, Human evaluation

11:29

05/01/2021

Facial Emotion Recognition With Noisy Multi-Task Annotations

Siwei Zhang, Zhiwu Huang, Danda Pani Paudel, Luc Van Gool

4:48

08/12/2020

Towards Preemptive Detection of Depression and Anxiety in Twitter

David Owen, Jose Camacho-Collados, Luis Espinosa Anke

8:15

16/11/2020

UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

Jian Guan, Minlie Huang

Keywords: open-ended generation, story generation, evaluating generation, constructing samples

11:26

14/06/2020

Global-Local GCN: Large-Scale Label Noise Cleansing for Face Recognition

Yaobin Zhang, Weihong Deng, Mei Wang, Jiani Hu, Xian Li, Dongyue Zhao, Dongchao Wen

Keywords: face recognition, label noise, graph convolutional network, global-local

1:00

22/06/2020

Enriching Knowledge Bases with Interesting Negative Statements

Hiba Arnaout, Simon Razniewski, Gerhard Weikum

Keywords: information retrieval, knowledge bases, ranking, negation

5:25

05/12/2020

Measuring what counts: The case of rumour stance classification

Carolina Scarton, Diego Silva, Kalina Bontcheva

9:45

04/07/2020

Sentiment and Emotion help Sarcasm? A Multi-task Learning Framework for Multi-Modal Sarcasm, Sentiment and Emotion Analysis

Dushyant Singh Chauhan, Dhanush S R, Asif Ekbal, Pushpak Bhattacharyya

Keywords: Sentiment Analysis, Sentiment, multi-tasking, sarcasm detection

10:56

04/07/2020

GoEmotions: A Dataset of Fine-Grained Emotions

Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, Sujith Ravi

Keywords: transfer learning, GoEmotions, Principal Analysis, BERT-based model

11:15

14/09/2020

PS3: Partition-based Skew-Specialized Sampling for Batch Mode Active Learning in Imbalanced Text Data

Ricky Fajri, Samaneh Khoshrou, Robert Peharz, Mykola Pechenizkiy

Keywords: batch-mode active learning, imbalance data, hate-speech recognition

15:16

16/11/2020

Comparative Evaluation of Label-Agnostic Selection Bias in Multilingual Hate Speech Datasets

Nedjma Ousidhoum, Yangqiu Song, Dit-Yan Yeung

Keywords: classification, data process, topic models, selection bias

12:07

16/11/2020

Multilingual Offensive Language Identification with Cross-lingual Embeddings

Tharindu Ranasinghe, Marcos Zampieri

Keywords: bengali, cross-lingual embeddings, transfer learning, cyberaggression

7:00

08/12/2020

Learning Domain Terms - Empirical Methods to Enhance Enterprise Text Analytics Performance

Gargi Roy, Lipika Dey, Mohammad Shakir, Tirthankar Dasgupta

14:36

25/07/2020

Leveraging transitions of emotions for sarcasm detection

Ameeta Agrawal, Aijun An, Manos Papagelis

Keywords: sarcasm detection, emotion detection

7:51