Hate-Speech and Offensive Language Detection in Roman Urdu

16/11/2020

Hammad Rizwan, Muhammad Haroon Shakeel, Asim Karim

Keywords: automatic detection, hate-speech detection, language models, transfer learning

Abstract: Automatic detection of hate-speech and offensive language in social media content is of utmost importance because of its implications for a society free of prejudice concerning race, gender, or religion. Existing research in this area, however, focuses mainly on English, which limits its applicability to particular demographics. Despite its prevalence, Roman Urdu (RU) lacks language resources, annotated datasets, and language models for this task. In this study, we: (1) present a lexicon of hateful words in RU; (2) develop an annotated dataset called RUHSOLD, consisting of 10,012 tweets in RU with both coarse-grained and fine-grained labels of hate-speech and offensive language; (3) explore the feasibility of transferring five existing embedding models to RU; (4) propose a novel deep learning architecture called CNN-gram for hate-speech and offensive language detection and compare its performance with seven current baseline approaches on the RUHSOLD dataset; and (5) train domain-specific embeddings on more than 4.7 million tweets and make them publicly available. We conclude that transfer learning is more beneficial than training embeddings from scratch and that the proposed model is more robust than the baselines.

The talk and the respective paper are published at the EMNLP 2020 virtual conference.

Similar Papers

14/09/2020

A Deep Dive into Multilingual Hate Speech Classification

Sai Saketh Aluru, Binny Mathew, Punyajoy Saha, Animesh Mukherjee

Keywords: hate speech, multilingual, classification, bert, embeddings

14:20

08/12/2020

Hate Speech Detection in Saudi Twittersphere: A Deep Learning Approach

Raghad Alshaalan, Hend Al-Khalifa

14:02

07/06/2021

Discovering and Categorising Language Biases in Reddit

Xavier Ferrer, Tom Van Nuenen, Jose M. Such, Natalia Criado

Keywords: Qualitative and quantitative studies of social media, Social network analysis, communities identification, expertise and authority discovery, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analy

8:03

25/07/2020

Think beyond the word: Understanding the implied textual meaning by digesting context, local, and noise

Guoxiu He, Zhe Gao, Zhuoren Jiang, Yangyang Kang, Changlong Sun, Xiaozhong Liu, Wei Lu

Keywords: deep neural networks, text classification, semantic representation, implied textual meaning

19:57

02/02/2021

Bigram and Unigram Based Text Attack via Adaptive Monotonic Heuristic Search

Xinghao Yang, Weifeng Liu, James Bailey, Dacheng Tao, Wei Liu

17:17

04/07/2020

Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage

Ashish V. Thapliyal, Radu Soricut

Keywords: Cross-modal Generation, Web-scale Coverage, Cross-modal tasks, Pivot Stabilization

11:43

02/02/2021

Non-Autoregressive Coarse-to-Fine Video Captioning

Bang Yang, Yuexian Zou, Fenglin Liu, Can Zhang

18:21

04/07/2020

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords: Cross-Lingual Classification, sentiment classification, unsupervised system, classification

12:23

04/07/2020

LINSPECTOR: Multilingual Probing Tasks for Word Representations

Gözde Gül Sahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych

Keywords: Word Representations, NLP, classification tasks, probing tasks

11:51

04/07/2020

Word-level Textual Adversarial Attacking as Combinatorial Optimization

Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, Maosong Sun

Keywords: Textual attacking, Word-level attacking, combinatorial problem, Word-level Attacking

9:34

02/02/2021

LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

Ting Jiang, Deqing Wang, Leilei Sun, Huayi Yang, Zhengyang Zhao, Fuzhen Zhuang

16:28

14/09/2020

PS3: Partition-based Skew-Specialized Sampling for Batch Mode Active Learning in Imbalanced Text Data

Ricky Fajri, Samaneh Khoshrou, Robert Peharz, Mykola Pechenizkiy

Keywords: batch-mode active learning, imbalance data, hate-speech recognition

15:16

14/06/2020

Object Relational Graph With Teacher-Recommended Learning for Video Captioning

Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zheng-Jun Zha

Keywords: vision and language, video captioning, seq2seq learning, object relational graph, teacher-recommended learning, gcn, visual relational reasoning, external language model, knowledge distillation, long-tailed problem

1:05

06/12/2020

Counterfactual Contrastive Learning for Weakly-Supervised Vision-Language Grounding

Zhu Zhang, Zhou Zhao, Zhijie Lin, Jieming Zhu, Xiuqiang He

3:14

02/02/2021

Abusive Language Detection in Heterogeneous Contexts: Dataset Collection and the Role of Supervised Attention

Hongyu Gong, Alberto Valido, Katherine M. Ingram, Giulia Fanti, Suma Bhat, Dorothy L. Espelage

15:07

19/04/2021

“are you kidding me?”: Detecting unpalatable questions on Reddit

Sunyam Bagga, Andrew Piper, Derek Ruths

11:46

14/06/2020

Suppressing Uncertainties for Large-Scale Facial Expression Recognition

Kai Wang, Xiaojiang Peng, Jianfei Yang, Shijian Lu, Yu Qiao

Keywords: emotion recognition, self-cure network, uncertainties

1:01

22/11/2021

Text-Based Person Search with Limited Data

Xiao Han, Sen He, Li Zhang, Tao Xiang

Keywords: person re-identification, cross-modal image retrieval, fine-grained image retrieval, text-based person search

3:04

05/01/2021

Multimodal Prototypical Networks for Few-Shot Learning

Frederik Pahde, Mihai Puscas, Tassilo Klein, Moin Nabi

4:56

19/04/2021

Machine translationese: Effects of algorithmic bias on linguistic complexity in machine translation

Eva Vanmassenhove, Dimitar Shterionov, Matthew Gwilliam

11:19

08/12/2020

Detecting Urgency Status of Crisis Tweets: A Transfer Learning Approach for Low Resource Languages

Efsun Sarioglu Kayi, Linyong Nan, Bohan Qu, Mona Diab, Kathleen McKeown

14:37

19/08/2021

Deep Unified Cross-Modality Hashing by Pairwise Data Alignment

Yimu Wang, Bo Xue, Quan Cheng, Yuhui Chen, Lijun Zhang

Keywords: Computer Vision, Recognition, Information Retrieval, Deep Learning

13:11

03/05/2021

Towards Robustness Against Natural Language Word Substitutions

Xinshuai Dong, Anh Tuan Luu, Rongrong Ji, Hong Liu

Keywords: Adversarial Defense, Natural Language Processing

6:06

14/06/2020

Visual Grounding in Video for Unsupervised Word Translation

Gunnar A. Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh, Lucas Smaira, Mateusz Malinowski, João Carreira, Phil Blunsom, Andrew Zisserman

Keywords: video, translation, multimodal learning, unsupervised learning, unsupervised translation, youtube, howto100m, multilingual, language, deep learning

1:01

02/02/2021

HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, Animesh Mukherjee

18:43

14/06/2020

Deep Spatial Gradient and Temporal Depth Learning for Face Anti-Spoofing

Zezheng Wang, Zitong Yu, Chenxu Zhao, Xiangyu Zhu, Yunxiao Qin, Qiusheng Zhou, Feng Zhou, Zhen Lei

Keywords: face anti-spoofing, depth supervised learning, multiple frames, detailed discriminative clues, 3d moving faces

4:57

14/06/2020

Attention-Guided Hierarchical Structure Aggregation for Image Matting

Yu Qiao, Yuhao Liu, Xin Yang, Dongsheng Zhou, Mingliang Xu, Qiang Zhang, Xiaopeng Wei

Keywords: image matting, attention, hierarchical, aggregation, appearance cues

0:59

02/02/2021

Train a One-Million-Way Instance Classifier for Unsupervised Visual Representation Learning

Yu Liu, Lianghua Huang, Pan Pan, Bin Wang, Yinghui Xu, Rong Jin

15:15

08/12/2020

Is it Great or Terrible? Preserving Sentiment in Neural Machine Translation of Arabic Reviews

Hadeel Saadany, Constantin Orasan

14:35

08/12/2020

ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation

Dario Stojanovski, Benno Krojer, Denis Peskov, Alexander Fraser

14:09

19/04/2021

Exploiting emojis for abusive language detection

Michael Wiegand, Josef Ruppenhofer

11:18

16/11/2020

Iterative Domain-Repaired Back-Translation

Hao-Ran Wei, Zhirui Zhang, Boxing Chen, Weihua Luo

Keywords: domain-specific translation, domain adaptation, back-translation method, out-of-domain systems

11:35

02/02/2021

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu

17:39

08/12/2020

Team Oulu at SemEval-2020 Task 12: Multilingual Identification of Offensive Language, Type and Target of Twitter Post Using Translated Datasets

Md Saroar Jahan

10:36

02/02/2021

Deep Semantic Dictionary Learning for Multi-label Image Classification

Fengtao Zhou, Sheng Huang, Yun Xing

15:06

04/07/2020

Max-Margin Incremental CCG Parsing

Miloš Stanojević, Mark Steedman

Keywords: Incremental parsing, human processing, ASR, MT

11:39

14/06/2020

Counterfactual Samples Synthesizing for Robust Visual Question Answering

Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, Yueting Zhuang

Keywords: visual question answering, counterfactual, debias, language bias, data augmentation, visual-and-language

1:01

18/07/2021

Self-Damaging Contrastive Learning

Ziyu Jiang, Tianlong Chen, Bobak Mortazavi, Zhangyang Wang

Keywords: Algorithms, Unsupervised Learning

5:10

16/11/2020

Multilingual Offensive Language Identification with Cross-lingual Embeddings

Tharindu Ranasinghe, Marcos Zampieri

Keywords: bengali, cross-lingual embeddings, transfer learning, cyberaggression

7:00

16/11/2020

BERT-ATTACK: Adversarial Attack Against BERT Using BERT

Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, Xipeng Qiu

Keywords: adversarial attacks, downstream tasks, calculation, gradient-based methods

11:36