Multilingual Offensive Language Identification with Cross-lingual Embeddings

16/11/2020

Tharindu Ranasinghe, Marcos Zampieri

Keywords: bengali, cross-lingual embeddings, transfer learning, cyberaggression

Abstract: Offensive content is pervasive in social media and a cause for concern for companies and government organizations. Several studies have recently been published investigating methods to detect the various forms of such content (e.g. hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, partly because most available annotated datasets contain English data. In this paper, we take advantage of the available English data by applying cross-lingual contextual word embeddings and transfer learning to make predictions in languages with fewer resources. We project predictions on comparable data in Bengali, Hindi, and Spanish, and we report macro F1 scores of 0.8415 for Bengali, 0.8568 for Hindi, and 0.7513 for Spanish. Finally, we show that our approach compares favorably to the best systems submitted to recent shared tasks on these three languages, confirming the robustness of cross-lingual contextual embeddings and transfer learning for this task.
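The results above are reported as macro-averaged F1. As an illustration of that metric only, the following minimal Python sketch computes it as the unweighted mean of per-class F1 scores, so the offensive and non-offensive classes count equally regardless of how many examples each has. The label names `OFF` and `NOT` and the toy predictions are hypothetical, and this is not the authors' evaluation code; for the same inputs it should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    # Every label seen in either the gold or the predicted labels is a class.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)

    # Macro averaging: each class contributes equally, whatever its frequency.
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical gold labels and predictions for five posts.
    gold = ["OFF", "NOT", "NOT", "OFF", "NOT"]
    pred = ["OFF", "NOT", "OFF", "NOT", "NOT"]
    print(round(macro_f1(gold, pred), 4))  # → 0.5833
```

Here the `OFF` class has F1 = 0.5 and the `NOT` class has F1 = 2/3, so the macro average is 7/12. A micro-averaged score would instead pool the counts across classes, letting the more frequent class dominate, which is why macro F1 is the usual choice for imbalanced offensive-language datasets.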

The talk and the accompanying paper were published at the EMNLP 2020 virtual conference.

Similar Papers

14/09/2020

A Deep Dive into Multilingual Hate Speech Classification

Sai Saketh Aluru, Binny Mathew, Punyajoy Saha, Animesh Mukherjee

Keywords: hate speech, multilingual, classification, bert, embeddings

08/12/2020

A Sentiment-annotated Dataset of English Causal Connectives

Marta Andersson, Murathan Kurfalı, Robert Östling

25/07/2020

Think beyond the word: Understanding the implied textual meaning by digesting context, local, and noise

Guoxiu He, Zhe Gao, Zhuoren Jiang, Yangyang Kang, Changlong Sun, Xiaozhong Liu, Wei Lu

Keywords: deep neural networks, text classification, semantic representation, implied textual meaning

08/12/2020

Team Oulu at SemEval-2020 Task 12: Multilingual Identification of Offensive Language, Type and Target of Twitter Post Using Translated Datasets

Md Saroar Jahan

08/12/2020

Words are the Window to the Soul: Language-based User Representations for Fake News Detection

Marco Del Tredici, Raquel Fernández

16/11/2020

X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset

Angel Daza, Anette Frank

Keywords: generalization learning, multilingual learning, high-quality translation, srl

19/04/2021

“are you kidding me?”: Detecting unpalatable questions on Reddit

Sunyam Bagga, Andrew Piper, Derek Ruths

16/11/2020

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki, Haibo Ding, Graham Neubig

Keywords: factual retrieval, language models, lms, probing methods

02/02/2021

Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure

Jin Chen, Tiezheng Ge, Gangwei Jiang, Zhiqiang Zhang, Defu Lian, Kai Zheng

01/07/2020

Sarcasm Identification and Detection in Conversion Context using BERT

Kalaivani A., Thenmozhi D.

07/06/2020

Learning Cross-Lingual Word Embeddings from Twitter via Distant Supervision

Jose Camacho-Collados, Yerai Doval Mosquera, Eugenio Martínez-Cámara, Luis Espinosa-Anke, Francesco Barbieri, Steven Schockaert

Keywords: embedding spaces, embeddings, languages, learning, performance, representations, shared, spaces, texts, twitter, word embeddings, words

16/11/2020

Hate-Speech and Offensive Language Detection in Roman Urdu

Hammad Rizwan, Muhammad Haroon Shakeel, Asim Karim

Keywords: automatic detection, hate-speech detection, language models, transfer learning

19/04/2021

Frequency-guided word substitutions for detecting textual adversarial examples

Maximilian Mozes, Pontus Stenetorp, Bennett Kleinberg, Lewis Griffin

14/06/2020

Global-Local GCN: Large-Scale Label Noise Cleansing for Face Recognition

Yaobin Zhang, Weihong Deng, Mei Wang, Jiani Hu, Xian Li, Dongyue Zhao, Dongchao Wen

Keywords: face recognition, label noise, graph convolutional network, global-local

01/07/2020

Transformer-based Context-aware Sarcasm Detection in Conversation Threads from Social Media

Xiangjue Dong, Changmao Li, Jinho D. Choi

14/06/2020

On Vocabulary Reliance in Scene Text Recognition

Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao

Keywords: scene text recognition, text spotting, document analysis, ocr, scene text detection, sequence recognition, language and vision

14/06/2020

IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval

Hui Chen, Guiguang Ding, Xudong Liu, Zijia Lin, Ji Liu, Jungong Han

Keywords: cross-modal image text retrieval, iterative matching, recurrent attention memory

08/12/2020

Federated Learning for Spoken Language Understanding

Zhiqi Huang, Fenglin Liu, Yuexian Zou

02/02/2021

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu

05/01/2021

Facial Emotion Recognition With Noisy Multi-Task Annotations

Siwei Zhang, Zhiwu Huang, Danda Pani Paudel, Luc Van Gool

04/07/2020

GLUECoS: An Evaluation Benchmark for Code-Switched NLP

Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury

Keywords: Code-Switched NLP, cross-lingual tasks, NLP tasks, Language Identification

02/02/2021

Commonsense Knowledge Augmentation for Low-Resource Languages via Adversarial Learning

Bosung Kim, Juae Kim, Youngjoong Ko, Jungyun Seo

04/07/2020

Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples

Xiaoqing Zheng, Jiehang Zeng, Yi Zhou, Cho-Jui Hsieh, Minhao Cheng, Xuanjing Huang

Keywords: semantic tasks, sentiment analysis, question answering, reading comprehension

19/08/2021

Hierarchical Modeling of Label Dependency and Label Noise in Fine-grained Entity Typing

Junshuang Wu, Richong Zhang, Yongyi Mao, Masoumeh Soflaei Shahrbabak, Jinpeng Huai

Keywords: Natural Language Processing, Information Extraction, Named Entities, NLP Applications and Tools

04/07/2020

Cross-Linguistic Syntactic Evaluation of Word Prediction Models

Aaron Mueller, Garrett Nicolai, Panayiota Petrou-Zeniou, Natalia Talmina, Tal Linzen

Keywords: Cross-Linguistic Syntax, Syntax, Cross-Linguistic Models, neural models

04/07/2020

Joint Modelling of Emotion and Abusive Language Detection

Santhosh Rajamanickam, Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova

Keywords: Joint Detection, abuse detection, abusive detection, multi-task framework

04/07/2020

Improving Truthfulness of Headline Generation

Kazuki Matsumaru, Sho Takase, Naoaki Okazaki

Keywords: Truthfulness Generation, abstractive summarization, headline generation, automatic headlines

08/12/2020

Learning to Decouple Relations: Few-Shot Relation Classification with Entity-Guided Attention and Confusion-Aware Training

Yingyao Wang, Junwei Bao, Guangyi Liu, Youzheng Wu, Xiaodong He, Bowen Zhou, Tiejun Zhao

19/10/2020

Analysis of multivariate scoring functions for automatic unbiased learning to rank

Tao Yang, Shikai Fang, Shibo Li, Yulan Wang, Qingyao Ai

Keywords: multivariate scoring function, unbiased learning to rank

04/07/2020

Learning Robust Models for e-Commerce Product Search

Thanh Nguyen, Nikhil Rao, Karthik Subbian

Keywords: e-Commerce Search, Mitigating problem, ranking algorithms, deep model

19/04/2021

From toxicity in online comments to incivility in American news: Proceed with caution

Anushree Hede, Oshin Agarwal, Linda Lu, Diana C. Mutz, Ani Nenkova

16/11/2020

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models

Nikita Nangia, Clara Vania, Rasika Bhalerao, Samuel R. Bowman

Keywords: nlp tasks, pretrained models, masked models, mlms

16/11/2020

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov

Keywords: multilingual models, meta-learning algorithm, multilingual representations, negative interference

22/06/2020

Cross-context News Corpus for Protest Events related Knowledge Base Construction

Ali Hürriyetoğlu, Erdem Yörük, Deniz Yüret, Osman Mutlu, Çağrı Yoltar, Fırat Duruşan, Burak Gürel

Keywords: protests, contentious politics, news, text classification, event extraction, social sciences, political sciences, computational social science

08/12/2020

SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP

Katsuki Chousa, Masaaki Nagata, Masaaki Nishino

16/11/2020

T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack

Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li

Keywords: adversarial generation, nlp tasks, sentiment analysis, qa

08/12/2020

Hate Speech Detection in Saudi Twittersphere: A Deep Learning Approach

Raghad Alshaalan, Hend Al-Khalifa

14/06/2020

Multimodal Categorization of Crisis Events in Social Media

Mahdi Abavisani, Liwei Wu, Shengli Hu, Joel Tetreault, Alejandro Jaimes

Keywords: multimodal learning, multimodal categorization, cross-attention, stochastic shared embedding, event detection, social media, image-text fusion, ai for social goods, language and vision, emergency response

08/12/2020

Text Classification by Contrastive Learning and Cross-lingual Data Augmentation for Alzheimer’s Disease Detection

Zhiqiang Guo, Zhaoci Liu, Zhenhua Ling, Shijin Wang, Lingjing Jin, Yunxia Li

04/07/2020

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

Keywords: cross-lingual tasks, XNLI, MLQA, NER
