Civil rephrases of toxic texts with self-supervised transformers

19/04/2021

Léo Laugier, John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon

Abstract: Platforms that support online commentary, from social networks to news sites, are increasingly leveraging machine learning to assist their moderation efforts. But this process does not typically provide feedback to the author that would help them contribute according to the community guidelines. Providing such feedback is prohibitively time-consuming for human moderators, and computational approaches are still nascent. This work focuses on models that can help suggest rephrasings of toxic comments in a more civil manner. Inspired by recent progress in unpaired sequence-to-sequence tasks, we introduce a self-supervised learning model called CAE-T5. CAE-T5 employs a pre-trained text-to-text transformer, which is fine-tuned with a denoising and cyclic auto-encoder loss. In experiments on the largest toxicity detection dataset to date (Civil Comments), our model generates sentences that are more fluent and better at preserving the initial content than those of earlier text style transfer systems, against which we compare it using several scoring systems and human evaluation.

This is an embedded video. The talk and the corresponding paper were published at the EACL 2021 virtual conference.

Similar Papers

04/07/2020

Unsupervised Opinion Summarization with Noising and Denoising

Reinald Kim Amplayo, Mirella Lapata

Keywords: Unsupervised Summarization, supervised models, abstractive summarization, Noising

12:16

16/11/2020

An Imitation Game for Learning Semantic Parsers from User Interaction

Ziyu Yao, Yiqi Tang, Wen-tau Yih, Huan Sun, Yu Su

Keywords: bootstrapping, fine-tuning parsers, theoretical analysis, text-to-sql problem

11:49

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

11:27

19/10/2020

AutoADR: Automatic model design for ad relevance

Yiren Chen, Yaming Yang, Hong Sun, Yujing Wang, Yu Xu, Wei Shen, Rong Zhou, Yunhai Tong, Jing Bai, Ruofei Zhang

Keywords: neural architecture search, knowledge distillation, ad relevance

9:24

18/07/2021

Self-supervised and Supervised Joint Training for Resource-rich Machine Translation

Yong Cheng, Wei Wang, Lu Jiang, Wolfgang Macherey

Keywords: Applications, Natural Language Processing

5:21

04/07/2020

uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems

Tsuta Yuma, Naoki Yoshinaga, Masashi Toyoda

Keywords: Open-Domain Systems, uBLEU, Uncertainty-Aware Method, ΔBLEU

11:07

04/07/2020

Data Manipulation: Towards Effective Instance Learning for Neural Dialogue Generation via Learning to Augment and Reweight

Hengyi Cai, Hongshen Chen, Yonghao Song, Cheng Zhang, Xiaofang Zhao, Dawei Yin

Keywords: Data Manipulation, Neural Generation, learning, dialogue generation

9:39

08/12/2020

Attentively Embracing Noise for Robust Latent Representation in BERT

Gwenaelle Cunha Sergio, Dennis Singh Moirangthem, Minho Lee

12:55

16/11/2020

POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training

Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, Bill Dolan

Keywords: language learning, free-form generation, hard-constrained generation, hard-constrained tasks

10:09

04/07/2020

Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation

Weixin Liang, James Zou, Zhou Yu

Keywords: Automatic Evaluation, Open evaluation, dialog research, dialog evaluation

11:24

16/11/2020

PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation

Bin Bi, Chenliang Li, Chen Wu, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si

Keywords: natural generation, language tasks, generative answering, conversational generation

11:02

04/07/2020

Unsupervised Opinion Summarization as Copycat-Review Generation

Arthur Bražinskas, Mirella Lapata, Ivan Titov

Keywords: Unsupervised Summarization, Copycat-Review Generation, Opinion summarization, automatically summaries

10:55

06/12/2021

Controlled Text Generation as Continuous Optimization with Multiple Constraints

Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov

Keywords: optimization

14:02

16/11/2020

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Trapit Bansal, Rishikesh Jha, Tsendsuren Munkhdalai, Andrew McCallum

Keywords: nlp applications, fine-tuning, meta-learning problem, supervised tasks

11:49

05/12/2020

Touch editing: A flexible one-time interaction approach for translation

Qian Wang, Jiajun Zhang, Lemao Liu, Guoping Huang, Chengqing Zong

12:23

16/11/2020

Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning

Tsvetomila Mihaylova, Vlad Niculae, André F. T. Martins

Keywords: pipeline systems, ste, latent models, end-to-end training

11:50

01/07/2020

A Metric Learning Approach to Misogyny Categorization

Juan Manuel Coria, Sahar Ghannay, Sophie Rosset, Hervé Bredin

4:45

04/07/2020

From Zero to Hero: Human-In-The-Loop Entity Linking in Low Resource Domains

Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych

Keywords: Human-In-The-Loop Linking, Entity linking, disambiguating mentions, annotation process

12:26

07/06/2020

Empirical Analysis of Multi-Task Learning for Reducing Identity Bias in Toxic Comment Detection

Ameya Vaidya, Feng Mai, Yue Ning

Keywords: attention, bias, deep learning, detection, groups, identities, learning, sources, toxic, toxicity

9:59

26/04/2020

Plug and Play Language Models: A Simple Approach to Controlled Text Generation

Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu

Keywords: controlled text generation, generative models, conditional generative models, language modeling, transformer

4:58

16/11/2020

Q-learning with Language Model for Edit-based Unsupervised Summarization

Ryosuke Kohita, Akifumi Wachi, Yang Zhao, Ryuki Tachibana

Keywords: abstractive text summarization, unsupervised summarization, unsupervised summarizers, unsupervised methods

12:32

16/11/2020

Iterative Feature Mining for Constraint-Based Data Collection to Increase Data Diversity and Model Robustness

Stefan Larson, Anthony Zheng, Anish Mahendran, Rishi Tekriwal, Adrian Cheung, Eric Guldan, Kevin Leach, Jonathan K. Kummerfeld

Keywords: dialog tasks, intent classification, slot-filling, robust models

6:52

12/07/2020

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Keywords: Deep Learning - Generative Models and Autoencoders

17:06

18/07/2021

Self-Damaging Contrastive Learning

Ziyu Jiang, Tianlong Chen, Bobak Mortazavi, Zhangyang Wang

Keywords: Algorithms, Unsupervised Learning

5:10

19/08/2021

A Structure Self-Aware Model for Discourse Parsing on Multi-Party Dialogues

Ante Wang, Linfeng Song, Hui Jiang, Shaopeng Lai, Junfeng Yao, Min Zhang, Jinsong Su

Keywords: Natural Language Processing, Dialogue, Discourse, Tagging, Chunking, and Parsing

8:33

12/07/2020

Countering Language Drift with Seeded Iterated Learning

Yuchen Lu, Soumye Singhal, Florian Strub, Aaron Courville, Olivier Pietquin

Keywords: Deep Learning - Algorithms

14:25

16/11/2020

Supervised Seeded Iterated Learning for Interactive Language Learning

Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

Keywords: language drift, language-drift game, language models, word-based agents

6:56

04/07/2020

Joint Modelling of Emotion and Abusive Language Detection

Santhosh Rajamanickam, Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova

Keywords: Joint Detection, abuse detection, abusive detection, multi-task framework

11:16

16/11/2020

Few-Shot Learning for Opinion Summarization

Arthur Bražinskas, Mirella Lapata, Ivan Titov

Keywords: opinion summarization, automatic text, summary production, summarization mode

11:48

19/04/2021

The Gutenberg dialogue dataset

Richard Csaky, Gábor Recski

10:14

18/07/2021

Message Passing Adaptive Resonance Theory for Online Active Semi-supervised Learning

Taehyeong Kim, Injune Hwang, Hyundo Lee, Hyunseo Kim, Won-Seok Choi, Joseph Lim, Byoung-Tak Zhang

Keywords: Algorithms, Active Learning

4:53

08/12/2020

An Empirical Study on Multi-Task Learning for Text Style Transfer and Paraphrase Generation

Pawel Bujnowski, Kseniia Ryzhova, Hyungtak Choi, Katarzyna Witkowska, Jaroslaw Piersa, Tymoteusz Krumholc, Katarzyna Beksa

14:33

04/07/2020

Diversifying Dialogue Generation with Non-Conversational Text

Hui Su, Xiaoyu Shen, Sanqiang Zhao, Zhou Xiao, Pengwei Hu, Randy Zhong, Cheng Niu, Jie Zhou

Keywords: Diversifying Generation, low-diversity problem, open-domain generation, dialogue generation

10:53

05/12/2020

DAPPER: Learning domain-adapted persona representation using pretrained BERT and external memory

Prashanth Vijayaraghavan, Eric Chu, Deb Roy

14:48

14/06/2020

ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation

Sharon Fogel, Hadar Averbuch-Elor, Sarel Cohen, Shai Mazor, Roee Litman

Keywords: gan, semi-supervised, domain-adaptation, handwriting, generative, unlabeled, transfer learning, ocr, text, augmentation

1:01

01/07/2020

Simple Compounded-Label Training for Fact Extraction and Verification

Yixin Nie, Lisa Bauer, Mohit Bansal

9:59

16/11/2020

Dialogue Response Ranking Training with Large-Scale Human Feedback Data

Xiang Gao, Yizhe Zhang, Michel Galley, Chris Brockett, Bill Dolan

Keywords: feedback prediction, ranking problem, predicting feedback, open-domain models

11:57

22/09/2020

Deep Bayesian bandits: Exploring in online personalized recommendations

Dalin Guo, Sofia Ira Ktena, Pranay Kumar Myana, Ferenc Huszar, Wenzhe Shi, Alykhan Tejani, Michael Kneier, Sourav Das

Keywords: Contextual bandit, Recommender Systems, Algorithmic bias

2:59

04/07/2020

Designing Precise and Robust Dialogue Response Evaluators

Tianyu Zhao, Divesh Lala, Tatsuya Kawahara

Keywords: human evaluation, Precise Evaluators, Automatic evaluator, reference-free evaluator

7:04

06/12/2021

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren

Keywords: machine learning, fairness, language

13:17