Fine-tuning Neural Machine Translation on Gender-Balanced Datasets

Abstract: The misrepresentation of certain communities in datasets is causing serious problems in artificial intelligence applications. In this paper, we propose using a gender-balanced parallel corpus automatically extracted from Wikipedia. We use this balanced set to fine-tune a larger model previously trained on unbalanced datasets, with the aim of mitigating gender bias in neural machine translation.
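
The abstract does not say how the balanced subset is built. One way to picture it is the sketch below. It assumes each Wikipedia-extracted sentence pair already carries a gender label (for example, the gender of the biography subject) and downsamples every label to the size of the smallest group. The helper name `balance_by_gender` and the (source, target, gender) layout are hypothetical and are not taken from the authors' pipeline.

```python
import random
from collections import defaultdict


def balance_by_gender(pairs, seed=0):
    """Downsample (source, target, gender) triples so that every gender
    label occurs equally often. Hypothetical helper for illustration only."""
    by_gender = defaultdict(list)
    for src, tgt, gender in pairs:
        by_gender[gender].append((src, tgt, gender))
    if not by_gender:
        return []
    # The smallest group sets the per-label quota.
    quota = min(len(group) for group in by_gender.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    balanced = []
    for gender in sorted(by_gender):
        balanced.extend(rng.sample(by_gender[gender], quota))
    rng.shuffle(balanced)  # avoid label-ordered batches during fine-tuning
    return balanced
```

Downsampling throws away majority-group data. Upsampling the minority group is an alternative, but it repeats sentences.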

Marta R. Costa-jussà, Adrià de Jorge

Similar Papers

Multi-Dimensional Gender Bias Classification

Emily Dinan, Angela Fan, Ledell Wu, Jason Weston, Douwe Kiela, Adina Williams

Keywords: detecting bias, machine models, nlp models, fine-grained framework

A Generative Approach to Titling and Clustering Wikipedia Sections

Anjalie Field, Sascha Rothe, Simon Baumgartner, Cong Yu, Abe Ittycheriah

WikiHist.html: English Wikipedia’s Full Revision History in HTML Format

Blagoj Mitrevski, Tiziano Piccardi, Robert West

Keywords: languages, links, rest

Photon: A Robust Cross-Domain Text-to-SQL System

Jichuan Zeng, Xi Victoria Lin, Steven C.H. Hoi, Richard Socher, Caiming Xiong, Michael Lyu, Irwin King

Keywords: natural communication, programming, Photon, Robust System

HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions

Shaobo Li, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Chengjie Sun, Zhenzhou Ji, Bingquan Liu

Keeping Community in the Loop: Understanding Wikipedia Stakeholder Values for Machine Learning-Based Systems

C. Estelle Smith, Bowen Yu, Anjali Srivastava, Aaron Halfaker, Loren Terveen, Haiyi Zhu

Keywords: wikipedia, peer production, value sensitive algorithm design, machine learning, ores, community values

Machine translationese: Effects of algorithmic bias on linguistic complexity in machine translation

Eva Vanmassenhove, Dimitar Shterionov, Matthew Gwilliam

Design Challenges in Low-resource Cross-lingual Entity Linking

Xingyu Fu, Weijia Shi, Xiaodong Yu, Zian Zhao, Dan Roth

Keywords: cross-lingual linking, cross-lingual, xel, grounding entities

Diversity-Based Generalization for Unsupervised Text Classification under Domain Shift

Jitin Krishnan, Hemant Purohit, Huzefa Rangwala

Keywords: text classification, unsupervised domain adaptation, natural language processing, neural networks

Entity Extraction from Wikipedia List Pages

Nicolas Heist, Heiko Paulheim

Discovering and Categorising Language Biases in Reddit

Xavier Ferrer, Tom Van Nuenen, Jose M. Such, Natalia Criado

Keywords: Qualitative and quantitative studies of social media, Social network analysis, communities identification, expertise and authority discovery, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analyses of social media behavior

Semi-Supervised Topic Modeling for Gender Bias Discovery in English and Swedish

Hannah Devinney, Jenny Björklund, Henrik Björklund

ALaSca: an Automated approach for Large-Scale Lexical Substitution

Caterina Lacerra, Tommaso Pasini, Rocco Tripodi, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Resources and Evaluation

Paraphrase Generation by Learning How to Edit from Samples

Amirhossein Kazemnejad, Mohammadreza Salehi, Mahdieh Soleymani Baghshah

Keywords: Paraphrase Generation, Neural sequence, sequence generation, retrieval-based method

Neural relation extraction on wikipedia tables for augmenting knowledge graphs

Erin Macdonald, Denilson Barbosa

Keywords: information extraction, benchmarking, web tables

ToTTo: A Controlled Table-To-Text Generation Dataset

Ankur Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, Dipanjan Das

Keywords: controlled task, high-precision generation, totto, dataset process

Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation

Tianlu Wang, Xi Victoria Lin, Nazneen Fatema Rajani, Bryan McCann, Vicente Ordonez, Caiming Xiong

Keywords: Tailoring Embeddings, Gender Mitigation, Double-Hard Debias, downstream models

It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information

Emanuele Bugliarello, Sabrina J. Mielke, Antonios Anastasopoulos, Ryan Cotterell, Naoaki Okazaki

Keywords: Measuring Difficulty, generation, asymmetric difficulty, machine difficulty

Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia

Chan Young Park, Xinru Yan, Anjalie Field, Yulia Tsvetkov

Keywords: Social network analysis, communities identification, expertise and authority discovery, Subjectivity in textual data, sentiment analysis, polarity/opinion identification and extraction, linguistic analyses of social media behavior

Expanding, retrieving and infilling: Diversifying cross-domain question generation with flexible templates

Xiaojing Yu, Anxiao Jiang

Global Gender Differences in Wikipedia Readership

Isaac Johnson, Florian Lemmerich, Diego Sáez-Trumper, Robert West, Markus Strohmaier, Leila Zia