Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation

Abstract: Using a mix of shared and language-specific (LS) parameters has shown promise in multilingual neural machine translation (MNMT), but the question of when and where LS capacity matters most is still under-studied. We offer such a study by proposing conditional language-specific routing (CLSR). CLSR employs hard binary gates conditioned on token representations to dynamically select LS or shared paths. By manipulating these gates, it can schedule LS capacity across sub-layers in MNMT subject to the guidance of translation signals and budget constraints. Moreover, CLSR can easily scale up to massively multilingual settings. Experiments with Transformer on OPUS-100 and WMT datasets show that: 1) MNMT is sensitive to both the amount and the position of LS modeling: distributing 10%-30% LS computation to the top and/or bottom encoder/decoder layers delivers the best performance; and 2) one-to-many translation benefits more from CLSR than many-to-one translation, particularly with unbalanced training data. Our study further verifies the trade-off between the shared capacity and LS capacity for multilingual translation. We corroborate our analysis by confirming the soundness of our findings as the foundation of our improved multilingual Transformers. Source code and models are available at https://github.com/googleinterns/cct-m4.
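The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that lives in the repository linked above); it only shows what a hard binary gate choosing between an LS path and a shared path might look like at inference time, plus one way to measure how far the gates are from a target LS budget. The linear gate, the function names (`hard_gate`, `clsr_route`, `ls_budget_gap`), the toy matrices, and the 0.3 budget are all assumptions made for illustration, and training-time details such as how the hard gate is made trainable are left out.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def hard_gate(x: Sequence[float], gate_weights: Sequence[float], bias: float = 0.0) -> int:
    """Return 1 (LS path) if the gate score is positive, else 0 (shared path).

    Assumption: the gate is a single linear score over the token representation.
    """
    score = sum(w * xi for w, xi in zip(gate_weights, x)) + bias
    return 1 if score > 0.0 else 0


def clsr_route(
    x: Sequence[float], w_shared: Matrix, w_ls: Matrix, gate_weights: Sequence[float]
) -> Tuple[Vector, int]:
    """Project token x through the LS or the shared matrix, as chosen by the gate."""
    g = hard_gate(x, gate_weights)
    return matvec(w_ls if g else w_shared, x), g


def ls_budget_gap(gates: Sequence[int], budget: float) -> float:
    """Absolute gap between the fraction of open (LS) gates and the target budget."""
    return abs(sum(gates) / len(gates) - budget)


# Toy example: four 2-d token representations (hypothetical values).
tokens = [[1.0, 0.0], [-1.0, 0.5], [0.2, 0.2], [-0.3, -0.1]]
w_shared = [[1.0, 0.0], [0.0, 1.0]]  # shared projection (identity here)
w_ls = [[2.0, 0.0], [0.0, 2.0]]      # projection for one target language
gate_weights = [1.0, 0.0]            # gate looks only at the first feature

outputs, gates = zip(*(clsr_route(t, w_shared, w_ls, gate_weights) for t in tokens))
gap = ls_budget_gap(gates, budget=0.3)
```

In this toy run, tokens whose first feature is positive take the LS path and the rest take the shared path, so half of the gates are open. A budget term like `ls_budget_gap` could be added to the training loss to push that fraction toward the target, which is one reading of the "budget constraints" mentioned in the abstract.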

06/12/2021

It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information

Ankit Arun, Soumya Batra, Vikas Bhardwaj, Ashwini Challa, Pinar Donmez, Peyman Heidari, Hakan Inan, Shashank Jain, Anuj Kumar, Shawn Mei, Karthik Mohan, Michael White

Deep Learning, Predictive Models, Algorithms, Representation Learning; Neuroscience and Cognitive Science; Neuroscience and Cognitive Science, Problem Solving, Deep Learning, Architectures

17:00

14/06/2020

Precise Yet Efficient Semantic Calibration and Refinement in ConvNets for Real-time Polyp Segmentation from Colonoscopy Videos

deep learning, machine learning, adversarial robustness and security, domain adaptation, representation learning, transfer learning

15:02

08/12/2020

Meet Changes with Constancy: Learning Invariance in Multi-Source Translation

Jianfeng Liu, Ling Luo, Xiang Ao, Yan Song, Haoran Xu, Jian Ye

Machine Learning, Adversarial Machine Learning, Explainable/Interpretable Machine Learning, Sentiment Analysis and Text Mining

14:52

14/09/2020