Pre-training via Leveraging Assisting Languages for Neural Machine Translation

04/07/2020

Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, Eiichiro Sumita

Keywords: Neural Translation, S2S tasks, LOI, low-resource translation

Abstract: Sequence-to-sequence (S2S) pre-training using large monolingual data is known to improve performance for various S2S NLP tasks. However, large monolingual corpora might not always be available for the languages of interest (LOI). Thus, we propose to exploit monolingual corpora of other languages to complement the scarcity of monolingual corpora for the LOI. We utilize script mapping (Chinese to Japanese) to increase the similarity (number of cognates) between the monolingual corpora of helping languages and LOI. An empirical case study of low-resource Japanese-English neural machine translation (NMT) reveals that leveraging large Chinese and French monolingual corpora can help overcome the shortage of Japanese and English monolingual corpora, respectively, for S2S pre-training. Using only Chinese and French monolingual corpora, we were able to improve Japanese-English translation quality by up to 8.5 BLEU in low-resource scenarios.
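The script-mapping idea in the abstract can be illustrated with a minimal sketch. It assumes a character-level lookup from simplified Chinese characters to Japanese kanji; the four-entry table, the function names `map_script` and `char_overlap`, and the use of shared characters as a rough stand-in for cognates are all illustrative assumptions, not the authors' actual mapping table or similarity measure.

```python
# Toy mapping table: simplified Chinese character -> Japanese kanji.
# Hypothetical and tiny; NOT the mapping table used in the paper.
TOY_ZH_TO_JA = {
    "机": "機",
    "译": "訳",
    "语": "語",
    "汉": "漢",
}


def map_script(text, table=TOY_ZH_TO_JA):
    """Replace each character that has a table entry; keep all others."""
    return "".join(table.get(ch, ch) for ch in text)


def char_overlap(a, b):
    """Fraction of distinct characters in `a` that also occur in `b`.

    A crude proxy for surface similarity, assumed here for illustration.
    """
    chars_a, chars_b = set(a), set(b)
    return len(chars_a & chars_b) / len(chars_a) if chars_a else 0.0


zh = "机器翻译"   # Chinese: "machine translation"
ja = "機械翻訳"   # Japanese: "machine translation"
mapped = map_script(zh)

print(mapped)
print(char_overlap(zh, ja), char_overlap(mapped, ja))
```

In this toy case, mapping raises the share of characters that the Chinese string has in common with the Japanese one, which is the kind of increased surface similarity the abstract attributes to script mapping before pre-training.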

The talk and the corresponding paper were published at the ACL 2020 virtual conference.

Similar Papers

04/07/2020

A Graph-based Model for Joint Chinese Word Segmentation and Dependency Parsing

Hang Yan, Xipeng Qiu, Xuanjing Huang

Keywords: Joint Segmentation, Joint Parsing, Chinese segmentation, dependency parsing

8:15

04/07/2020

Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-way Attentions of Auto-analyzed Knowledge

Yuanhe Tian, Yan Song, Xiang Ao, Fei Xia, Xiaojun Quan, Tong Zhang, Yonggang Wang

Keywords: Chinese Segmentation, Part-of-speech Tagging, Chinese processing, joint tagging

11:53

16/11/2020

A Joint Multiple Criteria Model in Transfer Learning for Cross-domain Chinese Word Segmentation

Kaiyu Huang, Degen Huang, Zhuang Liu, Fengran Mo

Keywords: natural, chinese segmentation, chinese, chinese tasks

10:49

02/02/2021

LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching

Boer Lyu, Lu Chen, Su Zhu, Kai Yu

15:57

08/12/2020

Joint Chinese Word Segmentation and Part-of-speech Tagging via Multi-channel Attention of Character N-grams

Yuanhe Tian, Yan Song, Fei Xia

14:53

04/07/2020

AdvAug: Robust Adversarial Augmentation for Neural Machine Translation

Yong Cheng, Lu Jiang, Wolfgang Macherey, Jacob Eisenstein

Keywords: Robust Augmentation, Neural Translation, Neural NMT, Neural

12:16

05/12/2020

UnihanLM: Coarse-to-fine Chinese-Japanese language model pretraining with the unihan database

Canwen Xu, Tao Ge, Chenliang Li, Furu Wei

8:52

02/02/2021

Bridging the Domain Gap: Improve Informal Language Translation via Counterfactual Domain Adaptation

Ke Wang, Guandan Chen, Zhongqiang Huang, Xiaojun Wan, Fei Huang

18:24

16/11/2020

Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction

Mengyun Chen, Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou

Keywords: erroneous detection, erroneous correction, inference, language-independent approach

6:27

16/11/2020

Surprisal Predicts Code-Switching in Chinese-English Bilingual Text

Jesús Calvillo, Le Fang, Jeremy Cole, David Reitter

Keywords: code-switching, inhibition language, computational model, surprisal

11:29

16/11/2020

Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding

Samson Tan, Shafiq Joty, Lav Varshney, Min-Yen Kan

Keywords: comprehension, fine-tuning models, downstream tasks, nlp systems

10:22

08/12/2020

Multi-grained Chinese Word Segmentation with Weakly Labeled Data

Chen Gong, Zhenghua Li, Bowei Zou, Min Zhang

14:48

05/12/2020

English-to-Chinese transliteration with phonetic auxiliary task

Yuan He, Shay B. Cohen

14:10

05/12/2020

Mixed-lingual pre-training for cross-lingual summarization

Ruochen Xu, Chenguang Zhu, Yu Shi, Michael Zeng, Xuedong Huang

11:49

01/07/2020

A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards

Zi-Yi Dou, Sachin Kumar, Yulia Tsvetkov

4:35

04/07/2020

2kenize: Tying Subword Sequences for Chinese Script Conversion

Pranav A, Isabelle Augenstein

Keywords: Chinese Conversion, Chinese NLP, mapping sequences, topic classification

10:32

16/11/2020

Generating Diverse Translation from Model Distribution with Dropout

Xuanfu Wu, Yang Feng, Chenze Shao

Keywords: neural, inference, chinese-english tasks, nmt

11:09

04/07/2020

Spelling Error Correction with Soft-Masked BERT

Shaohua Zhang, Haoran Huang, Jicong Liu, Hang Li

Keywords: Spelling Correction, Chinese correction, Chinese CSC, error detection

11:34

08/12/2020

Synonym Knowledge Enhanced Reader for Chinese Idiom Reading Comprehension

Siyu Long, Ran Wang, Kun Tao, Jiali Zeng, Xinyu Dai

9:58

19/04/2021

CLiMP: A benchmark for Chinese language model evaluation

Beilei Xiang, Changbing Yang, Yu Li, Alex Warstadt, Katharina Kann

7:04

06/12/2020

DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs

Yaxing Wang, Lu Yu, Joost van de Weijer

Keywords: Algorithms -> Online Learning, Optimization -> Stochastic Optimization

3:23

04/07/2020

Glyph2Vec: Learning Chinese Out-of-Vocabulary Word Embedding from Glyphs

Hong-You Chen, Sz-Han Yu, Shou-de Lin

Keywords: Chinese Embedding, Chinese applications, Chinese problem, out-of-vocabulary embedding

7:22

08/12/2020

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

14:42

04/07/2020

Effectively Aligning and Filtering Parallel Corpora under Sparse Data Conditions

Steinþór Steingrímsson, Hrafn Loftsson, Andy Way

Keywords: Aligning Corpora, machine systems, data problem, alignment problem

11:47

05/01/2021

Handwritten Chinese Font Generation With Collaborative Stroke Refinement

Chuan Wen, Yujie Pan, Jie Chang, Ya Zhang, Siheng Chen, Yanfeng Wang, Mei Han, Qi Tian

5:01

02/02/2021

StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke Encoding

Jinshan Zeng, Qi Chen, Yunxin Liu, Mingwen Wang, Yuan Yao

17:02

14/06/2020

Visual Grounding in Video for Unsupervised Word Translation

Gunnar A. Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh, Lucas Smaira, Mateusz Malinowski, João Carreira, Phil Blunsom, Andrew Zisserman

Keywords: video, translation, multimodal learning, unsupervised learning, unsupervised translation, youtube, howto100m, multilingual, language, deep learning

1:01

08/12/2020

Effective Use of Target-side Context for Neural Machine Translation

Hideya Mino, Hitoshi Ito, Isao Goto, Ichiro Yamada, Takenobu Tokunaga

13:42

02/02/2021

Commonsense Knowledge Augmentation for Low-Resource Languages via Adversarial Learning

Bosung Kim, Juae Kim, Youngjoong Ko, Jungyun Seo

19:38

01/07/2020

Robust Neural Machine Translation with ASR Errors

Haiyang Xue, Yang Feng, Shuhao Gu, Wei Chen

8:15

08/12/2020

Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations

Sheng Liang, Philipp Dufter, Hinrich Schütze

14:20

02/02/2021

Ideography Leads Us to the Field of Cognition: A Radical-Guided Associative Model for Chinese Text Classification

Hanqing Tao, Shiwei Tong, Kun Zhang, Tong Xu, Qi Liu, Enhong Chen, Min Hou

14:26

19/04/2021

Applying the transformer to character-level transduction

Shijie Wu, Ryan Cotterell, Mans Hulden

6:45

19/04/2021

Multilingual neural machine translation with deep encoder and multiple shallow decoders

Xiang Kong, Adithya Renduchintala, James Cross, Yuqing Tang, Jiatao Gu, Xian Li

10:26

16/11/2020

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

Zirui Wang, Zachary C. Lipton, Yulia Tsvetkov

Keywords: multilingual models, meta-learning algorithm, multilingual representations, negative interference

12:03

04/07/2020

Coupling Distant Annotation and Adversarial Training for Cross-Domain Chinese Word Segmentation

Ning Ding, Dingkun Long, Guangwei Xu, Muhua Zhu, Pengjun Xie, Xiaobin Wang, Haitao Zheng

Keywords: Coupling Annotation, Cross-Domain Segmentation, Chinese segmentation, Chinese CWS

8:35

04/07/2020

LINSPECTOR: Multilingual Probing Tasks for Word Representations

Gözde Gül Sahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych

Keywords: Word Representations, NLP, classification tasks, probing tasks

11:51

04/07/2020

MMPE: A Multi-Modal Interface for Post-Editing Machine Translation

Nico Herbig, Tim Düwel, Santanu Pal, Kalliopi Meladaki, Mahsa Monshizadeh, Antonio Krüger, Josef van Genabith

Keywords: Post-Editing Translation, machine translation, MT, translators

11:41

02/02/2021

Accelerating Neural Machine Translation with Partial Word Embedding Compression

Fan Zhang, Mei Tu, Jinyao Yan

14:53

04/07/2020

Feature Projection for Improved Text Classification

Qi Qin, Wenpeng Hu, Bing Liu

Keywords: Text Classification, classification, sentiment classification, Bert classification

10:57