Word alignment by fine-tuning embeddings on parallel corpora

Abstract: Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs. Most past work on word alignment has relied on unsupervised learning on parallel text. Recently, however, other work has demonstrated that pre-trained contextualized word embeddings derived from multilingually trained language models (LMs) are an attractive alternative, achieving competitive results on the word alignment task even in the absence of explicit training on parallel data. In this paper, we examine methods to combine the two approaches: we leverage pre-trained LMs but fine-tune them on parallel text with objectives designed to improve alignment quality, and we propose methods to extract alignments effectively from these fine-tuned models. We perform experiments on five language pairs and demonstrate that our model consistently outperforms previous state-of-the-art models of all varieties. In addition, we demonstrate that we can train multilingual word aligners that obtain robust performance on different language pairs.

04/07/2020

Zi-Yi Dou, Graham Neubig

Similar Papers

Unsupervised Word Translation with Adversarial Autoencoder

Tasnim Mohiuddin, Shafiq Joty

Unsupervised Translation, machine translation, transfer learning, word task

English intermediate-task training improves zero-shot cross-lingual transfer too

Jason Phang, Iacer Calixto, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Katharina Kann, Samuel R. Bowman

Jointly Learning to Align and Summarize for Neural Cross-Lingual Summarization

Yue Cao, Hui Liu, Xiaojun Wan

Neural Summarization, Cross-lingual summarization, cross-lingual training, pipeline methods

Few-shot learning through contextual data augmentation

Farid Arthaud, Rachel Bawden, Alexandra Birch

Word-Level Speech Recognition With a Letter to Word Encoder

Ronan Collobert, Awni Hannun, Gabriel Synnaeve

Applications - Language, Speech and Dialog

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Cross-Lingual Classification, sentiment classification, unsupervised system, classification

Dynamic Data Selection and Weighting for Iterative Back-Translation

Zi-Yi Dou, Antonios Anastasopoulos, Graham Neubig

neural translation, neural nmt, nmt, domain adaptation

A pairwise probe for understanding BERT fine-tuning on machine reading comprehension

Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

machine reading comprehension, pairwise, fine-tune, BERT

Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences

Dmitry Nikolaev, Ofir Arviv, Taelin Karidi, Neta Kenneth, Veronika Mitnik, Lilja Maria Saeboe, Omri Abend

Fine-Grained Divergences, cross-lingual transfer, full automation, cross-lingual parser

Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information

Qiu Ran, Yankai Lin, Peng Li, Jie Zhou

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

Sequence-Level Mixed Sample Data Augmentation

Demi Guo, Yoon Kim, Alexander Rush

sequence-to-sequence problems, scan, semantic parsing, neural networks

GLUECoS: An Evaluation Benchmark for Code-Switched NLP

Simran Khanuja, Sandipan Dandapat, Anirudh Srinivasan, Sunayana Sitaram, Monojit Choudhury

Code-Switched NLP, cross-lingual tasks, NLP tasks, Language Identification

Referring Transformer: A One-step Approach to Multi-task Visual Grounding

Muchen Li, Leonid Sigal

transformers, vision

Iterative Domain-Repaired Back-Translation

Hao-Ran Wei, Zhirui Zhang, Boxing Chen, Weihua Luo

domain-specific translation, domain adaptation, back-translation method, out-of-domain systems

A Probabilistic Formulation of Unsupervised Text Style Transfer

Junxian He, Xinyi Wang, Graham Neubig, Taylor Berg-Kirkpatrick

unsupervised text style transfer, deep latent sequence model

A Retrieve-and-Rewrite Initialization Method for Unsupervised Machine Translation

Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, Shuai Ma

Unsupervised Translation, translation, Retrieve-and-Rewrite Method, translation models

Learning with Noisy Correspondence for Cross-modal Matching

Zhenyu Huang, Guocheng Niu, Xiao Liu, Wenbiao Ding, Xinyan Xiao, Hua Wu, Xi Peng

deep learning, language

Improving Stylized Neural Machine Translation with Iterative Dual Knowledge Transfer

Xuanxuan Wu, Jian Liu, Xinjie Li, Jinan Xu, Yufeng Chen, Yujie Zhang, Hui Huang

Natural Language Processing, Machine Translation, Natural Language Generation

Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses

Prathyusha Jwalapuram, Shafiq Joty, Youlin Shen
