Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

Abstract: This paper studies the effects of word-level linguistic annotations in under-resourced neural machine translation, for which there is incomplete evidence in the literature. The study covers eight language pairs, different training corpus sizes, two architectures, and three types of annotation: dummy tags (with no linguistic information at all), part-of-speech tags, and morpho-syntactic description tags, which consist of part of speech and morphological features. These linguistic annotations are interleaved in the input or output streams as a single tag placed before each word. To measure the performance under each scenario, we use automatic evaluation metrics and perform automatic error classification. Our experiments show that, in general, source-language annotations are helpful and morpho-syntactic description tags outperform part-of-speech tags for some language pairs. In contrast, when words are annotated in the target language, part-of-speech tags systematically outperform morpho-syntactic description tags in terms of automatic evaluation metrics, even though the use of morpho-syntactic description tags improves the grammaticality of the output. We provide a detailed analysis of the reasons behind this result.
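
The interleaving scheme described in the abstract, a single tag placed before each word, can be illustrated with a minimal Python sketch. The function names, the angle-bracket tag format, and the example tags are assumptions made for illustration only; they are not taken from the paper, and the sketch ignores details such as subword segmentation.

```python
from typing import List, Sequence


def interleave_tags(words: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Place one tag token before each word: [tag1, word1, tag2, word2, ...]."""
    if len(words) != len(tags):
        raise ValueError("expected exactly one tag per word")
    stream: List[str] = []
    for word, tag in zip(words, tags):
        stream.append(tag)   # the annotation comes first
        stream.append(word)  # followed by the word it describes
    return stream


def strip_tags(stream: Sequence[str]) -> List[str]:
    """Recover the plain words; in an interleaved stream they sit at odd indices."""
    return list(stream[1::2])


words = ["the", "cats", "sleep"]

# Hypothetical tag inventories for the three annotation types named in the abstract.
dummy_tags = ["<TAG>"] * len(words)                     # no linguistic information
pos_tags = ["<DET>", "<NOUN>", "<VERB>"]                # part of speech only
msd_tags = ["<DET>", "<NOUN|Plur>", "<VERB|Plur|Pres>"]  # part of speech plus morphology

for tags in (dummy_tags, pos_tags, msd_tags):
    print(" ".join(interleave_tags(words, tags)))
```

With source-side annotation the interleaved stream would be fed to the encoder; with target-side annotation the system would be trained to produce it, and something like `strip_tags` would be applied to the output before evaluation.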

03/05/2021

Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Similar Papers

Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition

Yangming Li, Lemao Liu, Shuming Shi

Negative Sampling, Unlabeled Entity Problem, Named Entity Recognition

On the Importance of Word Order Information in Cross-lingual Sequence Labeling

Zihan Liu, Genta I Winata, Samuel Cahyawijaya, Andrea Madotto, Zhaojiang Lin, Pascale Fung

On the Sentence Embeddings from Pre-trained Language Models

Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li

natural processing, semantic task, semantic tasks, pre-trained representations

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li, Edward Lin, Tie-Yan Liu

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

An exploratory study on multilingual quality estimation

Shuo Sun, Marina Fomicheva, Frédéric Blain, Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzmán, Lucia Specia

Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks

Denis Emelin, Ivan Titov, Rico Sennrich

word disambiguation, nmt, prediction errors, adversarial strategy

Cross-Linguistic Syntactic Evaluation of Word Prediction Models

Aaron Mueller, Garrett Nicolai, Panayiota Petrou-Zeniou, Natalia Talmina, Tal Linzen

Cross-Linguistic Syntax, Syntax, Cross-Linguistic Models, neural models

Encodings of Source Syntax: Similarities in NMT Representations Across Target Languages

Tyler A. Chang, Anna Rafferty

ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation

Dario Stojanovski, Benno Krojer, Denis Peskov, Alexander Fraser

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Cross-Lingual Classification, sentiment classification, unsupervised system, classification

Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech

Yoonhyung Lee, Joongbo Shin, Kyomin Jung

VAE, non-autoregressive, speech synthesis, text-to-speech

Less is Better: A cognitively inspired unsupervised model for language segmentation

Jinbiao Yang, Stefan L. Frank, Antal van den Bosch

Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism

Pan Xie, Zhi Cui, Xiuying Chen, XiaoHui Hu, Jianwei Cui, Bin Wang

WER-BERT: Automatic WER estimation with BERT in a balanced ordinal classification paradigm

Akshay Krishna Sheshadri, Anvesh Rao Vijjini, Sukhdeep Kharbanda

WiC-TSV: An evaluation benchmark for target sense verification of words in context

Anna Breit, Artem Revenko, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados

Visually Grounded Compound PCFGs

Yanpeng Zhao, Ivan Titov

exploiting groundings, language understanding, gradient estimates, fully-differentiable learning

On the Linguistic Representational Power of Neural Machine Translation Models

Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass

Linguistic Models, natural processing, artificial intelligence, translating languages

Robust Neural Machine Translation with ASR Errors

Haiyang Xue, Yang Feng, Shuhao Gu, Wei Chen

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

Biao Zhang, Philip Williams, Ivan Titov, Rico Sennrich

Massively Translation, Zero-Shot Translation, neural translation, NMT

Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU

Brielen Madureira, David Schlangen

nlp, interactive systems, language encoders, bidirectional lstms
