Mixed Cross Entropy Loss for Neural Machine Translation

18/07/2021

Haoran Li, Wei Lu

Keywords: Applications, Natural Language Processing

Abstract: In neural machine translation, Cross Entropy loss (CE) is the standard loss function in two training methods for auto-regressive models, i.e., teacher forcing and scheduled sampling. In this paper, we propose mixed Cross Entropy loss (mixed CE) as a substitute for CE in both training approaches. In teacher forcing, a model trained with CE treats the translation problem as a one-to-one mapping process, while mixed CE relaxes this process to one-to-many. In scheduled sampling, we show that mixed CE has the potential to encourage the training and testing behaviours to be similar to each other, more effectively mitigating the exposure bias problem. We demonstrate the superiority of mixed CE over CE on several machine translation datasets, WMT'16 Ro-En, WMT'16 Ru-En, and WMT'14 En-De, in both teacher forcing and scheduled sampling setups. Furthermore, on WMT'14 En-De, we also find that mixed CE consistently outperforms CE on a multi-reference set as well as a challenging paraphrased reference set. We also find that the model trained with mixed CE provides a better probability distribution over the translation output space. Our code is available at https://github.com/haorannlp/mix.
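
The abstract does not state the loss formula, so the following is only an illustrative sketch of how a "mixed" cross entropy could relax the one-to-one target. It assumes, as a hypothesis and not as the authors' confirmed definition, that each decoding step mixes the log-probability of the gold token with the log-probability of the model's own most likely token, weighted by a hypothetical coefficient `mix`. The function names and the toy logits are invented for illustration; the repository linked above is the reference for the actual method.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mixed_ce(logits_per_step, gold_ids, mix):
    """Average per-token loss mixing two targets (assumed form, see lead-in).

    (1 - mix) weights the gold token, mix weights the model's argmax token.
    With mix = 0 this reduces to the standard cross entropy.
    """
    total = 0.0
    for logits, gold in zip(logits_per_step, gold_ids):
        logp = log_softmax(logits)
        # Index of the token the model currently considers most likely.
        pred = max(range(len(logp)), key=logp.__getitem__)
        total += -((1.0 - mix) * logp[gold] + mix * logp[pred])
    return total / len(gold_ids)


# Toy example: two decoding steps over a three-token vocabulary.
logits = [[2.0, 1.0, 0.1], [0.2, 0.1, 3.0]]
gold = [1, 2]  # step 1: the model prefers token 0, the reference says token 1

ce = mixed_ce(logits, gold, mix=0.0)     # plain cross entropy
mixed = mixed_ce(logits, gold, mix=0.3)  # part of the mass goes to the model's choice
print(f"CE = {ce:.4f}, mixed CE = {mixed:.4f}")
```

In this sketch the mixed loss is never larger than CE, because the model's top token always has a log-probability at least as high as the gold token. It penalises the model less when it prefers a different, possibly also valid, translation choice.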

The talk and the respective paper were published at the ICML 2021 virtual conference.

Similar Papers

03/05/2021

Exploring Balanced Feature Spaces for Representation Learning

Bingyi Kang, Yu Li, Sain Xie, Zehuan Yuan, Jiashi Feng

Keywords: Representation Learning, Contrastive Learning, Long-Tailed Recognition

7:18

05/12/2020

Self-supervised learning for pairwise data refinement

Gustavo Hernandez Abrego, Bowen Liang, Wei Wang, Zarana Parekh, Yinfei Yang, Yunhsuan Sung

15:17

19/08/2021

TIDOT: A Teacher Imitation Learning Approach for Domain Adaptation with Optimal Transport

Tuan Nguyen, Trung Le, Nhan Dam, Quan Hung Tran, Truyen Nguyen, Dinh Phung

Keywords: Machine Learning, Transfer, Adaptation, Multi-task Learning

14:34

03/05/2021

Understanding and Improving Lexical Choice in Non-Autoregressive Translation

Liang Ding, Longyue Wang, Xuebo Liu, Derek Wong, Dacheng Tao, Zhaopeng Tu

11:37

06/12/2021

Training Over-parameterized Models with Non-decomposable Objectives

Harikrishna Narasimhan, Aditya Menon

Keywords: optimization, machine learning, fairness

8:28

05/01/2021

Towards Fair Cross-Domain Adaptation via Generative Learning

Tongxin Wang, Zhengming Ding, Wei Shao, Haixu Tang, Kun Huang

4:56

26/08/2020

Regularization via Structural Label Smoothing

Weizhi Li, Gautam Dasarathy, Visar Berisha

13:36

06/12/2021

Exponential Separation between Two Learning Models and Adversarial Robustness

Grzegorz Gluch, Ruediger Urbanke

Keywords: theory, robustness, adversarial robustness and security

15:11

12/07/2020

An end-to-end approach for the verification problem: learning the right distance

Joao Monteiro, Isabela Albuquerque, Jahangir Alam, R Devon Hjelm, Tiago Falk

Keywords: General Machine Learning Techniques

13:06

03/05/2021

Knowledge distillation via softmax regression representation learning

Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

4:56

08/12/2020

Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation

Fahimeh Saleh, Wray Buntine, Gholamreza Haffari

9:03

12/07/2020

Time-Consistent Self-Supervision for Semi-Supervised Learning

Tianyi Zhou, Shengjie Wang, Jeff Bilmes

Keywords: Unsupervised and Semi-Supervised Learning

14:37

03/05/2021

Initialization and Regularization of Factorized Neural Layers

Misha Khodak, Neil Tenenholtz, Lester Mackey, Nicolo Fusi

Keywords: matrix factorization, knowledge distillation, multi-head attention, model compression

4:25

19/04/2021

Does typological blinding impede cross-lingual sharing?

Johannes Bjerva, Isabelle Augenstein

7:52

26/04/2020

DivideMix: Learning with Noisy Labels as Semi-supervised Learning

Junnan Li, Richard Socher, Steven C.H. Hoi

Keywords: label noise, semi-supervised learning

5:00

22/11/2021

Domain Attention Consistency for Multi-Source Domain Adaptation

Zhongying Deng, Kaiyang Zhou, Yongxin Yang, Tao Xiang

Keywords: Transferable Attribute Learning, Domain Attention Consistency, Multi-Source Domain Adaptation

9:24

14/06/2020

Semi-Supervised Semantic Segmentation With Cross-Consistency Training

Yassine Ouali, Céline Hudelot, Myriam Tami

Keywords: semantic segmentation, semi-supervised learning, consistency training, semi-supervised semantic segmentation

1:01

14/06/2020

Distilling Cross-Task Knowledge via Relationship Matching

Han-Jia Ye, Su Lu, De-Chuan Zhan

Keywords: knowledge distillation, model reuse, knowledge transfer, cross-task learning, embedding learning

4:54

19/08/2021

Differentially Private Correlation Alignment for Domain Adaptation

Kaizhong Jin, Xiang Cheng, Jiaxi Yang, Kaiyuan Shen

Keywords: Multidisciplinary Topics and Applications, Security and Privacy, Transfer, Adaptation, Multi-task Learning

8:03

18/07/2021

Efficient Iterative Amortized Inference for Learning Symmetric and Disentangled Multi-Object Representations

Patrick Emami, Pan He, Sanjay Ranka, Anand Rangarajan

Keywords: Deep Learning, Embedding and Representation learning

5:10

06/12/2021

CLDA: Contrastive Learning for Semi-Supervised Domain Adaptation

Ankit Singh

Keywords: domain adaptation, contrastive learning

6:24

05/01/2021

AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by Reconstructing Label Features From Multi-Modal Embeddings

Pratik Mazumder, Pravendra Singh, Kranti Kumar Parida, Vinay P. Namboodiri

4:46

22/11/2021

In-N-Out: Towards Good Initialization for Inpainting and Outpainting

Changho Jo, Woobin Im, Sungeui Yoon

Keywords: inpainting, outpainting, extrapolation, environment map estimation, self-supervised learning, transfer learning

2:33

14/06/2020

PADS: Policy-Adapted Sampling for Visual Similarity Learning

Karsten Roth, Timo Milbich, Björn Ommer

Keywords: deep metric learning, visual similarity, reinforcement learning, generalization, image retrieval

1:01

06/12/2021

Grounding inductive biases in natural images: invariance stems from variations in data

Diane Bouchacourt, Mark Ibrahim, Ari Morcos

Keywords: machine learning, transformers

14:19

03/05/2021

Knowledge Distillation as Semiparametric Inference

Tri Dao, Govinda Kamath, Vasilis Syrgkanis, Lester Mackey

Keywords: generalization bounds, knowledge distillation, model compression, loss correction, orthogonal machine learning, cross-fitting, semiparametric inference

5:10

03/05/2021

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

Beliz Gunel, Jingfei Du, Alexis Conneau, Veselin Stoyanov

Keywords: supervised contrastive learning, pre-trained language model fine-tuning, natural language understanding, generalization, few-shot learning, robustness

4:44

06/12/2021

Contrastively Disentangled Sequential Variational Autoencoder

Junwen Bai, Weiran Wang, Carla Gomes

Keywords: self-supervised learning, generative model, contrastive learning, representation learning, interpretability

12:53

03/05/2021

Contrastive Learning with Adversarial Perturbations for Conditional Text Generation

Seanie Lee, Dong Bok Lee, Sung Ju Hwang

Keywords: contrastive learning, conditional text generation

4:51

14/06/2020

Few Sample Knowledge Distillation for Efficient Network Compression

Tianhong Li, Jianguo Li, Zhuang Liu, Changshui Zhang

Keywords: efficient network compression, few samples, knowledge distillation

1:01

06/12/2021

Learning Causal Semantic Representation for Out-of-Distribution Prediction

Chang Liu, Xinwei Sun, Jindong Wang, Haoyue Tang, Tao Li, Tao Qin, Wei Chen, Tie-Yan Liu

Keywords: generative model, domain adaptation, representation learning

14:29

04/07/2020

Structure-Level Knowledge Distillation For Multilingual Sequence Labeling

Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Fei Huang, Kewei Tu

Keywords: Multilingual Labeling, predicting sequences, online serving, Structure-Level Distillation

11:52

06/12/2020

Auxiliary Task Reweighting for Minimum-data Learning

Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

3:28

06/12/2021

HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning

Shiming Chen, Guosen Xie, Yang Liu, Qinmu Peng, Baigui Sun, Hao Li, Xinge You, Ling Shao

Keywords: generative model, domain adaptation

9:19

05/01/2021

Intra-Class Part Swapping for Fine-Grained Image Classification

Lianbo Zhang, Shaoli Huang, Wei Liu

4:43

02/02/2021

Learning a Few-shot Embedding Model with Contrastive Learning

Chen Liu, Yanwei Fu, Chengming Xu, Siqian Yang, Jilin Li, Chengjie Wang, Li Zhang

15:02

06/12/2020

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

Yaodong Yu, Ryan Chan, Chong You, Chaobing Song, Yi Ma

3:20

19/08/2021

Object Detection in Densely Packed Scenes via Semi-Supervised Learning with Dual Consistency

Chao Ye, Huaidong Zhang, Xuemiao Xu, Weiwei Cai, Jing Qin, Kup-Sze Choi

Keywords: Computer Vision, Recognition, Deep Learning, Semi-Supervised Learning

10:19

06/12/2021

Revealing and Protecting Labels in Distributed Training

Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, Françoise Beaufays

Keywords: machine learning, vision, privacy, federated learning

13:06

06/12/2021

Consistency Regularization for Variational Auto-Encoders

Samarth Sinha, Adji Bousso Dieng

Keywords: deep learning, machine learning, self-supervised learning, generative model, contrastive learning, representation learning

10:52