05/12/2020

Towards a better understanding of label smoothing in neural machine translation

Yingbo Gao, Weiyue Wang, Christian Herold, Zijian Yang, Hermann Ney

Keywords:

Abstract: In order to combat overfitting and in pursuit of better generalization, label smoothing is widely applied in modern neural machine translation systems. The core idea is to penalize over-confident outputs and regularize the model so that its outputs do not diverge too much from some prior distribution. While training perplexity generally gets worse, label smoothing is found to consistently improve test performance. In this work, we aim to better understand label smoothing in the context of neural machine translation. Theoretically, we derive and explain exactly what label smoothing is optimizing for. Practically, we conduct extensive experiments by varying which tokens to smooth, tuning the probability mass to be deducted from the true targets, and considering different prior distributions. We show that label smoothing is theoretically well-motivated, and that by carefully choosing hyperparameters, the practical performance of strong neural machine translation systems can be further improved.
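As a concrete illustration of the mechanism the abstract describes (and not the authors' implementation), the minimal sketch below shows one common variant of label smoothing: the one-hot target is mixed with a uniform prior over the vocabulary, so the loss becomes (1 - epsilon) times the usual cross-entropy plus epsilon times the cross-entropy against the prior. PyTorch, the function name, the tensor shapes, and the smoothing factor epsilon = 0.1 are illustrative assumptions.

    # Minimal sketch of label-smoothed cross-entropy with a uniform prior (assumed setup).
    import torch
    import torch.nn.functional as F

    def label_smoothed_nll(log_probs, targets, epsilon=0.1):
        # log_probs: (batch, vocab) log-probabilities; targets: (batch,) true token ids.
        nll = -log_probs.gather(dim=-1, index=targets.unsqueeze(-1)).squeeze(-1)  # -log p(y*)
        uniform_ce = -log_probs.mean(dim=-1)  # cross-entropy against a uniform prior over the vocabulary
        return ((1.0 - epsilon) * nll + epsilon * uniform_ce).mean()

    # Example usage with random stand-in decoder outputs.
    logits = torch.randn(8, 32000)             # (batch, vocab)
    targets = torch.randint(0, 32000, (8,))    # true target token ids
    loss = label_smoothed_nll(F.log_softmax(logits, dim=-1), targets, epsilon=0.1)

In this formulation the smoothed loss equals the standard cross-entropy plus an epsilon-weighted penalty toward the prior, which is the sense in which label smoothing discourages over-confident outputs; replacing the uniform prior with, for example, a unigram distribution corresponds to the different prior distributions the abstract mentions.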

Talk and the respective paper are published at the AACL 2020 virtual conference.
