Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing

04/07/2020

Clara Meister, Elizabeth Salesky, Ryan Cotterell

Keywords: label smoothing, language tasks, Generalized Regularization

Abstract: Prior work has explored directly regularizing the output distributions of probabilistic models to alleviate peaky (i.e. over-confident) predictions, a common sign of overfitting. This class of techniques, of which label smoothing is one, has a connection to entropy regularization. Despite the consistent success of label smoothing across architectures and data sets in language generation tasks, two problems remain open: (1) there is little understanding of the underlying effects entropy regularizers have on models, and (2) the full space of entropy regularization techniques is largely unexplored. We introduce a parametric family of entropy regularizers, which includes label smoothing as a special case, and use it to gain a better understanding of the relationship between the entropy of a model and its performance on language generation tasks. We also find that variance in model performance can be explained largely by the resulting entropy of the model. Lastly, we find that label smoothing provably does not allow for sparsity in an output distribution, an undesirable property for language generation models, and therefore advise the use of other entropy regularization methods in its place.
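
As a rough illustration of the sparsity claim in the abstract, the following NumPy sketch compares label smoothing with a confidence penalty (cross-entropy minus a scaled entropy term), used here only as one example of an alternative entropy regularizer. The function names, the smoothing weight `eps`, the penalty weight `beta`, and the toy distributions are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def cross_entropy(target, probs):
    """H(target, probs) = -sum_i target_i * log(probs_i), treating 0 * log(0) as 0."""
    target = np.asarray(target, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mask = target > 0  # skip terms whose target weight is zero
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        return float(-np.sum(target[mask] * np.log(probs[mask])))


def entropy(probs):
    """Shannon entropy of a distribution."""
    return cross_entropy(probs, probs)


def label_smoothing_loss(probs, gold, eps=0.1):
    """Cross-entropy against a target mixed with the uniform distribution."""
    k = len(probs)
    one_hot = np.eye(k)[gold]
    uniform = np.full(k, 1.0 / k)
    smoothed = (1.0 - eps) * one_hot + eps * uniform
    return cross_entropy(smoothed, probs)


def confidence_penalty_loss(probs, gold, beta=0.1):
    """Cross-entropy against the one-hot target, minus beta times the model entropy."""
    one_hot = np.eye(len(probs))[gold]
    return cross_entropy(one_hot, probs) - beta * entropy(probs)


# Hypothetical model outputs over three classes; class 0 is the gold label.
dense = np.array([0.90, 0.05, 0.05])
sparse = np.array([1.0, 0.0, 0.0])  # zero probability on two classes

for name, p in [("dense", dense), ("sparse", sparse)]:
    print(name, label_smoothing_loss(p, 0), confidence_penalty_loss(p, 0))
```

Under these assumptions, the label smoothing loss is infinite for the sparse distribution, because the smoothed target places some mass on every class, while the confidence penalty stays finite.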

The talk and the corresponding paper were published at the ACL 2020 virtual conference.

Similar Papers

06/12/2021

Beware of the Simulated DAG! Causal Discovery Benchmarks May Be Easy to Game

Alexander Reisach, Christof Seiler, Sebastian Weichwald

Keywords: optimization, graph learning, causality

14:13

26/04/2020

Learning The Difference That Makes A Difference With Counterfactually-Augmented Data

Divyansh Kaushik, Eduard Hovy, Zachary Lipton

Keywords: humans in the loop, annotation artifacts, text classification, sentiment analysis, natural language inference

4:25

16/11/2020

Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data

Lingkai Kong, Haoming Jiang, Yuchen Zhuang, Jie Lyu, Tuo Zhao, Chao Zhang

Keywords: augmented training, in-distribution calibration, text classification, expectation error

11:47

04/07/2020

Towards Transparent and Explainable Attention Models

Akash Kumar Mohankumar, Preksha Nema, Sharan Narasimhan, Mitesh M. Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran

Keywords: interpretability distributions, attention mechanisms, Human evaluations, Transparent Models

11:58

06/12/2020

Dual T: Reducing Estimation Error for Transition Matrix in Label-noise Learning

Nick Yao, Tongliang Liu, Bo Han, Mingming Gong, Jiankang Deng, Gang Niu, Masashi Sugiyama

Keywords: Optimization -> Non-Convex Optimization

3:15

05/12/2020

Towards a better understanding of label smoothing in neural machine translation

Yingbo Gao, Weiyue Wang, Christian Herold, Zijian Yang, Hermann Ney

13:37

18/11/2020

Robust deep ordinal regression under label noise

Bhanu Garg, Naresh Manwani

12:03

06/12/2021

Bayesian Adaptation for Covariate Shift

Aurick Zhou, Sergey Levine

Keywords: deep learning, machine learning, robustness, vision, domain adaptation

8:21

06/12/2021

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren

Keywords: machine learning, fairness, language

13:17

02/02/2021

MASKER: Masked Keyword Regularization for Reliable Text Classification

Seung Jun Moon, Sangwoo Mo, Kimin Lee, Jaeho Lee, Jinwoo Shin

15:05

06/12/2021

Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

Max Ryabinin, Andrey Malinin, Mark Gales

Keywords: machine learning

12:36

06/12/2021

Consistency Regularization for Variational Auto-Encoders

Samarth Sinha, Adji Bousso Dieng

Keywords: deep learning, machine learning, self-supervised learning, generative model, contrastive learning, representation learning

10:52

06/12/2021

Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions

Huan Ma, Zongbo Han, Changqing Zhang, Huazhu Fu, Joey Tianyi Zhou, Qinghua Hu

5:37

03/05/2021

Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition

Yangming Li, Lemao Liu, Shuming Shi

Keywords: Negative Sampling, Unlabeled Entity Problem, Named Entity Recognition

4:49

26/08/2020

A Unified Statistically Efficient Estimation Framework for Unnormalized Models

Masatoshi Uehara, Takafumi Kanamori, Takashi Takenouchi, Takeru Matsuda

13:58

02/02/2021

IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization

Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren

16:25

02/02/2021

Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation

Ieva Staliūnaitė, Philip John Gorinski, Ignacio Iacobacci

16:40

26/04/2020

Adversarially Robust Representations with Smooth Encoders

Taylan Cemgil, Sumedh Ghaisas, Krishnamurthy (Dj) Dvijotham, Pushmeet Kohli

Keywords: Adversarial Learning, Robust Representations, Variational AutoEncoder, Wasserstein Distance, Variational Inference

5:16

03/05/2021

Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning

Kanil Patel, William H Beluch, Bin Yang, Michael Pfeiffer, Dan Zhang

Keywords: deep neural networks, histogram binning, post-hoc calibration, uncertainty calibration, mutual information

5:13

01/07/2020

Memory-bounded Neural Incremental Parsing for Psycholinguistic Prediction

Lifeng Jin, William Schuler

14:19

01/07/2020

Syntactic Parsing in Humans and Machines

Paola Merlo

44:12

03/05/2021

Learning with Instance-Dependent Label Noise: A Sample Sieve Approach

Hao Cheng, Zhaowei Zhu, Xingyu Li, Yifei Gong, Xing Sun, Yang Liu

Keywords: deep neural networks, instance-based label noise, Learning with noisy labels

5:18

12/07/2020

Calibration, Entropy Rates, and Memory in Language Models

Mark Braverman, Xinyi Chen, Sham Kakade, Karthik Narasimhan, Cyril Zhang, Yi Zhang

Keywords: Applications - Language, Speech and Dialog

6:56

26/08/2020

Independent Subspace Analysis for Unsupervised Learning of Disentangled Representations

Jan Stuehmer, Richard Turner, Sebastian Nowozin

11:43

04/07/2020

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Prasetya Ajie Utama, Nafise Sadat Moosavi, Iryna Gurevych

Keywords: Debiasing Models, natural tasks, NLU tasks, debiasing methods

11:09

14/09/2020

Learning Gradient Boosted Multi-label Classification Rules

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen, Eyke Hüllermeier

Keywords: multi-label classification, gradient boosting, rule learning

15:45

06/12/2020

Autoencoders that don't overfit towards the Identity

Harald Steck

3:22

19/04/2021

Measuring and improving faithfulness of attention in neural machine translation

Pooya Moradi, Nishant Kambhatla, Anoop Sarkar

10:55

19/08/2021

Partial Multi-Label Optimal Margin Distribution Machine

Nan Cao, Teng Zhang, Hai Jin

Keywords: Machine Learning, Classification, Multi-instance; Multi-label; Multi-view learning, Weakly Supervised Learning

11:43

03/05/2021

Noise against noise: stochastic label noise helps combat inherent label noise

Pengfei Chen, Guangyong Chen, Junjie Ye, Jingwei Zhao, Pheng-Ann Heng

Keywords: Regularization, SGD noise, Robust Learning, Noisy Labels

9:42

04/07/2020

Towards Robustifying NLI Models Against Lexical Dataset Biases

Xiang Zhou, Mohit Bansal

Keywords: Natural Inference, data augmentation, Robustifying Models, deep models

11:34

03/05/2021

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

Keywords: representation learning, self-supervised learning, language models, theory, transfer learning, natural language processing, unsupervised learning

5:16

04/07/2020

Improved Natural Language Generation via Loss Truncation

Daniel Kang, Tatsunori Hashimoto

Keywords: Natural Generation, optimization, estimation, distinguishability

10:35

04/07/2020

On Importance Sampling-Based Evaluation of Latent Language Models

Robert L Logan IV, Matt Gardner, Sameer Singh

Keywords: Importance Models, likelihood-based evaluation, Language models, importance sampling

7:20

06/12/2020

Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method

Qi Zhou, Yufei Kuang, Zherui Qiu, Houqiang Li, Jie Wang

3:10

06/12/2020

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

3:17

25/07/2020

Combining contextualized and non-contextualized query translations to improve CLIR

Suraj Nair, Petra Galuscakova, Douglas W. Oard

Keywords: CLIR, machine translation

8:39

06/12/2020

Bayes Consistency vs. H-Consistency: The Interplay between Surrogate Loss Functions and the Scoring Function Class

Mingyuan Zhang, Shivani Agarwal

3:19

06/12/2020

Self-training Avoids Using Spurious Features Under Domain Shift

Yining Chen, Colin Wei, Ananya Kumar, Tengyu Ma

3:18

16/11/2020

Syntactic Structure Distillation Pretraining for Bidirectional Encoders

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Keywords: bert pretraining, structured tasks, natural understanding, textual learners

12:23