02/02/2021

On the Softmax Bottleneck of Recurrent Language Models

Dwarak Govind Parthiban, Yongyi Mao, Diana Inkpen

Abstract: Recent research has pointed to a limitation of word-level neural language models with softmax outputs. This limitation, known as the softmax bottleneck, refers to the inability of these models to produce high-rank log-probability (log P) matrices. Various solutions have been proposed to break this bottleneck, including Mixture of Softmaxes, SigSoftmax, and Linear Monotonic Softmax with Piecewise Linear Increasing Functions, and they were reported to offer better performance in terms of test perplexity. A natural inference from these results is that there is a strong positive correlation between the rank of the log P matrix and the model's performance. In this work, we show via an extensive empirical study that such a correlation is fairly weak and that a high-rank log P matrix is neither necessary nor sufficient for better test perplexity. Although our results are empirical, they are established in part via the construction of a rich family of models, which we call Generalized SigSoftmax, that can produce log P matrices of diverse ranks. We also investigate why the proposed solutions achieve better performance.
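
The central object in the abstract is the rank of the log P matrix (contexts × vocabulary words) that a model can realize. As a minimal, self-contained sketch (not the authors' code; the NumPy implementation and all dimensions are illustrative assumptions), the snippet below builds a log P matrix from a single softmax over a low-dimensional hidden state, whose rank is bounded by the embedding dimension plus one, and a log P matrix from a Mixture of Softmaxes, whose rank is typically much higher:

```python
import numpy as np

# Illustrative comparison of the numerical rank of the log-probability matrix
# produced by a single softmax versus a Mixture of Softmaxes (MoS).
# All sizes below are arbitrary choices for the demonstration.
rng = np.random.default_rng(0)
N, V, d, K = 200, 500, 32, 4   # contexts, vocab size, embedding dim, mixture components

W = rng.standard_normal((V, d))            # output word embeddings (shared)

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def numerical_rank(M, tol=1e-6):
    # Count singular values above a relative threshold.
    s = np.linalg.svd(M, compute_uv=False)
    return int((s > tol * s[0]).sum())

# Single softmax: log P = H W^T minus a per-row log-partition term,
# so rank(log P) <= d + 1 no matter how large V is (the softmax bottleneck).
H = rng.standard_normal((N, d))
logP_softmax = log_softmax(H @ W.T)

# Mixture of Softmaxes: P = sum_k pi_k * softmax(H_k W^T); the log of the
# mixture is no longer a low-rank matrix plus a rank-one shift.
Hk = rng.standard_normal((K, N, d))        # one context vector per component
pi = rng.dirichlet(np.ones(K), size=N)     # per-context mixture weights, shape (N, K)
probs_k = np.exp(log_softmax(Hk @ W.T))    # (K, N, V) component distributions
P_mos = np.einsum('nk,knv->nv', pi, probs_k)
logP_mos = np.log(P_mos)

print("single softmax rank:", numerical_rank(logP_softmax))   # about d + 1
print("mixture of softmaxes rank:", numerical_rank(logP_mos))  # typically much higher
```

Running the sketch typically prints a rank of roughly d + 1 for the single softmax and a far larger rank for the mixture; this is the gap that Mixture of Softmaxes, SigSoftmax, and Generalized SigSoftmax can close, and the paper's finding is that closing it does not by itself guarantee lower test perplexity.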

The video of this talk cannot be embedded. You can watch it here:
https://slideslive.com/38949026
The talk and the corresponding paper were published at the AAAI 2021 virtual conference.
