Stolen Probability: A Structural Weakness of Neural Language Models

04/07/2020

David Demeter, Gregory Kimmel, Doug Downey

Keywords: Neural Models, Neural NNLMs, NNLMs, softmax function

Abstract: Neural Network Language Models (NNLMs) generate probability distributions by applying a softmax function to a distance metric formed by taking the dot product of a prediction vector with all word vectors in a high-dimensional embedding space. The dot-product distance metric forms part of the inductive bias of NNLMs. Although NNLMs optimize well with this inductive bias, we show that this results in a sub-optimal ordering of the embedding space that structurally impoverishes some words at the expense of others when assigning probability. We present numerical, theoretical and empirical analyses which show that words on the interior of the convex hull in the embedding space have their probability bounded by the probabilities of the words on the hull.
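The bound stated in the abstract can be checked numerically on a toy example. The sketch below is not the authors' code. It assumes a small random embedding matrix, builds one hypothetical "interior" word as a convex combination of the other word vectors, and tests whether that word ever receives the highest softmax probability. The names `other_words`, `interior_word` and `interior_never_wins` are illustrative only.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(0)
dim, n_words = 8, 20

# Hypothetical word vectors; their convex hull contains the interior word.
other_words = rng.normal(size=(n_words, dim))

# Build an interior word as a convex combination of the other words:
# non-negative weights that sum to one.
weights = rng.dirichlet(np.ones(n_words))
interior_word = weights @ other_words

# The interior word is stored as the last row.
embeddings = np.vstack([other_words, interior_word])

def interior_never_wins(embeddings, n_trials=1000, scale=5.0):
    """Return True if the last row never gets the top probability."""
    for _ in range(n_trials):
        # Random prediction vector, standing in for a model's output.
        h = scale * rng.normal(size=embeddings.shape[1])
        probs = softmax(embeddings @ h)  # dot-product logits
        if probs[-1] > probs[:-1].max():
            return False
    return True

result = interior_never_wins(embeddings)
print(result)
```

The check rests on a short argument: the dot product of any prediction vector with a convex combination of word vectors is a weighted average of the individual dot products, so it cannot exceed the largest of them. Because softmax is monotone in the logits, the interior word's probability is therefore capped by that of some other word.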

The talk and the corresponding paper were published at the ACL 2020 virtual conference.

Similar Papers

09/07/2020

On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels

Tengyuan Liang, Alexander Rakhlin, Xiyu Zhai

Keywords: Supervised learning, Excess risk bounds and generalization error bounds, High-dimensional statistics, Kernel methods, Regression

14:56

18/07/2021

Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models

Zitong Yang, Yu Bai, Song Mei

Keywords: Theory, Deep learning Theory

5:40

12/07/2020

Being Bayesian about Categorical Probability

Taejong Joo, Uijung Chung, Min-Gwan Seo

Keywords: Supervised Learning

12:26

03/05/2021

When does preconditioning help or hurt generalization?

Shun-ichi Amari, Jimmy Ba, Roger Grosse, Chen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

Keywords: high-dimensional asymptotics, generalization, second-order optimization, natural gradient descent

5:21

03/08/2020

Neural Likelihoods via Cumulative Distribution Functions

Pawel Chilinski, Ricardo Silva

8:07

06/12/2021

Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II

Yossi Arjevani, Michael Field

Keywords: theory, deep learning, optimization

8:40

02/02/2021

Improved Mutual Information Estimation

Youssef Mroueh, Igor Melnyk, Pierre Dognin, Jarret Ross, Tom Sercu

18:46

06/12/2021

Non-asymptotic Error Bounds for Bidirectional GANs

Shiao Liu, Yunfei Yang, Jian Huang, Yuling Jiao, Yang Wang

Keywords: deep learning, generative model

13:23

13/04/2021

The spectrum of Fisher information of deep networks achieving dynamical isometry

Tomohiro Hayase, Ryo Karakida

3:10

06/12/2020

Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

Benjamin Aubin, Florent Krzakala, Yue Lu, Lenka Zdeborová

3:08

03/05/2021

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit

Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

Keywords: Deep Learning, Bayesian Neural Networks, Neural Network Gaussian Process, Infinite-Width Limit, Uncertainty, Gaussian Process

4:34

06/12/2020

Asymptotic normality and confidence intervals for derivatives of 2-layers neural network in the random features model

Yiwei Shen, Pierre C Bellec

3:12

12/07/2020

Frequency Bias in Neural Networks for Input of Non-Uniform Density

Ronen Basri, Meirav Galun, Amnon Geifman, David Jacobs, Yoni Kasten, Shira Kritchman

Keywords: Deep Learning - Theory

11:18

13/04/2021

Learning with gradient descent and weakly convex losses

Dominic Richards, Mike Rabbat

3:20

13/04/2021

Non-asymptotic performance guarantees for neural estimation of f-divergences

Sreejith Sreekumar, Zhengxin Zhang, Ziv Goldfeld

3:02

06/12/2020

Probabilistic Orientation Estimation with Matrix Fisher Distributions

David Mohlin, Josephine Sullivan, Gérald Bianchi

3:08

06/12/2021

Rectangular Flows for Manifold Learning

Anthony Caterini, Gabriel Loaiza-Ganem, Geoff Pleiss, John Cunningham

Keywords: deep learning, optimization, generative model

12:26

06/12/2020

Faster Wasserstein Distance Estimation with the Sinkhorn Divergence

Lénaïc Chizat, Pierre Roussillon, Flavien Léger, François-Xavier Vialard, Gabriel Peyré

3:21

06/12/2021

Lattice partition recovery with dyadic CART

Oscar Hernan Madrid Padilla, Yi Yu, Alessandro Rinaldo

Keywords: machine learning, graph learning

13:36

18/07/2021

Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks

Hao Liu, Minshuo Chen, Tuo Zhao, Wenjing Liao

Keywords: Applications, Computer Vision, Theory, Deep learning Theory

5:14

03/05/2021

Deep Networks and the Multiple Manifold Problem

Sam Buchanan, Dar Gilboa, John Wright

Keywords: low-dimensional structure, overparameterized neural networks, deep learning

5:14

26/04/2020

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks

Yu Bai, Jason D. Lee

Keywords: Neural Tangent Kernels, over-parametrized neural networks, deep learning theory

5:25

13/04/2021

Latent derivative Bayesian last layer networks

Joe Watson, Jihao Andreas Lin, Pascal Klink, Joni Pajarinen, Jan Peters

3:05

06/12/2021

Slice Sampling Reparameterization Gradients

David M Zoltowski, Diana Cai, Ryan Adams

Keywords: optimization, machine learning, generative model

14:43

13/04/2021

vqSGD: Vector quantized stochastic gradient descent

Venkata Gandikota, Daniel Kane, Raj Kumar Maity, Arya Mazumdar

3:11

06/12/2020

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Wei Deng, Guang Lin, Faming Liang

3:26

06/12/2021

Understanding Interlocking Dynamics of Cooperative Rationalization

Mo Yu, Yang Zhang, Shiyu Chang, Tommi Jaakkola

Keywords: deep learning, language, interpretability

13:41

12/07/2020

Representation Learning via Adversarially-Contrastive Optimal Transport

Anoop Cherian, Shuchin Aeron

Keywords: Representation Learning

14:47

06/12/2020

Neural Methods for Point-wise Dependency Estimation

Yao-Hung Hubert Tsai, Han Zhao, Makoto Yamada, LP Morency, Russ Salakhutdinov

3:21

18/07/2021

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise

Spencer Frei, Yuan Cao, Quanquan Gu

Keywords: Applications, Fairness, Accountability, and Transparency, Algorithms, Classification; Algorithms, Online Learning, Theory, Deep learning Theory

5:00

18/07/2021

Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation

Aurick Zhou, Sergey Levine

Keywords: Deep Learning, Bayesian Deep Learning

5:05

03/05/2021

Efficient Inference of Flexible Interaction in Spiking-neuron Networks

Feng Zhou, Yixuan Zhang, Jun Zhu

Keywords: conjugacy, auxiliary latent variable, nonlinear Hawkes process, neural spike train

5:39

06/12/2020

Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations

Kevin Scaman, Cedric Malherbe

3:09

06/12/2020

Learning Bounds for Risk-sensitive Learning

Jaeho Lee, Sejun Park, Jinwoo Shin

3:02

06/12/2020

A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions

Yulong Lu, Jianfeng Lu

2:55

18/07/2021

Bayesian Deep Learning via Subnetwork Inference

Erik Daxberger, Eric Nalisnick, James Allingham, Javier Antorán, Jose Miguel Hernandez-Lobato

Keywords: Reinforcement Learning and Planning, Multi-Agent RL, Deep Learning, Bayesian Deep Learning

5:18

03/05/2021

Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability

Suraj Srinivas, François Fleuret

Keywords: Interpretability, saliency maps, score-matching

15:08

06/12/2020

Learning of Discrete Graphical Models with Neural Networks

Abhijith Jayakumar, Andrey Lokhov, Sidhant Misra, Marc Vuffray

Keywords: Algorithms -> Density Estimation, Probabilistic Methods

3:11

03/05/2021

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

Atsushi Nitanda, Taiji Suzuki

Keywords: stochastic gradient descent, neural tangent kernel, over-parameterization, two-layer neural network

18:48

09/07/2020

High probability guarantees for stochastic convex optimization

Damek Davis, Dmitriy Drusvyatskiy

Keywords: Stochastic optimization, Computational complexity, Convex optimization, Excess risk bounds and generalization error bounds

15:10