The Lipschitz Constant of Self-Attention

18/07/2021

Hyunjik Kim, George Papamakarios, Andriy Mnih

Keywords: Theory, Deep learning Theory

Abstract: Lipschitz constants of neural networks have been explored in various contexts in deep learning, such as provable adversarial robustness, estimating Wasserstein distance, stabilising training of GANs, and formulating invertible neural networks. Such works have focused on bounding the Lipschitz constant of fully connected or convolutional networks, composed of linear maps and pointwise non-linearities. In this paper, we investigate the Lipschitz constant of self-attention, a non-linear neural network module widely used in sequence modelling. We prove that the standard dot-product self-attention is not Lipschitz for unbounded input domain, and propose an alternative L2 self-attention that is Lipschitz. We derive an upper bound on the Lipschitz constant of L2 self-attention and provide empirical evidence for its asymptotic tightness. To demonstrate the practical relevance of our theoretical work, we formulate invertible self-attention and use it in a Transformer-based architecture for a character-level language modelling task.
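
The contrast between the two attention variants can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a single head, scalar tokens, identity projection matrices, and no output projection or temperature scaling, and the helper names (`softmax_rows`, `local_sensitivity`) are hypothetical. It perturbs a zero token while the other two tokens sit at `+scale` and `-scale`, and reports a finite-difference estimate of local sensitivity, which is only a lower bound on any Lipschitz constant.

```python
import numpy as np


def softmax_rows(logits):
    """Numerically stable softmax over each row."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def dot_product_attention(x):
    """Simplified dot-product self-attention with identity projections."""
    return softmax_rows(x @ x.T) @ x


def l2_attention(x):
    """Simplified L2 self-attention: logits are negative squared distances.

    Identity projections mean queries and keys are tied (an assumption of this sketch).
    """
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return softmax_rows(-sq_dists) @ x


def local_sensitivity(attention, scale, eps=1e-4):
    """Finite-difference estimate of ||f(x') - f(x)|| / ||x' - x||.

    Tokens are (0, +scale, -scale); only the zero token is perturbed by eps.
    """
    x = np.array([[0.0], [scale], [-scale]])
    x_perturbed = x.copy()
    x_perturbed[0, 0] += eps
    return np.linalg.norm(attention(x_perturbed) - attention(x)) / eps


if __name__ == "__main__":
    for scale in (1.0, 10.0, 30.0):
        dp = local_sensitivity(dot_product_attention, scale)
        l2 = local_sensitivity(l2_attention, scale)
        print(f"scale={scale:5.1f}  dot-product={dp:10.3f}  L2={l2:8.3f}")
```

In this toy setting the dot-product estimate grows roughly with the square of `scale`, because the attention weights of the zero token are uniform and the derivative picks up the variance of the other tokens, while the L2 estimate stays of order one. This is an illustration of the abstract's claim under the stated simplifications, not a proof or a reproduction of the paper's bound.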

Talk and the respective paper are published at ICML 2021 virtual conference.

Similar Papers

06/12/2021

Robust Implicit Networks via Non-Euclidean Contractions

Saber Jafarpour, Alexander Davydov, Anton Proskurnikov, Francesco Bullo

Keywords: theory, deep learning, machine learning, robustness, vision

14:59

06/12/2020

Learning Optimal Representations with the Decodable Information Bottleneck

Yann Dubois, Douwe Kiela, David Schwab, Ramakrishna Vedantam

3:13

06/12/2020

Semialgebraic Optimization for Lipschitz Constants of ReLU Networks

Tong Chen, Jean Lasserre, Victor Magron, Edouard Pauwels

3:22

26/08/2020

Rep the Set: Neural Networks for Learning Set Representations

Konstantinos Skianis, Giannis Nikolentzos, Stratis Limnios, Michalis Vazirgiannis

14:19

06/12/2021

Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation

Kenneth Borup, Lars N Andersen

Keywords: theory, deep learning, optimization

6:00

03/05/2021

NOVAS: Non-convex Optimization via Adaptive Stochastic Search for End-to-end Learning and Control

Ioannis Exarchos, Marcus A Pereira, Ziyi Wang, Evangelos Theodorou

Keywords: deep neural networks, deep FBSDEs, stochastic control, nested optimization

5:35

14/06/2020

MTL-NAS: Task-Agnostic Neural Architecture Search Towards General-Purpose Multi-Task Learning

Yuan Gao, Haoping Bai, Zequn Jie, Jiayi Ma, Kui Jia, Wei Liu

Keywords: neural architecture search, general-purpose multi-task learning, task-agnostic search space, single-shot gradient-based search algorithm, minimal entropy regularization

1:00

18/07/2021

The Heavy-Tail Phenomenon in SGD

Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu

Keywords: Optimization, Stochastic Optimization

5:37

06/12/2021

Domain Adaptation with Invariant Representation Learning: What Transformations to Learn?

Petar Stojanov, Zijian Li, Mingming Gong, Ruichu Cai, Jaime Carbonell, Kun Zhang

Keywords: deep learning, machine learning, adversarial robustness and security, domain adaptation, representation learning, transfer learning

15:02

26/04/2020

Convolutional Conditional Neural Processes

Jonathan Gordon, Wessel P. Bruinsma, Andrew Y. K. Foong, James Requeima, Yann Dubois, Richard E. Turner

Keywords: Neural Processes, Deep Sets, Translation Equivariance

15:00

06/12/2021

Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis

Qi Chen, Changjian Shui, Mario Marchand

Keywords: deep learning, meta learning, few shot learning

11:45

12/07/2020

Fiedler Regularization: Learning Neural Networks with Graph Sparsity

Edric Tam, David Dunson

Keywords: Supervised Learning

15:31

06/12/2020

Lipschitz Bounds and Provably Robust Training by Laplacian Smoothing

Vishaal Krishnan, Abed AlRahman Al Makdah, Fabio Pasqualetti

3:48

26/04/2020

Locally Constant Networks

Guang-He Lee, Tommi S. Jaakkola

4:44

07/09/2020

Lifted Regression/Reconstruction Networks

Rasmus Høier, Christopher Zach

Keywords: Lifted neural networks, Lipschitz continuity, adversarial robustness, energy-based models

8:23

03/05/2021

Learning explanations that are hard to vary

Giambattista Parascandolo, Alexander Neitz, Antonio Orvieto, Luigi Gresele, Bernhard Schoelkopf

Keywords: invariances, gradient alignment, consistency

5:16

06/12/2020

Non-Euclidean Universal Approximation

Anastasis Kratsios, Eugene Bilokopytov

3:34

13/04/2021

Graphical normalizing flows

Antoine Wehenkel, Gilles Louppe

3:04

26/04/2020

Differentiation of Blackbox Combinatorial Solvers

Marin Vlastelica Pogančić, Anselm Paulus, Vit Musil, Georg Martius, Michal Rolinek

Keywords: combinatorial algorithms, deep learning, representation learning, optimization

4:50

12/07/2020

Convolutional dictionary learning based auto-encoders for natural exponential-family distributions

Bahareh Tolooshams, Andrew Song, Simona Temereanca, Demba Ba

Keywords: Deep Learning - Generative Models and Autoencoders

14:49

03/05/2021

Prototypical Contrastive Learning of Unsupervised Representations

Junnan Li, Pan Zhou, Caiming Xiong, Steven Hoi

Keywords: self-supervised learning, unsupervised learning, representation learning, contrastive learning

4:51

03/05/2021

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L Yamins, Hidenori Tanaka

Keywords: geometry, stochastic differential equation, symmetry, learning dynamics, modified equation analysis, conservation law, physics, gradient flow, loss landscape, hessian

4:36

06/12/2020

Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

3:21

03/05/2021

A unifying view on implicit bias in training linear neural networks

Chulhee (Charlie) Yun, Shankar Krishnan, Hossein Mobahi

Keywords: convergence, implicit bias, gradient flow, implicit regularization, gradient descent

5:24

12/07/2020

Optimistic bounds for multi-output learning

Henry Reeve, Ata Kaban

Keywords: Supervised Learning

14:41

03/05/2021

Wasserstein-2 Generative Networks

Alexander Korotin, Vage Egiazarian, Arip Asadulaev, Alexander Safin, Evgeny Burnaev

Keywords: input-convex neural networks, cycle-consistency regularization, non-minimax optimization, optimal transport maps, wasserstein-2 distance

5:10

12/07/2020

Representation Learning via Adversarially-Contrastive Optimal Transport

Anoop Cherian, Shuchin Aeron

Keywords: Representation Learning

14:47

06/12/2021

Meta-Learning for Relative Density-Ratio Estimation

Atsutoshi Kumagai, Tomoharu Iwata, Yasuhiro Fujiwara

Keywords: deep learning, machine learning, meta learning

8:56

18/07/2021

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nati Srebro, Amir Globerson, Daniel Soudry

Keywords: Probabilistic Methods, MCMC, Theory, Deep learning Theory

15:38

12/07/2020

dS^2LBI: Exploring Structural Sparsity on Deep Network via Differential Inclusion Paths

Yanwei Fu, Chen Liu, Donghao Li, Xinwei Sun, Jinshan Zeng, Yuan Yao

Keywords: Deep Learning - Algorithms

12:45

06/12/2020

Untangling tradeoffs between recurrence and self-attention in artificial neural networks

Giancarlo Kerg, Bhargav Kanuparthi, Anirudh Goyal, Kyle Goyette, Yoshua Bengio, Guillaume Lajoie

3:20

06/12/2021

Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning

Zhenhuan Yang, Yunwen Lei, Puyu Wang, Tianbao Yang, Yiming Ying

Keywords: optimization, machine learning, privacy

14:40

18/07/2021

Variational Empowerment as Representation Learning for Goal-Conditioned Reinforcement Learning

Jongwook Choi, Archit Sharma, Honglak Lee, Sergey Levine, Shixiang Gu

Keywords: Neuroscience and Cognitive Science, Neuroscience, Reinforcement Learning and Planning, Algorithms, Representation Learning; Algorithms, Sparse Coding and Dimensionality Expansion; Applications, Matrix and Ten

5:16

12/07/2020

Graph Optimal Transport for Cross-Domain Alignment

Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin, Jingjing Liu

Keywords: Deep Learning - Algorithms

12:20

18/07/2021

Sparsifying Networks via Subdifferential Inclusion

Sagar Verma, Jean-Christophe Pesquet

Keywords: Optimization, Convex Optimization

5:10

06/12/2021

Partition-Based Formulations for Mixed-Integer Optimization of Trained ReLU Neural Networks

Calvin Tsay, Jan Kronqvist, Alexander Thebelt, Ruth Misener

Keywords: deep learning, optimization

10:54

06/12/2020

Directional convergence and alignment in deep learning

Ziwei Ji, Matus Telgarsky

3:21

13/04/2021

Implicit regularization via neural feature alignment

Aristide Baratin, Thomas George, César Laurent, R Devon Hjelm, Guillaume Lajoie, Pascal Vincent, Simon Lacoste-Julien

3:15

06/12/2021

Rectangular Flows for Manifold Learning

Anthony Caterini, Gabriel Loaiza-Ganem, Geoff Pleiss, John Cunningham

Keywords: deep learning, optimization, generative model

12:26

18/07/2021

Towards Understanding Learning in Neural Networks with Linear Teachers

Roei Sarussi, Alon Brutzkus, Amir Globerson

Keywords: Probabilistic Methods, Theory, Probabilistic Methods, MCMC

5:22