Lipschitz normalization for self-attention layers with application to graph neural networks

18/07/2021

George Dasoulas, Kevin Scaman, Aladin Virmaux

Keywords: Deep Learning

Abstract: Attention-based neural networks are state of the art in a wide range of applications. However, their performance tends to degrade as the number of layers increases. In this work, we show that enforcing Lipschitz continuity by normalizing the attention scores can significantly improve the performance of deep attention models. First, we show that, for deep graph attention networks (GAT), gradient explosion appears during training, leading to poor performance of gradient-based training algorithms. To address this issue, we derive a theoretical analysis of the Lipschitz continuity of attention modules and introduce LipschitzNorm, a simple and parameter-free normalization for self-attention mechanisms that constrains the model to be Lipschitz continuous. We then apply LipschitzNorm to GAT and Graph Transformers and show that their performance is substantially improved in the deep setting (10 to 30 layers). More specifically, we show that a deep GAT model with LipschitzNorm achieves state-of-the-art results on node label prediction tasks that exhibit long-range dependencies, while showing consistent improvements over its unnormalized counterpart on benchmark node classification tasks.
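To make the idea of normalizing attention scores concrete, here is a minimal sketch in plain Python. It is not the LipschitzNorm formula from the paper, which the abstract does not state. As an illustrative assumption, it divides dot-product scores by the largest query norm times the largest key norm, which by the Cauchy-Schwarz inequality keeps every score in [-1, 1] regardless of input magnitude. The function names (`row_norm`, `normalized_scores`, `attention_weights`) are hypothetical.

```python
import math


def row_norm(vector):
    """Euclidean norm of one feature vector."""
    return math.sqrt(sum(x * x for x in vector))


def normalized_scores(queries, keys):
    """Dot-product scores rescaled by a norm-based factor.

    Assumption for illustration only: the scale is
    (largest query norm) * (largest key norm), so that
    |score| <= 1 by Cauchy-Schwarz. This is a stand-in,
    not the normalization defined in the paper.
    """
    scale = max(row_norm(q) for q in queries) * max(row_norm(k) for k in keys)
    scale = max(scale, 1e-12)  # guard against all-zero inputs
    return [
        [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        for q in queries
    ]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    largest = max(row)
    exps = [math.exp(s - largest) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Attention weights computed from the bounded scores."""
    return [softmax(row) for row in normalized_scores(queries, keys)]


# Inputs with very different magnitudes still give bounded scores.
queries = [[100.0, 0.0], [0.0, 3.0]]
keys = [[50.0, 0.0], [0.0, 1.0]]
scores = normalized_scores(queries, keys)
weights = attention_weights(queries, keys)
```

Because the scores stay bounded, the softmax cannot saturate arbitrarily as feature magnitudes grow, which is the intuition behind controlling the sensitivity of an attention layer to its inputs.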

The talk and the corresponding paper were published at the ICML 2021 virtual conference.

Similar Papers

14/06/2020

Regularizing Class-Wise Predictions via Self-Knowledge Distillation

Sukmin Yun, Jongjin Park, Kimin Lee, Jinwoo Shin

Keywords: image classification, regularization, self-knowledge distillation, generalization, calibration

1:01

02/02/2021

Learning from Noisy Labels with Complementary Loss Functions

Deng-Bao Wang, Yong Wen, Lujia Pan, Min-Ling Zhang

14:00

23/08/2020

Rethinking pruning for accelerating deep inference at the edge

Dawei Gao, Xiaoxi He, Zimu Zhou, Yongxin Tong, Ke Xu, Lothar Thiele

Keywords: automatic speech recognition, deep learning, named entity recognition, network pruning, sequence labelling

13:43

12/07/2020

Adversarial Filters of Dataset Biases

Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew Peters, Ashish Sabharwal, Yejin Choi

Keywords: Deep Learning - Algorithms

15:25

12/07/2020

Revisiting Spatial Invariance with Low-Rank Local Connectivity

Gamaleldin Elsayed, Prajit Ramachandran, Jon Shlens, Simon Kornblith

Keywords: Deep Learning - General

14:48

06/12/2021

On Provable Benefits of Depth in Training Graph Convolutional Networks

Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi

Keywords: theory, deep learning, optimization, graph learning

11:36

12/07/2020

Extrapolation for Large-batch Training in Deep Learning

Tao LIN, Lingjing Kong, Sebastian Stich, Martin Jaggi

Keywords: Deep Learning - Algorithms

13:21

14/06/2020

On the Regularization Properties of Structured Dropout

Ambar Pal, Connor Lane, René Vidal, Benjamin D. Haeffele

Keywords: dropout, regularization, dropblock, dropconnect, neural networks, optimization, low rank, nuclear norm, k-support norm

1:01

26/04/2020

Distributionally Robust Neural Networks

Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang

Keywords: distributionally robust optimization, deep learning, robustness, generalization, regularization

5:22

12/07/2020

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

Alexander Shevchenko, Marco Mondelli

Keywords: Deep Learning - Theory

13:20

03/05/2021

No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks

Shyamgopal Karthik, Ameya Prabhu, Puneet Dokania, Vineet Gandhi

Keywords: Conditional Risk Minimization, Hierarchy-Aware Classification, Post-Hoc Correction

4:53

06/12/2020

Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks

Kenta Oono, Taiji Suzuki

3:22

14/06/2020

WCP: Worst-Case Perturbations for Semi-Supervised Deep Learning

Liheng Zhang, Guo-Jun Qi

Keywords: semi-supervised learning, worst-case perturbations, model-based robustness, sample-based robustness, additive perturbations, dropconnect perturbations

5:01

26/04/2020

PairNorm: Tackling Oversmoothing in GNNs

Lingxiao Zhao, Leman Akoglu

Keywords: Graph Neural Network, oversmoothing, normalization

6:08

18/07/2021

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Liwei Wang

Keywords: Deep Learning

4:48

22/11/2021

Siamese Prototypical Contrastive Learning

Shentong Mo, Zhun Sun, Chao Li

Keywords: self-supervised learning, contrastive learning, representation learning

2:50

12/07/2020

Analyzing the effect of neural network architecture on training performance

Karthik Abinav Sankararaman, Soham De, Zheng Xu, W. Ronny Huang, Tom Goldstein

Keywords: Deep Learning - Theory

14:03

22/11/2021

Parameter Efficient Dynamic Convolution via Tensor Decomposition

Zejiang Hou, Sun-Yuan Kung

Keywords: dynamic convolution, input-dependent reparameterization, parameter efficiency, tensor decomposition

3:58

14/06/2020

Learning to Forget for Meta-Learning

Sungyong Baik, Seokil Hong, Kyoung Mu Lee

Keywords: meta learning, few-shot learning, reinforcement learning

1:01

02/02/2021

Adaptive Verifiable Training Using Pairwise Class Similarity

Shiqi Wang, Kevin Eykholt, Taesung Lee, Jiyong Jang, Ian Molloy

16:49

06/12/2021

Can we have it all? On the Trade-off between Spatial and Adversarial Robustness of Neural Networks

Sandesh Kamath, Amit Deshpande, Subrahmanyam Kambhampati Venkata, Vineeth N Balasubramanian

Keywords: deep learning, robustness, adversarial robustness and security

14:57

06/12/2021

AC-GC: Lossy Activation Compression with Guaranteed Convergence

R David Evans, Tor Aamodt

Keywords: deep learning, optimization, graph learning

14:39

02/02/2021

STL-SGD: Speeding Up Local SGD with Stagewise Communication Period

Shuheng Shen, Yifei Cheng, Jingchang Liu, Linli Xu

14:53

06/12/2020

The Generalization-Stability Tradeoff In Neural Network Pruning

Brian Bartoldson, Ari Morcos, Adrian Barbu, Gordon Erlebacher

3:12

14/06/2020

Conditional Channel Gated Networks for Task-Aware Continual Learning

Davide Abati, Jakub Tomczak, Tijmen Blankevoort, Simone Calderara, Rita Cucchiara, Babak Ehteshami Bejnordi

Keywords: continual learning, channel gating, conditional computation, incremental learning, lifelong learning, hard attention

5:01

18/07/2021

PHEW: Constructing Sparse Networks that Learn Fast and Generalize Well without Training Data

Shreyas Malakarjun Patil, Constantine Dovrolis

Keywords: Deep Learning

5:20

06/12/2021

Collapsed Variational Bounds for Bayesian Neural Networks

Marcin Tomczak, Siddharth Swaroop, Andrew Foong, Richard Turner

Keywords: deep learning, optimization, generative model

5:44

06/12/2021

Consistency Regularization for Variational Auto-Encoders

Samarth Sinha, Adji Bousso Dieng

Keywords: deep learning, machine learning, self-supervised learning, generative model, contrastive learning, representation learning

10:52

05/01/2021

AdarGCN: Adaptive Aggregation GCN for Few-Shot Learning

Jianhong Zhang, Manli Zhang, Zhiwu Lu, Tao Xiang

4:45

06/12/2021

Training Certifiably Robust Neural Networks with Efficient Local Lipschitz Bounds

Yujia Huang, Huan Zhang, Yuanyuan Shi, J. Zico Kolter, Anima Anandkumar

Keywords: deep learning, robustness, adversarial robustness and security

12:25

06/12/2021

Heavy Ball Neural Ordinary Differential Equations

Hedi Xia, Vai Suliafu, Hangjie Ji, Tan Nguyen, Andrea Bertozzi, Stanley Osher, Bao Wang

Keywords: deep learning, optimization, machine learning, vision

4:08

03/05/2021

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

Keywords: Deep learning, Locally supervised training

5:03

19/08/2021

Decomposable-Net: Scalable Low-Rank Compression for Neural Networks

Atsushi Yaguchi, Taiji Suzuki, Shuhei Nitta, Yukinobu Sakata, Akiyuki Tanizawa

Keywords: Machine Learning, Deep Learning, Statistical Methods and Machine Learning, Recognition, 2D and 3D Computer Vision

10:40

07/09/2020

Transferring Pretrained Networks to Small Data via Category Decorrelation

Ying Jin, Zhangjie Cao, Mingsheng Long, Jianmin Wang

Keywords: Category Decorrelation, Under Transfer

8:39

03/05/2021

Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control

Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

Keywords: Deep Reinforcement Learning, Regularization, Continuous Control, Policy Optimization

8:45

26/04/2020

SELF: Learning to Filter Noisy Labels with Self-Ensembling

Duc Tam Nguyen, Chaithanya Kumar Mummadi, Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Laura Beggel, Thomas Brox

Keywords: Ensemble Learning, Robust Learning, Noisy Labels, Labels Filtering

5:00

06/12/2021

Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks

Melih Barsbey, Milad Sefidgaran, Murat Erdogdu, Gaël Richard, Umut Simsekli

Keywords: theory, deep learning, optimization

14:25

14/06/2020

HyperSTAR: Task-Aware Hyperparameters for Deep Networks

Gaurav Mittal, Chang Liu, Nikolaos Karianakis, Victor Fragoso, Mei Chen, Yun Fu

Keywords: auto ml, hyperparameter optimization, meta learning, task aware, hyperband, hyperparameters, warm start, image classification, resnet, shufflenet

4:58

05/04/2021

Adaptive Gradient Communication via Critical Learning Regime Identification

Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitrios Papailiopoulos

4:23