Relative Positional Encoding for Transformers with Linear Complexity

18/07/2021

Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Simsekli, Yi-Hsuan Yang, Gaël Richard

Keywords: Deep Learning, Architectures

Abstract: Recent advances in Transformer models allow for unprecedented sequence lengths, thanks to linear space and time complexity. Meanwhile, relative positional encoding (RPE) was proposed as beneficial for classical Transformers; it consists in exploiting lags between positions instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what such methods avoid. In this paper, we bridge this gap and present Stochastic Positional Encoding as a way to generate PE that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is to make a connection between positional encoding and cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
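
The idea that positional features can reproduce a lag-only (relative) kernel through their covariance can be illustrated with a small numerical sketch. The code below is not the authors' Stochastic Positional Encoding; it swaps in standard random Fourier features as a stand-in, and the function name `lag_features`, the Gaussian target kernel, and all parameter values are assumptions chosen for illustration. It draws random sinusoidal features per position and checks that their inner products approximate a kernel that depends only on the lag m - n.

```python
import numpy as np


def lag_features(num_positions, num_features, length_scale, rng):
    """Random Fourier features over integer positions (hypothetical helper).

    In expectation, the inner product of the features at positions m and n
    equals exp(-(m - n)**2 / (2 * length_scale**2)), a function of the lag only.
    """
    positions = np.arange(num_positions)[:, None]               # shape (N, 1)
    freqs = rng.normal(0.0, 1.0 / length_scale, num_features)   # shape (R,)
    phases = rng.uniform(0.0, 2.0 * np.pi, num_features)        # shape (R,)
    return np.sqrt(2.0 / num_features) * np.cos(positions * freqs + phases)


rng = np.random.default_rng(0)
N, R, ell = 16, 20000, 4.0          # illustrative sizes, not from the paper
feats = lag_features(N, R, ell, rng)

# The N x N product is formed here only to verify the approximation; a
# linear-complexity model would consume the (N, R) features directly.
approx = feats @ feats.T
lags = np.arange(N)[:, None] - np.arange(N)[None, :]
target = np.exp(-lags**2 / (2.0 * ell**2))
max_err = np.abs(approx - target).max()
print("max abs deviation from the lag kernel:", max_err)
```

The approximation error shrinks as the number of random features R grows, which is the usual trade-off for this kind of stochastic feature map.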


The talk and the respective paper were published at the ICML 2021 virtual conference.


Similar Papers

18/07/2021

Neural SDEs as Infinite-Dimensional GANs

Patrick Kidger, James Foster, Xuechen Li, Terry Lyons

Keywords: Deep Learning, Adversarial Networks, Algorithms, Unsupervised Learning, Applications, Network Analysis

4:53

06/12/2021

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu

Keywords: optimization, machine learning, transformers, vision

10:07

02/02/2021

Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Andis Draguns, Emīls Ozoliņš, Agris Šostaks, Matīss Apinis, Karlis Freivalds

15:35

06/12/2020

Convolutional Tensor-Train LSTM for Spatio-Temporal Learning

Jiahao Su, Wonmin Byeon, Jean Kossaifi, Furong Huang, Jan Kautz, Anima Anandkumar

3:29

18/07/2021

Neural Rough Differential Equations for Long Time Series

James Morrill, Cristopher Salvi, Patrick Kidger, James Foster

Keywords: Algorithms, Time Series and Sequences

5:31

06/12/2021

Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long

Keywords: theory, transformers

11:45

06/12/2020

Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization

Sam Daulton, Max Balandat, Eytan Bakshy


3:20

12/07/2020

Composable Sketches for Functions of Frequencies: Beyond the Worst Case

Edith Cohen, Ofir Geri, Rasmus Pagh

Keywords: Optimization - Large Scale, Parallel and Distributed

14:51

06/12/2021

Scaling Gaussian Processes with Derivative Information Using Variational Inference

Misha Padidar, Xinran Zhu, Leo Huang, Jacob Gardner, David Bindel

Keywords: deep learning, optimization, machine learning, generative model, graph learning, kernel methods

14:52

18/07/2021

A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance

Minhui Huang, Shiqian Ma, Lifeng Lai

Keywords: Algorithms, Optimal Transport

5:14

12/07/2020

Regularized Optimal Transport is Ground Cost Adversarial

François-Pierre Paty, Marco Cuturi

Keywords: General Machine Learning Techniques

13:26

26/08/2020

Linear Dynamics: Clustering without identification

Chloe Hsu, Michaela Hardt, Moritz Hardt


14:37

06/12/2021

Universal Approximation Using Well-Conditioned Normalizing Flows

Holden Lee, Chirag Pabbaraju, Anish Prasad Sevekari, Andrej Risteski

Keywords: theory, deep learning, generative model

11:46

06/12/2021

Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroimaging

Ali Hashemi, Yijing Gao, Chang Cai, Sanjay Ghosh, Klaus-Robert Müller, Srikantan Nagarajan, Stefan Haufe

Keywords: theory, optimization

14:31

02/02/2021

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang

17:22

06/12/2021

Structured Dropout Variational Inference for Bayesian Neural Networks

Son Nguyen, Duong Nguyen, Khai Nguyen, Khoat Than, Hung Bui, Nhat Ho

Keywords: deep learning, generative model

11:28

12/07/2020

Knowing The What But Not The Where in Bayesian Optimization

Vu Nguyen, Michael Osborne

Keywords: Gaussian Processes

13:38

26/04/2020

Stochastic AUC Maximization with Deep Neural Networks

Mingrui Liu, Zhuoning Yuan, Yiming Ying, Tianbao Yang

Keywords: Stochastic AUC Maximization, Deep Neural Networks

4:58

06/12/2021

Estimating High Order Gradients of the Data Distribution by Denoising

Chenlin Meng, Yang Song, Wenzhe Li, Stefano Ermon

Keywords: generative model

7:31

06/12/2020

Demystifying Orthogonal Monte Carlo and Beyond

Han Lin, Haoxian Chen, Krzysztof M Choromanski, Tianyi Zhang, Clement Laroche

3:19

19/08/2021

Boosting Variational Inference With Locally Adaptive Step-Sizes

Gideon Dresdner, Saurav Shekhar, Fabian Pedregosa, Francesco Locatello, Gunnar Rätsch

Keywords: Machine Learning, Bayesian Learning, Probabilistic Machine Learning

11:39

03/05/2021

Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction

Wei Deng, Qi Feng, Georgios Karagiannis, Guang Lin, Faming Liang

Keywords: Markov jump process, uncertainty quantification, generalized Girsanov theorem, change of measure, stochastic gradient Langevin dynamics, parallel tempering, replica exchange, Dirichlet form, variance reduction

5:19

12/07/2020

A quantile-based approach for hyperparameter transfer learning

David Salinas, Huibin Shen, Valerio Perrone

Keywords: Gaussian Processes

16:14

02/02/2021

Symbolic Music Generation with Transformer-GANs

Aashiq Muhamed, Liang Li, Xingjian Shi, Suri Yaddanapudi, Wayne Chi, Dylan Jackson, Rahul Suresh, Zachary C. Lipton, Alex J. Smola

20:01

06/12/2020

Recursive Inference for Variational Autoencoders

Minyoung Kim, Vladimir Pavlovic


3:24

12/07/2020

State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes

William Wilkinson, Paul Chang, Michael Andersen, Arno Solin

Keywords: Gaussian Processes

13:31

06/12/2021

Continuous Latent Process Flows

Ruizhi Deng, Marcus Brubaker, Greg Mori, Andreas M Lehrmann

Keywords: generative model

14:54

02/02/2021

TRQ: Ternary Neural Networks With Residual Quantization

Yue Li, Wenrui Ding, Chunlei Liu, Baochang Zhang, Guodong Guo

15:21

12/07/2020

Rate-distortion optimization guided autoencoder for isometric embedding in Euclidean latent space

Keizo Kato, Jing Zhou, Tomotake Sasaki, Akira Nakagawa

Keywords: Deep Learning - Generative Models and Autoencoders

14:48

03/05/2021

MALI: A memory efficient and reverse accurate integrator for Neural ODEs

Juntang Zhuang, Nicha C Dvornek, Sekhar Tatikonda, James S Duncan

Keywords: neural ode, memory efficient, gradient estimation, reverse accuracy

5:12

06/12/2021

Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems

Subhabrata Dutta, Tanya Gautam, Soumen Chakrabarti, Tanmoy Chakraborty

Keywords: deep learning, transformers

11:54

03/05/2021

Random Feature Attention

Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah Smith, Lingpeng Kong

Keywords: machine translation, transformers, language modeling, Attention

10:20

18/07/2021

Intermediate Layer Optimization for Inverse Problems using Deep Generative Models

Giannis Daras, Joseph Dean, Ajil Jalal, Alex Dimakis

Keywords: Algorithms, Sparsity and Compressed Sensing

5:16

02/02/2021

Going Deeper With Directly-Trained Larger Spiking Neural Networks

Hanle Zheng, Yujie Wu, Lei Deng, Yifan Hu, Guoqi Li

16:29

06/12/2020

Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization

Jonathan Lacotte, Mert Pilanci


3:17

06/12/2020

Projection Robust Wasserstein Distance and Riemannian Optimization

Darren Lin, Chenyou Fan, Nhat Ho, Marco Cuturi, Michael Jordan

Keywords: Optimization -> Non-Convex Optimization; Optimization -> Stochastic Optimization, Deep Learning -> Optimization for Deep Networks

3:01

12/07/2020

Adversarial Mutual Information for Text Generation

Boyuan Pan, Yazheng Yang, Kaizhao Liang, Bhavya Kailkhura, Zhongming Jin, Xian-Sheng Hua, Deng Cai, Bo Li

Keywords: Applications - Language, Speech and Dialog

12:38

18/07/2021

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov

Keywords: Applications, Audio and Speech Processing

5:12

06/12/2020

Approximate Cross-Validation with Low-Rank Data in High Dimensions

Will Stephenson, Madeleine Udell, Tamara Broderick


3:02

04/08/2021

Convergence rates and approximation results for SGD and its continuous-time counterpart

Xavier Fontaine, Valentin De Bortoli, Alain Durmus


17:35