On Scalar Embedding of Relative Positions in Attention Models

02/02/2021

On Scalar Embedding of Relative Positions in Attention Models

Junshuang Wu, Richong Zhang, Yongyi Mao, Junfan Chen

Keywords:

Abstract Paper Similar Papers

Abstract: Attention with positional encoding has been demonstrated as a powerful component in modern neural network models, such as transformers. However, why positional encoding works well in attention models remains largely unanswered. In this paper, we study the scalar relative positional encoding (SRPE) proposed in the T5 transformer. Such an encoding method has two features. First, it uses a scalar to embed relative positions. Second, the relative positions are bucketized using a fixed heuristic algorithm, and positions in the same bucket share the same embedding. In this work, we show that SRPE in attention has an elegant probabilistic interpretation. More specifically, the positional encoding serves to produce a prior distribution for the attended positions. The resulting attentive distribution can be viewed as a posterior distribution of the attended position given the observed input sequence. Furthermore, we propose a new SRPE (AT5) that adopts a learnable bucketization protocol and automatically adapts to the dependency range specific to the learning task. Empirical studies show that the AT5 achieves superior performance than the T5's SRPE.

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38948047

(Link will open in new window)

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at AAAI 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2021

Noether Networks: meta-learning useful conserved quantities

Ferran Alet, Dylan Doblar, Allan Zhou and
Josh Tenenbaum, Kenji Kawaguchi, Chelsea Finn

Keywords Paper

machine learning, vision, meta learning

0

0

0

0

11:18

25/07/2020

Sequential recommendation with self-attentive multi-adversarial network

Ruiyang Ren, Zhaoyang Liu, Yaliang Li and
Wayne Xin Zhao, Hui Wang, Bolin Ding, Ji-Rong Wen

Keywords Paper

sequential recommendation, adversarial training, self-attentive mechanism

0

0

0

0

15:12

06/12/2020

Can Temporal-Diﬀerence and Q-Learning Learn Representation? A Mean-Field Theory

Yufeng Zhang, Qi Cai, Zhuoran Yang and
Yongxin Chen, Zhaoran Wang

Keywords Paper

0

0

0

0

3:02

26/04/2020

Continual Learning with Bayesian Neural Networks for Non-Stationary Data

Richard Kurle, Botond Cseke, Alexej Klushyn and
Patrick van der Smagt, Stephan Günnemann

Keywords Paper

Continual Learning, Online Variational Bayes, Non-Stationary Data, Bayesian Neural Networks, Variational Inference, Lifelong Learning, Concept Drift, Episodic Memory

0

0

0

0

5:26

06/12/2021

Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models

Yi Sui, Ga Wu, Scott Sanner

Keywords Paper

deep learning, optimization, machine learning, vision

0

0

0

0

10:29

14/06/2020

Enhanced Transport Distance for Unsupervised Domain Adaptation

Mengxue Li, Yi-Ming Zhai, You-Wei Luo and
Peng-Fei Ge, Chuan-Xian Ren

Keywords Paper

uda, optimal transport, neural networks, attention mechanism, kantorovich potential

0

0

0

0

0:58

12/07/2020

Learning to Learn Kernels with Variational Random Features

Xiantong Zhen, Haoliang Sun, Yingjun Du and
Jun Xu, Yilong Yin, Ling Shao, Cees Snoek

Keywords Paper

Transfer, Multitask and Meta-learning

0

0

0

0

13:08

18/07/2021

Unsupervised Representation Learning via Neural Activation Coding

Yookoon Park, Sangho Lee, Gunhee Kim, David Blei

Keywords Paper

Deep Learning, Embedding and Representation learning

0

0

0

0

13:50

18/07/2021

An Identifiable Double VAE For Disentangled Representations

Graziano Mita, Maurizio Filippone, Pietro Michiardi

Keywords Paper

Deep Learning, Adversarial Networks, Deep Learning, Generative Models

0

0

0

0

4:51

03/05/2021

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

Atsushi Nitanda, Taiji Suzuki

Keywords Paper

stochastic gradient descent, neural tangent kernel, over-parameterization, two-layer neural network

0

0

0

0

18:48

06/12/2020

Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift

Remi Tachet des Combes, Han Zhao, Yu-Xiang Wang, Geoffrey Gordon

Keywords Paper

0

0

0

0

3:19

06/12/2020

A Local Temporal Difference Code for Distributional Reinforcement Learning

Pablo Tano, Peter Dayan, Alexandre Pouget

Keywords Paper

0

0

0

0

3:24

12/07/2020

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Keywords Paper

Reinforcement Learning - Theory

0

0

0

0

14:20

12/07/2020

Learning Attentive Meta-Transfer

Jaesik Yoon, Gautam Singh, Sungjin Ahn

Keywords Paper

Sequential, Network, and Time-Series Modeling

1

1

0

0

15:22

06/12/2021

Time-series Generation by Contrastive Imitation

Daniel Jarrett, Ioana Bica, Mihaela van der Schaar

Keywords Paper

generative model

0

0

0

0

8:47

26/04/2020

Functional Regularisation for Continual Learning with Gaussian Processes

Michalis K. Titsias, Jonathan Schwarz, Alexander G. de G. Matthews and
Razvan Pascanu, Yee Whye Teh

Keywords Paper

Continual Learning, Gaussian Processes, Lifelong learning, Incremental Learning

0

0

0

0

4:31

06/12/2021

Robust Implicit Networks via Non-Euclidean Contractions

Saber Jafarpour, Alexander Davydov, Anton Proskurnikov, Francesco Bullo

Keywords Paper

theory, deep learning, machine learning, robustness, vision

0

0

0

0

14:59

26/04/2020

Frequency-based Search-control in Dyna

Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand

Keywords Paper

Model-based reinforcement learning, search-control, Dyna, frequency of a signal

0

0

0

0

4:32

06/12/2020

Analytical Probability Distributions and Exact Expectation-Maximization for Deep Generative Networks

Randall Balestriero, Sebastien PARIS, Richard Baraniuk

Keywords Paper

0

0

0

0

3:17

06/12/2020

Deep Reinforcement and InfoMax Learning

Bogdan Mazoure, Remi Tachet des Combes, Thang Doan and
Philip Bachman, R Devon Hjelm

Keywords Paper

0

0

0

0

3:15

26/04/2020

Bayesian Meta Sampling for Fast Uncertainty Adaptation

Zhenyi Wang, Yang Zhao, Ping Yu and
Ruiyi Zhang, Changyou Chen

Keywords Paper

Bayesian Sampling, Uncertainty Adaptation, Meta Learning, Variational Inference

0

0

0

0

4:44

03/05/2021

Meta-Learning with Neural Tangent Kernels

Yufan Zhou, Zhenyi Wang, Jiayi Xian and
Changyou Chen, Jinhui Xu

Keywords Paper

neural tangent kernel, meta-learning

0

0

0

0

3:54

19/08/2021

Domain Generalization under Conditional and Label Shifts via Variational Bayesian Inference

Xiaofeng Liu, Bo Hu, Linghao Jin and
Xu Han, Fangxu Xing, Jinsong Ouyang, Jun Lu, Georges El Fakhri, Jonghye Woo

Keywords Paper

Computer Vision, Recognition, Statistical Methods and Machine Learning, Classification

0

0

0

0

11:33

03/05/2021

Neurally Augmented ALISTA

Freya Behrens, Jonathan Sauder, Peter Jung

Keywords Paper

learned ISTA, unrolled algorithms, compressed sensing, sparse reconstruction

0

0

0

0

5:18

18/07/2021

Improving Generalization in Meta-learning via Task Augmentation

Huaxiu Yao, Long-Kai Huang, Linjun Zhang and
Ying WEI, Li Tian, James Zou, Junzhou Huang, Zhenhui (Jessie) Li

Keywords Paper

Algorithms, Multitask, Transfer, and Meta Learning

0

0

0

0

8:27

06/12/2020

Learning Optimal Representations with the Decodable Information Bottleneck

Yann Dubois, Douwe Kiela, David Schwab, Ramakrishna Vedantam

Keywords Paper

0

0

0

0

3:13

06/12/2021

Choose a Transformer: Fourier or Galerkin

Shuhao Cao

Keywords Paper

theory, transformers

0

0

0

0

8:25

03/05/2021

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit

Ben Adlam, Jaehoon Lee, Lechao Xiao and
Jeffrey Pennington, Jasper Snoek

Keywords Paper

Deep Learning, Bayesian Neural Networks, Neural Network Gaussian Process, Infinite-Width Limit, Uncertainty, Gaussian Process

0

0

0

0

4:34

06/12/2021

Contrastively Disentangled Sequential Variational Autoencoder

Junwen Bai, Weiran Wang, Carla Gomes

Keywords Paper

self-supervised learning, generative model, contrastive learning, representation learning, interpretability

0

0

0

0

12:53

06/12/2021

Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time

Ferran Alet, Maria Bauza, Kenji Kawaguchi and
Nurullah Giray Kuru, Tomás Lozano-Pérez, Leslie Kaelbling

Keywords Paper

deep learning, optimization, machine learning, self-supervised learning, meta learning

0

0

0

0

15:05

03/05/2021

Representation Learning via Invariant Causal Mechanisms

Jovana Mitrovic, Brian McWilliams, Jacob C Walker and
Lars Buesing, Charles Blundell

Keywords Paper

Self-supervised Learning, Representation Learning, Causality, Contrastive Methods

1

0

0

0

7:03

06/12/2021

Credit Assignment in Neural Networks through Deep Feedback Control

Alexander Meulemans, Matilde Tristany Farinha, Javier Garcia Ordonez and
Pau Vilimelis Aceituno, João Sacramento, Benjamin F. Grewe

Keywords Paper

theory, deep learning, optimization, neuroscience, vision

0

0

0

0

13:58

26/04/2020

Progressive learning and disentanglement of hierarchical representations

Zhiyuan Li, Jaideep Vitthal Murkute, Prashnna Kumar Gyawali, Linwei Wang

Keywords Paper

generative model, disentanglement, progressive learning, VAE

0

0

0

0

5:06

12/07/2020

MetaFun: Meta-Learning with Iterative Functional Updates

Jin Xu, Jean-Francois Ton, Hyunjik Kim and
Adam Kosiorek, Yee Whye Teh

Keywords Paper

Transfer, Multitask and Meta-learning

0

0

0

0

13:51

18/07/2021

Cyclically Equivariant Neural Decoders for Cyclic Codes

Xiangyu Chen, Min Ye

Keywords Paper

Algorithms, Online Learning, Algorithms, Bandit Algorithms; Reinforcement Learning and Planning, Reinforcement Learning, Theory, Information Theory

0

0

0

0

17:06

03/05/2021

Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering

Calypso Herrera, Florian Krach, Josef Teichmann

Keywords Paper

irregular-observed data modelling, conditional expectation, Neural ODE

0

0

0

0

3:50

08/12/2020

How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text

Chihiro Shibata, Kei Uchiumi, Daichi Mochihashi

Keywords Paper

0

0

0

0

14:45

03/05/2021

Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability

Suraj Srinivas, François Fleuret

Keywords Paper

Interpretability, saliency maps, score-matching

0

0

0

0

15:08

03/05/2021

Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning

Dong Bok Lee, Dongchan Min, Seanie Lee, Sung Ju Hwang

Keywords Paper

Unsupervised Learning, Variational Autoencoders, Unsupervised Meta-learning, Meta-Learning

0

0

0

0

13:31

03/05/2021

CPR: Classifier-Projection Regularization for Continual Learning

Sungmin Cha, Hsiang Hsu, Taebaek Hwang and
Flavio Calmon, Taesup Moon

Keywords Paper

regularization, wide local minima, continual learning

0

0

0

1

5:21