Learning to Encode Position for Transformer with Continuous Dynamical Model

12/07/2020

Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh

Keywords: Applications - Language, Speech and Dialog

Abstract: We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models. Unlike RNNs and LSTMs, which carry an inductive bias by processing the input tokens sequentially, non-recurrent models are less sensitive to position. The main reason is that position information among input units is not encoded inherently, i.e., the models are permutation equivalent; this is why all existing models are accompanied by a position encoding/embedding layer at the input. However, this solution has clear limitations: the sinusoidal position encoding is not flexible enough, as it is manually designed and contains no learnable parameters, whereas the position embedding restricts the maximum length of input sequences. It is thus desirable to design a new position layer that contains learnable parameters, so that it can adjust to different datasets and architectures, and that can also extrapolate to inputs of variable length. Our proposed solution borrows from the recent Neural ODE approach, which may be viewed as a versatile continuous version of a ResNet and is capable of modeling many kinds of dynamical systems. We model the evolution of the encoded results along the position index with such a dynamical system, thereby overcoming the above limitations of existing methods. We evaluate our new position layers on a variety of neural machine translation and language understanding tasks; the experimental results show consistent improvements over the baselines.
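
As a rough illustration of the idea in the abstract, the sketch below contrasts the fixed sinusoidal encoding with a position encoding obtained by integrating a small parameterized dynamical system along the position index. It is not the authors' implementation: the vector field `vector_field`, the parameter names (`W1`, `b1`, `W2`, `b2`, `p0`), the step size `dt`, and the forward-Euler integrator are all assumptions chosen for brevity. A Neural ODE would normally use a proper ODE solver and train the parameters by backpropagation; here they are only randomly initialized.

```python
import numpy as np


def sinusoidal_encoding(num_positions, d_model):
    """Fixed sinusoidal encoding: hand-designed, no learnable parameters.

    Assumes d_model is even.
    """
    pos = np.arange(num_positions)[:, None]        # shape (N, 1)
    two_i = np.arange(0, d_model, 2)[None, :]      # shape (1, d_model / 2)
    angles = pos / np.power(10000.0, two_i / d_model)
    enc = np.zeros((num_positions, d_model))
    enc[:, 0::2] = np.sin(angles)                  # even dimensions
    enc[:, 1::2] = np.cos(angles)                  # odd dimensions
    return enc


def init_params(d_model, hidden, seed=0):
    """Hypothetical parameters of a one-hidden-layer vector field."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(hidden, d_model + 1)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(d_model, hidden)),
        "b2": np.zeros(d_model),
        "p0": rng.normal(scale=0.1, size=d_model),  # encoding of position 0
    }


def vector_field(t, p, params):
    """dp/dt = h(t, p): a small MLP that sees the state p and the time t."""
    z = np.tanh(params["W1"] @ np.append(p, t) + params["b1"])
    return params["W2"] @ z + params["b2"]


def dynamical_encoding(num_positions, params, dt=0.1):
    """Generate encodings by forward-Euler steps along the position index."""
    p, t = params["p0"], 0.0
    encodings = [p]
    for _ in range(num_positions - 1):
        p = p + dt * vector_field(t, p, params)     # one Euler step
        t += dt
        encodings.append(p)
    return np.stack(encodings)


params = init_params(d_model=8, hidden=16)
short = dynamical_encoding(10, params)
long = dynamical_encoding(25, params)
print(short.shape, long.shape)
# The same parameters serve any sequence length, and the first 10
# encodings of the longer sequence match the shorter one.
print(np.allclose(long[:10], short))
```

Because the encodings are generated step by step from the same parameters, the number of parameters does not depend on the sequence length, which is the property a fixed-size position embedding table lacks.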

The talk and the respective paper are published at the ICML 2020 virtual conference.

Similar Papers

02/02/2021

DPFPS: Dynamic and Progressive Filter Pruning for Compressing Convolutional Neural Networks from Scratch

Xiaofeng Ruan, Yufan Liu, Bing Li, Chunfeng Yuan, Weiming Hu

14:38

26/04/2020

Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation

Hang Gao, Xizhou Zhu, Stephen Lin, Jifeng Dai

Keywords: Effective Receptive Fields, Deformation Modeling, Dynamic Inference

4:13

06/12/2020

Biological credit assignment through dynamic inversion of feedforward networks

William Podlaski, Christian K. Machens

3:23

14/06/2020

Unsupervised Intra-Domain Adaptation for Semantic Segmentation Through Self-Supervision

Fei Pan, Inkyu Shin, Francois Rameau, Seokju Lee, In So Kweon

Keywords: domain adaptation, semantic segmentation, self-supervised learning, unsupervised learning, transfer learning

4:58

14/06/2020

A Disentangling Invertible Interpretation Network for Explaining Latent Representations

Patrick Esser, Robin Rombach, Björn Ommer

Keywords: interpretability, inn, disentangling, generative models, invertible neural networks, autoencoders, normalizing flows, vae, explainable, xai

1:01

06/12/2021

Task-Agnostic Undesirable Feature Deactivation Using Out-of-Distribution Data

Dongmin Park, Hwanjun Song, Minseok Kim, Jae-Gil Lee

Keywords: deep learning, machine learning

14:30

26/04/2020

Adversarially Robust Representations with Smooth Encoders

Taylan Cemgil, Sumedh Ghaisas, Krishnamurthy (Dj) Dvijotham, Pushmeet Kohli

Keywords: Adversarial Learning, Robust Representations, Variational AutoEncoder, Wasserstein Distance, Variational Inference

5:16

06/12/2020

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Amir Gholami, Michael Mahoney, Kurt Keutzer

3:21

06/12/2020

Regularizing Towards Permutation Invariance In Recurrent Models

Edo Cohen-Karlik, Avichai Ben David, Amir Globerson

3:19

12/07/2020

ControlVAE: Controllable Variational Autoencoder

Huajie Shao, Shuochao Yao, Dachun Sun, Aston Zhang, Shengzhong Liu, Dongxin Liu, Jun Wang, Tarek Abdelzaher

Keywords: Deep Learning - Generative Models and Autoencoders

14:22

17/08/2020

Unsupervised k-modal styled content generation

Omry Sendik, Dani Lischinski, Daniel Cohen-Or

Keywords: StyleGAN, generative adversarial networks, multi-modal distributions

11:37

06/12/2020

Non-Euclidean Universal Approximation

Anastasis Kratsios, Eugene Bilokopytov

3:34

17/08/2020

Learning temporal coherence via self-supervision for GAN-based video generation

Mengyu Chu, You Xie, Jonas Mayer, Laura Leal-Taixé, Nils Thuerey

Keywords: self-supervision, temporal cycle-consistency, video super-resolution, generative adversarial network, unpaired video translation

16:59

06/12/2021

Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Joseph Naor, Daniel Soudry

Keywords: deep learning

11:02

06/12/2020

Autoencoders that don't overfit towards the Identity

Harald Steck

3:22

12/07/2020

Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding

Yibo Yang, Robert Bamler, Stephan Mandt

Keywords: Deep Learning - General

15:08

14/06/2020

Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation

Yunhan Zhao, Shu Kong, Daeyun Shin, Charless Fowlkes

Keywords: monocular depth prediction, real-synthetic domain shift, synthetic training data, domain adaptation, image inpainting, high-level domain gaps

1:01

12/07/2020

InfoGAN-CR: Disentangling Generative Adversarial Networks with Contrastive Regularizers

Zinan Lin, Kiran Thekumparampil, Giulia Fanti, Sewoong Oh

Keywords: Representation Learning

12:06

02/02/2021

DecAug: Out-of-Distribution Generalization via Decomposed Feature Representation and Semantic Augmentation

Haoyue Bai, Rui Sun, Lanqing Hong, Fengwei Zhou, Nanyang Ye, Han-Jia Ye, S.-H. Gary Chan, Zhenguo Li

15:59

14/06/2020

Modeling the Background for Incremental Learning in Semantic Segmentation

Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo

Keywords: incremental, learning, semantic, segmentation, continual, catastrophic, forgetting, scene, parsing

1:01

06/12/2021

Early Convolutions Help Transformers See Better

Tete Xiao, Piotr Dollar, Mannat Singh, Eric Mintun, Trevor Darrell, Ross B Girshick

Keywords: deep learning, optimization, transformers

9:23

19/08/2021

Learning Deeper Non-Monotonic Networks by Softly Transferring Solution Space

Zheng-Fan Wu, Hui Xue, Weimin Bai

Keywords: Machine Learning, Kernel Methods, Deep Learning, Classification

12:50

02/02/2021

Learning the Parameters of Bayesian Networks from Uncertain Data

Segev Wasserkrug, Radu Marinescu, Sergey Zeltyn, Evgeny Shindin, Yishai A Feldman

19:29

14/06/2020

Meta-Transfer Learning for Zero-Shot Super-Resolution

Jae Woong Soh, Sunwoo Cho, Nam Ik Cho

Keywords: zero-shot super-resolution, meta learning, transfer learning

0:59

14/06/2020

Forward and Backward Information Retention for Accurate Binary Neural Networks

Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, Jingkuan Song

Keywords: model compression, binary neural networks, deep learning, quantization, computer vision

1:00

30/11/2020

dpVAEs: Fixing Sample Generation for Regularized VAEs

Riddhish Bhalodia, Iain Lee, Shireen Elhabian

7:54

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

11:27

14/06/2020

HRank: Filter Pruning Using High-Rank Feature Map

Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao

Keywords: network pruning, neural network compression and acceleration, high-rank feature map, efficient deep learning computing

4:57

06/12/2021

Self-Supervised Learning of Event-Based Optical Flow with Spiking Neural Networks

Jesse Hagenaars, Federico Paredes-Valles, Guido de Croon

Keywords: deep learning, optimization, self-supervised learning

13:28

08/12/2020

Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism

Pan Xie, Zhi Cui, Xiuying Chen, XiaoHui Hu, Jianwei Cui, Bin Wang

6:43

06/12/2021

A Multi-Implicit Neural Representation for Fonts

Pradyumna Reddy, Zhifei Zhang, Matthew Fisher, Hailin Jin, Zhaowen Wang, Niloy Mitra

Keywords: deep learning, representation learning

8:42

02/02/2021

Longitudinal Deep Kernel Gaussian Process Regression

Junjie Liang, Yanting Wu, Dongkuan Xu, Vasant G Honavar

16:27

19/08/2021

Progressive Open-Domain Response Generation with Multiple Controllable Attributes

Haiqin Yang, Xiaoyuan Yao, Yiqun Duan, Jianping Shen, Jie Zhong, Kun Zhang

Keywords: Machine Learning, Learning Generative Models, Dialogue

14:43

06/12/2021

STEP: Out-of-Distribution Detection in the Presence of Limited In-Distribution Labeled Data

Zhi Zhou, Lan-Zhe Guo, Zhanzhan Cheng, Yu-Feng Li, Shiliang Pu

Keywords: optimization, semi-supervised learning

11:24

12/07/2020

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Keywords: Deep Learning - Generative Models and Autoencoders

17:06

14/09/2020

ADMMiRNN: Training RNN with Stable Convergence via An Efficient ADMM Approach

Yu Tang, Dequan Sun, Linbo Qiao, Jingjing Xiao, Zhiquan Lai, Dongsheng Li

14:51

02/02/2021

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura, Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang

19:57

14/09/2020

Unsupervised Domain Adaptation with Joint Domain-Adversarial Reconstruction Networks

Qian Chen, Yuntao Du, Zhiwen Tan, Yi Zhang, Chongjun Wang

Keywords: unsupervised domain adaptation, domain-adversarial learning, data reconstruction, distribution alignment

15:18

06/12/2021

AugMax: Adversarial Composition of Random Augmentations for Robust Training

Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Anima Anandkumar, Zhangyang Wang

Keywords: deep learning, robustness, adversarial robustness and security

11:19

06/12/2021

ResNEsts and DenseNEsts: Block-based DNN Models with Improved Representation Guarantees

Kuan-Lin Chen, Ching-Hua Lee, Harinath Garudadri, Bhaskar D Rao

Keywords: optimization, vision

13:27