Rethinking Positional Encoding in Language Pre-training

Abstract: In this work, we investigate the positional encoding methods used in language pre-training (e.g., BERT) and identify several problems in the existing formulations. First, we show that in absolute positional encoding, adding positional embeddings to word embeddings introduces mixed correlations between two heterogeneous sources of information. This may introduce unnecessary randomness into the attention and further limit the expressiveness of the model. Second, we question whether treating the position of the [CLS] symbol the same as that of other words is a reasonable design, considering its special role (the representation of the entire sentence) in downstream tasks. Motivated by the above analysis, we propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE). In the self-attention module, TUPE computes the word contextual correlation and the positional correlation separately, with different parameterizations, and then adds them together. This design removes the mixed and noisy correlations over heterogeneous embeddings and offers more expressiveness by using different projection matrices. Furthermore, TUPE unties the [CLS] symbol from other positions, making it easier to capture information from all positions. Extensive experiments and ablation studies on the GLUE benchmark demonstrate the effectiveness of the proposed method. Code and models are released at https://github.com/guolinke/TUPE.
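The separation described in the abstract can be illustrated with a minimal NumPy sketch for a single attention head. It is not the released implementation: the function names, the toy shapes, the 1/sqrt(2d) scaling, and the two scalars used to untie [CLS] (theta_from_cls, theta_to_cls) are assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np

def tied_scores(x, p, Wq, Wk):
    """Absolute positional encoding: add embeddings first, then project.

    Expanding (x + p) Wq ((x + p) Wk)^T yields four terms, two of which
    mix word and position embeddings.
    """
    d = Wq.shape[1]
    h = x + p
    return (h @ Wq) @ (h @ Wk).T / np.sqrt(d)

def untied_terms(x, p, Wq, Wk, Uq, Uk):
    """Word-to-word and position-to-position terms, computed separately
    with separate projection matrices (Wq, Wk for words; Uq, Uk for positions)."""
    word_term = (x @ Wq) @ (x @ Wk).T
    pos_term = (p @ Uq) @ (p @ Uk).T
    return word_term, pos_term

def untie_cls(pos_term, theta_from_cls, theta_to_cls):
    """One possible way to untie [CLS] (assumed to sit at index 0):
    overwrite its positional scores with two scalars (hypothetical names)."""
    out = pos_term.copy()
    out[:, 0] = theta_to_cls    # every token attending to [CLS]
    out[0, :] = theta_from_cls  # [CLS] attending to every token
    return out

def softmax(scores):
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Toy example: 5 tokens, model width 8, head width 4 (all values random).
rng = np.random.default_rng(0)
n, d_model, d_head = 5, 8, 4
x = rng.normal(size=(n, d_model))   # word embeddings
p = rng.normal(size=(n, d_model))   # positional embeddings
Wq, Wk, Uq, Uk = (rng.normal(size=(d_model, d_head)) for _ in range(4))

word_term, pos_term = untied_terms(x, p, Wq, Wk, Uq, Uk)
pos_term = untie_cls(pos_term, theta_from_cls=0.1, theta_to_cls=-0.2)
scale = 1.0 / np.sqrt(2 * d_head)   # assumed scaling for the sum of two terms
scores = scale * (word_term + pos_term)
attn = softmax(scores)              # each row sums to 1
```

In this sketch, tied_scores is included only for contrast: it shows where the word-to-position and position-to-word cross terms come from, while the untied variant keeps only the two homogeneous terms.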

01/07/2020

06/12/2020

Guolin Ke, Di He, Tie-Yan Liu

Similar Papers

CopyBERT: A Unified Approach to Question Generation with Self-Attention

Stalin Varanasi, Saadullah Amin, Guenter Neumann

CharBERT: Character-aware Pre-trained Language Model

Wentao Ma, Yiming Cui, Chenglei Si, Ting Liu, Shijin Wang, Guoping Hu

Self-Attention with Cross-Lingual Position Representation

Liang Ding, Longyue Wang, Dacheng Tao

natural tasks, WMT'17 tasks, Cross-Lingual Representation, Position encoding

Disentangled Face Attribute Editing via Instance-Aware Latent Space Search

Yuxuan Han, Jiaolong Yang, Ying Fu

Computer Vision, 2D and 3D Computer Vision, Explainable/Interpretable Machine Learning

Generating Dialogue Responses from a Semantic Latent Space

Wei-Jen Ko, Avik Ray, Yilin Shen, Hongxia Jin

generation responses, regression task, open-domain models, end-to-end classification

Probing BERT in Hyperbolic Spaces

Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing

Sentiment, Syntax, Probe, BERT, Hyperbolic

Language Through a Prism: A Spectral Approach for Multiscale Language Representations

Alex Tamkin, Dan Jurafsky, Noah Goodman

QuASE: Question-Answer Driven Sentence Encoding

Hangfeng He, Qiang Ning, Dan Roth

named recognition, NLP tasks, QuASE, QAMR

Deep subjecthood: Higher-order grammatical features in multilingual BERT

Isabel Papadimitriou, Ethan A. Chi, Richard Futrell, Kyle Mahowald

A Bilingual Generative Transformer for Semantic Sentence Embedding

John Wieting, Graham Neubig, Taylor Berg-Kirkpatrick

source separation, semantic encoding, data distributions, unsupervised evaluations

Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation

Minki Kang, Moonsu Han, Sung Ju Hwang

self-supervised pre-training, question answering, task, reinforcement learning

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Deep Learning - Generative Models and Autoencoders

Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks

Maurício Gruppi, Pin-Yu Chen, Sibel Adali

Bayesian Methods for Semi-supervised Text Annotation

Kristian Miok, Gregor Pirs, Marko Robnik-Sikonja

Merging Statistical Feature via Adaptive Gate for Improved Text Classification

Xianming Li, Zongxi Li, Haoran Xie, Qing Li

Pseudo-Masked Language Models for Unified Language Model Pre-Training

Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon

Applications - Language, Speech and Dialog

Interpretability for morphological inflection: From character-level predictions to subword-level rules

Tatyana Ruzsics, Olga Sozinova, Ximena Gutierrez-Vasques, Tanja Samardzic

Multi-resolution Annotations for Emoji Prediction

Weicheng Ma, Ruibo Liu, Lili Wang, Soroush Vosoughi

natural tasks, emojis, linguistic components, multi-class setting

Named entity recognition in multi-level contexts

Yubo Chen, Chuhan Wu, Tao Qi, Zhigang Yuan, Yongfeng Huang

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA

Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach

textvqa, visual question answering, vqa, vision and language, st-vqa, ocr-vqa, transformer, pointer network, ocr

Exploiting Semantic Relations for Fine-grained Entity Typing
