On Position Embeddings in BERT

03/05/2021

Benyou Wang, Lifeng Shang, Christina Lioma, Xin Jiang, Hao Yang, Qun Liu, Jakob Simonsen

Keywords: pretrained language model, Position Embedding, BERT

Abstract: Various Position Embeddings (PEs) have been proposed in Transformer-based architectures (e.g., BERT) to model word order. These are empirically driven and perform well, but no formal framework exists to systematically study them. To address this, we present three properties of PEs that capture word distance in vector space: translation invariance, monotonicity, and symmetry. These properties formally capture the behaviour of PEs and allow us to reinterpret sinusoidal PEs in a principled way. Moreover, we propose a new probing test (called "identical word probing") and mathematical indicators to quantitatively detect the general attention patterns with respect to the above properties. An empirical evaluation of seven PEs (and their combinations) for classification (GLUE) and span prediction (SQuAD) shows that: (1) both classification and span prediction benefit from translation invariance and local monotonicity, while symmetry slightly decreases performance; (2) the fully learnable absolute PE performs better in classification, while relative PEs perform better in span prediction. We contribute the first formal and quantitative analysis of desiderata for PEs, and a principled discussion about their correlation to the performance of typical downstream tasks.

The talk and the respective paper were published at the ICLR 2021 virtual conference.

Similar Papers

16/11/2020

Unified Feature and Instance Based Domain Adaptation for Aspect-Based Sentiment Analysis

Chenggong Gong, Jianfei Yu, Rui Xia

Keywords: aspect-based analysis, absa task, feature-based adaptation, auxiliary tasks

12:12

08/12/2020

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

13:01

05/01/2021

Class-Wise Metric Scaling for Improved Few-Shot Classification

Ge Liu, Linglan Zhao, Wei Li, Dashan Guo, Xiangzhong Fang

5:01

08/12/2020

Syntactically Aware Cross-Domain Aspect and Opinion Terms Extraction

Oren Pereg, Daniel Korat, Moshe Wasserblat

7:46

14/06/2020

Attention-Guided Hierarchical Structure Aggregation for Image Matting

Yu Qiao, Yuhao Liu, Xin Yang, Dongsheng Zhou, Mingliang Xu, Qiang Zhang, Xiaopeng Wei

Keywords: image matting, attention, hierarchical, aggregation, appearance cues

0:59

16/11/2020

On the Sentence Embeddings from Pre-trained Language Models

Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li

Keywords: natural processing, semantic task, semantic tasks, pre-trained representations

9:11

02/02/2021

Have We Solved The Hard Problem? It’s Not Easy! Contextual Lexical Contrast as a Means to Probe Neural Coherence

Wenqiang Lei, Yisong Miao, Runpeng Xie, Bonnie Webber, Meichun Liu, Tat-Seng Chua, Nancy F. Chen

18:55

02/02/2021

IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization

Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren

16:25

25/07/2020

A pairwise probe for understanding BERT fine-tuning on machine reading comprehension

Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

Keywords: machine reading comprehension, pairwise, fine-tune, BERT

6:38

02/02/2021

Circles are like Ellipses, or Ellipses are like Circles? Measuring the Degree of Asymmetry of Static and Contextual Word Embeddings and the Implications to Representation Learning

Wei Zhang, Murray Campbell, Yang Yu, Sadhana Kumaravel

13:34

04/07/2020

RPD: A Distance Function Between Word Embeddings

Xuhui Zhou, Shujian Huang, Zaixiang Zheng

Keywords: RPD, Word Embeddings, training processes, Relative Distance

11:13

03/05/2021

Isotropy in the Contextual Embedding Space: Clusters and Manifolds

Xingyu Cai, Jiaji Huang, Yuchen Bian, Kenneth Church

Keywords: Clusters, Isotropy, Contextual embedding space, Manifolds

4:49

02/11/2020

Conformer-based sound event detection with semi-supervised learning and data augmentation

Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda

14:29

06/12/2020

Incorporating BERT into Parallel Sequence Decoding with Adapters

Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong Chen

3:17

06/12/2021

Few-Shot Segmentation via Cycle-Consistent Transformer

Gengwei Zhang, Guoliang Kang, Yi Yang, Yunchao Wei

Keywords: transformers, vision, few shot learning

11:58

06/12/2021

Contrastive Learning for Neural Topic Model

Thong Nguyen, Anh Tuan Luu

Keywords: optimization, contrastive learning

10:12

06/12/2020

MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

3:23

04/07/2020

Integrating Multimodal Information in Large Pretrained Transformers

Wasifur Rahman, Md Kamrul Hasan, Sangwu Lee, AmirAli Bagher Zadeh, Chengfeng Mao, Louis-Philippe Morency, Ehsan Hoque

Keywords: NLP, lexical applications, modeling communication, multimodal analysis

10:58

19/04/2021

WiC-TSV: An evaluation benchmark for target sense verification of words in context

Anna Breit, Artem Revenko, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados

9:54

26/08/2020

Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations

Sidak Pal Singh, Andreas Hug, Aymeric Dieuleveut, Martin Jaggi

14:15

04/07/2020

Modelling Context and Syntactical Features for Aspect-based Sentiment Analysis

Minh Hieu Phan, Philip O. Ogunbona

Keywords: Modelling Context, Aspect-based Analysis, aspect extraction, aspect classification

11:45

05/01/2021

Towards Fair Cross-Domain Adaptation via Generative Learning

Tongxin Wang, Zhengming Ding, Wei Shao, Haixu Tang, Kun Huang

4:56

02/02/2021

Making the Relation Matters: Relation of Relation Learning Network for Sentence Semantic Matching

Kun Zhang, Le Wu, Guangyi Lv, Meng Wang, Enhong Chen, Shulan Ruan

15:16

16/11/2020

Information-Theoretic Probing with Minimum Description Length

Elena Voita, Ivan Titov

Keywords: random tasks, estimating mdl, representations, pretrained representations

11:29

01/07/2020

Learning Probabilistic Sentence Representations from Paraphrases

Mingda Chen, Kevin Gimpel

5:00

06/12/2020

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang

2:42

02/02/2021

Merging Statistical Feature via Adaptive Gate for Improved Text Classification

Xianming Li, Zongxi Li, Haoran Xie, Qing Li

14:56

03/05/2021

Probing BERT in Hyperbolic Spaces

Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing

Keywords: Sentiment, Syntax, Probe, BERT, Hyperbolic

5:10

03/05/2021

Prototypical Representation Learning for Relation Extraction

Ning Ding, Xiaobin Wang, Yao Fu, Guangwei Xu, Rui Wang, Pengjun Xie, Ying Shen, Fei Huang, Hai-Tao Zheng, Rui Zhang

Keywords: NLP, Representation Learning, Relation Extraction

5:14

02/02/2021

A Unified Taylor Framework for Revisiting Attribution Methods

Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Xia Hu

16:18

05/12/2020

Beyond fine-tuning: Few-sample sentence embedding transfer

Siddhant Garg, Rohit Kumar Sharma, Yingyu Liang

9:56

04/07/2020

Evaluating Explanation Methods for Neural Machine Translation

Jierui Li, Lemao Liu, Huayang Li, Guanlin Li, Guoping Huang, Shuming Shi

Keywords: Neural Translation, translation tasks, Explanation Methods, black-box models

10:55

16/11/2020

An Unsupervised Sentence Embedding Method by Mutual Information Maximization

Yan Zhang, Ruidan He, Zuozhu Liu, Kwan Hui Lim, Lidong Bing

Keywords: sentence-pair tasks, clustering, semantic search, downstream tasks

12:22

30/11/2020

Reconstructing Human Body Mesh from Point Clouds by Adversarial GP Network

Boyao Zhou, Jean-Sebastien Franco, Federica Bogo, Bugra Tekin, Edmond Boyer

7:09

26/04/2020

Incorporating BERT into Neural Machine Translation

Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tieyan Liu

Keywords: BERT, Neural Machine Translation

4:47

19/10/2020

Enhance prototypical network with text descriptions for few-shot relation classification

Kaijia Yang, Nantao Zheng, Xinyu Dai, Liang He, Shujian Huang, Jiajun Chen

Keywords: text description, relation extraction, few shot

6:55

08/12/2020

SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP

Katsuki Chousa, Masaaki Nagata, Masaaki Nishino

14:39

26/08/2020

Neural Topic Model with Attention for Supervised Learning

Xinyi Wang, Yi Yang

12:39

06/12/2021

Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization

Yusuke Iwasawa, Yutaka Matsuo

Keywords: deep learning, optimization, transformers, domain adaptation

13:50

30/11/2020

VAN: Versatile Affinity Network for End-to-end Online Multi-Object Tracking

Hyemin Lee, Inhan Kim, Daijin Kim

9:04