Self-Attention is Not Only a Weight: Analyzing BERT with Vector Norms

04/07/2020

Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui

Keywords: BERT, Self-attention modules, Transformer-based models, output modules

Abstract: Self-attention modules are essential building blocks of Transformer-based language models and hence are the subject of a large number of studies aiming to discover which linguistic capabilities these models possess (Rogers et al., 2020). Such studies are commonly conducted by analyzing correlations of attention weights with specific linguistic phenomena. In this paper, we show that attention weights are only one of two factors determining the output of self-attention modules, and we propose to incorporate the other factor, namely the norm of the transformed input vectors, into the analysis as well. Our analysis of self-attention modules in BERT (Devlin et al., 2019) shows that the proposed method produces insights that better agree with linguistic intuitions than an analysis based on attention weights alone. Our analysis further reveals that BERT controls the amount of contribution from frequent informative and less informative tokens not by attention weights but via vector norms.
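The two factors named in the abstract can be illustrated with a minimal NumPy sketch. It assumes a single attention head whose output for token i is the weighted sum of alpha[i, j] * f(x_j) over all tokens j, with f(x_j) = x_j W_v standing in for the transformed input vector. The function name, the shapes, and the random matrices are hypothetical and are not taken from the paper or from BERT itself.

```python
import numpy as np


def norm_based_analysis(X, W_q, W_k, W_v):
    """Single-head self-attention, returning weight-based and norm-based views.

    X: (n, d) token vectors; W_q, W_k: (d, d_k); W_v: (d, d_v).
    All inputs are illustrative placeholders, not BERT parameters.
    """
    d_k = W_q.shape[1]
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d_k)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)     # attention weights

    f_x = X @ W_v                                        # transformed inputs f(x_j)
    f_norm = np.linalg.norm(f_x, axis=1)                 # ||f(x_j)||

    # Since alpha >= 0, ||alpha_ij * f(x_j)|| = alpha_ij * ||f(x_j)||.
    contrib = alpha * f_norm[np.newaxis, :]
    output = alpha @ f_x                                 # sum_j alpha_ij * f(x_j)
    return alpha, f_norm, contrib, output


rng = np.random.default_rng(0)
n, d, d_k, d_v = 5, 8, 4, 4
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, m)) for m in (d_k, d_k, d_v))

alpha, f_norm, contrib, output = norm_based_analysis(X, W_q, W_k, W_v)
print("attention weights for token 0:", np.round(alpha[0], 3))
print("norm-based contributions:     ", np.round(contrib[0], 3))
```

Comparing the two printed rows shows how a token with a large attention weight can still contribute little when its transformed vector has a small norm, which is the kind of effect the abstract attributes to BERT.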

The talk and the corresponding paper were published at the ACL 2020 virtual conference.

Similar Papers

01/07/2020

CopyBERT: A Unified Approach to Question Generation with Self-Attention

Stalin Varanasi, Saadullah Amin, Guenter Neumann

12:35

02/02/2021

Self-Attention Attribution: Interpreting Information Interactions Inside Transformer

Yaru Hao, Li Dong, Furu Wei, Ke Xu

16:26

19/04/2021

Deep subjecthood: Higher-order grammatical features in multilingual BERT

Isabel Papadimitriou, Ethan A. Chi, Richard Futrell, Kyle Mahowald

11:56

02/02/2021

Have We Solved The Hard Problem? It’s Not Easy! Contextual Lexical Contrast as a Means to Probe Neural Coherence

Wenqiang Lei, Yisong Miao, Runpeng Xie, Bonnie Webber, Meichun Liu, Tat-Seng Chua, Nancy F. Chen

18:55

02/02/2021

Merging Statistical Feature via Adaptive Gate for Improved Text Classification

Xianming Li, Zongxi Li, Haoran Xie, Qing Li

14:56

08/12/2020

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

13:01

06/12/2021

Can fMRI reveal the representation of syntactic structure in the brain?

Aniketh Janardhan Reddy, Leila Wehbe

Keywords: neuroscience, graph learning

15:02

04/07/2020

On the Linguistic Representational Power of Neural Machine Translation Models

Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass

Keywords: Linguistic Models, natural processing, artificial intelligence, translating languages

19:17

04/07/2020

QuASE: Question-Answer Driven Sentence Encoding

Hangfeng He, Qiang Ning, Dan Roth

Keywords: named recognition, NLP tasks, QuASE, QAMR

11:05

16/11/2020

Syntactic Structure Distillation Pretraining for Bidirectional Encoders

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Keywords: bert pretraining, structured tasks, natural understanding, textual learners

12:23

03/05/2021

Probing BERT in Hyperbolic Spaces

Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing

Keywords: Sentiment, Syntax, Probe, BERT, Hyperbolic

5:10

02/02/2021

Context-Guided BERT for Targeted Aspect-Based Sentiment Analysis

Zhengxuan Wu, Desmond C. Ong

15:17

14/06/2020

Attention-Guided Hierarchical Structure Aggregation for Image Matting

Yu Qiao, Yuhao Liu, Xin Yang, Dongsheng Zhou, Mingliang Xu, Qiang Zhang, Xiaopeng Wei

Keywords: image matting, attention, hierarchical, aggregation, appearance cues

0:59

16/11/2020

Probing Pretrained Language Models for Lexical Semantics

Ivan Vulić, Edoardo Maria Ponti, Robert Litschko, Goran Glavaš, Anna Korhonen

Keywords: lexical tasks, pretrained models, lms, lexical strategies

12:17

26/04/2020

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

Keywords: energy-based models, text generation

4:59

02/02/2021

The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT

Madhura Pande, Aakriti Budhraja, Preksha Nema, Pratyush Kumar, Mitesh M. Khapra

14:29

22/06/2020

Exploiting Semantic Relations for Fine-grained Entity Typing

Hongliang Dai, Yangqiu Song, Xin Li

Keywords: Fine-grained Entity Typing, Hypernym Extraction, Semantic Role Labeling

4:45

16/11/2020

Towards Interpreting BERT for Reading Comprehension Based QA

Sahana Ramnath, Preksha Nema, Deep Sahni, Mitesh M. Khapra

Keywords: nlp tasks, reading answering, contextual understanding, answer prediction

7:02

02/02/2021

Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation

Ieva Staliūnaitė, Philip John Gorinski, Ignacio Iacobacci

16:40

03/05/2021

Grounding Language to Autonomously-Acquired Skills via Goal Generation

Ahmed Akakzia, Cédric Colas, Pierre-Yves Oudeyer, Mohamed CHETOUANI, Olivier Sigaud

Keywords: intrinsic motivations, Deep reinforcement learning, autonomous learning, symbolic representations

5:01

02/02/2021

Making the Relation Matters: Relation of Relation Learning Network for Sentence Semantic Matching

Kun Zhang, Le Wu, Guangyi Lv, Meng Wang, Enhong Chen, Shulan Ruan

15:16

04/07/2020

How does BERT's attention change when you fine-tune? An analysis methodology and a case study in negation scope

Yiyun Zhao, Steven Bethard

Keywords: downstream task, NLP problems, knowledge-related tasks, downstream tasks

11:43

16/11/2020

Generating Dialogue Responses from a Semantic Latent Space

Wei-Jen Ko, Avik Ray, Yilin Shen, Hongxia Jin

Keywords: generation responses, regression task, open-domain models, end-to-end classification

11:26

04/07/2020

Why is penguin more similar to polar bear than to sea gull? Analyzing conceptual knowledge in distributional models

Pia Sommerauer

Keywords: word ing, distributional models, BERT, ELMO

11:17

30/11/2020

Show, Conceive and Tell: Image Captioning with Prospective Linguistic Information

Yiqing Huang, Jiansheng Chen

7:08

26/04/2020

Incorporating BERT into Neural Machine Translation

Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tieyan Liu

Keywords: BERT, Neural Machine Translation

4:47

08/12/2020

Linguistic Profiling of a Neural Language Model

Alessio Miaschi, Dominique Brunato, Felice Dell’Orletta, Giulia Venturi

14:06

04/07/2020

Information-Theoretic Probing for Linguistic Structure

Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, Ryan Cotterell

Keywords: Information-Theoretic Probing, NLP tasks, linguistic task, probing

10:30

16/11/2020

Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank

Eleftheria Briakou, Marine Carpuat

Keywords: detecting content, cross-lingual nlp, machine problem, annotation

11:06

08/12/2020

Syntactically Aware Cross-Domain Aspect and Opinion Terms Extraction

Oren Pereg, Daniel Korat, Moshe Wasserblat

7:46

16/11/2020

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

Keywords: language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

12:07

01/07/2020

Syntactic Parsing in Humans and Machines

Paola Merlo

44:12

03/08/2020

TX-Ray: Quantifying and Explaining Model-Knowledge Transfer in (Un-)Supervised NLP

Nils Rethmeier, Vageesh Kumar Saxena, Isabelle Augenstein

7:32

16/11/2020

An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction

Bhargavi Paranjape, Mandar Joshi, John Thickstun, Hannaneh Hajishirzi, Luke Zettlemoyer

Keywords: language understanding, semi-supervised setting, complex models, explainer

11:44

04/07/2020

Recurrent Neural Network Language Models Always Learn English-Like Relative Clause Attachment

Forrest Davis, Marten van Schijndel

Keywords: production, Recurrent Always, language models, RNN LMs

7:48

16/11/2020

A Bilingual Generative Transformer for Semantic Sentence Embedding

John Wieting, Graham Neubig, Taylor Berg-Kirkpatrick

Keywords: source separation, semantic encoding, data distributions, unsupervised evaluations

14:32

02/02/2021

KEML: A Knowledge-Enriched Meta-Learning Framework for Lexical Relation Classification

Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He

15:47

05/12/2020

DAPPER: Learning domain-adapted persona representation using pretrained BERT and external memory

Prashanth Vijayaraghavan, Eric Chu, Deb Roy

14:48

04/07/2020

A Frame-based Sentence Representation for Machine Reading Comprehension

Shaoru Guo, Ru Li, Hongye Tan, Xiaoli Li, Yong Guan, Hongyan Zhao, Yueping Zhang

Keywords: Machine Comprehension, Sentence representation, SR, Machine MRC

6:33

16/11/2020

Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis

Yao-Hung Hubert Tsai, Martin Ma, Muqiao Yang, Ruslan Salakhutdinov, Louis-Philippe Morency

Keywords: human-centric tasks, sentiment analysis, emotion recognition, multimodal learning

10:54