Telling BERT’s full story: From local attention to global aggregation

19/04/2021

Damian Pascual, Gino Brunner, Roger Wattenhofer

Abstract: We take a deep look into the behaviour of self-attention heads in the transformer architecture. Although recent work discourages the use of attention distributions for explaining a model’s behaviour, we show that attention distributions can nevertheless provide insights into the local behaviour of attention heads. We therefore propose a distinction between local patterns revealed by attention and global patterns that refer back to the input, and analyze BERT from both angles. We use gradient attribution to analyze how the output of an attention head depends on the input tokens, effectively extending the local attention-based analysis to account for the mixing of information throughout the transformer layers. We find a significant mismatch between attention and attribution distributions, caused by the mixing of context inside the model. We quantify this discrepancy and observe that, interestingly, some patterns persist across all layers despite the mixing.
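To make the local versus global distinction in the abstract concrete, the following is a minimal sketch on a toy model, not the authors' implementation. It assumes a two-layer, single-head self-attention stack with identity projections, and it approximates gradient attribution with central finite differences. All function names, the toy input `X`, and the total variation distance used as a discrepancy measure are illustrative assumptions and are not taken from the paper.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_weights(H, i):
    """Attention distribution of position i over all positions (toy: Q = K = H)."""
    scale = math.sqrt(len(H[0]))
    return softmax([dot(H[i], h) / scale for h in H])

def weighted_sum(H, w):
    """Context vector: attention-weighted sum of the rows of H (toy: V = H)."""
    return [sum(w[j] * H[j][k] for j in range(len(H))) for k in range(len(H[0]))]

def attention_layer(H):
    """Toy self-attention layer with a residual connection; this mixes context."""
    return [[h + c for h, c in zip(H[i], weighted_sum(H, attention_weights(H, i)))]
            for i in range(len(H))]

def head_output(X, i):
    """Output of the second-layer head at position i as a function of the raw input X."""
    H = attention_layer(X)
    return weighted_sum(H, attention_weights(H, i))

def attribution(X, i, eps=1e-5):
    """Global view: normalized gradient norm of head output i w.r.t. each input token.

    Gradients are approximated numerically (assumption: a real analysis would
    use automatic differentiation on the actual model).
    """
    scores = []
    for j in range(len(X)):
        squared = 0.0
        for k in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[j][k] += eps
            minus[j][k] -= eps
            yp, ym = head_output(plus, i), head_output(minus, i)
            squared += sum(((a - b) / (2 * eps)) ** 2 for a, b in zip(yp, ym))
        scores.append(math.sqrt(squared))
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical 4-token input with 3-dimensional embeddings.
X = [[1.0, 0.0, 0.5], [0.2, 1.0, -0.3], [-0.5, 0.4, 1.0], [0.9, -0.2, 0.1]]
i = 0
attn = attention_weights(attention_layer(X), i)  # local view: over hidden states
attr = attribution(X, i)                          # global view: over input tokens
tv = 0.5 * sum(abs(a - b) for a, b in zip(attn, attr))  # total variation distance
print("attention  :", [round(a, 3) for a in attn])
print("attribution:", [round(a, 3) for a in attr])
print("discrepancy:", round(tv, 3))
```

In this sketch the attention distribution is computed over the hidden states entering the second layer, while the attribution refers back to the original input tokens, so the two need not agree once the first layer has mixed context across positions.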

The talk and the corresponding paper were published at the EACL 2021 virtual conference.

Similar Papers

26/04/2020

On Identifiability in Transformers

Gino Brunner, Yang Liu, Damian Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer

Keywords: Self-attention, interpretability, identifiability, BERT, Transformer, NLP, explanation, gradient attribution

4:58

02/02/2021

Self-Attention Attribution: Interpreting Information Interactions Inside Transformer

Yaru Hao, Li Dong, Furu Wei, Ke Xu

16:26

04/07/2020

Quantifying Attention Flow in Transformers

Samira Abnar, Willem Zuidema

Keywords: Quantifying Transformers, quantifying information, Attention Transformers, Transformer model

6:24

06/12/2021

Do Vision Transformers See Like Convolutional Neural Networks?

Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

Keywords: deep learning, machine learning, transformers, vision, representation learning, transfer learning

13:13

05/01/2021

Context-Aware Domain Adaptation in Semantic Segmentation

Jinyu Yang, Weizhi An, Chaochao Yan, Peilin Zhao, Junzhou Huang

4:59

14/06/2020

Learning to Manipulate Individual Objects in an Image

Yanchao Yang, Yutong Chen, Stefano Soatto

Keywords: representation learning, disentangled, spatial disentanglement, unsupervised, spatially localized, object-centric, scene manipulation, independent factors, controllable factors, multiple objects

1:01

30/11/2020

Local Context Attention for Salient Object Segmentation

Jing Tan, Pengfei Xiong, Zhengyi Lv, Kuntao Xiao, Yuwen He

9:35

02/02/2021

Generative Partial Visual-Tactile Fused Object Clustering

Tao Zhang, Yang Cong, Gan Sun, Jiahua Dong, Yuyang Liu, Zhengming Ding

15:49

25/07/2020

Disentangled graph collaborative filtering

Xiang Wang, Hongye Jin, An Zhang, Xiangnan He, Tong Xu, Tat-Seng Chua

Keywords: explainable recommendation, disentangled representation learning, collaborative filtering, graph neural networks

15:17

22/09/2020

MEANTIME: Mixture of attention mechanisms with multi-temporal embeddings for sequential recommendation

Sung Min Cho, Eunhyeok Park, Sungjoo Yoo

Keywords: Self-attention, Sequential Recommendation, Temporal Embedding, BERT

3:10

06/12/2020

Model Agnostic Multilevel Explanations

Karthi Natesan Ramamurthy, Bhanu Vinzamuri, Yunfeng Zhang, Amit Dhurandhar

3:17

03/05/2021

Explaining the Efficacy of Counterfactually Augmented Data

Divyansh Kaushik, Amrith Setlur, Eduard H Hovy, Zachary Lipton

Keywords: sentiment analysis, text classification, natural language inference, annotation artifacts, humans in the loop

5:11

02/02/2021

Classification by Attention: Scene Graph Classification with Prior Knowledge

Sahand Sharifzadeh, Sina Moayed Baharlou, Volker Tresp

17:04

05/01/2021

Regional Attention Networks With Context-Aware Fusion for Group Emotion Recognition

Ahmed Shehab Khan, Zhiyuan Li, Jie Cai, Yan Tong

5:00

19/08/2021

Dependent Multi-Task Learning with Causal Intervention for Image Captioning

Wenqing Chen, Jidong Tian, Caoyun Fan, Hao He, Yaohui Jin

Keywords: Machine Learning, Transfer, Adaptation, Multi-task Learning, Natural Language Generation, Language and Vision

12:02

26/08/2020

Characterization of Overlap in Observational Studies

Michael Oberst, Fredrik Johansson, Dennis Wei, Tian Gao, Gabriel Brat, David Sontag, Kush Varshney

15:01

02/02/2021

The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT

Madhura Pande, Aakriti Budhraja, Preksha Nema, Pratyush Kumar, Mitesh M. Khapra

14:29

06/12/2021

Intriguing Properties of Contrastive Losses

Ting Chen, Calvin Luo, Lala Li

Keywords: self-supervised learning, vision, contrastive learning

13:36

19/08/2021

Context-Aware Image Inpainting with Learned Semantic Priors

Wendong Zhang, Junwei Zhu, Ying Tai, Yunbo Wang, Wenqing Chu, Bingbing Ni, Chengjie Wang, Xiaokang Yang

Keywords: Computer Vision, 2D and 3D Computer Vision, Deep Learning

13:26

16/11/2020

LOGAN: Local Group Bias Detection by Clustering

Jieyu Zhao, Kai-Wei Chang

Keywords: evaluating bias, toxicity classification, object tasks, machine techniques

7:16

05/01/2021

Facial Expression Recognition in the Wild via Deep Attentive Center Loss

Amir Hossein Farzaneh, Xiaojun Qi

4:59

14/06/2020

Hierarchical Pyramid Diverse Attention Networks for Face Recognition

Qiangchang Wang, Tianyi Wu, He Zheng, Guodong Guo

Keywords: diverse attention, pyramid attention, hierarchical bilinear pooling, local representations, multi-scale features, hierarchical information, pose variation, age gap, quality change, face recognition

1:01

22/09/2020

FISSA: Fusing item similarity models with self-attention networks for sequential recommendation

Jing Lin, Weike Pan, Zhong Ming

Keywords: Item Similarity Models, Sequential Recommendation, Gating Networks, Self-Attention

2:06

22/11/2021

Measuring the Biases and Effectiveness of Content-Style Disentanglement

Xiao Liu, Spyridon Thermos, Gabriele Valvano, Agisilaos Chartsias, Alison Q O'Neil, Sotirios Tsaftaris

Keywords: Disentangled Representations Learning, Content and Style Disentanglement, Metrics, Biases, Semantic Segmentation, Image to Image Translation, Pose Estimation

2:57

22/11/2021

Multi-Granularity Hypergraphs and Adversarial Complementary Learning for Person Re-identification

Yi Ma, Tian Bai, Wenyu Zhang, Jian Hu

Keywords: Person Re-Identification, Hypergraphs Learning, Adversarial Complementary Learning

2:40

05/01/2021

PDAN: Pyramid Dilated Attention Network for Action Detection

Rui Dai, Srijan Das, Luca Minciullo, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

5:00

18/07/2021

Structured World Belief for Reinforcement Learning in POMDP

Gautam Singh, Skand Peri, Junghyun Kim, Hyunseok Kim, Sungjin Ahn

Keywords: Deep Learning, Embedding and Representation learning

5:21

22/11/2021

Feature Fusion Vision Transformer for Fine-Grained Visual Categorization

Jun Wang, Xiaohan Yu, Yongsheng Gao

Keywords: Fine-grained visual categorization, Vision transformer, Self-attention, Feature Fusion

3:02

06/12/2020

Learning Semantic-aware Normalization for Generative Adversarial Networks

Heliang Zheng, Jianlong Fu, zengyh Zeng, Jiebo Luo, Zheng-Jun Zha

3:11

12/07/2020

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

Keywords: Sequential, Network, and Time-Series Modeling

12:37

06/12/2021

An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning

Tianpei Yang, Weixun Wang, Hongyao Tang, Jianye Hao, Zhaopeng Meng, Hangyu Mao, Dong Li, Wulong Liu, Yingfeng Chen, Yujing Hu, Changjie Fan, Chengwei Zhang

Keywords: reinforcement learning and planning, transfer learning

15:21

26/04/2020

Disentangling neural mechanisms for perceptual grouping

Junkyung Kim, Drew Linsley, Kalpit Thakkar, Thomas Serre

Keywords: Perceptual grouping, visual cortex, recurrent feedback, horizontal connections, top-down connections

5:16

30/11/2020

Localin Reshuffle Net: Toward Naturally and Efficiently Facial Image Blending

Chengyao Zheng, Siyu Xia, Joseph Robinson, Changsheng Lu, Wayne Wu, Chen Qian, Ming Shao

2:19

13/04/2021

Variational selective autoencoder: Learning from partially-observed heterogeneous data

Yu Gong, Hossein Hajimirsadeghi, Jiawei He, Thibaut Durand, Greg Mori

3:17

06/12/2021

Local Disentanglement in Variational Auto-Encoders Using Jacobian $L_1$ Regularization

Travers Rhodes, Daniel Lee

Keywords: representation learning

6:58

14/06/2020

Stochastic Classifiers for Unsupervised Domain Adaptation

Zhihe Lu, Yongxin Yang, Xiatian Zhu, Cong Liu, Yi-Zhe Song, Tao Xiang

Keywords: unsupervised domain adaptation, stochastic classifiers, adversarial learning, local alignment, multi-head network, object classification, semantic segmentation

1:00

02/02/2021

Bi-Classifier Determinacy Maximization for Unsupervised Domain Adaptation

Shuang Li, Fangrui Lv, Binhui Xie, Chi Harold Liu, Jian Liang, Chen Qin

14:07

02/02/2021

Looking Wider for Better Adaptive Representation in Few-Shot Learning

Jiabao Zhao, Yifan Yang, Xin Lin, Jing Yang, Liang He

16:58

02/02/2021

Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization

Shir Gur, Ameen Ali, Lior Wolf

14:14

03/05/2021

Spatially Structured Recurrent Modules

Nasim Rahaman, Anirudh Goyal, Waleed Gondal, Manuel Wuthrich, Stefan Bauer, Yash Sharma, Yoshua Bengio, Bernhard Schoelkopf

Keywords: spatio-temporal modelling, partially observed environments, recurrent neural networks, modular architectures

5:27