Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference

16/11/2020

Bang An, Jie Lyu, Zhenyi Wang, Chunyuan Li, Changwei Hu, Fei Tan, Ruiyi Zhang, Yifan Hu, Changyou Chen

Keywords: natural applications, attention collapse, neural mechanism, bayesian perspective

Abstract: The neural attention mechanism plays an important role in many natural language processing applications. In particular, multi-head attention extends single-head attention by allowing a model to jointly attend to information from different perspectives. However, without explicit constraints, multi-head attention may suffer from attention collapse, an issue that makes different heads extract similar attentive features, thus limiting the model's representation power. In this paper, for the first time, we provide a novel understanding of multi-head attention from a Bayesian perspective. Based on the recently developed particle-optimization sampling techniques, we propose a non-parametric approach that explicitly improves the repulsiveness in multi-head attention and consequently strengthens the model's expressiveness. Remarkably, our Bayesian interpretation offers theoretical insight into two poorly understood questions: why and how one uses multi-head attention. Extensive experiments on various attention models and applications demonstrate that the proposed repulsive attention can improve the learned feature diversity, leading to more informative representations with consistent performance improvement on multiple tasks.
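
The idea of attention collapse and repulsion between heads can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each head is summarized by one feature vector, measures collapse as the mean pairwise cosine similarity between heads, and uses an RBF-kernel repulsive term in the style of Stein variational gradient descent (one particle-optimization method) as a stand-in for the repulsive update. The function names and the `bandwidth` parameter are hypothetical.

```python
import numpy as np

def mean_pairwise_cosine(heads):
    """Mean cosine similarity over all distinct pairs of head vectors.

    heads: array of shape (num_heads, dim). Values near 1 suggest collapse.
    """
    unit = heads / np.linalg.norm(heads, axis=1, keepdims=True)
    sim = unit @ unit.T
    off_diagonal = ~np.eye(heads.shape[0], dtype=bool)
    return float(sim[off_diagonal].mean())

def rbf_repulsion(heads, bandwidth=1.0):
    """Repulsive direction for each head under an RBF kernel (assumed form).

    For head i this is (1/n) * sum_j k(x_i, x_j) * (x_i - x_j) / bandwidth**2,
    which pushes head i away from nearby heads.
    """
    diff = heads[:, None, :] - heads[None, :, :]      # diff[i, j] = x_i - x_j
    sq_dist = (diff ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return (kernel[:, :, None] * diff).sum(axis=1) / (bandwidth ** 2 * heads.shape[0])

if __name__ == "__main__":
    # Two nearly identical heads: a toy example of collapse.
    heads = np.array([[1.0, 0.0], [1.0, 0.1]])
    print("similarity before:", mean_pairwise_cosine(heads))
    heads = heads + rbf_repulsion(heads)              # one repulsive step
    print("similarity after: ", mean_pairwise_cosine(heads))
```

In a full particle-optimization update the repulsive term would be combined with a gradient of the log posterior; only the repulsive part is shown here.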

The talk and the respective paper are published at the EMNLP 2020 virtual conference.

Similar Papers

06/12/2021

Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity

Ran Liu, Mehdi Azabou, Max Dabagia, Chi-Heng Lin, Mohammad Gheshlaghi Azar, Keith Hengen, Michal Valko, Eva Dyer

Keywords: self-supervised learning, generative model, representation learning

19:38

14/06/2020

Modeling Biological Immunity to Adversarial Examples

Edward Kim, Jocelyn Rego, Yijing Watkins, Garrett T. Kenyon

Keywords: adversarial examples, sparse coding, retina, cortex, neuron, biology, robust, feedback

1:01

03/05/2021

Return-Based Contrastive Representation Learning for Reinforcement Learning

Guoqing Liu, Chuheng Zhang, Li Zhao, Tao Qin, Jinhua Zhu, Li Jian, Nenghai Yu, Tie-Yan Liu

Keywords: reinforcement learning, auxiliary task, contrastive learning, representation learning

5:20

19/04/2021

Measuring and improving faithfulness of attention in neural machine translation

Pooya Moradi, Nishant Kambhatla, Anoop Sarkar

10:55

14/06/2020

Towards Robust Image Classification Using Sequential Attention Models

Daniel Zoran, Mike Chrzanowski, Po-Sen Huang, Sven Gowal, Alex Mott, Pushmeet Kohli

Keywords: adversarial robustness, imagenet, attention, interpretability

1:01

02/02/2021

DIBS: Diversity Inducing Information Bottleneck in Model Ensembles

Samarth Sinha, Homanga Bharadhwaj, Anirudh Goyal, Hugo Larochelle, Animesh Garg, Florian Shkurti

16:26

05/01/2021

Self Supervision for Attention Networks

Badri N. Patro, Kasturi G.S., Ansh Jain, Vinay P. Namboodiri

5:01

06/12/2021

Associative Memories via Predictive Coding

Tommaso Salvatori, Yuhang Song, Yujian Hong, Lei Sha, Simon Frieder, Zhenghua Xu, Rafal Bogacz, Thomas Lukasiewicz

Keywords: deep learning, robustness, generative model

12:22

06/12/2021

A flow-based latent state generative model of neural population responses to natural images

Mohammad Bashiri, Edgar Walker, Konstantin-Klemens Lurz, Akshay Jagadish, Taliah Muhammad, Zhiwei Ding, Zhuokun Ding, Andreas Tolias, Fabian Sinz

Keywords: generative model

14:20

08/12/2020

Meet Changes with Constancy: Learning Invariance in Multi-Source Translation

Jianfeng Liu, Ling Luo, Xiang Ao, Yan Song, Haoran Xu, Jian Ye

13:35

06/12/2021

Explaining heterogeneity in medial entorhinal cortex with task-driven neural networks

Aran Nayebi, Alexander Attinger, Malcolm Campbell, Kiah Hardcastle, Isabel Low, Caitlin S Mallory, Gabriel Mel, Ben Sorscher, Alex H Williams, Surya Ganguli, Lisa Giocomo, Dan Yamins

Keywords: deep learning

14:10

06/12/2020

Dynamic allocation of limited memory resources in reinforcement learning

Nisheet Patel, Luigi Acerbi, Alexandre Pouget

3:19

14/06/2020

Non-Local Neural Networks With Grouped Bilinear Attentional Transforms

Lu Chi, Zehuan Yuan, Yadong Mu, Changhu Wang

Keywords: attention, non-local, bilinear, image classification, video classification, grouped, data-adaptive

1:01

18/07/2021

Bayesian Attention Belief Networks

Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou

Keywords: Applications, Program Understanding and Generation, Deep Learning, Bayesian Deep Learning

4:28

26/04/2020

Co-Attentive Equivariant Neural Networks: Focusing Equivariance On Transformations Co-Occurring in Data

David W. Romero, Mark Hoogendoorn

Keywords: Equivariant Neural Networks, Attention Mechanisms, Deep Learning

4:51

03/05/2021

Generalized Multimodal ELBO

Thomas Sutter, Imant Daunhawer, Julia E Vogt

Keywords: self-supervised, generative learning, ELBO, VAE, Multimodal

5:15

19/08/2021

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng

Keywords: Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

13:14

06/12/2020

Bayesian Attention Modules

Xinjie Fan, Shujian Zhang, Bo Chen, Mingyuan Zhou

3:32

06/12/2020

Learning identifiable and interpretable latent models of high-dimensional neural activity using pi-VAE

Ding Zhou, Xue-Xin Wei

Keywords: Reinforcement Learning and Planning -> Multi-Agent RL, Theory -> Game Theory and Computational Economics

3:22

01/07/2020

How Self-Attention Improves Rare Class Performance in a Question-Answering Dialogue Agent

Adam Stiff, Qi Song, Eric Fosler-Lussier

7:55

04/07/2020

Effective Estimation of Deep Generative Language Models

Tom Pelsmaeker, Wilker Aziz

Keywords: Estimation Models, parameterisation models, posterior collapse, language modelling

12:19

06/12/2021

What Makes Multi-Modal Learning Better than Single (Provably)

Yu Huang, Chenzhuang Du, Zihui Xue, Xuanyao Chen, Hang Zhao, Longbo Huang

Keywords: theory, deep learning

13:23

12/07/2020

Adaptive Adversarial Multi-task Representation Learning

Yuren Mao, Weiwei Liu, Xuemin Lin

Keywords: Transfer, Multitask and Meta-learning

13:24

06/12/2021

Noether Networks: meta-learning useful conserved quantities

Ferran Alet, Dylan Doblar, Allan Zhou, Josh Tenenbaum, Kenji Kawaguchi, Chelsea Finn

Keywords: machine learning, vision, meta learning

11:18

12/07/2020

Learning Representations that Support Extrapolation

Taylor Webb, Zachary Dulberg, Steven Frankland, Alexander Petrov, Randall O'Reilly, Jonathan Cohen

Keywords: Deep Learning - Algorithms

12:45

26/08/2020

Independent Subspace Analysis for Unsupervised Learning of Disentangled Representations

Jan Stuehmer, Richard Turner, Sebastian Nowozin

11:43

26/04/2020

Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks

Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, John E. Hopcroft

Keywords: adversarial examples, adversarial attack, transferability, Nesterov accelerated gradient, scale invariance

3:59

14/06/2020

Deep Homography Estimation for Dynamic Scenes

Hoang Le, Feng Liu, Shu Zhang, Aseem Agarwala

Keywords: homography estimation, dynamic scenes, motion estimation, multi-task learning, deep learning

1:01

02/02/2021

Longitudinal Deep Kernel Gaussian Process Regression

Junjie Liang, Yanting Wu, Dongkuan Xu, Vasant G Honavar

16:27

06/12/2020

A new inference approach for training shallow and deep generalized linear models of noisy interacting neurons

Gabriel Mahuas, Giulio Isacchini, Olivier Marre, Ulisse Ferrari, Thierry Mora

3:07

04/07/2020

Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence

Xiaoyu Shen, Ernie Chang, Hui Su, Cheng Niu, Dietrich Klakow

Keywords: Neural Generation, Segmentation, data-to-text tasks, neural model

9:09

19/08/2021

Contrastive Model Inversion for Data-Free Knowledge Distillation

Gongfan Fang, Jie Song, Xinchao Wang, Chengchao Shen, Xingen Wang, Mingli Song

Keywords: Machine Learning, Deep Learning, Explainable/Interpretable Machine Learning, Transfer, Adaptation, Multi-task Learning

5:51

26/04/2020

Federated Adversarial Domain Adaptation

Xingchao Peng, Zijun Huang, Yizhe Zhu, Kate Saenko

Keywords: Federated Learning, Domain Adaptation, Transfer Learning, Feature Disentanglement

4:57

04/07/2020

Multi-Domain Dialogue Acts and Response Co-Generation

Kai Wang, Junfeng Tian, Rui Wang, Xiaojun Quan, Jianxing Yu

Keywords: Generating responses, task-oriented systems, response generation, automatic evaluations

10:01

14/06/2020

Visual Commonsense R-CNN

Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun

Keywords: visual commonsense learning, causal inference, un-/self-supervised learning, visual representation learning, vision and language

1:01

06/12/2021

Estimating the Unique Information of Continuous Variables

Ari Pakman, Amin Nejatbakhsh, Dar Gilboa, Abdullah Makkeh, Luca Mazzucato, Michael Wibral, Elad Schneidman

Keywords: deep learning, optimization, generative model

12:39

12/07/2020

Enhancing Simple Models by Exploiting What They Already Know

Amit Dhurandhar, Karthikeyan Shanmugam, Ronny Luss

Keywords: Supervised Learning

13:57

18/07/2021

Leveraging Sparse Linear Layers for Debuggable Deep Networks

Eric Wong, Shibani Santurkar, Aleksander Madry

Keywords: Deep Learning

17:01

12/07/2020

Learning Attentive Meta-Transfer

Jaesik Yoon, Gautam Singh, Sungjin Ahn

Keywords: Sequential, Network, and Time-Series Modeling

15:22

03/05/2021

Learning to Sample with Local and Global Contexts in Experience Replay Buffer

Youngmin Oh, Kimin Lee, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

Keywords: reinforcement learning, off-policy RL, experience replay buffer

5:20