03/05/2021

On the Dynamics of Training Attention Models

Haoye Lu, Yongyi Mao, Amiya Nayak

Abstract: The attention mechanism has been widely used as a component of deep neural networks and has become a critical building block of many state-of-the-art natural language models. Despite its great empirical success, the working mechanism of attention has not yet been investigated in sufficient theoretical depth. In this paper, we set up a simple text classification task and study the dynamics of training a simple attention-based classification model using gradient descent. In this setting, we show that, for each discriminative word that the model should attend to, a persistent identity relates its embedding to the inner product of its key and the query. This allows us to prove that, when the attention output is classified by a linear classifier, training must converge to attending to the discriminative words. Experiments validate our theoretical analysis and provide further insights.
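To make the setting concrete, below is a minimal sketch (not the authors' code) of the kind of model the abstract describes: word embeddings, a single learned query whose inner product with each word's key gives the attention score, and a linear classifier on the softmax-weighted attention output, trained by plain gradient descent. All names and hyperparameters (SimpleAttentionClassifier, the key map, dim=32, lr=0.1, the toy batch) are illustrative assumptions, not the paper's exact specification.

    # Hypothetical sketch of a single-query attention classifier;
    # names and hyperparameters are illustrative, not the authors' code.
    import torch
    import torch.nn as nn

    class SimpleAttentionClassifier(nn.Module):
        def __init__(self, vocab_size: int, dim: int, num_classes: int = 2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)     # word embeddings
            self.key = nn.Linear(dim, dim, bias=False)     # key map (assumed linear)
            self.query = nn.Parameter(torch.randn(dim))    # single learned query
            self.classifier = nn.Linear(dim, num_classes)  # linear classifier

        def forward(self, tokens):
            x = self.embed(tokens)                         # (batch, seq, dim)
            scores = self.key(x) @ self.query              # <key_i, query> per word
            attn = torch.softmax(scores, dim=-1)           # attention weights
            context = (attn.unsqueeze(-1) * x).sum(dim=1)  # attention output
            return self.classifier(context)                # class logits

    # Training with plain (stochastic) gradient descent on a toy batch.
    model = SimpleAttentionClassifier(vocab_size=1000, dim=32)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    tokens = torch.randint(0, 1000, (8, 12))  # 8 sequences of 12 word ids
    labels = torch.randint(0, 2, (8,))        # binary labels
    for _ in range(100):
        opt.zero_grad()
        loss_fn(model(tokens), labels).backward()
        opt.step()

Under the paper's result, gradient descent on a model of this form should converge to placing its attention weight on the discriminative words, with the stated identity between each such word's embedding and its key-query inner product persisting throughout training.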

The talk and the paper were published at the ICLR 2021 virtual conference.
