Linear Transformers Are Secretly Fast Weight Programmers

18/07/2021

Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber

Keywords: Deep Learning

Abstract: We show the formal equivalence of linearised self-attention mechanisms and fast weight controllers from the early '90s, where a slow neural net learns by gradient descent to program the fast weights of another net through sequences of elementary programming instructions which are additive outer products of self-invented activation patterns (today called keys and values). Such Fast Weight Programmers (FWPs) learn to manipulate the contents of a finite memory and dynamically interact with it. We infer a memory capacity limitation of recent linearised softmax attention variants, and replace the purely additive outer products by a delta rule-like programming instruction, such that the FWP can more easily learn to correct the current mapping from keys to values. The FWP also learns to compute dynamically changing learning rates. We also propose a new kernel function to linearise attention which balances simplicity and effectiveness. We conduct experiments on synthetic retrieval problems as well as standard machine translation and language modelling tasks which demonstrate the benefits of our methods.
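The equivalence and the delta-rule update described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, and every function name in it is hypothetical. `additive_fwp` programs a fast weight matrix W with additive outer products v φ(k)ᵀ and reads it out as W φ(q). `causal_linear_attention` computes unnormalised causal linear attention directly, and the two functions return the same outputs. `delta_fwp` replaces the additive write with W ← W + β (v − W φ(k)) φ(k)ᵀ, so writing to an existing key corrects the stored value instead of adding to it. Assumptions: the feature map `elu_plus_one` (ELU + 1) is a common stand-in, not the new kernel the paper proposes. The attention normalisation is omitted. The learning rates `betas` are passed in as plain numbers, whereas in the model the slow net would compute them. `unit_norm` is an illustrative normalisation used only for the overwrite demo.

```python
import math
import random


def elu_plus_one(x):
    # Stand-in feature map phi(x) = ELU(x) + 1, elementwise and always positive.
    # Assumption: this is NOT the new kernel proposed in the paper.
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def unit_norm(phi):
    # Wrap a feature map so its output has unit L2 norm (illustrative choice).
    def wrapped(x):
        f = phi(x)
        n = math.sqrt(sum(fi * fi for fi in f))
        return [fi / n for fi in f]
    return wrapped


def matvec(W, x):
    # W @ x, with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add_outer(W, a, b, scale=1.0):
    # In place: W <- W + scale * outer(a, b).
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            W[i][j] += scale * ai * bj


def additive_fwp(keys, values, queries, phi=elu_plus_one):
    # Fast weights programmed by purely additive outer products.
    W = [[0.0] * len(phi(keys[0])) for _ in values[0]]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        add_outer(W, v, phi(k))            # write: W <- W + v phi(k)^T
        outputs.append(matvec(W, phi(q)))  # read:  y = W phi(q)
    return outputs


def causal_linear_attention(keys, values, queries, phi=elu_plus_one):
    # Unnormalised causal linear attention: y_t = sum_{s<=t} (phi(q_t) . phi(k_s)) v_s
    outputs = []
    for t, q in enumerate(queries):
        fq = phi(q)
        y = [0.0] * len(values[0])
        for s in range(t + 1):
            score = sum(a * b for a, b in zip(fq, phi(keys[s])))
            y = [yi + score * vi for yi, vi in zip(y, values[s])]
        outputs.append(y)
    return outputs


def delta_fwp(keys, values, queries, betas, phi=elu_plus_one):
    # Delta-rule-like write: W <- W + beta * (v - W phi(k)) phi(k)^T
    W = [[0.0] * len(phi(keys[0])) for _ in values[0]]
    outputs = []
    for k, v, q, beta in zip(keys, values, queries, betas):
        fk = phi(k)
        v_old = matvec(W, fk)  # value currently retrieved for key k
        add_outer(W, [vi - vo for vi, vo in zip(v, v_old)], fk, scale=beta)
        outputs.append(matvec(W, phi(q)))
    return outputs


if __name__ == "__main__":
    random.seed(0)
    ks = [[random.gauss(0, 1) for _ in range(3)] for _ in range(5)]
    vs = [[random.gauss(0, 1) for _ in range(2)] for _ in range(5)]
    qs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(5)]
    # The fast-weight view and the attention view agree up to rounding.
    print(additive_fwp(ks, vs, qs)[-1])
    print(causal_linear_attention(ks, vs, qs)[-1])

    # Write two different values under the same key, then query that key.
    phi = unit_norm(elu_plus_one)
    k = [0.5, -1.0, 2.0]
    keys, values = [k, k], [[1.0, 0.0], [0.0, 1.0]]
    print(delta_fwp(keys, values, keys, [1.0, 1.0], phi)[-1])  # approximately [0.0, 1.0]
    print(additive_fwp(keys, values, keys, phi)[-1])           # approximately [1.0, 1.0]
```

Querying the repeated key returns roughly the sum of both values under the additive rule, but only the most recent value under the delta rule with β = 1. This is the kind of key-to-value correction the abstract describes. A smaller β only partially overwrites the stored association.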

The talk and the accompanying paper were published at the ICML 2021 virtual conference.

Similar Papers

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

02/02/2021

Fine-grained Generalization Analysis of Vector-Valued Learning

Liang Wu, Antoine Ledent, Yunwen Lei, Marius Kloft

06/12/2021

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin, Dmitry Vetrov

Keywords: deep learning, optimization

06/12/2020

Learning Differentiable Programs with Admissible Neural Heuristics

Ameesh Shah, Eric Zhan, Jennifer Sun, Abhinav Verma, Yisong Yue, Swarat Chaudhuri

Keywords: Algorithms -> Missing Data; Algorithms -> Uncertainty Estimation; Probabilistic Methods -> Causal Inference; Probabilistic Meth, Probabilistic Methods -> Bayesian Nonparametrics

03/05/2021

Meta-Learning with Neural Tangent Kernels

Yufan Zhou, Zhenyi Wang, Jiayi Xian, Changyou Chen, Jinhui Xu

Keywords: neural tangent kernel, meta-learning

13/04/2021

A theoretical characterization of semi-supervised learning with self-training for gaussian mixture models

Samet Oymak, Talha Cihad Gulcu

03/05/2021

Few-Shot Bayesian Optimization with Deep Kernel Surrogates

Martin Wistuba, Josif Grabocka

Keywords: automl, bayesian optimization, metalearning, few-shot learning

18/07/2021

Efficient Iterative Amortized Inference for Learning Symmetric and Disentangled Multi-Object Representations

Patrick Emami, Pan He, Sanjay Ranka, Anand Rangarajan

Keywords: Deep Learning, Embedding and Representation learning

03/05/2021

Initialization and Regularization of Factorized Neural Layers

Misha Khodak, Neil Tenenholtz, Lester Mackey, Nicolo Fusi

Keywords: matrix factorization, knowledge distillation, multi-head attention, model compression

18/07/2021

Provable Meta-Learning of Linear Representations

Nilesh Tripuraneni, Chi Jin, Michael Jordan

Keywords: Theory, Statistical Learning Theory

12/07/2020

Training Neural Networks for and by Interpolation

Leonard Berrada, M. Pawan Kumar, Andrew Zisserman

Keywords: Deep Learning - General

18/07/2021

Training Data Subset Selection for Regression with Controlled Generalization Error

Durga S, Rishabh Iyer, Ganesh Ramakrishnan, Abir De

Keywords: Algorithms, Online Learning, Algorithms, Supervised Learning

03/05/2021

Auxiliary Task Update Decomposition: The Good, the Bad and the Neutral

Lucio Dery, Yann Dauphin, David Grangier

Keywords: multitask learning, deeplearning, pre-training, gradient decomposition

26/04/2020

Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML

Aniruddh Raghu, Maithra Raghu, Samy Bengio, Oriol Vinyals

Keywords: deep learning analysis, representation learning, meta-learning, few-shot learning

03/05/2021

Dataset Meta-Learning from Kernel Ridge-Regression

Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Keywords: dataset corruption, infinite-width networks, neural kernels, kernel-ridge regression, dataset compression, dataset distillation, meta-learning

18/07/2021

Sparsifying Networks via Subdifferential Inclusion

Sagar Verma, Jean-Christophe Pesquet

Keywords: Optimization, Convex Optimization

06/12/2021

Differentiable Optimization of Generalized Nondecomposable Functions using Linear Programs

Zihang Meng, Lopamudra Mukherjee, Yichao Wu, Vikas Singh, Sathya Narayanan Ravi

Keywords: deep learning, optimization

23/08/2020

AutoFIS: Automatic feature interaction selection in factorization models for click-through rate prediction

Bin Liu, Chenxu Zhu, Guilin Li, Weinan Zhang, Jincai Lai, Ruiming Tang, Xiuqiang He, Zhenguo Li, Yong Yu

Keywords: feature selection, neural architecture search, recommendation, factorization machine

06/12/2020

Efficient Learning of Generative Models via Finite-Difference Score Matching

Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu

06/12/2020

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Aviral Kumar, Abhishek Gupta, Sergey Levine

06/12/2020

Sparse Spectrum Warped Input Measures for Nonstationary Kernel Learning

Anthony Tompkins, Rafael Oliveira, Fabio Ramos

06/12/2021

Efficient Training of Retrieval Models using Negative Cache

Erik Lindgren, Sashank Reddi, Ruiqi Guo, Sanjiv Kumar

Keywords: deep learning, machine learning

03/05/2021

Generating Adversarial Computer Programs using Optimized Obfuscations

Shashank Srikant, Sijia Liu, Tamara Mitrovska, Shiyu Chang, Quanfu Fan, Gaoyuan Zhang, Una-May O'Reilly

Keywords: Models for code, Differentiable program generator, Combinatorial optimization, Program obfuscation, Adversarial computer programs, Machine Learning (ML) for Programming Languages (PL)/Software Engineering (SE)

15/06/2020

Learning nonlinear loop invariants with gated continuous logic networks

Jianan Yao, Gabriel Ryan, Justin Wong, Suman Jana, Ronghui Gu

Keywords: Loop Invariant Inference, Continuous Logic Networks, Program Verification

02/02/2021

Harmonized Dense Knowledge Distillation Training for Multi-Exit Architectures

Xinglu Wang, Yingming Li

06/12/2021

Second-Order Neural ODE Optimizer

Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou

Keywords: deep learning, optimization, machine learning, vision

06/12/2020

FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs

Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun

03/05/2021

Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting

Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph E Gonzalez, Marcus Rohrbach, Trevor Darrell

Keywords: Explainability, Catastrophic Forgetting, Continual Learning, XAI, Lifelong Learning

03/05/2021

On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis

Zhong Li, Jiequn Han, Weinan E, Qianxiao Li

Keywords: universal approximation, optimization, curse of memory, recurrent neural network, dynamical system

12/07/2020

Hierarchically Decoupled Morphological Transfer

Donald Hejna, Lerrel Pinto, Pieter Abbeel

Keywords: Reinforcement Learning - Deep RL

06/12/2020

Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels

Massimiliano Patacchiola, Jack Turner, Elliot Crowley, Michael O'Boyle, Amos Storkey

Keywords: Deep Learning; Deep Learning -> CNN Architectures; Theory -> Spaces of Functions and Kernels, Theory

30/11/2020

Regularizing Meta-Learning via Gradient Dropout

Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin, Ming-Hsuan Yang

12/07/2020

Responsive Safety in Reinforcement Learning

Adam Stooke, Joshua Achiam, Pieter Abbeel

Keywords: Reinforcement Learning - Deep RL

18/07/2021

Opening the Blackbox: Accelerating Neural Differential Equations by Regularizing Internal Solver Heuristics

Avik Pal, Yingbo Ma, Viral Shah, Christopher Rackauckas

Keywords: Deep Learning

06/12/2020

Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration

Hanjun Dai, Rishabh Singh, Bo Dai, Charles Sutton, Dale Schuurmans

06/12/2020

Memory-Efficient Learning of Stable Linear Dynamical Systems for Prediction and Control

Giorgos Mamakoukas, Orest Xherija, Todd Murphey

Keywords: Optimization -> Non-Convex Optimization, Optimization -> Stochastic Optimization

12/07/2020

Operation-Aware Soft Channel Pruning using Differentiable Masks

Minsoo Kang, Bohyung Han

Keywords: Applications - Computer Vision

13/04/2021

The sample complexity of meta sparse regression

Zhanyu Wang, Jean Honorio

06/12/2021

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification

Ben Eysenbach, Sergey Levine, Russ Salakhutdinov

Keywords: reinforcement learning and planning, machine learning

06/12/2021

Scalable Rule-Based Representation Learning for Interpretable Classification

Zhuo Wang, Wei Zhang, Ning Liu, Jianyong Wang

Keywords: optimization, machine learning, representation learning, interpretability