Going Beyond Linear Transformers with Recurrent Fast Weight Programmers

06/12/2021

Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

Keywords: deep learning, reinforcement learning and planning, transformers

Abstract: Transformers with linearised attention ("linear Transformers") have demonstrated the practical scalability and effectiveness of outer product-based Fast Weight Programmers (FWPs) from the '90s. However, the original FWP formulation is more general than the one of linear Transformers: a slow neural network (NN) continually reprograms the weights of a fast NN with arbitrary architecture. In existing linear Transformers, both NNs are feedforward and consist of a single layer. Here we explore new variations by adding recurrence to the slow and fast nets. We evaluate our novel recurrent FWPs (RFWPs) on two synthetic algorithmic tasks (code execution and sequential ListOps), Wikitext-103 language models, and on the Atari 2600 2D game environment. Our models exhibit properties of Transformers and RNNs. In the reinforcement learning setting, we report large improvements over LSTM in several Atari games. Our code is public.
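
The link the abstract draws between linear Transformers and outer product-based FWPs can be illustrated with a minimal sketch. It covers only the baseline case the abstract describes (a single-layer feedforward slow net programming a single-layer fast net), not the recurrent variants proposed in the paper. All names here (outer_product_fwp, causal_linear_attention, feature_map, matvec) are hypothetical, the ELU+1 feature map is an assumption, and the attention normalisation term is left out for brevity.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def feature_map(z):
    """ELU(z) + 1, a positive feature map (an assumption, not from the abstract)."""
    return [zi + 1.0 if zi > 0 else math.exp(zi) for zi in z]

def outer_product_fwp(xs, W_k, W_v, W_q):
    """Fast weight view: the slow net emits k, v, q; the fast net is one matrix."""
    d_val, d_key = len(W_v), len(W_k)
    fast = [[0.0] * d_key for _ in range(d_val)]  # fast weights start at zero
    outputs = []
    for x in xs:
        k = feature_map(matvec(W_k, x))
        v = matvec(W_v, x)
        q = feature_map(matvec(W_q, x))
        # Programming step: add the outer product v k^T to the fast weights.
        for i in range(d_val):
            for j in range(d_key):
                fast[i][j] += v[i] * k[j]
        # Query step: the fast net is a single linear layer applied to q.
        outputs.append(matvec(fast, q))
    return outputs

def causal_linear_attention(xs, W_k, W_v, W_q):
    """Attention view: y_t = sum over i <= t of v_i * (k_i . q_t), unnormalised."""
    ks = [feature_map(matvec(W_k, x)) for x in xs]
    vs = [matvec(W_v, x) for x in xs]
    qs = [feature_map(matvec(W_q, x)) for x in xs]
    outputs = []
    for t, q in enumerate(qs):
        y = [0.0] * len(W_v)
        for k, v in zip(ks[: t + 1], vs[: t + 1]):
            score = sum(ki * qi for ki, qi in zip(k, q))
            y = [yi + score * vi for yi, vi in zip(y, v)]
        outputs.append(y)
    return outputs
```

Under these assumptions both functions return the same outputs for any input sequence. The fast weight version keeps a fixed-size matrix as state, while the attention version revisits every earlier key and value at each step.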

This talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up

Yifan Jiang, Shiyu Chang, Zhangyang Wang

Keywords: machine learning, transformers, vision, generative model

3:44

16/11/2020

On the Ability and Limitations of Transformers to Recognize Formal Languages

Satwik Bhattamishra, Kabir Ahuja, Navin Goyal

Keywords: nlp tasks, construction, transformers, lstms

11:27

02/02/2021

Nyströmformer: A Nyström-based Algorithm for Approximating Self-Attention

Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh

17:26

06/12/2021

Container: Context Aggregation Networks

Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Keywords: deep learning, self-supervised learning, transformers, vision, language

8:50

19/10/2020

Deep multifaceted transformers for multi-objective ranking in large-scale e-commerce recommender systems

Yulong Gu, Zhuoye Ding, Shuaiqiang Wang, Lixin Zou, Yiding Liu, Dawei Yin

Keywords: click-through rate prediction, conversion rate prediction, recommender systems, e-commerce, multi-task learning

10:34

03/05/2021

Random Feature Attention

Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah Smith, Lingpeng Kong

Keywords: machine translation, transformers, language modeling, Attention

10:20

06/12/2020

Discovering Reinforcement Learning Algorithms

Junhyuk Oh, Matteo Hessel, Wojciech Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver

3:21

16/11/2020

Hardware as Policy: Mechanical and Computational Co-Optimization using Deep Reinforcement Learning

Tianjian Chen, Zhanpeng He, Matei Ciocarlie

4:51

04/07/2020

Improving Transformer Models by Reordering their Sublayers

Ofir Press, Noah A. Smith, Omer Levy

Keywords: task-specific reorderings, Transformer Models, Multilayer networks, randomly transformers

12:29

06/12/2021

Augmented Shortcuts for Vision Transformers

Yehui Tang, Kai Han, Chang Xu, An Xiao, Yiping Deng, Chao Xu, Yunhe Wang

Keywords: transformers, vision

7:28

06/12/2021

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

Yufei Xu, Qiming Zhang, Jing Zhang, Dacheng Tao

Keywords: machine learning, transformers, vision

10:16

26/04/2020

Are Transformers universal approximators of sequence-to-sequence functions?

Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar

Keywords: Transformer, universal approximation, contextual mapping, expressive power, permutation equivariance

4:55

16/11/2020

Attention is Not Only a Weight: Analyzing Transformers with Vector Norms

Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui

Keywords: natural processing, norm-based analyses, word alignment, transformers

11:51

12/07/2020

Stabilizing Transformers for Reinforcement Learning

Emilio Parisotto, Francis Song, Jack Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew Botvinick, Nicolas Heess, Raia Hadsell

Keywords: Reinforcement Learning - Deep RL

14:20

06/12/2021

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, Liwei Wang, Tie-Yan Liu

Keywords: optimization, machine learning, transformers, vision

10:07

18/07/2021

Linear Transformers Are Secretly Fast Weight Programmers

Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber

Keywords: Deep Learning

5:18

18/07/2021

Generative Video Transformer: Can Objects be the Words?

Yi-Fu Wu, Jaesik Yoon, Sungjin Ahn

Keywords: Deep Learning, Generative Models

5:15

04/07/2020

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han

Keywords: Natural Processing, Natural tasks, low-latency inference, machine tasks

11:26

06/12/2021

Neural Circuit Synthesis from Specification Patterns

Frederik Schmitt, Christopher Hahn, Markus N Rabe, Bernd Finkbeiner

Keywords: machine learning, transformers, generative model

14:12

06/12/2021

XCiT: Cross-Covariance Image Transformers

Alaaeldin Ali, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Herve Jegou

Keywords: deep learning, machine learning, transformers, vision, language

13:15

06/12/2021

Pay Attention to MLPs

Hanxiao Liu, Zihang Dai, David So, Quoc V Le

Keywords: deep learning, transformers

1:43

06/12/2021

Kernel Identification Through Transformers

Fergus Simpson, Ian Davies, Vidhi Lalchand, Alessandro Vullo, Nicolas Durrande, Carl Edward Rasmussen

Keywords: deep learning, transformers, kernel methods

12:05

18/07/2021

Evolving Attention with Residual Convolutions

Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

Keywords: Deep Learning, Architectures

4:36

26/08/2020

Accelerating Gradient Boosting Machines

Haihao Lu, Sai Praneeth Karimireddy, Natalia Ponomareva, Vahab Mirrokni

14:56

16/11/2020

Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion

Roland Hafner, Tim Hertweck, Philipp Kloeppner, Michael Bloesch, Michael Neunert, Markus Wulfmeier, Saran Tunyasuvunakool, Nicolas Heess, Martin Riedmiller

5:24

02/02/2021

*-CFQ: Analyzing the Scalability of Machine Learning on a Compositional Task

Dmitry Tsarkov, Tibor Tihon, Nathan Scales, Nikola Momchev, Danila Sinopalnikov, Nathanael Schärli

16:33

18/07/2021

OmniNet: Omnidirectional Representations from Transformers

Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Don Metzler

Keywords: Deep Learning, Predictive Models, Algorithms, Representation Learning; Neuroscience and Cognitive Science; Neuroscience and Cognitive Science, Problem Solving, Deep Learning, Architectures

17:00

18/07/2021

Thinking Like Transformers

Gail Weiss, Yoav Goldberg, Eran Yahav

Keywords: Deep Learning, Others

5:15

06/12/2021

Transformers Generalize DeepSets and Can be Extended to Graphs & Hypergraphs

Jinwoo Kim, Saeyoon Oh, Seunghoon Hong

Keywords: deep learning, transformers, graph learning

15:02

06/12/2020

Big Bird: Transformers for Longer Sequences

Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed

3:17

16/11/2020

PRover: Proof Generation for Interpretable Reasoning over Rules

Swarnadeep Saha, Sayan Ghosh, Shashank Srivastava, Mohit Bansal

Keywords: inference, qa generation, generalization, qa task

11:30

06/12/2021

Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics

Ingmar Schubert, Danny Driess, Ozgur S. Oguz, Marc Toussaint

Keywords: reinforcement learning and planning

8:36

03/05/2021

Parameter Efficient Multimodal Transformers for Video Representation Learning

Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, Yale Song

Keywords: Self-supervised learning, audio-visual representation learning, video representation learning

5:02

06/12/2021

Space-time Mixing Attention for Video Transformer

Adrian Bulat, Juan Manuel Perez Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos

Keywords: transformers

10:25

06/12/2021

Continual Learning via Local Module Composition

Oleksiy Ostapenko, Pau Rodriguez, Massimo Caccia, Laurent Charlin

Keywords: continual learning, transfer learning

14:32

16/11/2020

Chaining Behaviors from Data with Model-Free Reinforcement Learning

Avi Singh, Albert Yu, Jonathan Yang, Jesse Zhang, Aviral Kumar, Sergey Levine

5:01

19/04/2021

Multi-split reversible transformers can enhance neural machine translation

Yuekai Zhao, Shuchang Zhou, Zhihua Zhang

12:00

06/12/2021

Combiner: Full Attention Transformer with Sparse Computation Cost

Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai

Keywords: transformers

14:31

23/06/2021

AKG: Automatic Kernel Generation for Neural Processing Units using Polyhedral Transformations

Jie Zhao, Bojie Li, Wang Nie, Zhen Geng, Renwei Zhang, Xiong Gao, Bin Cheng, Chen Wu, Yun Cheng, Zheng Li, Peng Di, Kun Zhang, Xuefeng Jin

Keywords: neural networks, neural processing units, polyhedral model, code generation, auto-tuning

21:49

06/12/2021

Transformer in Transformer

Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang

Keywords: transformers, vision

11:24