Practical Real Time Recurrent Learning with a Sparse Approximation

03/05/2021

Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

Keywords: backpropagation, rtrl, real time recurrent learning, forward mode, biologically plausible, bptt, recurrent neural networks

Abstract: Recurrent neural networks are usually trained with backpropagation through time, which requires storing a complete history of network states, and prohibits updating the weights "online" (after every timestep). Real Time Recurrent Learning (RTRL) eliminates the need for history storage and allows for online weight updates, but does so at the expense of computational costs that are quartic in the state size. This renders RTRL training intractable for all but the smallest networks, even ones that are made highly sparse. We introduce the Sparse n-step Approximation (SnAp) to the RTRL influence matrix. SnAp only tracks the influence of a parameter on hidden units that are reached by the computation graph within $n$ timesteps of the recurrent core. SnAp with $n=1$ is no more expensive than backpropagation but allows training on arbitrarily long sequences. We find that it substantially outperforms other RTRL approximations with comparable costs such as Unbiased Online Recurrent Optimization. For highly sparse networks, SnAp with $n=2$ remains tractable and can outperform backpropagation through time in terms of learning speed when updates are done online.
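The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a vanilla tanh RNN, tracks only the recurrent weights W, and uses hypothetical helper names (rnn_step, full_rtrl_step, snap1_step, run). Full RTRL carries the influence of every weight W[i, j] on every hidden unit, while the SnAp-1 style update keeps only the entries a weight reaches within one step, which for W[i, j] is unit i alone.

```python
import numpy as np

def rnn_step(W, U, h_prev, x):
    """Vanilla RNN core (an assumption of this sketch): h_t = tanh(W h_{t-1} + U x_t)."""
    return np.tanh(W @ h_prev + U @ x)

def full_rtrl_step(W, h_prev, h, J_prev):
    """Exact RTRL update of J[k, i, j] = d h_k / d W[i, j] (k^3 entries)."""
    k = W.shape[0]
    d = 1.0 - h ** 2                       # derivative of tanh at step t
    D = d[:, None] * W                     # d h_t / d h_{t-1}, shape (k, k)
    I = np.zeros((k, k, k))                # immediate Jacobian d h_t / d W
    idx = np.arange(k)
    I[idx, idx, :] = d[:, None] * h_prev[None, :]
    return np.einsum("km,mij->kij", D, J_prev) + I

def snap1_step(W, h_prev, h, J_prev):
    """SnAp-1 style update: keep only d h_i / d W[i, j], stored with the shape of W."""
    d = 1.0 - h ** 2
    # Only the diagonal of D can carry a masked entry forward to itself.
    return (d * np.diag(W))[:, None] * J_prev + d[:, None] * h_prev[None, :]

def run(W, U, xs):
    """Run the RNN over inputs xs, updating both influence estimates online."""
    k = W.shape[0]
    h = np.zeros(k)
    J_full = np.zeros((k, k, k))
    J_snap = np.zeros((k, k))
    for x in xs:
        h_new = rnn_step(W, U, h, x)
        J_full = full_rtrl_step(W, h, h_new, J_full)
        J_snap = snap1_step(W, h, h_new, J_snap)
        h = h_new
    return h, J_full, J_snap
```

Given a loss gradient g = dL/dh_t at the current step, an online gradient estimate for W in this sketch would be g[:, None] * J_snap, which has the same shape as W. The approximation is exact when W is diagonal, because no influence then leaves the kept entries; for a dense W it discards the cross-unit terms that full RTRL carries.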

The talk and the corresponding paper were published at the ICLR 2021 virtual conference.

Similar Papers

06/12/2021

Efficient Neural Network Training via Forward and Backward Propagation Sparsification

Xiao Zhou, Weizhong Zhang, Zonghao Chen, SHIZHE DIAO, Tong Zhang

Keywords: deep learning, optimization

7:48

18/07/2021

Neural Architecture Search without Training

Joe Mellor, Jack Turner, Amos Storkey, Elliot Crowley

Keywords: Deep Learning, Architectures

20:37

18/07/2021

Training Adversarially Robust Sparse Networks via Bayesian Connectivity Sampling

Ozan Özdenizci, Robert Legenstein

Keywords: Algorithms, Adversarial Examples

6:27

26/04/2020

Dynamic Model Pruning with Feedback

Tao Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi

Keywords: network pruning, dynamic reparameterization, model compression

4:30

06/12/2021

BulletTrain: Accelerating Robust Neural Network Training via Boundary Example Mining

Weizhe Hua, Yichi Zhang, Chuan Guo, Zhiru Zhang, G. Edward Suh

Keywords: deep learning, machine learning, robustness, adversarial robustness and security

6:36

12/07/2020

Network Pruning by Greedy Subnetwork Selection

Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, Qiang Liu

Keywords: Deep Learning - General

10:01

06/12/2021

Exponential Graph is Provably Efficient for Decentralized Deep Training

Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, PAN PAN, Wotao Yin

Keywords: deep learning, optimization, graph learning

14:16

26/04/2020

MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Liwei Wang

Keywords: Adversarial Robustness, Provable Adversarial Defense, Randomized Smoothing, Robustness Certification

5:10

12/07/2020

Boosting Deep Neural Network Efficiency with Dual-Module Inference

Liu Liu, Lei Deng, Zhaodong Chen, yuke wang, Shuangchen Li, Jingwei Zhang, Yihua Yang, Zhenyu Gu, Yufei Ding, Yuan Xie

Keywords: Deep Learning - General

8:04

06/12/2020

Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough

Mao Ye, lemon woo, Qiang Liu

3:14

12/07/2020

Scalable Deep Generative Modeling for Sparse Graphs

Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, Dale Schuurmans

Keywords: Deep Learning - Generative Models and Autoencoders

12:19

18/07/2021

Selfish Sparse RNN Training

Shiwei Liu, Decebal Constantin Mocanu, Yulong Pei, Mykola Pechenizkiy

Keywords: Deep Learning, Optimization for Deep Networks

4:58

03/05/2021

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

Keywords: sparsity, efficient training and inference

5:09

14/09/2020

Incremental Sensitivity Analysis for Kernelized Models

Hadar Sivan, Moshe Gabel, Assaf Schuster

14:54

05/04/2021

Value Learning for Throughput Optimization of Deep Learning Workloads

Benoit Steiner, Chris Cummins, Horace He, Hugh Leather

5:03

05/04/2021

Value Learning for Throughput Optimization of Deep Learning Workloads

Benoit Steiner, Chris Cummins, Horace He, Hugh Leather

21:54

12/07/2020

It's Not What Machines Can Learn, It's What We Cannot Teach

Gal Yehuda, Moshe Gabel, Assaf Schuster

Keywords: Supervised Learning

10:41

06/12/2021

Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning

Ligeng Zhu, Hongzhou Lin, Yao Lu, Yujun Lin, Song Han

Keywords: optimization, machine learning, federated learning

14:48

06/12/2020

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

3:20

05/01/2021

Dynamic Routing Networks

Shaofeng Cai, Yao Shu, Wei Wang

4:52

06/12/2021

Powerpropagation: A sparsity inducing weight reparameterisation

Jonathan Schwarz, Siddhant M Jayakumar, Razvan Pascanu, Peter E Latham, Yee Teh

Keywords: deep learning, optimization, continual learning

9:08

06/12/2020

Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming

Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy Liang, Pushmeet Kohli

3:23

06/12/2020

Pruning neural networks without any data by iteratively conserving synaptic flow

Hidenori Tanaka, Daniel Kunin, Daniel Yamins, Surya Ganguli

Keywords: Deep Learning -> Optimization for Deep Networks; Optimization -> Non-Convex Optimization, Theory

3:19

14/06/2020

On the Acceleration of Deep Learning Model Parallelism With Staleness

An Xu, Zhouyuan Huo, Heng Huang

Keywords: layer-wise staleness, asynchronous model parallelism, convolutional neural networks

1:01

02/02/2021

Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision

Xingchao Liu, Mao Ye, Dengyong Zhou, Qiang Liu

15:18

06/12/2020

ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding

Yibo Yang, Hongyang Li, Shan You, Fei Wang, Chen Qian, Zhouchen Lin

3:19

06/12/2021

EvoGrad: Efficient Gradient-Based Meta-Learning and Hyperparameter Optimization

Ondrej Bohdal, Yongxin Yang, Timothy Hospedales

Keywords: deep learning, optimization, graph learning, meta learning, few shot learning

14:09

18/07/2021

Few-Shot Neural Architecture Search

Yiyang Zhao, Linnan Wang, Yuandong Tian, Rodrigo Fonseca, Tian Guo

Keywords: Algorithms, AutoML

16:43

26/04/2020

Training Recurrent Neural Networks Online by Learning Explicit State Variables

Somjit Nath, Vincent Liu, Alan Chan, Xin Li, Adam White, Martha White

Keywords: Recurrent Neural Network, Partial Observability, Online Prediction, Incremental Learning

5:06

06/12/2020

Lipschitz-Certifiable Training with a Tight Outer Bound

Sungyoon Lee, Jaewook Lee, Saerom Park

Keywords: Algorithms -> Adversarial Learning; Algorithms -> Classification; Deep Learning -> Adversarial Networks; Applications -> Computer Vision

3:15

06/12/2020

Top-KAST: Top-K Always Sparse Training

Sid Jayakumar, Razvan Pascanu, Jack Rae, Simon Osindero, Erich Elsen

3:18

06/12/2020

Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks

Wenrui Zhang, Peng Li

3:06

06/12/2021

BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer

Haoping Bai, Meng Cao, Ping Huang, Jiulong Shan

Keywords: deep learning, optimization

4:12

06/12/2020

MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures

Jeong Un Ryu, JWoong Shin, Hae Beom Lee, Sung Ju Hwang

3:32

14/06/2020

Meta-Transfer Learning for Zero-Shot Super-Resolution

Jae Woong Soh, Sunwoo Cho, Nam Ik Cho

Keywords: zero-shot super-resolution, meta learning, transfer learning

0:59

06/12/2021

AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks

Alexandra Peste, Eugenia Iofinova, Adrian Vladu, Dan Alistarh

Keywords: deep learning

14:01

26/04/2020

Continual learning with hypernetworks

Johannes von Oswald, Christian Henning, João Sacramento, Benjamin F. Grewe

Keywords: Continual Learning, Catastrophic Forgetting, Meta Model, Hypernetwork

5:04

18/07/2021

On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks

Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada

Keywords: Theory

5:16

06/12/2021

Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal Memory

Takashi Matsubara, Yuto Miyatake, Takaharu Yaguchi

Keywords: deep learning, graph learning

13:13

14/06/2020

L2-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks

Yuning You, Tianlong Chen, Zhangyang Wang, Yang Shen

Keywords: graph convolutional network, efficient training, mini-batch training

1:00