On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis

03/05/2021


Zhong Li, Jiequn Han, Weinan E, Qianxiao Li

Keywords: universal approximation, optimization, curse of memory, recurrent neural network, dynamical system


Abstract: We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem for such linear functionals and characterize the approximation rate. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs by gradient methods. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on both approximation and optimization: when there is long-term memory in the target, it takes a large number of neurons to approximate it. In addition, the training process suffers from slowdowns. In particular, both of these effects become exponentially more pronounced with increasing memory, a phenomenon we call the “curse of memory”. These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise in learning temporal relationships using recurrent architectures.
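The approximation half of the abstract can be illustrated with a small numerical sketch. It is not the authors' code, and every name and numeric choice in it (`rnn_kernel`, `fit_kernel`, the lag grid, the decay rates, the two target kernels) is a hypothetical illustration. It relies on a standard fact about stable linear systems: a continuous-time linear RNN h'(t) = W h(t) + U x(t), ŷ(t) = cᵀh(t), started from rest in the distant past, produces ŷ(t) = ∫₀^∞ ρ̂(s) x(t - s) ds with memory kernel ρ̂(s) = cᵀ e^{Ws} U. The sketch additionally assumes a scalar input and a diagonal W with negative entries, so ρ̂ is a sum of m decaying exponentials, and it fits the output weights by least squares instead of training them by gradient descent.

```python
import numpy as np


def rnn_kernel(w_diag, u, c, s):
    """Kernel rho_hat(s) = c^T exp(W s) u of a linear RNN with W = diag(w_diag).

    For diagonal W, exp(W s) = diag(exp(w_i s)), so the kernel equals
    sum_i c_i u_i exp(w_i s): a sum of exponentials, decaying when every w_i < 0.
    """
    return np.exp(np.outer(s, w_diag)) @ (c * u)


def fit_kernel(rho_vals, rates, s):
    """Least-squares fit of rho on the grid s by sum_i a_i exp(-rates_i s).

    Returns the coefficients a and the root-mean-square error on the grid.
    """
    basis = np.exp(-np.outer(s, rates))
    coef, *_ = np.linalg.lstsq(basis, rho_vals, rcond=None)
    rms = np.sqrt(np.mean((basis @ coef - rho_vals) ** 2))
    return coef, rms


s = np.linspace(0.0, 50.0, 2001)        # time lags at which kernels are compared
short_memory = np.exp(-s)                # memory that decays exponentially fast
long_memory = 1.0 / (1.0 + s) ** 2       # memory that decays only polynomially

errors = {"short": [], "long": []}
for m in (2, 4, 8):                      # number of hidden units
    # Nested rate sets {1, 1/2, 1/4, ...}: each larger model contains the smaller ones,
    # so the least-squares error cannot increase with m.
    rates = 2.0 ** -np.arange(m)
    for name, rho in (("short", short_memory), ("long", long_memory)):
        _, rms = fit_kernel(rho, rates, s)
        errors[name].append(rms)

# The 8-term fit of the long-memory target is the kernel of an 8-unit diagonal linear RNN
# with W = diag(-rates8), u = ones and output weights c = coef8.
rates8 = 2.0 ** -np.arange(8)
coef8, _ = fit_kernel(long_memory, rates8, s)
rnn_fit = rnn_kernel(-rates8, np.ones(8), coef8, s)

for name, errs in errors.items():
    print(name, ["%.2e" % e for e in errs])
```

Because the rate 1 belongs to every rate set, the exponentially decaying target is matched to round-off for every m, whereas the fit error for the power-law target only shrinks gradually as units are added. This is a qualitative picture of why long memory demands many neurons; it does not reproduce the paper's approximation rates or its optimization analysis.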


The talk and the respective paper are published at the ICLR 2021 virtual conference.


Similar Papers

03/05/2021

MALI: A memory efficient and reverse accurate integrator for Neural ODEs

Juntang Zhuang, Nicha C Dvornek, Sekhar Tatikonda, James S Duncan

Keywords: neural ode, memory efficient, gradient estimation, reverse accuracy


06/12/2021

Second-Order Neural ODE Optimizer

Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou

Keywords: deep learning, optimization, machine learning, vision


26/04/2020

Training Recurrent Neural Networks Online by Learning Explicit State Variables

Somjit Nath, Vincent Liu, Alan Chan, Xin Li, Adam White, Martha White

Keywords: Recurrent Neural Network, Partial Observability, Online Prediction, Incremental Learning


06/12/2020

Untangling tradeoffs between recurrence and self-attention in artificial neural networks

Giancarlo Kerg, Bhargav Kanuparthi, Anirudh Goyal, Kyle Goyette, Yoshua Bengio, Guillaume Lajoie


03/05/2021

Continual learning in recurrent neural networks

Benjamin Ehret, Christian Henning, Maria Cervera, Alexander Meulemans, Johannes von Oswald, Benjamin F Grewe

Keywords: Continual Learning, Recurrent Neural Networks


13/04/2021

On the memory mechanism of tensor-power recurrent models

Hejia Qiu, Chao Li, Ying Weng, Zhun Sun, Xingyu He, Qibin Zhao


03/05/2021

MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training

Beidi Chen, Zichang Liu, Binghui Peng, Zhaozhuo Xu, Jonathan L Li, Tri Dao, Zhao Song, Anshumali Shrivastava, Christopher Re

Keywords: Randomized Algorithms, Efficient Training, Large-scale Machine Learning, Large-scale Deep Learning


26/04/2020

Continual learning with hypernetworks

Johannes von Oswald, Christian Henning, João Sacramento, Benjamin F. Grewe

Keywords: Continual Learning, Catastrophic Forgetting, Meta Model, Hypernetwork


06/12/2020

HiPPO: Recurrent Memory with Optimal Polynomial Projections

Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Chris Ré


03/05/2021

Meta-Learning with Neural Tangent Kernels

Yufan Zhou, Zhenyi Wang, Jiayi Xian, Changyou Chen, Jinhui Xu

Keywords: neural tangent kernel, meta-learning


12/07/2020

Associative Memory in Iterated Overparameterized Sigmoid Autoencoders

Yibo Jiang, Cengiz Pehlevan

Keywords: Deep Learning - Generative Models and Autoencoders


06/12/2021

Efficient Training of Retrieval Models using Negative Cache

Erik Lindgren, Sashank Reddi, Ruiqi Guo, Sanjiv Kumar

Keywords: deep learning, machine learning


04/08/2021

Bounded Memory Active Learning through Enriched Queries

Max Hopkins, Daniel Kane, Shachar Lovett, Michal Moshkovitz


03/05/2021

Gradient Projection Memory for Continual Learning

Gobinda Saha, Isha Garg, Kaushik Roy

Keywords: Continual Learning, Representation Learning, Computer Vision, Deep learning


02/02/2021

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer


02/02/2021

Using Hindsight to Anchor Past Knowledge in Continual Learning

Arslan Chaudhry, Albert Gordo, Puneet Dokania, Philip Torr, David Lopez-Paz


06/12/2020

Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks

Wenrui Zhang, Peng Li


06/12/2021

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin, Dmitry Vetrov

Keywords: deep learning, optimization


18/07/2021

Momentum Residual Neural Networks

Michael Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

Keywords: Deep Learning


06/12/2020

Relative gradient optimization of the Jacobian term in unsupervised deep learning

Luigi Gresele, Giancarlo Fissore, Adrián Javaloy, Bernhard Schölkopf, Aapo Hyvarinen


03/05/2021

Dataset Meta-Learning from Kernel Ridge-Regression

Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Keywords: dataset corruption, infinite-width networks, neural kernels, kernel-ridge regression, dataset compression, dataset distillation, meta-learning


06/12/2021

Meta-Learning Sparse Implicit Neural Representations

Jaeho Lee, Jihoon Tack, Namhoon Lee, Jinwoo Shin

Keywords: deep learning, optimization, meta learning, representation learning


12/07/2020

Extrapolation for Large-batch Training in Deep Learning

Tao Lin, Lingjing Kong, Sebastian Stich, Martin Jaggi

Keywords: Deep Learning - Algorithms


06/12/2020

Interior Point Solving for LP-based prediction+optimisation

Jayanta Mandi, Tias Guns


18/07/2021

Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge

Keywords: Theory, Computational Learning Theory


03/05/2021

A Temporal Kernel Approach for Deep Learning with Continuous-time Information

Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan

Keywords: Reparameterization, Random Feature, Spectral Distribution, Continuous-time System, Kernel Learning, Learning Theory


06/12/2021

Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State

Mingqing Xiao, Qingyan Meng, Zongpeng Zhang, Yisen Wang, Zhouchen Lin

Keywords: deep learning


06/12/2021

Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal Memory

Takashi Matsubara, Yuto Miyatake, Takaharu Yaguchi

Keywords: deep learning, graph learning


06/12/2020

Efficient Learning of Generative Models via Finite-Difference Score Matching

Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu


26/04/2020

Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity

Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie

Keywords: Adaptive methods, optimization, deep learning


20/07/2020

A type of generalization error induced by initialization in deep neural networks

Yaoyu Zhang, Zhi-Qin John Xu, Tao Luo, Zheng Ma


06/12/2021

Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

Alexander Camuto, George Deligiannidis, Murat Erdogdu, Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu

Keywords: theory, deep learning, optimization


22/11/2021

DISCO: accurate Discrete Scale Convolutions

Ivan Sosnovik, Artem Moskalev, Arnold W.M. Smeulders

Keywords: equivariance, symmetry, invariance, scale, convolutions, dilation, tracking, image classification


06/12/2020

Bayesian Optimization for Iterative Learning

Vu Nguyen, Sebastian Schulze, Michael A Osborne


18/07/2021

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari

Keywords: Theory, RL, Decisions and Control Theory


26/04/2020

At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?

Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry

Keywords: implicit bias, stability, neural networks, generalization gap, asynchronous SGD


12/07/2020

Training Neural Networks for and by Interpolation

Leonard Berrada, M. Pawan Kumar, Andrew Zisserman

Keywords: Deep Learning - General


02/02/2021

Fine-grained Generalization Analysis of Vector-Valued Learning

Liang Wu, Antoine Ledent, Yunwen Lei, Marius Kloft


06/12/2021

Robust Implicit Networks via Non-Euclidean Contractions

Saber Jafarpour, Alexander Davydov, Anton Proskurnikov, Francesco Bullo

Keywords: theory, deep learning, machine learning, robustness, vision


13/04/2021

Faster & more reliable tuning of neural networks: Bayesian optimization with importance sampling

Setareh Ariafar, Zelda Mariet, Dana Brooks, Jennifer Dy, Jasper Snoek
