06/12/2020

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Tan Nguyen, Richard Baraniuk, Andrea Bertozzi, Stanley Osher, Bao Wang


Abstract: Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural networks (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance.
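
The abstract describes replacing the input-driven term of the standard recurrent update with a momentum-accumulated version, by analogy with momentum in gradient descent. The PyTorch sketch below illustrates one way such a cell could look; the specific recurrence (v_t = mu * v_{t-1} + s * W x_t, then h_t = tanh(U h_{t-1} + v_t)), the hyperparameters mu and s, and the class name MomentumRNNCellSketch are illustrative assumptions based on this description, not the paper's exact formulation.

# Minimal sketch of a momentum-augmented recurrent cell (assumed form, for illustration only).
# Standard RNN update:   h_t = tanh(U h_{t-1} + W x_t + b)
# Assumed momentum form: v_t = mu * v_{t-1} + s * (W x_t)
#                        h_t = tanh(U h_{t-1} + v_t + b)
import torch
import torch.nn as nn


class MomentumRNNCellSketch(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, mu: float = 0.6, s: float = 0.6):
        super().__init__()
        self.Wx = nn.Linear(input_size, hidden_size, bias=False)   # input-to-hidden map W
        self.Uh = nn.Linear(hidden_size, hidden_size, bias=True)   # hidden-to-hidden map U (+ bias)
        self.mu = mu   # momentum coefficient (hypothetical hyperparameter)
        self.s = s     # step size on the input-driven term (hypothetical hyperparameter)

    def forward(self, x_t, h_prev, v_prev):
        # Accumulate momentum on the input-driven term.
        v_t = self.mu * v_prev + self.s * self.Wx(x_t)
        # Standard recurrent update with v_t in place of W x_t.
        h_t = torch.tanh(self.Uh(h_prev) + v_t)
        return h_t, v_t


if __name__ == "__main__":
    batch, seq_len, d_in, d_hid = 4, 10, 8, 16
    cell = MomentumRNNCellSketch(d_in, d_hid)
    x = torch.randn(batch, seq_len, d_in)
    h = torch.zeros(batch, d_hid)   # hidden state
    v = torch.zeros(batch, d_hid)   # momentum state, carried alongside h
    for t in range(seq_len):
        h, v = cell(x[:, t], h, v)
    print(h.shape)  # torch.Size([4, 16])

The only structural change relative to a vanilla RNN cell is the extra state v carried across time steps, so the same idea can in principle be wrapped around other recurrent cells (e.g., LSTM or orthogonal RNN cells), as the abstract indicates.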

Talk and paper published at the NeurIPS 2020 virtual conference.
