Momentum Residual Neural Networks

Abstract: The training of deep residual neural networks (ResNets) with backpropagation has a memory cost that increases linearly with respect to the depth of the network. A simple way to circumvent this issue is to use reversible architectures. In this paper, we propose to change the forward rule of a ResNet by adding a momentum term. The resulting networks, momentum residual neural networks (MomentumNets), are invertible. Unlike previous invertible architectures, they can be used as a drop-in replacement for any existing ResNet block. We show that MomentumNets can be interpreted in the infinitesimal step size regime as second-order ordinary differential equations (ODEs) and exactly characterize how adding momentum progressively increases the representation capabilities of MomentumNets: they can learn any linear mapping up to a multiplicative factor, while ResNets cannot. In a learning to optimize setting, where convergence to a fixed point is required, we show theoretically and empirically that our method succeeds while existing invertible architectures fail. We show on CIFAR and ImageNet that MomentumNets have the same accuracy as ResNets, while having a much smaller memory footprint, and show that pre-trained MomentumNets are promising for fine-tuning models.
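
The abstract does not spell out the modified forward rule, so the following is only a minimal sketch under an assumed form of it: a velocity `v` is updated as `v' = gamma * v + (1 - gamma) * f(x)` and the activation as `x' = x + v'`. The function `residual_fn`, the value of `GAMMA`, and the helper names are hypothetical and stand in for a learned residual block. The sketch illustrates why such a block can be inverted in closed form, which is what allows activations to be recomputed during backpropagation rather than stored.

```python
import math

GAMMA = 0.9  # assumed momentum coefficient; must be nonzero for the inverse to exist


def residual_fn(x):
    """Hypothetical stand-in for a learned residual function f(x)."""
    return [math.tanh(xi) for xi in x]


def momentum_forward(x, v, gamma=GAMMA):
    """One block under the assumed rule: v' = gamma*v + (1-gamma)*f(x), x' = x + v'."""
    f = residual_fn(x)
    v_next = [gamma * vi + (1.0 - gamma) * fi for vi, fi in zip(v, f)]
    x_next = [xi + vi for xi, vi in zip(x, v_next)]
    return x_next, v_next


def momentum_inverse(x_next, v_next, gamma=GAMMA):
    """Recover (x, v) from (x', v') in closed form, without stored activations."""
    x = [xi - vi for xi, vi in zip(x_next, v_next)]
    f = residual_fn(x)
    v = [(vi - (1.0 - gamma) * fi) / gamma for vi, fi in zip(v_next, f)]
    return x, v


def roundtrip_error(depth=8):
    """Run `depth` blocks forward, then invert them, and return the max abs error."""
    x0, v0 = [0.5, -1.0, 2.0], [0.0, 0.0, 0.0]
    x, v = x0, v0
    for _ in range(depth):
        x, v = momentum_forward(x, v)
    for _ in range(depth):
        x, v = momentum_inverse(x, v)
    return max(abs(a - b) for a, b in zip(x + v, x0 + v0))
```

Calling `roundtrip_error()` should return a value close to floating-point precision. Because the inverse divides by `gamma`, rounding error can grow with depth in this sketch, so an exact reconstruction is not guaranteed in finite precision.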

06/12/2021

Michael Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

Similar Papers

Second-Order Neural ODE Optimizer

Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou

deep learning, optimization, machine learning, vision

MALI: A memory efficient and reverse accurate integrator for Neural ODEs

Juntang Zhuang, Nicha C. Dvornek, Sekhar Tatikonda, James S. Duncan

neural ode, memory efficient, gradient estimation, reverse accuracy

Training Recurrent Neural Networks Online by Learning Explicit State Variables

Somjit Nath, Vincent Liu, Alan Chan, Xin Li, Adam White, Martha White

Recurrent Neural Network, Partial Observability, Online Prediction, Incremental Learning

On the Role of Optimization in Double Descent: A Least Squares Study

Ilja Kuzborskij, Csaba Szepesvari, Omar Rivasplata, Amal Rannen-Triki, Razvan Pascanu

theory, deep learning, optimization

Exploiting the Redundancy in Convolutional Filters for Parameter Reduction

Kumara Kahatapitiya, Ranga Rodrigo

TRQ: Ternary Neural Networks With Residual Quantization

Yue Li, Wenrui Ding, Chunlei Liu, Baochang Zhang, Guodong Guo

On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis

Zhong Li, Jiequn Han, Weinan E, Qianxiao Li

universal approximation, optimization, curse of memory, recurrent neural network, dynamical system

Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity

Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie

Adaptive methods, optimization, deep learning

Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians

Juhan Bae, Roger Grosse

Distributional Gradient Matching for Learning Uncertain Neural Dynamics Models

Lenart Treven, Philippe Wenk, Florian Dorfler, Andreas Krause

deep learning, reinforcement learning and planning, kernel methods, active learning

Better depth-width trade-offs for neural networks through the lens of dynamical systems

Evangelos Chatziafratis, Ioannis Panageas, Sai Ganesh Nagarajan

Deep Learning - Theory

Meta-Learning with Neural Tangent Kernels

Yufan Zhou, Zhenyi Wang, Jiayi Xian, Changyou Chen, Jinhui Xu

neural tangent kernel, meta-learning

A Trace-restricted Kronecker-Factored Approximation to Natural Gradient

Kaixin Gao, Xiaolei Liu, Zhenghai Huang, Min Wang, Zidong Wang, Dachuan Xu, Fan Yu

Data-driven Prediction of General Hamiltonian Dynamics via Learning Exactly-Symplectic Maps

Renyi Chen, Molei Tao

Algorithms, Time Series and Sequences

On the memory mechanism of tensor-power recurrent models

Hejia Qiu, Chao Li, Ying Weng, Zhun Sun, Xingyu He, Qibin Zhao

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Tan Nguyen, Richard Baraniuk, Andrea Bertozzi, Stanley Osher, Bao Wang

Neural SDEs as Infinite-Dimensional GANs

Patrick Kidger, James Foster, Xuechen Li, Terry Lyons

Deep Learning, Adversarial Networks, Algorithms, Unsupervised Learning, Applications, Network Analysis

A Modular Analysis of Provable Acceleration via Polyak's Momentum: Training a Wide ReLU Network and a Deep Linear Network

Jun-Kun Wang, Chi-Heng Lin, Jake Abernethy

Optimization, Non-Convex Optimization

F-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation

Konstantin Sofiiuk, Ilia Petrov, Olga Barinova, Anton Konushin

interactive segmentation, interactive, instance segmentation, segmentation, backpropagating refinement, refinement

Non-convex Learning via Replica Exchange Stochastic Gradient MCMC

Wei Deng, Qi Feng, Liyao Gao, Faming Liang, Guang Lin

Probabilistic Inference - Approximate, Monte Carlo, and Spectral Methods

Rethinking pruning for accelerating deep inference at the edge

Dawei Gao, Xiaoxi He, Zimu Zhou, Yongxin Tong, Ke Xu, Lothar Thiele
