SBO-RNN: Reformulating Recurrent Neural Networks via Stochastic Bilevel Optimization

06/12/2021

Ziming Zhang, Yun Yue, Guojun Wu, Yanhua Li, Haichong Zhang

Keywords: deep learning, optimization

Abstract: In this paper we consider the training stability of recurrent neural networks (RNNs) and propose a family of RNNs, namely SBO-RNN, that can be formulated using stochastic bilevel optimization (SBO). With the help of stochastic gradient descent (SGD), we manage to convert the SBO problem into an RNN where the feedforward and backpropagation solve the lower and upper-level optimization for learning hidden states and their hyperparameters, respectively. We prove that under mild conditions there is no vanishing or exploding gradient in training SBO-RNN. Empirically we demonstrate our approach with superior performance on several benchmark datasets, with fewer parameters, less training data, and much faster convergence. Code is available at https://zhang-vislab.github.io.
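
As a rough illustration of the idea in the abstract, the sketch below treats each recurrent step as one gradient-descent step on a lower-level objective over the hidden state. The quadratic objective, the function names (`lower_level_grad`, `sgd_rnn_forward`), the step size `eta`, and the parameters `w` and `b` are all assumptions made for this example; they are not taken from the paper or its released code. The upper-level problem (learning `w` and `b` by backpropagation through these steps) is omitted.

```python
import math


def lower_level_grad(h, x, w, b):
    """Gradient w.r.t. h of a toy lower-level objective (an assumption):
    f(h) = 0.5 * sum_i (h_i - tanh(w_i * x + b_i))**2
    """
    return [h_i - math.tanh(w_i * x + b_i) for h_i, w_i, b_i in zip(h, w, b)]


def sgd_rnn_forward(xs, w, b, eta=0.5):
    """Unroll one gradient step per input; each step acts as a recurrent cell."""
    h = [0.0] * len(w)  # initial hidden state
    states = []
    for x in xs:
        g = lower_level_grad(h, x, w, b)
        # Hidden-state update: h_t = h_{t-1} - eta * grad_h f(h_{t-1}; x_t)
        h = [h_i - eta * g_i for h_i, g_i in zip(h, g)]
        states.append(h)
    return states


# Hypothetical scalar input sequence and per-unit parameters.
states = sgd_rnn_forward([1.0, 0.5, -0.25], w=[0.3, -0.7], b=[0.0, 0.1])
for t, h in enumerate(states, start=1):
    print(t, h)
```

With this particular toy objective the update reduces to a leaky average, h_t = (1 - eta) * h_{t-1} + eta * tanh(w * x_t + b), so every hidden value stays inside (-1, 1) for 0 < eta <= 1.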

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

Heavy Ball Neural Ordinary Differential Equations

Hedi Xia, Vai Suliafu, Hangjie Ji, Tan Nguyen, Andrea Bertozzi, Stanley Osher, Bao Wang

Keywords: deep learning, optimization, machine learning, vision

4:08

26/04/2020

SNODE: Spectral Discretization of Neural ODEs for System Identification

Alessio Quaglino, Marco Gallieri, Jonathan Masci, Jan Koutník

Keywords: Recurrent neural networks, system identification, neural ODEs

5:00

03/05/2021

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, Shi Gu

Keywords: Second-order analysis, Mixed Precision, Post Training Quantization

4:36

04/08/2021

Nonparametric Regression with Shallow Overparametrized Neural Networks Trained by GD with Early Stopping

Ilja Kuzborskij, Csaba Szepesvari

15:14

12/07/2020

Training Neural Networks for and by Interpolation

Leonard Berrada, M. Pawan Kumar, Andrew Zisserman

Keywords: Deep Learning - General

16:12

02/02/2021

Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse

Yuxiang Liu, Jidong Ge, Chuanyi Li, Jie Gui

14:49

26/04/2020

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks

Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

Keywords: small data, neural tangent kernel, UCI database, few-shot learning, kernel SVMs, deep learning theory, kernel design

5:02

06/12/2021

Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations

Pranjal Awasthi, Alex Tang, Aravindan Vijayaraghavan

Keywords: theory, deep learning

14:31

13/04/2021

Meta learning in the continuous time limit

Ruitu Xu, Lin Chen, Amin Karbasi

2:56

06/12/2020

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhiquan Luo

3:12

06/12/2021

Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel

Dominic Richards, Ilja Kuzborskij

Keywords: deep learning, optimization

11:09

03/05/2021

Separation and Concentration in Deep Networks

John Zarka, Florentin Guth, Stéphane Mallat

Keywords: concentration, mean separation, neural collapse, fisher ratio, image classification, variance reduction, deep learning

5:11

06/12/2021

The Implicit Bias of Minima Stability: A View from Function Space

Rotem Mulayoff, Tomer Michaeli, Daniel Soudry

Keywords: deep learning, optimization

13:51

06/12/2021

Functional Regularization for Reinforcement Learning via Learned Fourier Features

Alexander Li, Deepak Pathak

Keywords: deep learning, optimization, reinforcement learning and planning

14:35

06/12/2021

Second-Order Neural ODE Optimizer

Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou

Keywords: deep learning, optimization, machine learning, vision

14:59

05/01/2021

Phase-Wise Parameter Aggregation for Improving SGD Optimization

Takumi Kobayashi

4:36

18/07/2021

Tighter Bounds on the Log Marginal Likelihood of Gaussian Process Regression Using Conjugate Gradients

Artem Artemev, David Burt, Mark van der Wilk

Keywords: Probabilistic Methods, Gaussian Processes and Bayesian non-parametrics

17:13

06/12/2021

End-to-end reconstruction meets data-driven regularization for inverse problems

Subhadip Mukherjee, Marcello Carioni, Ozan Öktem, Carola-Bibiane Schönlieb

Keywords: deep learning, graph learning

13:12

18/07/2021

What Are Bayesian Neural Network Posteriors Really Like?

Pavel Izmailov, Sharad Vikram, Matt Hoffman, Andrew Wilson

Keywords: Deep Learning, Bayesian Deep Learning

17:13

14/06/2020

HRank: Filter Pruning Using High-Rank Feature Map

Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao

Keywords: network pruning, neural network compression and acceleration, high-rank feature map, efficient deep learning computing

4:57

12/07/2020

Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

Francesco Croce, Matthias Hein

Keywords: Adversarial Examples

15:12

06/12/2021

Learned Robust PCA: A Scalable Deep Unfolding Approach for High-Dimensional Outlier Detection

HanQin Cai, Jialin Liu, Wotao Yin

Keywords: deep learning, machine learning

8:07

12/07/2020

Extrapolation for Large-batch Training in Deep Learning

Tao LIN, Lingjing Kong, Sebastian Stich, Martin Jaggi

Keywords: Deep Learning - Algorithms

13:21

13/04/2021

Spectral tensor train parameterization of deep learning layers

Anton Obukhov, Maxim Rakhuba, Alexander Liniger, Zhiwu Huang, Stamatios Georgoulis, Dengxin Dai, Luc Van Gool

3:09

03/05/2021

Sharper Generalization Bounds for Learning with Gradient-dominated Objective Functions

Yunwen Lei, Yiming Ying

Keywords: generalization bounds, non-convex learning

5:09

06/12/2021

Efficient Mirror Descent Ascent Methods for Nonsmooth Minimax Problems

Feihu Huang, Xidong Wu, Heng Huang

Keywords: theory, deep learning, optimization

7:57

03/05/2021

Understanding Over-parameterization in Generative Adversarial Networks

Yogesh Balaji, Mohammadmahdi Sajedi, Neha Kalibhat, Mucong Ding, Dominik Stöger, Mahdi Soltanolkotabi, Soheil Feizi

Keywords: min-max optimization, Over-parameterization, GAN

5:04

18/07/2021

Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks

Greg Yang, Edward Hu

Keywords: Theory, Deep learning Theory

5:22

14/06/2020

Orthogonal Convolutional Neural Networks

Jiayun Wang, Yubei Chen, Rudrasis Chakraborty, Stella X. Yu

Keywords: orthogonal convolution, orthogonality, regularization, filter redundancy, robustness, classification, retrieval, semi-supervised, gans, inpainting

1:00

03/05/2021

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Keyulu Xu, Mozhi Zhang, Jingling Li, Simon Du, Ken-Ichi Kawarabayashi, Stefanie Jegelka

Keywords: graph neural networks, out-of-distribution, deep learning, extrapolation, deep learning theory

17:06

06/12/2021

Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization

Yusuke Iwasawa, Yutaka Matsuo

Keywords: deep learning, optimization, transformers, domain adaptation

13:50

18/07/2021

"Hey, that's not an ODE": Faster ODE Adjoints via Seminorms

Patrick Kidger, Ricky T. Q. Chen, Terry Lyons

Keywords: Deep Learning

5:01

06/12/2020

Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians

Juhan Bae, Roger Grosse

3:20

09/07/2020

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

Lénaïc Chizat, Francis Bach

Keywords: Neural networks/deep learning, Non-convex optimization

14:41

12/07/2020

Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript

Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui

Keywords: Optimization - Large Scale, Parallel and Distributed

9:59

02/02/2021

VSQL: Variational Shadow Quantum Learning for Classification

Guangxi Li, Zhixin Song, Xin Wang

16:46

06/12/2020

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

Edward Moroshko, Blake Woodworth, Suriya Gunasekar, Jason Lee, Nati Srebro, Daniel Soudry

3:19

20/07/2020

DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM

Bao Wang, Quanquan Gu, March Boedihardjo, Lingxiao Wang, Farzin Barekat, Stanley J. Osher

17:42

14/06/2020

On the Regularization Properties of Structured Dropout

Ambar Pal, Connor Lane, René Vidal, Benjamin D. Haeffele

Keywords: dropout, regularization, dropblock, dropconnect, neural networks, optimization, low rank, nuclear norm, k-support norm

1:01

12/07/2020

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

Alexander Shevchenko, Marco Mondelli

Keywords: Deep Learning - Theory

13:20