Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function

06/12/2020

Lingkai Kong, Molei Tao

Keywords: Deep Learning -> Efficient Inference Methods, Algorithms -> Boosting and Ensemble Methods

Abstract: This article suggests that deterministic Gradient Descent, which does not use any stochastic gradient approximation, can still exhibit stochastic behaviors. In particular, it shows that if the objective function exhibits multiscale behaviors, then in a large learning rate regime that resolves only the macroscopic but not the microscopic details of the objective, the deterministic GD dynamics can become chaotic and converge not to a local minimizer but to a statistical distribution. In this sense, deterministic GD resembles stochastic GD even though no stochasticity is injected. A sufficient condition is also established for approximating this long-time statistical limit by a rescaled Gibbs distribution, which, for example, allows escapes from local minima to be quantified. Both theoretical and numerical demonstrations are provided, and the theoretical part relies on the construction of a stochastic map that uses bounded noise (as opposed to Gaussian noise).
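
The following is a minimal sketch (not the authors' code) of the phenomenon the abstract describes. It assumes a hypothetical one-dimensional objective f(x) = x**2/2 + eps*sin(x/eps), where the small, rapidly oscillating term eps*sin(x/eps) stands in for the microscopic detail and x**2/2 for the macroscopic shape. The values of eps, the two learning rates, the starting point and the iteration counts are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def grad_f(x, eps):
    """Gradient of the hypothetical multiscale objective
    f(x) = x**2 / 2 + eps * sin(x / eps):
    a smooth macroscale bowl plus a small, rapidly oscillating term."""
    return x + np.cos(x / eps)


def run_gd(x0, lr, eps, n_steps):
    """Plain deterministic gradient descent (no noise injected); returns all iterates."""
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for k in range(n_steps):
        xs[k + 1] = xs[k] - lr * grad_f(xs[k], eps)
    return xs


eps = 1e-3        # microscale (assumed)
x0 = 0.5          # starting point (assumed)
n_steps = 20_000
burn_in = 5_000   # discard the transient before summarizing the trajectory

# Small learning rate (lr much smaller than eps): GD resolves the microscale
# and settles into one of the many local minimizers created by the oscillation.
small = run_gd(x0, lr=1e-4, eps=eps, n_steps=n_steps)

# Large learning rate (eps much smaller than lr, lr still small for the bowl):
# GD resolves only the macroscale, so the iterates keep moving and fill out
# a spread-out distribution instead of converging to a point.
large = run_gd(x0, lr=0.1, eps=eps, n_steps=n_steps)

std_small = small[burn_in:].std()
std_large = large[burn_in:].std()
print(f"lr=1e-4: final x = {small[-1]:.4f}, tail std = {std_small:.2e}")
print(f"lr=0.1 : tail mean = {large[burn_in:].mean():.3f}, tail std = {std_large:.3f}")
```

With the small step the tail of the trajectory collapses to a single point, while with the large step the same deterministic update keeps wandering, so its long-time behavior is better summarized by a histogram of the iterates than by a limit point. Whether that histogram is well approximated by a rescaled Gibbs distribution depends on the sufficient condition established in the paper, which this sketch does not check.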

This talk and the corresponding paper were published at the NeurIPS 2020 virtual conference.

Similar Papers

06/12/2021

Time-independent Generalization Bounds for SGLD in Non-convex Settings

Tyler Farghly, Patrick Rebeschini

Keywords: optimization

06/12/2020

Stochastic Normalizing Flows

Hao Wu, Jonas Köhler, Frank Noe

06/12/2020

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

06/12/2020

Distributionally Robust Parametric Maximum Likelihood Estimation

Viet Anh Nguyen, Xuhui Zhang, Jose Blanchet, Angelos Georghiou

14/09/2020

Weak approximation of transformed stochastic gradient MCMC

Soma Yokoi, Takuma Otsuka, Issei Sato

26/04/2020

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Jian Li, Xuanyuan Luo, Mingda Qiao

Keywords: learning theory, generalization, nonconvex learning, stochastic gradient descent, Langevin dynamics

06/12/2021

Learning to Select Exogenous Events for Marked Temporal Point Process

Ping Zhang, Rishabh Iyer, Ashish Tendulkar, Gaurav Aggarwal, Abir De

03/08/2020

Relaxed Multivariate Bernoulli Distribution and Its Applications to Deep Generative Models

Xi Wang, Junming Yin

04/07/2020

A Batch Normalized Inference Network Keeps the KL Vanishing Away

Qile Zhu, Wei Bi, Xiaojiang Liu, Xiyao Ma, Xiaolin Li, Dapeng Wu

Keywords: amortized inference, language modeling, text classification, dialogue generation

06/12/2020

Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition

Ben Adlam, Jeffrey Pennington

12/07/2020

Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise

Umut Simsekli, Lingjiong Zhu, Yee Whye Teh, Mert Gurbuzbalaban

Keywords: Deep Learning - Theory

19/08/2021

Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness

Dazhong Shen, Chuan Qin, Chao Wang, Hengshu Zhu, Enhong Chen, Hui Xiong

Keywords: Machine Learning, Bayesian Learning, Probabilistic Machine Learning, Unsupervised Learning

06/12/2020

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Dheeraj Nagaraj, Xian Wu, Guy Bresler, Prateek Jain, Praneeth Netrapalli

12/07/2020

Batch Stationary Distribution Estimation

Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans

Keywords: Probabilistic Inference - Approximate, Monte Carlo, and Spectral Methods

18/07/2021

Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent

Kangqiao Liu, Liu Ziyin, Masahito Ueda

Keywords: Theory, Deep learning Theory

12/07/2020

The continuous categorical: a novel simplex-valued exponential family

Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, John Cunningham

Keywords: Probabilistic Inference - Models and Probabilistic Programming

06/12/2021

Slice Sampling Reparameterization Gradients

David M Zoltowski, Diana Cai, Ryan Adams

Keywords: optimization, machine learning, generative model

06/12/2021

Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes

Cristopher Salvi, Maud Lemercier, Chong Liu, Blanka Horvath, Theodoros Damoulas, Terry Lyons

Keywords: machine learning, graph learning, causality

06/12/2021

Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis

Jikai Jin, Bohang Zhang, Haiyang Wang, Liwei Wang

Keywords: optimization

06/12/2020

Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks

Umut Simsekli, Ozan Sener, George Deligiannidis, Murat Erdogdu

Keywords: Deep Learning -> Supervised Deep Networks, Deep Learning -> Embedding Approaches

06/12/2021

Label Noise SGD Provably Prefers Flat Global Minimizers

Alex Damian, Tengyu Ma, Jason Lee

Keywords: optimization, machine learning

12/07/2020

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

Yaqi Duan, Zeyu Jia, Mengdi Wang

Keywords: Learning Theory

06/12/2021

Spatio-Temporal Variational Gaussian Processes

Oliver Hamelijnck, William Wilkinson, Niki Loppi, Arno Solin, Theodoros Damoulas

Keywords: generative model, kernel methods

06/12/2020

Multi-task Additive Models for Robust Estimation and Automatic Structure Discovery

Yingjie Wang, Hong Chen, Feng Zheng, Chen Xu, Tieliang Gong, Yanhong Chen

Keywords: Applications -> Time Series Analysis; Probabilistic Methods -> Variational Inference, Probabilistic Methods -> Causal Inference

06/12/2021

Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

Max Ryabinin, Andrey Malinin, Mark Gales

Keywords: machine learning

06/12/2021

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, Murat Erdogdu

Keywords: optimization, machine learning

03/05/2021

Noise against noise: stochastic label noise helps combat inherent label noise

Pengfei Chen, Guangyong Chen, Junjie Ye, Jingwei Zhao, Pheng-Ann Heng

Keywords: Regularization, SGD noise, Robust Learning, Noisy Labels

06/12/2021

Loss function based second-order Jensen inequality and its application to particle variational inference

Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

Keywords: optimization, generative model

06/12/2020

Nonasymptotic Guarantees for Spiked Matrix Recovery with Generative Priors

Jorio Cocola, Paul Hand, Vlad Voroninski

06/12/2021

Machine learning structure preserving brackets for forecasting irreversible processes

Kookjin Lee, Nathaniel Trask, Panos Stinis

Keywords: deep learning, machine learning

06/12/2020

A Local Temporal Difference Code for Distributional Reinforcement Learning

Pablo Tano, Peter Dayan, Alexandre Pouget

06/12/2021

Continuous Latent Process Flows

Ruizhi Deng, Marcus Brubaker, Greg Mori, Andreas M Lehrmann

Keywords: generative model

03/05/2021

When does preconditioning help or hurt generalization?

Shun-ichi Amari, Jimmy Ba, Roger Grosse, Chen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

Keywords: high-dimensional asymptotics, generalization, second-order optimization, natural gradient descent

12/07/2020

Momentum-Based Policy Gradient Methods

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Keywords: Reinforcement Learning - General

06/12/2021

Sampling with Trustworthy Constraints: A Variational Gradient Framework

Xingchao Liu, Xin Tong, Qiang Liu

Keywords: optimization, machine learning, fairness, interpretability

26/08/2020

Deterministic Decoding for Discrete Data in Variational Autoencoders

Daniil Polykovskiy, Dmitry Vetrov

03/05/2021

Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction

Wei Deng, Qi Feng, Georgios Karagiannis, Guang Lin, Faming Liang

Keywords: Markov jump process, uncertainty quantification, generalized Girsanov theorem, change of measure, stochastic gradient Langevin dynamics, parallel tempering, replica exchange, Dirichlet form, variance reduction

06/12/2021

Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD

Rémi Bardenet, Subhroshekhar Ghosh, Meixia Lin

Keywords: optimization, machine learning

06/12/2020

Triple descent and the two kinds of overfitting: where & why do they appear?

Stéphane d'Ascoli, Levent Sagun, Giulio Biroli

Keywords: Algorithms -> Active Learning; Algorithms -> Classification; Algorithms -> Ranking and Preference Learning, Theory -> Learning Theory

19/08/2021

Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment

Jiaming Guo, Rui Zhang, Xishan Zhang, Shaohui Peng, Qi Yi, Zidong Du, Xing Hu, Qi Guo, Yunji Chen

Keywords: Machine Learning, Deep Learning, Deep Reinforcement Learning, Sequential Decision Making