Noisy gradient descent converges to flat minima for nonconvex matrix factorization

Abstract: Ample empirical evidence corroborates the importance of noise in nonconvex optimization. The theory behind these observations, however, remains largely unknown. This paper studies the question through the nonconvex rectangular matrix factorization problem, which has infinitely many global minima due to rotation and scaling invariance. Hence, gradient descent (GD) can converge to any optimum, depending on the initialization. In contrast, we show that a perturbed form of GD with an arbitrary initialization converges to a global optimum that is uniquely determined by the injected noise. Our result implies that the noise imposes an implicit bias towards certain optima. Numerical experiments are provided to support our theory.
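
To make the setting in the abstract concrete, here is a minimal sketch of a perturbed GD loop on a rectangular factorization objective. It is an illustration under stated assumptions, not the paper's algorithm: the abstract gives neither the objective nor the perturbation scheme, so the loss 0.5 * ||X Y^T - M||_F^2, the isotropic Gaussian noise added to the iterates before each gradient evaluation, and the names `perturbed_gd`, `loss`, `sigma`, and `lr` are all hypothetical choices.

```python
import numpy as np

def loss(X, Y, M):
    # Assumed objective: 0.5 * ||X Y^T - M||_F^2
    return 0.5 * np.linalg.norm(X @ Y.T - M, "fro") ** 2

def perturbed_gd(M, rank, steps=2000, lr=0.01, sigma=0.05, seed=0):
    """Gradient descent where the gradient is evaluated at noise-perturbed iterates.

    The perturbation scheme (Gaussian noise with scale `sigma`) is an assumption
    made for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    # Arbitrary (random) initialization of both factors.
    X = rng.standard_normal((n, rank))
    Y = rng.standard_normal((m, rank))
    for _ in range(steps):
        # Perturb the current iterates before computing the gradient.
        Xp = X + sigma * rng.standard_normal(X.shape)
        Yp = Y + sigma * rng.standard_normal(Y.shape)
        R = Xp @ Yp.T - M          # residual at the perturbed point
        grad_X = R @ Yp            # gradient of the loss w.r.t. X at (Xp, Yp)
        grad_Y = R.T @ Xp          # gradient of the loss w.r.t. Y at (Xp, Yp)
        X = X - lr * grad_X
        Y = Y - lr * grad_Y
    return X, Y

if __name__ == "__main__":
    # Rank-1 target: any (c * x, y / c) with c != 0 fits it equally well,
    # which is the scaling invariance mentioned in the abstract.
    M = np.outer([1.0, 2.0, 3.0], [1.0, 1.0])
    X, Y = perturbed_gd(M, rank=1)
    print("final loss:", loss(X, Y, M))
    print("factor norms:", np.linalg.norm(X), np.linalg.norm(Y))
```

Printing the two factor norms is one way to see which of the equally good solutions a given run ends at; setting `sigma=0.0` gives plain GD from the same initialization for comparison.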

06/12/2021

Tianyi Liu, Yan Li, Song Wei, Enlu Zhou, Tuo Zhao

Similar Papers

Optimizing Information-theoretical Generalization Bound via Anisotropic Noise of SGLD

Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

On the difficulty of unbiased alpha divergence minimization

Tomas Geffner, Justin Domke

Keywords: Algorithms, Adversarial Learning, Deep Learning, Adversarial Networks, Probabilistic Methods, Approximate Inference

Parallel Bayesian Optimization of Multiple Noisy Objectives with Expected Hypervolume Improvement

Samuel Daulton, Maximilian Balandat, Eytan Bakshy

Keywords: optimization, machine learning, kernel methods

Multiplicative Noise and Heavy Tails in Stochastic Optimization

Liam Hodgkinson, Michael Mahoney

Keywords: Optimization, Stochastic Optimization

Stochastic Neural Network with Kronecker Flow

Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville

When does preconditioning help or hurt generalization?

Shun-ichi Amari, Jimmy Ba, Roger Grosse, Chen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

Keywords: high-dimensional asymptotics, generalization, second-order optimization, natural gradient descent

Unified Robust Semi-Supervised Variational Autoencoder

Keywords: Deep Learning, Reinforcement Learning and Planning, Reinforcement Learning, Applications, Robotics

Uncertainty quantification for nonconvex tensor completion: Confidence intervals, heteroscedasticity and optimality

Changxiao Cai, H. Vincent Poor, Yuxin Chen

Shape Matters: Understanding the Implicit Bias of the Noise Covariance

Jeff Z. HaoChen, Colin Wei, Jason Lee, Tengyu Ma

Support recovery and sup-norm convergence rates for sparse pivotal estimation

Mathurin Massias, Quentin Bertrand, Alexandre Gramfort, Joseph Salmon

Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization

Yivan Zhang, Gang Niu, Masashi Sugiyama

Keywords: Algorithms, Semi-Supervised Learning

Adversarial Nonnegative Matrix Factorization

Lei Luo, Yanfu Zhang, Heng Huang

Modulating Surrogates for Bayesian Optimization

Erik Bodin, Markus Kaiser, Ieva Kazlauskaite, Zhenwen Dai, Neill Campbell, Carl Henrik Ek

Square Root Principal Component Pursuit: Tuning-Free Noisy Robust Matrix Recovery

Junhui Zhang, Jingkai Yan, John Wright

Distributionally Robust Local Non-parametric Conditional Estimation

Viet Anh Nguyen, Fan Zhang, Jose Blanchet, Erick Delage, Yinyu Ye

Differential Spectral Normalization (DSN) for PDE Discovery

Chi Chiu So, Tsz On Li, Chufang Wu, Siu Pang Yung

Distributionally Robust Parametric Maximum Likelihood Estimation

Viet Anh Nguyen, Xuhui Zhang, Jose Blanchet, Angelos Georghiou

Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroimaging

Ali Hashemi, Yijing Gao, Chang Cai, Sanjay Ghosh, Klaus-Robert Müller, Srikantan Nagarajan, Stefan Haufe

Keywords: theory, optimization

The Statistical Cost of Robust Kernel Hyperparameter Tuning

Raphael Meyer, Christopher Musco

Distributionally Robust Bayesian Optimization

Johannes Kirschner, Ilija Bogunovic, Stefanie Jegelka, Andreas Krause

A Note on Sparse Generalized Eigenvalue Problem

Yunfeng Cai, Guanhua Fang, Ping Li

Estimation Rates for Sparse Linear Cyclic Causal Models

Jan-Christian Huetter, Philippe Rigollet

A Central Limit Theorem for Differentially Private Query Answering

Jinshuo Dong, Weijie Su, Linjun Zhang

On the interplay between noise and curvature and its effect on optimization and generalization

Valentin Thomas, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio, Nicolas Le Roux

Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu

Partha Ghosh, Mehdi S. M. Sajjadi, Antonio Vergari, Michael Black, Bernhard Schölkopf
