Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods

Abstract: Establishing a theoretical analysis that explains why deep learning can outperform shallow learning such as kernel methods is one of the biggest issues in the deep learning literature. Towards answering this question, we evaluate excess risk of a deep learning estimator trained by a noisy gradient descent with ridge regularization on a mildly overparameterized neural network, and discuss its superiority to a class of linear estimators that includes neural tangent kernel approach, random feature model, other kernel methods, $k$-NN estimator and so on. We consider a teacher-student regression model, and eventually show that {\it any} linear estimator can be outperformed by deep learning in a sense of the minimax optimal rate especially for a high dimension setting. The obtained excess bounds are so-called fast learning rate which is faster than $O(1/\sqrt{n})$ that is obtained by usual Rademacher complexity analysis. This discrepancy is induced by the non-convex geometry of the model and the noisy gradient descent used for neural network training provably reaches a near global optimal solution even though the loss landscape is highly non-convex. Although the noisy gradient descent does not employ any explicit or implicit sparsity inducing regularization, it shows a preferable generalization performance that dominates linear estimators.

06/12/2020

Deep Learning, Bayesian Neural Networks, Neural Network Gaussian Process, Infinite-Width Limit, Uncertainty, Gaussian Process

4:34

12/07/2020

Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods

Taiji Suzuki, Akiyama Shunta

Comments

Similar Papers

Towards Better Generalization of Adaptive Gradient Methods

Yingxue Zhou, Belhal Karimi, Jinxing Yu and Zhiqiang Xu, Ping Li

Keywords Abstract Paper

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit

Ben Adlam, Jaehoon Lee, Lechao Xiao and Jeffrey Pennington, Jasper Snoek

Keywords Abstract Paper

Deep Learning, Bayesian Neural Networks, Neural Network Gaussian Process, Infinite-Width Limit, Uncertainty, Gaussian Process

Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise

Umut Simsekli, Lingjiong Zhu, Yee Whye Teh, Mert Gurbuzbalaban

Keywords Abstract Paper

Deep Learning - Theory

Faster Non-asymptotic Convergence for Double Q-learning

Lin Zhao, Huaqing Xiong, Yingbin Liang

Keywords Abstract Paper

theory, reinforcement learning and planning

Fast and Scalable Adversarial Training of Kernel SVM via Doubly Stochastic Gradients

Huimin Wu, Zhengmian Hu, Bin Gu

Keywords Abstract Paper

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data

Yan Li, Ethan X.Fang, Huan Xu, Tuo Zhao

Keywords Abstract Paper

implicit bias, adversarial training, robustness, gradient descent

Gradient descent follows the regularization path for general losses

Ziwei Ji, Miroslav Dudik, Robert Schapire, Matus Telgarsky

Keywords Abstract Paper

Loss functions, Classification, Convex optimization

Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

Alexander Camuto, George Deligiannidis, Murat Erdogdu and Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu

Keywords Abstract Paper

theory, deep learning, optimization

Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala

Keywords Abstract Paper

Deep Learning - Theory

SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs

Ayush Sekhari, Karthik Sridharan, Satyen Kale

Keywords Abstract Paper

theory, deep learning, optimization

Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

Max Ryabinin, Andrey Malinin, Mark Gales

Keywords Abstract Paper

machine learning

Improved Analysis of Clipping Algorithms for Non-convex Optimization

Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang

Keywords Abstract Paper

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora

Keywords Abstract Paper

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang and Mengdi Wang, Michael Jordan

Keywords Abstract Paper

Exponential convergence rates of classification errors on learning with SGD and random features

Shingo Yashima, Atsushi Nitanda, Taiji Suzuki

Keywords Abstract Paper

Fast Rates for Structured Prediction

Vivien A Cabannnes, Francis Bach, Alessandro Rudi

Keywords Abstract Paper

Reliable Estimation of KL Divergence using a Discriminator in Reproducing Kernel Hilbert Space

Sandesh Ghimire, Aria Masoomi, Jennifer Dy

Keywords Abstract Paper

theory, deep learning, machine learning, kernel methods

Agnostic Learning with Multiple Objectives

Corinna Cortes, Mehryar Mohri, Javier Gonzalvo, Dmitry Storcheus

Keywords Abstract Paper

Robust Unsupervised Learning via L-statistic Minimization

Andreas Maurer, Daniela Angela Parletta, Andrea Paudice, Massimiliano Pontil

Keywords Abstract Paper

Theory, Statistical Learning Theory

On the generalization properties of adversarial training

Yue Xing, Qifan Song, Guang Cheng

Keywords Abstract Paper

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

Haoran Sun, Songtao Lu, Mingyi Hong

Keywords Abstract Paper

Optimization - Non-convex

The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods

Yingxue Zhou, Belhal Karimi, Jinxing Yu and
Zhiqiang Xu, Ping Li

Keywords Paper

Ben Adlam, Jaehoon Lee, Lechao Xiao and
Jeffrey Pennington, Jasper Snoek

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Alexander Camuto, George Deligiannidis, Murat Erdogdu and
Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Zhuoran Yang, Chi Jin, Zhaoran Wang and
Mengdi Wang, Michael Jordan

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Yogesh Balaji, Mohammadmahdi Sajedi, Neha Kalibhat and
Mucong Ding, Dominik Stöger, Mahdi Soltanolkotabi, Soheil Feizi

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Zhilei Wang, Pranjal Awasthi, Christoph Dann and
Ayush Sekhari, Claudio Gentile

Keywords Paper

Gen Li, Laixi Shi, Yuxin Chen and
Yuantao Gu, Yuejie Chi

Keywords Paper

Keywords Paper

Keywords Paper

T. Anderson Keller, Jorn Peters, Priyank Jaini and
Emiel Hoogeboom, Patrick Forré, Max Welling

Keywords Paper

Keywords Paper

Yuan Cao, Zhiying Fang, Yue Wu and
Ding-Xuan Zhou, Quanquan Gu

Keywords Paper

Keywords Paper

Keywords Paper