Early-stopped neural networks are consistent

06/12/2021

Ziwei Ji, Justin Li, Matus Telgarsky

Keywords: deep learning, optimization, machine learning

Abstract: This work studies the behavior of shallow ReLU networks trained with the logistic loss via gradient descent on binary classification data where the underlying data distribution is general, and the (optimal) Bayes risk is not necessarily zero. In this setting, it is shown that gradient descent with early stopping achieves population risk arbitrarily close to optimal in terms of not just logistic and misclassification losses, but also in terms of calibration, meaning the sigmoid mapping of its outputs approximates the true underlying conditional distribution arbitrarily finely. Moreover, the necessary iteration, sample, and architectural complexities of this analysis all scale naturally with a certain complexity measure of the true conditional model. Lastly, while it is not shown that early stopping is necessary, it is shown that any classifier satisfying a basic local interpolation property is inconsistent.
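
The setting in the abstract can be illustrated with a minimal sketch, which is not the authors' code or their exact stopping rule. It assumes a hypothetical noisy data distribution with P(y=+1 | x) = sigmoid(3 x_0), a width-64 network whose outer signs are fixed while the inner weights are trained, and a validation-based stopping heuristic that keeps the iterate with the lowest held-out logistic loss. All function names and constants are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def sample(n, rng):
    # Hypothetical noisy distribution: labels are random given x,
    # so the Bayes risk is strictly positive.
    x = rng.uniform(-1.0, 1.0, size=(n, 2))
    p = sigmoid(3.0 * x[:, 0])                    # assumed true P(y=+1 | x)
    y = np.where(rng.uniform(size=n) < p, 1.0, -1.0)
    return x, y, p

def forward(W, a, x):
    # Shallow ReLU network: f(x) = (1/sqrt(m)) * sum_j a_j * relu(<w_j, x>).
    return np.maximum(x @ W.T, 0.0) @ a / np.sqrt(len(a))

def logistic_risk(W, a, x, y):
    # Empirical logistic loss: mean of ln(1 + exp(-y f(x))).
    return float(np.mean(np.logaddexp(0.0, -y * forward(W, a, x))))

def grad(W, a, x, y):
    # Gradient of the empirical logistic loss with respect to W only.
    pre = x @ W.T                                 # shape (n, m)
    f = np.maximum(pre, 0.0) @ a / np.sqrt(len(a))
    g = -y * sigmoid(-y * f)                      # d loss / d f, shape (n,)
    active = (pre > 0).astype(float)              # ReLU derivative, shape (n, m)
    return ((g[:, None] * active) * a[None, :]).T @ x / (len(y) * np.sqrt(len(a)))

def train_early_stopped(steps=300, m=64, lr=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x_tr, y_tr, _ = sample(200, rng)
    x_va, y_va, _ = sample(200, rng)
    W = rng.normal(size=(m, 2))                   # trained inner weights
    a = rng.choice([-1.0, 1.0], size=m)           # fixed outer signs (assumption)
    init_val = logistic_risk(W, a, x_va, y_va)
    best_W, best_val, best_t = W.copy(), init_val, 0
    for t in range(1, steps + 1):
        W = W - lr * grad(W, a, x_tr, y_tr)       # full-batch gradient descent
        val = logistic_risk(W, a, x_va, y_va)
        if val < best_val:                        # remember the best iterate so far
            best_W, best_val, best_t = W.copy(), val, t
    return best_W, a, best_t, best_val, init_val

def calibration_error(W, a, x, p):
    # Mean absolute gap between sigmoid(f(x)) and the true conditional probability.
    return float(np.mean(np.abs(sigmoid(forward(W, a, x)) - p)))

if __name__ == "__main__":
    W, a, t, best_val, init_val = train_early_stopped()
    x_te, _, p_te = sample(1000, np.random.default_rng(1))
    print("stopped at step", t, "validation logistic loss", best_val)
    print("calibration error", calibration_error(W, a, x_te, p_te))
```

The calibration check is only possible here because the synthetic conditional probability is known; on real data it would have to be estimated, for example by binning predictions.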

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

14:58

06/12/2021

Conformal Bayesian Computation

Edwin Fong, Chris C Holmes

Keywords: machine learning

14:54

18/07/2021

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

Yaqi Duan, Chi Jin, Zhiyuan Li

Keywords: Theory, RL, Decisions and Control Theory

5:18

06/12/2021

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, Murat Erdogdu

Keywords: optimization, machine learning

10:21

06/12/2021

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Keywords: deep learning, reinforcement learning and planning

13:50

06/12/2021

Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks

Rong Zhu, Mattia Rigotti

Keywords: theory, deep learning, reinforcement learning and planning, bandits

8:45

06/12/2020

Adaptive Sampling for Stochastic Risk-Averse Learning

Sebastian Curi, Kfir Y. Levy, Stefanie Jegelka, Andreas Krause

3:13

06/12/2021

Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

Aurelien Bibaut, Nathan Kallus, Maria Dimakopoulou, Antoine Chambaz, Mark van der Laan

Keywords: theory, reinforcement learning and planning, machine learning, bandits

16:07

12/07/2020

The continuous categorical: a novel simplex-valued exponential family

Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, John Cunningham

Keywords: Probabilistic Inference - Models and Probabilistic Programming

14:59

13/04/2021

Learning with gradient descent and weakly convex losses

Dominic Richards, Mike Rabbat

3:20

06/12/2021

Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD

Rémi Bardenet, Subhroshekhar Ghosh, Meixia LIN

Keywords: optimization, machine learning

14:51

06/12/2021

Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

Guodong Zhang, Kyle Hsu, Jianing Li, Chelsea Finn, Roger Grosse

Keywords: optimization, generative model

15:30

03/05/2021

Sequential Density Ratio Estimation for Simultaneous Optimization of Speed and Accuracy

Akinori Ebihara, Taiki Miyagawa, Kazuyuki Sakurai, Hitoshi Imaoka

Keywords: Density ratio estimation, Early classification, Sequential probability ratio test

9:55

06/12/2021

Slice Sampling Reparameterization Gradients

David M Zoltowski, Diana Cai, Ryan Adams

Keywords: optimization, machine learning, generative model

14:43

06/12/2021

Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis

Jikai Jin, Bohang Zhang, Haiyang Wang, Liwei Wang

Keywords: optimization

14:05

18/07/2021

Instance-Optimal Compressed Sensing via Posterior Sampling

Ajil Jalal, Sushrut Karmalkar, Alex Dimakis, Eric Price

Keywords: Algorithms, Sparsity and Compressed Sensing

5:26

06/12/2021

Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

Max Ryabinin, Andrey Malinin, Mark Gales

Keywords: machine learning

12:36

06/12/2021

Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits

Shinji Ito

Keywords: bandits

10:49

18/07/2021

Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models

Zitong Yang, Yu Bai, Song Mei

Keywords: Theory, Deep learning Theory

5:40

06/12/2020

Self-training Avoids Using Spurious Features Under Domain Shift

Yining Chen, Colin Wei, Ananya Kumar, Tengyu Ma

3:18

12/07/2020

Learning Near Optimal Policies with Low Inherent Bellman Error

Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

Keywords: Reinforcement Learning - Theory

14:22

09/07/2020

Gradient descent follows the regularization path for general losses

Ziwei Ji, Miroslav Dudik, Robert Schapire, Matus Telgarsky

Keywords: Loss functions, Classification, Convex optimization

13:48

04/08/2021

Benign Overfitting of Constant-Stepsize SGD for Linear Regression

Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham Kakade

18:27

03/05/2021

What are the Statistical Limits of Offline RL with Linear Function Approximation?

Ruosong Wang, Dean Foster, Sham M Kakade

Keywords: batch reinforcement learning, representation, function approximation, lower bound

9:02

06/12/2021

Linear Convergence in Federated Learning: Tackling Client Heterogeneity and Sparse Gradients

Aritra Mitra, Rayana Jaafar, George J. Pappas, Hamed Hassani

Keywords: optimization, federated learning

14:43

12/07/2020

Data preprocessing to mitigate bias: A maximum entropy based approach

Elisa Celis, Vijay Keswani, Nisheeth Vishnoi

Keywords: Fairness, Equity, Justice, and Safety

14:52

06/12/2021

Adversarial Examples in Multi-Layer Random ReLU Networks

Peter Bartlett, Sebastien Bubeck, Yeshwanth Cherapanamjeri

Keywords: theory, adversarial robustness and security

10:49

12/07/2020

Class-Weighted Classification: Trade-offs and Robust Approaches

Ziyu Xu, Chen Dan, Justin Khim, Pradeep Ravikumar

Keywords: Learning Theory

11:49

06/12/2021

An Exact Characterization of the Generalization Error for the Gibbs Algorithm

Gholamali Aminian, Yuheng Bu, Laura Toni, Miguel Rodrigues, Gregory Wornell

15:01

26/08/2020

Unconditional Coresets for Regularized Loss Minimization

Alireza Samadian, Kirk Pruhs, Benjamin Moseley, Sungjin Im, Ryan Curtin

15:15

06/12/2021

Misspecified Gaussian Process Bandit Optimization

Ilija Bogunovic, Andreas Krause

Keywords: optimization, bandits, kernel methods

11:41

02/02/2021

Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach

James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras

16:52

18/07/2021

Robust Unsupervised Learning via L-statistic Minimization

Andreas Maurer, Daniela Angela Parletta, Andrea Paudice, Massimiliano Pontil

Keywords: Theory, Statistical Learning Theory

5:03

06/12/2020

Stochastic Normalizing Flows

Hao Wu, Jonas Köhler, Frank Noe

3:19

26/04/2020

Can gradient clipping mitigate label noise?

Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

4:56

06/12/2021

Sampling with Trustworthy Constraints: A Variational Gradient Framework

Xingchao Liu, Xin Tong, Qiang Liu

Keywords: optimization, machine learning, fairness, interpretability

11:21

18/07/2021

Wasserstein Distributional Normalization For Robust Distributional Certification of Noisy Labeled Data

Sung Woo Park, Junseok Kwon

Keywords: Deep Learning, Generative Models, Algorithms, Representation Learning; Optimization, Submodular Optimization, Probabilistic Methods, Robust statistics

5:20

06/12/2020

Distributionally Robust Local Non-parametric Conditional Estimation

Viet Anh Nguyen, Fan Zhang, Jose Blanchet, Erick Delage, Yinyu Ye

3:22

26/04/2020

Conservative Uncertainty Estimation By Fitting Prior Networks

Kamil Ciosek, Vincent Fortuin, Ryota Tomioka, Katja Hofmann, Richard Turner

Keywords: uncertainty quantification, deep learning, Gaussian process, epistemic uncertainty, random network, prior, Bayesian inference

5:06

03/05/2021

Implicit Gradient Regularization

David Barrett, Benoit Dherin

Keywords: regularization, theory, deep learning, implicit regularization, deep learning theory, theoretical issues in deep learning

4:55