Entropic gradient descent algorithms and wide flat minima

03/05/2021

Entropic gradient descent algorithms and wide flat minima

Fabrizio Pittorino, Carlo Lucibello, Christoph Feinauer, Gabriele Perugini, Carlo Baldassi, Elizaveta Demyanenko, Riccardo Zecchina

Keywords: flat minima, belief-propagation, statistical physics, entropic algorithms

Abstract Paper Similar Papers

Abstract: The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities with respect to sharp ones. In this work we first discuss the relationship between alternative measures of flatness: The local entropy, which is useful for analysis and algorithm development, and the local energy, which is easier to compute and was shown empirically in extensive tests on state-of-the-art networks to be the best predictor of generalization capabilities. We show semi-analytically in simple controlled scenarios that these two measures correlate strongly with each other and with generalization. Then, we extend the analysis to the deep learning scenario by extensive numerical validations. We study two algorithms, Entropy-SGD and Replicated-SGD, that explicitly include the local entropy in the optimization objective. We devise a training schedule by which we consistently find flatter minima (using both flatness measures), and improve the generalization error for common architectures (e.g. ResNet, EfficientNet).

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at ICLR 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2021

Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

Alexander Camuto, George Deligiannidis, Murat Erdogdu and
Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu

Keywords Paper

theory, deep learning, optimization

0

0

0

0

14:36

06/12/2020

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Wei Deng, Guang Lin, Faming Liang

Keywords Paper

0

0

0

0

3:26

06/12/2020

Adaptive Discretization for Model-Based Reinforcement Learning

Sean Sinclair, Tianyu Wang, Gauri Jain and
Sid Banerjee, Christina Yu

Keywords Paper

0

0

0

0

3:12

18/07/2021

Sparsifying Networks via Subdifferential Inclusion

Sagar Verma, Jean-Christophe Pesquet

Keywords Paper

Optimization, Convex Optimization

0

0

0

0

5:10

06/12/2021

Efficient Learning of Discrete-Continuous Computation Graphs

David Friede, Mathias Niepert

Keywords Paper

deep learning, reinforcement learning and planning, graph learning

0

0

0

0

12:31

13/04/2021

Alternating direction method of multipliers for quantization

Tianjian Huang, Prajwal Singhania, Maziar Sanjabi and
Pabitra Mitra, Meisam Razaviyayn

Keywords Paper

1

0

0

0

2:43

13/04/2021

Learning prediction intervals for regression: Generalization and calibration

Haoxian Chen, Ziyi Huang, Henry Lam and
Huajie Qian, Haofeng Zhang

Keywords Paper

0

0

0

0

3:26

12/07/2020

Learning Similarity Metrics for Numerical Simulations

Georg Kohl, Kiwon Um, Nils Thuerey

Keywords Paper

General Machine Learning Techniques

0

0

0

0

15:16

18/07/2021

Training Data Subset Selection for Regression with Controlled Generalization Error

Durga S, Rishabh Iyer, Ganesh Ramakrishnan, Abir De

Keywords Paper

, Algorithms, Online Learning, Algorithms, Supervised Learning

0

0

0

0

4:15

06/12/2021

Meta-Learning for Relative Density-Ratio Estimation

Atsutoshi Kumagai, Tomoharu Iwata, Yasuhiro Fujiwara

Keywords Paper

deep learning, machine learning, meta learning

0

0

0

0

8:56

12/07/2020

Learning Deep Kernels for Non-Parametric Two-Sample Tests

Feng Liu, Wenkai Xu, Jie Lu and
Guangquan Zhang, Arthur Gretton, D.J. Sutherland

Keywords Paper

General Machine Learning Techniques

0

0

0

0

15:00

06/12/2021

Algorithmic stability and generalization of an unsupervised feature selection algorithm

xinxing wu, Qiang Cheng

Keywords Paper

deep learning

0

0

0

0

12:41

06/12/2021

Robust Implicit Networks via Non-Euclidean Contractions

Saber Jafarpour, Alexander Davydov, Anton Proskurnikov, Francesco Bullo

Keywords Paper

theory, deep learning, machine learning, robustness, vision

0

0

0

0

14:59

06/12/2021

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin and
Danil Karpushkin, Dmitry Vetrov

Keywords Paper

deep learning, optimization

0

0

0

0

14:26

12/07/2020

Sequential Transfer in Reinforcement Learning with a Generative Model

Andrea Tirinzoni, Riccardo Poiani, Marcello Restelli

Keywords Paper

Reinforcement Learning - General

0

0

0

0

10:54

12/07/2020

Estimation of Bounds on Potential Outcomes For Decision Making

Maggie Makar, Fredrik Johansson, John Guttag, David Sontag

Keywords Paper

Causality

0

0

0

0

13:12

12/07/2020

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan

Keywords Paper

General Machine Learning Techniques

0

0

0

0

14:55

06/12/2021

Robust and Decomposable Average Precision for Image Retrieval

Elias Ramzi, Nicolas THOME, Clément Rambour and
Nicolas Audebert, Xavier Bitot

Keywords Paper

deep learning

0

0

0

0

8:13

06/12/2021

Efficient Training of Retrieval Models using Negative Cache

Erik Lindgren, Sashank Reddi, Ruiqi Guo, Sanjiv Kumar

Keywords Paper

deep learning, machine learning

0

0

0

0

10:41

13/04/2021

DebiNet: Debiasing linear models with nonlinear overparameterized neural networks

Shiyun Xu, Zhiqi Bu

Keywords Paper

0

0

0

0

2:56

12/07/2020

Extrapolation for Large-batch Training in Deep Learning

Tao LIN, Lingjing Kong, Sebastian Stich, Martin Jaggi

Keywords Paper

Deep Learning - Algorithms

0

0

0

0

13:21

06/12/2020

Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians

Juhan Bae, Roger Grosse

Keywords Paper

0

0

0

0

3:20

03/05/2021

Dataset Meta-Learning from Kernel Ridge-Regression

Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Keywords Paper

dataset corruption, infinite-width networks, neural kernels, kernel-ridge regression, dataset compression, dataset distillation, meta-learning

0

0

0

0

4:59

06/12/2020

The Advantage of Conditional Meta-Learning for Biased Regularization and Fine Tuning

Giulia Denevi, Massimiliano Pontil, Carlo Ciliberto

Keywords Paper

0

0

0

0

3:24

12/07/2020

Composable Sketches for Functions of Frequencies: Beyond the Worst Case

Edith Cohen, Ofir Geri, Rasmus Pagh

Keywords Paper

Optimization - Large Scale, Parallel and Distributed

0

0

0

0

14:51

26/04/2020

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks

Yu Bai, Jason D. Lee

Keywords Paper

Neural Tangent Kernels, over-parametrized neural networks, deep learning theory

0

0

0

0

5:25

06/12/2021

$(\textrm{Implicit})^2$: Implicit Layers for Implicit Representations

Zhichun Huang, Shaojie Bai, J. Zico Kolter

Keywords Paper

deep learning, representation learning

1

0

0

1

12:23

18/07/2021

Implicit rate-constrained optimization of non-decomposable objectives

Abhishek Kumar, Harikrishna Narasimhan, Andrew Cotter

Keywords Paper

Algorithms, Supervised Learning

0

0

0

0

3:48

03/05/2021

Modeling the Second Player in Distributionally Robust Optimization

Paul Michel, Tatsunori Hashimoto, Graham Neubig

Keywords Paper

adversarial learning, deep learning, robustness, distributionally robust optimization

0

0

0

0

5:09

06/12/2020

Lower Bounds and Optimal Algorithms for Personalized Federated Learning

Filip Hanzely, Slavomír Hanzely, Samuel Horváth, Peter Richtarik

Keywords Paper

, Theory -> Learning Theory

0

0

0

0

3:24

18/07/2021

Robust Unsupervised Learning via L-statistic Minimization

Andreas Maurer, Daniela Angela Parletta, Andrea Paudice, Massimiliano Pontil

Keywords Paper

Theory, Statistical Learning Theory

0

0

0

0

5:03

18/07/2021

Monotonic Robust Policy Optimization with Model Discrepancy

yuankun jiang, Chenglin Li, Wenrui Dai and
Junni Zou, Hongkai Xiong

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:17

13/04/2021

Adversarially robust estimate and risk analysis in linear regression

Yue Xing, Ruizhi Zhang, Guang Cheng

Keywords Paper

0

0

0

0

3:03

12/07/2020

Why Are Learned Indexes So Effective?

Paolo Ferragina, Fabrizio Lillo, Giorgio Vinciguerra

Keywords Paper

Applications - Other

0

0

0

0

13:22

18/07/2021

Meta-learning Hyperparameter Performance Prediction with Neural Processes

Ying WEI, Peilin Zhao, Junzhou Huang

Keywords Paper

Algorithms, Supervised Learning

0

0

0

0

5:07

06/12/2020

Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration

Hanjun Dai, Rishabh Singh, Bo Dai and
Charles Sutton, Dale Schuurmans

Keywords Paper

0

0

0

0

3:23

26/08/2020

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

Kenji Kawaguchi, Haihao Lu

Keywords Paper

0

0

0

0

14:10

06/12/2021

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

Feng Liu, Wenkai Xu, Jie Lu, [deadname] J Sutherland

Keywords Paper

meta learning, kernel methods

0

0

0

0

14:31

03/05/2021

Theoretical bounds on estimation error for meta-learning

James Lucas, Mengye Ren, Irene Raissa KAMENI KAMENI and
Toniann Pitassi, Richard Zemel

Keywords Paper

meta learning, minimax risk, few-shot, lower bounds, learning theory

0

0

0

0

4:46

13/04/2021

Learning with gradient descent and weakly convex losses

Dominic Richards, Mike Rabbat

Keywords Paper

0

0

0

0

3:20