Abstract:
L-BFGS has long been one of the most popular methods for convex optimization, but good performance from L-BFGS in deep learning has remained elusive. Recent work has adapted L-BFGS to deep networks for classification tasks and shown performance competitive with SGD and Adam (the most popular current algorithms) when batch normalization is not used. However, that approach cannot be applied in conjunction with batch normalization. Since batch normalization is a de facto standard and is important for good performance in deep networks, this still limits the applicability of L-BFGS. In this paper, we address this issue. Our proposed method can be used as a drop-in replacement without changing existing code. The proposed method performs consistently better than Adam and existing L-BFGS approaches, and comparably to carefully tuned SGD. We show results on three datasets (CIFAR-10, CIFAR-100, and STL-10) using three popular deep networks: ResNet, DenseNet, and Wide ResNet. This work marks another significant step towards making L-BFGS competitive in the deep learning community.