06/12/2021

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

Elias Frantar, Eldar Kurtic, Dan Alistarh

Keywords: deep learning, optimization

Abstract: Efficiently approximating local curvature information of the loss function is a useful tool for the optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational or storage costs, limiting their practicality. In this work, we investigate matrix-free approaches for estimating Inverse-Hessian Vector Products (IHVPs) for the case when the Hessian can be approximated as a sum of rank-one matrices, as in the classic approximation of the Hessian by the empirical Fisher matrix. The first algorithm we propose is tailored towards network compression and can compute the IHVP for dimension $d$ given a fixed set of $m$ rank-one matrices using $O(dm^2)$ precomputation, $O(dm)$ cost for computing the IHVP and query cost $O(m)$ for computing any single element of the inverse Hessian approximation. The second algorithm targets an optimization setting, where we wish to compute the product between the inverse Hessian, estimated over a sliding window of optimization steps, and a given gradient direction. We give an algorithm with cost $O(dm + m^2)$ for computing the IHVP and $O(dm + m^3)$ for adding or removing any gradient from the sliding window. We show that both algorithms yield competitive results for network pruning and optimization, respectively, with significantly lower computational overhead relative to existing second-order methods.
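
As an illustration of the rank-one setting the abstract refers to, the sketch below forms the damped empirical Fisher approximation $\hat{F} = \lambda I + \frac{1}{m}\sum_{i=1}^{m} g_i g_i^{\top}$ only implicitly from $m$ per-example gradients and applies its inverse to a vector via the Sherman-Morrison recursion, never materializing the $d \times d$ matrix. This is a minimal reference sketch, not the paper's optimized algorithms: it costs $O(dm^2)$ per query, which the paper's static algorithm instead pays once as precomputation before answering each IHVP in $O(dm)$. The function name, the damping constant lam, and the NumPy layout are illustrative assumptions, not part of the paper.

import numpy as np

def ihvp_sherman_morrison(grads, x, lam=1e-4):
    # Sketch (not the paper's algorithm): apply the inverse of the damped
    # empirical Fisher  F = lam*I + (1/m) * sum_i g_i g_i^T  to a vector x,
    # using only the m x d matrix of per-example gradients (matrix-free).
    grads = np.asarray(grads, dtype=float)
    m, d = grads.shape
    inv_x = x / lam      # H_0^{-1} x, with H_0 = lam*I
    inv_g = grads / lam  # row j holds H_0^{-1} g_j
    for k in range(m):
        g = grads[k]
        # Sherman-Morrison step for H_{k+1} = H_k + (1/m) g_k g_k^T:
        #   H_{k+1}^{-1} v = H_k^{-1} v
        #     - (g_k^T H_k^{-1} v) / (m + g_k^T H_k^{-1} g_k) * H_k^{-1} g_k
        denom = m + g @ inv_g[k]
        inv_x -= (g @ inv_x) / denom * inv_g[k]
        # Keep H^{-1} g_j up to date for the gradients not yet absorbed.
        coefs = (inv_g[k + 1:] @ g) / denom
        inv_g[k + 1:] -= np.outer(coefs, inv_g[k])
    return inv_x  # approximately F^{-1} x

# Example: m = 32 rank-one terms in d = 1000 dimensions.
rng = np.random.default_rng(0)
G = rng.standard_normal((32, 1000))
v = ihvp_sherman_morrison(G, rng.standard_normal(1000))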

The talk and the paper were presented at the NeurIPS 2021 virtual conference.
