Variance Reduction With Sparse Gradients

26/04/2020

Melih Elibol, Lihua Lei, Michael I. Jordan

Keywords: optimization, variance reduction, machine learning, deep neural networks

Abstract: Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients to reduce the variance of stochastic gradients. Compared to SGD, these methods require at least double the number of operations per update to model parameters. To reduce the computational cost of these methods, we introduce a new sparsity operator: the random-top-k operator. By combining the top-k operator and the randomized coordinate descent operator, our operator reduces computational complexity by estimating the gradient sparsity exhibited in a variety of applications. With this operator, large batch gradients offer an extra benefit beyond variance reduction: a reliable estimate of gradient sparsity. Theoretically, our algorithm is at least as good as the best algorithm (SpiderBoost), and it performs better whenever the random-top-k operator captures gradient sparsity. Empirically, our algorithm consistently outperforms SpiderBoost using various models on various tasks, including image classification, natural language processing, and sparse matrix factorization. We also provide empirical evidence to support the intuition behind our algorithm via a simple gradient entropy computation, which serves to quantify gradient sparsity at every iteration.
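
The abstract names the random-top-k operator and a gradient entropy measure but does not define either, so the following is only a rough sketch under stated assumptions. It assumes the operator keeps the k1 coordinates ranked largest by a reference vector (for example, a large batch gradient) and adds k2 uniformly sampled coordinates from the remainder, rescaled by (d - k1) / k2 so that the random part is unbiased. It also assumes the entropy is the Shannon entropy of the normalized absolute gradient. The names `random_top_k`, `gradient_entropy`, `k1`, `k2`, and `y` are hypothetical and are not taken from the paper or its code.

```python
import math
import random


def random_top_k(x, y, k1, k2, rng=None):
    """Sparsify x (illustrative sketch, not the authors' implementation).

    Keeps the k1 coordinates of x where |y| is largest (the top-k part),
    then adds k2 coordinates sampled uniformly from the remaining ones
    (the randomized coordinate part), rescaled by (d - k1) / k2.
    """
    rng = rng or random.Random()
    d = len(x)
    if len(y) != d:
        raise ValueError("x and y must have the same length")
    if not 0 <= k1 <= d or not 0 <= k2 <= d - k1:
        raise ValueError("need 0 <= k1 <= d and 0 <= k2 <= d - k1")

    # Rank coordinates by the magnitude of the reference vector y.
    order = sorted(range(d), key=lambda i: abs(y[i]), reverse=True)
    top, rest = order[:k1], order[k1:]

    out = [0.0] * d
    for i in top:
        out[i] = float(x[i])
    if k2 > 0:
        scale = (d - k1) / k2  # assumed rescaling for unbiasedness
        for i in rng.sample(rest, k2):
            out[i] = scale * x[i]
    return out


def gradient_entropy(g):
    """Shannon entropy (in nats) of |g| normalized to sum to one.

    Low entropy means the mass sits on few coordinates (a sparse gradient).
    The normalization is an assumption made for this sketch.
    """
    total = sum(abs(v) for v in g)
    if total == 0:
        return 0.0
    probs = [abs(v) / total for v in g if v != 0]
    return -sum(p * math.log(p) for p in probs)
```

Calling `random_top_k(g, g, k1, k2)` with the same vector in both arguments reduces to a plain top-k selection plus random coordinates. Passing a stale large batch gradient as `y` reflects the abstract's point that large batches can double as a sparsity estimate.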

The talk and the corresponding paper were published at the ICLR 2020 virtual conference.

Similar Papers

06/12/2020

Random Reshuffling: Simple Analysis with Vast Improvements

Konstantin Mishchenko, Ahmed Khaled Ragab Bayoumi, Peter Richtarik

Keywords: Reinforcement Learning and Planning -> Planning; Reinforcement Learning and Planning -> Reinforcement Learning, Reinforcement Learning and Planning

3:08

06/12/2021

Robust Regression Revisited: Acceleration and Improved Estimation Rates

Arun Jambulapati, Jerry Li, Tselil Schramm, Kevin Tian

Keywords: theory, optimization

14:22

09/07/2020

A Fast Spectral Algorithm for Mean Estimation with Sub-Gaussian Rates

Zhixian Lei, Kyle Luh, Prayaag Venkat, Fred Zhang

Keywords: High-dimensional statistics, Adversarial learning and robustness

15:00

06/12/2020

AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity

Silviu-Marian Udrescu, Andrew Tan, Jiahai Feng, Orisvaldo Neto, Tailin Wu, Max Tegmark

3:13

06/12/2021

Fast Doubly-Adaptive MCMC to Estimate the Gibbs Partition Function with Weak Mixing Time Bounds

Shahrzad Haddadan, Yue Zhuang, Cyrus Cousins, Eli Upfal

Keywords: generative model, graph learning

14:01

06/12/2020

Adaptive Discretization for Model-Based Reinforcement Learning

Sean Sinclair, Tianyu Wang, Gauri Jain, Sid Banerjee, Christina Yu

3:12

06/12/2020

Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization

Sam Hopkins, Jerry Li, Fred Zhang

3:34

26/08/2020

High Dimensional Robust Sparse Regression

Liu Liu, Yanyao Shen, Tianyang Li, Constantine Caramanis

14:58

06/12/2020

Efficient semidefinite-programming-based inference for binary and multi-class MRFs

Chirag Pabbaraju, Po-Wei Wang, J. Zico Kolter

3:19

14/06/2020

A Graduated Filter Method for Large Scale Robust Estimation

Huu Le, Christopher Zach

Keywords: robust fitting, bundle adjustment, non-convex, poor local minima, non-linear least squares, graduated non-convexity

1:01

06/12/2021

Entropy-based adaptive Hamiltonian Monte Carlo

Marcel Hirt, Michalis Titsias, Petros Dellaportas

Keywords: generative model

5:40

12/07/2020

History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms

Kaiyi Ji, Zhe Wang, Bowen Weng, Yi Zhou, Wei Zhang, Yingbin LIANG

Keywords: Optimization - Non-convex

14:41

06/12/2021

Dual Parameterization of Sparse Variational Gaussian Processes

Vincent ADAM, Paul Chang, Mohammad Emtiyaz Khan, Arno Solin

Keywords: optimization, generative model, kernel methods

13:29

06/12/2020

A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Wei Deng, Guang Lin, Faming Liang

3:26

18/07/2021

On Robust Mean Estimation under Coordinate-level Corruption

Zifan Liu, Jongho Park, Theo Rekatsinas, Christos Tzamos

Keywords: Theory, Computational Learning Theory

5:18

18/07/2021

Self Normalizing Flows

T. Anderson Keller, Jorn Peters, Priyank Jaini, Emiel Hoogeboom, Patrick Forré, Max Welling

Keywords: Deep Learning, Generative Models

4:24

03/05/2021

Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator

Max B Paulus, Chris Maddison, Andreas Krause

Keywords: softmax, gumbel, rao-blackwell, rao, straightthrough, straight-through, gumbel-softmax

13:25

06/12/2021

Boost Neural Networks by Checkpoints

Feng Wang, Guoyizhe Wei, Qiao Liu, Jinxiang Ou, xian wei, Hairong Lv

Keywords: deep learning

4:45

14/09/2020

Model-based Clustering with HDBSCAN*

Michael Strobl, Joerg Sander, Ricardo Campello, Osmar Zaiane

Keywords: hierarchical clustering, expectation maximization, model selection

15:31

18/07/2021

Randomized Algorithms for Submodular Function Maximization with a $k$-System Constraint

Shuang Cui, Kai Han, Tianshuai Zhu, Jing Tang, Benwei Wu, He Huang

Keywords: Optimization

4:48

26/08/2020

A Unified Statistically Efficient Estimation Framework for Unnormalized Models

Masatoshi Uehara, Takafumi Kanamori, Takashi Takenouchi, Takeru Matsuda

13:58

06/12/2020

Deep Diffusion-Invariant Wasserstein Distributional Classification

Sung Woo Park, Dong Wook Shu, Junseok Kwon

3:06

06/12/2020

An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits

Julian Katz-Samuels, Lalit Jain, zohar karnin, Kevin Jamieson

3:20

12/07/2020

Tensor denoising and completion based on ordinal observations

Chanwoo Lee, Miaoyan Wang

Keywords: General Machine Learning Techniques

12:44

03/05/2021

Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity

Shaocong Ma, Ziyi Chen, Yi Zhou, Shaofeng Zou

Keywords: Machine Learning, Reinforcement Learning, Optimization

2:59

18/07/2021

Active Slices for Sliced Stein Discrepancy

Wenbo Gong, Kaibo Zhang, Yingzhen Li, Jose Miguel Hernandez-Lobato

Keywords: Deep Learning, Efficient Inference Methods, Algorithms, Kernel Methods

5:47

12/07/2020

Accelerated Message Passing for Entropy-Regularized MAP Inference

Jonathan Lee, Aldo Pacchiano, Peter Bartlett, Michael Jordan

Keywords: Probabilistic Inference - Models and Probabilistic Programming

14:57

06/12/2020

Faster Wasserstein Distance Estimation with the Sinkhorn Divergence

Lénaïc Chizat, Pierre Roussillon, Flavien Léger, François-Xavier Vialard, Gabriel Peyré

3:21

06/12/2020

Diversity can be Transferred: Output Diversification for White- and Black-box Attacks

Yusuke Tashiro, Yang Song, Stefano Ermon

3:19

06/12/2020

Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes

Hao Chen, Lili Zheng, Raed AL Kontar, Garvesh Raskutti

3:12

06/12/2020

Markovian Score Climbing: Variational Inference with KL(p||q)

Christian Naesseth, Fredrik Lindsten, David Blei

2:30

19/08/2021

Improved Guarantees and a Multiple-descent Curve for Column Subset Selection and the Nystrom Method (Extended Abstract)

Michał Dereziński, Rajiv Khanna, Michael W. Mahoney

Keywords: Machine Learning, Dimensionality Reduction, Explainable/Interpretable Machine Learning, Kernel Methods, Unsupervised Learning

13:48

26/08/2020

Greed Meets Sparsity: Understanding and Improving Greedy Coordinate Descent for Sparse Optimization

Huang Fang, Zhenan Fan, Yifan Sun, Michael Friedlander

13:41

12/07/2020

StochasticRank: Global Optimization of Scale-Free Discrete Functions

Aleksei Ustimenko, Liudmila Prokhorenkova

Keywords: Supervised Learning

13:06

03/05/2021

Distance-Based Regularisation of Deep Networks for Fine-Tuning

Henry Gouk, Timothy Hospedales, massimiliano pontil

Keywords: Statistical Learning Theory, Transfer Learning, Deep Learning

4:57

14/09/2020

Orthogonal Mixture of Hidden Markov Models

Negar Safinianaini, Camila P. E. de Souza, Henrik Boström, Jens Lagergren

Keywords: hidden markov models, mixture models, mixture of hidden markov models, expectation maximization, orthogonality, regularization, penalty

14:43

14/06/2020

OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression

Lila Huang, Shenlong Wang, Kelvin Wong, Jerry Liu, Raquel Urtasun

Keywords: point cloud compression, deep compression, 3d deep learning, geometric deep learning, autonomous driving, lidar

5:00

06/12/2020

GCN meets GPU: Decoupling “When to Sample” from “How to Sample”

Morteza Ramezani, Weilin Cong, Mehrdad Mahdavi, Anand Sivasubramaniam, Mahmut Kandemir

3:24

18/11/2020

Localizing and amortizing: Efficient inference for gaussian processes

Linfeng Liu, Liping Liu

6:58

13/04/2021

Direct loss minimization for sparse gaussian processes

Yadi Wei, Rishit Sheth, Roni Khardon

3:24