Top-KAST: Top-K Always Sparse Training

06/12/2020

Sid Jayakumar, Razvan Pascanu, Jack Rae, Simon Osindero, Erich Elsen

Abstract: Sparse neural networks are becoming increasingly important as the field seeks to improve the performance of existing models by scaling them up, while simultaneously trying to reduce power consumption and computational footprint. Unfortunately, most existing methods for inducing performant sparse models still entail the instantiation of dense parameters, or dense gradients in the backward-pass, during training. For very large models this requirement can be prohibitive. In this work we propose Top-KAST, a method that preserves constant sparsity throughout training (in both the forward and backward-passes). We demonstrate the efficacy of our approach by showing that it performs comparably to or better than previous works when training models on the established ImageNet benchmark, whilst fully maintaining sparsity. In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling where the current best performing architectures tend to have tens of billions of parameters and scaling up does not yet seem to have saturated performance. Sparse versions of these architectures can be run with significantly fewer resources, making them more widely accessible and applicable. Furthermore, in addition to being effective, our approach is straightforward and can easily be implemented in a wide range of existing machine learning frameworks with only a few additional lines of code. We therefore hope that our contribution will help enable the broader community to explore the potential held by massive models, without incurring massive computational cost.
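The abstract states that sparsity is kept constant in both the forward and backward passes and that the method needs only a few extra lines of code. A minimal NumPy sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the function names `top_k_mask` and `sparse_sgd_step`, the parameters `fwd_density` and `bwd_density`, the linear layer, the squared-error loss, and the choice of a backward set larger than the forward set are all assumptions that do not appear in the abstract.

```python
import numpy as np


def top_k_mask(weights, density):
    """Boolean mask selecting the `density` fraction of largest-magnitude weights."""
    k = min(weights.size, max(1, int(round(density * weights.size))))
    flat = np.abs(weights).ravel()
    # argpartition places the k largest magnitudes in the last k positions.
    keep = np.argpartition(flat, flat.size - k)[flat.size - k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[keep] = True
    return mask.reshape(weights.shape)


def sparse_sgd_step(weights, x, target, fwd_density=0.1, bwd_density=0.2, lr=0.01):
    """One SGD step on a linear layer with a sparse forward and backward pass.

    Assumption: the backward set is a (larger) top-k set chosen the same way
    as the forward set, so only those entries ever receive an update.
    """
    fwd_mask = top_k_mask(weights, fwd_density)
    bwd_mask = top_k_mask(weights, bwd_density)
    # Forward pass uses only the top-k weights by magnitude.
    y = x @ (weights * fwd_mask)
    # Gradient of 0.5 * mean squared error with respect to the effective weights.
    grad = x.T @ (y - target) / len(x)
    # Backward pass: the update is restricted to the backward set.
    new_weights = weights - lr * grad * bwd_mask
    return new_weights, fwd_mask, bwd_mask
```

The sketch stores dense arrays for readability, so it demonstrates the masking logic only and does not deliver the memory savings a truly sparse representation would.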

The talk and the corresponding paper were published at the NeurIPS 2020 virtual conference.

Similar Papers

06/12/2021

Powerpropagation: A sparsity inducing weight reparameterisation

Jonathan Schwarz, Siddhant M Jayakumar, Razvan Pascanu, Peter E Latham, Yee Teh

Keywords: deep learning, optimization, continual learning

9:08

26/04/2020

MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Liwei Wang

Keywords: Adversarial Robustness, Provable Adversarial Defense, Randomized Smoothing, Robustness Certification

5:10

12/07/2020

Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks

Mark Kurtz, Justin Kopinsky, Rati Gelashvili, Alexander Matveev, John Carr, Michael Goin, William Leiserson, Sage Moore, Nir Shavit, Dan Alistarh

Keywords: Deep Learning - Algorithms

14:41

14/09/2020

Squeezing Correlated Neurons for Resource-Efficient Deep Neural Networks

Elbruz Ozen, Alex Orailoglu

Keywords: deep learning, information redundancy, pruning

14:48

07/09/2020

Paying more Attention to Snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation

Duong Le, Nhan Vo, Nam Thoai

Keywords: network pruning, knowledge distillation, ensemble learning

8:30

03/05/2021

ChipNet: Budget-Aware Pruning with Heaviside Continuous Approximations

Rishabh Tiwari, Udbhav Bamba, Arnav Chavan, Deepak Gupta

Keywords: Budget constraints, Budget-Aware Pruning, Structured Pruning, Sparsity Learning

6:01

06/12/2021

Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices

Aliakbar Panahi, Seyran Saeedi, Tom Arodz

Keywords: transformers

13:06

26/04/2020

Picking Winning Tickets Before Training by Preserving Gradient Flow

Chaoqi Wang, Guodong Zhang, Roger Grosse

Keywords: neural network, pruning before training, weight pruning

5:02

06/12/2021

BulletTrain: Accelerating Robust Neural Network Training via Boundary Example Mining

Weizhe Hua, Yichi Zhang, Chuan Guo, Zhiru Zhang, G. Edward Suh

Keywords: deep learning, machine learning, robustness, adversarial robustness and security

6:36

12/07/2020

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph Gonzalez

Keywords: Applications - Language, Speech and Dialog

15:21

05/01/2021

Spike-Thrift: Towards Energy-Efficient Deep Spiking Neural Networks by Limiting Spiking Activity via Attention-Guided Compression

Souvik Kundu, Gourav Datta, Massoud Pedram, Peter A. Beerel

5:22

12/07/2020

Overfitting in adversarially robust deep learning

Eric Wong, Leslie Rice, Zico Kolter

Keywords: Adversarial Examples

14:44

03/05/2021

Neural Pruning via Growing Regularization

Huan Wang, Can Qin, Yulun Zhang, Yun Fu

Keywords: deep neural network pruning, regularization, Hessian matrix, model compression

6:15

18/07/2021

Globally-Robust Neural Networks

Klas Leino, Zifan Wang, Matt Fredrikson

Keywords: Social Aspects of Machine Learning, AI Safety

7:55

14/06/2020

GreedyNAS: Towards Fast One-Shot NAS With Greedy Supernet

Shan You, Tao Huang, Mingmin Yang, Fei Wang, Chen Qian, Changshui Zhang

Keywords: neural architecture search, supernet, one-shot nas, single path, greedy algorithm, exploration and exploitation, searching efficiency

1:01

18/07/2021

A Novel Sequential Coreset Method for Gradient Descent Algorithms

Jiawei Huang, Ruomin Huang, Wenjie Liu, Nikolaos Freris, Hu Ding

Keywords: Optimization

5:15

12/07/2020

Network Pruning by Greedy Subnetwork Selection

Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, Qiang Liu

Keywords: Deep Learning - General

10:01

02/02/2021

Memory and Computation-Efficient Kernel SVM via Binary Embedding and Ternary Model Coefficients

Zijian Lei, Liang Lan

12:29

06/12/2021

BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer

Haoping Bai, Meng Cao, Ping Huang, Jiulong Shan

Keywords: deep learning, optimization

4:12

18/07/2021

Few-Shot Neural Architecture Search

Yiyang Zhao, Linnan Wang, Yuandong Tian, Rodrigo Fonseca, Tian Guo

Keywords: Algorithms, AutoML

16:43

02/02/2021

Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis

18:14

18/07/2021

Accurate Post Training Quantization With Small Calibration Sets

Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, Daniel Soudry

Keywords: Algorithms, AutoML

5:16

19/08/2021

Fast Multi-label Learning

Xiuwen Gong, Dong Yuan, Wei Bao

Keywords: Machine Learning, Multi-instance; Multi-label; Multi-view learning

15:18

06/12/2021

Efficient Neural Network Training via Forward and Backward Propagation Sparsification

Xiao Zhou, Weizhong Zhang, Zonghao Chen, Shizhe Diao, Tong Zhang

Keywords: deep learning, optimization

7:48

06/12/2021

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

Keywords: reinforcement learning and planning, transformers

11:29

26/04/2020

Minimizing FLOPs to Learn Efficient Sparse Representations

Biswajit Paria, Chih-Kuan Yeh, Ian E.H. Yen, Ning Xu, Pradeep Ravikumar, Barnabás Póczos

Keywords: sparse embeddings, deep representations, metric learning, regularization

4:41

18/07/2021

Whitening and Second Order Optimization Both Make Information in the Dataset Unusable During Training, and Can Reduce or Prevent Generalization

Neha Wadia, Daniel Duckworth, Samuel Schoenholz, Ethan Dyer, Jascha Sohl-Dickstein

Keywords: Optimization, Probabilistic Methods, Topic Models, Probabilistic Methods, Latent Variable Models

5:17

12/07/2020

How to Train Your Neural ODE: the World of Jacobian and Kinetic Regularization

Chris Finlay, Joern-Henrik Jacobsen, Levon Nurbekyan, Adam Oberman

Keywords: Probabilistic Inference - Models and Probabilistic Programming

12:34

03/05/2021

Reweighting Augmented Samples by Minimizing the Maximal Expected Loss

Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma

Keywords: sample reweighting, data augmentation

4:58

06/12/2021

CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings

Tatiana Likhomanenko, Qiantong Xu, Gabriel Synnaeve, Ronan Collobert, Alex Rogozhnikov

Keywords: deep learning, transformers

13:30

06/12/2020

Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search

Houwen Peng, Hao Du, Hongyuan Yu, Qi Li, Jing Liao, Jianlong Fu

3:12

06/12/2021

Aligned Structured Sparsity Learning for Efficient Image Super-Resolution

Yulun Zhang, Huan Wang, Can Qin, Yun Fu

Keywords: deep learning

13:23

02/02/2021

Step-Ahead Error Feedback for Distributed Training with Compressed Gradient

An Xu, Zhouyuan Huo, Heng Huang

18:26

03/05/2021

Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online

Yangchen Pan, Kirby Banman, Martha White

Keywords: natural sparsity, Reinforcement learning, fuzzy tiling activation function, sparse representation

6:22

06/12/2020

ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding

Yibo Yang, Hongyang Li, Shan You, Fei Wang, Chen Qian, Zhouchen Lin

3:19

06/12/2021

Boost Neural Networks by Checkpoints

Feng Wang, Guoyizhe Wei, Qiao Liu, Jinxiang Ou, Xian Wei, Hairong Lv

Keywords: deep learning

4:45

03/05/2021

MetaNorm: Learning to Normalize Few-Shot Batches Across Domains

Yingjun Du, Xiantong Zhen, Ling Shao, Cees G Snoek

Keywords: batch normalization, Meta-learning, few-shot domain generalization

5:48

06/12/2021

Speedy Performance Estimation for Neural Architecture Search

Robin Ru, Clare Lyle, Lisa Schut, Miroslav Fil, Mark van der Wilk, Yarin Gal

Keywords: deep learning

13:22

03/05/2021

Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control

Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

Keywords: Deep Reinforcement Learning, Regularization, Continuous Control, Policy Optimization

8:45

06/12/2020

Bayesian Attention Modules

Xinjie Fan, Shujian Zhang, Bo Chen, Mingyuan Zhou

3:32