Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture

04/07/2020

Christopher Brix, Parnia Bahar, Hermann Ney

Keywords: inference, time-critical computations, transformer architecture, WMT tasks

Abstract: Sparse models require less memory for storage and enable faster inference by reducing the number of FLOPs required. This is relevant both for time-critical and for on-device computations using neural networks. The stabilized lottery ticket hypothesis states that networks can be pruned after no or only a few training iterations, using a mask computed from the unpruned converged model. On the transformer architecture and the WMT 2014 English-to-German and English-to-French tasks, we show that stabilized lottery ticket pruning performs similarly to magnitude pruning for sparsity levels of up to 85%, and propose a new combination of pruning techniques that outperforms all other techniques at even higher levels of sparsity. Furthermore, we confirm that a parameter's initial sign, not its specific value, is the primary factor for successful training, and show that magnitude pruning cannot be used to find winning lottery tickets.
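The mask-then-reset idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy weight matrices, and the constant 0.1 used in the sign-only reset are assumptions made for illustration, and a real transformer would apply this to its weight tensors rather than to one small matrix.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude entries of `weights`."""
    flat = np.abs(weights).ravel()
    n_prune = int(round(sparsity * flat.size))
    mask = np.ones(flat.size, dtype=weights.dtype)
    if n_prune > 0:
        # Indices of the n_prune smallest magnitudes are masked out.
        prune_idx = np.argsort(flat, kind="stable")[:n_prune]
        mask[prune_idx] = 0
    return mask.reshape(weights.shape)

def stabilized_ticket(early_weights, converged_weights, sparsity):
    """Compute the mask on the converged model, then apply it to early weights."""
    mask = magnitude_mask(converged_weights, sparsity)
    return early_weights * mask, mask

def constant_sign_ticket(early_weights, mask, constant=0.1):
    """Keep only the sign of each surviving early weight (hypothetical constant)."""
    return np.sign(early_weights) * constant * mask

# Toy stand-ins for weights after a few iterations and after convergence.
rng = np.random.default_rng(0)
w_early = rng.normal(size=(4, 5))
w_final = w_early + rng.normal(scale=0.5, size=(4, 5))

ticket, mask = stabilized_ticket(w_early, w_final, sparsity=0.85)
sign_ticket = constant_sign_ticket(w_early, mask)
```

With 20 weights and 85% sparsity, 17 entries are masked and 3 survive; training would then restart from `ticket` (or `sign_ticket`) with the mask held fixed.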

The talk and the corresponding paper were published at the ACL 2020 virtual conference.

Similar Papers

06/12/2020

Winning the Lottery with Continuous Sparsification

Pedro Savarese, Hugo Silva, Michael Maire

3:17

06/12/2021

Pruning Randomly Initialized Neural Networks with Iterative Randomization

Daiki Chijiwa, Shin'ya Yamaguchi, Yasutoshi Ida, Kenji Umakoshi, Tomohiro INOUE

Keywords: deep learning, optimization

3:21

06/12/2021

Validating the Lottery Ticket Hypothesis with Inertial Manifold Theory

Zeru Zhang, Jiayin Jin, Zijie Zhang, Yang Zhou, Xin Zhao, Jiaxiang Ren, Ji Liu, Lingfei Wu, Ruoming Jin, Dejing Dou

Keywords: theory, deep learning, optimization

14:20

06/12/2021

FedDR – Randomized Douglas-Rachford Splitting Algorithms for Nonconvex Federated Composite Optimization

Quoc Tran Dinh, Nhan H Pham, Dzung Phan, Lam Nguyen

Keywords: optimization, federated learning

16:59

18/07/2021

PHEW: Constructing Sparse Networks that Learn Fast and Generalize Well without Training Data

Shreyas Malakarjun Patil, Constantine Dovrolis

Keywords: Deep Learning

5:20

06/12/2021

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

Shiwei Liu, Tianlong Chen, Xiaohan Chen, Zahra Atashgahi, Lu Yin, Huanyu Kou, Li Shen, Mykola Pechenizkiy, Zhangyang Wang, Decebal Constantin Mocanu

Keywords: deep learning

10:45

06/12/2021

Only Train Once: A One-Shot Neural Network Training And Pruning Framework

Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang, Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu

Keywords: deep learning, optimization, reinforcement learning and planning

12:53

05/01/2021

Spike-Thrift: Towards Energy-Efficient Deep Spiking Neural Networks by Limiting Spiking Activity via Attention-Guided Compression

Souvik Kundu, Gourav Datta, Massoud Pedram, Peter A. Beerel

5:22

06/12/2021

Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices

Aliakbar Panahi, Seyran Saeedi, Tom Arodz

Keywords: transformers

13:06

06/12/2020

Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot

Jingtong Su, Yihang Chen, Tianle Cai, Tianhao Wu, Ruiqi Gao, Liwei Wang, Jason Lee

Keywords: Deep Learning -> Adversarial Networks; Deep Learning -> Deep Autoencoders; Deep Learning -> Generative Models, Theory -> Learning Theory

3:21

14/06/2020

HRank: Filter Pruning Using High-Rank Feature Map

Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao

Keywords: network pruning, neural network compression and acceleration, high-rank feature map, efficient deep learning computing

4:57

13/04/2021

On the generalization properties of adversarial training

Yue Xing, Qifan Song, Guang Cheng

3:05

02/02/2021

Winning Lottery Tickets in Deep Generative Models

Neha Mukund Kalibhat, Yogesh Balaji, Soheil Feizi

15:40

02/02/2021

Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision

Xingchao Liu, Mao Ye, Dengyong Zhou, Qiang Liu

15:18

26/04/2020

On the Convergence of FedAvg on Non-IID Data

Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang

Keywords: Federated Learning, stochastic optimization, Federated Averaging

13:58

12/07/2020

Up or Down? Adaptive Rounding for Post-Training Quantization

Markus Nagel, Rana Ali Amjad, Marinus van Baalen, Christos Louizos, Tijmen Blankevoort

Keywords: Deep Learning - Algorithms

15:08

06/12/2020

Pruning neural networks without any data by iteratively conserving synaptic flow

Hidenori Tanaka, Daniel Kunin, Daniel Yamins, Surya Ganguli

Keywords: Deep Learning -> Optimization for Deep Networks; Optimization -> Non-Convex Optimization, Theory

3:19

03/05/2021

Gradient Origin Networks

Sam Bond-Taylor, Chris G Willcocks

Keywords: Implicit Representation, Generative Models, Deep Learning

5:01

12/07/2020

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

Sai Praneeth Reddy Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Jakkam Reddi, Sebastian Stich, Ananda Theertha Suresh

Keywords: Optimization - Convex

14:57

12/07/2020

Network Pruning by Greedy Subnetwork Selection

Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, Qiang Liu

Keywords: Deep Learning - General

10:01

18/07/2021

Efficient Lottery Ticket Finding: Less Data is More

Zhenyu Zhang, Xuxi Chen, Tianlong Chen, Zhangyang Wang

Keywords: Deep Learning, Architectures

5:12

26/04/2020

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks

Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

Keywords: small data, neural tangent kernel, UCI database, few-shot learning, kernel SVMs, deep learning theory, kernel design

5:02

03/05/2021

Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning

Tianlong Chen, Zhenyu Zhang, Sijia Liu, Shiyu Chang, Zhangyang Wang

Keywords: lifelong learning, lottery tickets, winning tickets

5:11

18/07/2021

Learning and Planning in Average-Reward Markov Decision Processes

Yi Wan, Abhishek Naik, Richard Sutton

Keywords: Reinforcement Learning and Planning

5:05

12/07/2020

Proving the Lottery Ticket Hypothesis: Pruning is All You Need

Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

Keywords: Deep Learning - Theory

14:54

13/04/2021

Amortized bayesian prototype meta-learning: A new probabilistic meta-learning approach to few-shot image classification

Zhuo Sun, Jijie Wu, Xiaoxu Li, Wenming Yang, Jing-Hao Xue

2:53

12/07/2020

Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript

Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui

Keywords: Optimization - Large Scale, Parallel and Distributed

9:59

06/12/2020

Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization

Dmitry Kovalev, Adil Salim, Peter Richtarik

3:27

26/04/2020

PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search

Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong

Keywords: Neural Architecture Search, DARTS, Regularization, Normalization

4:40

06/12/2021

Hyperparameter Tuning is All You Need for LISTA

Xiaohan Chen, Jialin Liu, Zhangyang Wang, Wotao Yin

Keywords: deep learning

15:05

03/05/2021

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, Shi Gu

Keywords: Second-order analysis, Mixed Precision, Post Training Quantization

4:36

06/12/2021

Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning

Ligeng Zhu, Hongzhou Lin, Yao Lu, Yujun Lin, Song Han

Keywords: optimization, machine learning, federated learning

14:48

06/12/2020

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

3:13

06/12/2021

Powerpropagation: A sparsity inducing weight reparameterisation

Jonathan Schwarz, Siddhant M Jayakumar, Razvan Pascanu, Peter E Latham, Yee Teh

Keywords: deep learning, optimization, continual learning

9:08

06/12/2021

RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning

Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, Rishabh Iyer

Keywords: optimization, semi-supervised learning

13:59

03/05/2021

Neural Pruning via Growing Regularization

Huan Wang, Can Qin, Yulun Zhang, Yun Fu

Keywords: deep neural network pruning, regularization, Hessian matrix, model compression

6:15

12/07/2020

NetGAN without GAN: From Random Walks to Low-Rank Approximations

Luca Rendsburg, Holger Heidrich, Ulrike von Luxburg

Keywords: Applications - Other

12:04

06/12/2021

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

14:41

18/07/2021

Slot Machines: Discovering Winning Combinations of Random Weights in Neural Networks

Maxwell M Aladago, Lorenzo Torresani

Keywords: Deep Learning, Optimization for Deep Networks

4:40

06/12/2020

ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding

Yibo Yang, Hongyang Li, Shan You, Fei Wang, Chen Qian, Zhouchen Lin

3:19