When BERT Plays the Lottery, All Tickets Are Winning

16/11/2020

Sai Prasanna, Anna Rogers, Anna Rumshisky

Keywords: fine-tuned, structured pruning, self-attention heads, self-attention layers

Abstract: Large Transformer-based models were shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis, using both structured and magnitude pruning. For fine-tuned BERT, we show that (a) it is possible to find subnetworks achieving performance that is comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. Strikingly, with structured pruning even the worst possible subnetworks remain highly trainable, indicating that most pre-trained BERT weights are potentially useful. We also study the "good" subnetworks to see if their success can be attributed to superior linguistic knowledge, but find them unstable, and not explained by meaningful self-attention patterns.
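To make the two pruning styles named in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `magnitude_mask` and `head_mask` are hypothetical, the importance scores are made-up numbers, and the one-shot selection shown here stands in for whatever schedule the paper actually uses.

```python
import numpy as np


def magnitude_mask(weights, sparsity):
    """Unstructured (magnitude) pruning sketch.

    Returns a 0/1 mask with the same shape as `weights` that zeroes the
    `sparsity` fraction of entries with the smallest absolute value.
    """
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))  # number of weights to prune
    mask = np.ones(flat.size, dtype=float)
    if k > 0:
        prune_idx = np.argsort(flat, kind="stable")[:k]
        mask[prune_idx] = 0.0
    return mask.reshape(weights.shape)


def head_mask(importance, keep):
    """Structured pruning sketch over whole attention heads.

    `importance` has shape (layers, heads) and holds one score per head
    (hypothetical values here). The `keep` highest-scoring heads get 1,
    all other heads get 0.
    """
    flat = importance.ravel()
    mask = np.zeros(flat.size, dtype=float)
    keep_idx = np.argsort(-flat, kind="stable")[:keep]
    mask[keep_idx] = 1.0
    return mask.reshape(importance.shape)


# Toy weight matrix: half of the entries are pruned by magnitude.
weights = np.array([[0.1, -2.0], [0.05, 3.0]])
w_mask = magnitude_mask(weights, sparsity=0.5)

# Toy importance scores for 2 layers x 3 heads (assumed, not measured).
importance = np.array([[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]])
good = head_mask(importance, keep=2)  # highest-scoring heads
rest = 1.0 - good                     # complement: the rest of the model
```

The complement mask `rest` only marks the heads that the "good" mask leaves out; sampling a similarly-sized subnetwork from it, as the abstract describes, would be a further step not shown here.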

The talk and the accompanying paper were published at the EMNLP 2020 virtual conference.

Similar Papers

16/11/2020

TernaryBERT: Distillation-aware Ultra-low Bit BERT

Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

Keywords: natural tasks, training process, transformer-based models, bert

8:41

16/11/2020

UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

Jian Guan, Minlie Huang

Keywords: open-ended generation, story generation, evaluating generation, constructing samples

11:26

08/12/2020

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

13:01

16/11/2020

Active Learning for BERT: An Empirical Study

Liat Ein-Dor, Alon Halfon, Ariel Gera, Eyal Shnarch, Lena Dankin, Leshem Choshen, Marina Danilevsky, Ranit Aharonov, Yoav Katz, Noam Slonim

Keywords: text classification, nlp tasks, bert-based classification, binary classification

10:53

19/08/2021

Automatic Mixed-Precision Quantization Search of BERT

Changsheng Zhao, Ting Hua, Yilin Shen, Qian Lou, Hongxia Jin

Keywords: Machine Learning, Deep Learning, NLP Applications and Tools, Text Classification

12:12

16/11/2020

With Little Power Comes Great Responsibility

Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky

Keywords: human studies, machine translation, power analysis, power analyses

11:51

16/11/2020

On Losses for Modern Language Models

Stéphane Aroca-Ouellette, Frank Rudzicz

Keywords: pre-training, masked modelling, next prediction, nsp

11:44

07/09/2020

From Saturation to Zero-Shot Visual Relationship Detection Using Local Context

Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Maragos

Keywords: Visual Relationship Detection, Scene Graph Generation, Zero-shot Classification, Local Context, Language Bias

7:17

16/11/2020

On the Sentence Embeddings from Pre-trained Language Models

Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li

Keywords: natural processing, semantic task, semantic tasks, pre-trained representations

9:11

01/07/2020

Are All Languages Created Equal in Multilingual BERT?

Shijie Wu, Mark Dredze

7:45

06/12/2021

On Component Interactions in Two-Stage Recommender Systems

Jiri Hron, Karl Krauth, Michael Jordan, Niki Kilbertus

Keywords: bandits

13:25

05/12/2020

An exploratory study on multilingual quality estimation

Shuo Sun, Marina Fomicheva, Frédéric Blain, Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzmán, Lucia Specia

14:31

06/12/2021

True Few-Shot Learning with Language Models

Ethan Perez, Douwe Kiela, Kyunghyun Cho

Keywords: language, few shot learning

15:04

16/11/2020

BERT-ATTACK: Adversarial Attack Against BERT Using BERT

Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, Xipeng Qiu

Keywords: adversarial attacks, downstream tasks, calculation, gradient-based methods

11:36

16/11/2020

Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU

Brielen Madureira, David Schlangen

Keywords: nlp, interactive systems, language encoders, bidirectional lstms

10:04

16/11/2020

Syntactic Structure Distillation Pretraining for Bidirectional Encoders

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Keywords: bert pretraining, structured tasks, natural understanding, textual learners

12:23

22/06/2020

IterefinE: Iterative KG Refinement Embeddings using Symbolic Knowledge

Siddhant Arora, Srikanta Bedathur, Maya Ramanath, Deepak Sharma

Keywords: Knowledge graph refinement, embeddings, inference

5:00

02/02/2021

BERT & Family Eat Word Salad: Experiments with Text Understanding

Ashim Gupta, Giorgi Kvernadze, Vivek Srikumar

19:08

02/06/2020

Hybrid Reasoning Over Large Knowledge Bases Using On-The-Fly Knowledge Extraction

Giorgos Stoilos, Damir Juric, Szymon Wartak, Claudia Schulz, Mohammad Khodadadi

28:41

06/12/2020

ColdGANs: Taming Language GANs with Cautious Sampling Strategies

Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano

3:19

03/05/2021

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow

Keywords: BERT, transfer learning, pretrained language model, fine-tuning stability

3:01

04/07/2020

Robust Encodings: A Framework for Combating Adversarial Typos

Erik Jones, Robin Jia, Aditi Raghunathan, Percy Liang

Keywords: Robust Encodings, NLP systems, RobEn, model architecture

11:56

04/07/2020

Slot-consistent NLG for Task-oriented Dialogue Systems with Iterative Rectification Network

Yangming Li, Kaisheng Yao, Libo Qin, Wanxiang Che, Xiaolong Li, Ting Liu

Keywords: Task-oriented Systems, natural generation, natural NLG, NLG

10:53

04/07/2020

What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models

Allyson Ettinger

Keywords: Pre-training, NLP tasks, inference, role-based prediction

12:39

26/04/2020

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

Keywords: energy-based models, text generation

4:59

26/04/2020

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

Keywords: Natural Language Processing, Representation Learning

5:12

02/02/2021

MASKER: Masked Keyword Regularization for Reliable Text Classification

Seung Jun Moon, Sangwoo Mo, Kimin Lee, Jaeho Lee, Jinwoo Shin

15:05

08/12/2020

SLICE: Supersense-based Lightweight Interpretable Contextual Embeddings

Cindy Aloui, Carlos Ramisch, Alexis Nasr, Lucie Barque

13:17

16/11/2020

HABERTOR: An Efficient and Effective Deep Hatespeech Detector

Thanh Tran, Yifan Hu, Changwei Hu, Kevin Yen, Fei Tan, Kyumin Lee, Se Rim Park

Keywords: downstream task, hatespeech classification, habertor model, bert model

11:46

16/11/2020

An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction

Bhargavi Paranjape, Mandar Joshi, John Thickstun, Hannaneh Hajishirzi, Luke Zettlemoyer

Keywords: language understanding, semi-supervised setting, complex models, explainer

11:44

06/12/2021

Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective

Zhengzhuo Xu, Zenghao Chai, Chun Yuan

Keywords: theory, machine learning

4:23

14/06/2020

Counterfactual Samples Synthesizing for Robust Visual Question Answering

Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, Yueting Zhuang

Keywords: visual question answering, counterfactual, debias, language bias, data augmentation, visual-and-language

1:01

06/12/2020

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

3:20

08/12/2020

ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation

Dario Stojanovski, Benno Krojer, Denis Peskov, Alexander Fraser

14:09

19/08/2021

BESA: BERT-based Simulated Annealing for Adversarial Text Attacks

Xinghao Yang, Weifeng Liu, Dacheng Tao, Wei Liu

Keywords: Machine Learning, Adversarial Machine Learning, Natural Language Processing

14:01

06/12/2020

Generalized Boosting

Arun Suggala, Bingbin Liu, Pradeep Ravikumar

3:11

02/02/2021

Effective Slot Filling via Weakly-Supervised Dual-Model Learning

Jue Wang, Ke Chen, Lidan Shou, Sai Wu, Gang Chen

18:02

04/07/2020

WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge

Hongming Zhang, Xinran Zhao, Yangqiu Song

Keywords: Deep Knowledge, Answering Challenge, WinoWhy, commonsense reasoning

11:58

06/12/2021

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

Cheng-I Jeff Lai, Yang Zhang, Alexander Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, Jim Glass

Keywords: self-supervised learning, representation learning

13:57

16/11/2020

If beam search is the answer, what was the question?

Clara Meister, Ryan Cotterell, Tim Vieira

Keywords: language tasks, beam search, decoding, maximum decoding

12:18