Abstract:
We develop a novel method, called PoWER-BERT, for improving the inference time of the popular BERT model while maintaining accuracy. It works by: a) exploiting redundancy pertaining to word-vectors (intermediate encoder outputs) and eliminating the redundant vectors; b) determining which word-vectors to eliminate by developing a strategy for measuring their significance based on the self-attention mechanism; and c) learning how many word-vectors to eliminate by augmenting the BERT model and the loss function.
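To make step b) concrete, below is a minimal sketch of a self-attention-based significance score, assuming for illustration that a word-vector's score is the total attention it receives across heads and query positions at a given encoder layer; the function names, tensor shapes, and the exact scoring rule here are illustrative assumptions, not the paper's precise formulation.

```python
import torch

def significance_scores(attention):
    """
    attention: [num_heads, seq_len, seq_len] self-attention probabilities
               from one encoder layer (rows = queries, columns = keys).
    Returns one score per position: the total attention that the
    corresponding word-vector receives, summed over heads and queries.
    """
    return attention.sum(dim=(0, 1))  # -> [seq_len]

def keep_top_k(word_vectors, attention, k):
    """
    word_vectors: [seq_len, hidden] encoder outputs at this layer.
    Keeps the k most significant word-vectors, preserving sequence order.
    """
    scores = significance_scores(attention)
    topk = torch.topk(scores, k).indices
    keep = torch.sort(topk).values  # restore original token order
    return word_vectors[keep], keep
```

For example, with attention of shape [12, 128, 128] and layer outputs of shape [128, 768], keep_top_k(outputs, attention, 80) would retain the 80 highest-scoring word-vectors and pass only those to the next encoder layer.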
Experiments on the standard GLUE benchmark show that PoWER-BERT achieves up to a 4.5x reduction in inference time over BERT with < 1% loss in accuracy. We show that PoWER-BERT offers a significantly better trade-off between accuracy and inference time than prior methods. We demonstrate that our method attains up to a 6.8x reduction in inference time with < 1% loss in accuracy when applied to ALBERT, a highly compressed version of BERT.