04/07/2020

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian

Keywords: Faster Answering, question-independent processing, DeFormer, Decomposing Transformers

Abstract: Transformer-based QA models use input-wide self-attention -- i.e., across both the question and the input passage -- at all layers, causing them to be slow and memory-intensive. It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attention in the lower layers. This allows for question-independent processing of the input text representations, which in turn enables pre-computing passage representations and drastically reduces runtime compute. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-training weights of a standard transformer and fine-tune it directly on the target QA dataset. We show that DeFormer versions of BERT and XLNet speed up QA by over 4.3x, and with simple distillation-based losses they incur only a 1% drop in accuracy. We open-source the code at https://github.com/StonyBrookNLP/deformer.

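Below is a minimal sketch of the decomposition described in the abstract, written in PyTorch: the lower layers encode the question and the passage independently (so passage representations can be pre-computed and cached offline), and only the upper layers run full self-attention over the joined sequence. All names here (DecomposedEncoder, encode_lower, the 9/3 layer split) are illustrative assumptions, not the authors' released implementation; see the linked repository for the actual code.

import torch
import torch.nn as nn

class DecomposedEncoder(nn.Module):
    """Lower layers encode question and passage separately; upper layers attend jointly."""

    def __init__(self, d_model=768, n_heads=12, n_lower=9, n_upper=3):
        super().__init__()

        def make_layer():
            return nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)

        # Lower layers: no cross-attention between question and passage,
        # so passage representations can be computed once and cached.
        self.lower_layers = nn.ModuleList([make_layer() for _ in range(n_lower)])
        # Upper layers: full self-attention over the concatenated sequence.
        self.upper_layers = nn.ModuleList([make_layer() for _ in range(n_upper)])

    def encode_lower(self, x):
        for layer in self.lower_layers:
            x = layer(x)
        return x

    def forward(self, question_emb, passage_emb=None, cached_passage=None):
        q = self.encode_lower(question_emb)
        # Reuse a pre-computed passage representation when one is available.
        p = cached_passage if cached_passage is not None else self.encode_lower(passage_emb)
        h = torch.cat([q, p], dim=1)  # join question and passage only for the upper layers
        for layer in self.upper_layers:
            h = layer(h)
        return h

# Usage: pre-compute the passage once, then reuse it for every incoming question.
model = DecomposedEncoder().eval()
passage = torch.randn(1, 200, 768)          # token embeddings of a passage
with torch.no_grad():
    cached = model.encode_lower(passage)    # done offline, stored alongside the passage
    question = torch.randn(1, 20, 768)      # token embeddings of a question
    output = model(question, cached_passage=cached)
print(output.shape)                         # torch.Size([1, 220, 768])

The speedup in this sketch comes from the fact that encode_lower(passage) runs once per passage ahead of time; at question time only the question's lower-layer pass and the small stack of upper layers need to be computed.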
The talk and the corresponding paper were published at the ACL 2020 virtual conference.

