Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

06/12/2021

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder

Keywords: optimization

Abstract Paper Similar Papers

Abstract: Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful as it requires storing a separate copy of the model for each task. Recent work has developed parameter-efficient fine-tuning methods, but these approaches either still require a relatively large number of parameters or underperform standard fine-tuning. In this work, we propose Compacter, a method for fine-tuning large-scale language models with a better trade-off between task performance and the number of trainable parameters than prior work. Compacter accomplishes this by building on top of ideas from adapters, low-rank optimization, and parameterized hypercomplex multiplication layers.Specifically, Compacter inserts task-specific weight matrices into a pretrained model's weights, which are computed efficiently as a sum of Kronecker products between shared ``slow'' weights and ``fast'' rank-one matrices defined per Compacter layer. By only training 0.047% of a pretrained model's parameters, Compacter performs on par with standard fine-tuning on GLUE and outperforms standard fine-tuning on SuperGLUE and low-resource settings. Our code is publicly available at https://github.com/rabeehk/compacter.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

30/11/2020

Degradation Model Learning for Real-World Single Image Super-resolution

Jin XIAO, Hongwei Yong, Lei Zhang

Keywords Paper

0

0

0

0

8:54

19/08/2021

Automatic Mixed-Precision Quantization Search of BERT

Changsheng Zhao, Ting Hua, Yilin Shen and
Qian Lou, Hongxia Jin

Keywords Paper

Machine Learning, Deep Learning, NLP Applications and Tools, Text Classification

0

0

0

0

12:12

06/12/2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

Keywords Paper

optimization, transformers, language

0

0

0

0

10:53

06/12/2021

Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices

Aliakbar Panahi, Seyran Saeedi, Tom Arodz

Keywords Paper

transformers

0

0

0

0

13:06

26/04/2020

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman and
Kevin Gimpel, Piyush Sharma, Radu Soricut

Keywords Paper

Natural Language Processing, BERT, Representation Learning

0

0

0

0

4:59

06/12/2021

Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Itay Hubara, Brian Chmiel, Moshe Island and
Ron Banner, Joseph Naor, Daniel Soudry

Keywords Paper

deep learning

0

0

0

0

11:02

03/05/2021

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

Beliz Gunel, Jingfei Du, Alexis Conneau, Veselin Stoyanov

Keywords Paper

supervised contrastive learning, pre-trained language model fine-tuning, natural language understanding, generalization, few-shot learning, robustness

0

0

0

0

4:44

04/07/2020

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Haoming Jiang, Pengcheng He, Weizhu Chen and
Xiaodong Liu, Jianfeng Gao, Tuo Zhao

Keywords Paper

NLP, generalization, NLP tasks, SMART

0

0

0

0

11:43

06/12/2020

Co-Tuning for Transfer Learning

Kaichao You, Zhi Kou, Mingsheng Long, Jianmin Wang

Keywords Paper

0

0

0

0

3:24

03/05/2021

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

Jonathan Pilault, Amine EL hattami, Chris J Pal

Keywords Paper

Natural Language Processing, Transfer Learning, Adaptive Learning, Multi-Task Learning

0

0

0

0

5:10

16/11/2020

Structured Pruning of Large Language Models

Ziheng Wang, Jeremy Wohlwend, Tao Lei

Keywords Paper

natural tasks, model compression, language tasks, pruning embeddings

0

0

0

0

11:04

02/02/2021

Continuous Self-Attention Models with Neural ODE Networks

Jing Zhang, Peng Zhang, Baiwen Kong and
Junqiu Wei, Xin Jiang

Keywords Paper

0

0

0

0

15:25

14/06/2020

AANet: Adaptive Aggregation Network for Efficient Stereo Matching

Haofei Xu, Juyong Zhang

Keywords Paper

stereo matching, cost aggregation, edge-preserving, deformable convolution, cost volume, dense correspondences

0

0

0

0

1:01

03/05/2021

Training with Quantization Noise for Extreme Model Compression

Pierre Stock, Angela Fan, Benjamin Graham and
Edouard Grave, Rémi Gribonval, Hervé Jégou, Armand Joulin

Keywords Paper

Product Quantization, Compression, Efficiency

0

0

0

0

4:58

03/05/2021

Monotonic Kronecker-Factored Lattice

William Bakst, Nobuyuki Morioka, Erez Louidor

Keywords Paper

Matrix and Tensor Factorization, Evaluation, Efficiency, Machine Learning, Algorithms, Fairness, Regularization, Classification, Theory, Regression

0

0

0

0

5:03

12/07/2020

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Zhuohan Li, Eric Wallace, Sheng Shen and
Kevin Lin, Kurt Keutzer, Dan Klein, Joseph Gonzalez

Keywords Paper

Applications - Language, Speech and Dialog

0

0

0

0

15:21

04/07/2020

Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning

Alexandre Tamborrino, Nicola Pellicanò, Baptiste Pannier and
Pascal Voitot, Louise Naudin

Keywords Paper

Commonsense Reasoning, common tasks, plausibility task, pre-training phase

0

0

0

0

11:39

12/07/2020

Low-Rank Bottleneck in Multi-head Attention Models

Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat and
Sashank Jakkam Reddi, Sanjiv Kumar

Keywords Paper

Deep Learning - Theory

1

0

0

0

14:01

06/12/2021

Continual Learning via Local Module Composition

Oleksiy Ostapenko, Pau Rodriguez, Massimo Caccia, Laurent Charlin

Keywords Paper

continual learning, transfer learning

1

0

0

1

14:32

02/02/2021

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer

Keywords Paper

0

0

0

0

19:14

16/11/2020

Small but Mighty: New Benchmarks for Split and Rephrase

Li Zhang, Huaiyu Zhu, Siddhartha Brahma, Yunyao Li

Keywords Paper

text task, fine-grained evaluation, automatic process, rule-based model

0

0

0

0

6:58

06/12/2020

SMYRF - Efficient Attention using Asymmetric Clustering

Giannis Daras, Nikita Kitaev, Augustus Odena, Alex Dimakis

Keywords Paper

0

0

0

0

3:28

14/09/2020

Squeezing Correlated Neurons for Resource-Efficient Deep Neural Networks

Elbruz Ozen, Alex Orailoglu

Keywords Paper

deep learning, information redundancy, pruning

0

0

0

0

14:48

06/12/2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth

Lu Hou, Zhiqi Huang, Lifeng Shang and
Xin Jiang, Xiao Chen, Qun Liu

Keywords Paper

0

0

0

0

2:59

06/12/2020

Incorporating BERT into Parallel Sequence Decoding with Adapters

Junliang Guo, Zhirui Zhang, Linli Xu and
Hao-Ran Wei, Boxing Chen, Enhong Chen

Keywords Paper

0

0

0

0

3:17

04/07/2020

The Right Tool for the Job: Matching Model and Instance Complexities

Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta and
Jesse Dodge, Noah A. Smith

Keywords Paper

inference, early decisions, costly retraining, Job Model

0

0

0

0

11:27

13/04/2021

Accumulations of projections—a unified framework for random sketches in kernel ridge regression

Yifan Chen, Yun Yang

Keywords Paper

0

0

0

0

3:03

03/05/2021

Self-supervised Adversarial Robustness for the Low-label, High-data Regime

Sven Gowal, Po-Sen Huang, Aaron v den and
Timothy A Mann, Pushmeet Kohli

Keywords Paper

self-supervised, adversarial training, robustness

0

0

0

0

5:17

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

Keywords Paper

0

0

0

0

11:27

06/12/2021

Learning Compact Representations of Neural Networks using DiscriminAtive Masking (DAM)

Jie Bu, Arka Daw, M. Maruf, Anuj Karpatne

Keywords Paper

deep learning, machine learning, vision, graph learning, representation learning

0

0

0

0

13:59

18/07/2021

LogME: Practical Assessment of Pre-trained Models for Transfer Learning

Kaichao You, Yong Liu, Jianmin Wang, Mingsheng Long

Keywords Paper

Algorithms, Multitask, Transfer, and Meta Learning

1

1

0

0

5:18

18/07/2021

Exact Optimization of Conformal Predictors via Incremental and Decremental Learning

Giovanni Cherubin, Konstantinos Chatzikokolakis, Martin Jaggi

Keywords Paper

Probabilistic Methods

0

0

0

0

5:48

26/08/2020

Sketching Transformed Matrices with Applications to Natural Language Processing

Yingyu Liang, Zhao Song, Mengdi Wang and
Lin Yang, Xin Yang

Keywords Paper

0

0

0

0

11:17

06/12/2021

Long-Short Transformer: Efficient Transformers for Language and Vision

Chen Zhu, Wei Ping, Chaowei Xiao and
Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro

Keywords Paper

machine learning, transformers

0

0

0

0

11:44

03/05/2021

Rethinking Embedding Coupling in Pre-trained Language Models

Hyung Won Chung, Thibault Fevry, Henry Tsai and
Melvin Johnson, Sebastian Ruder

Keywords Paper

transfer learning, pre-training, natural language processing, efficiency

0

0

0

0

5:12

23/08/2020

Compositional embeddings using complementary partitions for memory-efficient recommendation systems

Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang

Keywords Paper

embeddings, model compression, recommendation systems

0

0

0

0

16:14

06/12/2021

Quantifying and Improving Transferability in Domain Generalization

Guojun Zhang, Han Zhao, Yaoliang Yu, Pascal Poupart

Keywords Paper

domain adaptation, transfer learning

0

0

0

0

10:27

06/12/2021

Scalable Rule-Based Representation Learning for Interpretable Classification

Zhuo Wang, Wei Zhang, Ning Liu, Jianyong Wang

Keywords Paper

optimization, machine learning, representation learning, interpretability

0

0

0

0

14:52

03/05/2021

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation

Biao Zhang, Ankur Bapna, Rico Sennrich, Orhan Firat

Keywords Paper

multilingual transformer, multilingual translation, language-specific modeling, conditional computation

0

0

0

0

15:04

06/12/2021

Scatterbrain: Unifying Sparse and Low-rank Attention

Beidi Chen, Tri Dao, Eric Winsor and
Zhao Song, Atri Rudra, Christopher Ré

Keywords Paper

transformers, generative model

0

0

0

0

13:15