BERxiT: Early exiting for BERT with better fine-tuning and extension to regression

19/04/2021

Ji Xin, Raphael Tang, Yaoliang Yu, Jimmy Lin

Abstract: The slow speed of BERT has motivated much research on accelerating its inference, and the early exiting idea has been proposed to make trade-offs between model quality and efficiency. This paper aims to address two weaknesses of previous work: (1) existing fine-tuning strategies for early exiting models fail to take full advantage of BERT; (2) methods to make exiting decisions are limited to classification tasks. We propose a more advanced fine-tuning strategy and a learning-to-exit module that extends early exiting to tasks other than classification. Experiments demonstrate improved early exiting for BERT, with better trade-offs obtained by the proposed fine-tuning strategy, successful application to regression tasks, and the possibility to combine it with other acceleration methods. Source code can be found at https://github.com/castorini/berxit.
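
The early exiting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only shows the general control flow under stated assumptions: each layer has its own output head, a certainty estimator scores the current hidden state, and inference stops at the first layer whose score reaches a threshold. The names `early_exit_forward`, `toy_layer`, `toy_head`, and `toy_certainty` are hypothetical, and the toy functions stand in for transformer layers, task heads, and a learned exit module.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def early_exit_forward(
    hidden: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    heads: Sequence[Callable[[Vector], float]],
    certainty: Callable[[Vector], float],
    threshold: float,
) -> Tuple[float, int]:
    """Run layers in order and return (prediction, number of layers used)."""
    if not layers or len(layers) != len(heads):
        raise ValueError("need at least one layer and one head per layer")
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        hidden = layer(hidden)
        # Exit when the certainty score is high enough; the final layer
        # always produces an output so every input gets a prediction.
        if depth == len(layers) or certainty(hidden) >= threshold:
            return head(hidden), depth
    raise AssertionError("unreachable: the final layer always exits")


# Toy stand-ins (assumptions for illustration only).
def toy_layer(h: Vector) -> Vector:
    return [x + 1.0 for x in h]


def toy_head(h: Vector) -> float:
    # A scalar output, as in a regression task.
    return sum(h)


def toy_certainty(h: Vector) -> float:
    # Grows with depth in this toy setup and is capped at 1.0.
    return min(1.0, (sum(h) / len(h)) / 3.0)


if __name__ == "__main__":
    layers = [toy_layer] * 4
    heads = [toy_head] * 4
    # A lower threshold exits earlier; an unreachable one uses every layer.
    print(early_exit_forward([0.0, 0.0], layers, heads, toy_certainty, 0.6))
    print(early_exit_forward([0.0, 0.0], layers, heads, toy_certainty, 2.0))
```

Because the head returns a scalar and the exit decision depends only on the certainty score, the same loop applies to regression as well as classification; the threshold is the knob that trades quality against the number of layers executed.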

The talk and the respective paper are published at the EACL 2021 virtual conference.

Similar Papers

04/07/2020

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin

Keywords: Accelerating Inference, NLP applications, inference, real-time applications

03/05/2021

Revisiting Few-sample BERT Fine-tuning

Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Weinberger, Yoav Artzi

Keywords: BERT, Fine-tuning, Optimization

06/12/2020

Off-Policy Imitation Learning from Observations

Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou

03/05/2021

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow

Keywords: BERT, transfer learning, pretrained language model, fine-tuning stability

16/11/2020

Active Learning for BERT: An Empirical Study

Liat Ein-Dor, Alon Halfon, Ariel Gera, Eyal Shnarch, Lena Dankin, Leshem Choshen, Marina Danilevsky, Ranit Aharonov, Yoav Katz, Noam Slonim

Keywords: text classification, nlp tasks, bert-based classification, binary classification

26/08/2020

Robust Optimisation Monte Carlo

Borislav Ikonomov, Michael U. Gutmann

16/11/2020

To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging

Kasturi Bhattacharjee, Miguel Ballesteros, Rishita Anubhai, Smaranda Muresan, Jie Ma, Faisal Ladhak, Yaser Al-Onaizan

Keywords: learning representations, downstream tasks, cross-view cvt, sequence tasks

16/11/2020

Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data

Shachar Rosenman, Alon Jacovi, Yoav Goldberg

Keywords: data process, re collection, sota models, tacred

03/05/2021

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Lanqing Li, Rui Yang, Dijun Luo

Keywords: distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

04/07/2020

From Zero to Hero: Human-In-The-Loop Entity Linking in Low Resource Domains

Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych

Keywords: Human-In-The-Loop Linking, Entity linking, disambiguating mentions, annotation process

05/12/2020

Towards non-task-specific distillation of BERT via sentence representation approximation

Bowen Wu, Huan Zhang, MengYuan Li, Zongsheng Wang, Qihang Feng, Junhong Huang, Baoxun Wang

03/08/2020

Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models

Zhijian Ou, Yunfu Song

06/12/2020

Meta-Learning with Adaptive Hyperparameters

Sungyong Baik, Myungsub Choi, Janghoon Choi, Heewon Kim, Kyoung Mu Lee

16/11/2020

Unsupervised Adaptation of Question Answering Systems via Generative Self-training

Steven Rennie, Etienne Marcheret, Neil Mallinar, David Nahamoo, Vaibhava Goel

Keywords: question-answering tasks, self-supervised tasks, word masking, sentence entailment

06/12/2021

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords: deep learning, optimization, reinforcement learning and planning

06/12/2021

ByPE-VAE: Bayesian Pseudocoresets Exemplar VAE

Qingzhong Ai, Lirong He, Shiyu Liu, Zenglin Xu

Keywords: optimization, generative model, representation learning

06/12/2021

Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

Guodong Zhang, Kyle Hsu, Jianing Li, Chelsea Finn, Roger Grosse

Keywords: optimization, generative model

05/01/2021

Class-Wise Metric Scaling for Improved Few-Shot Classification

Ge Liu, Linglan Zhao, Wei Li, Dashan Guo, Xiangzhong Fang

18/07/2021

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvari, Mengdi Wang

Keywords: Theory, RL, Decisions and Control Theory

13/04/2021

Comparing the value of labeled and unlabeled data in method-of-moments latent variable estimation

Mayee Chen, Benjamin Cohen-Wang, Stephen Mussmann, Frederic Sala, Christopher Re

03/05/2021

Estimating and Evaluating Regression Predictive Uncertainty in Deep Object Detectors

Ali Harakeh, Steven L Waslander

Keywords: Computer Vision, Object Detection, Energy Score, Variance Networks, Proper Scoring Rules, Predictive Uncertainty Estimation

19/10/2020

ALEX: Active learning based enhancement of a classification model’s EXplainability

Ishani Mondal, Debasis Ganguly

Keywords: image classification, model interpretability, active learning

06/12/2021

The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations

Peter Hase, Harry Xie, Mohit Bansal

Keywords: machine learning, interpretability

06/12/2021

Automatic Unsupervised Outlier Model Selection

Yue Zhao, Ryan Rossi, Leman Akoglu

Keywords: machine learning, self-supervised learning, meta learning, clustering

18/07/2021

How Important is the Train-Validation Split in Meta-Learning?

Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason Lee, Sham Kakade, Huan Wang, Caiming Xiong

Keywords: Theory, Statistical Learning Theory

06/12/2020

Incorporating BERT into Parallel Sequence Decoding with Adapters

Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong Chen

18/07/2021

DORO: Distributional and Outlier Robust Optimization

Runtian Zhai, Chen Dan, Zico Kolter, Pradeep Ravikumar

Keywords: Probabilistic Methods, Robust statistics

16/11/2020

Q-learning with Language Model for Edit-based Unsupervised Summarization

Ryosuke Kohita, Akifumi Wachi, Yang Zhao, Ryuki Tachibana

Keywords: abstractive text summarization, unsupervised summarization, unsupervised summarizers, unsupervised methods

22/11/2021

Elsa: Energy-based Learning for Semi-supervised Anomaly Detection

Sungwon Han, HyeonHo Song, Seung Eon Lee, Sungwon Park, Meeyoung Cha

Keywords: contrastive learning, energy-based learning, semi-supervised learning, anomaly detection

12/07/2020

Bidirectional Model-based Policy Optimization

Hang Lai, Jian Shen, Weinan Zhang, Yong Yu

Keywords: Reinforcement Learning - Deep RL

12/07/2020

Tightening Exploration in Upper Confidence Reinforcement Learning

Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

Keywords: Reinforcement Learning - General

02/02/2021

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael Mahoney

06/12/2021

Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning

Timo Milbich, Karsten Roth, Samarth Sinha, Ludwig Schmidt, Marzyeh Ghassemi, Bjorn Ommer

Keywords: representation learning, transfer learning

26/04/2020

Making Sense of Reinforcement Learning and Probabilistic Inference

Brendan O'Donoghue, Ian Osband, Catalin Ionescu

Keywords: Reinforcement learning, Bayesian inference, Exploration

06/12/2021

Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond

Pan Zhou, Hanshu Yan, Xiaotong Yuan, Jiashi Feng, Shuicheng Yan

Keywords: deep learning, optimization

26/04/2020

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords: Sample-efficient reinforcement learning, off-policy learning

03/05/2021

Learning Value Functions in Deep Policy Gradients using Residual Variance

Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

06/12/2020

Robust, Accurate Stochastic Optimization for Variational Inference

Akash Kumar Dhaka, Alejandro Catalina, Michael Andersen, Måns Magnusson, Jonathan Huggins, Aki Vehtari

06/12/2020

Adversarial Distributional Training for Robust Deep Learning

Yinpeng Dong, Zhijie Deng, Tianyu Pang, Jun Zhu, Hang Su

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords: Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory