An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training

16/11/2020

Kristjan Arumae, Qing Sun, Parminder Bhatia

Keywords: pre-training models, natural language processing community, out-of-domain tasks, clinical named entity recognition

Abstract: Pre-training large language models has become standard practice in the natural language processing community. Such models are pre-trained on generic data (e.g. BookCorpus and English Wikipedia) and often fine-tuned on tasks in the same domain. However, achieving state-of-the-art performance on out-of-domain tasks such as clinical named entity recognition and relation extraction requires additional in-domain pre-training. In practice, staged multi-domain pre-training leads to performance deterioration in the form of catastrophic forgetting (CF) when evaluated on a generic benchmark such as GLUE. In this paper we conduct an empirical investigation into known methods to mitigate CF. We find that elastic weight consolidation provides the best overall scores, yielding only a 0.33% drop in performance across seven generic tasks while remaining competitive on biomedical tasks. Furthermore, we explore gradient-based and latent-clustering-based data selection techniques to improve coverage when using elastic weight consolidation and experience replay methods.

The talk and the corresponding paper were published at the EMNLP 2020 virtual conference.

Similar Papers

26/04/2020

Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models

Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang

Keywords: regularization, finetuning, dropout, dropconnect, adaptive L2-penalty, BERT, pretrained language model

5:04

07/09/2020

BCaR: Beginner Classifier as Regularization Towards Generalizable Re-ID

Masato Tamura, Tomoaki Yoshinaga

Keywords: person re-identification, generalizable, soft label, knowledge distillation, Re-ID, domain generalization

6:53

03/05/2021

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

Beliz Gunel, Jingfei Du, Alexis Conneau, Veselin Stoyanov

Keywords: supervised contrastive learning, pre-trained language model fine-tuning, natural language understanding, generalization, few-shot learning, robustness

4:44

03/05/2021

Self-supervised Adversarial Robustness for the Low-label, High-data Regime

Sven Gowal, Po-Sen Huang, Aaron v den, Timothy A Mann, Pushmeet Kohli

Keywords: self-supervised, adversarial training, robustness

5:17

06/12/2020

Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, Quoc V Le

3:29

19/04/2021

Communicative-function-based sentence classification for construction of an academic formulaic expression database

Kenichi Iwatsuki, Akiko Aizawa

9:17

03/05/2021

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

Jonathan Pilault, Amine EL hattami, Chris J Pal

Keywords: Natural Language Processing, Transfer Learning, Adaptive Learning, Multi-Task Learning

5:10

14/06/2020

Auxiliary Training: Towards Accurate and Robust Models

Linfeng Zhang, Muzhou Yu, Tong Chen, Zuoqiang Shi, Chenglong Bao, Kaisheng Ma

Keywords: model robustness, data augmentation, adversarial attack, training method, classification

0:56

26/04/2020

Pre-training Tasks for Embedding-based Large-scale Retrieval

Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar

Keywords: natural language processing, large-scale retrieval, unsupervised representation learning, paragraph-level pre-training, two-tower Transformer models

4:39

19/04/2021

Jointly improving language understanding and generation with quality-weighted weak supervision of automatic labeling

Ernie Chang, Vera Demberg, Alex Marin

7:28

06/12/2021

How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?

Xinshuai Dong, Anh Tuan Luu, Min Lin, Shuicheng Yan, Hanwang Zhang

Keywords: robustness, adversarial robustness and security, language

10:26

12/07/2020

Countering Language Drift with Seeded Iterated Learning

Yuchen Lu, Soumye Singhal, Florian Strub, Aaron Courville, Olivier Pietquin

Keywords: Deep Learning - Algorithms

14:25

02/02/2021

LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

Ting Jiang, Deqing Wang, Leilei Sun, Huayi Yang, Zhengyang Zhao, Fuzhen Zhuang

16:28

06/12/2020

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

3:20

08/12/2020

Detecting Urgency Status of Crisis Tweets: A Transfer Learning Approach for Low Resource Languages

Efsun Sarioglu Kayi, Linyong Nan, Bohan Qu, Mona Diab, Kathleen McKeown

14:37

05/12/2020

Beyond fine-tuning: Few-sample sentence embedding transfer

Siddhant Garg, Rohit Kumar Sharma, Yingyu Liang

9:56

18/07/2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation

Xiang Lin, Simeng Han, Shafiq Joty

Keywords: Applications, Natural Language Processing

16:00

16/11/2020

Ad-hoc Document Retrieval using Weak-Supervision with BERT and GPT2

Yosi Mass, Haggai Roitman

Keywords: ad-hoc retrieval, manually data, weakly-supervised method, deep models

8:03

06/12/2020

Robust Pre-Training by Adversarial Contrastive Learning

Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang

3:26

19/08/2021

Generating Senses and RoLes: An End-to-End Model for Dependency- and Span-based Semantic Role Labeling

Rexhina Blloshmi, Simone Conia, Rocco Tripodi, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Natural Language Generation

15:18

06/12/2021

Data Augmentation Can Improve Robustness

Sylvestre-Alvise Rebuffi, Sven Gowal, Dan Andrei Calian, Florian Stimberg, Olivia Wiles, Timothy A Mann

Keywords: robustness, adversarial robustness and security

8:06

02/02/2021

Continuous Self-Attention Models with Neural ODE Networks

Jing Zhang, Peng Zhang, Baiwen Kong, Junqiu Wei, Xin Jiang

15:25

03/05/2021

Learnable Embedding sizes for Recommender Systems

Siyi Liu, Chen Gao, Yihong Chen, Depeng Jin, Yong Li

Keywords: Deep Learning, Embedding Size, Recommender Systems

5:29

06/12/2020

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Scott Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

3:25

07/09/2020

On the Exploration of Incremental Learning for Fine-grained Image Retrieval

Wei Chen, Yu Liu, Weiping Wang, Tinne Tuytelaars, Erwin M. Bakker, Michael Lew

Keywords: Incremental learning, Fine-grained image retrieval, Catastrophic forgetting, Maximum Mean Discrepancy

8:32

06/12/2021

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder

Keywords: optimization

14:16

19/08/2021

Automatic Mixed-Precision Quantization Search of BERT

Changsheng Zhao, Ting Hua, Yilin Shen, Qian Lou, Hongxia Jin

Keywords: Machine Learning, Deep Learning, NLP Applications and Tools, Text Classification

12:12

26/04/2020

FreeLB: Enhanced Adversarial Training for Natural Language Understanding

Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu

5:26

18/07/2021

Delving into Deep Imbalanced Regression

Yuzhe Yang, Kaiwen Zha, Yingcong Chen, Hao Wang, Dina Katabi

Keywords: Applications

16:37

06/12/2020

Uncertainty-aware Self-training for Few-shot Text Classification

Subhabrata Mukherjee, Ahmed Awadallah

3:16

03/05/2021

Reweighting Augmented Samples by Minimizing the Maximal Expected Loss

Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma

Keywords: sample reweighting, data augmentation

4:58

06/12/2020

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Wei Zhang, Kailash Gopalakrishnan

3:06

04/07/2020

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

Keywords: NLP, generalization, NLP tasks, SMART

11:43

03/05/2021

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Weizhu Chen, Jiawei Han

Keywords: consistency training, contrastive learning, data augmentation, natural language understanding

6:02

06/12/2021

Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning

Zixuan Ke, Bing Liu, Nianzu Ma, Hu Xu, Lei Shu

Keywords: language, continual learning

9:48

19/04/2021

Does typological blinding impede cross-lingual sharing?

Johannes Bjerva, Isabelle Augenstein

7:52

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

11:27

16/11/2020

Efficient Meta Lifelong-Learning with Limited Memory

Zirui Wang, Sanket Vaibhav Mehta, Barnabas Poczos, Jaime Carbonell

Keywords: lifelong learning, local adaptation, text benchmarks, multi-task learning

12:03

18/07/2021

SparseBERT: Rethinking the Importance Analysis in Self-attention

Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James Kwok

Keywords: Applications, Natural Language Processing

5:13

13/04/2021

Semi-supervised learning with meta-gradient

Taihong Xiao, Xin-Yu Zhang, Haolin Jia, Ming-Ming Cheng, Ming-Hsuan Yang

2:56