04/07/2020

Contextual Embeddings: When Are They Worth It?

Simran Arora, Avner May, Jian Zhang, Christopher Ré

Keywords: contextual embeddings, deep embeddings, pretrained embeddings, GloVe

Abstract: We study the settings for which deep contextual embeddings (e.g., BERT) give large improvements in performance relative to classic pretrained embeddings (e.g., GloVe), and an even simpler baseline---random word embeddings---focusing on the impact of the training set size and the linguistic properties of the task. Surprisingly, we find that both of these simpler baselines can match contextual embeddings on industry-scale data, and often perform within 5 to 10% accuracy (absolute) on benchmark tasks. Furthermore, we identify properties of data for which contextual embeddings give particularly large gains: language containing complex structure, ambiguous word usage, and words unseen in training.
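
To make the comparison concrete, below is a minimal sketch (not the authors' implementation) of the two non-contextual baselines named in the abstract: a fixed random vector per word versus pretrained GloVe vectors, each averaged into a bag-of-embeddings feature for a linear classifier. The GloVe file path, the 100-dimensional setting, and the toy data are illustrative assumptions only.

# Minimal sketch contrasting the two non-contextual baselines from the
# abstract: random word embeddings vs. pretrained GloVe. Assumes a local
# "glove.6B.100d.txt" file; texts/labels are toy placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

DIM = 100
rng = np.random.default_rng(0)

def load_glove(path="glove.6B.100d.txt"):
    """Parse GloVe's plain-text format: a word followed by DIM floats per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def embed(texts, lookup):
    """Average word vectors per text (bag-of-embeddings); zeros if no word is found."""
    out = np.zeros((len(texts), DIM), dtype=np.float32)
    for i, text in enumerate(texts):
        vecs = [lookup(tok) for tok in text.lower().split()]
        vecs = [v for v in vecs if v is not None]
        if vecs:
            out[i] = np.mean(vecs, axis=0)
    return out

# Random baseline: each vocabulary word gets one fixed random vector.
random_table = {}
def random_lookup(tok):
    if tok not in random_table:
        random_table[tok] = rng.standard_normal(DIM).astype(np.float32)
    return random_table[tok]

glove = load_glove()
def glove_lookup(tok):
    return glove.get(tok)  # None for out-of-vocabulary words

# Toy data; in practice one would vary the training-set size, the axis the
# paper studies.
texts = ["the movie was great", "the movie was terrible"] * 50
labels = [1, 0] * 50

for name, lookup in [("random", random_lookup), ("GloVe", glove_lookup)]:
    X = embed(texts, lookup)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    print(name, "train accuracy:", clf.score(X, labels))

On such a simple task both baselines fit the data, which mirrors the paper's point: the gap to contextual embeddings only opens up on data with complex structure, ambiguous word usage, or unseen words.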

Talk and paper published at the ACL 2020 virtual conference.
