Cold-start Active Learning through Self-supervised Language Modeling

Abstract: Active learning strives to reduce annotation costs by choosing the most critical examples to label. Typically, the active learning strategy is contingent on the classification model. For instance, uncertainty sampling depends on poorly calibrated model confidence scores. In the cold-start setting, active learning is impractical because of model instability and data scarcity. Fortunately, modern NLP provides an additional source of information: pre-trained language models. The pre-training loss can find examples that surprise the model and should be labeled for efficient fine-tuning. Therefore, we treat the language modeling loss as a proxy for classification uncertainty. With BERT, we develop a simple strategy based on the masked language modeling loss that minimizes labeling costs for text classification. Compared to other baselines, our approach reaches higher accuracy in fewer sampling iterations and with less computation time.
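The core idea in the abstract, using language modeling loss as a stand-in for classification uncertainty, can be illustrated with a minimal sketch. It is not the authors' actual strategy, which the abstract does not spell out: it simply ranks unlabeled examples by their mean masked-token loss and picks the highest-scoring ones. The function names, the `pool` dictionary, and the probabilities in it are all hypothetical; in practice the probabilities would come from a pre-trained masked language model such as BERT.

```python
import math
from typing import Dict, List, Sequence


def mean_masked_loss(token_probs: Sequence[float]) -> float:
    """Average negative log-likelihood of the true tokens at masked positions."""
    if not token_probs:
        raise ValueError("need at least one masked-token probability")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def select_for_labeling(pool: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Return the ids of the `budget` examples with the highest mean loss."""
    scores = {ex_id: mean_masked_loss(probs) for ex_id, probs in pool.items()}
    # Higher loss means the language model was more "surprised" by the text.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]


# Hypothetical probabilities that a language model assigned to the true
# tokens at the masked positions of three unlabeled documents.
pool = {
    "doc_a": [0.9, 0.8],  # well predicted, low loss
    "doc_b": [0.1, 0.2],  # poorly predicted, high loss
    "doc_c": [0.5, 0.5],
}
chosen = select_for_labeling(pool, budget=2)
print(chosen)
```

Because the score depends only on the pre-trained language model, a ranking like this needs no labeled data or trained classifier, which is what makes it usable in the cold-start setting.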

19/04/2021

Michelle Yuan, Hsuan-Tien Lin, Jordan Boyd-Graber

Similar Papers

Implicit unlikelihood training: Improving neural text generation with reinforcement learning

Evgeny Lagutin, Daniil Gavrilov, Pavel Kalaidin

MASKER: Masked Keyword Regularization for Reliable Text Classification

Seung Jun Moon, Sangwoo Mo, Kimin Lee, Jaeho Lee, Jinwoo Shin

ColdGANs: Taming Language GANs with Cautious Sampling Strategies

Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano

Weakly Supervised Semantic Segmentation for Large-Scale Point Cloud

Yachao Zhang, Zonghao Li, Yuan Xie, Yanyun Qu, Cuihua Li, Tao Mei

Does typological blinding impede cross-lingual sharing?

Johannes Bjerva, Isabelle Augenstein

Robust early-learning: Hindering the memorization of noisy labels

Xiaobo Xia, Tongliang Liu, Bo Han, Chen Gong, Nannan Wang, Zongyuan Ge, Yi Chang

Joint Training with Semantic Role Labeling for Better Generalization in Natural Language Inference

Cemil Cengiz, Deniz Yuret

Multi-Label Learning with Pairwise Relevance Ordering

Ming-Kun Xie, Sheng-Jun Huang

Keywords: machine learning

Learning to Forget for Meta-Learning

Sungyong Baik, Seokil Hong, Kyoung Mu Lee

Keywords: meta learning, few-shot learning, reinforcement learning

Text Classification with Negative Supervision

Sora Ohashi, Junya Takayama, Tomoyuki Kajiwara, Chenhui Chu, Yuki Arase

Keywords: Text Classification, text representation, text tasks, single- classifications

Syntactic Structure Distillation Pretraining for Bidirectional Encoders

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Keywords: bert pretraining, structured tasks, natural understanding, textual learners

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang

Keywords: reinforcement learning, function approximation, lower bound, representation

Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild

Weijia Wu, Ning Lu, Enze Xie, Yuxing Wang, Wenwen Yu, Cheng Yang, Hong Zhou

An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction

Bhargavi Paranjape, Mandar Joshi, John Thickstun, Hannaneh Hajishirzi, Luke Zettlemoyer

Keywords: language understanding, semi-supervised setting, complex models, explainer

MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition

Yangming Li, Lemao Liu, Shuming Shi

Keywords: Negative Sampling, Unlabeled Entity Problem, Named Entity Recognition

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren

Keywords: machine learning, fairness, language

Robust Pre-Training by Adversarial Contrastive Learning

Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang

Counterfactual Maximum Likelihood Estimation for Training Deep Networks

Xinyi Wang, Wenhu Chen, Michael Saxon, William Yang Wang

Keywords: deep learning, domain adaptation, causality, language

TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue

Chien-Sheng Wu, Steven C.H. Hoi, Richard Socher, Caiming Xiong

Keywords: language modeling, pre-training, response task, task-oriented applications

Active Learning for BERT: An Empirical Study

Liat Ein-Dor, Alon Halfon, Ariel Gera, Eyal Shnarch, Lena Dankin, Leshem Choshen, Marina Danilevsky, Ranit Aharonov, Yoav Katz, Noam Slonim

Keywords: text classification, nlp tasks, bert-based classification, binary classification

Multilingual and cross-lingual document classification: A meta-learning approach

Niels Heijden, Helen Yannakoudakis, Pushkar Mishra, Ekaterina Shutova
