Textual Data Augmentation for Efficient Active Learning on Tiny Datasets

Abstract: We propose a novel data augmentation approach in which guided outputs of a language generation model such as GPT-2, once labeled, improve the performance of text classifiers through an active learning process. We cast data generation as an optimization problem that maximizes the usefulness of the generated output, using Monte Carlo Tree Search (MCTS) as the optimization strategy and entropy as one of the optimization criteria. We compare our approach against a Non-Guided Data Generation (NGDG) process that does not optimize a reward function. Starting from a small set of data, MCTS increases performance by 26% on the TREC-6 Questions dataset and by 10% on the Stanford Sentiment Treebank SST-2 dataset. Compared with NGDG, it achieves increases of 3% on TREC-6 and 5% on SST-2.
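The abstract names entropy as one of the optimization criteria but does not spell out the reward. As a minimal sketch, assuming the reward is simply the Shannon entropy of the classifier's predicted class distribution for a generated sentence, candidates could be scored and ranked as follows. The function names, example sentences, and probabilities are hypothetical illustrations, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a class-probability vector.

    Assumption: a higher value means the classifier is less certain,
    so labeling that sentence is expected to be more useful.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def rank_by_uncertainty(
    candidates: List[Tuple[str, Sequence[float]]]
) -> List[Tuple[str, Sequence[float]]]:
    """Sort (text, probs) pairs so the most uncertain candidates come first."""
    return sorted(candidates, key=lambda c: predictive_entropy(c[1]), reverse=True)


# Hypothetical generated sentences with made-up classifier probabilities.
candidates = [
    ("what is the capital of peru ?", [0.97, 0.01, 0.02]),
    ("how did the name come about ?", [0.40, 0.35, 0.25]),
]
ranked = rank_by_uncertainty(candidates)
for text, probs in ranked:
    print(f"{predictive_entropy(probs):.3f}  {text}")
```

In a guided search such as MCTS, a score of this kind could serve as the value backed up from a completed generation. The non-guided baseline would sample text without consulting any such score.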

12/07/2020

02/02/2021

Husam Quteineh, Spyridon Samothrakis, Richard Sutcliffe

Similar Papers

Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning

Qing Li, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu

Applications - Computer Vision

Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses

Prathyusha Jwalapuram, Shafiq Joty, Youlin Shen

pronoun translations, pronoun translation, neural training, backtranslation

EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege’s Principle

Trisha Mittal, Pooja Guhan, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha

affective computing, perceived emotions, context understanding, multimodal, inter-agent interactions, depth maps, deep learning, background, attention maps

DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances

Xiaodong Gu, Kang Min Yoo, Jung-Woo Ha

A Simple and Effective Self-Supervised Contrastive Learning Framework for Aspect Detection

Tian Shi, Liuqing Li, Ping Wang, Chandan K. Reddy

Unsupervised Representation Learning via Neural Activation Coding

Yookoon Park, Sangho Lee, Gunhee Kim, David Blei

Deep Learning, Embedding and Representation learning

AREDSUM: Adaptive redundancy-aware iterative sentence ranking for extractive document summarization

Keping Bi, Rahul Jha, Bruce Croft, Asli Celikyilmaz

Unsupervised Text Generation by Learning from Search

Jingjing Li, Zichao Li, Lili Mou, Xin Jiang, Michael Lyu, Irwin King

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen

Position Encoding, Attention, Natural Language Processing, Language Model Pre-training, Transformer

Exploring and Predicting Transferability across NLP Tasks

Tu Vu, Tong Wang, Tsendsuren Munkhdalai, Alessandro Sordoni, Adam Trischler, Andrew Mattarella-Micke, Subhransu Maji, Mohit Iyyer

language modeling, nlp tasks, text classification, question answering

Diversity Enhanced Active Learning with Strictly Proper Scoring Rules

Wei Tan, Lan Du, Wray Buntine

machine learning, active learning

Bayesian Hierarchical Words Representation Learning

Oren Barkan, Idan Rejwan, Avi Caciularu, Noam Koenigstein

Bayesian modeling, Bayesian Learning, BHWR, Variational learning

Using Context in Neural Machine Translation Training Objectives

Danielle Saunders, Felix Stahlberg, Bill Byrne

Neural training, NMT training, document-level training, NMT objective

FreeLB: Enhanced Adversarial Training for Natural Language Understanding

Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, Jingjing Liu

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Deep Learning - Generative Models and Autoencoders

DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks

Bosheng Ding, Linlin Liu, Lidong Bing, Canasai Kruengkrai, Thien Hai Nguyen, Shafiq Joty, Luo Si, Chunyan Miao

machine learning, generalization, low-resource tasks, named recognition

Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Haiyan Yin, Jianda Chen, Sinno Jialin Pan, Sebastian Tschiatschek

CPR: Classifier-Projection Regularization for Continual Learning

Sungmin Cha, Hsiang Hsu, Taebaek Hwang, Flavio Calmon, Taesup Moon

regularization, wide local minima, continual learning

Generating Dialogue Responses from a Semantic Latent Space

Wei-Jen Ko, Avik Ray, Yilin Shen, Hongxia Jin

generation responses, regression task, open-domain models, end-to-end classification

Self-Adversarial Learning with Comparative Discrimination for Text Generation

Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

adversarial learning, text generation

A Max-Min Entropy Framework for Reinforcement Learning

Seungyul Han, Youngchul Sung
