SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Abstract: Transfer learning has fundamentally changed the landscape of natural language processing (NLP). Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to limited data resources from downstream tasks and the extremely high complexity of pre-trained models, aggressive fine-tuning often causes the fine-tuned model to overfit the training data of downstream tasks and fail to generalize to unseen data. To address such an issue in a principled manner, we propose a new learning framework for robust and efficient fine-tuning for pre-trained models to attain better generalization performance. The proposed framework contains two important ingredients: 1. Smoothness-inducing regularization, which effectively manages the complexity of the model; 2. Bregman proximal point optimization, which is an instance of trust-region methods and can prevent aggressive updating. Our experiments show that the proposed framework achieves new state-of-the-art performance on a number of NLP tasks including GLUE, SNLI, SciTail and ANLI. Moreover, it also outperforms the state-of-the-art T5 model, which is the largest pre-trained model containing 11 billion parameters, on GLUE.

06/12/2021

supervised contrastive learning, pre-trained language model fine-tuning, natural language understanding, generalization, few-shot learning, robustness

4:44

02/02/2021

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

Comments

Similar Papers

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

Keywords Abstract Paper

optimization, transformers, language

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

Beliz Gunel, Jingfei Du, Alexis Conneau, Veselin Stoyanov

Keywords Abstract Paper

supervised contrastive learning, pre-trained language model fine-tuning, natural language understanding, generalization, few-shot learning, robustness

IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization

Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren

Keywords Abstract Paper

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Jixuan Wang, Kuan-Chieh Wang, Frank Rudzicz, Michael Brudno

Keywords Abstract Paper

machine learning, transformers, meta learning, language, transfer learning

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

Colin Wei, Sang Michael Xie, Tengyu Ma

Keywords Abstract Paper

theory, machine learning, self-supervised learning, generative model, representation learning, language

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Boxin Wang, Shuohang Wang, Yu Cheng and Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Keywords Abstract Paper

adversarial training, QA, NLI, BERT, information theory, adversarial robustness

Unsupervised Word Translation with Adversarial Autoencoder

Tasnim Mohiuddin, Shafiq Joty

Keywords Abstract Paper

Unsupervised Translation, machine translation, transfer learning, word task

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

Jonathan Pilault, Amine EL hattami, Chris J Pal

Keywords Abstract Paper

Natural Language Processing, Transfer Learning, Adaptive Learning, Multi-Task Learning

Multilingual Pre-training with Universal Dependency Learning

Kailai Sun, Zuchao Li, Hai Zhao

Keywords Abstract Paper

language, interpretability

Low-Rank Bottleneck in Multi-head Attention Models

Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat and Sashank Jakkam Reddi, Sanjiv Kumar

Keywords Abstract Paper

Deep Learning - Theory

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

Keywords Abstract Paper

Partially-Aligned Data-to-Text Generation with Distant Supervision

Zihao Fu, Bei Shi, Wai Lam and Lidong Bing, Zhiyuan Liu

Keywords Abstract Paper

data-to-text task, generation task, dataset problem, over-generation problem

Structured Pruning of Large Language Models

Ziheng Wang, Jeremy Wohlwend, Tao Lei

Keywords Abstract Paper

natural tasks, model compression, language tasks, pruning embeddings

Gradually Vanishing Bridge for Adversarial Domain Adaptation

Shuhao Cui, Shuhui Wang, Junbao Zhuo and Chi Su, Qingming Huang, Qi Tian

Keywords Abstract Paper

bridge, domain adaptation, adversarial learning

Generating syntactically controlled paraphrases without using annotated parallel pairs

Kuan-Hao Huang, Kai-Wei Chang

Keywords Abstract Paper

Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning

Zixuan Ke, Bing Liu, Nianzu Ma and Hu Xu, Lei Shu

Keywords Abstract Paper

language, continual learning

Improving the Efficiency and Effectiveness for BERT-based Entity Resolution

Bing Li, Yukai Miao, Yaoshu Wang and Yifang Sun, Wei Wang

Keywords Abstract Paper

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman and Kevin Gimpel, Piyush Sharma, Radu Soricut

Keywords Abstract Paper

Natural Language Processing, BERT, Representation Learning

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye and Xisen Jin, Xiang Ren

Keywords Abstract Paper

machine learning, fairness, language

Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation

Ieva Staliūnaitė, Philip John Gorinski, Ignacio Iacobacci

Keywords Abstract Paper

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Boxin Wang, Shuohang Wang, Yu Cheng and
Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat and
Sashank Jakkam Reddi, Sanjiv Kumar

Keywords Paper

Keywords Paper

Zihao Fu, Bei Shi, Wai Lam and
Lidong Bing, Zhiyuan Liu

Keywords Paper

Keywords Paper

Shuhao Cui, Shuhui Wang, Junbao Zhuo and
Chi Su, Qingming Huang, Qi Tian

Keywords Paper

Keywords Paper

Zixuan Ke, Bing Liu, Nianzu Ma and
Hu Xu, Lei Shu

Keywords Paper

Bing Li, Yukai Miao, Yaoshu Wang and
Yifang Sun, Wei Wang

Keywords Paper

Zhenzhong Lan, Mingda Chen, Sebastian Goodman and
Kevin Gimpel, Piyush Sharma, Radu Soricut

Keywords Paper

Huihan Yao, Ying Chen, Qinyuan Ye and
Xisen Jin, Xiang Ren

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Yanru Qu, Dinghan Shen, Yelong Shen and
Sandra Sajeev, Weizhu Chen, Jiawei Han

Keywords Paper

Hadi Salman, Andrew Ilyas, Logan Engstrom and
Ashish Kapoor, Aleksander Madry

Keywords Paper

Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta and
Naman Goyal, Luke Zettlemoyer, Sonal Gupta

Keywords Paper

Francisco Utrera, Evan Kravitz, N. Benjamin Erichson and
Rajiv Khanna, Michael W Mahoney

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Linfeng Zhang, Muzhou Yu, Tong Chen and
Zuoqiang Shi, Chenglong Bao, Kaisheng Ma

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Lingxiao Wang, Jing Huang, Kevin Huang and
Ziniu Hu, Guangtao Wang, Quanquan Gu

Keywords Paper

Keywords Paper