Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?

04/07/2020


Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman

Keywords: Intermediate-Task Learning, natural tasks, data-rich task, intermediate-task training


Abstract: While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target task. However, it is still poorly understood when and why intermediate-task training is beneficial for a given target task. To investigate this, we perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations. We further evaluate all trained models with 25 probing tasks meant to reveal the specific skills that drive transfer. We observe that intermediate tasks requiring high-level inference and reasoning abilities tend to work best. We also observe that target task performance is strongly correlated with higher-level abilities such as coreference resolution. However, we fail to observe more granular correlations between probing and target task performance, highlighting the need for further work on broad-coverage probing benchmarks. We also observe evidence that the forgetting of knowledge learned during pretraining may limit our analysis, highlighting the need for further work on transfer learning methods in these settings.
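To make the probing analysis concrete, here is a minimal sketch of how one might correlate probing-task scores with target-task scores across models that received different intermediate-task training. It assumes Pearson correlation; the abstract does not say which correlation statistic the authors used. The function name `pearson` and all scores below are hypothetical placeholders, not results from the paper.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores (NOT from the paper), one entry per intermediate task:
# the probing accuracy of each intermediate-trained model on a single probing
# task, and the same model's score on a single target task after fine-tuning.
probe_scores = [0.71, 0.75, 0.80, 0.68, 0.77]
target_scores = [0.62, 0.66, 0.70, 0.60, 0.69]

print(round(pearson(probe_scores, target_scores), 3))
```

Repeating this for every probing-task and target-task pair would yield a correlation matrix; a rank statistic such as Spearman's could be substituted if only the ordering of models matters.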

The talk and the respective paper were published at the ACL 2020 virtual conference.

Similar Papers

04/07/2020

How does BERT's attention change when you fine-tune? An analysis methodology and a case study in negation scope

Yiyun Zhao, Steven Bethard

Keywords: downstream task, NLP problems, knowledge-related tasks, downstream tasks

06/12/2021

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren

Keywords: machine learning, fairness, language

25/07/2020

A pairwise probe for understanding BERT fine-tuning on machine reading comprehension

Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

Keywords: machine reading comprehension, pairwise, fine-tune, BERT

02/02/2021

What's the Best Place for an AI Conference, Vancouver or _______: Why Completing Comparative Questions is Difficult

Avishai Zagoury, Einat Minkov, Idan Szpektor, William W. Cohen

02/02/2021

Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari

08/12/2020

Classifier Probes May Just Learn from Linear Context Features

Jenny Kunz, Marco Kuhlmann

06/12/2021

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Jixuan Wang, Kuan-Chieh Wang, Frank Rudzicz, Michael Brudno

Keywords: machine learning, transformers, meta learning, language, transfer learning

16/11/2020

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

Alex Warstadt, Yian Zhang, Xiaocheng Li, Haokun Liu, Samuel R. Bowman

Keywords: self-supervised tasks, language understanding, ambiguous tasks, finetuning

02/02/2021

Do Response Selection Models Really Know What’s Next? Utterance Manipulation Strategies for Multi-turn Response Selection

Taesun Whang, Dongyub Lee, Dongsuk Oh, Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

16/11/2020

Information-Theoretic Probing with Minimum Description Length

Elena Voita, Ivan Titov

Keywords: random tasks, estimating MDL, representations, pretrained representations

16/11/2020

Improving AMR Parsing with Sequence-to-Sequence Pre-training

Dongqin Xu, Junhui Li, Muhua Zhu, Min Zhang, Guodong Zhou

Keywords: abstract parsing, AMR parsing, sequence-to-sequence parsing, machine translation

06/12/2021

Supervising the Transfer of Reasoning Patterns in VQA

Corentin Kervadec, Christian Wolf, Grigory Antipov, Moez Baccouche, Madiha Nadri

Keywords: theory, deep learning, vision

06/12/2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth

Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

26/04/2020

On the Weaknesses of Reinforcement Learning for Neural Machine Translation

Leshem Choshen, Lior Fox, Zohar Aizenbud, Omri Abend

Keywords: Reinforcement learning, MRT, minimum risk training, REINFORCE, machine translation, peakiness, generation

06/12/2021

Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning

Zixuan Ke, Bing Liu, Nianzu Ma, Hu Xu, Lei Shu

Keywords: language, continual learning

16/11/2020

Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting

Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan Yu

Keywords: pretraining, pretraining tasks, learning tasks, fine-tuning BERT-large

06/12/2020

MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

06/12/2020

Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning

Weili Nie, Zhiding Yu, Lei Mao, Ankit Patel, Yuke Zhu, Anima Anandkumar

04/07/2020

A Self-Training Method for Machine Reading Comprehension with Soft Evidence Extraction

Yilin Niu, Fangkai Jiao, Mantong Zhou, Ting Yao, Jingfang Xu, Minlie Huang

Keywords: Machine Comprehension, Soft Extraction, machine, MRC

06/12/2020

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

04/07/2020

Syntactic Data Augmentation Increases Robustness to Inference Heuristics

Junghyun Min, R. Thomas McCoy, Dipanjan Das, Emily Pitler, Tal Linzen

Keywords: Syntactic Augmentation, natural inference, natural NLI, NLI

06/12/2021

Automatic Data Augmentation for Generalization in Reinforcement Learning

Roberta Raileanu, Maxwell Goldstein, Denis Yarats, Ilya Kostrikov, Rob Fergus

Keywords: reinforcement learning and planning, machine learning

16/11/2020

Efficient Meta Lifelong-Learning with Limited Memory

Zirui Wang, Sanket Vaibhav Mehta, Barnabas Poczos, Jaime Carbonell

Keywords: lifelong learning, local adaptation, text benchmarks, multi-task learning

16/11/2020

Syntactic Structure Distillation Pretraining for Bidirectional Encoders

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Keywords: BERT pretraining, structured tasks, natural understanding, textual learners

08/12/2020

Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning

Seoyeon Park, Cornelia Caragea

26/04/2020

Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov

12/07/2020

More Data Can Expand The Generalization Gap Between Adversarially Robust and Standard Models

Lin Chen, Yifei Min, Mingrui Zhang, Amin Karbasi

Keywords: Adversarial Examples

26/04/2020

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Keywords: Natural Language Processing, BERT, Representation Learning

02/02/2021

Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues

Ruijian Xu, Chongyang Tao, Daxin Jiang, Xueliang Zhao, Dongyan Zhao, Rui Yan

04/07/2020

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Prasetya Ajie Utama, Nafise Sadat Moosavi, Iryna Gurevych

Keywords: Debiasing Models, natural tasks, NLU tasks, debiasing methods

08/12/2020

Multi-Task Learning for Knowledge Graph Completion with Pre-trained Language Models

Bosung Kim, Taesuk Hong, Youngjoong Ko, Jungyun Seo

04/07/2020

Data Manipulation: Towards Effective Instance Learning for Neural Dialogue Generation via Learning to Augment and Reweight

Hengyi Cai, Hongshen Chen, Yonghao Song, Cheng Zhang, Xiaofang Zhao, Dawei Yin

Keywords: Data Manipulation, Neural Generation, learning, dialogue generation

04/07/2020

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

Keywords: NLP, generalization, NLP tasks, SMART

01/07/2020

Linguistic Features for Readability Assessment

Tovly Deutsch, Masoud Jasbi, Stuart Shieber

16/11/2020

What Does My QA Model Know? Devising Controlled Probes using Expert

Kyle Richardson, Ashish Sabharwal

Keywords: knowledge challenges, benchmark tasks, diagnostic tasks, taxonomic reasoning

26/04/2020

Scalable and Order-robust Continual Learning with Additive Parameter Decomposition

Jaehong Yoon, Saehoon Kim, Eunho Yang, Sung Ju Hwang

Keywords: Continual Learning, Lifelong Learning, Catastrophic Forgetting, Deep Learning

08/12/2020

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

18/07/2021

LogME: Practical Assessment of Pre-trained Models for Transfer Learning

Kaichao You, Yong Liu, Jianmin Wang, Mingsheng Long

Keywords: Algorithms, Multitask, Transfer, and Meta Learning

07/09/2020

BCaR: Beginner Classifier as Regularization Towards Generalizable Re-ID

Masato Tamura, Tomoaki Yoshinaga

Keywords: person re-identification, generalizable, soft label, knowledge distillation, Re-ID, domain generalization
