Maximal multiverse learning for promoting cross-task generalization of fine-tuned language models

19/04/2021

Itzik Malkiel, Lior Wolf

Abstract: Language modeling with BERT consists of two phases: (i) unsupervised pre-training on unlabeled text, and (ii) fine-tuning for a specific supervised task. We present a method that leverages the second phase to its fullest by applying a large number of parallel classifier heads, which are constrained to be orthogonal, while adaptively eliminating the weaker heads during training. We conduct an extensive inter- and intra-dataset evaluation, showing that our method improves the generalization ability of BERT, sometimes leading to a +9% gain in accuracy. These results highlight the importance of a proper fine-tuning procedure, especially for relatively small datasets. Our code is attached as supplementary material.
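The abstract names two ingredients, orthogonal parallel heads and elimination of weaker heads, without giving formulas. The following is a minimal sketch of how these two ideas could look for linear binary heads over fixed features. It is not the authors' implementation: the squared-cosine penalty, the logistic loss used to rank heads, and the names `orthogonality_penalty`, `head_loss` and `keep_strongest` are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonality_penalty(heads):
    """Sum of squared cosine similarities over all distinct pairs of heads.

    Assumed form of the orthogonality constraint: zero when every pair of
    head weight vectors is orthogonal, larger as heads become aligned.
    """
    penalty = 0.0
    for i in range(len(heads)):
        for j in range(i + 1, len(heads)):
            norm = math.sqrt(dot(heads[i], heads[i]) * dot(heads[j], heads[j]))
            penalty += (dot(heads[i], heads[j]) / norm) ** 2
    return penalty


def head_loss(w, features, labels):
    """Mean logistic loss of one linear head on binary labels (0 or 1)."""
    total = 0.0
    for x, y in zip(features, labels):
        z = dot(w, x)
        margin = z if y == 1 else -z  # positive margin means a correct side
        total += math.log1p(math.exp(-margin))
    return total / len(labels)


def keep_strongest(heads, features, labels, keep):
    """Return the `keep` heads with the lowest loss, in their original order.

    Assumed elimination rule: rank heads by loss and drop the rest.
    """
    losses = [head_loss(w, features, labels) for w in heads]
    ranked = sorted(range(len(heads)), key=lambda i: losses[i])[:keep]
    return [heads[i] for i in sorted(ranked)]


# Toy data (hypothetical): the label follows the sign of the first feature.
features = [[1.0, 0.0], [2.0, 1.0], [-1.0, 0.5], [-2.0, -1.0]]
labels = [1, 1, 0, 0]

# Three candidate heads; the third is the first one with its sign flipped.
heads = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

penalty_before = orthogonality_penalty(heads)
survivors = keep_strongest(heads, features, labels, keep=2)
penalty_after = orthogonality_penalty(survivors)
print(penalty_before, survivors, penalty_after)
```

In a real fine-tuning setup the penalty would presumably be added to the task loss and the pruning step repeated during training; the sketch only shows each piece once on fixed weights.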

Talk and the respective paper are published at the EACL 2021 virtual conference.

Similar Papers

06/12/2020

MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

3:23

18/07/2021

Training Data Subset Selection for Regression with Controlled Generalization Error

Durga S, Rishabh Iyer, Ganesh Ramakrishnan, Abir De

Keywords: Algorithms, Online Learning, Algorithms, Supervised Learning

4:15

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

11:27

06/12/2020

Incorporating BERT into Parallel Sequence Decoding with Adapters

Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong Chen

3:17

05/12/2020

Investigating learning dynamics of BERT fine-tuning

Yaru Hao, Li Dong, Furu Wei, Ke Xu

7:10

22/11/2021

Few-shot Action Recognition with Prototype-centered Attentive Learning

Xiatian Zhu, Antoine S Toisoul, Juan-Manuel Perez-Rua, Li Zhang, Brais Martinez, Tao Xiang

Keywords: Few-shot learning, Video recognition, Action classification, Small training data, Model pre-training, Meta-learning, Transformer, Self-attention learning, Cross-attention learning, Prototype learning, Prototype-centered learning, Hybrid-attention learning

2:22

14/06/2020

Adaptive Subspaces for Few-Shot Learning

Christian Simon, Piotr Koniusz, Richard Nock, Mehrtash Harandi

Keywords: subspace, few, shot, meta, learning, classification

1:01

18/07/2021

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

Sang Michael Xie, Tengyu Ma, Percy Liang

Keywords: Algorithms, Multitask, Transfer, and Meta Learning

22:15

02/02/2021

Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition

Siteng Huang, Min Zhang, Yachen Kang, Donglin Wang

17:04

12/07/2020

Graph-based, Self-Supervised Program Repair from Diagnostic Feedback

Michihiro Yasunaga, Percy Liang

Keywords: Applications - Language, Speech and Dialog

14:39

26/08/2020

Data Generation for Neural Programming by Example

Judith Clymo, Adria Gascon, Brooks Paige, Nathanael Fijalkow, Haik Manukian

14:31

18/07/2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation

Xiang Lin, Simeng Han, Shafiq Joty

Keywords: Applications, Natural Language Processing

16:00

16/11/2020

Improving AMR Parsing with Sequence-to-Sequence Pre-training

Dongqin Xu, Junhui Li, Muhua Zhu, Min Zhang, Guodong Zhou

Keywords: abstract parsing, amr parsing, sequence-to-sequence parsing, machine translation

11:42

06/12/2020

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

3:20

03/05/2021

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Keywords: adversarial training, QA, NLI, BERT, information theory, adversarial robustness

5:21

22/11/2021

Dynamic Feature Alignment for Semi-supervised Domain Adaptation

Yu Zhang, Gongbo Liang, Nathan Jacobs

Keywords: domain adaptation, semi-supervised learning, image classification, memory bank, feature alignment

2:20

16/11/2020

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Trapit Bansal, Rishikesh Jha, Tsendsuren Munkhdalai, Andrew McCallum

Keywords: nlp applications, fine-tuning, meta-learning problem, supervised tasks

11:49

16/11/2020

Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU

Brielen Madureira, David Schlangen

Keywords: nlp, interactive systems, language encoders, bidirectional lstms

10:04

19/01/2020

Partial Type Constructors: Or, Making Ad Hoc Datatypes Less Ad Hoc

Mark Jones, J. Garrett Morris, Richard A. Eisenberg

Keywords: Type constructors, Parametric polymorphism

21:37

06/12/2020

Learning Sparse Prototypes for Text Generation

Junxian He, Taylor Berg-Kirkpatrick, Graham Neubig

3:22

05/12/2020

Self-supervised learning for pairwise data refinement

Gustavo Hernandez Abrego, Bowen Liang, Wei Wang, Zarana Parekh, Yinfei Yang, Yunhsuan Sung

15:17

02/02/2021

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding

Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li

15:25

14/06/2020

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Canjie Luo, Yuanzhi Zhu, Lianwen Jin, Yongpan Wang

Keywords: data augmentation, text recognition, joint training

0:59

18/07/2021

Offline Meta-Reinforcement Learning with Advantage Weighting

Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, Chelsea Finn

Keywords: Algorithms, Multitask, Transfer, and Meta Learning

5:08

02/02/2021

SALNet: Semi-supervised Few-Shot Text Classification with Attention-based Lexicon Construction

Ju-Hyoung Lee, Sang-Ki Ko, Yo-Sub Han

15:28

15/11/2020

A Structural Model for Contextual Code Changes

Shaked Brody, Uri Alon, Eran Yahav

Keywords: Neural Models of Code, Edit Completions, Machine Learning

13:01

04/07/2020

Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT

Zhiyong Wu, Yun Chen, Ben Kao, Qun Liu

Keywords: Analyzing BERT, linguistic tasks, dependency parsing, probing tasks

11:00

25/07/2020

A pairwise probe for understanding BERT fine-tuning on machine reading comprehension

Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

Keywords: machine reading comprehension, pairwise, fine-tune, BERT

6:38

05/01/2021

ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning

Viktor Olsson, Wilhelm Tranheden, Juliano Pinto, Lennart Svensson

4:58

15/11/2020

Feedback-Driven Semi-supervised Synthesis of Program Transformations

Xiang Gao, Shraddha Barke, Arjun Radhakrishna, Gustavo Soares, Sumit Gulwani, Alan Leung, Nachiappan Nagappan, Ashish Tiwari

Keywords: Program transformation, Program synthesis, Refactoring, Programming by Example

15:43

12/07/2020

Learning and Evaluating Contextual Embedding of Source Code

Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi

Keywords: Representation Learning

12:51

18/07/2021

Self-supervised and Supervised Joint Training for Resource-rich Machine Translation

Yong Cheng, Wei Wang, Lu Jiang, Wolfgang Macherey

Keywords: Applications, Natural Language Processing

5:21

06/12/2021

Referring Transformer: A One-step Approach to Multi-task Visual Grounding

Muchen Li, Leonid Sigal

Keywords: transformers, vision

7:54

06/12/2020

Robust Pre-Training by Adversarial Contrastive Learning

Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang

3:26

08/12/2020

Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks

Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip Yu, Lifang He

9:52

16/11/2020

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining

Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He

Keywords: nlp tasks, fine-tuning, learning process, multi-domain tasks

9:58

16/11/2020

MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer

Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder

Keywords: transfer, pre-training, cross transfer, named recognition

13:35

04/07/2020

Distilling Knowledge Learned in BERT for Text Generation

Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu

Keywords: Text Generation, language tasks, language generation, generation tasks

10:41

16/11/2020

Simulated multiple reference training improves low-resource machine translation

Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn

Keywords: machine mt, mt, simulated training, simulated

6:56

06/12/2020

Modular Meta-Learning with Shrinkage

Yutian Chen, Abe Friesen, Feryal Behbahani, Arnaud Doucet, David Budden, Matthew Hoffman, Nando de Freitas

3:21