IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization

02/02/2021

Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren

Abstract: Fine-tuning pre-trained language models (PTLMs), such as BERT and its stronger variant RoBERTa, has become common practice for advancing performance on natural language understanding (NLU) tasks. Recent advances in representation learning show that isotropic (i.e., unit-variance and uncorrelated) embeddings can significantly improve performance on downstream tasks, with faster convergence and better generalization. The isotropy of the pre-trained embeddings in PTLMs, however, remains relatively under-explored. In this paper, we analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with straightforward visualization and point out two major issues: high variance in the standard deviations of different dimensions, and high correlation between different dimensions. We also propose a new network regularization method, isotropic batch normalization (IsoBN), to address these issues and learn more isotropic representations during fine-tuning by dynamically penalizing dominating principal components. This simple yet effective fine-tuning method yields an absolute improvement of about 1.0 point on average across seven NLU tasks.
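The two issues named in the abstract can be quantified directly on a batch of embeddings. The sketch below is a minimal, stdlib-only Python illustration under our own assumptions: the helper names `isotropy_stats` and `standardize`, and the two summary statistics (spread of the per-dimension standard deviations, mean absolute pairwise correlation), are hypothetical choices for illustration. It is not the paper's analysis code and does not implement the IsoBN formula.

```python
import math
from statistics import mean, pstdev


def isotropy_stats(embeddings):
    """Return (std_spread, mean_abs_corr) for a batch of embedding vectors.

    Hypothetical diagnostic, not from the paper.
    embeddings: list of equal-length lists of floats, one row per example
    (for instance, one [CLS] vector per input sentence).
    - std_spread: population std of the per-dimension standard deviations;
      0.0 when every dimension has the same scale.
    - mean_abs_corr: mean |Pearson correlation| over all pairs of distinct
      dimensions; 0.0 when the dimensions are uncorrelated.
    """
    columns = list(zip(*embeddings))  # one tuple of values per dimension
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    std_spread = pstdev(stds)

    n = len(embeddings)
    abs_corrs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            cov = sum(
                (a - means[i]) * (b - means[j])
                for a, b in zip(columns[i], columns[j])
            ) / n
            denom = stds[i] * stds[j]
            # Constant dimensions have no defined correlation; count them as 0.
            abs_corrs.append(abs(cov / denom) if denom > 0 else 0.0)
    mean_abs_corr = mean(abs_corrs) if abs_corrs else 0.0
    return std_spread, mean_abs_corr


def standardize(embeddings, eps=1e-5):
    """Per-dimension z-scoring with batch statistics (no learned scale/shift)."""
    columns = list(zip(*embeddings))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(x - m) / math.sqrt(s * s + eps) for x, m, s in zip(row, means, stds)]
        for row in embeddings
    ]


# Usage on a toy batch whose second dimension is roughly twice the first:
batch = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
print("raw:         ", isotropy_stats(batch))
print("standardized:", isotropy_stats(standardize(batch)))
```

On the toy batch, per-dimension standardization (what ordinary batch normalization does before its learned scale and shift) drives the spread of standard deviations toward zero but leaves the correlation between the two dimensions unchanged, because rescaling a dimension does not change its Pearson correlation with any other. That remaining correlation is the second issue a correlation-aware normalization such as IsoBN is meant to address.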

The video of this talk is available at https://slideslive.com/38949013.

The talk and the paper were published at the AAAI 2021 virtual conference.

Similar Papers

04/07/2020

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

Keywords: NLP, generalization, NLP tasks, SMART

26/04/2020

Improving Neural Language Generation with Spectrum Control

Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu

03/05/2021

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

Beliz Gunel, Jingfei Du, Alexis Conneau, Veselin Stoyanov

Keywords: supervised contrastive learning, pre-trained language model fine-tuning, natural language understanding, generalization, few-shot learning, robustness

26/04/2020

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

Keywords: energy-based models, text generation

30/11/2020

Scale-Aware Polar Representation for Arbitrarily-Shaped Text Detection

Yanguang Bi, Zhiqiang Hu

04/07/2020

Addressing Posterior Collapse with Mutual Information for Improved Variational Neural Machine Translation

Arya D. McCarthy, Xian Li, Jiatao Gu, Ning Dong

Keywords: Variational Translation, posterior collapse, auxiliary task, uncertainty

16/11/2020

Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data

Lingkai Kong, Haoming Jiang, Yuchen Zhuang, Jie Lyu, Tuo Zhao, Chao Zhang

Keywords: augmented training, in-distribution calibration, text classification, expectation error

18/07/2021

SparseBERT: Rethinking the Importance Analysis in Self-attention

Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James Kwok

Keywords: Applications, Natural Language Processing

16/11/2020

Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning

Tsvetomila Mihaylova, Vlad Niculae, André F. T. Martins

Keywords: pipeline systems, ste, latent models, end-to-end training

03/05/2021

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Keywords: adversarial training, QA, NLI, BERT, information theory, adversarial robustness

06/12/2020

Do Adversarially Robust ImageNet Models Transfer Better?

Hadi Salman, Andrew Ilyas, Logan Engstrom, Ashish Kapoor, Aleksander Madry

06/12/2021

Refining Language Models with Compositional Explanations

Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren

Keywords: machine learning, fairness, language

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

18/07/2021

Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection

Nadine Chang, Zhiding Yu, Yu-Xiong Wang, Anima Anandkumar, Sanja Fidler, Jose Alvarez

Keywords: Applications, Computer Vision

06/12/2021

When does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?

Lijie Fan, Sijia Liu, Pin-Yu Chen, Gaoyuan Zhang, Chuang Gan

Keywords: machine learning, robustness, adversarial robustness and security, self-supervised learning, vision, contrastive learning, clustering

02/02/2021

Decision-Guided Weighted Automata Extraction from Recurrent Neural Networks

Xiyue Zhang, Xiaoning Du, Xiaofei Xie, Lei Ma, Yang Liu, Meng Sun

06/12/2021

Referring Transformer: A One-step Approach to Multi-task Visual Grounding

Muchen Li, Leonid Sigal

Keywords: transformers, vision

06/12/2021

Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning

Yifan Zhang, Bryan Hooi, Dapeng Hu, Jian Liang, Jiashi Feng

Keywords: optimization, machine learning, self-supervised learning, vision, contrastive learning, representation learning, transfer learning

30/11/2020

Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild

Weijia Wu, Ning Lu, Enze Xie, Yuxing Wang, Wenwen Yu, Cheng Yang, Hong Zhou

06/12/2021

Grounding inductive biases in natural images: invariance stems from variations in data

Diane Bouchacourt, Mark Ibrahim, Ari Morcos

Keywords: machine learning, transformers

12/07/2020

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Keywords: Deep Learning - Generative Models and Autoencoders

04/07/2020

Integrating Multimodal Information in Large Pretrained Transformers

Wasifur Rahman, Md Kamrul Hasan, Sangwu Lee, AmirAli Bagher Zadeh, Chengfeng Mao, Louis-Philippe Morency, Ehsan Hoque

Keywords: NLP, lexical applications, modeling communication, multimodal analysis

18/07/2021

Robust Representation Learning via Perceptual Similarity Metrics

Saeid A Taghanaki, Kristy Choi, Amir Hosein Khasahmadi, Anirudh Goyal

Keywords: Deep Learning, Embedding and Representation learning

14/06/2020

Towards Accurate Scene Text Recognition With Semantic Reasoning Networks

Deli Yu, Xuan Li, Chengquan Zhang, Tao Liu, Junyu Han, Jingtuo Liu, Errui Ding

Keywords: scene text recognition, global semantic reasoning, strong semantic context, parallel decoding/inference, parallel visual attention, efficient decoder

07/09/2020

Advancing weakly supervised cross-domain alignment with optimal transport

Siyang Yuan, Ke Bai, Liqun Chen, Yizhe Zhang, Chenyang Tao, Chunyuan Li, Guoyin Wang, Ricardo Henao, Lawrence Carin Duke

Keywords: Optimal Transport, Cross Domain Alignment

05/01/2021

Intra-Class Part Swapping for Fine-Grained Image Classification

Lianbo Zhang, Shaoli Huang, Wei Liu

26/04/2020

Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov

03/05/2021

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation

Biao Zhang, Ankur Bapna, Rico Sennrich, Orhan Firat

Keywords: multilingual transformer, multilingual translation, language-specific modeling, conditional computation

06/12/2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

Keywords: optimization, transformers, language

06/12/2021

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

Ekdeep S Lubana, Robert Dick, Hidenori Tanaka

Keywords: deep learning

04/07/2020

How does BERT's attention change when you fine-tune? An analysis methodology and a case study in negation scope

Yiyun Zhao, Steven Bethard

Keywords: downstream task, NLP problems, knowledge-related tasks, downstream tasks

06/12/2020

LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-resolution and Beyond

Wenbo Li, Kun Zhou, Lu Qi, Nianjuan Jiang, Jiangbo Lu, Jiaya Jia

16/11/2020

A Diagnostic Study of Explainability Techniques for Text Classification

Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

Keywords: downstream tasks, machine learning, explainability techniques, diverse techniques

08/12/2020

How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text

Chihiro Shibata, Kei Uchiumi, Daichi Mochihashi

16/11/2020

To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging

Kasturi Bhattacharjee, Miguel Ballesteros, Rishita Anubhai, Smaranda Muresan, Jie Ma, Faisal Ladhak, Yaser Al-Onaizan

Keywords: learning representations, downstream tasks, cross-view cvt, sequence tasks

04/07/2020

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords: Cross-Lingual Classification, sentiment classification, unsupervised system, classification

06/12/2021

Adversarial Reweighting for Partial Domain Adaptation

Xiang Gu, Xi Yu, Yan Yang, Jian Sun, Zongben Xu

Keywords: domain adaptation

26/08/2020

Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information

Esther Rolf, Michael Jordan, Benjamin Recht

06/12/2021

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

Colin Wei, Sang Michael Xie, Tengyu Ma

Keywords: theory, machine learning, self-supervised learning, generative model, representation learning, language

02/02/2021

Do Response Selection Models Really Know What’s Next? Utterance Manipulation Strategies for Multi-turn Response Selection

Taesun Whang, Dongyub Lee, Dongsuk Oh, Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee
