Distilling Knowledge Learned in BERT for Text Generation

04/07/2020

Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu

Keywords: Text Generation, language tasks, language generation, generation tasks

Abstract: Large-scale pre-trained language models such as BERT have achieved great success in language understanding tasks. However, it remains an open question how to utilize BERT for language generation. In this paper, we present a novel approach, Conditional Masked Language Modeling (C-MLM), to enable the finetuning of BERT on target generation tasks. The finetuned BERT (teacher) is exploited as extra supervision to improve conventional Seq2Seq models (student) for better text generation performance. By leveraging BERT's idiosyncratic bidirectional nature, distilling knowledge learned in BERT can encourage auto-regressive Seq2Seq models to plan ahead, imposing global sequence-level supervision for coherent text generation. Experiments show that the proposed approach significantly outperforms strong Transformer baselines on multiple language generation tasks such as machine translation and text summarization. Our proposed model also achieves a new state of the art on the IWSLT German-English and English-Vietnamese MT datasets.
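The abstract describes two steps: finetune BERT on the generation task with C-MLM, then use the finetuned BERT's word predictions as extra supervision for a left-to-right Seq2Seq student. The Python sketch below illustrates both steps under stated assumptions; it is not the authors' implementation. It assumes, as a simplification, that source and target are packed as a BERT sentence pair with only target-side tokens masked, and that the student is trained with a standard word-level distillation objective (soft cross-entropy against the teacher plus negative log-likelihood of the gold token). The function names (`cmlm_mask`, `log_softmax`, `distill_loss`), the mixing weight `alpha` and the `temperature` are hypothetical; subword tokenization, BERT's 80/10/10 mask-replacement rule and batching are omitted.

```python
import math
import random

MASK = "[MASK]"  # hypothetical BERT-style mask token


def cmlm_mask(source, target, mask_prob=0.15, rng=None):
    """Build one C-MLM training example (simplified sketch, assumption).

    Source and target are packed as a BERT sentence pair and only
    target-side tokens may be masked, so the teacher learns to predict each
    masked target word from the full source plus target context on BOTH
    sides. Returns (input_tokens, {position: original_token}).
    """
    rng = rng or random.Random()
    tokens = ["[CLS]", *source, "[SEP]"]
    offset = len(tokens)  # index of the first target token
    labels = {}
    for i, tok in enumerate(target):
        if rng.random() < mask_prob:
            labels[offset + i] = tok
            tokens.append(MASK)
        else:
            tokens.append(tok)
    tokens.append("[SEP]")
    return tokens, labels


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def distill_loss(student_logits, teacher_logits, gold_index,
                 alpha=0.5, temperature=1.0):
    """Per-position loss for the Seq2Seq student (hypothetical objective).

    alpha * soft cross-entropy against the teacher's temperature-scaled
    distribution + (1 - alpha) * NLL of the gold token. teacher_logits
    would come from the C-MLM-finetuned BERT at this target position;
    student_logits from the left-to-right decoder.
    """
    p_teacher = [math.exp(lp) for lp in log_softmax(teacher_logits, temperature)]
    log_p_student = log_softmax(student_logits)
    soft = -sum(pt * ls for pt, ls in zip(p_teacher, log_p_student))
    hard = -log_p_student[gold_index]
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    tokens, labels = cmlm_mask(["ich", "bin", "hier"], ["i", "am", "here"],
                               mask_prob=1.0)
    print(tokens)  # ['[CLS]', 'ich', 'bin', 'hier', '[SEP]', '[MASK]', '[MASK]', '[MASK]', '[SEP]']
    print(labels)  # {5: 'i', 6: 'am', 7: 'here'}
    # Uniform student and teacher over 4 words: both terms equal ln 4.
    print(round(distill_loss([0.0] * 4, [0.0] * 4, gold_index=2), 4))  # 1.3863
```

Because the teacher sees target context on both sides of each masked position, its soft distribution can carry information about upcoming words that a left-to-right decoder cannot see on its own; this is the intuition behind the abstract's claim that distillation encourages the student to "plan ahead".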

The talk and the accompanying paper were published at the ACL 2020 virtual conference.

Similar Papers

04/07/2020

Span Selection Pre-training for Question Answering

Michael Glass, Alfio Gliozzo, Rishav Chakravarti, Anthony Ferritto, Lin Pan, G P Shrivatsa Bhargav, Dinesh Garg, Avi Sil

Keywords: Question Answering, language tasks, Next Prediction, pre-training task

03/05/2021

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Keywords: adversarial training, QA, NLI, BERT, information theory, adversarial robustness

19/04/2021

Better neural machine translation by extracting linguistic information from BERT

Hassan S. Shavarani, Anoop Sarkar

03/05/2021

Contrastive Learning with Adversarial Perturbations for Conditional Text Generation

Seanie Lee, Dong Bok Lee, Sung Ju Hwang

Keywords: contrastive learning, conditional text generation

16/11/2020

Partially-Aligned Data-to-Text Generation with Distant Supervision

Zihao Fu, Bei Shi, Wai Lam, Lidong Bing, Zhiyuan Liu

Keywords: data-to-text task, generation task, dataset problem, over-generation problem

22/06/2020

How Context Affects Language Models' Factual Predictions

Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel

19/08/2021

MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering

Chenyu You, Nuo Chen, Yuexian Zou

Keywords: Natural Language Processing, Question Answering, Sentiment Analysis and Text Mining, Speech

06/12/2020

MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

04/07/2020

Curriculum Pre-training for End-to-End Speech Translation

Chengyi Wang, Yu Wu, Shujie Liu, Ming Zhou, Zhenglu Yang

Keywords: Curriculum Pre-training, End-to-End Translation, speech recognition, transcription learning

01/07/2020

Linguistic Features for Readability Assessment

Tovly Deutsch, Masoud Jasbi, Stuart Shieber

26/04/2020

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Wei Wang, Bin Bi, Ming Yan, Chen Wu, Jiangnan Xia, Zuyi Bao, Liwei Peng, Luo Si

16/11/2020

Adversarial Self-Supervised Data-Free Distillation for Text Classification

Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia, Weiming Lu

Keywords: nlp tasks, nlp, compressing models, text generation

19/04/2021

Cross-lingual visual pre-training for multimodal machine translation

Ozan Caglayan, Menekse Kuyu, Mustafa Sercan Amac, Pranava Madhyastha, Erkut Erdem, Aykut Erdem, Lucia Specia

22/09/2020

What does BERT know about books, movies and music? Probing BERT for conversational recommendation

Gustavo Penha, Claudia Hauff

Keywords: conversational recommendation, conversational search, probing

03/05/2021

Pre-training Text-to-Text Transformers for Concept-centric Common Sense

Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Xiang Ren

Keywords: Self-supervised Learning, Commonsense Reasoning, Language Model Pre-training

06/12/2021

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal

Keywords: language

02/02/2021

Towards Semantics-Enhanced Pre-Training: Can Lexicon Definitions Help Learning Sentence Meanings?

Xuancheng Ren, Xu Sun, Houfeng Wang, Qun Liu

26/04/2020

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Weihao Yu, Zihang Jiang, Yanfei Dong, Jiashi Feng

Keywords: reading comprehension, logical reasoning, natural language processing

06/12/2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou

16/11/2020

Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting

Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan Yu

Keywords: pretraining, pretraining tasks, learning tasks, fine-tuning bert-large

16/11/2020

BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance

Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin

Keywords: natural tasks, nlp tasks, matching, many-to-many mapping

16/11/2020

Improving AMR Parsing with Sequence-to-Sequence Pre-training

Dongqin Xu, Junhui Li, Muhua Zhu, Min Zhang, Guodong Zhou

Keywords: abstract parsing, amr parsing, sequence-to-sequence parsing, machine translation

02/02/2021

Continual Learning for Named Entity Recognition

Natawut Monaikul, Giuseppe Castellucci, Simone Filice, Oleg Rokhlenko

26/04/2020

Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov

02/02/2021

Learning to Augment for Data-scarce Domain BERT Knowledge Distillation

Lingyun Feng, Minghui Qiu, Yaliang Li, Hai-Tao Zheng, Ying Shen

16/11/2020

Cross-Thought for Sentence Encoder Pre-training

Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jingjing Liu, Jing Jiang

Keywords: pre-training encoder, large-scale tasks, question answering, predicting words

26/04/2020

Incorporating BERT into Neural Machine Translation

Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tieyan Liu

Keywords: BERT, Neural Machine Translation

04/07/2020

Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?

Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman

Keywords: Intermediate-Task Learning, natural tasks, data-rich task, intermediate-task training

06/12/2020

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

02/02/2021

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding

Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li

02/02/2021

Future-Guided Incremental Transformer for Simultaneous Translation

Shaolei Zhang, Yang Feng, Liangyou Li

25/07/2020

Leveraging adversarial training in self-learning for cross-lingual text classification

Xin Dong, Yaxin Zhu, Yupeng Zhang, Zuohui Fu, Dongkuan Xu, Sen Yang, Gerard Melo

Keywords: multilingual, semantics, text classification, cross-lingual

04/07/2020

Improving Disfluency Detection by Self-Training a Self-Attentive Model

Paria Jamshid Lou, Mark Johnson

Keywords: Disfluency Detection, joint parsing, Self-Attentive Model, Self-attentive parsers

16/11/2020

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Trapit Bansal, Rishikesh Jha, Tsendsuren Munkhdalai, Andrew McCallum

Keywords: nlp applications, fine-tuning, meta-learning problem, supervised tasks

05/12/2020

Investigating learning dynamics of BERT fine-tuning

Yaru Hao, Li Dong, Furu Wei, Ke Xu

16/11/2020

Improving Grammatical Error Correction Models with Purpose-Built Adversarial Examples

Lihao Wang, Xiaoqing Zheng

Keywords: grammatical correction, sequence-to-sequence learning, neural networks, gec

16/11/2020

SLM: Learning a Discourse Language Representation with Sentence Unshuffling

Haejun Lee, Drew A. Hudson, Kangwook Lee, Christopher D. Manning

Keywords: nlp, sentence-level modeling, discourse representation, pre-training methods

01/07/2020

Go Figure! Multi-task transformer-based architecture for metaphor detection using idioms: ETS team in 2020 metaphor shared task

Xianyang Chen, Chee Wee (Ben) Leong, Michael Flor, Beata Beigman Klebanov

04/07/2020

Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language

Qianhui Wu, Zijia Lin, Börje Karlsson, Jian-Guang Lou, Biqing Huang

Keywords: Single-/Multi-Source NER, named problem, cross-lingual NER, single-source NER

19/04/2021

Maximal multiverse learning for promoting cross-task generalization of fine-tuned language models

Itzik Malkiel, Lior Wolf
