Incorporating BERT into Parallel Sequence Decoding with Adapters

Abstract: While large-scale pre-trained language models such as BERT have achieved great success on various natural language understanding tasks, how to efficiently and effectively incorporate them into sequence-to-sequence models and the corresponding text generation tasks remains a non-trivial problem. In this paper, we propose to address this problem by taking two different BERT models as the encoder and decoder respectively, and fine-tuning them by introducing simple and lightweight adapter modules, which are inserted between BERT layers and tuned on the task-specific dataset. In this way, we obtain a flexible and efficient model that is able to jointly leverage the information contained in the source-side and target-side BERT models, while bypassing the catastrophic forgetting problem. Each component in the framework can be considered a plug-in unit, making the framework flexible and task-agnostic. Considering the bidirectional and conditionally independent nature of BERT, we base our framework on a parallel sequence decoding algorithm named Mask-Predict; the framework can also be easily adapted to traditional autoregressive decoding. We conduct extensive experiments on neural machine translation tasks, where the proposed method consistently outperforms autoregressive baselines while reducing the inference latency by half, and achieves $36.49$/$33.57$ BLEU scores on IWSLT14 German-English/WMT14 German-English translation. When adapted to autoregressive decoding, the proposed method achieves $30.60$/$43.56$ BLEU scores on WMT14 English-German/English-French translation, on par with the state-of-the-art baseline models.
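The abstract relies on Mask-Predict for parallel decoding. Below is a minimal sketch of a generic Mask-Predict loop under stated assumptions: the target length is given rather than predicted, only one length candidate is decoded, and `toy_predict` is a hypothetical stand-in for the adapter-augmented BERT decoder. It illustrates the iterative re-masking schedule only and is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

MASK = "<mask>"

# Hypothetical predictor interface: given the source and a partially masked
# target, return a (token, probability) pair for every target position.
Predictor = Callable[[Sequence[str], Sequence[str]], List[Tuple[str, float]]]


def num_to_mask(length: int, t: int, iterations: int) -> int:
    """Linear-decay schedule: re-mask length * (iterations - t) / iterations tokens."""
    return length * (iterations - t) // iterations


def mask_predict(source: Sequence[str], length: int, predict: Predictor,
                 iterations: int = 4) -> List[str]:
    """Iterative parallel decoding in the style of Mask-Predict.

    Assumption: the target length is known in advance (the real method predicts it).
    """
    target = [MASK] * length
    probs = [0.0] * length

    # Iteration 0: every position is masked, so all tokens are predicted in one parallel pass.
    for i, (tok, p) in enumerate(predict(source, target)):
        target[i], probs[i] = tok, p

    for t in range(1, iterations):
        n = num_to_mask(length, t, iterations)
        if n == 0:
            break
        # Re-mask the n positions the model is least confident about.
        masked = sorted(range(length), key=lambda i: probs[i])[:n]
        for i in masked:
            target[i] = MASK
        # One more parallel pass; only the re-masked positions are updated.
        preds = predict(source, target)
        for i in masked:
            target[i], probs[i] = preds[i]
    return target


def toy_predict(source: Sequence[str], target: Sequence[str]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for the BERT decoder (assumes len(source) == len(target)):
    'translates' by upper-casing the aligned source token, and becomes more
    confident as more target tokens are unmasked."""
    known = sum(tok != MASK for tok in target)
    conf = (known + 1) / (len(target) + 1)
    return [(src.upper(), conf) for src in source]


print(mask_predict(["ich", "bin", "hier"], 3, toy_predict, iterations=3))
# ['ICH', 'BIN', 'HIER']
```

Each iteration is a single parallel pass over all positions, so the number of decoder calls is bounded by `iterations` rather than by the output length; this is the property that lets parallel decoding reduce inference latency relative to token-by-token generation.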

06/12/2020

Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong Chen

Similar Papers

MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Towards non-task-specific distillation of BERT via sentence representation approximation

Bowen Wu, Huan Zhang, MengYuan Li, Zongsheng Wang, Qihang Feng, Junhong Huang, Baoxun Wang

Automatic Mixed-Precision Quantization Search of BERT

Changsheng Zhao, Ting Hua, Yilin Shen, Qian Lou, Hongxia Jin

Keywords: Machine Learning, Deep Learning, NLP Applications and Tools, Text Classification

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen, Goran Glavaš

Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT

Ashutosh Adhikari, Achyudh Ram, Raphael Tang, William L. Hamilton, Jimmy Lin

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

Improving the Efficiency and Effectiveness for BERT-based Entity Resolution

Bing Li, Yukai Miao, Yaoshu Wang, Yifang Sun, Wei Wang

BERT-ATTACK: Adversarial Attack Against BERT Using BERT

Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, Xipeng Qiu

Keywords: adversarial attacks, downstream tasks, calculation, gradient-based methods

Retrieval, re-ranking and multi-task learning for knowledge-base question answering

Zhiguo Wang, Patrick Ng, Ramesh Nallapati, Bing Xiang

A Probabilistic Formulation of Unsupervised Text Style Transfer

Junxian He, Xinyi Wang, Graham Neubig, Taylor Berg-Kirkpatrick

Keywords: unsupervised text style transfer, deep latent sequence model

A Sequence-to-Set Network for Nested Named Entity Recognition

Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang

Keywords: Natural Language Processing, Information Extraction, Named Entities

Semi-Supervised Neural Architecture Search

Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu

EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

Chenfeng Miao, Liang Shuang, Zhengchen Liu, Chen Minchuan, Jun Ma, Shaojun Wang, Jing Xiao

Keywords: Applications, Audio and Speech Processing

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Keywords: adversarial training, QA, NLI, BERT, information theory, adversarial robustness

Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance

Guanhua Chen, Yun Chen, Victor O.K. Li

Improving Neural Language Generation with Spectrum Control

Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu

LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy, Noah Constant, Rami Al-Rfou, Aditya Barua, Aaron Phillips, Yinfei Yang

Keywords: language-agnostic retrieval, cross-lingual tasks, cross-lingual retrieval, alignment

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

Sang Michael Xie, Tengyu Ma, Percy Liang

Keywords: Algorithms, Multitask, Transfer, and Meta Learning

On the Cross-lingual Transferability of Monolingual Representations

Mikel Artetxe, Sebastian Ruder, Dani Yogatama

Keywords: zero-shot setting, Cross-lingual Representations, unsupervised models, joint training

Bridging the Gap in Multilingual Semantic Role Labeling: a Language-Agnostic Approach

Simone Conia, Roberto Navigli

Reconciling enumerative and deductive program synthesis

Kangjing Huang, Xiaokang Qiu, Peiyuan Shen, Yanjun Wang

Keywords: divide-and-conquer, enumerative synthesis, syntax-guided synthesis, deductive synthesis

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato