VL-BERT: Pre-training of Generic Visual-Linguistic Representations

26/04/2020

Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai

Keywords: Visual-Linguistic, Generic Representation, Pre-training

Abstract: We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as its backbone and extends it to take both visual and linguistic embedded features as input. Each element of the input is either a word from the input sentence or a region-of-interest (RoI) from the input image. The model is designed to fit most visual-linguistic downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions dataset, together with a text-only corpus. Extensive empirical analysis demonstrates that the pre-training procedure better aligns visual-linguistic clues and benefits downstream tasks such as visual commonsense reasoning, visual question answering, and referring expression comprehension. Notably, VL-BERT achieved first place among single models on the leaderboard of the VCR benchmark.
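
The input format described in the abstract, where each element is either a word or an RoI, might be pictured with the minimal sketch below. It is an illustration only, not the authors' implementation: the class names, the box layout, the feature vector, and the choice to place words before RoIs are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class WordElement:
    """A linguistic input element: one word from the input sentence."""
    token: str


@dataclass
class RoIElement:
    """A visual input element: one region-of-interest from the input image."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2); hypothetical layout
    feature: List[float]                    # hypothetical visual feature vector


# Every element of the input sequence is one of the two kinds above.
InputElement = Union[WordElement, RoIElement]


def build_input(words: List[str], rois: List[RoIElement]) -> List[InputElement]:
    """Join words and RoIs into a single sequence.

    The ordering (all words, then all RoIs) is an assumption of this sketch.
    """
    sequence: List[InputElement] = [WordElement(w) for w in words]
    sequence.extend(rois)
    return sequence


if __name__ == "__main__":
    rois = [RoIElement(box=(0.0, 0.0, 50.0, 80.0), feature=[0.1, 0.2])]
    seq = build_input(["a", "dog", "runs"], rois)
    print([type(e).__name__ for e in seq])
```

A Transformer backbone would then consume embeddings of this mixed sequence; how those embeddings are computed is not specified in the abstract and is left out here.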

The talk and the respective paper were published at the ICLR 2020 virtual conference.

Similar Papers

26/04/2020

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Wei Wang, Bin Bi, Ming Yan, Chen Wu, Jiangnan Xia, Zuyi Bao, Liwei Peng, Luo Si

5:34

14/06/2020

Learning Representations by Predicting Bags of Visual Words

Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord

Keywords: representation learning, self-supervised learning, unsupervised learning, discrete representations, bag of visual words, image understanding, deep learning, convolutional neural networks

1:01

03/05/2021

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

Keywords: adversarial training, QA, NLI, BERT, information theory, adversarial robustness

5:21

18/07/2021

Unifying Vision-and-Language Tasks via Text Generation

Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal

Keywords: Algorithms, Multimodal Learning

4:58

06/12/2021

Referring Transformer: A One-step Approach to Multi-task Visual Grounding

Muchen Li, Leonid Sigal

Keywords: transformers, vision

7:54

06/12/2020

MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

3:23

22/09/2020

What does BERT know about books, movies and music? Probing BERT for conversational recommendation

Gustavo Penha, Claudia Hauff

Keywords: conversational recommendation, conversational search, probing

2:48

03/05/2021

Pre-training Text-to-Text Transformers for Concept-centric Common Sense

Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Xiang Ren

Keywords: Self-supervised Learning, Commonsense Reasoning, Language Model Pre-training

4:56

16/11/2020

VD-BERT: A Unified Vision and Dialog Transformer with BERT

Yue Wang, Shafiq Joty, Michael Lyu, Irwin King, Caiming Xiong, Steven C.H. Hoi

Keywords: visual dialog, vision-language task, visual tasks, answer ranking

11:54

16/11/2020

Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation

Minki Kang, Moonsu Han, Sung Ju Hwang

Keywords: self-supervised pre-training, question answering, task, reinforcement learning

12:00

08/12/2020

Learning distributed sentence vectors with bi-directional 3D convolutions

Bin Liu, Liang Wang, Guosheng Yin

3:07

06/12/2020

Incorporating BERT into Parallel Sequence Decoding with Adapters

Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong Chen

3:17

19/08/2021

Exemplification Modeling: Can You Give Me an Example, Please?

Edoardo Barba, Luigi Procopio, Caterina Lacerra, Tommaso Pasini, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Resources and Evaluation

14:47

19/08/2021

Improving Context-Aware Neural Machine Translation with Source-side Monolingual Documents

Linqing Chen, Junhui Li, Zhengxian Gong, Xiangyu Duan, Boxing Chen, Weihua Luo, Min Zhang, Guodong Zhou

Keywords: Natural Language Processing, Machine Translation

12:48

12/07/2020

Pseudo-Masked Language Models for Unified Language Model Pre-Training

Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon

Keywords: Applications - Language, Speech and Dialog

13:55

14/06/2020

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Canjie Luo, Yuanzhi Zhu, Lianwen Jin, Yongpan Wang

Keywords: data augmentation, text recognition, joint training

0:59

26/04/2020

Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov

5:00

02/02/2021

Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition

Siteng Huang, Min Zhang, Yachen Kang, Donglin Wang

17:04

14/06/2020

Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks

Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang

Keywords: computer vision, vision language navigation, reinforcement learning

4:25

16/11/2020

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining

Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He

Keywords: nlp tasks, fine-tuning, learning process, multi-domain tasks

9:58

03/05/2021

GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

Tao Yu, Jason Wu, Xi V Lin, Bailin Wang, Yi Tan, Xinyi Yang, Dragomir Radev, Richard Socher, Caiming Xiong

Keywords: pre-training, nlp, semantic parsing, text-to-sql

5:13

18/07/2021

SparseBERT: Rethinking the Importance Analysis in Self-attention

Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James Kwok

Keywords: Applications, Natural Language Processing

5:13

22/11/2021

Grounded Situation Recognition with Transformers

Junhyeong Cho, Youngseok Yoon, Hyeonjun Lee, Suha Kwak

Keywords: grounded situation recognition, situation recognition, transformers, scene understanding

3:00

01/07/2020

Go Figure! Multi-task transformer-based architecture for metaphor detection using idioms: ETS team in 2020 metaphor shared task

Xianyang Chen, Chee Wee (Ben) Leong, Michael Flor, Beata Beigman Klebanov

4:42

16/11/2020

Cross-Thought for Sentence Encoder Pre-training

Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jingjing Liu, Jing Jiang

Keywords: pre-training encoder, large-scale tasks, question answering, predicting words

12:06

16/11/2020

SLM: Learning a Discourse Language Representation with Sentence Unshuffling

Haejun Lee, Drew A. Hudson, Kangwook Lee, Christopher D. Manning

Keywords: nlp, sentence-level modeling, discourse representation, pre-training methods

9:21

18/07/2021

Latent Space Energy-Based Model of Symbol-Vector Coupling for Text Generation and Classification

Bo Pang, Ying Nian Wu

Keywords: Algorithms, Unsupervised Learning

5:17

08/12/2020

Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks

Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip Yu, Lifang He

9:52

19/04/2021

Retrieval, re-ranking and multi-task learning for knowledge-base question answering

Zhiguo Wang, Patrick Ng, Ramesh Nallapati, Bing Xiang

11:12

16/11/2020

Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting

Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan Yu

Keywords: pretraining, pretraining tasks, learning tasks, fine-tuning bert-large

10:52

19/04/2021

Cross-lingual visual pre-training for multimodal machine translation

Ozan Caglayan, Menekse Kuyu, Mustafa Sercan Amac, Pranava Madhyastha, Erkut Erdem, Aykut Erdem, Lucia Specia

6:16

05/12/2020

Investigating learning dynamics of BERT fine-tuning

Yaru Hao, Li Dong, Furu Wei, Ke Xu

7:10

19/04/2021

Maximal multiverse learning for promoting cross-task generalization of fine-tuned language models

Itzik Malkiel, Lior Wolf

8:32

04/07/2020

TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing

Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu

Keywords: Natural Processing, supervised tasks, text classification, reading comprehension

10:36

07/09/2020

Robust Scene Text Recognition Through Adaptive Image Enhancement

Ye Qian, Yuyang Wang, Feng Su

Keywords: text recognition, image enhancement, spatial rectification, end-to-end, scene text

7:50

03/05/2021

SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing

Tao Yu, Rui Zhang, Alex Polozov, Christopher Meek, Ahmed H Awadallah

5:11

04/07/2020

SPECTER: Document-level Representation Learning using Citation-informed Transformers

Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel Weld

Keywords: Document-level Learning, Representation learning, natural systems, classification

13:07

04/07/2020

SpanBERT: Improving Pre-training by Representing and Predicting Spans

Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy

Keywords: span tasks, question answering, coreference resolution, OntoNotes task

14:14

22/06/2020

How Context Affects Language Models' Factual Predictions

Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel

10:16

16/11/2020

Improving AMR Parsing with Sequence-to-Sequence Pre-training

Dongqin Xu, Junhui Li, Muhua Zhu, Min Zhang, Guodong Zhou

Keywords: abstract parsing, amr parsing, sequence-to-sequence parsing, machine translation

11:42