Improving Image Captioning with Better Use of Caption

04/07/2020

Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Keywords: Image Captioning, multimodal problem, natural language processing, computer vision

Abstract: Image captioning is a multimodal problem that has drawn extensive attention in both the natural language processing and computer vision communities. In this paper, we present a novel image captioning architecture that better explores the semantics available in captions and leverages them to enhance both image representation and caption generation. Our models first construct caption-guided visual relationship graphs, which introduce a beneficial inductive bias, using weakly supervised multi-instance learning. The representation is then enhanced with neighbouring and contextual nodes and their textual and visual features. During generation, the model further incorporates visual relationships through multi-task learning, jointly predicting word and object/predicate tag sequences. We perform extensive experiments on the MSCOCO dataset, showing that the proposed framework significantly outperforms the baselines and achieves state-of-the-art performance under a wide range of evaluation metrics. The code of our paper has been made publicly available.
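The multi-task idea mentioned in the abstract, jointly predicting a word sequence and an object/predicate tag sequence, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy probabilities, and the `tag_weight` parameter are all hypothetical, and the weighted-sum form of the joint loss is only an assumption about how two sequence losses might be combined.

```python
import math

def sequence_nll(step_probs, targets):
    """Mean per-step negative log-likelihood of the target indices."""
    assert len(step_probs) == len(targets)
    total = sum(-math.log(probs[t]) for probs, t in zip(step_probs, targets))
    return total / len(targets)

def joint_loss(word_probs, word_targets, tag_probs, tag_targets, tag_weight=0.5):
    """Word loss plus a weighted tag loss (tag_weight is a hypothetical knob)."""
    word_loss = sequence_nll(word_probs, word_targets)
    tag_loss = sequence_nll(tag_probs, tag_targets)
    return word_loss + tag_weight * tag_loss

# Toy two-step caption: each row is a predicted distribution at one time step.
word_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # over a 3-word vocabulary
word_targets = [0, 1]
tag_probs = [[0.9, 0.1], [0.4, 0.6]]              # over 2 tags, e.g. object / predicate
tag_targets = [0, 1]

loss = joint_loss(word_probs, word_targets, tag_probs, tag_targets)
print(f"joint loss: {loss:.4f}")
```

Setting `tag_weight` to zero reduces the sketch to ordinary single-task caption training, which makes the contribution of the auxiliary tag objective easy to isolate.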

The talk and the respective paper are published at the ACL 2020 virtual conference.

Similar Papers

02/02/2021

An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-level Structural Information

Zejun Li, Zhongyu Wei, Zhihao Fan, Haijun Shan, Xuanjing Huang

14/06/2020

Weakly Supervised Visual Semantic Parsing

Alireza Zareian, Svebor Karaman, Shih-Fu Chang

Keywords: scene understanding, scene graph generation, weakly supervised learning, semantic parsing, graph neural networks, visual reasoning

02/02/2021

Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction

Benfeng Xu, Quan Wang, Yajuan Lyu, Yong Zhu, Zhendong Mao

03/05/2021

Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis

Zhipeng Bao, Yu-Xiong Wang, Martial Hebert

Keywords: adversarial training, computer vision, object recognition, few-shot learning, generative models

14/06/2020

Semantically Multi-Modal Image Synthesis

Zhen Zhu, Zhiliang Xu, Ansheng You, Xiang Bai

Keywords: label-to-image, semantically multi-modal image synthesis, smis, groupdnet, group convolution, cg-norm

06/12/2021

Integrating Tree Path in Transformer for Code Representation

Han Peng, Ge Li, Wenhan Wang, YunFei Zhao, Zhi Jin

Keywords: machine learning, transformers

07/09/2020

Multi-label Zero-shot Classification by Learning to Transfer from External Knowledge

He Huang, Wei Tang, Philip Yu, Yuanwei Chen, Wenhao Zheng, Qing-Guo Chen

Keywords: zero-shot learning, graph neural networks, multi-label classification

07/09/2020

Advancing weakly supervised cross-domain alignment with optimal transport

Siyang Yuan, Ke Bai, Liqun Chen, Yizhe Zhang, Chenyang Tao, Chunyuan Li, Guoyin Wang, Ricardo Henao, Lawrence Carin

Keywords: Optimal Transport, Cross Domain Alignment

14/06/2020

Webly Supervised Knowledge Embedding Model for Visual Reasoning

Wenbo Zheng, Lan Yan, Chao Gou, Fei-Yue Wang

Keywords: visual reasoning, webly supervised learning

02/02/2021

Train a One-Million-Way Instance Classifier for Unsupervised Visual Representation Learning

Yu Liu, Lianghua Huang, Pan Pan, Bin Wang, Yinghui Xu, Rong Jin

06/12/2021

Towards Context-Agnostic Learning Using Synthetic Data

Charles Jin, Martin Rinard

Keywords: machine learning, vision

03/05/2021

Prototypical Representation Learning for Relation Extraction

Ning Ding, Xiaobin Wang, Yao Fu, Guangwei Xu, Rui Wang, Pengjun Xie, Ying Shen, Fei Huang, Hai-Tao Zheng, Rui Zhang

Keywords: NLP, Representation Learning, Relation Extraction

12/07/2020

Learning Structured Latent Factors from Dependent Data: A Generative Model Framework from Information-Theoretic Perspective

Ruixiang ZHANG, Katsuhiko Ishiguro, Masanori Koyama

Keywords: Learning Theory

08/12/2020

Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case

Adam Dahlgren Lindström, Johanna Björklund, Suna Bensch, Frank Drewes

06/12/2021

Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity

Yan Liu, Zhijie Zhang, Li Niu, Junjie Chen, Liqing Zhang

Keywords: vision, transfer learning

06/12/2021

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, Steven Chu Hong Hoi

Keywords: transformers, vision, representation learning

23/08/2020

Spectrum-guided adversarial disparity learning

Zhe Liu, Lina Yao, Lei Bai, Xianzhi Wang, Can Wang

Keywords: adversarial autoencoder, generative models, intraclass variability, activity recognition

04/07/2020

A Transformer-based Approach for Source Code Summarization

Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

Keywords: Source Summarization, summarization, ablation studies, Transformer-based Approach

06/12/2021

Referring Transformer: A One-step Approach to Multi-task Visual Grounding

Muchen Li, Leonid Sigal

Keywords: transformers, vision

02/02/2021

Exploring Explainable Selection to Control Abstractive Summarization

Haonan Wang, Yang Gao, Yu Bai, Mirella Lapata, Heyan Huang

05/01/2021

Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-Shot Learning

Shivam Chandhok, Vineeth N Balasubramanian

18/07/2021

Unifying Vision-and-Language Tasks via Text Generation

Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal

Keywords: Algorithms, Multimodal Learning

05/01/2021

ChartOCR: Data Extraction From Charts Images via a Deep Hybrid Framework

Junyu Luo, Zekun Li, Jinpeng Wang, Chin-Yew Lin

01/07/2020

Character aware models with similarity learning for metaphor detection

Tarun Kumar, Yashvardhan Sharma

16/11/2020

Learning to Represent Image and Text with Denotation Graph

Bowen Zhang, Hexiang Hu, Vihan Jain, Eugene Ie, Fei Sha

Keywords: cross-modal retrieval, referring expression, compositional recognition, pre-training

19/04/2021

Expanding, retrieving and infilling: Diversifying cross-domain question generation with flexible templates

Xiaojing Yu, Anxiao Jiang

14/06/2020

Graph Structured Network for Image-Text Matching

Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang

Keywords: image-text matching, graph network, cross-modal, fine-grained correspondence, visual-semantic

06/12/2020

Multimodal Graph Networks for Compositional Generalization in Visual Question Answering

Raeid Saqur, Karthik Narasimhan

02/02/2021

RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER

Lin Sun, Jiquan Wang, Kai Zhang, Yindu Su, Fangsheng Weng

02/02/2021

Encoder-Decoder Based Unified Semantic Role Labeling with Label-Aware Syntax

Hao Fei, Fei Li, Bobo Li, Donghong Ji

05/01/2021

TranstextNet: Transducing Text for Recognizing Unseen Visual Relationships

Gal S. Kenigsfield, Ran El-Yaniv

14/06/2020

JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection

Keren Fu, Deng-Ping Fan, Ge-Peng Ji, Qijun Zhao

Keywords: visual saliency, salient object detection, rgb-d, depth information, joint learning, dense connections, multi-modal features, feature fusion, deep learning, encoder-decoder

16/11/2020

Neural Deepfake Detection with Factual Structure of Text

Wanjun Zhong, Duyu Tang, Zenan Xu, Ruize Wang, Nan Duan, Ming Zhou, Jiahai Wang, Jian Yin

Keywords: deepfake detection, automatically text, deepfake text, natural models

14/06/2020

Visual-Semantic Matching by Exploring High-Order Attention and Distraction

Yongzhi Li, Duo Zhang, Yadong Mu

Keywords: visual semantic matching, cross modal retrieval, scene graph, visual distraction, graph matching, gcn

19/04/2021

LESA: Linguistic encapsulation and semantic amalgamation based generalised claim detection from online content

Shreya Gupta, Parantak Singh, Megha Sundriyal, Md. Shad Akhtar, Tanmoy Chakraborty

06/12/2021

Multi-modal Dependency Tree for Video Captioning

Wentian Zhao, Xinxiao Wu, Jiebo Luo

Keywords: reinforcement learning and planning, graph learning, language

06/12/2021

Looking Beyond Single Images for Contrastive Semantic Segmentation Learning

FEIHU ZHANG, Philip Torr, Rene Ranftl, Stephan Richter

Keywords: machine learning, vision, contrastive learning, representation learning

02/02/2021

Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization

Shir Gur, Ameen Ali, Lior Wolf

04/07/2020

Enhancing Cross-target Stance Detection with Transferable Semantic-Emotion Knowledge

Bowen Zhang, Min Yang, Xutao Li, Yunming Ye, Xiaofei Xu, Kuai Dai

Keywords: Cross-target Detection, Stance detection, knowledge transfer, stance classifier

06/12/2020

Joint Contrastive Learning with Infinite Possibilities

Qi Cai, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei