Bridge the Gap: High-level Semantic Planning for Image Captioning

08/12/2020

Chenxi Yuan, Yang Bai, Chun Yuan

Abstract: Recent image captioning models have made considerable progress in modeling multi-modal interaction, for example through attention mechanisms. Although these mechanisms strengthen the interaction, two gaps remain between the visual and language domains: (1) the gap between visual features and textual semantics, and (2) the gap between the unordered nature of visual features and the sequential order of text. To bridge these gaps, we propose a high-level semantic planning (HSP) mechanism that incorporates both semantic reconstruction and explicit order planning. We integrate the planning mechanism into an attention-based captioning model and propose the High-level Semantic PLanning based Attention Network (HS-PLAN). First, an attention-based reconstruction module reconstructs the visual features with high-level semantic information. Then, a pointer network serializes the features to obtain an explicit order plan that guides generation. Experiments on MS COCO show that our model outperforms previous methods and achieves a state-of-the-art CIDEr-D score of 133.4%.
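
The serialization step mentioned in the abstract can be illustrated with a minimal sketch of greedy pointer-style selection. This is not the authors' implementation: the function name `greedy_pointer_order`, the bilinear scoring matrix `W`, the initial query `query0`, and the choice to reuse the selected feature as the next query (instead of a recurrent decoder state) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax; entries equal to -inf get probability 0.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


def greedy_pointer_order(features, query0, W):
    """Return an ordering (list of row indices) over `features`.

    At each step the current query scores every feature with a bilinear
    form (query @ W @ feature). Already selected rows are masked out, the
    scores are normalized with a softmax, and the most probable row is
    "pointed to". The selected feature becomes the next query, which is a
    simplification standing in for a learned decoder state.
    """
    n = features.shape[0]
    selected = np.zeros(n, dtype=bool)
    order = []
    query = query0
    for _ in range(n):
        scores = features @ (W.T @ query)          # shape (n,)
        scores = np.where(selected, -np.inf, scores)
        probs = softmax(scores)
        idx = int(np.argmax(probs))
        order.append(idx)
        selected[idx] = True
        query = features[idx]
    return order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(5, 8))   # 5 hypothetical region features
    W = rng.normal(size=(8, 8))       # untrained scoring matrix
    print(greedy_pointer_order(feats, feats.mean(axis=0), W))
```

With untrained weights the resulting order is arbitrary; the sketch only shows how a pointer-style decoder turns an unordered feature set into a sequence in which each feature appears exactly once.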

The video of this talk cannot be embedded. You can watch it here:

https://underline.io/lecture/6251-bridge-the-gap-high-level-semantic-planning-for-image-captioning

The talk and the corresponding paper were published at the COLING 2020 virtual conference.

Similar Papers

06/12/2020

Diverse Image Captioning with Context-Object Split Latent Spaces

Shweta Mahajan, Stefan Roth

14/06/2020

Learning Selective Self-Mutual Attention for RGB-D Saliency Detection

Nian Liu, Ni Zhang, Junwei Han

Keywords: rgb-d saliency detection, middle fusion, self-attention, mutual-attention, non-local network, two-stream cnn

14/06/2020

Squeeze-and-Attention Networks for Semantic Segmentation

Zilong Zhong, Zhong Qiu Lin, Rene Bidart, Xiaodan Hu, Ibrahim Ben Daya, Zhifeng Li, Wei-Shi Zheng, Jonathan Li, Alexander Wong

Keywords: semantic segmentation, squeeze-and-attention, pixel grouping

22/11/2021

Paying Attention to Varying Receptive Fields: Object Detection with Atrous Filters and Vision Transformers

Arthur Jian Shun Lam, Jun Yi Lim, Ricky Sutopo, Vishnu Monn Baskaran

Keywords: object detection, atrous convolution, vision transformers, attention mechanism

14/06/2020

Non-Local Neural Networks With Grouped Bilinear Attentional Transforms

Lu Chi, Zehuan Yuan, Yadong Mu, Changhu Wang

Keywords: attention, non-local, bilinear, image classification, video classification, grouped, data-adaptive

02/02/2021

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network

Jiayi Ji, Yunpeng Luo, Xiaoshuai Sun, Fuhai Chen, Gen Luo, Yongjian Wu, Yue Gao, Rongrong Ji

19/08/2021

Context-Aware Image Inpainting with Learned Semantic Priors

Wendong Zhang, Junwei Zhu, Ying Tai, Yunbo Wang, Wenqing Chu, Bingbing Ni, Chengjie Wang, Xiaokang Yang

Keywords: Computer Vision, 2D and 3D Computer Vision, Deep Learning

02/02/2021

Object-Centric Image Generation from Layouts

Tristan Sylvain, Pengchuan Zhang, Yoshua Bengio, R Devon Hjelm, Shikhar Sharma

18/11/2020

Bidirectional dependency-guided attention for relation extraction

Xingchen Deng, Lei Zhang, Yixing Fan, Long Bai, Jiafeng Guo, Pengfei Wang

02/02/2021

Learning Visual Context for Group Activity Recognition

Hangjie Yuan, Dong Ni

04/07/2020

Neural Topic Modeling with Bidirectional Adversarial Training

Rui Wang, Xuemeng Hu, Deyu Zhou, Yulan He, Yuxuan Xiong, Chenchen Ye, Haiyang Xu

Keywords: automatic extraction, model inference, neural modeling, topic inference

14/06/2020

Show, Edit and Tell: A Framework for Editing Image Captions

Fawaz Sammani, Luke Melas-Kyriazi

Keywords: image captioning, image description, editing captions, sequence editing, copy mechanism, adaptive copy mechanism, selecting mechanism, copy lstm

06/12/2021

Unsupervised Object-Level Representation Learning from Scene Images

Jiahao Xie, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy

Keywords: self-supervised learning, representation learning

06/12/2021

CogView: Mastering Text-to-Image Generation via Transformers

Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, Jie Tang

Keywords: transformers, generative model

19/08/2021

Dependent Multi-Task Learning with Causal Intervention for Image Captioning

Wenqing Chen, Jidong Tian, Caoyun Fan, Hao He, Yaohui Jin

Keywords: Machine Learning, Transfer, Adaptation, Multi-task Learning, Natural Language Generation, Language and Vision

14/06/2020

Relation-Aware Global Attention for Person Re-Identification

Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Xin Jin, Zhibo Chen

Keywords: relation-aware global attention, attention mechanism, person re-identification, feature relations, global structural information

14/06/2020

X-Linear Attention Networks for Image Captioning

Yingwei Pan, Ting Yao, Yehao Li, Tao Mei

Keywords: image captioning, bilinear pooling, attention mechanism, high order interaction, transformer, lstm, image encoder, language decoder, infinity order interaction, coco

02/02/2021

F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation

Daizong Liu, Dongdong Yu, Changhu Wang, Pan Zhou

06/12/2021

Dual Progressive Prototype Network for Generalized Zero-Shot Learning

Chaoqun Wang, Shaobo Min, Xuejin Chen, Xiaoyan Sun, Houqiang Li

19/04/2021

Crisscrossed captions: Extended intramodal and intermodal semantic similarity judgments for MS-COCO

Zarana Parekh, Jason Baldridge, Daniel Cer, Austin Waters, Yinfei Yang

12/07/2020

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

Keywords: Sequential, Network, and Time-Series Modeling

06/12/2021

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, Steven Chu Hong Hoi

Keywords: transformers, vision, representation learning

30/11/2020

Background Learnable Cascade for Zero-Shot Object Detection

Ye Zheng, Ruoran Huang, Chuanqi Han, Xi Huang, Li Cui

06/12/2020

RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder

Cheng Chi, Fangyun Wei, Han Hu

05/01/2021

TranstextNet: Transducing Text for Recognizing Unseen Visual Relationships

Gal S. Kenigsfield, Ran El-Yaniv

02/02/2021

Classification by Attention: Scene Graph Classification with Prior Knowledge

Sahand Sharifzadeh, Sina Moayed Baharlou, Volker Tresp

02/02/2021

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi

06/12/2021

Do Vision Transformers See Like Convolutional Neural Networks?

Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

Keywords: deep learning, machine learning, transformers, vision, representation learning, transfer learning

30/11/2020

Jointly Discriminating and Frequent Visual Representation Mining

Qiannan Wang, Ying Zhou, ZhaoYan Zhu, Xuefeng Liang, Yu Gu

06/12/2021

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations

Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Luc V Gool

Keywords: self-supervised learning, vision, contrastive learning, representation learning

22/09/2020

MEANTIME: Mixture of attention mechanisms with multi-temporal embeddings for sequential recommendation

Sung Min Cho, Eunhyeok Park, Sungjoo Yoo

Keywords: Self-attention, Sequential Recommendation, Temporal Embedding, BERT

30/11/2020

Show, Conceive and Tell: Image Captioning with Prospective Linguistic Information

Yiqing Huang, Jiansheng Chen

06/12/2021

Focal Attention for Long-Range Interactions in Vision Transformers

Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

Keywords: machine learning, transformers, vision

14/06/2020

RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis From Prior Knowledge

Jun Cheng, Fuxiang Wu, Yanling Tian, Lei Wang, Dapeng Tao

Keywords: image synthesis, self-attentional embedding mixture, multi-captions, limited information, caption matching

06/12/2020

Generative Neurosymbolic Machines

Jindong Jiang, Sungjin Ahn

14/06/2020

Visual Commonsense R-CNN

Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun

Keywords: visual commonsense learning, causal inference, un-/self-supervised learning, visual representation learning, vision and language

08/12/2020

Span-based Joint Entity and Relation Extraction with Attention-based Span-specific and Contextual Semantic Representations

Bin Ji, Jie Yu, Shasha Li, Jun Ma, Qingbo Wu, Yusong Tan, Huijun Liu

30/11/2020

Image Captioning through Image Transformer

Sen He, Wentong Liao, Hamed R. Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

06/12/2020

Learning Semantic-aware Normalization for Generative Adversarial Networks

Heliang Zheng, Jianlong Fu, zengyh Zeng, Jiebo Luo, Zheng-Jun Zha

19/08/2021

Disentangled Face Attribute Editing via Instance-Aware Latent Space Search

Yuxuan Han, Jiaolong Yang, Ying Fu

Keywords: Computer Vision, 2D and 3D Computer Vision, Explainable/Interpretable Machine Learning