Image Captioning with Context-Aware Auxiliary Guidance

02/02/2021

Zeliang Song, Xiaofei Zhou, Zhendong Mao, Jianlong Tan

Abstract: Image captioning is a challenging computer vision task that aims to generate a natural language description of an image. Most recent work follows the encoder-decoder framework, which depends heavily on the previously generated words for the current prediction. Such methods cannot effectively take advantage of future predicted information to learn complete semantics. In this paper, we propose a Context-Aware Auxiliary Guidance (CAAG) mechanism that guides the captioning model to perceive global contexts. On top of the captioning model, CAAG performs semantic attention that selectively concentrates on useful information in the global predictions to reproduce the current generation. To validate the adaptability of the method, we apply CAAG to three popular captioners, and our proposal achieves competitive performance on the challenging Microsoft COCO image captioning benchmark, e.g., a CIDEr-D score of 132.2 on the Karpathy split and a CIDEr-D (c40) score of 130.7 on the official online evaluation server.
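The abstract describes CAAG's semantic attention only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the "global predictions" are the word embeddings of a first-pass caption, that semantic attention is scaled dot-product attention with the current decoder state as the query, and that the attended context is fused with that state by a simple element-wise sum before scoring the vocabulary. The names `semantic_attention` and `reproduce_word` and the toy vocabulary are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_attention(
    query: Sequence[float], word_embs: Sequence[Sequence[float]]
) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention of the current decoder state (query)
    over the embeddings of every word in a first-pass (global) caption.

    Returns the attention weights and the weighted-sum context vector.
    Hypothetical sketch, not the CAAG authors' code.
    """
    d = len(query)
    scores = [dot(query, e) / math.sqrt(d) for e in word_embs]
    weights = softmax(scores)
    context = [sum(w * e[k] for w, e in zip(weights, word_embs)) for k in range(d)]
    return weights, context


def reproduce_word(
    hidden: Sequence[float], context: Sequence[float], vocab: Dict[str, Sequence[float]]
) -> str:
    """Fuse the decoder state with the global context (an element-wise sum,
    assumed here for simplicity) and return the highest-scoring vocabulary word."""
    fused = [h + c for h, c in zip(hidden, context)]
    return max(vocab, key=lambda word: dot(fused, vocab[word]))


if __name__ == "__main__":
    # Toy one-hot embeddings for a three-word vocabulary (made-up data).
    vocab = {"a": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "runs": [0.0, 0.0, 1.0]}
    global_caption = ["a", "dog", "runs"]  # hypothetical first-pass prediction
    hidden = [0.0, 2.0, 0.0]  # hypothetical decoder state at the current step
    weights, context = semantic_attention(hidden, [vocab[w] for w in global_caption])
    print("attention weights:", [round(w, 3) for w in weights])
    print("reproduced word:", reproduce_word(hidden, context, vocab))
```

The point of the sketch is only that the current step can look at the whole predicted sentence, including words after the current position, rather than at previously generated words alone; the actual fusion and training objective used by CAAG may differ.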

Video of the talk: https://slideslive.com/38948336

The talk and the corresponding paper were published at the AAAI 2021 virtual conference.

Similar Papers

14/06/2020

Referring Image Segmentation via Cross-Modal Progressive Comprehension

Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li

Keywords: referring segmentation, progressive comprehension, cross-modal, entity perception, relation-aware reasoning

04/07/2020

Improving Image Captioning with Better Use of Caption

Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu

Keywords: Image Captioning, multimodal problem, natural processing, computer community

26/04/2020

Neural Outlier Rejection for Self-Supervised Keypoint Learning

Jiexiong Tang, Hanme Kim, Vitor Guizilini, Sudeep Pillai, Rares Ambrus

Keywords: Self-Supervised Learning, Keypoint Detection, Outlier Rejection, Deep Learning

14/06/2020

Attention-Based Context Aware Reasoning for Situation Recognition

Thilini Cooray, Ngai-Man Cheung, Wei Lu

Keywords: situation recognition, visual semantic role labelling, scene understanding, vision and language, action recognition

14/06/2020

Learning Saliency Propagation for Semi-Supervised Instance Segmentation

Yanzhao Zhou, Xin Wang, Jianbin Jiao, Trevor Darrell, Fisher Yu

Keywords: semi-supervised, instance segmentation, saliency, propagation, message passing, multiple instance learning, partial-supervised, generalization

08/12/2020

Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering

Wei Han, Hantao Huang, Tao Han

14/06/2020

Image Search With Text Feedback by Visiolinguistic Attention Learning

Yanbei Chen, Shaogang Gong, Loris Bazzani

Keywords: vision and language, image search, text feedback, attention mechanism, transformer, multimodal learning, representation learning, composition, image retrieval, interactive image search

02/02/2021

Proposal-Free Video Grounding with Contextual Pyramid Network

Kun Li, Dan Guo, Meng Wang

30/11/2020

Show, Conceive and Tell: Image Captioning with Prospective Linguistic Information

Yiqing Huang, Jiansheng Chen

04/07/2020

Multi-Domain Named Entity Recognition with Genre-Aware and Agnostic Inference

Jing Wang, Mayank Kulkarni, Daniel Preotiuc-Pietro

Keywords: Multi-Domain Recognition, Named recognition, domain models, NER

14/06/2020

Prior Guided GAN Based Semantic Inpainting

Avisek Lahiri, Arnav Kumar Jain, Sanskar Agrawal, Pabitra Mitra, Prabir Kumar Biswas

Keywords: semantic inpainting, generative adversarial networks, video inpainting, facial keypoints, generative models

06/12/2021

Detecting Moments and Highlights in Videos via Natural Language Queries

Jie Lei, Tamara L Berg, Mohit Bansal

Keywords: transformers

16/11/2020

Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference

Jianguo Zhang, Kazuma Hashimoto, Wenhao Liu, Chien-Sheng Wu, Yao Wan, Philip Yu, Richard Socher, Caiming Xiong

Keywords: intent detection, detecting intents, oos detection, large-scale task

19/04/2021

‘just because you are right, doesn’t mean I am wrong’: Overcoming a bottleneck in development and evaluation of open-ended VQA tasks

Man Luo, Shailaja Keyur Sampat, Riley Tallman, Yankai Zeng, Manuha Vancha, Akarshan Sajja, Chitta Baral

30/11/2020

MLIFeat: Multi-level information fusion based deep local features

Yuyang Zhang (Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Jinge Wang, Shibiao Xu, Xiao Liu, Xiaopeng Zhang

30/11/2020

Rotation Axis Focused Attention Network (RAFA-Net) for Estimating Head Pose

Ardhendu Behera, Zachary Wharton, Pradeep Hewage, Swagat Kumar

19/08/2021

Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention

Wei Suo, MengYang Sun, Peng Wang, Qi Wu

Keywords: Computer Vision, Language and Vision, Structural and Model-Based Approaches, Knowledge Representation and Reasoning

14/06/2020

Webly Supervised Knowledge Embedding Model for Visual Reasoning

Wenbo Zheng, Lan Yan, Chao Gou, Fei-Yue Wang

Keywords: visual reasoning, webly supervised learning

02/02/2021

Context-Guided Adaptive Network for Efficient Human Pose Estimation

Lei Zhao, Jun Wen, Pengfei Wang, Nenggan Zheng

14/06/2020

Exploring Categorical Regularization for Domain Adaptive Object Detection

Chang-Dong Xu, Xing-Ran Zhao, Xin Jin, Xiu-Shen Wei

Keywords: domain adaptive object detection, image-level categorical regularization, categorical consistency regularization, domain adaptive faster r-cnn

14/06/2020

A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension

Yue Liao, Si Liu, Guanbin Li, Fei Wang, Yanjie Chen, Chen Qian, Bo Li

Keywords: referring expression comprehension, cross modality, correlation filtering, real-time, one stage

19/08/2021

Dependent Multi-Task Learning with Causal Intervention for Image Captioning

Wenqing Chen, Jidong Tian, Caoyun Fan, Hao He, Yaohui Jin

Keywords: Machine Learning, Transfer, Adaptation, Multi-task Learning, Natural Language Generation, Language and Vision

14/06/2020

Context Prior for Scene Segmentation

Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang

Keywords: semantic segmentation, scene segmentation, context prior, context aggregation, affinity loss, affinity matrix

12/07/2020

On Variational Learning of Controllable Representations for Text without Supervision

Peng Xu, Jackie Chi Kit Cheung, Yanshuai Cao

Keywords: Representation Learning

14/06/2020

Vec2Face: Unveil Human Faces From Their Blackbox Features in Face Recognition

Chi Nhan Duong, Thanh-Dat Truong, Khoa Luu, Kha Gia Quach, Hung Bui, Kaushik Roy

Keywords: generative models, bijective metric learning, blackbox face matcher, distillation framework, face synthesis, id preservation, feature-conditional structure, feature reconstruction, dibigan

14/06/2020

Cross-Modal Cross-Domain Moment Alignment Network for Person Search

Ya Jing, Wei Wang, Liang Wang, Tieniu Tan

Keywords: cross-domain adaptation, text-based person search, moment alignment network, cross-modal retrieval, unsupervised learning

05/01/2021

DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video

Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hongdong Li, Stephen Gould

02/02/2021

MANGO: A Mask Attention Guided One-Stage Scene Text Spotter

Liang Qiao, Ying Chen, Zhanzhan Cheng, Yunlu Xu, Yi Niu, Shiliang Pu, Fei Wu

14/06/2020

Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA

Ronghang Hu, Amanpreet Singh, Trevor Darrell, Marcus Rohrbach

Keywords: textvqa, visual question answering, vqa, vision and language, st-vqa, ocr-vqa, transformer, pointer network, ocr

05/01/2021

ChartOCR: Data Extraction From Charts Images via a Deep Hybrid Framework

Junyu Luo, Zekun Li, Jinpeng Wang, Chin-Yew Lin

26/04/2020

TabFact: A Large-scale Dataset for Table-based Fact Verification

Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, William Yang Wang

Keywords: Fact Verification, Tabular Data, Symbolic Reasoning

05/12/2020

Point-of-interest oriented question answering with joint inference of semantic matching and distance correlation

Yifei Yuan, Jingbo Zhou, Wai Lam

16/11/2020

T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack

Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li

Keywords: adversarial generation, nlp tasks, sentiment analysis, qa

14/06/2020

Probability Weighted Compact Feature for Domain Adaptive Retrieval

Fuxiang Huang, Lei Zhang, Yang Yang, Xichuan Zhou

Keywords: domain adaptive retrieval, bayesian formulation, learning to hash, transfer learning, focal-triplet loss, histogram feature of neighbors

18/07/2021

Decoupling Representation Learning from Reinforcement Learning

Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin

Keywords: Optimization, Submodular Optimization, Algorithms, Bandit Algorithms; Algorithms, Online Learning, Deep Learning, Embedding and Representation learning

30/11/2020

Adaptive Spotting: Deep Reinforcement Object Search in 3D Point Clouds

Onkar Krishna, Go Irie, Xiaomeng Wu, Takahito Kawanishi, Kunio Kashino

19/08/2021

Generating Senses and RoLes: An End-to-End Model for Dependency- and Span-based Semantic Role Labeling

Rexhina Blloshmi, Simone Conia, Rocco Tripodi, Roberto Navigli

Keywords: Natural Language Processing, Natural Language Semantics, Natural Language Generation

03/05/2021

Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search

Peidong Liu, Gengwei Zhang, Bochao Wang, Hang Xu, Xiaodan Liang, Yong Jiang, Zhenguo Li

Keywords: AutoML, Loss function search, Evolutionary algorithm, Object detection

02/02/2021

Encoder-Decoder Based Unified Semantic Role Labeling with Label-Aware Syntax

Hao Fei, Fei Li, Bobo Li, Donghong Ji

05/01/2021

TranstextNet: Transducing Text for Recognizing Unseen Visual Relationships

Gal S. Kenigsfield, Ran El-Yaniv
