Dual-level Collaborative Transformer for Image Captioning

02/02/2021

Yunpeng Luo, Jiayi Ji, Xiaoshuai Sun, Liujuan Cao, Yongjian Wu, Feiyue Huang, Chia-Wen Lin, Rongrong Ji

Abstract: Descriptive region features extracted by object detection networks have played an important role in recent advances in image captioning. However, they are still criticized for lacking contextual information and fine-grained details, which are, by contrast, the merits of traditional grid features. In this paper, we introduce a novel Dual-Level Collaborative Transformer (DLCT) network to realize the complementary advantages of the two features. Concretely, in DLCT, the two features are first processed by a novel Dual-Way Self-Attention (DWSA) to mine their intrinsic properties, where a Comprehensive Relation Attention component is also introduced to embed the geometric information. In addition, we propose a Locality-Constrained Cross Attention module to address the semantic noise caused by the direct fusion of the two features, where a geometric alignment graph is constructed to accurately align and reinforce region and grid features. To validate our model, we conduct extensive experiments on the highly competitive MS-COCO dataset and achieve new state-of-the-art performance on both local and online test sets, i.e., 133.8% CIDEr on the Karpathy split and 135.4% CIDEr on the official split.
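
The locality constraint described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, for illustration only, that a region is linked to a grid cell whenever its bounding box overlaps that cell, and that attention from each region is restricted to its linked cells. The function names `alignment_mask` and `masked_cross_attention`, the normalized box format, and the toy features are all hypothetical.

```python
import math


def alignment_mask(boxes, grid_size):
    """Return mask[r][c] = True when region box r overlaps grid cell c.

    boxes: list of (x1, y1, x2, y2) in normalized [0, 1] image coordinates
    (an assumed format). Cells are indexed row-major over a
    grid_size x grid_size grid.
    """
    step = 1.0 / grid_size
    mask = []
    for x1, y1, x2, y2 in boxes:
        row = []
        for i in range(grid_size):      # cell row (y direction)
            for j in range(grid_size):  # cell column (x direction)
                overlap_w = min(x2, (j + 1) * step) - max(x1, j * step)
                overlap_h = min(y2, (i + 1) * step) - max(y1, i * step)
                row.append(overlap_w > 0 and overlap_h > 0)
        mask.append(row)
    return mask


def masked_cross_attention(queries, keys, values, mask):
    """Scaled dot-product attention restricted to positions where mask is True.

    Assumes every query has at least one allowed key (every box overlaps
    at least one cell).
    """
    scale = math.sqrt(len(queries[0]))
    outputs, all_weights = [], []
    for q, allowed in zip(queries, mask):
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        # Softmax over allowed positions only; masked positions get weight 0.
        top = max(s for s, ok in zip(scores, allowed) if ok)
        exps = [math.exp(s - top) if ok else 0.0
                for s, ok in zip(scores, allowed)]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: two region boxes over a 2 x 2 grid with 2-dimensional features.
boxes = [(0.0, 0.0, 0.5, 0.5), (0.25, 0.25, 0.75, 0.75)]
mask = alignment_mask(boxes, grid_size=2)
region_feats = [[1.0, 0.0], [0.0, 1.0]]
grid_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
fused, weights = masked_cross_attention(region_feats, grid_feats, grid_feats, mask)
```

In this toy setup the first box covers only the top-left cell, so it attends to that cell alone, while the second box straddles all four cells and attends to each of them. Restricting attention this way keeps a region from mixing in features from unrelated parts of the image, which is the kind of semantic noise the abstract attributes to direct fusion.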

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38948007

The talk and the corresponding paper were published at the AAAI 2021 virtual conference.

Similar Papers

14/06/2020

PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

Li Jiang, Hengshuang Zhao, Shaoshuai Shi, Shu Liu, Chi-Wing Fu, Jiaya Jia

Keywords: instance segmentation, point cloud, 3d, scene understanding, indoor scenes, bottom-up, grouping, dual-set, scannet, s3dis

5:01

02/02/2021

Object-Centric Image Generation from Layouts

Tristan Sylvain, Pengchuan Zhang, Yoshua Bengio, R Devon Hjelm, Shikhar Sharma

17:44

14/06/2020

Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

Hao Tang, Dan Xu, Yan Yan, Philip H.S. Torr, Nicu Sebe

Keywords: generative adversarial networks, local, global, semantic guided, scene generation, semantic image synthesis, cross-view image generation, class-specific feature representation, attention fusion

1:00

14/06/2020

Hierarchical Scene Coordinate Classification and Regression for Visual Localization

Xiaotian Li, Shuzhe Wang, Yi Zhao, Jakob Verbeek, Juho Kannala

Keywords: visual localization, camera relocalization, scene coordinate regression

1:01

02/02/2021

Exploiting Relationship for Complex-scene Image Generation

Tianyu Hua, Hongdong Zheng, Yalong Bai, Wei Zhang, Xiao-Ping Zhang, Tao Mei

15:01

14/09/2020

Inductive Unsupervised Domain Adaptation for Few-Shot Classification via Clustering

Xin Cong, Bowen Yu, Tingwen Liu, Shiyao Cui, Hengzhu Tang, Bin Wang

Keywords: few-shot classification, domain adaptation, clustering

13:29

06/12/2021

Knowledge-inspired 3D Scene Graph Prediction in Point Cloud

Shoulong Zhang, Shuai Li, Aimin Hao, Hong Qin

Keywords: deep learning, graph learning

11:20

22/11/2021

Multi-Modality Task Cascade for 3D Object Detection

Jinhyung Park, Xinshuo Weng, Yunze Man, Kris Kitani

Keywords: Multi Modality Learning, Object Detection, Semantic Segmentation

3:03

30/11/2020

MLIFeat: Multi-level information fusion based deep local features

Yuyang Zhang (Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Jinge Wang, Shibiao Xu, Xiao Liu, Xiaopeng Zhang

5:28

02/02/2021

Similarity Reasoning and Filtration for Image-Text Matching

Haiwen Diao, Ying Zhang, Lin Ma, Huchuan Lu

16:34

30/11/2020

Local Context Attention for Salient Object Segmentation

Jing Tan, Pengfei Xiong, Zhengyi Lv, Kuntao Xiao, Yuwen He

9:35

05/01/2021

Class-Wise Metric Scaling for Improved Few-Shot Classification

Ge Liu, Linglan Zhao, Wei Li, Dashan Guo, Xiangzhong Fang

5:01

14/06/2020

Cross-Domain Detection via Graph-Induced Prototype Alignment

Minghao Xu, Hang Wang, Bingbing Ni, Qi Tian, Wenjun Zhang

Keywords: cross-domain detection, relation graph, prototype-based domain adaptation, balanced training

4:53

02/02/2021

PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection

Yanan Zhang, Di Huang, Yunhong Wang

16:30

14/06/2020

Visual-Semantic Matching by Exploring High-Order Attention and Distraction

Yongzhi Li, Duo Zhang, Yadong Mu

Keywords: visual semantic matching, cross modal retrieval, scene graph, visual distraction, graph matching, gcn

1:01

14/06/2020

Deep Image Spatial Transformation for Person Image Generation

Yurui Ren, Xiaoming Yu, Junming Chen, Thomas H. Li, Ge Li

Keywords: pose transfer, image animation, spatial transformation, local attention, novel view synthesis, pose-guided person image generation

1:00

02/02/2021

DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection

Hao-Shu Fang, Yichen Xie, Dian Shao, Cewu Lu

5:11

06/12/2020

Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation

Guoliang Kang, Yunchao Wei, Yi Yang, Yueting Zhuang, Alexander Hauptmann

3:16

05/01/2021

TranstextNet: Transducing Text for Recognizing Unseen Visual Relationships

Gal S. Kenigsfield, Ran El-Yaniv

5:00

14/06/2020

RPM-Net: Robust Point Matching Using Learned Features

Zi Jian Yew, Gim Hee Lee

Keywords: point cloud, registration, icp, sinkhorn, robust point matching, deep learning

1:01

06/12/2020

Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning

Huan Fu, Shunming Li, Rongfei Jia, Mingming Gong, Binqiang Zhao, Dacheng Tao

3:21

14/06/2020

Self-Supervised Monocular Scene Flow Estimation

Junhwa Hur, Stefan Roth

Keywords: monocular scene flow, self-supervised learning, 3d scene flow, optical flow, monocular depth estimation

5:00

02/02/2021

Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization

Shir Gur, Ameen Ali, Lior Wolf

14:14

14/06/2020

Exploring Categorical Regularization for Domain Adaptive Object Detection

Chang-Dong Xu, Xing-Ran Zhao, Xin Jin, Xiu-Shen Wei

Keywords: domain adaptive object detection, image-level categorical regularization, categorical consistency regularization, domain adaptive faster r-cnn

1:00

26/04/2020

Neural Outlier Rejection for Self-Supervised Keypoint Learning

Jiexiong Tang, Hanme Kim, Vitor Guizilini, Sudeep Pillai, Rares Ambrus

Keywords: Self-Supervised Learning, Keypoint Detection, Outlier Rejection, Deep Learning

4:55

22/11/2021

Planar Shape Based Registration for Multi-modal Geometry

Muxingzi Li, Florent Lafarge

Keywords: global registration, energy minimization, geometric primitives, point cloud, polygonal mesh

3:00

25/07/2020

Pairwise view weighted graph network for view-based 3D model retrieval

Zan Gao, Yin-ming Li, Wei-li Guan, Wei-zhi Nie, Zhi-yong Cheng, An-an Liu

Keywords: non-local graph network, pairwise network architecture, view weighted layer, view-based 3D model retrieval

10:43

14/06/2020

The Edge of Depth: Explicit Constraints Between Segmentation and Depth

Shengjie Zhu, Garrick Brazil, Xiaoming Liu

Keywords: self-supervised depth estimation, semantics segmentation, border consistency, morphing, kitti

1:00

06/12/2021

HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning

Shiming Chen, Guosen Xie, Yang Liu, Qinmu Peng, Baigui Sun, Hao Li, Xinge You, Ling Shao

Keywords: generative model, domain adaptation

9:19

07/09/2020

Attribute-Guided Image Generation from Layout

Ke Ma, Bo Zhao, Leonid Sigal

Keywords: conditional image generation, GAN

9:41

14/06/2020

Bidirectional Graph Reasoning Network for Panoptic Segmentation

Yangxin Wu, Gengwei Zhang, Yiming Gao, Xiajun Deng, Ke Gong, Xiaodan Liang, Liang Lin

Keywords: panoptic segmentation, graph reasoning, instance segmentation, semantic segmentation

1:01

14/06/2020

End-to-End 3D Point Cloud Instance Segmentation Without Detection

Haiyong Jiang, Feilong Yan, Jianfei Cai, Jianmin Zheng, Jun Xiao

Keywords: 3d instance segmentation, stable matching, point cloud, label assignment

1:01

14/06/2020

Structure Preserving Generative Cross-Domain Learning

Haifeng Xia, Zhengming Ding

Keywords: cross-domain generation, graph alignment, domain-specific classifiers

1:01

14/06/2020

Graph Structured Network for Image-Text Matching

Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang

Keywords: image-text matching, graph network, cross-modal, fine-grained correspondence, visual-semantic

1:01

03/05/2021

Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis

Zhipeng Bao, Yu-Xiong Wang, Martial Hebert

Keywords: adversarial training, computer vision, object recognition, few-shot learning, generative models

5:11

04/07/2020

Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward

Luyang Huang, Lingfei Wu, Lu Wang

Keywords: Knowledge Summarization, abstractive summarization, semantic interpretation, generation summaries

12:01

06/12/2021

Dual Progressive Prototype Network for Generalized Zero-Shot Learning

Chaoqun Wang, Shaobo Min, Xuejin Chen, Xiaoyan Sun, Houqiang Li

10:51

30/11/2020

Localin Reshuffle Net: Toward Naturally and Efficiently Facial Image Blending

Chengyao Zheng, Siyu Xia, Joseph Robinson, Changsheng Lu, Wayne Wu, Chen Qian, Ming Shao

2:19

30/11/2020

Reconstructing Human Body Mesh from Point Clouds by Adversarial GP Network

Boyao Zhou, Jean-Sebastien Franco, Federica Bogo, Bugra Tekin, Edmond Boyer

7:09

06/12/2021

Progressive Coordinate Transforms for Monocular 3D Object Detection

Li Wang, Li Zhang, Yi Zhu, Zhi Zhang, Tong He, Mu Li, Xiangyang Xue

Keywords: vision

13:21