Abstract:
In RGB-D object detection, the inherent differences between the RGB and depth modalities make it challenging to leverage the sensed photometric and depth information simultaneously. In this paper, we address this issue with a Feature Exchange Transformer Network (FETNet), which consists of two well-designed components: a Feature Exchange Module (FEM) and a Multi-modal Vision Transformer (MViT). Specifically, the FEM exchanges a portion of the channels between the RGB and depth features at each backbone stage, which facilitates information flow and bridges the gap between the two modalities. Inspired by the success of the Vision Transformer (ViT), we develop MViT to fuse multi-modal features effectively and to exploit cross-modal attention between the RGB and depth features. Unlike previous methods built on top of a specific RGB detection algorithm, our proposal is generic: extensive experiments show that integrating the proposed modules into mainstream RGB object detectors yields significant performance gains for their RGB-D counterparts. Moreover, FETNet surpasses state-of-the-art RGB-D detectors by 7.0% mAP on SUN RGB-D and by 1.7% mAP on NYU Depth v2, further demonstrating the effectiveness of the proposed method.
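To make the two mechanisms concrete, the following is a minimal PyTorch sketch of the channel-exchange idea behind the FEM. It assumes a fixed split ratio and same-shaped per-stage feature maps; the paper's actual module may select or gate channels differently, and the class and parameter names (FeatureExchange, ratio) are ours, not the authors'.

```python
import torch
import torch.nn as nn

class FeatureExchange(nn.Module):
    """Illustrative sketch: swap the first `ratio` fraction of channels
    between RGB and depth feature maps at one backbone stage."""

    def __init__(self, ratio: float = 0.5):
        super().__init__()
        self.ratio = ratio

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        # rgb, depth: (B, C, H, W) features from the same backbone stage
        c = int(rgb.shape[1] * self.ratio)
        # Each modality keeps its remaining channels and receives the
        # other modality's leading channels, letting information flow
        # between the two streams.
        rgb_out = torch.cat([depth[:, :c], rgb[:, c:]], dim=1)
        depth_out = torch.cat([rgb[:, :c], depth[:, c:]], dim=1)
        return rgb_out, depth_out
```

Similarly, a single cross-attention block conveys the fusion idea behind the MViT. This is a sketch under the assumption that flattened RGB tokens attend to depth tokens; the paper's MViT presumably stacks full transformer layers, and CrossModalFusion, dim, and heads are hypothetical names for illustration only.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative sketch: RGB tokens query depth tokens via
    cross-attention, with a residual connection and layer norm."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # rgb, depth: (B, N, dim) token sequences, e.g. feature maps
        # flattened via x.flatten(2).transpose(1, 2)
        fused, _ = self.attn(query=rgb, key=depth, value=depth)
        # Residual keeps the photometric cues alongside the attended
        # depth information.
        return self.norm(rgb + fused)
```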