Dual-stream Network for Visual Recognition

06/12/2021

Mingyuan Mao, Peng Gao, Renrui Zhang, Honghui Zheng, Teli Ma, Yan Peng, Errui Ding, Baochang Zhang, Shumin Han

Keywords: machine learning, transformers, vision

Abstract: Transformers with remarkable global representation capacities achieve competitive results for visual tasks, but fail to consider high-level local pattern information in input images. In this paper, we present a generic Dual-stream Network (DS-Net) to fully explore the representation capacity of local and global pattern features for image classification. Our DS-Net can simultaneously calculate fine-grained and integrated features and efficiently fuse them. Specifically, we propose an Intra-scale Propagation module to process two different resolutions in each block and an Inter-Scale Alignment module to perform information interaction across features at dual scales. Besides, we also design a Dual-stream FPN (DS-FPN) to further enhance contextual information for downstream dense predictions. Without bells and whistles, the proposed DS-Net outperforms DeiT-Small by 2.4% in terms of top-1 accuracy on ImageNet-1k and achieves state-of-the-art performance over other Vision Transformers and ResNets. For object detection and instance segmentation, DS-Net-Small respectively outperforms ResNet-50 by 6.4% and 5.5% in terms of mAP on MSCOCO 2017, and surpasses the previous state-of-the-art scheme, which significantly demonstrates its potential to be a general backbone in vision tasks. The code will be released soon.
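The dual-stream idea in the abstract (a fine-grained stream at full resolution, an integrated stream at reduced resolution, then fusion) can be illustrated with a toy sketch. This is not the authors' implementation: the 1D scalar signal, the function names, the moving-average stand-in for convolution, the scalar self-attention, and the additive fusion are all assumptions chosen to keep the example small and dependency-free.

```python
import math

def local_stream(x):
    """Fine-grained stream: 3-tap moving average at full resolution
    (an assumed stand-in for a convolution)."""
    n = len(x)
    out = []
    for i in range(n):
        window = x[max(0, i - 1): min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out

def downsample(x, factor=2):
    """Average-pool non-overlapping windows to get the low-resolution view."""
    chunks = [x[i:i + factor] for i in range(0, len(x), factor)]
    return [sum(c) / len(c) for c in chunks]

def global_stream(x):
    """Integrated stream: scalar self-attention, each position attends to all."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out

def upsample(x, factor, length):
    """Nearest-neighbour upsampling back to the high-resolution length."""
    return [x[min(i // factor, len(x) - 1)] for i in range(length)]

def dual_stream_block(x, factor=2):
    """Run both streams, then fuse by addition (an assumed, simplified fusion)."""
    local = local_stream(x)
    coarse = global_stream(downsample(x, factor))
    global_up = upsample(coarse, factor, len(x))
    return [l + g for l, g in zip(local, global_up)]

signal = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
fused = dual_stream_block(signal)
print(fused)
```

The sketch only shows the data flow: the output keeps the input resolution while each position mixes a local neighbourhood with a globally attended, coarser summary. The abstract's Inter-Scale Alignment module performs the cross-scale interaction in the actual model; plain addition is used here purely for brevity.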

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

Focal Attention for Long-Range Interactions in Vision Transformers

Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

Keywords: machine learning, transformers, vision

14:39

06/12/2021

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

Yufei Xu, Qiming Zhang, Jing Zhang, Dacheng Tao

Keywords: machine learning, transformers, vision

10:16

26/04/2020

Deep 3D Pan via Local adaptive "t-shaped" convolutions with global and local adaptive dilations

Juan Luis Gonzalez Bello, Munchurl Kim

Keywords: Deep learning, Stereoscopic view synthesis, Monocular depth, Deep 3D Pan

5:01

06/12/2021

Container: Context Aggregation Networks

Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Keywords: deep learning, self-supervised learning, transformers, vision, language

8:50

06/12/2021

XCiT: Cross-Covariance Image Transformers

Alaaeldin Ali, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Herve Jegou

Keywords: deep learning, machine learning, transformers, vision, language

13:15

02/02/2021

Patch-Wise Attention Network for Monocular Depth Estimation

Sihaeng Lee, Janghyeon Lee, Byungju Kim, Eojindl Yi, Junmo Kim

14:15

14/06/2020

Hierarchical Scene Coordinate Classification and Regression for Visual Localization

Xiaotian Li, Shuzhe Wang, Yi Zhao, Jakob Verbeek, Juho Kannala

Keywords: visual localization, camera relocalization, scene coordinate regression

1:01

30/11/2020

Attention-Aware Feature Aggregation for Real-time Stereo Matching on Edge Devices

Jia-Ren Chang (National Chiao Tung University, aetherAI), Pei-Chun Chang, Yong-Sheng Chen

9:53

06/12/2021

Intriguing Properties of Vision Transformers

Muhammad Muzammal Naseer, Kanchana Ranasinghe, Salman H Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

Keywords: deep learning, machine learning, robustness, transformers, vision, few shot learning

12:32

06/12/2021

Space-time Mixing Attention for Video Transformer

Adrian Bulat, Juan Manuel Perez Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos

Keywords: transformers

10:25

06/12/2021

Transformer in Transformer

Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang

Keywords: transformers, vision

11:24

14/06/2020

RoutedFusion: Learning Real-Time Depth Map Fusion

Silvan Weder, Johannes Schönberger, Marc Pollefeys, Martin R. Oswald

Keywords: depth map fusion, online 3d reconstruction, deep learning, real-time applications, 3d geometry

5:00

07/09/2020

Align-and-Attend Network for Globally and Locally Coherent Video Inpainting

Sanghyun Woo, Dahun Kim, KwanYong Park, Joon-Young Lee, In So Kweon

Keywords: Video Inpainting, Video Processing, Spatio-Temporal Alignment, Spatio-Temporal Non-local Attention

5:17

03/05/2021

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Shengyu Zhao, Jonathan Cui, Yilun Sheng, Yue Dong, Xiao Liang, Eric Chang, Yan Xu

Keywords: co-modulation, image completion, generative adversarial networks

10:10

06/12/2021

Glance-and-Gaze Vision Transformer

Qihang Yu, Yingda Xia, Yutong Bai, Yongyi Lu, Alan Yuille, Wei Shen

Keywords: deep learning, machine learning, transformers, vision

13:20

02/02/2021

ASHF-Net: Adaptive Sampling and Hierarchical Folding Network for Robust Point Cloud Completion

Daoming Zong, Shiliang Sun, Jing Zhao

16:49

06/12/2021

Long-Short Transformer: Efficient Transformers for Language and Vision

Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro

Keywords: machine learning, transformers

11:44

22/11/2021

Multi-Modality Task Cascade for 3D Object Detection

Jinhyung Park, Xinshuo Weng, Yunze Man, Kris Kitani

Keywords: Multi Modality Learning, Object Detection, Semantic Segmentation

3:03

06/12/2021

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up

Yifan Jiang, Shiyu Chang, Zhangyang Wang

Keywords: machine learning, transformers, vision, generative model

3:44

06/12/2021

Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition

Yulin Wang, Rui Huang, Shiji Song, Zeyi Huang, Gao Huang

Keywords: transformers

7:20

06/12/2021

TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification

Shengcai Liao, Ling Shao

Keywords: machine learning, transformers, vision, domain adaptation, representation learning

10:43

07/09/2020

Making a Case for 3D Convolutions for Object Segmentation in Videos

Sabarinath Mahadevan, Ali Athar, Aljosa Osep, Laura Leal-Taixé, Bastian Leibe, Sebastian Hennen

Keywords: object tracking, video segmentation, video object segmentation, video scene understanding, object segmentation

8:16

22/11/2021

IB-MVS: An Iterative Algorithm for Deep Multi-View Stereo based on Binary Decisions

Christian Sormann, Mattia Rossi, Andreas Kuhn, Friedrich Fraundorfer

Keywords: multi-view stereo, mvs, iterative algorithm, binary decisions, deep multi-view stereo, deep mvs, depth estimation, 3d reconstruction

3:01

06/12/2021

Efficient Training of Visual Transformers with Small Datasets

Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco Nadai

Keywords: robustness, transformers, vision

8:23

14/06/2020

MaskFlownet: Asymmetric Feature Matching With Learnable Occlusion Mask

Shengyu Zhao, Yilun Sheng, Yue Dong, Eric I-Chao Chang, Yan Xu

Keywords: optical flow, occlusion, mask, asymmetricity, feature matching, warping

5:00

14/06/2020

Plug-and-Play Algorithms for Large-Scale Snapshot Compressive Imaging

Xin Yuan, Yang Liu, Jinli Suo, Qionghai Dai

Keywords: snapshot compressive image, plug-and-play, large-scale, video compressive sensing, convergence, coded aperture compressive temporal imaging (cacti), gap, admm, real data

5:01

14/06/2020

HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection

Maosheng Ye, Shuangjie Xu, Tongyi Cao

Keywords: hybrid voxel network, hybrid voxel feature encoding, 3d object detection, autonomous driving, lidar based methods, hybrid scales voxelization, attentive voxel feature encoding, feature fusion pyramid network

1:00

06/12/2021

MLP-Mixer: An all-MLP Architecture for Vision

Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Keywords: deep learning, machine learning, transformers, vision, transfer learning

11:18

03/05/2021

CT-Net: Channel Tensorization Network for Video Classification

Kunchang Li, Xianhang Li, Yali Wang, Jun Wang, Yu Qiao

Keywords: 3D Convolution, Video Classification, Channel Tensorization

4:59

06/12/2021

Global Filter Networks for Image Classification

Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie Zhou

Keywords: machine learning, robustness, transformers, vision

9:28

19/08/2021

Local Representation is Not Enough: Soft Point-Wise Transformer for Descriptor and Detector of Local Features

Zihao Wang, Xueyi Li, Zhen Li

Keywords: Computer Vision, 2D and 3D Computer Vision, Recognition

14:56

02/02/2021

Explicitly Modeled Attention Maps for Image Classification

Andong Tan, Duc Tam Nguyen, Maximilian Dax, Matthias Nießner, Thomas Brox

16:59

22/11/2021

Feature Fusion Vision Transformer for Fine-Grained Visual Categorization

Jun Wang, Xiaohan Yu, Yongsheng Gao

Keywords: Fine-grained visual categorization, Vision transformer, Self-attention, Feature Fusion

3:02

14/06/2020

Learning Depth-Guided Convolutions for Monocular 3D Object Detection

Mingyu Ding, Yuqi Huo, Hongwei Yi, Zhe Wang, Jianping Shi, Zhiwu Lu, Ping Luo

Keywords: monocular, 3d object detection, depth-guided, dynamic local convolution

1:01

22/11/2021

SwinFGHash: Fine-grained Image Retrieval via Transformer-based Hashing Network

Di Lu, Jinpeng Wang, Ziyun Zeng, Bin Chen, Shudeng Wu, Shu-Tao Xia

Keywords: Image Retrieval, Deep Hashing, Fine-grained, Transformer

2:57

14/06/2020

What You See is What You Get: Exploiting Visibility for 3D Object Detection

Peiyun Hu, Jason Ziglar, David Held, Deva Ramanan

Keywords: freespace reasoning, 3d object detection, lidar processing, autonomous driving

5:01

06/12/2021

Associating Objects with Transformers for Video Object Segmentation

Zongxin Yang, Yunchao Wei, Yi Yang

Keywords: transformers

12:29

06/12/2021

NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction

Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, Wenping Wang

Keywords: optimization, robustness

11:21

05/01/2021

The Devil Is in the Boundary: Exploiting Boundary Representation for Basis-Based Instance Segmentation

Myungchul Kim, Sanghyun Woo, Dahun Kim, In So Kweon

4:47

22/11/2021

PS-Transformer: Learning Sparse Photometric Stereo Network using Self-Attention Mechanism

Satoshi Ikehata

Keywords: photometric stereo, transformer

2:56