Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification

14/06/2020

Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen

Keywords: multi-granularity attention, video person re-identification, attentive feature aggregation, reference-aided attention, feature relations

Abstract: Video-based person re-identification (reID) aims at matching the same person across video clips. It is a challenging task due to redundancy among frames, newly revealed appearance, occlusion, and motion blur. In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA), to delicately aggregate spatio-temporal features into a discriminative video-level feature representation. To determine the contribution/importance of a spatial-temporal feature node, we propose to learn the attention from a global view with convolutional operations. Specifically, we stack its relations, i.e., pairwise correlations with respect to a representative set of reference feature nodes (S-RFNs) that represents global video information, together with the feature itself to infer the attention. Moreover, to exploit the semantics of different levels, we propose to learn multi-granularity attentions based on the relations captured at different granularities. Extensive ablation studies demonstrate the effectiveness of our attentive feature aggregation module MG-RAFA. Our framework achieves state-of-the-art performance on three benchmark datasets.
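
The attention step described in the abstract (stacking each node's relations to a set of reference nodes with the node's own feature, then inferring a weight) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. It assumes the reference nodes are simply the temporal average at each spatial position, uses scaled dot products as the pairwise correlations, replaces the convolutional attention layers with a single hypothetical linear map `w`, and covers one granularity only. The function name `reference_aided_attention_pool` is also hypothetical.

```python
import numpy as np


def reference_aided_attention_pool(feats, w, b=0.0):
    """Aggregate (T, H, W, C) feature nodes into one C-dimensional video feature.

    w is a hypothetical attention weight vector of length C + H*W,
    standing in for the learned convolutional layers.
    """
    T, H, W, C = feats.shape
    nodes = feats.reshape(T * H * W, C)                    # (N, C) feature nodes
    # Assumption: reference nodes are the temporal mean at each spatial position.
    refs = feats.mean(axis=0).reshape(H * W, C)            # (K, C)
    # Relations: scaled dot-product correlation of every node with every reference.
    relations = nodes @ refs.T / np.sqrt(C)                # (N, K)
    # Stack the relations together with the feature itself.
    stacked = np.concatenate([nodes, relations], axis=1)   # (N, C + K)
    # Infer one attention value per node with a linear map and a sigmoid.
    attn = 1.0 / (1.0 + np.exp(-(stacked @ w + b)))        # (N,)
    # Attention-weighted average of the nodes gives the video-level feature.
    video_feat = (attn[:, None] * nodes).sum(axis=0) / attn.sum()
    return video_feat, attn.reshape(T, H, W)


rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 2, 3, 8))    # 4 frames, 2x3 spatial grid, 8 channels
w = 0.1 * rng.standard_normal(8 + 2 * 3)     # random stand-in for learned weights
video_feat, attn = reference_aided_attention_pool(feats, w)
print(video_feat.shape, attn.shape)
```

With `w` set to zeros, every node receives the same attention and the result reduces to plain average pooling, which is a convenient sanity check for the sketch.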

The talk and the respective paper were published at the CVPR 2020 virtual conference.

Similar Papers

14/06/2020

Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer

Yan Lu, Yue Wu, Bin Liu, Tianzhu Zhang, Baopu Li, Qi Chu, Nenghai Yu

Keywords: person re-identification, cross modality

0:56

22/11/2021

GTA: Global Temporal Attention for Video Action Understanding

Bo He, Xitong Yang, Zuxuan Wu, Hao Chen, Ser-Nam Lim, Abhinav Shrivastava

Keywords: action recognition, self-attention, temporal modeling

2:55

06/12/2021

End-to-end Multi-modal Video Temporal Grounding

Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Keywords: self-supervised learning, transformers, vision, contrastive learning

8:46

04/07/2020

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

Hyounghun Kim, Zineng Tang, Mohit Bansal

Keywords: Dense-Caption Matching, Temporal VideoQA, answering questions, frame problem

10:56

06/12/2020

CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection

Qijian Zhang, Runmin Cong, Junhui Hou, Chongyi Li, Yao Zhao

Keywords: Theory -> Learning Theory

3:14

14/06/2020

Relation-Aware Global Attention for Person Re-Identification

Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Xin Jin, Zhibo Chen

Keywords: relation-aware global attention, attention mechanism, person re-identification, feature relations, global structural information

1:01

05/01/2021

Alleviating Over-Segmentation Errors by Detecting Action Boundaries

Yuchi Ishikawa, Seito Kasai, Yoshimitsu Aoki, Hirokatsu Kataoka

4:48

06/12/2020

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning

Simon Ging, Mohammadreza Zolfaghari, Hamed Pirsiavash, Thomas Brox

3:16

14/06/2020

ActBERT: Learning Global-Local Video-Text Representations

Linchao Zhu, Yi Yang

Keywords: actbert, cross-modal pretraining, video and language, transformer, tangled transformer, instructional videos

4:58

14/06/2020

Video Instance Segmentation Tracking With a Modified VAE Architecture

Chung-Ching Lin, Ying Hung, Rogerio Feris, Linglin He

Keywords: video instance segmentation, video object tracking, variational autoencoder, vae, gaussian process, multi-task learning

1:01

14/06/2020

Image Search With Text Feedback by Visiolinguistic Attention Learning

Yanbei Chen, Shaogang Gong, Loris Bazzani

Keywords: vision and language, image search, text feedback, attention mechanism, transformer, multimodal learning, representation learning, composition, image retrieval, interactive image search

1:00

06/12/2021

Temporal-attentive Covariance Pooling Networks for Video Recognition

Zilin Gao, Qilong Wang, Bingbing Zhang, Qinghua Hu, Peihua Li

8:13

05/01/2021

Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition

Zachary Wharton, Ardhendu Behera, Yonghuai Liu, Nik Bessis

5:30

22/11/2021

CTRN: Class-Temporal Relational Network for Action Detection

Rui Dai, Srijan Das, Francois Bremond

Keywords: action detection, graph reasoning, graph convolutional network, temporal modelling, multi-label classification

7:02

30/11/2020

Transforming Multi-Concept Attention into Video Summarization

Yen-Ting Liu, Yu-Jhe Li, Yu-Chiang Frank Wang

7:07

14/06/2020

Learning Selective Self-Mutual Attention for RGB-D Saliency Detection

Nian Liu, Ni Zhang, Junwei Han

Keywords: rgb-d saliency detection, middle fusion, self-attention, mutual-attention, non-local network, two-stream cnn

1:01

05/01/2021

PDAN: Pyramid Dilated Attention Network for Action Detection

Rui Dai, Srijan Das, Luca Minciullo, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

5:00

25/07/2020

3D self-attention for unsupervised video quantization

Jingkuan Song, Ruimin Lang, Xiaosu Zhu, Xing Xu, Lianli Gao, Heng Tao Shen

Keywords: quantization, video retrieval, ann search

9:44

02/02/2021

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi

16:24

30/11/2020

Mask-Ranking Network for Semi-Supervised Video Object Segmentation

Wenjing Li, Xiang Zhang, Yujie Hu, Yingqi Tang

5:36

07/09/2020

MagnifierNet: Towards Semantic Adversary and Fusion for Person Re-identification

Yushi Lan, Yuan Liu, Xinchi Zhou, Tian Maoqing, Xuesen Zhang, Shuai Yi, Hongsheng Li

Keywords: person re-identification, adversarial samples, metric learning, multi-task learning, image retrieval

5:58

06/12/2021

Look at What I’m Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

Reuben Tan, Bryan Plummer, Kate Saenko, Hailin Jin, Bryan Russell

Keywords: optimization

12:28

30/11/2020

Play Fair: Frame Contributions in Video Models

Will Price, Dima Damen

9:03

02/02/2021

Proposal-Free Video Grounding with Contextual Pyramid Network

Kun Li, Dan Guo, Meng Wang

14:19

14/06/2020

TEA: Temporal Excitation and Aggregation for Action Recognition

Yan Li, Bin Ji, Xintian Shi, Jianguo Zhang, Bin Kang, Limin Wang

Keywords: action recognition, temporal modeling, motion encoding, temporal aggregation

1:01

02/02/2021

Semantic Grouping Network for Video Captioning

Hobin Ryu, Sunghun Kang, Haeyong Kang, Chang D. Yoo

17:41

02/02/2021

Learning Visual Context for Group Activity Recognition

Hangjie Yuan, Dong Ni

16:54

19/08/2021

Step-Wise Hierarchical Alignment Network for Image-Text Matching

Zhong Ji, Kexin Chen, Haoran Wang

Keywords: Computer Vision, Language and Vision

6:07

02/02/2021

Arbitrary Video Style Transfer via Multi-Channel Correlation

Yingying Deng, Fan Tang, Weiming Dong, Haibin Huang, Chongyang Ma, Changsheng Xu

14:55

05/01/2021

Cross-Domain Latent Modulation for Variational Transfer Learning

Jinyong Hou, Jeremiah D. Deng, Stephen Cranefield, Xuejie Ding

4:52

06/12/2021

SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition

Rishabh Kabra, Daniel Zoran, Goker Erdogan, Loic Matthey, Antonia Creswell, Matt Botvinick, Alexander Lerchner, Chris Burgess

Keywords: self-supervised learning

14:42

22/11/2021

Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition

Aishah Alsehaim, Toby P Breckon

Keywords: person Re-ID

2:57

14/06/2020

Video Super-Resolution With Temporal Group Attention

Takashi Isobe, Songjiang Li, Xu Jia, Shanxin Yuan, Gregory Slabaugh, Chunjing Xu, Ya-Li Li, Shengjin Wang, Qi Tian

Keywords: video processing, video super-resolution

1:00

19/08/2021

Dig into Multi-modal Cues for Video Retrieval with Hierarchical Alignment

Wenzhe Wang, Mengdan Zhang, Runnan Chen, Guanyu Cai, Penghao Zhou, Pai Peng, Xiaowei Guo, Jian Wu, Xing Sun

Keywords: Computer Vision, Language and Vision, Deep Learning

9:07

22/11/2021

Hierarchical Interaction Network for Video Object Segmentation from Referring Expressions

Zhao Yang, Yansong Tang, Luca Bertinetto, Hengshuang Zhao, Philip Torr

Keywords: segmentation, video object segmentation, referring segmentation, referring video object segmentation, video object segmentation from referring expressions, referring image segmentation, referring image comprehension, optical flow, visual grounding

2:57

22/11/2021

Paying Attention to Varying Receptive Fields: Object Detection with Atrous Filters and Vision Transformers

Arthur Jian Shun Lam, Jun Yi Lim, Ricky Sutopo, Vishnu Monn Baskaran

Keywords: object detection, atrous convolution, vision transformers, attention mechanism

3:01

03/05/2021

Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization

Juntae Lee, Mihir Jain, Hyoungwoo Park, Sungrack Yun

Keywords: Action localization, Multimodal Attention, Audio-Visual, Weak-supervision, Event localization

5:11

14/06/2020

Syntax-Aware Action Targeting for Video Captioning

Qi Zheng, Chaoyue Wang, Dacheng Tao

Keywords: video and language, video captioning, action predicting

1:01

06/12/2021

Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das

Keywords: domain adaptation, contrastive learning

13:20

22/11/2021

Space-Time Memory Network for Sounding Object Localization in Videos

Sizhe Li, Yapeng Tian, Chenliang Xu

Keywords: Sounding object Localization, Space-Time Memory Network, Audio-Visual

2:57