Memory Enhanced Global-Local Aggregation for Video Object Detection

14/06/2020

Memory Enhanced Global-Local Aggregation for Video Object Detection

Yihong Chen, Yue Cao, Han Hu, Liwei Wang

Keywords: video object detection, video analysis, object detection, memory, global-local aggregation

Abstract Paper Similar Papers

Abstract: How do humans recognize an object in a piece of video? Due to the deteriorated quality of single frame, it may be hard for people to identify an occluded object in this frame by just utilizing information within one image. We argue that there are two important cues for humans to recognize objects in videos: the global semantic information and the local localization information. Recently, plenty of methods adopt the self-attention mechanisms to enhance the features in key frame with either global semantic information or local localization information. In this paper we introduce memory enhanced global-local aggregation (MEGA) network, which is among the first trials that takes full consideration of both global and local information. Furthermore, empowered by a novel and carefully-designed Long Range Memory (LRM) module, our proposed MEGA could enable the key frame to get access to much more content than any previous methods. Enhanced by these two sources of information, our method achieves state-of-the-art performance on ImageNet VID dataset. Code is available at https://github.com/Scalsol/mega.pytorch.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at CVPR 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

02/02/2021

CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Yang Fu, Linjie Yang, Ding Liu and
Thomas S. Huang, Humphrey Shi

Keywords Paper

0

0

0

0

16:24

05/01/2021

PDAN: Pyramid Dilated Attention Network for Action Detection

Rui Dai, Srijan Das, Luca Minciullo and
Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond

Keywords Paper

0

0

0

0

5:00

02/02/2021

F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation

Daizong Liu, Dongdong Yu, Changhu Wang, Pan Zhou

Keywords Paper

0

0

0

0

16:59

30/11/2020

Mask-Ranking Network for Semi-Supervised Video Object Segmentation

Wenjing Li, Xiang Zhang, Yujie Hu, Yingqi Tang

Keywords Paper

0

0

0

0

5:36

19/08/2021

Dig into Multi-modal Cues for Video Retrieval with Hierarchical Alignment

Wenzhe Wang, Mengdan Zhang, Runnan Chen and
Guanyu Cai, Penghao Zhou, Pai Peng, Xiaowei Guo, Jian Wu, Xing Sun

Keywords Paper

Computer Vision, Language and Vision, Deep Learning

0

0

0

0

9:07

22/11/2021

Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips

Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato

Keywords Paper

Egocentric action recognition, Action recognition, Temporal attention

0

0

0

0

3:01

02/02/2021

MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection

Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

Keywords Paper

0

0

0

0

16:48

19/08/2021

Detecting Deepfake Videos with Temporal Dropout 3DCNN

Daichi Zhang, Chenyu Li, Fanzhao Lin and
Dan Zeng, Shiming Ge

Keywords Paper

Computer Vision, Biometrics, Face and Gesture Recognition, Fairness, Surveillance, Manipulation of People

0

0

0

0

8:30

22/11/2021

Temporal Meta-Adaptor for Video Object Detection

Chi Wang, Yang Hua, ZHENG LU and
Jian Gao, Neil Robertson

Keywords Paper

video object detection, temporal aggregation, meta-learning, ImageNet VID

0

0

0

0

6:58

14/06/2020

AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation

Hyeongmin Lee, Taeoh Kim, Tae-young Chung and
Daehyun Pak, Yuseok Ban, Sangyoun Lee

Keywords Paper

video frame interpolation, video temporal super-resolution, frame rate up conversion, frame synthesis, motion estimation, motion compensation, frame warping

0

0

0

0

1:01

06/12/2021

Compressed Video Contrastive Learning

Yuqi Huo, Mingyu Ding, Haoyu Lu and
Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

Keywords Paper

self-supervised learning, contrastive learning, representation learning

0

0

0

0

9:07

04/07/2020

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

Jie Lei, Liwei Wang, Yelong Shen and
Dong Yu, Tamara Berg, Mohit Bansal

Keywords Paper

Coherent Captioning, Generating descriptions, captioning tasks, coherent generation

0

0

0

0

10:51

22/11/2021

Knowing What, Where and When to Look: Video Action modelling with Attention

Juan-Manuel Perez-Rua, Brais Martinez, Xiatian Zhu and
Antoine S Toisoul, Victor A Escorcia, Tao Xiang

Keywords Paper

Action recognition, Fine-grained action, video attention, Spatial attention, Channel attention, Temporal attention, Spatio-temporal attention, Feature refinement

0

0

0

0

2:46

22/11/2021

Deep Video Inpainting Detection

Peng Zhou, Ning Yu, Zuxuan Wu and
Larry Davis, Abhinav Shrivastava, Ser-Nam Lim

Keywords Paper

Video Inpainting Detection, Manipulation Detection, DeepFake Detection

0

0

0

0

3:01

07/09/2020

Align-and-Attend Network for Globally and Locally Coherent Video Inpainting

Sanghyun Woo, Dahun Kim, KwanYong Park and
Joon-Young Lee, In So Kweon

Keywords Paper

Video Inpainting, Video Processing, Spatio-Temporal Alignment, Spatio-Temporal Non-local Attention

0

0

0

0

5:17

22/11/2021

Local-Global Associative Frame Assemble in Video Re-ID

Qilei Li, Jiabo Huang, Shaogang Gong

Keywords Paper

video Re-ID, local aligned quality, global appearance correlations, associative frame assemble

0

0

0

0

2:24

02/02/2021

SMART Frame Selection for Action Recognition

Shreyank N Gowda, Marcus Rohrbach, Laura Sevilla-Lara

Keywords Paper

0

0

0

0

14:10

14/06/2020

Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

Ruixu Liu, Ju Shen, He Wang and
Chen Chen, Sen-ching Cheung, Vijayan Asari

Keywords Paper

3d human pose, attention mechanism, multi-scale dilation convolution, monocular motion reconstruction

0

0

0

0

5:01

05/01/2021

Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition

Zachary Wharton, Ardhendu Behera, Yonghuai Liu, Nik Bessis

Keywords Paper

0

0

0

0

5:30

14/06/2020

Listen to Look: Action Recognition by Previewing Audio

Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, Lorenzo Torresani

Keywords Paper

action recognition, audio-visual learning, multi-modal learning, cross-modal learning, video understanding

0

0

0

0

1:01

06/12/2021

Deep Contextual Video Compression

Jiahao Li, Bin Li, Yan Lu

Keywords Paper

0

0

0

0

6:33

06/12/2021

Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Aadarsh Sahoo, Rutav Shah, Rameswar Panda and
Kate Saenko, Abir Das

Keywords Paper

domain adaptation, contrastive learning

0

0

0

0

13:20

03/05/2021

Self-Supervised Learning of Compressed Video Representations

Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song

Keywords Paper

self-supervised learning, Compressed videos

0

0

0

0

4:34

14/06/2020

DAVD-Net: Deep Audio-Aided Video Decompression of Talking Heads

Xi Zhang, Xiaolin Wu, Xinliang Zhai and
Xianye Ben, Chengjie Tu

Keywords Paper

talking heads, video restoration, audio-video correlations, multimodal fusion, encoder information, constraining projection

0

0

0

0

4:57

22/11/2021

Hierarchical Interaction Network for Video Object Segmentation from Referring Expressions

Zhao Yang, Yansong Tang, Luca Bertinetto and
Hengshuang Zhao, Philip Torr

Keywords Paper

segmentation, video object segmentation, referring segmentation, referring video object segmentation, video object segmentation from referring expressions, referring image segmentation, referring image comprehension, optical flow, visual grounding

0

0

0

0

2:57

05/01/2021

Temporal Context Aggregation for Video Retrieval With Contrastive Learning

Jie Shao, Xin Wen, Bingchen Zhao, Xiangyang Xue

Keywords Paper

0

0

0

0

4:50

02/02/2021

C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer

Dongxu Wei, Xiaowei Xu, Haibin Shen, Kejie Huang

Keywords Paper

0

0

0

0

19:42

06/12/2021

Temporal-attentive Covariance Pooling Networks for Video Recognition

Zilin Gao, Qilong Wang, Bingbing Zhang and
Qinghua Hu, Peihua Li

Keywords Paper

0

0

0

1

8:13

02/02/2021

Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation

Fanchao Lin, Hongtao Xie, Yan Li, Yongdong Zhang

Keywords Paper

0

0

0

0

14:19

14/06/2020

Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation

Gedas Bertasius, Lorenzo Torresani

Keywords Paper

instance segmentation, object detection, object tracking, video analysis.

0

0

0

0

4:59

02/02/2021

Translate the Facial Regions You Like Using Self-Adaptive Region Translation

Wenshuang Liu, Wenting Chen, Zhanjia Yang, Linlin Shen

Keywords Paper

0

0

0

0

14:19

14/06/2020

Spatial-Temporal Graph Convolutional Network for Video-Based Person Re-Identification

Jinrui Yang, Wei-Shi Zheng, Qize Yang and
Ying-Cong Chen, Qi Tian

Keywords Paper

computer vision, person re-identification, graph convolutional networks

0

0

0

0

1:01

22/11/2021

Spatial-Temporal Residual Aggregation for High Resolution Video Inpainting

Vishnu Sanjay Ramiya Srinivasan, Rui Ma, Qiang Tang and
Zili Yi, Zhan Xu

Keywords Paper

high resolution video inpainting, spatial-temporal aggregation, residual aggregation, spatial-temporal attention, image alignment

0

0

0

0

2:58

14/06/2020

SmallBigNet: Integrating Core and Contextual Views for Video Classification

Xianhang Li, Yali Wang, Zhipeng Zhou, Yu Qiao

Keywords Paper

video classification, action recognition, temporal convolution, 3d maxpooling, shared convolution

0

0

0

0

1:00

14/06/2020

Learning Video Object Segmentation From Unlabeled Videos

Xiankai Lu, Wenguan Wang, Jianbing Shen and
Yu-Wing Tai, David J. Crandall, Steven C. H. Hoi

Keywords Paper

unsupervised/weakly supervised vos, four granularity, video pattern learning

0

0

0

0

1:01

02/02/2021

Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation

Daizong Liu, Shuangjie Xu, Xiao-Yang Liu and
Zichuan Xu, Wei Wei, Pan Zhou

Keywords Paper

0

0

0

0

14:42

05/01/2021

DynaVSR: Dynamic Adaptive Blind Video Super-Resolution

Suyoung Lee, Myungsub Choi, Kyoung Mu Lee

Keywords Paper

0

0

0

0

4:56

25/07/2020

3D self-attention for unsupervised video quantization

Jingkuan Song, Ruimin Lang, Xiaosu Zhu and
Xing Xu, Lianli Gao, Heng Tao Shen

Keywords Paper

quantization, video retrieval, ann search

0

0

0

0

9:44

30/11/2020

MLIFeat: Multi-level information fusion based deep local features

Yuyang Zhang Institute of Automation, Chinese Academy of Sciences, University of Chinese Academy of Sciences and
Jinge Wang, Shibiao Xu, Xiao Liu, Xiaopeng Zhang

Keywords Paper

0

0

0

0

5:28

22/11/2021

GTA: Global Temporal Attention for Video Action Understanding

Bo He, Xitong Yang, Zuxuan Wu and
Hao Chen, Ser-Nam Lim, Abhinav Shrivastava

Keywords Paper

action recognition, self-attention, temporal modeling

0

0

0

0

2:55