22/11/2021

Temporal Meta-Adaptor for Video Object Detection

Chi Wang, Yang Hua, Zheng Lu, Jian Gao, Neil Robertson

Keywords: video object detection, temporal aggregation, meta-learning, ImageNet VID

Abstract: Detecting objects in video is difficult under occlusion and motion blur, which easily degrade the extracted features. Recent state-of-the-art methods enhance the features of a key frame with those of reference frames via attention modules. However, this enhancement operates on features extracted by a fixed backbone, and it is fundamentally hard for a fixed backbone to generate discriminative features for frames of both low and high quality. To mitigate this, we present a meta-learning scheme that learns to adapt the backbone using temporal features. Specifically, we summarise the temporal features into a fixed-size representation, which is then used to make the backbone adaptively generate discriminative features for both low- and high-quality frames. We demonstrate that the proposed approach can be easily incorporated into recent temporal aggregation approaches with almost no impact on inference speed. Experiments on the ImageNet VID dataset show consistent gains over state-of-the-art methods.
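The abstract's core idea — summarising reference-frame features into a fixed-size vector and using it to adapt the backbone's output for the key frame — can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the pooling-based `summarise`, the FiLM-style per-channel scale/shift adaptor, and the weight names `W_scale`/`W_shift` are all assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def summarise(ref_feats):
    # ref_feats: (T, C, H, W) features from T reference frames.
    # Collapse time and space into one fixed-size C-dim summary vector
    # (the paper may use a learned summariser; mean-pooling is illustrative).
    return ref_feats.mean(axis=(0, 2, 3))

def adapt(key_feat, summary, W_scale, W_shift):
    # Predict a per-channel scale and shift from the temporal summary and
    # apply them to the key frame's backbone features (FiLM-style modulation;
    # hypothetical stand-in for the paper's meta-adaptor).
    scale = 1.0 + W_scale @ summary          # (C,)
    shift = W_shift @ summary                # (C,)
    return key_feat * scale[:, None, None] + shift[:, None, None]

T, C, H, W = 4, 8, 5, 5
ref_feats = rng.standard_normal((T, C, H, W))   # reference-frame features
key_feat = rng.standard_normal((C, H, W))       # key-frame features
W_scale = 0.01 * rng.standard_normal((C, C))    # toy adaptor weights
W_shift = 0.01 * rng.standard_normal((C, C))

summary = summarise(ref_feats)                  # fixed-size regardless of T
adapted = adapt(key_feat, summary, W_scale, W_shift)
print(summary.shape, adapted.shape)             # (8,) (8, 5, 5)
```

Because the summary has a fixed size regardless of the number of reference frames, the adaptation step adds only a constant per-frame cost, which is consistent with the abstract's claim of almost no impact on inference speed.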

This talk and the corresponding paper were published at the BMVC 2021 virtual conference.
