07/09/2020

Mid-level Fusion for End-to-End Temporal Activity Detection in Untrimmed Video

Md Atiqur Rahman, Robert Laganiere

Keywords: temporal activity detection, action detection, untrimmed video processing, single-stage detection

Abstract: In this paper, we address the problem of human activity detection in temporally untrimmed long video sequences, where the goal is to classify and temporally localize each activity instance in the input video. Inspired by the recent success of single-stage object detection methods, we propose an end-to-end trainable framework capable of learning task-specific spatio-temporal features of a video sequence for direct classification and localization of the activities. We further systematically investigate how and where to fuse multi-stream feature representations of a video and propose a new fusion strategy for temporal activity detection. Together with the proposed fusion strategy, the novel architecture sets a new state of the art on the highly challenging THUMOS'14 benchmark -- up from 44.2% to 53.9% mAP (an absolute 9.7% improvement).
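The abstract describes a single-stage temporal detector that fuses multi-stream video features at a mid-level of the network rather than at the input or at the final scores. Below is a minimal PyTorch sketch of that general idea, assuming pre-extracted RGB and optical-flow snippet features; the module names, channel sizes, concatenation-based fusion, and anchor-style heads are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch of mid-level two-stream fusion feeding a single-stage
# temporal detection head. All names, channel sizes, and the concatenation
# fusion are assumptions for illustration, not the paper's exact design.
import torch
import torch.nn as nn


class MidLevelFusionDetector(nn.Module):
    def __init__(self, in_channels=1024, num_classes=20, num_anchors=5):
        super().__init__()
        # Separate temporal-conv stems for the RGB and optical-flow streams
        # (snippet-level features are assumed to be pre-extracted).
        self.rgb_stem = nn.Sequential(
            nn.Conv1d(in_channels, 512, kernel_size=3, padding=1), nn.ReLU())
        self.flow_stem = nn.Sequential(
            nn.Conv1d(in_channels, 512, kernel_size=3, padding=1), nn.ReLU())
        # Mid-level fusion: concatenate the two streams after their stems,
        # then continue with shared temporal convolutions.
        self.fused = nn.Sequential(
            nn.Conv1d(1024, 512, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, stride=2, padding=1), nn.ReLU())
        # Single-stage heads: per-location class scores and (center, length)
        # offsets for a set of default temporal anchors.
        self.cls_head = nn.Conv1d(512, num_anchors * (num_classes + 1), 3, padding=1)
        self.reg_head = nn.Conv1d(512, num_anchors * 2, 3, padding=1)

    def forward(self, rgb_feats, flow_feats):
        # rgb_feats, flow_feats: (batch, in_channels, temporal_length)
        x = torch.cat([self.rgb_stem(rgb_feats), self.flow_stem(flow_feats)], dim=1)
        x = self.fused(x)
        return self.cls_head(x), self.reg_head(x)


if __name__ == "__main__":
    model = MidLevelFusionDetector()
    rgb = torch.randn(2, 1024, 256)   # 256 snippets of RGB features
    flow = torch.randn(2, 1024, 256)  # 256 snippets of flow features
    cls_logits, reg_offsets = model(rgb, flow)
    print(cls_logits.shape, reg_offsets.shape)
```

Fusing after each stream has its own shallow stem, but before the shared detection layers, is what distinguishes mid-level fusion from early fusion (concatenating raw inputs) and late fusion (averaging per-stream detection scores).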
