22/11/2021

Faster-FCoViAR: Faster Frequency-Domain Compressed Video Action Recognition

Lu Xiong, Xia Jia, Yue Ming, Jiangwan Zhou, Fan Feng, Nan nan Hu

Keywords: action recognition, frequency-domain, compressed videos, teacher-student network

Abstract: Human action recognition (HAR) is an essential task in computer vision, but it still faces a critical challenge: reducing the data redundancy of decompressed video frames while extracting discriminative information. To address this challenge, we propose a novel faster frequency-domain compressed video action recognition framework (termed Faster-FCoViAR), which consists of a frequency-domain partial decompression method (FPDec), a frequency-domain channel selection strategy (FCS), and a spatial-to-frequency domain teacher-student network (S2FNet). FPDec obtains the frequency-domain DCT coefficients of compressed videos directly, skipping the inverse discrete cosine transform (IDCT) that full decompression requires. FCS down-samples the frequency-domain data to enhance the saliency of the input. S2FNet transfers spatial semantic knowledge from a spatial teacher network to a lightweight student network in the frequency domain, thereby improving the spatial feature extraction ability of the frequency-domain network. Experiments on the UCF-101, HMDB-51, and Kinetics-400 datasets show that Faster-FCoViAR is 12.3 times faster than frame-based methods and 6.7 times faster than other compressed-domain methods, while achieving recognition accuracy competitive with state-of-the-art action recognition methods.
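To make the idea of frequency-domain channels concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes 8x8 blocks and an orthonormal DCT-II, and it computes the coefficients from pixels purely for illustration, whereas FPDec is described as reading them from the compressed stream. The abstract does not say how FCS chooses channels, so the low-frequency-first rule in `select_low_frequency` is an assumption, and all function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale
    return m


def block_dct_channels(plane: np.ndarray, block: int = 8) -> np.ndarray:
    """Turn an (H, W) plane into (block*block, H/block, W/block) channels.

    Channel c holds the DCT coefficient with flat index c from every block,
    so each channel is a down-sampled map for a single spatial frequency.
    """
    h, w = plane.shape
    assert h % block == 0 and w % block == 0, "plane must tile into blocks"
    d = dct_matrix(block)
    # (H, W) -> (H/b, W/b, b, b): one b x b pixel block per grid cell.
    blocks = plane.reshape(h // block, block, w // block, block)
    blocks = blocks.transpose(0, 2, 1, 3)
    # 2-D DCT of every block; matmul broadcasts over the grid dimensions.
    coeffs = d @ blocks @ d.T
    # (H/b, W/b, b, b) -> (b*b, H/b, W/b)
    flat = coeffs.reshape(h // block, w // block, block * block)
    return flat.transpose(2, 0, 1)


def select_low_frequency(channels: np.ndarray, keep: int,
                         block: int = 8) -> np.ndarray:
    """Keep the `keep` lowest-frequency channels (assumed selection rule).

    Channels are ranked by u + v, the sum of their vertical and horizontal
    frequency indices; ties keep their original order.
    """
    u, v = np.divmod(np.arange(block * block), block)
    order = np.argsort(u + v, kind="stable")
    return channels[order[:keep]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    luma = rng.random((224, 224))             # stand-in for one luma plane
    channels = block_dct_channels(luma)       # shape (64, 28, 28)
    selected = select_low_frequency(channels, keep=16)
    print(channels.shape, selected.shape)
```

Each 8x8 block contributes one value per channel, so the spatial resolution drops by a factor of 8 in each direction while the channel count grows to 64. Selecting a subset of channels then shrinks the network input further.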

The talk and the respective paper are published at the BMVC 2021 virtual conference.
