22/11/2021

Back to the Future: Cycle Encoding Prediction for Self-supervised Video Representation Learning

Xinyu Yang, Majid Mirmehdi, Tilo Burghardt

Keywords: unsupervised learning, self-supervised learning, video self-supervised learning, contrastive learning, representation learning, cycle consistency, temporal prediction, action recognition

Abstract: In this paper, we show that learning video feature spaces in which temporal cycles are maximally predictable benefits action classification. In particular, we propose a novel learning approach, Cycle Encoding Prediction (CEP), that is able to effectively represent the high-level spatio-temporal structure of unlabelled video content. CEP builds a latent space wherein the concept of closed forward-backward, as well as backward-forward, temporal loops is approximately preserved. As a self-supervision signal, CEP leverages the bi-directional temporal coherence of entire video snippets and applies loss functions that encourage both temporal cycle closure and contrastive feature separation. The underpinning network architecture utilises a single feature encoder for all input videos, augmented by two predictive modules that learn temporal forward and backward transitions. We apply our framework for pretext training of networks for action recognition tasks and report significantly improved results on the standard datasets UCF101 and HMDB51.
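
To make the abstract's training scheme concrete, below is a minimal PyTorch sketch of one CEP-style training step, based only on the description above. The predictor design (small MLPs), the feature dimension, the InfoNCE formulation, and the loss weighting `lam` are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CEP(nn.Module):
    """Sketch of the CEP setup: one shared encoder, two predictive modules."""

    def __init__(self, encoder, feat_dim=512):
        super().__init__()
        self.encoder = encoder  # single feature encoder shared by all snippets
        # two predictive modules for temporal forward / backward transitions
        self.forward_pred = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        self.backward_pred = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))


def info_nce(query, keys, temperature=0.1):
    """Contrastive loss: each query should match its own key (the diagonal)."""
    query = F.normalize(query, dim=1)
    keys = F.normalize(keys, dim=1)
    logits = query @ keys.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(query.size(0), device=query.device)
    return F.cross_entropy(logits, targets)


def cep_step(model, clip_t, clip_t1, lam=1.0):
    """One training step on batches of temporally ordered snippets (t, t+1)."""
    z_t = model.encoder(clip_t)    # features of snippets at time t
    z_t1 = model.encoder(clip_t1)  # features of snippets at time t+1

    # one-step transitions in feature space
    z_t1_hat = model.forward_pred(z_t)    # predict t -> t+1
    z_t_hat = model.backward_pred(z_t1)   # predict t+1 -> t

    # closed temporal cycles: forward-backward and backward-forward loops
    # should return approximately to the starting feature
    cycle_fb = model.backward_pred(z_t1_hat)  # t -> t+1 -> t
    cycle_bf = model.forward_pred(z_t_hat)    # t+1 -> t -> t+1

    # cycle closure enforced contrastively: the cycled feature must be
    # closest to its own starting feature among all snippets in the batch
    loss_cycle = info_nce(cycle_fb, z_t) + info_nce(cycle_bf, z_t1)
    # one-step predictions also trained with contrastive feature separation
    loss_pred = info_nce(z_t1_hat, z_t1) + info_nce(z_t_hat, z_t)

    return loss_pred + lam * loss_cycle
```

Framing cycle closure as a contrastive objective rather than a plain regression keeps the encoder from collapsing to trivially constant features, since each cycled feature must be distinguished from every other snippet in the batch.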

The talk and the accompanying paper were published at the BMVC 2021 virtual conference.
