02/02/2021

Enhancing Audio-Visual Association with Self-Supervised Curriculum Learning

Jingran Zhang, Xing Xu, Fumin Shen, Huimin Lu, Xin Liu, Heng Tao Shen

Keywords:

Abstract: The recent success of audio-visual representations learning can be largely attributed to their pervasive concurrency property, which can be used as a self-supervision signal and extract correlation information. While most recent works focus on capturing the shared associations between the audio and visual modalities, they rarely consider multiple audio and video pairs at once and pay little attention to exploiting the valuable information within each modality. To tackle this problem, we propose a novel audio-visual representation learning method dubbed self-supervised curriculum learning (SSCL) under the teacher-student learning manner. Specifically, taking advantage of contrastive learning, a two-stage scheme is exploited, which transfers the cross-modal information between teacher and student model as a phased process. The proposed SSCL approach regards the pervasive property of audiovisual concurrency as latent supervision and mutually distills the structure knowledge of visual to audio data. Notably, the SSCL method can learn discriminative audio and visual representations for various downstream applications. Extensive experiments conducted on both action video recognition and audio sound recognition tasks show the remarkably improved performance of the SSCL method compared with the state-of-the-art self-supervised audio-visual representation learning methods.

The video of this talk cannot be embedded. You can watch it here:
https://slideslive.com/38948716
(Link will open in new window)
 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at AAAI 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd

Similar Papers