22/11/2021

Audio-Visual Speech Super-Resolution

Rudrabha Mukhopadhyay, Sindhu B Hegde, Vinay Namboodiri, C.V. Jawahar

Keywords: speech super-resolution, audio-visual data, audio-visual learning, pseudo-visual stream, multi-modal learning

Abstract: In this paper, we present an audio-visual model to perform speech super-resolution at large scale-factors (8x and 16x). Previous works attempted to solve this problem using only the audio modality as input, and thus were limited to low scale-factors of 2x and 4x. In contrast, we propose to incorporate both visual and auditory signals to super-resolve speech of sampling rates as low as 1kHz. In such challenging situations, the visual features assist in learning the content, and improves the quality of the generated speech. Further, we demonstrate the applicability of our approach to arbitrary speech signals where the visual stream is not accessible. Our "pseudo-visual network" precisely synthesizes the visual stream solely from the low-resolution speech input. Extensive experiments illustrate our method's remarkable results and benefits over state-of-the-art audio-only speech super-resolution approaches. Our project website can be found at http://cvit.iiit.ac.in/research/projects/cvit-projects/audio-visual-speech-super-resolution.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at BMVC 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers