02/02/2021

Audio-Visual Localization by Synthetic Acoustic Image Generation

Valentina Sanguineti, Pietro Morerio, Alessio Del Bue, Vittorio Murino

Keywords:

Abstract: Acoustic images constitute an emergent data modality for multimodal scene understanding. Such images have the peculiarity to distinguish the spectral signature of sounds coming from different directions in space, thus providing richer information than the one derived from mono and binaural microphones. However, acoustic images are typically generated by cumbersome microphone arrays, which are not as widespread as ordinary microphones mounted on optical cameras. To exploit this empowered modality while using standard microphones and cameras we propose to leverage the generation of synthetic acoustic images from common audio-video data for the task of audio-visual localization. The generation of synthetic acoustic images is obtained by a novel deep architecture, based on Variational Autoencoder and U-Net models, which is trained to reconstruct the ground truth spatialized audio data collected by a microphone array, from the associated video and its corresponding monaural audio signal. Namely, the model learns how to mimic what an array of microphones can produce in the same conditions. We assess the quality of the generated synthetic acoustic images on the task of unsupervised sound source localization in a qualitative and quantitative manner, while also considering standard generation metrics. Our model is evaluated by considering both multimodal datasets containing acoustic images, used for the training, and unseen datasets containing just monaural audio signals and RGB frames, showing to reach more accurate localization results as compared to the state of the art.

The video of this talk cannot be embedded. You can watch it here:
https://slideslive.com/38948807
(Link will open in new window)
 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at AAAI 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers