Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

Abstract: Binaural audio provides human listeners with an immersive spatial sound experience, but most existing videos lack binaural audio recordings. We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio. Whereas existing approaches leverage visual features extracted directly from video frames, our approach explicitly disentangles the geometric cues present in the visual stream to guide the learning process. In particular, we develop a multi-task framework that learns geometry-aware features for binaural audio generation by accounting for the underlying room impulse response, the visual stream's coherence with the positions of the sound source(s), and the geometric consistency of the sounding objects over time. Furthermore, we introduce a new large video dataset with realistic binaural audio simulated for real-world scanned environments. On two datasets, we demonstrate the efficacy of our method, which achieves state-of-the-art results.

Rishabh Garg, Ruohan Gao, Kristen Grauman

Similar Papers

Binaural Audio-Visual Localization

Xinyi Wu, Zhenyao Wu, Lili Ju, Song Wang

Self-supervised classification for detecting anomalous sounds

Ritwik Giri, Srikanth V. Tenneti, Fangzhou Cheng, Karim Helwani, Umut Isik, Arvindh Krishnaswamy

Telling Left From Right: Learning Spatial Correspondence of Sight and Sound

Karren Yang, Bryan Russell, Justin Salamon

audio-visual learning in video, self-supervision, video dataset, spatial audio, localization, spatialization, upmixing, source separation

Deep 3D Capture: Geometry and Reflectance From Sparse Multi-View Images

Sai Bi, Zexiang Xu, Kalyan Sunkavalli, David Kriegman, Ravi Ramamoorthi

appearance acquisition, 3d reconstruction, multi-view stereo

Learning to Set Waypoints for Audio-Visual Navigation

Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman

visual navigation, audio visual learning, embodied vision

3D Photography Using Context-Aware Layered Depth Inpainting

Meng-Li Shih, Shih-Yang Su, Johannes Kopf, Jia-Bin Huang

computational photography, novel view synthesis

Making a Case for 3D Convolutions for Object Segmentation in Videos

Sabarinath Mahadevan, Ali Athar, Aljosa Osep, Laura Leal-Taixé, Bastian Leibe, Sebastian Hennen

object tracking, video segmentation, video object segmentation, video scene understanding, object segmentation

NeRF-VAE: A Geometry Aware 3D Scene Generative Model

Adam Kosiorek, Heiko Strathmann, Daniel Zoran, Pol Moreno, Rosalia Schneider, Sona Mokra, Danilo J. Rezende

Deep Learning, Generative Models

DSGN: Deep Stereo Geometry Network for 3D Object Detection

Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia

3d object detection, autonomous vehicle, stereo matching, depth estimation, kitti, 3d perception, lidar sensor, stereo camera, point cloud

WaveletStereo: Learning Wavelet Coefficients of Disparity Map in Stereo Matching

Menglong Yang, Fangrui Wu, Wei Li

stereo matching, wavelet coefficients, inverse wavelet transform, supervised learning, deep representation, multi-scale features, multi-resolution cost volume, wavelet regression, disparity reconstruction, disparity refinement

Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views

Nanbo Li, Cian Eastwood, Robert Fisher

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

machine learning, self-supervised learning, transformers, vision, contrastive learning

OccuSeg: Occupancy-Aware 3D Instance Segmentation

Lei Han, Tian Zheng, Lan Xu, Lu Fang

instance segmentation, multi-task learning, occupancy

FlowVOS: Weakly-Supervised Visual Warping for Detail-Preserving and Temporally Consistent Single-Shot Video Object Segmentation

Julia Gong, F. Christopher Holsinger, Serena Yeung

video object segmentation, single shot video object segmentation, segmentation, object tracking, optical flow, motion tracking, visual warping, weak supervision, video analysis, object segmentation

Compressed Video Contrastive Learning

Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

self-supervised learning, contrastive learning, representation learning

Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation

Yan-Bo Lin, Yu-Chiang Frank Wang

ViewSynth: Learning Local Features from Depth using View Synthesis

Jisan Mahmud, Rajat Vikram Singh, Peri Akiva, Spondon Kundu, Kuan-Chuan Peng, Jan-Michael Frahm

viewpoint invariant representation learning, depth representation learning, view synthesis, correspondence learning

PS-Transformer: Learning Sparse Photometric Stereo Network using Self-Attention Mechanism

Satoshi Ikehata

photometric stereo, transformer

Semantically-Guided Representation Learning for Self-Supervised Monocular Depth

Vitor Guizilini, Rui Hou, Jie Li, Rares Ambrus, Adrien Gaidon

computer vision, machine learning, deep learning, monocular depth estimation, self-supervised learning

Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan Asari

3d human pose, attention mechanism, multi-scale dilation convolution, monocular motion reconstruction
