Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation

02/02/2021


Yan-Bo Lin, Yu-Chiang Frank Wang


Abstract: Humans perceive a rich auditory experience because each ear hears a distinct sound. Videos recorded with binaural audio, in particular, simulate how humans receive ambient sound. However, a large number of videos have monaural audio only, which degrades the user experience due to the lack of ambient information. To address this issue, we propose an audio spatialization framework that converts a monaural video into a binaural one by exploiting the relationship between its audio and visual components. Because it preserves left-right consistency in both the audio and visual modalities, our learning strategy can be viewed as a self-supervised learning technique, and it alleviates the dependency on a large amount of video data with ground-truth binaural audio during training. Experiments on benchmark datasets confirm the effectiveness of the proposed framework in both semi-supervised and fully supervised scenarios, and ablation studies and visualizations further support the use of our model for audio spatialization.
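The left-right consistency idea in the abstract can be illustrated with a small sketch. This is one plausible reading, not the authors' implementation: it assumes that mirroring a frame horizontally should swap the left and right channels a spatializer predicts, so the mismatch between the two can serve as a training signal that needs no ground-truth binaural audio. All names here (`flip_frame`, `swap_channels`, `consistency_gap`, `toy_predict`, `video_blind_predict`) are hypothetical, and the "model" is a toy panning rule rather than a neural network.

```python
from typing import Callable, List, Tuple

Frame = List[List[float]]            # rows of pixel intensities (toy grayscale frame)
Stereo = List[Tuple[float, float]]   # (left, right) sample pairs
Predictor = Callable[[Frame, List[float]], Stereo]


def flip_frame(frame: Frame) -> Frame:
    """Mirror a frame horizontally."""
    return [list(reversed(row)) for row in frame]


def swap_channels(audio: Stereo) -> Stereo:
    """Exchange the left and right channels."""
    return [(right, left) for left, right in audio]


def consistency_gap(predict: Predictor, frame: Frame, mono: List[float]) -> float:
    """Mean absolute difference between the prediction for the mirrored
    frame and the channel-swapped prediction for the original frame.
    Assumption: a left-right consistent predictor makes this gap zero."""
    if not mono:
        raise ValueError("mono signal must not be empty")
    target = swap_channels(predict(frame, mono))
    mirrored = predict(flip_frame(frame), mono)
    total = sum(abs(a - c) + abs(b - d) for (a, b), (c, d) in zip(mirrored, target))
    return total / (2 * len(target))


def toy_predict(frame: Frame, mono: List[float]) -> Stereo:
    """Toy spatializer: pan the mono signal toward the brighter half of the frame."""
    width = len(frame[0])
    half = width // 2
    left_energy = sum(sum(row[:half]) for row in frame)
    right_energy = sum(sum(row[width - half:]) for row in frame)
    total = left_energy + right_energy
    pan = 0.5 if total == 0 else left_energy / total
    return [(2 * pan * s, 2 * (1 - pan) * s) for s in mono]


def video_blind_predict(frame: Frame, mono: List[float]) -> Stereo:
    """Toy spatializer that ignores the frame and always favors the left channel."""
    return [(s, 0.5 * s) for s in mono]


if __name__ == "__main__":
    frame = [[3.0, 1.0], [3.0, 1.0]]   # brighter on the left
    mono = [1.0, -1.0]
    print(consistency_gap(toy_predict, frame, mono))
    print(consistency_gap(video_blind_predict, frame, mono))
```

In this toy setting the frame-aware predictor yields no gap, while the predictor that ignores the video is penalized, which is the kind of signal a consistency-based objective could exploit on videos that only have monaural audio.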

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38949188


The talk and the corresponding paper were published at the AAAI 2021 virtual conference.


Similar Papers

05/01/2021

Boosting Monocular Depth With Panoptic Segmentation Maps

Faraz Saeedan, Stefan Roth


19/08/2021

Multi-Scale Selective Feedback Network with Dual Loss for Real Image Denoising

Xiaowan Hu, Yuanhao Cai, Zhihong Liu, Haoqian Wang, Yulun Zhang

Keywords: Computer Vision, Computational Photography, Photometry, Shape from X, Deep Learning

06/12/2021

Contrastive Learning of Global and Local Video Representations

Shuang Ma, Zhaoyang Zeng, Daniel McDuff, Yale Song

Keywords: machine learning, self-supervised learning, contrastive learning, representation learning

30/11/2020

Do We Need Sound for Sound Source Localization?

Takashi Oya, Shohei Iwase, Ryota Natsume, Takahiro Itazuri, Shugo Yamaguchi, Shigeo Morishima

02/02/2021

Binaural Audio-Visual Localization

Xinyi Wu, Zhenyao Wu, Lili Ju, Song Wang


13/04/2021

CLAR: Contrastive learning of auditory representations

Haider Al-Tahan, Yalda Mohsenzadeh


06/12/2020

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, Du Tran

Keywords: Applications -> Computer Vision

03/05/2021

You Only Need Adversarial Supervision for Semantic Image Synthesis

Edgar Schoenfeld, Vadim Sushko, Dan Zhang, Juergen Gall, Bernt Schiele, Anna Khoreva

Keywords: GANs, Semantic Image Synthesis, Image Generation, Deep Learning

06/12/2020

Self-Adaptive Training: beyond Empirical Risk Minimization

Lang Huang, Chao Zhang, Hongyang Zhang

Keywords: Deep Learning -> Generative Models, Algorithms -> Semi-Supervised Learning

06/12/2021

Data-Efficient Instance Generation from Instance Discrimination

Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou

Keywords: machine learning, generative model

18/11/2020

Boosting-based reliable model reuse

Yao-Xiang Ding, Zhi-Hua Zhou


03/05/2021

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Dan Ellis, John Hershey

Keywords: self-supervised learning, universal sound separation, in-the-wild data, audio-visual sound separation, unsupervised learning

14/06/2020

Normal Assisted Stereo Depth Estimation

Uday Kusupati, Shuo Cheng, Rui Chen, Hao Su

Keywords: multi view stereo, 3d vision, deep learning, depth estimation, surface normal estimation, cost volume, cost aggregation, auxiliary supervision

06/12/2020

Self-supervised Co-Training for Video Representation Learning

Tengda Han, Weidi Xie, Andrew Zisserman


04/07/2020

Data Manipulation: Towards Effective Instance Learning for Neural Dialogue Generation via Learning to Augment and Reweight

Hengyi Cai, Hongshen Chen, Yonghao Song, Cheng Zhang, Xiaofang Zhao, Dawei Yin

Keywords: Data Manipulation, Neural Generation, learning, dialogue generation

14/06/2020

Telling Left From Right: Learning Spatial Correspondence of Sight and Sound

Karren Yang, Bryan Russell, Justin Salamon

Keywords: audio-visual learning in video, self-supervision, video dataset, spatial audio, localization, spatialization, upmixing, source separation

03/05/2021

Learning with Instance-Dependent Label Noise: A Sample Sieve Approach

Hao Cheng, Zhaowei Zhu, Xingyu Li, Yifei Gong, Xing Sun, Yang Liu

Keywords: deep neural networks, instance-based label noise, learning with noisy labels

06/12/2021

NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

Pranay Manocha, Buye Xu, Anurag Kumar

Keywords: deep learning, robustness, self-supervised learning

02/02/2021

Self-supervised Pre-training and Contrastive Representation Learning for Multiple-choice Video QA

Seonhoon Kim, Seohyeong Jeong, Eunbyul Kim, Inho Kang, Nojun Kwak

06/12/2021

TriBERT: Human-centric Audio-visual Representation Learning

Tanzila Rahman, Mengyu Yang, Leonid Sigal

Keywords: transformers, representation learning

22/11/2021

Fine-grained Multi-Modal Self-Supervised Learning

Duo Wang, Salah Karout

Keywords: self-supervised learning, multi-modal learning

05/01/2021

S-VVAD: Visual Voice Activity Detection by Motion Segmentation

Muhammad Shahid, Cigdem Beyan, Vittorio Murino


02/11/2020

Model selection for deep audio source separation via clustering analysis

Alisa Liu, Prem Seetharaman, Bryan Pardo


03/05/2021

Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization

Juntae Lee, Mihir Jain, Hyoungwoo Park, Sungrack Yun

Keywords: Action localization, Multimodal Attention, Audio-Visual, Weak-supervision, Event localization

02/02/2021

Stereopagnosia: Fooling Stereo Networks with Adversarial Perturbations

Alex Wong, Mukund Mundhra, Stefano Soatto


06/12/2021

VoiceMixer: Adversarial Voice Style Mixup

Sang-Hoon Lee, Ji-Hoon Kim, Hyunseung Chung, Seong-Whan Lee

Keywords: representation learning

14/06/2020

Distortion Agnostic Deep Watermarking

Xiyang Luo, Ruohan Zhan, Huiwen Chang, Feng Yang, Peyman Milanfar

Keywords: watermarking, adversarial training, channel coding, steganography, deep learning

06/12/2021

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing

Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

02/02/2021

Enhanced Audio Tagging via Multi- to Single-Modal Teacher-Student Mutual Learning

Yifang Yin, Harsh Shrivastava, Ying Zhang, Zhenguang Liu, Rajiv Ratn Shah, Roger Zimmermann

06/12/2021

Open-set Label Noise Can Improve Robustness Against Inherent Label Noise

Hongxin Wei, Lue Tao, RENCHUNZI XIE, Bo An

Keywords: deep learning, robustness

22/11/2021

Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

Rishabh Garg, Ruohan Gao, Kristen Grauman

Keywords: Binaural Audio, Audio visual learning

08/12/2020

Multi-task Learning of Spoken Language Understanding by Integrating N-Best Hypotheses with Hierarchical Attention

Mingda Li, Xinyue Liu, Weitong Ruan, Luca Soldaini, Wael Hamza, Chengwei Su

22/11/2021

Alleviating Noisy-label Effects in Image Classification via Probability Transition Matrix

Ziqi Zhang, Yuexiang Li, Hongxin Wei, Kai Ma, Tao Xu, Yefeng Zheng

Keywords: noisy labels, image classification, instance selection, robust learning, inter-class correlation, soft label, medical image

26/04/2020

High Fidelity Speech Synthesis with Adversarial Networks

Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan

Keywords: text-to-speech, speech synthesis, audio synthesis, GANs, generative adversarial networks, implicit generative models

06/12/2020

Noise2Same: Optimizing A Self-Supervised Bound for Image Denoising

Yaochen Xie, Zhengyang Wang, Shuiwang Ji


03/05/2021

Training GANs with Stronger Augmentations via Contrastive Discriminator

Jongheon Jeong, Jinwoo Shin

Keywords: visual representation learning, contrastive learning, unsupervised learning, data augmentation, generative adversarial networks

16/11/2020

Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness

Hyunwoo Kim, Byeongchang Kim, Gunhee Kim

Keywords: training, dialogue agents, generative agent, persona-based agents

07/09/2020

Unsupervised Monocular Depth Estimation with Multi-Baseline Stereo

Saad Imran, Muhammad Umar Karim Khan, Sikander Mukaram, Chong-Min Kyung

Keywords: Unsupervised Monocular Depth, Small-Baseline, Wide-Baseline, Multi-Baseline, Stereo

02/02/2021

Hierarchical Information Passing Based Noise-Tolerant Hybrid Learning for Semi-Supervised Human Parsing

Yunan Liu, Shanshan Zhang, Jian Yang, PongChi Yuen


06/12/2020

Listening to Sounds of Silence for Speech Denoising

Henry Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, Changxi Zheng