Visually Guided Sound Source Separation using Cascaded Opponent Filter Network

30/11/2020

Lingyu Zhu, Esa Rahtu

Abstract: The objective of this paper is to recover the original component signals from an audio mixture with the aid of visual cues from the sound sources. Such a task is usually referred to as visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages that recursively refine the source separation. A key element in COF is a novel opponent filter module that identifies and relocates residual components between sources. The system is guided by the appearance and motion of the sources; for this purpose, we study different representations based on video frames, optical flows, dynamic images, and their combinations. Finally, we propose a Sound Source Location Masking (SSLM) technique which, together with COF, produces a pixel-level mask of the source location. The entire system is trained end-to-end on a large set of unlabelled videos. We compare COF with recent baselines and obtain state-of-the-art performance on three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL).
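The abstract does not describe the architecture in detail. As a rough illustration of two of its ideas, mask-based separation of a mixture spectrogram and a cascade of stages that relocate residual energy between source estimates, here is a toy sketch in Python. Everything in it is an assumption: spectrograms are flattened lists of magnitudes, the per-bin scores are a hypothetical stand-in for the visual (appearance and motion) features, and `opponent_step` is a hand-written rule, not the learned opponent filter module from the paper.

```python
from typing import List, Tuple

# Hypothetical toy model: spectrograms are flattened lists of non-negative
# magnitudes, one float per time-frequency bin.


def ratio_masks(score_a: List[float], score_b: List[float]) -> Tuple[List[float], List[float]]:
    """Initial soft masks from per-bin scores (a stand-in for a visually
    conditioned separation network). The two masks sum to 1 in every bin."""
    mask_a = [a / (a + b) if a + b > 0 else 0.5 for a, b in zip(score_a, score_b)]
    mask_b = [1.0 - m for m in mask_a]
    return mask_a, mask_b


def opponent_step(est_a, est_b, conf_a, conf_b, alpha=0.5):
    """Toy relocation rule (not the paper's learned module): in each bin, move a
    fraction `alpha` of the less confident source's energy to the more confident
    source. Energy is only moved, so est_a[i] + est_b[i] is preserved
    (up to floating-point rounding)."""
    new_a, new_b = [], []
    for ea, eb, ca, cb in zip(est_a, est_b, conf_a, conf_b):
        if ca > cb:
            moved = alpha * eb
            ea, eb = ea + moved, eb - moved
        elif cb > ca:
            moved = alpha * ea
            ea, eb = ea - moved, eb + moved
        new_a.append(ea)
        new_b.append(eb)
    return new_a, new_b


def cascaded_separation(mixture, score_a, score_b, stages=3, alpha=0.5):
    """Stage 0 applies ratio masks; each later stage applies one opponent step."""
    mask_a, mask_b = ratio_masks(score_a, score_b)
    est_a = [m * x for m, x in zip(mask_a, mixture)]
    est_b = [m * x for m, x in zip(mask_b, mixture)]
    for _ in range(stages):
        est_a, est_b = opponent_step(est_a, est_b, score_a, score_b, alpha)
    return est_a, est_b


if __name__ == "__main__":
    mix = [1.0, 2.0, 4.0]
    a, b = cascaded_separation(mix, [3.0, 1.0, 1.0], [1.0, 1.0, 3.0])
    print(a, b)  # [0.96875, 1.0, 0.125] [0.03125, 1.0, 3.875]
```

Because each stage only moves energy between the two estimates, their sum keeps matching the mixture, which loosely mirrors the idea of relocating residual components between sources rather than discarding them. A real system would learn both the masks and the relocation from audio and visual features.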

The video of this talk is available here:

https://accv2020.github.io/miniconf/poster_293.html

The talk and the corresponding paper were published at the ACCV 2020 virtual conference.

Similar Papers

30/11/2020

MIX'EM: Unsupervised Image Classification using a Mixture of Embeddings

Ali Varamesh, Tinne Tuytelaars

6:40

25/07/2020

3D self-attention for unsupervised video quantization

Jingkuan Song, Ruimin Lang, Xiaosu Zhu, Xing Xu, Lianli Gao, Heng Tao Shen

Keywords: quantization, video retrieval, ann search

9:44

26/04/2020

Image-guided Neural Object Rendering

Justus Thies, Michael Zollhöfer, Christian Theobalt, Marc Stamminger, Matthias Nießner

Keywords: Neural Rendering, Neural Image Synthesis

4:41

30/11/2020

Mask-Ranking Network for Semi-Supervised Video Object Segmentation

Wenjing Li, Xiang Zhang, Yujie Hu, Yingqi Tang

5:36

06/12/2021

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction

Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Ce Liu, Deva Ramanan

10:42

06/12/2021

End-to-end Multi-modal Video Temporal Grounding

Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Keywords: self-supervised learning, transformers, vision, contrastive learning

8:46

03/05/2021

Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization

Juntae Lee, Mihir Jain, Hyoungwoo Park, Sungrack Yun

Keywords: Action localization, Multimodal Attention, Audio-Visual, Weak-supervision, Event localization

5:11

08/12/2020

Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks

Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip Yu, Lifang He

9:52

30/11/2020

Robust High Dynamic Range (HDR) Imaging with Complex Motion and Parallax

Zhiyuan Pu, Peiyao Guo, M. Salman Asif, Zhan Ma

7:38

22/11/2021

V3GAN: Decomposing Background, Foreground and Motion for Video Generation

Arti Keshari, Sonam Gupta, Sukhendu Das

Keywords: video generation, unconditional video generation, shuffling loss, feature level masking, unsupervised learning, GAN, foreground, background, motion decomposition

3:02

22/11/2021

Segmenting Invisible Moving Objects

Hala Lamdouar, Weidi Xie, Andrew Zisserman

Keywords: synthetic data generation, motion segmentation, amodal segmentation, video camouflage breaking, self-attention

3:05

14/06/2020

LiDAR-Based Online 3D Video Object Detection With Graph-Based Message Passing and Spatiotemporal Transformer Attention

Junbo Yin, Jianbing Shen, Chenye Guan, Dingfu Zhou, Ruigang Yang

Keywords: 3d object detection, point cloud, video, graph, attention, autonomous driving

1:02

14/06/2020

Deep Adversarial Decomposition: A Unified Framework for Separating Superimposed Images

Zhengxia Zou, Sen Lei, Tianyang Shi, Zhenwei Shi, Jieping Ye

Keywords: superimposed image separation, adversarial training, separation-critic, image deraining, photo reflection removal, image shadow removal

0:59

22/11/2021

A Design of Contractive Appearance Flow for Photometric Stereo

Lixiong Chen, Victor Adrian Prisacariu

Keywords: photometric stereo, reflectance analysis

3:11

18/07/2021

SoundDet: Polyphonic Moving Sound Event Detection and Localization from Raw Waveform

Yuhang He, Niki Trigoni, Andrew Markham

Keywords: Applications, Audio and Speech Processing

4:34

14/06/2020

Telling Left From Right: Learning Spatial Correspondence of Sight and Sound

Karren Yang, Bryan Russell, Justin Salamon

Keywords: audio-visual learning in video, self-supervision, video dataset, spatial audio, localization, spatialization, upmixing, source separation

4:41

06/12/2021

Non-local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation

Jogendra Nath Kundu, Siddharth Seth, Anirudh Jamkhandi, Pradyumna YM, Varun Jampani, Anirban Chakraborty, Venkatesh Babu R

Keywords: vision, domain adaptation

14:55

14/06/2020

Video Instance Segmentation Tracking With a Modified VAE Architecture

Chung-Ching Lin, Ying Hung, Rogerio Feris, Linglin He

Keywords: video instance segmentation, video object tracking, variational autoencoder, vae, gaussian process, multi-task learning

1:01

14/06/2020

Learning Fused Pixel and Feature-Based View Reconstructions for Light Fields

Jinglei Shi, Xiaoran Jiang, Christine Guillemot

Keywords: light field, view synthesis, feature-based reconstruction, pixel-based reconstruction, deep learning, angular super-resolution

4:56

22/11/2021

LARNet: Latent Action Representation for Human Action Synthesis

Naman Biyani, Aayush Jung Bahadur Rana, Shruti Vyas, Yogesh Rawat

Keywords: action synthesis, video synthesis, joint generative model, human action generation, end-to-end learning, conditional video generation

3:02

22/11/2021

Monocular Arbitrary Moving Object Discovery and Segmentation

Michal Neoral, Jan Sochman, Jiri Matas

Keywords: motion segmentation, instance motion segmentation

2:55

14/06/2020

Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer

Yan Lu, Yue Wu, Bin Liu, Tianzhu Zhang, Baopu Li, Qi Chu, Nenghai Yu

Keywords: person re-identification, cross modality

0:56

02/02/2021

Learning Intact Features by Erasing-Inpainting for Few-shot Classification

Junjie Li, Zilei Wang, Xiaoming Hu

15:15

03/05/2021

Fully Unsupervised Diversity Denoising with Convolutional Variational Autoencoders

Mangal Prakash, Alexander Krull, Florian Jug

Keywords: Variational Autoencoders, Noise model, Unsupervised denoising, Diversity denoising

4:56

02/02/2021

Proposal-Free Video Grounding with Contextual Pyramid Network

Kun Li, Dan Guo, Meng Wang

14:19

02/02/2021

Semantic Grouping Network for Video Captioning

Hobin Ryu, Sunghun Kang, Haeyong Kang, Chang D. Yoo

17:41

06/12/2021

TransformerFusion: Monocular RGB Scene Reconstruction using Transformers

Aljaz Bozic, Pablo Palafox, Justus Thies, Angela Dai, Matthias Niessner

Keywords: transformers

7:14

22/11/2021

Deep Video Decaptioning

Pengpeng Chu, Weize Quan, Tong Wang, Pan Wang, Peiran Ren, Dong-Ming Yan

Keywords: video decaptioning, caption mask extraction, frame attention, real time

2:59

22/11/2021

Space-Time Memory Network for Sounding Object Localization in Videos

Sizhe Li, Yapeng Tian, Chenliang Xu

Keywords: Sounding object Localization, Space-Time Memory Network, Audio-Visual

2:57

06/12/2021

SNIPS: Solving Noisy Inverse Problems Stochastically

Bahjat Kawar, Gregory Vaksman, Michael Elad

12:27

22/11/2021

Audio-Visual Synchronisation in the wild

Triantafyllos Afouras, Honglie Chen, Weidi Xie, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Keywords: multimodal learning, self supervision, audio-visual synchronisation, dataset

3:02

14/06/2020

Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline

Yu-Lun Liu, Wei-Sheng Lai, Yu-Sheng Chen, Yi-Lung Kao, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

Keywords: high dynamic range, inverse tone mapping, image sensor, dynamic range, camera response function, quantization, computational photography, deep learning, convolutional neural network, computer vision

1:01

14/06/2020

DeepFaceFlow: In-the-Wild Dense 3D Facial Motion Estimation

Mohammad Rami Koujan, Anastasios Roussos, Stefanos Zafeiriou

Keywords: 3d flow, dense 3d facial motion capture, optical flow, scene flow, 3d reconstruction and tracking, in-the-wild monocular tracking, facial reenactment, expression recognition, performance capture, non-rigid facial deformations

1:01

06/12/2020

Counterfactual Contrastive Learning for Weakly-Supervised Vision-Language Grounding

Zhu Zhang, Zhou Zhao, Zhijie Lin, Jieming Zhu, Xiuqiang He

3:14

14/06/2020

Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification

Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen

Keywords: multi-granularity attention, video person re-identification, attentive feature aggregation, reference-aided attention, feature relations

1:01

02/02/2021

Joint Demosaicking and Denoising in the Wild: The Case of Training Under Ground Truth Uncertainty

Jierun Chen, Song Wen, S.-H. Gary Chan

18:17

07/09/2020

Multimodal Image Translation with Stochastic Style Representations and Mutual Information Loss

Sanghyeon Na, Seungjoo Yoo, Jaegul Choo

Keywords: image-to-image translation, generative adversarial network

9:52

05/01/2021

R-MNet: A Perceptual Adversarial Network for Image Inpainting

Jireh Jam, Connah Kendrick, Vincent Drouard, Kevin Walker, Gee-Sern Hsu, Moi Hoon Yap

5:02

14/06/2020

Nested Scale-Editing for Conditional Image Synthesis

Lingzhi Zhang, Jiancong Wang, Yinshuang Xu, Jie Min, Tarmily Wen, James C. Gee, Jianbo Shi

Keywords: scale editing, identity recovery, image synthesis, super-resolution, image outpainting, text2image, cross-modal translation

1:01

02/02/2021

Self-supervised Pre-training and Contrastive Representation Learning for Multiple-choice Video QA

Seonhoon Kim, Seohyeong Jeong, Eunbyul Kim, Inho Kang, Nojun Kwak

15:23