Investigating U-nets with Various Intermediate Blocks for Spectrogram-based Singing Voice Separation

Abstract: Singing Voice Separation (SVS) aims to separate the singing voice from a given mixed musical signal. Many U-Net-based models have recently been proposed for the SVS task, but no existing work has evaluated and compared the various types of intermediate blocks that can be used in the U-Net architecture. In this paper, we introduce a variety of intermediate spectrogram transformation blocks. We implement U-Nets based on these blocks and train them on complex-valued spectrograms so that both magnitude and phase are taken into account. The resulting networks are then compared on the signal-to-distortion ratio (SDR) metric. A U-Net built from a particular block composed of convolutional and fully-connected layers achieves state-of-the-art SDR on the MUSDB singing voice separation task, exceeding the previous best result by a large margin of 0.9 dB. Our code and models are available online.
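
For readers unfamiliar with the metric, the following is a minimal sketch of a simplified signal-to-distortion ratio, defined as 10 log10 of the reference energy over the energy of the estimation error. It is not necessarily the evaluation procedure used in the paper, and benchmark results on MUSDB are usually computed with a more elaborate BSS Eval style SDR. The function name sdr_db, the eps parameter, and the toy signals are assumptions for illustration only.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Simplified SDR in dB between a reference signal and its estimate.

    Illustrative only: this is the plain energy ratio, not the BSS Eval
    variant commonly reported for MUSDB.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true source.
    signal_energy = sum(r * r for r in reference)
    # Energy of the difference between the true source and the estimate.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps (an assumed small constant) guards against division by zero.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: a short "vocal" signal and a slightly inaccurate estimate.
reference = [0.0, 1.0, 0.0, -1.0]
estimate = [0.0, 0.9, 0.1, -1.0]
score = sdr_db(reference, estimate)
print(f"SDR: {score:.2f} dB")
```

In this toy case the reference energy is 2 and the error energy is 0.02, so the ratio is about 100 and the score is about 20 dB. Under this definition, a gain of 0.9 dB corresponds to reducing the error energy by a factor of roughly 1.23 for the same reference.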
