Abstract:
Source Separation is often used as a pre-processing step in many signal-processing tasks. In this work we propose a novel approach for combined Source Separation and Sound Event Detection in which a Source Separation algorithm is used to enhance the Sound Even -Detection back-end performance. In particular, we present a permutation-invariant training scheme for optimizing the Source Separation system directly with the back-end Sound Event Detection objective without requiring joint training or fine-tuning of the two systems. We show that such an approach has significant advantages over the more standard approach of training the Source Separation system separately using only a Source Separation based objective such as Scale-Invariant Signal-To-Distortion Ratio. On the 2020 Detection and Classification of Acoustic Scenes and Events Task 4 Challenge our proposed approach is able to outperform the baseline source separation system by more than one percent in event-based macro <i>F</i><sub>1</sub> score on the development set with significantly less computational requirements.