02/11/2020

On the effectiveness of spatial and multi-channel features for multi-channel polyphonic sound event detection

Thi Ngoc Tho Nguyen, Douglas L. Jones, Woon Seng Gan

Abstract: Multi-channel log-mel spectrograms and spatial features such as generalized cross-correlation with phase transform have been demonstrated to be useful for multi-channel polyphonic sound event detection in static-source cases. These features are often stacked along the channel dimension, similar to the channels of an RGB image, before being passed to a convolutional model so that sound events are detected better in multi-source cases. In this paper, we investigate the use of multi-channel log-mel spectrograms and spatial features for polyphonic sound event detection in both static and dynamic-source cases using the DCASE2019 and DCASE2020 sound event localization and detection datasets. Our experimental results show that multi-channel log-mel spectrograms and spatial features are more useful for static-source cases than for dynamic-source cases. The best use of multi-channel audio inputs for polyphonic sound event detection in both static and dynamic scenarios is to train a model that uses each single-channel log-mel spectrogram separately as an input feature. At inference, the final prediction is the arithmetic mean of the model's output predictions over all input channels.
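
The inference step described in the abstract, averaging the model's predictions over the input channels, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_model`, the array shapes, the class count, and the 0.5 threshold are all hypothetical placeholders, and a trained network would take the place of the toy function.

```python
import numpy as np


def average_channel_predictions(model, features):
    """Run a single-channel model on every channel and average the outputs.

    features: array of shape (n_channels, n_frames, n_mels), one log-mel
              spectrogram per microphone channel (assumed layout).
    model:    callable mapping (n_frames, n_mels) to per-frame class
              probabilities of shape (n_frames, n_classes).
    """
    per_channel = np.stack([model(channel) for channel in features], axis=0)
    # Arithmetic mean over the channel axis gives the final prediction.
    return per_channel.mean(axis=0)


def toy_model(logmel):
    """Hypothetical stand-in for a trained detector: fixed linear layer + sigmoid."""
    rng = np.random.default_rng(0)  # fixed seed so the weights are the same on every call
    weights = rng.standard_normal((logmel.shape[1], 3))  # 3 made-up event classes
    return 1.0 / (1.0 + np.exp(-(logmel @ weights)))


# Synthetic input: 4 channels, 10 frames, 8 mel bands (illustrative sizes only).
features = np.random.default_rng(1).standard_normal((4, 10, 8))
probs = average_channel_predictions(toy_model, features)

# Polyphonic detection: several classes may be active in the same frame.
active = probs > 0.5  # assumed decision threshold
```

Because the same single-channel model is applied to every channel, this scheme does not depend on the number or ordering of microphones, unlike stacking the channels into one multi-channel input.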

The talk and the respective paper are published at the DCASE 2020 virtual conference.
