02/11/2020

Joint training of guided learning and mean teacher models for sound event detection

Hao Yen, Pin-Jui Ku, Ming-Chi Yen, Hung-Shin Lee, Hsin-Min Wang

Abstract: In this paper, we present our system for sound event detection and separation in domestic environments for DCASE 2020. The task aims to determine which sound events appear in a clip and the temporal ranges they occupy. The system is trained on weakly labeled and unlabeled real data together with strongly annotated synthetic data. Our proposed model structure includes a feature-level front-end based on convolutional neural networks (CNNs), followed by both embedding-level and instance-level back-end attention modules. To make full use of the large amount of unlabeled data, we jointly adopt the Guided Learning and Mean Teacher approaches to carry out weakly supervised and semi-supervised learning. In addition, a set of adaptive median windows, one for each sound event class, is used to smooth the frame-level predictions in post-processing. On the public evaluation set of DCASE 2019, the best event-based F1-score achieved by our system is 48.50%, a relative improvement of 27.16% over the official baseline (38.14%). On the development set of DCASE 2020, our best system achieves a relative improvement of 32.91% over the baseline (45.68% vs. 34.37%).
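
The post-processing step mentioned in the abstract, smoothing frame-level predictions with a separate median window per event class, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the window lengths are chosen, so the window sizes, the example predictions, and the helper names `median_smooth` and `smooth_per_class` are all assumptions made for this example.

```python
from statistics import median


def median_smooth(frames, window):
    """Median-filter a binary frame sequence, zero-padding both edges."""
    if window % 2 == 0:
        window += 1  # force an odd window so the median is a single frame value
    half = window // 2
    padded = [0] * half + list(frames) + [0] * half
    return [int(median(padded[i:i + window])) for i in range(len(frames))]


def smooth_per_class(predictions, windows):
    """Apply a class-specific median window to each event's frame decisions."""
    return {
        event: median_smooth(frames, windows[event])
        for event, frames in predictions.items()
    }


# Hypothetical window lengths (in frames): short events get short windows,
# long events get long ones. These values are assumptions, not from the paper.
windows = {"Speech": 3, "Vacuum_cleaner": 7}

# Hypothetical thresholded frame-level predictions for one clip.
predictions = {
    "Speech": [0, 1, 0, 1, 1, 1, 0, 0],
    "Vacuum_cleaner": [1, 1, 1, 0, 1, 1, 1, 1],
}

smoothed = smooth_per_class(predictions, windows)
print(smoothed)
```

In this toy input, the short window removes the isolated one-frame "Speech" detection and fills the one-frame gap next to it, while the longer window fills the one-frame gap in the long "Vacuum_cleaner" event.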

Talk and the respective paper are published at the DCASE 2020 virtual conference.