02/11/2020

Two-stage domain adaptation for sound event detection

Liping Yang, Junyong Hao, Zhenwei Hou, Wang Peng


Abstract: Sound event detection in real-world scenarios is a challenging task. Because of the large distribution mismatch between synthetic and real audio data, the performance of a sound event detection model trained on strongly labeled synthetic data degrades dramatically when it is applied in a real environment. To tackle this issue and improve the robustness of the sound event detection model, we propose a two-stage domain adaptation approach for sound event detection. The backbone convolutional recurrent neural network (CRNN), learned using strongly labeled synthetic data, is updated by weak-label supervised adaptation and frame-level adversarial domain adaptation. As a result, the parameters of the CRNN are updated for real audio data, and the input space distribution mismatch between synthetic and real audio data is mitigated in the feature space of the CRNN. Moreover, a context clip-level consistency regularization between the classification outputs of the CNN and the CRNN is introduced to improve the feature representation ability of the convolutional layers in the CRNN. Experiments on the DCASE 2019 task on sound event detection in domestic environments demonstrate the superiority of the proposed domain adaptation approach. Our approach achieves F1 scores of 48.3% on the validation set and 49.4% on the evaluation set, which are state-of-the-art sound event detection results for a CRNN model without data augmentation.
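The abstract does not give the form of the clip-level consistency regularization, so the following is only a minimal sketch of one way such a term could be computed. It assumes mean pooling over frames to obtain clip-level probabilities and a mean squared difference between the CNN and CRNN predictions; the function names `clip_level_probs` and `consistency_loss` and the example values are hypothetical and not taken from the paper.

```python
def clip_level_probs(frame_probs):
    """Pool frame-level probabilities (frames x classes) into clip-level ones.

    Mean pooling over frames is an assumption made for this sketch.
    """
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    return [
        sum(frame[c] for frame in frame_probs) / n_frames
        for c in range(n_classes)
    ]


def consistency_loss(cnn_clip, crnn_clip):
    """Mean squared difference between two clip-level predictions.

    The squared-error form is assumed; the abstract does not state the loss.
    """
    if len(cnn_clip) != len(crnn_clip):
        raise ValueError("both predictions must cover the same classes")
    return sum((a - b) ** 2 for a, b in zip(cnn_clip, crnn_clip)) / len(cnn_clip)


# Hypothetical frame-level outputs for one clip: 2 frames, 2 event classes.
cnn_frames = [[0.0, 1.0], [1.0, 1.0]]
crnn_frames = [[0.5, 1.0], [0.5, 1.0]]

loss = consistency_loss(clip_level_probs(cnn_frames), clip_level_probs(crnn_frames))
print(loss)
```

In a training setup, a term of this kind would typically be added to the classification loss with a weighting factor, so that the convolutional layers are encouraged to agree with the full CRNN at the clip level.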

The talk and the respective paper were published at the DCASE 2020 virtual conference.

