02/11/2020

Audio tag representation guided dual attention network for acoustic scene classification

Ju-Ho Kim, Jee-Weon Jung, Hye-Jin Shim, Ha-Jin Yu

Keywords:

Abstract: Sound events are crucial for discerning a specific acoustic scene, which establishes a close relationship between audio tagging and acoustic scene classification (ASC). In this study, we explore the role and application of sound events in the ASC task and propose using the output of the last hidden layer of an audio tagging system (the tag representation), rather than its final output (the tag vector), for ASC. We hypothesize that the tag representation contains sound event information that can improve the classification accuracy of acoustic scenes. A dual attention mechanism is investigated to adequately emphasize the frequency-time and channel dimensions of the feature map of an ASC system using the tag representation. Experiments are conducted on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 task 1-a dataset. The proposed system achieves an overall classification accuracy of 69.3%, compared to 65.3% for the baseline.
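
To make the mechanism concrete, below is a minimal PyTorch sketch of a dual attention block conditioned on a tag representation. The module name, layer sizes, and the concatenation-based fusion are illustrative assumptions, not the paper's exact design; it only shows the general idea of channel attention plus frequency-time attention guided by sound-event information from a tagging system.

import torch
import torch.nn as nn

class TagGuidedDualAttention(nn.Module):
    """Sketch of a dual attention block guided by a tag representation.

    Channel attention re-weights the C feature-map channels; frequency-time
    attention re-weights the F x T positions. Both are conditioned on the
    tag representation (last hidden layer of an audio tagging system).
    All layer sizes and the fusion scheme are illustrative assumptions.
    """

    def __init__(self, channels: int, tag_dim: int, reduction: int = 8):
        super().__init__()
        # Channel attention: pooled feature map + tag representation -> C weights
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels + tag_dim, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Frequency-time attention: tag representation is broadcast to every
        # (f, t) position and fused with the feature map by a 1x1 convolution
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(channels + tag_dim, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor, tag_rep: torch.Tensor) -> torch.Tensor:
        # x: (B, C, F, T) ASC feature map; tag_rep: (B, D) tag representation
        b, c, f, t = x.shape

        # Channel attention: global average pool, then an MLP over the
        # pooled features concatenated with the tag representation
        pooled = x.mean(dim=(2, 3))                                   # (B, C)
        ch_w = self.channel_mlp(torch.cat([pooled, tag_rep], dim=1))  # (B, C)
        x = x * ch_w.view(b, c, 1, 1)

        # Frequency-time attention: one weight per (f, t) position
        tag_map = tag_rep.view(b, -1, 1, 1).expand(b, tag_rep.size(1), f, t)
        sp_w = self.spatial_conv(torch.cat([x, tag_map], dim=1))      # (B, 1, F, T)
        return x * sp_w


if __name__ == "__main__":
    # Toy shapes: a 64-channel feature map over 40 mel bins x 100 frames,
    # with a 128-dimensional tag representation (both sizes hypothetical)
    block = TagGuidedDualAttention(channels=64, tag_dim=128)
    feats = torch.randn(2, 64, 40, 100)
    tags = torch.randn(2, 128)
    print(block(feats, tags).shape)  # torch.Size([2, 64, 40, 100])

Concatenating the tag representation before each attention branch is just one plausible fusion choice; the key point the sketch illustrates is that both the channel weights and the frequency-time weights depend on sound-event information rather than on the feature map alone.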

Talk and paper published at the DCASE 2020 virtual conference.
