Abstract:
Sound events are crucial for discerning a specific acoustic scene, which establishes a close relationship between audio tagging and acoustic scene classification (ASC). In this study, we explore the role and application of sound events in the ASC task and propose using the last hidden layer's output of an audio tagging system (the tag representation), rather than the final output layer's predictions (the tag vector), for ASC. We hypothesize that the tag representation contains sound event information that can improve the classification accuracy of acoustic scenes. A dual attention mechanism is investigated to emphasize the frequency-time and channel dimensions of the feature map of an ASC system using the tag representation. Experiments are conducted on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Task 1-A dataset. The proposed system achieves an overall classification accuracy of 69.3%, compared with 65.3% for the baseline.
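To make the described mechanism concrete, the following is a minimal sketch, not the authors' exact architecture, of dual attention conditioned on a tag representation: a channel gate derived from the tag representation followed by a frequency-time gate over the feature map. All dimensions, layer choices, and the module name are illustrative assumptions.

```python
# Hypothetical sketch of tag-conditioned dual attention (assumed shapes and layers).
import torch
import torch.nn as nn

class TagConditionedDualAttention(nn.Module):
    def __init__(self, channels: int, tag_dim: int):
        super().__init__()
        # Channel attention: map the tag representation to a per-channel gate.
        self.channel_gate = nn.Sequential(
            nn.Linear(tag_dim, channels),
            nn.Sigmoid(),
        )
        # Frequency-time attention: 1x1 conv produces a gate per (freq, time) bin.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor, tag_repr: torch.Tensor) -> torch.Tensor:
        # feat: (batch, channels, freq, time); tag_repr: (batch, tag_dim)
        ch = self.channel_gate(tag_repr).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        feat = feat * ch                   # emphasize channel dimension
        sp = self.spatial_gate(feat)       # (B, 1, F, T)
        return feat * sp                   # emphasize frequency-time dimensions

# Usage with assumed shapes: 64-channel feature map, 527-dim tag representation
# (e.g., the last hidden layer of an audio tagger; both values are assumptions).
feat = torch.randn(4, 64, 128, 100)       # (batch, channels, mel bins, frames)
tag_repr = torch.randn(4, 527)
attn = TagConditionedDualAttention(channels=64, tag_dim=527)
out = attn(feat, tag_repr)                # same shape as feat
```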