Acoustic scene classification with spectrogram processing strategies

Abstract: Recently, convolutional neural networks (CNNs) have achieved state-of-the-art performance on the acoustic scene classification (ASC) task. Audio data is often transformed into two-dimensional spectrogram representations, which are then fed to the neural networks. In this paper, we study how to take advantage of different spectrogram representations efficiently through discriminative processing strategies. There are two main contributions. First, we explore the impact of combining multiple spectrogram representations at different stages, which provides a meaningful reference for effective spectrogram fusion. Second, we propose processing strategies over multiple frequency bands and multiple temporal frames to make full use of a single spectrogram representation. The proposed spectrogram processing strategies can be easily transferred to any network structure. The experiments are carried out on the DCASE 2020 Task1 datasets, and the results show that our method achieves accuracies of 81.8% (official baseline: 54.1%) and 92.1% (official baseline: 87.3%) on the officially provided fold 1 evaluation datasets of Task1A and Task1B, respectively.
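
The abstract does not spell out how the frequency-band and temporal-frame strategies are implemented, so the following is only a minimal sketch of one plausible reading: cut a single spectrogram into contiguous frequency bands and into overlapping time segments, each of which could then be passed to its own network branch. The function names, the band count, the segment length, and the hop size are all illustrative assumptions, and random data stands in for a real log-mel spectrogram.

```python
import numpy as np


def split_frequency_bands(spec, n_bands):
    """Split a (freq_bins, frames) spectrogram into contiguous frequency bands.

    Hypothetical helper: the band count and equal-width split are assumptions.
    """
    return np.array_split(spec, n_bands, axis=0)


def split_temporal_frames(spec, segment_len, hop):
    """Cut a (freq_bins, frames) spectrogram into fixed-length time segments.

    Hypothetical helper: segments overlap when hop < segment_len, and any
    trailing frames that do not fill a whole segment are dropped.
    """
    n_frames = spec.shape[1]
    starts = range(0, n_frames - segment_len + 1, hop)
    return [spec[:, s:s + segment_len] for s in starts]


# Stand-in for a log-mel spectrogram: 128 mel bins x 400 frames (assumed sizes).
spec = np.random.default_rng(0).standard_normal((128, 400))

bands = split_frequency_bands(spec, n_bands=4)
segments = split_temporal_frames(spec, segment_len=100, hop=50)

print(len(bands), bands[0].shape)
print(len(segments), segments[0].shape)
```

Each band or segment keeps the layout of the original array, so stacking the bands back along the frequency axis recovers the input spectrogram unchanged.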

18/07/2021

event cameras, guided filtering, event denoising and super-resolution, video frame synthesis, motion deblur, HDR imaging, motion tracking

1:00

06/12/2021

sound propagation, head-related transfer function (HRTF), equalization, wave simulation, virtual acoustics, source directivity, spatial audio, bidirectional impulse response

15:53

02/11/2020

Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li

Samuele Cornell, Michel Olvera, Manuel Pariente, Giovanni Pepe, Emanuele Principi, Leonardo Gabrielli, Stefano Squartini

Adrian Bulat, Jean Kossaifi, Sourav Bhattacharya, Yannis Panagakis, Timothy Hospedales, Georgios Tzimiropoulos, Nicholas Lane, Maja Pantic


tensors, tensor networks, tensor decomposition, randomization, adversarial defence, binary networks, network quantization, tensorization

2:21

06/12/2020

Samuele Cornell, Michel Olvera, Manuel Pariente, Giovanni Pepe, Emanuele Principi, Leonardo Gabrielli, Stefano Squartini


12:30

06/12/2021

MLP-Mixer: An all-MLP Architecture for Vision

Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy