Abstract:
In this paper, we investigate the feasibility of training low-complexity convolutional neural networks directly from waveforms. While the vast majority of proposed approaches perform fixed feature extraction based on time-frequency representations such as spectrograms, we propose to fully exploit the information in waveforms directly while minimizing the model size. To do so, we train one-dimensional Convolutional Neural Networks (1D-CNNs) on raw, subsampled binaural audio waveforms, thus exploiting phase information within and across the two input channels. In addition, our approach relies heavily on data augmentation in the temporal domain. Finally, we apply iterative structured parameter pruning to remove the least important convolutional kernels, and perform weight quantization in half-precision floating point. We apply this approach to the TAU Urban Acoustic Scenes 2020 3Class dataset with two network architectures: a 1D-CNN based on VGG-like blocks, and a ResNet architecture with 1D convolutions, and compare our results with the baseline model from the DCASE 2020 challenge, task 1 subtask B. We report four models that constitute our submission to this challenge. Our results show that we can train, prune, and quantize a small VGG model to make it 20 times smaller than the 500 KB challenge limit while maintaining baseline-level accuracy (87.6%), as well as a larger model achieving 91% accuracy while being 8 times smaller than the challenge limit. ResNets could likewise be trained, pruned, and quantized to fit below the 500 KB limit, achieving up to 91.2% accuracy. We also report the stability of these results with respect to data augmentation and monaural versus binaural inputs.