Searching for efficient network architectures for acoustic scene classification

Abstract: Acoustic scene classification (ASC) is the task of classifying a recorded audio signal into one of a set of predefined acoustic environment classes. While previous studies have reported ASC systems with high accuracy, their computational cost and system complexity may not be optimal for practical mobile applications. Inspired by the success of neural architecture search (NAS) and the efficacy of MobileNets in vision applications, we propose a simple yet effective random search policy to obtain high-accuracy ASC models under a strict model size constraint. The search policy allows automatic discovery of the best trade-off between model depth and width, and the evaluation results of the randomly sampled architectures can be used for statistical analysis of model design. To enable fast search, the search space is limited to several predefined efficient convolutional modules based on depth-wise convolution and the swish activation function. Experimental results show that the CNN model found by this search policy achieves higher accuracy than an AlexNet-like CNN benchmark.
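
To make the idea of random search under a model size constraint concrete, the following is a minimal sketch and not the authors' implementation. It assumes an architecture is simply a list of channel widths, one per depth-wise separable block, and that a sampled architecture is rejected when its weight count exceeds the budget. The names `sample_architecture`, `count_params`, `random_search`, the width choices, the maximum depth, and the caller-supplied `evaluate` function are all hypothetical; in practice `evaluate` would train the model and return its validation accuracy.

```python
import math
import random


def swish(x):
    """Swish activation, x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def separable_block_params(c_in, c_out, kernel=3):
    """Weights in a k x k depth-wise conv plus a 1x1 point-wise conv (biases ignored)."""
    return kernel * kernel * c_in + c_in * c_out


def count_params(widths, in_channels=1):
    """Total weights of a stack of separable blocks with the given output widths."""
    total, c_in = 0, in_channels
    for c_out in widths:
        total += separable_block_params(c_in, c_out)
        c_in = c_out
    return total


def sample_architecture(rng, max_depth=8, width_choices=(16, 32, 64, 128)):
    """Draw a random depth, then a random width for each block (assumed search space)."""
    depth = rng.randint(1, max_depth)
    return [rng.choice(width_choices) for _ in range(depth)]


def random_search(evaluate, param_budget, n_samples=100, seed=0):
    """Sample architectures, keep those within the budget, and score each one.

    `evaluate` is a hypothetical callback mapping a width list to a score.
    Returns the best (score, widths) pair and the full list of scored samples.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        widths = sample_architecture(rng)
        if count_params(widths) > param_budget:
            continue  # violates the model size constraint
        results.append((evaluate(widths), widths))
    best = max(results, key=lambda r: r[0]) if results else None
    return best, results
```

Because every in-budget sample is kept in `results`, the same list could also feed the kind of statistical analysis of depth and width that the abstract mentions.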

Yuzhong Wu, Tan Lee
