Abstract:
Representation learning using self-supervised classification has recently been shown to give state-of-the-art accuracy for anomaly detection on computer vision datasets. These recent works use geometric transformations of images, such as rotations, translations, and flips, to create auxiliary classification tasks for feature learning. This paper introduces a new self-supervised classification framework for anomaly detection in audio signals. Classification tasks are set up based on differences in the metadata associated with the audio files. Synthetic augmentations, such as linearly combining and warping audio spectrograms, are also used to increase the complexity of the classification task so that finer features are learned. The proposed approach is validated on the publicly available dataset of DCASE 2020 Challenge Task 2, <i>Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring</i>. We demonstrate the effectiveness of our approach by comparing it against the baseline autoencoder model, showing an improvement of over 12.5% in the average AUC metrics.
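As an illustration of the "linearly combining" augmentation mentioned above, the following is a minimal sketch rather than the method used in the paper. It assumes a mixup-style scheme in which two equally sized spectrograms are blended with a weight `lam` and their class labels are blended in the same proportion; the abstract does not specify how labels are assigned to combined samples, so the soft-label choice, the function name `mix_spectrograms`, and all parameters are hypothetical.

```python
def mix_spectrograms(spec_a, spec_b, label_a, label_b, num_classes, lam):
    """Blend two spectrograms (lists of rows) and build a matching soft label.

    Assumption: labels are integer class indices (for example, derived from
    file metadata) and the combined sample gets a mixup-style soft label.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    same_shape = len(spec_a) == len(spec_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(spec_a, spec_b)
    )
    if not same_shape:
        raise ValueError("spectrograms must have the same shape")

    # Element-wise convex combination of the two spectrograms.
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(spec_a, spec_b)
    ]

    # Soft label: weight lam on the first class, 1 - lam on the second.
    soft_label = [0.0] * num_classes
    soft_label[label_a] += lam
    soft_label[label_b] += 1.0 - lam
    return mixed, soft_label


# Toy 2x2 "spectrograms" standing in for real time-frequency arrays.
spec_a = [[1.0, 2.0], [3.0, 4.0]]
spec_b = [[3.0, 2.0], [1.0, 0.0]]
mixed, soft = mix_spectrograms(spec_a, spec_b, label_a=0, label_b=2,
                               num_classes=3, lam=0.5)
```

In practice the spectrograms would be arrays from an audio front end and `lam` would typically be sampled at random per pair; plain lists are used here only to keep the sketch dependency-free.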