Abstract:
This paper describes a statistical music structure analysis method that splits an audio signal of popular music into musically meaningful sections at the beat level and classifies them into predefined categories such as intro, verse, and chorus; beat times are assumed to be estimated in advance. A basic approach to this task is to train a recurrent neural network (e.g., a long short-term memory (LSTM) network) that directly predicts section labels from acoustic features. This approach, however, suffers from frequent musically unnatural label switching because the homogeneity, repetitiveness, and duration regularity of musical sections are hard to represent explicitly in the network architecture. To solve this problem, we formulate a unified hidden semi-Markov model (HSMM) that represents the generative process of homogeneous mel-frequency cepstral coefficients, repetitive chroma features, and mel spectra from section labels, where the emission probabilities of mel spectra are computed from the posterior probabilities of section labels predicted by an LSTM. Given these acoustic features, the most likely label sequence can be estimated with Viterbi decoding. Experimental results show that the proposed LSTM-HSMM hybrid model outperforms a conventional HSMM.
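To make the decoding step concrete, the following is a minimal sketch (not the authors' implementation) of Viterbi decoding over beat-level section labels, using hypothetical LSTM posterior probabilities as emission scores. For simplicity it uses a plain HMM with sticky self-transitions, whereas the proposed model is an HSMM that additionally models section durations and combines MFCC, chroma, and mel-spectrum emissions.

```python
import numpy as np

def viterbi(log_emission, log_transition, log_initial):
    """Return the most likely label sequence given per-beat log emission
    scores (T x K), a log transition matrix (K x K), and log initial
    probabilities (K,). Standard HMM Viterbi; the paper's HSMM also
    models section durations, which this sketch omits."""
    T, K = log_emission.shape
    delta = np.full((T, K), -np.inf)
    backptr = np.zeros((T, K), dtype=int)
    delta[0] = log_initial + log_emission[0]
    for t in range(1, T):
        # scores[i, j] = best score ending in label i at t-1, then moving to j.
        scores = delta[t - 1][:, None] + log_transition
        backptr[t] = np.argmax(scores, axis=0)
        delta[t] = scores[backptr[t], np.arange(K)] + log_emission[t]
    # Backtrace the best path.
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path

# Hypothetical example: 4 section labels over 32 beats, with emission
# scores taken from (randomly generated stand-in) LSTM posteriors.
rng = np.random.default_rng(0)
lstm_posteriors = rng.dirichlet(np.ones(4), size=32)               # (32, 4)
log_emission = np.log(lstm_posteriors)
log_transition = np.log(np.full((4, 4), 0.02) + 0.92 * np.eye(4))  # sticky transitions
log_initial = np.log(np.full(4, 0.25))
labels = viterbi(log_emission, log_transition, log_initial)
print(labels)
```

The sticky transition matrix discourages the frequent label switching mentioned above; the HSMM achieves a stronger effect by modeling section durations explicitly rather than through self-transition probabilities.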