11/10/2020

Zero-shot Singing Voice Conversion

Shahan Nercessian

Keywords: MIR tasks, Music synthesis and transformation, Domain knowledge, Machine learning/Artificial intelligence for music, Musical features and properties, Timbre, instrumentation, and voice

Abstract: In this paper, we propose using speaker embedding networks to perform zero-shot singing voice conversion, and suggest two architectures for its realization. Speaker embedding networks not only make it possible to adapt to new voices on the fly, but also allow models to be trained on unlabeled data. Training on unlabeled data in turn makes suitable singing voice data easier to collect, and allows networks to be pretrained on large speech corpora before being refined on singing voice datasets, which improves network generalization. We illustrate the effectiveness of the proposed zero-shot singing voice conversion algorithms through both qualitative and quantitative evaluation.

The talk and the corresponding paper were published at the ISMIR 2020 virtual conference.
