12/07/2020

Non-Autoregressive Neural Text-to-Speech

Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

Keywords: Applications - Language, Speech and Dialog

Abstract: In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram. It is fully convolutional and brings 46.7 times speed-up over its autoregressive counterpart at synthesis, while obtaining reasonably good speech quality. ParaNet also produces stable alignment between text and speech on the challenging test sentences by iteratively improving the attention in a layer-by-layer manner. Furthermore, we build the parallel text-to-speech system by applying various parallel neural vocoders, which can synthesize speech from text through a single feed-forward pass. We also explore a novel approach to train the IAF-based vocoder from scratch, which avoids the need for distillation from a separately trained WaveNet.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at ICML 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers