Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech

03/05/2021

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech

Yoonhyung Lee, Joongbo Shin, Kyomin Jung

Keywords: VAE, non-autoregressive, speech synthesis, text-to-speech

Abstract Paper Similar Papers

Abstract: Although early text-to-speech (TTS) models such as Tacotron 2 have succeeded in generating human-like speech, their autoregressive architectures have several limitations: (1) They require a lot of time to generate a mel-spectrogram consisting of hundreds of steps. (2) The autoregressive speech generation shows a lack of robustness due to its error propagation property. In this paper, we propose a novel non-autoregressive TTS model called BVAE-TTS, which eliminates the architectural limitations and generates a mel-spectrogram in parallel. BVAE-TTS adopts a bidirectional-inference variational autoencoder (BVAE) that learns hierarchical latent representations using both bottom-up and top-down paths to increase its expressiveness. To apply BVAE to TTS, we design our model to utilize text information via an attention mechanism. By using attention maps that BVAE-TTS generates, we train a duration predictor so that the model uses the predicted duration of each phoneme at inference. In experiments conducted on LJSpeech dataset, we show that our model generates a mel-spectrogram 27 times faster than Tacotron 2 with similar speech quality. Furthermore, our BVAE-TTS outperforms Glow-TTS, which is one of the state-of-the-art non-autoregressive TTS models, in terms of both speech quality and inference speed while having 58% fewer parameters.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at ICLR 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

19/04/2021

WER-BERT: Automatic WER estimation with BERT in a balanced ordinal classification paradigm

Akshay Krishna Sheshadri, Anvesh Rao Vijjini, Sukhdeep Kharbanda

Keywords Paper

0

0

0

0

11:45

18/07/2021

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Jaehyeon Kim, Jungil Kong, Juhee Son

Keywords Paper

Applications, Audio and Speech Processing

0

0

0

0

7:21

06/12/2021

PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Yi Ren, Jinglin Liu, Zhou Zhao

Keywords Paper

generative model

0

0

0

0

10:15

18/07/2021

Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation

Renjie Zheng, Junkun Chen, Mingbo Ma, Liang Huang

Keywords Paper

Applications, Natural Language Processing

0

0

0

0

5:19

06/12/2021

Speech-T: Transducer for Text to Speech and Beyond

Jiawei Chen, Xu Tan, Yichong Leng and
Jin Xu, Guihua Wen, Tao Qin, Tie-Yan Liu

Keywords Paper

transformers

0

0

0

0

8:38

18/07/2021

EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

Chenfeng Miao, Liang Shuang, Zhengchen Liu and
Chen Minchuan, Jun Ma, Shaojun Wang, Jing Xiao

Keywords Paper

Applications, Audio and Speech Processing

0

0

0

0

5:13

02/02/2021

Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis

Sang-Hoon Lee, Hyun-Wook Yoon, Hyeong-Rae Noh and
Ji-Hoon Kim, Seong-Whan Lee

Keywords Paper

0

0

0

0

14:19

08/12/2020

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Keywords Paper

0

0

0

0

14:59

06/12/2021

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Yichong Leng, Xu Tan, Linchen Zhu and
Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li, Edward Lin, Tie-Yan Liu

Keywords Paper

0

0

0

0

13:44

06/12/2021

Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport

Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, Yu Tsao

Keywords Paper

theory, machine learning, adversarial robustness and security, domain adaptation, optimal transport

0

0

0

0

14:40

08/12/2020

Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism

Pan Xie, Zhi Cui, Xiuying Chen and
XiaoHui Hu, Jianwei Cui, Bin Wang

Keywords Paper

0

0

0

0

6:43

08/12/2020

ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation

Dario Stojanovski, Benno Krojer, Denis Peskov, Alexander Fraser

Keywords Paper

0

0

0

0

14:09

26/04/2020

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott and
Arthur Szlam, Marc'Aurelio Ranzato

Keywords Paper

energy-based models, text generation

0

0

0

0

4:59

01/07/2020

End-to-End Speech Translation with Adversarial Training

Xuancai Li, Chen Kehai, Tiejun Zhao, Muyun Yang

Keywords Paper

0

0

0

0

8:53

02/02/2021

Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention

Zhiqi Huang, Fenglin Liu, Xian Wu and
Shen Ge, Helin Wang, Wei Fan, Yuexian Zou

Keywords Paper

0

0

0

0

14:47

02/02/2021

Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation

Qianqian Dong, Rong Ye, Mingxuan Wang and
Hao Zhou, Shuang Xu, Bo Xu, Lei Li

Keywords Paper

0

0

0

0

14:09

04/07/2020

Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting

Po-Yao Huang, Junjie Hu, Xiaojun Chang, Alexander Hauptmann

Keywords Paper

Unsupervised Translation, Unsupervised MT, MT, alignment

0

0

0

0

12:17

05/12/2020

An exploratory study on multilingual quality estimation

Shuo Sun, Marina Fomicheva, Frédéric Blain and
Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzmán, Lucia Specia

Keywords Paper

0

0

0

0

14:31

16/11/2020

Effectively pretraining a speech translation decoder with Machine Translation data

Ashkan Alinejad, Anoop Sarkar

Keywords Paper

automatic task, neural task, speech translation, end-to-end approach

0

0

0

0

6:12

08/12/2020

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

Keywords Paper

0

0

0

0

14:42

18/07/2021

Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang

Keywords Paper

Applications, Audio and Speech Processing

0

0

0

0

5:17

16/11/2020

Sparse Text Generation

Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Keywords Paper

story completion, dialogue generation, text generators, language models

0

0

0

0

11:27

12/07/2020

Non-Autoregressive Neural Text-to-Speech

Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

Keywords Paper

Applications - Language, Speech and Dialog

0

0

0

0

15:12

08/12/2020

Multi-task Learning of Spoken Language Understanding by Integrating N-Best Hypotheses with Hierarchical Attention

Mingda Li, Xinyue Liu, Weitong Ruan and
Luca Soldaini, Wael Hamza, Chengwei Su

Keywords Paper

0

0

0

0

14:43

16/11/2020

Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU

Brielen Madureira, David Schlangen

Keywords Paper

nlp, interactive systems, language encoders, bidirectional lstms

0

0

0

0

10:04

02/02/2021

Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance

Guanhua Chen, Yun Chen, Victor O.K. Li

Keywords Paper

0

0

0

0

15:33

04/07/2020

SimulSpeech: End-to-End Simultaneous Speech to Text Translation

Yi Ren, Jinglin Liu, Xu Tan and
Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

Keywords Paper

simultaneous translation, simultaneous recognition, ASR, NMT

0

0

0

0

5:51

06/12/2021

NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

Pranay Manocha, Buye Xu, Anurag Kumar

Keywords Paper

deep learning, robustness, self-supervised learning

0

0

0

0

14:30

08/12/2020

SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP

Katsuki Chousa, Masaaki Nagata, Masaaki Nishino

Keywords Paper

0

0

0

0

14:39

02/02/2021

Do Response Selection Models Really Know What’s Next? Utterance Manipulation Strategies for Multi-turn Response Selection

Taesun Whang, Dongyub Lee, Dongsuk Oh and
Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee

Keywords Paper

0

0

0

0

17:37

06/12/2020

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Jungil Kong, Jaehyeon Kim, Jaekyoung Bae

Keywords Paper

0

0

0

0

2:54

04/07/2020

Unsupervised Paraphasia Classification in Aphasic Speech

Sharan Pai, Nikhil Sachdeva, Prince Sachdeva, Rajiv Ratn Shah

Keywords Paper

Unsupervised Classification, speech disorder, naming detection, treatment

0

0

0

0

10:02

26/04/2020

The Curious Case of Neural Text Degeneration

Ari Holtzman, Jan Buys, Li Du and
Maxwell Forbes, Yejin Choi

Keywords Paper

generation, text, NLG, NLP, natural language, natural language generation, language model, neural, neural language model

0

0

0

0

4:57

02/02/2021

Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection

Alexander Podolskiy, Dmitry Lipin, Andrey Bout and
Ekaterina Artemova, Irina Piontkovskaya

Keywords Paper

0

0

0

0

16:08

26/04/2020

High Fidelity Speech Synthesis with Adversarial Networks

Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman and
Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan

Keywords Paper

texttospeech, speechsynthesis, audiosynthesis, gans, generativeadversarialnetworks, implicitgenerativemodels

0

0

0

0

15:07

06/12/2020

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon

Keywords Paper

0

0

0

0

3:11

16/11/2020

oLMpics - On what Language Model Pre-training Captures

Alon Talmor, Yanai Elazar, Yoav Goldberg, Jonathan Berant

Keywords Paper

symbolic tasks, reasoning tasks, zero-shot evaluation, pre-training

0

0

0

0

10:01

12/07/2020

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Keywords Paper

Deep Learning - Generative Models and Autoencoders

0

0

0

0

17:06

16/11/2020

Named Entity Recognition Only from Word Embeddings

Ying Luo, Hai Zhao, Junlang Zhan

Keywords Paper

named recognition, entity detection, type prediction, deep models

0

0

0

0

9:54

04/07/2020

Low-Resource Generation of Multi-hop Reasoning Questions

Jianxing Yu, Wei Liu, Shuang Qiu and
Qinliang Su, Kai Wang, Xiaojun Quan, Jian Yin

Keywords Paper

Low-Resource Questions, generating questions, machine comprehension, multi-hop model

0

0

0

0

11:54