Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

Abstract: In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with style transfer and speech variation. Flowtron borrows insights from Autoregressive Flows and revamps Tacotron 2 in order to provide high-quality and expressive mel-spectrogram synthesis. Flowtron is optimized by maximizing the likelihood of the training data, which makes training simple and stable. Flowtron learns an invertible mapping of data to a latent space that can be used to modulate many aspects of speech synthesis (timbre, expressivity, accent). Our mean opinion scores (MOS) show that Flowtron matches state-of-the-art TTS models in terms of speech quality. We provide results on speech variation, interpolation over time between samples, and style transfer between seen and unseen speakers. Code and pre-trained models are publicly available at https://github.com/NVIDIA/flowtron.

04/07/2020

Rafael Valle, Kevin J Shih, Ryan Prenger, Bryan Catanzaro

Similar Papers

Investigating the effect of auxiliary objectives for the automated grading of learner English speech transcriptions

Hannah Craighead, Andrew Caines, Paula Buttery, Helen Yannakoudakis

automated transcriptions, automatically speech, multi-task learning, inductive transfer

EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

Chenfeng Miao, Liang Shuang, Zhengchen Liu, Chen Minchuan, Jun Ma, Shaojun Wang, Jing Xiao

Applications, Audio and Speech Processing

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara Sainath, Yonghui Wu, Ruoming Pang

Dual-mode ASR, Low-latency ASR, Streaming ASR, Speech Recognition

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon

Joint energy-based model training for better calibrated natural language understanding models

Tianxing He, Bryan McCann, Caiming Xiong, Ehsan Hosseini-Asl

Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang

Applications, Audio and Speech Processing

Non-Autoregressive Neural Text-to-Speech

Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

Applications - Language, Speech and Dialog

TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis

Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai

Towards end-2-end learning for predicting behavior codes from spoken utterances in psychotherapy conversations

Karan Singla, Zhuohao Chen, David Atkins, Shrikanth Narayanan

predicting codes, Spoken tasks, voice detection, speaker diarization

Attentively Embracing Noise for Robust Latent Representation in BERT

Gwenaelle Cunha Sergio, Dennis Singh Moirangthem, Minho Lee

Neural Dubber: Dubbing for Videos According to Scripts

Chenxu Hu, Qiao Tian, Tingle Li, Wang Yuping, Yuxuan Wang, Hang Zhao

deep learning

Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation

Qianqian Dong, Rong Ye, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation

Yasheng Sun, Hang Zhou, Ziwei Liu, Hideki Koike

Computer Vision, 2D and 3D Computer Vision, Speech

Start-Before-End and End-to-End: Neural Speech Translation by AppTek and RWTH Aachen University

Parnia Bahar, Patrick Wilken, Tamer Alkhouli, Andreas Guta, Pavel Golik, Evgeny Matusov, Christian Herold

SimulSpeech: End-to-End Simultaneous Speech to Text Translation

Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

simultaneous translation, simultaneous recognition, ASR, NMT

A Streaming End-to-End Framework For Spoken Language Understanding

Nihal Potdar, Anderson Raymundo Avila, Chao Xing, Dong Wang, Yiran Cao, Xiao Chen

Natural Language Processing, Dialogue, Speech

Delexicalized Paraphrase Generation

Boya Yu, Konstantine Arkoudas, Wael Hamza

End-to-End Speech Translation with Adversarial Training

Xuancai Li, Chen Kehai, Tiejun Zhao, Muyun Yang

KIT’s IWSLT 2020 SLT Translation System

Ngoc-Quan Pham, Felix Schneider, Tuan-Nam Nguyen, Thanh-Le Ha, Thai Son Nguyen, Maximilian Awiszus, Sebastian Stüker, Alexander Waibel

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie

MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering

Chenyu You, Nuo Chen, Yuexian Zou

Natural Language Processing, Question Answering, Sentiment Analysis and Text Mining, Speech

Exemplification Modeling: Can You Give Me an Example, Please?

Edoardo Barba, Luigi Procopio, Caterina Lacerra, Tommaso Pasini, Roberto Navigli
