Improving Zero-Shot Voice Style Transfer via Disentangled Representation Learning

03/05/2021

Siyang Yuan, Pengyu Cheng, Ruiyi Zhang, Weituo Hao, Zhe Gan, Lawrence Carin

Keywords: Disentanglement, Mutual Information, Zero-shot Learning, Style Transfer

Abstract: Voice style transfer, also called voice conversion, seeks to modify one speaker's voice to generate speech as if it came from another (target) speaker. Previous works have made progress on voice conversion with parallel training data and pre-known speakers. However, zero-shot voice style transfer, which learns from non-parallel data and generates voices for previously unseen speakers, remains a challenging problem. In this paper we propose a novel zero-shot voice transfer method via disentangled representation learning. The proposed method first encodes speaker-related style and voice content of each input voice into separate low-dimensional embedding spaces, and then transfers to a new voice by combining the source content embedding and target style embedding through a decoder. With information-theoretic guidance, the style and content embedding spaces are representative and (ideally) independent of each other. On real-world datasets, our method outperforms other baselines and obtains state-of-the-art results in terms of transfer accuracy and voice naturalness.
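The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the encoders and decoder below are hypothetical stand-ins (fixed random linear maps over made-up feature frames), and every name and dimension is an assumption chosen only to show how a source content embedding and a target style embedding are combined through a decoder.

```python
import random

# Hypothetical sizes; the page does not state the real dimensions.
FRAME_DIM, CONTENT_DIM, STYLE_DIM = 8, 4, 3

def make_linear(in_dim, out_dim, rng):
    """Fixed random weight matrix (out_dim x in_dim), standing in for a trained network."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

def apply_linear(weights, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def encode_content(w_content, frames):
    """One content vector per frame, since spoken content varies over time."""
    return [apply_linear(w_content, f) for f in frames]

def encode_style(w_style, frames):
    """One style vector per utterance: the mean of the per-frame projections."""
    per_frame = [apply_linear(w_style, f) for f in frames]
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]

def decode(w_dec, content_seq, style_vec):
    """Concatenate each content vector with the style vector, then map to frame space."""
    return [apply_linear(w_dec, c + style_vec) for c in content_seq]

def transfer(params, source_frames, target_frames):
    """Content comes from the source utterance, style from the target utterance."""
    content = encode_content(params["content"], source_frames)
    style = encode_style(params["style"], target_frames)
    return decode(params["decoder"], content, style)

rng = random.Random(0)
params = {
    "content": make_linear(FRAME_DIM, CONTENT_DIM, rng),
    "style": make_linear(FRAME_DIM, STYLE_DIM, rng),
    "decoder": make_linear(CONTENT_DIM + STYLE_DIM, FRAME_DIM, rng),
}
# Made-up "utterances": 5 source frames and 7 target frames.
source = [[rng.gauss(0, 1) for _ in range(FRAME_DIM)] for _ in range(5)]
target = [[rng.gauss(0, 1) for _ in range(FRAME_DIM)] for _ in range(7)]

converted = transfer(params, source, target)
print(len(converted), len(converted[0]))  # prints "5 8"
```

In this sketch the output keeps the source's frame count, while the style vector is computed from whichever utterance is passed as the target, so a previously unseen speaker needs no retraining. The information-theoretic training terms mentioned in the abstract are not modeled here.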

The talk and the corresponding paper were published at the ICLR 2021 virtual conference.

Similar Papers

18/07/2021

Learning de-identified representations of prosody from raw audio

Jack Weston, Raphael Lenain, Udeepa Meepegama, Emil Fristed

Keywords: Applications, Audio and Speech Processing

4:37

02/02/2021

Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis

Sang-Hoon Lee, Hyun-Wook Yoon, Hyeong-Rae Noh, Ji-Hoon Kim, Seong-Whan Lee

14:19

06/12/2021

VoiceMixer: Adversarial Voice Style Mixup

Sang-Hoon Lee, Ji-Hoon Kim, Hyunseung Chung, Seong-Whan Lee

Keywords: representation learning

10:18

02/02/2021

TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis

Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai

19:58

26/04/2020

From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech

Hyeong-Seok Choi, Changdae Park, Kyogu Lee

Keywords: Multi-modal learning, Self-supervised learning, Voice profiling, Conditional GANs

5:15

06/12/2020

Unsupervised Sound Separation Using Mixture Invariant Training

Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron Weiss, Kevin Wilson, John R. Hershey

3:20

19/08/2021

FedSpeech: Federated Text-to-Speech with Continual Learning

Ziyue Jiang, Yi Ren, Ming Lei, Zhou Zhao

Keywords: Natural Language Processing, Speech, Federated Learning, Privacy Preserving Data Mining

6:06

02/11/2020

Model selection for deep audio source separation via clustering analysis

Alisa Liu, Prem Seetharaman, Bryan Pardo

12:12

11/10/2020

Zero-shot Singing Voice Conversion

Shahan Nercessian

Keywords: MIR tasks, Music synthesis and transformation, Domain knowledge, Machine learning/Artificial intelligence for music, Musical features and properties, Timbre, instrumentation, and voice

2:51

14/06/2020

Distortion Agnostic Deep Watermarking

Xiyang Luo, Ruohan Zhan, Huiwen Chang, Feng Yang, Peyman Milanfar

Keywords: watermarking, adversarial training, channel coding, steganography, deep learning

1:01

04/07/2020

On the Cross-lingual Transferability of Monolingual Representations

Mikel Artetxe, Sebastian Ruder, Dani Yogatama

Keywords: zero-shot setting, Cross-lingual Representations, unsupervised models, joint training

11:28

12/07/2020

Non-Autoregressive Neural Text-to-Speech

Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

Keywords: Applications - Language, Speech and Dialog

15:12

01/07/2020

End-to-End Speech Translation with Adversarial Training

Xuancai Li, Chen Kehai, Tiejun Zhao, Muyun Yang

8:53

06/12/2020

Listening to Sounds of Silence for Speech Denoising

Henry Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, Changxi Zheng

3:22

06/12/2021

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

Keywords: machine learning, self-supervised learning, transformers, vision, contrastive learning

15:59

18/07/2021

Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation

Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang

Keywords: Applications, Audio and Speech Processing

5:17

18/07/2021

EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

Chenfeng Miao, Liang Shuang, Zhengchen Liu, Chen Minchuan, Jun Ma, Shaojun Wang, Jing Xiao

Keywords: Applications, Audio and Speech Processing

5:13

05/01/2021

AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by Reconstructing Label Features From Multi-Modal Embeddings

Pratik Mazumder, Pravendra Singh, Kranti Kumar Parida, Vinay P. Namboodiri

4:46

06/12/2021

Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

Hyeong-Seok Choi, Juheon Lee, Wansoo Kim, Jie Lee, Hoon Heo, Kyogu Lee

11:14

08/12/2020

Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation

Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

12:46

30/11/2020

Do We Need Sound for Sound Source Localization?

Takashi Oya, Shohei Iwase, Ryota Natsume, Takahiro Itazuri, Shugo Yamaguchi, Shigeo Morishima

8:43

11/10/2020

Unsupervised Disentanglement of Pitch and Timbre for Isolated Musical Instrument Sounds

Yin-Jyun Luo, Kin Wai Cheuk, Tomoyasu Nakano, Masataka Goto, Dorien Herremans

Keywords: Domain knowledge, Machine learning/Artificial intelligence for music

4:08

19/04/2021

PPT: Parsimonious parser transfer for unsupervised cross-lingual adaptation

Kemal Kurniawan, Lea Frermann, Philip Schulz, Trevor Cohn

11:52

02/02/2021

Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

Longxiang Liu, Zhuosheng Zhang, Hai Zhao, Xi Zhou, Xiang Zhou

18:11

18/07/2021

Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation

Renjie Zheng, Junkun Chen, Mingbo Ma, Liang Huang

Keywords: Applications, Natural Language Processing

5:19

02/02/2021

Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation

Qianqian Dong, Rong Ye, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

14:09

18/07/2021

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Jaehyeon Kim, Jungil Kong, Juhee Son

Keywords: Applications, Audio and Speech Processing

7:21

03/05/2021

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Dan Ellis, John Hershey

Keywords: self-supervised learning, universal sound separation, in-the-wild data, Audio-visual sound separation, unsupervised learning

5:06

16/11/2020

Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models

Isabel Papadimitriou, Dan Jurafsky

Keywords: analyzing structure, encoding structure, natural acquisition, transfer learning

11:44

04/07/2020

Exploring Contextual Word-level Style Relevance for Unsupervised Style Transfer

Chulun Zhou, Liangyu Chen, Jiachen Liu, Xinyan Xiao, Jinsong Su, Sheng Guo, Hua Wu

Keywords: Exploring Relevance, Contextual Relevance, Unsupervised Transfer, style transfer

7:49

01/07/2020

Learning to Generate Multiple Style Transfer Outputs for an Input Sentence

Kevin Lin, Ming-Yu Liu, Ming-Ting Sun, Jan Kautz

9:29

04/07/2020

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords: Cross-Lingual Classification, sentiment classification, unsupervised system, classification

12:23

14/06/2020

Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

Runfa Chen, Wenbing Huang, Binghui Huang, Fuchun Sun, Bin Fang

Keywords: nice-gan, reusing discriminators for encoding, unsupervised image-to-image translation, decoupled training, multi-scale discriminators, adversarial loss, no independent component for encoding, shared layers, residual attention, cyclegan

1:01

16/11/2020

Named Entity Recognition Only from Word Embeddings

Ying Luo, Hai Zhao, Junlang Zhan

Keywords: named recognition, entity detection, type prediction, deep models

9:54

14/06/2020

A Disentangling Invertible Interpretation Network for Explaining Latent Representations

Patrick Esser, Robin Rombach, Björn Ommer

Keywords: interpretability, inn, disentangling, generative models, invertible neural networks, autoencoders, normalizing flows, vae, explainable, xai

1:01

06/12/2021

Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport

Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, Yu Tsao

Keywords: theory, machine learning, adversarial robustness and security, domain adaptation, optimal transport

14:40

16/11/2020

Accurate Word Alignment Induction from Neural Machine Translation

Yun Chen, Yang Liu, Guanhua Chen, Xin Jiang, Qun Liu

Keywords: transformer, attention mechanism, word methods, shift-att

11:47

05/01/2021

S-VVAD: Visual Voice Activity Detection by Motion Segmentation

Muhammad Shahid, Cigdem Beyan, Vittorio Murino

4:56

22/11/2021

Audio-Visual Speech Super-Resolution

Rudrabha Mukhopadhyay, Sindhu B Hegde, Vinay Namboodiri, C.V. Jawahar

Keywords: speech super-resolution, audio-visual data, audio-visual learning, pseudo-visual stream, multi-modal learning

10:01

04/07/2020

Efficient Dialogue State Tracking by Selectively Overwriting Memory

Sungdong Kim, Sohee Yang, Gyuwan Kim, Sang-Woo Lee

Keywords: Dialogue Tracking, predicting operation, training, open setting

11:12