Unsupervised Audiovisual Synthesis via Exemplar Autoencoders

03/05/2021

Kangle Deng, Aayush Bansal, Deva Ramanan

Keywords: voice conversion, assistive technology, audiovisual synthesis, autoencoders, speech-impaired, unsupervised learning

Abstract: We present an unsupervised approach that converts the input speech of any individual into audiovisual streams of potentially-infinitely many output speakers. Our approach builds on simple autoencoders that project out-of-sample data onto the distribution of the training set. We use exemplar autoencoders to learn the voice, stylistic prosody, and visual appearance of a specific target exemplar speech. In contrast to existing methods, the proposed approach can be easily extended to an arbitrarily large number of speakers and styles using only 3 minutes of target audio-video data, without requiring any training data for the input speaker. To do so, we learn audiovisual bottleneck representations that capture the structured linguistic content of speech. We outperform prior approaches on both audio and video synthesis.
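
The abstract's central mechanism is that an autoencoder trained on one target maps any input back onto the distribution it was trained on. That mechanism can be seen in a deliberately tiny setting. The sketch below is not the authors' model, which works on speech and video. It assumes a two-dimensional "training set" lying on a single line and a tied-weight linear autoencoder with a one-dimensional bottleneck, trained by plain full-batch gradient descent. The helpers `train_linear_autoencoder` and `reconstruct`, the data, and the hyperparameters are all hypothetical.

```python
# Toy sketch, NOT the authors' model: a tied-weight linear autoencoder with a
# 1-D bottleneck, trained on 2-D points that all lie on one line.
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return u[0] * v[0] + u[1] * v[1]


def train_linear_autoencoder(data: List[Vec], steps: int = 2000,
                             lr: float = 0.05) -> Vec:
    """Full-batch gradient descent on mean ||x - (w.x) w||^2 (hypothetical helper)."""
    w = [1.0, 0.0]  # arbitrary starting weights
    n = len(data)
    for _ in range(steps):
        norm_sq = dot(w, w)
        grad = [0.0, 0.0]
        for x in data:
            s = dot(w, x)  # encoder output: the 1-D bottleneck code
            for i in range(2):
                # Gradient of ||x - s*w||^2 with respect to w[i].
                grad[i] += -4.0 * s * x[i] + 2.0 * s * norm_sq * x[i] + 2.0 * s * s * w[i]
        w = [w[i] - lr * grad[i] / n for i in range(2)]
    return (w[0], w[1])


def reconstruct(w: Vec, x: Vec) -> Vec:
    """Encode x to a scalar code, then decode with the same (tied) weights."""
    s = dot(w, x)
    return (s * w[0], s * w[1])


# Hypothetical "training set": points t * (0.6, 0.8) on a single line.
TRAIN = [(0.6 * t, 0.8 * t) for t in (-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0)]

if __name__ == "__main__":
    w = train_linear_autoencoder(TRAIN)
    print(reconstruct(w, (1.2, 1.6)))   # in-sample point: returned (almost) unchanged
    print(reconstruct(w, (3.0, -1.0)))  # out-of-sample point: approximately (0.6, 0.8)
```

The bottleneck can only express directions present in the training data, so the out-of-sample point (3, -1) comes back as approximately (0.6, 0.8), its orthogonal projection onto the training line. Exemplar autoencoders rely on the same idea with a nonlinear bottleneck trained on a single target speaker.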

The talk and the paper were published at the ICLR 2021 virtual conference.

Similar Papers

02/02/2021

TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis

Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai

06/12/2020

Listening to Sounds of Silence for Speech Denoising

Henry Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, Changxi Zheng

26/04/2020

From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech

Hyeong-Seok Choi, Changdae Park, Kyogu Lee

Keywords: Multi-modal learning, Self-supervised learning, Voice profiling, Conditional GANs

06/12/2020

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli

06/12/2021

Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

Hyeong-Seok Choi, Juheon Lee, Wansoo Kim, Jie Lee, Hoon Heo, Kyogu Lee

06/12/2020

Language Models are Few-Shot Learners

Tom B Brown, Ben Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen M Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

12/07/2020

Unsupervised Speech Decomposition via Triple Information Bottleneck

Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson, David Cox

Keywords: Applications - Language, Speech and Dialog

12/07/2020

Non-Autoregressive Neural Text-to-Speech

Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

Keywords: Applications - Language, Speech and Dialog

04/07/2020

Towards end-2-end learning for predicting behavior codes from spoken utterances in psychotherapy conversations

Karan Singla, Zhuohao Chen, David Atkins, Shrikanth Narayanan

Keywords: predicting codes, Spoken tasks, voice detection, speaker diarization

18/07/2021

EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

Chenfeng Miao, Liang Shuang, Zhengchen Liu, Chen Minchuan, Jun Ma, Shaojun Wang, Jing Xiao

Keywords: Applications, Audio and Speech Processing

18/07/2021

Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation

Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang

Keywords: Applications, Audio and Speech Processing

18/07/2021

Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation

Renjie Zheng, Junkun Chen, Mingbo Ma, Liang Huang

Keywords: Applications, Natural Language Processing

04/07/2020

Meta-Transfer Learning for Code-Switched Speech Recognition

Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Peng Xu, Pascale Fung

Keywords: Code-Switched Recognition, speech recognition, speech tasks, language tasks

18/07/2021

Learning de-identified representations of prosody from raw audio

Jack Weston, Raphael Lenain, Udeepa Meepegama, Emil Fristed

Keywords: Applications, Audio and Speech Processing

03/05/2021

Improving Zero-Shot Voice Style Transfer via Disentangled Representation Learning

Siyang Yuan, Pengyu Cheng, Ruiyi Zhang, Weituo Hao, Zhe Gan, Lawrence Carin

Keywords: Disentanglement, Mutual Information, Zero-shot Learning, Style Transfer

06/12/2020

A Spectral Energy Distance for Parallel Speech Synthesis

Alexey Gritsenko, Tim Salimans, Rianne van den Berg, Jasper Snoek, Nal Kalchbrenner

03/05/2021

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

Rafael Valle, Kevin J Shih, Ryan Prenger, Bryan Catanzaro

Keywords: normalizing flows, deep learning, Text to speech synthesis

06/12/2021

Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems

Menoua Keshishian, Samuel Norman-Haignere, Nima Mesgarani

Keywords: deep learning, machine learning

26/04/2020

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

David Harwath, Wei-Ning Hsu, James Glass

Keywords: visually-grounded speech, self-supervised learning, discrete representation learning, vision and language, vision and speech, hierarchical representation learning

06/12/2021

Unsupervised Speech Recognition

Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Keywords: deep learning, adversarial robustness and security, self-supervised learning, generative model

18/07/2021

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang

Keywords: Applications, Speech Recognition

14/06/2020

Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis

K R Prajwal, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C.V. Jawahar

Keywords: lip to speech, lip reading, speech generation, talking face videos, lip2wav, sequence to sequence, multimodal learning, speech restoration, audio-visual understanding, face and speech

02/02/2021

Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection

Alexander Podolskiy, Dmitry Lipin, Andrey Bout, Ekaterina Artemova, Irina Piontkovskaya

06/12/2021

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

Keywords: machine learning, self-supervised learning, transformers, vision, contrastive learning

16/11/2020

Text Classification Using Label Names Only: A Language Model Self-Training Approach

Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han

Keywords: classification, category understanding, document classification, topic classification

02/02/2021

Exploring Transfer Learning For End-to-End Spoken Language Understanding

Subendhu Rongali, Beiye Liu, Liwei Cai, Konstantine Arkoudas, Chengwei Su, Wael Hamza

16/11/2020

Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models

Ethan Wilcox, Peng Qian, Richard Futrell, Ryosuke Kohita, Roger Levy, Miguel Ballesteros

Keywords: learning outcomes, syntactic representations, neural models, n-gram baseline

03/05/2021

End-to-end Adversarial Text-to-Speech

Jeff Donahue, Sander Dieleman, Mikolaj Binkowski, Erich Elsen, Karen Simonyan

Keywords: end-to-end, speech synthesis, feed-forward, text-to-speech, adversarial, generative model, GAN

22/11/2021

Audio-Visual Speech Super-Resolution

Rudrabha Mukhopadhyay, Sindhu B Hegde, Vinay Namboodiri, C.V. Jawahar

Keywords: speech super-resolution, audio-visual data, audio-visual learning, pseudo-visual stream, multi-modal learning

03/05/2021

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech

Yoonhyung Lee, Joongbo Shin, Kyomin Jung

Keywords: VAE, non-autoregressive, speech synthesis, text-to-speech

02/02/2021

DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances

Xiaodong Gu, Kang Min Yoo, Jung-Woo Ha

26/04/2020

LAMOL: LAnguage MOdeling for Lifelong Language Learning

Fan-Keng Sun, Cheng-Hao Ho, Hung-Yi Lee

Keywords: NLP, Deep Learning, Lifelong Learning

19/08/2021

A Streaming End-to-End Framework For Spoken Language Understanding

Nihal Potdar, Anderson Raymundo Avila, Chao Xing, Dong Wang, Yiran Cao, Xiao Chen

Keywords: Natural Language Processing, Dialogue, Speech

04/07/2020

Learning Spoken Language Representations with Neural Lattice Language Modeling

Chao-Wei Huang, Yun-Nung Chen

Keywords: NLP tasks, spoken tasks, intent detection, Spoken Representations

16/11/2020

Cross-lingual Spoken Language Understanding with Regularized Representation Alignment

Zihan Liu, Genta Indra Winata, Peng Xu, Zhaojiang Lin, Pascale Fung

Keywords: spoken systems, cross-lingual task, few-shot setting, cross-lingual models

05/12/2020

Analysis of hierarchical multi-content text classification model on B-SHARP dataset for early detection of Alzheimer’s disease

Renxuan Albert Li, Ihab Hajjar, Felicia Goldstein, Jinho D. Choi

16/11/2020

Named Entity Recognition Only from Word Embeddings

Ying Luo, Hai Zhao, Junlang Zhan

Keywords: named recognition, entity detection, type prediction, deep models

04/07/2020

SimulSpeech: End-to-End Simultaneous Speech to Text Translation

Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

Keywords: simultaneous translation, simultaneous recognition, ASR, NMT

06/12/2021

Multimodal and Multilingual Embeddings for Large-Scale Speech Mining

Paul-Ambroise Duquenne, Hongyu Gong, Holger Schwenk

08/12/2020

Learning distributed sentence vectors with bi-directional 3D convolutions

Bin Liu, Liang Wang, Guosheng Yin
