Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis

14/06/2020

K R Prajwal, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C.V. Jawahar

Keywords: lip to speech, lip reading, speech generation, talking face videos, lip2wav, sequence to sequence, multimodal learning, speech restoration, audio-visual understanding, face and speech

Abstract: Humans involuntarily tend to infer parts of the conversation from lip movements when the speech is absent or corrupted by external noise. In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker. Acknowledging the importance of contextual and speaker-specific cues for accurate lip-reading, we take a different path from existing works. We focus on learning accurate lip sequences to speech mappings for individual speakers in unconstrained, large vocabulary settings. To this end, we collect and release a large-scale benchmark dataset, the first of its kind, specifically to train and evaluate the single-speaker lip to speech task in natural settings. We propose a novel approach with key design choices to achieve accurate, natural lip to speech synthesis in such unconstrained scenarios for the first time. Extensive evaluation using quantitative, qualitative metrics and human evaluation shows that our method is four times more intelligible than previous works in this space.

The talk and the corresponding paper were published at the CVPR 2020 virtual conference.

Similar Papers

06/12/2020

Listening to Sounds of Silence for Speech Denoising

Henry Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, Changxi Zheng

3:22

02/02/2021

TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis

Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai

19:58

22/11/2021

Audio-Visual Speech Super-Resolution

Rudrabha Mukhopadhyay, Sindhu B Hegde, Vinay Namboodiri, C.V. Jawahar

Keywords: speech super-resolution, audio-visual data, audio-visual learning, pseudo-visual stream, multi-modal learning

10:01

22/11/2021

Talking Head Generation with Audio and Speech Related Facial Action Units

Sen Chen, Zhilei Liu, Jiaxing Liu, Zhengxiang Yan, Longbiao Wang

Keywords: Talking Face Generation, Facial Action Unit, Generative Adversarial Network, Video Synthesis, Face Manipulation

2:41

05/01/2021

Visual Speech Enhancement Without a Real Visual Stream

Sindhu B. Hegde, K.R. Prajwal, Rudrabha Mukhopadhyay, Vinay P. Namboodiri, C.V. Jawahar

5:01

16/11/2020

Unsupervised Commonsense Question Answering with Self-Talk

Vered Shwartz, Peter West, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi

Keywords: natural understanding, multiple-choice tasks, commonsense reasoning, pre-trained models

12:43

02/02/2021

Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

Longxiang Liu, Zhuosheng Zhang, Hai Zhao, Xi Zhou, Xiang Zhou

18:11

02/02/2021

Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation

Qianqian Dong, Rong Ye, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

14:09

25/04/2020

WithYou: Automated Adaptive Speech Tutoring With Context-Dependent Speech Recognition

Xinlei Zhang, Takashi Miyaki, Jun Rekimoto

Keywords: computer assisted language learning (call), speaking, shadowing, speech recognition, intelligent tutoring system, language learning

14:41

02/02/2021

Towards Semantics-Enhanced Pre-Training: Can Lexicon Definitions Help Learning Sentence Meanings?

Xuancheng Ren, Xu Sun, Houfeng Wang, Qun Liu

16:04

02/02/2021

Exploring Transfer Learning For End-to-End Spoken Language Understanding

Subendhu Rongali, Beiye Liu, Liwei Cai, Konstantine Arkoudas, Chengwei Su, Wael Hamza

19:30

22/11/2021

Personalized One-Shot Lipreading for an ALS Patient

Bipasha Sen, Aditya Agarwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C.V. Jawahar

Keywords: lipreading, variational autoencoders, domain adaptation, synthetic data augmentation, amyotrophic lateral sclerosis, medical, als

3:10

02/02/2021

Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis

Sang-Hoon Lee, Hyun-Wook Yoon, Hyeong-Rae Noh, Ji-Hoon Kim, Seong-Whan Lee

14:19

16/11/2020

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

Hao Tan, Mohit Bansal

Keywords: speaking, writing, text-only self-supervision, pure-language tasks

11:59

18/07/2021

Learning de-identified representations of prosody from raw audio

Jack Weston, Raphael Lenain, Udeepa Meepegama, Emil Fristed

Keywords: Applications, Audio and Speech Processing

4:37

08/12/2020

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

14:42

06/12/2021

Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems

Menoua Keshishian, Samuel Norman-Haignere, Nima Mesgarani

Keywords: deep learning, machine learning

10:28

04/07/2020

Towards end-2-end learning for predicting behavior codes from spoken utterances in psychotherapy conversations

Karan Singla, Zhuohao Chen, David Atkins, Shrikanth Narayanan

Keywords: predicting codes, Spoken tasks, voice detection, speaker diarization

7:16

16/11/2020

Word Frequency Does Not Predict Grammatical Knowledge in Language Models

Charles Yu, Ryan Sie, Nicolas Tedeschi, Leon Bergen

Keywords: reflexive anaphora, grammatical tasks, neural models, language models

11:12

16/11/2020

Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models

Ethan Wilcox, Peng Qian, Richard Futrell, Ryosuke Kohita, Roger Levy, Miguel Ballesteros

Keywords: learning outcomes, syntactic representations, neural models, n-gram baseline

11:29

02/02/2021

Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues

Ruijian Xu, Chongyang Tao, Daxin Jiang, Xueliang Zhao, Dongyan Zhao, Rui Yan

16:40

04/07/2020

Unsupervised Paraphasia Classification in Aphasic Speech

Sharan Pai, Nikhil Sachdeva, Prince Sachdeva, Rajiv Ratn Shah

Keywords: Unsupervised Classification, speech disorder, naming detection, treatment

10:02

03/05/2021

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Dan Ellis, John Hershey

Keywords: self-supervised learning, universal sound separation, in-the-wild data, Audio-visual sound separation, unsupervised learning

5:06

26/04/2020

From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech

Hyeong-Seok Choi, Changdae Park, Kyogu Lee

Keywords: Multi-modal learning, Self-supervised learning, Voice profiling, Conditional GANs

5:15

16/11/2020

Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models

Isabel Papadimitriou, Dan Jurafsky

Keywords: analyzing structure, encoding structure, natural acquisition, transfer learning

11:44

04/07/2020

Compositional Generalization by Factorizing Alignment and Translation

Jacob Russin, Jason Jo, Randall O'Reilly, Yoshua Bengio

Keywords: Compositional Generalization, Translation, natural processing, cognitive science

10:37

19/04/2021

A phonetic model of non-native spoken word processing

Yevgen Matusevych, Herman Kamper, Thomas Schatz, Naomi Feldman, Sharon Goldwater

11:58

02/02/2021

Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin

14:04

18/07/2021

Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation

Renjie Zheng, Junkun Chen, Mingbo Ma, Liang Huang

Keywords: Applications, Natural Language Processing

5:19

02/02/2021

Do Response Selection Models Really Know What’s Next? Utterance Manipulation Strategies for Multi-turn Response Selection

Taesun Whang, Dongyub Lee, Dongsuk Oh, Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee

17:37

16/11/2020

Unsupervised Natural Language Inference via Decoupled Multimodal Contrastive Learning

Wanyun Cui, Guangyu Zheng, Wei Wang

Keywords: natural problem, plain inference, task-agnostic pretraining, multimodal learning

11:25

06/12/2021

Unsupervised Speech Recognition

Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli

Keywords: deep learning, adversarial robustness and security, self-supervised learning, generative model

19:16

16/11/2020

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

Alex Warstadt, Yian Zhang, Xiaocheng Li, Haokun Liu, Samuel R. Bowman

Keywords: self-supervised tasks, language understanding, ambiguous tasks, finetuning

12:04

06/12/2021

Lip to Speech Synthesis with Visual Context Attentional GAN

Minsu Kim, Joanna Hong, Yong Man Ro

Keywords: generative model, contrastive learning

6:12

01/07/2020

End-to-End Speech Translation with Adversarial Training

Xuancai Li, Chen Kehai, Tiejun Zhao, Muyun Yang

8:53

19/08/2021

MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering

Chenyu You, Nuo Chen, Yuexian Zou

Keywords: Natural Language Processing, Question Answering, Sentiment Analysis and Text Mining, Speech

12:23

03/05/2021

End-to-end Adversarial Text-to-Speech

Jeff Donahue, Sander Dieleman, Mikolaj Binkowski, Erich Elsen, Karen Simonyan

Keywords: end-to-end, speech synthesis, feed-forward, text-to-speech, adversarial, generative model, GAN

15:23

06/12/2021

NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

Pranay Manocha, Buye Xu, Anurag Kumar

Keywords: deep learning, robustness, self-supervised learning

14:30

12/07/2020

On Variational Learning of Controllable Representations for Text without Supervision

Peng Xu, Jackie Chi Kit Cheung, Yanshuai Cao

Keywords: Representation Learning

14:51

04/07/2020

Curriculum Learning for Natural Language Understanding

Benfeng Xu, Licheng Zhang, Zhendong Mao, Quan Wang, Hongtao Xie, Yongdong Zhang

Keywords: Curriculum Learning, Natural Understanding, natural tasks, NLU tasks

9:41