Unsupervised Speech Recognition

06/12/2021

Unsupervised Speech Recognition

Alexei Baevski, Wei-Ning Hsu, Alexis CONNEAU, Michael Auli

Keywords: deep learning, adversarial robustness and security, self-supervised learning, generative model

Abstract Paper Similar Papers

Abstract: Despite rapid progress in the recent past, current speech recognition systems still require labeled training data which limits this technology to a small fraction of the languages spoken around the globe. This paper describes wav2vec-U, short for wav2vec Unsupervised, a method to train speech recognition models without any labeled data. We leverage self-supervised speech representations to segment unlabeled audio and learn a mapping from these representations to phonemes via adversarial training. The right representations are key to the success of our method. Compared to the best previous unsupervised work, wav2vec-U reduces the phone error rate on the TIMIT benchmark from 26.1 to 11.3. On the larger English Librispeech benchmark, wav2vec-U achieves a word error rate of 5.9 on test-other, rivaling some of the best published systems trained on 960 hours of labeled data from only two years ago. We also experiment on nine other languages, including low-resource languages such as Kyrgyz, Swahili and Tatar.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at NeurIPS 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2020

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli

Keywords Paper

0

0

0

0

3:37

19/04/2021

Neural data-to-text generation with LM-based text augmentation

Ernie Chang, Xiaoyu Shen, Dawei Zhu and
Vera Demberg, Hui Su

Keywords Paper

0

0

0

0

7:32

06/12/2021

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

Cheng-I Jeff Lai, Yang Zhang, Alexander Liu and
Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, Jim Glass

Keywords Paper

self-supervised learning, representation learning

0

0

0

0

13:57

16/11/2020

Cross-lingual Spoken Language Understanding with Regularized Representation Alignment

Zihan Liu, Genta Indra Winata, Peng Xu and
Zhaojiang Lin, Pascale Fung

Keywords Paper

spoken systems, cross-lingual task, few-shot setting, cross-lingual models

0

0

0

0

9:40

04/07/2020

Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation

Junliang Guo, Linli Xu, Enhong Chen

Keywords Paper

Non-Autoregressive Translation, natural tasks, non-autoregressive translation~(NAT, non-autoregressive

0

0

0

0

10:47

02/02/2021

TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis

Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai

Keywords Paper

0

0

0

0

19:58

25/04/2020

WithYou: Automated Adaptive Speech Tutoring With Context-Dependent Speech Recognition

Xinlei Zhang, Takashi Miyaki, Jun Rekimoto

Keywords Paper

computer assisted language learning (call), speaking, shadowing, speech recognition, intelligent tutoring system, language learning

0

0

0

0

14:41

18/07/2021

Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation

Renjie Zheng, Junkun Chen, Mingbo Ma, Liang Huang

Keywords Paper

Applications, Natural Language Processing

0

0

0

0

5:19

26/04/2020

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

David Harwath, Wei-Ning Hsu, James Glass

Keywords Paper

visually-grounded speech, self-supervised learning, discrete representation learning, vision and language, vision and speech, hierarchical representation learning

0

0

0

0

13:42

06/12/2021

NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

Pranay Manocha, Buye Xu, Anurag Kumar

Keywords Paper

deep learning, robustness, self-supervised learning

0

0

0

0

14:30

14/06/2020

Counterfactual Samples Synthesizing for Robust Visual Question Answering

Long Chen, Xin Yan, Jun Xiao and
Hanwang Zhang, Shiliang Pu, Yueting Zhuang

Keywords Paper

visual question answering, counterfactual, debias, language bias, data augmentation, visual-and-language

0

0

0

0

1:01

04/07/2020

Learning Spoken Language Representations with Neural Lattice Language Modeling

Chao-Wei Huang, Yun-Nung Chen

Keywords Paper

NLP tasks, spoken tasks, intent detection, Spoken Representations

0

0

0

0

6:39

16/11/2020

Named Entity Recognition Only from Word Embeddings

Ying Luo, Hai Zhao, Junlang Zhan

Keywords Paper

named recognition, entity detection, type prediction, deep models

0

0

0

0

9:54

16/11/2020

Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection

Shaolei Wang, Zhongyuan Wang, Wanxiang Che, Ting Liu

Keywords Paper

disfluency detection, self-supervised techniques, unsupervised paradigm, noisy training

0

0

0

0

9:57

19/04/2021

A phonetic model of non-native spoken word processing

Yevgen Matusevych, Herman Kamper, Thomas Schatz and
Naomi Feldman, Sharon Goldwater

Keywords Paper

0

0

0

0

11:58

06/12/2020

Listening to Sounds of Silence for Speech Denoising

Henry Xu, Rundi Wu, Yuko Ishiwaka and
Carl Vondrick, Changxi Zheng

Keywords Paper

0

0

0

0

3:22

19/04/2021

Self-training pre-trained language models for zero- and few-shot multi-dialectal Arabic sequence labeling

Muhammad Khalifa, Muhammad Abdul-Mageed, Khaled Shaalan

Keywords Paper

0

0

0

0

8:10

08/12/2020

Fine-grained Information Status Classification Using Discourse Context-Aware BERT

Yufang Hou

Keywords Paper

0

0

0

0

13:13

05/01/2021

Visual Speech Enhancement Without a Real Visual Stream

Sindhu B. Hegde, K.R. Prajwal, Rudrabha Mukhopadhyay and
Vinay P. Namboodiri, C.V. Jawahar

Keywords Paper

0

0

0

0

5:01

02/02/2021

Exploring Transfer Learning For End-to-End Spoken Language Understanding

Subendhu Rongali, Beiye Liu, Liwei Cai and
Konstantine Arkoudas, Chengwei Su, Wael Hamza

Keywords Paper

0

0

0

0

19:30

16/11/2020

Effectively pretraining a speech translation decoder with Machine Translation data

Ashkan Alinejad, Anoop Sarkar

Keywords Paper

automatic task, neural task, speech translation, end-to-end approach

0

0

0

0

6:12

16/11/2020

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

Hao Tan, Mohit Bansal

Keywords Paper

speaking, writing, text-only self-supervision, pure-language tasks

0

0

0

0

11:59

06/12/2020

Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, Zihang Dai, Eduard Hovy and
Thang Luong, Quoc V Le

Keywords Paper

0

0

0

0

3:29

18/07/2021

Learning de-identified representations of prosody from raw audio

Jack Weston, Raphael Lenain, Udeepa Meepegama, Emil Fristed

Keywords Paper

Applications, Audio and Speech Processing

0

0

0

0

4:37

08/12/2020

Transliteration of Judeo-Arabic Texts into Arabic Script Using Recurrent Neural Networks

Ori Terner, Kfir Bar, Nachum Dershowitz

Keywords Paper

0

0

0

0

9:50

16/11/2020

Visually Grounded Compound PCFGs

Yanpeng Zhao, Ivan Titov

Keywords Paper

exploiting groundings, language understanding, gradient estimates, fully-differentiable learning

0

0

0

0

12:24

04/07/2020

Unsupervised Paraphasia Classification in Aphasic Speech

Sharan Pai, Nikhil Sachdeva, Prince Sachdeva, Rajiv Ratn Shah

Keywords Paper

Unsupervised Classification, speech disorder, naming detection, treatment

0

0

0

0

10:02

03/05/2021

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech

Yoonhyung Lee, Joongbo Shin, Kyomin Jung

Keywords Paper

VAE, non-autoregressive, speech synthesis, text-to-speech

0

0

0

0

5:40

25/07/2020

Auto-annotation for voice-enabled entertainment systems

Wenyan Li, Ferhan Ture

Keywords Paper

unsupervised, voice-enabled entertainment systems, automatic speech recognition, error detection and evaluation, auto-annotation

0

0

0

0

8:06

18/07/2021

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

Chengyi Wang, Yu Wu, Yao Qian and
Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang

Keywords Paper

Applications, Speech Recognition

0

0

0

0

5:19

01/07/2020

End-to-End Speech Translation with Adversarial Training

Xuancai Li, Chen Kehai, Tiejun Zhao, Muyun Yang

Keywords Paper

0

0

0

0

8:53

16/11/2020

oLMpics - On what Language Model Pre-training Captures

Alon Talmor, Yanai Elazar, Yoav Goldberg, Jonathan Berant

Keywords Paper

symbolic tasks, reasoning tasks, zero-shot evaluation, pre-training

0

0

0

0

10:01

08/12/2020

Neural Machine Translation Models with Back-Translation for the Extremely Low-Resource Indigenous Language Bribri

Isaac Feldman, Rolando Coto-Solano

Keywords Paper

0

0

0

0

13:50

04/07/2020

End-to-End Neural Word Alignment Outperforms GIZA++

Thomas Zenkel, Joern Wuebker, John DeNero

Keywords Paper

Word alignment, unsupervised task, natural processing, neural translation

0

0

0

0

10:32

08/12/2020

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Keywords Paper

0

0

0

0

14:59

05/12/2020

An exploratory study on multilingual quality estimation

Shuo Sun, Marina Fomicheva, Frédéric Blain and
Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzmán, Lucia Specia

Keywords Paper

0

0

0

0

14:31

25/07/2020

A pairwise probe for understanding BERT fine-tuning on machine reading comprehension

Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

Keywords Paper

machine reading comprehension, pairwise, fine-tune, BERT

0

0

0

0

6:38

22/11/2021

Audio-Visual Speech Super-Resolution

Rudrabha Mukhopadhyay, Sindhu B Hegde, Vinay Namboodiri, C.V. Jawahar

Keywords Paper

speech super-resolution, audio-visual data, audio-visual learning, pseudo-visual stream, multi-modal learning

0

0

0

0

10:01

26/04/2020

LAMOL: LAnguage MOdeling for Lifelong Language Learning

Fan-Keng Sun, Cheng-Hao Ho, Hung-Yi Lee

Keywords Paper

NLP, Deep Learning, Lifelong Learning

0

0

0

0

4:44

04/07/2020

LINSPECTOR: Multilingual Probing Tasks for Word Representations

Gözde Gül Sahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych

Keywords Paper

Word Representations, NLP, classification tasks, probing tasks

0

0

0

0

11:51