UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

18/07/2021

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang

Keywords: Applications, Speech Recognition

Abstract Paper Similar Papers

Abstract: In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both labeled and unlabeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner. The resultant representations can capture information more correlated with phonetic structures and improve the generalization across languages and domains. We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on public CommonVoice corpus. The results show that UniSpeech outperforms self-supervised pretraining and supervised transfer learning for speech recognition by a maximum of 13.4\% and 26.9\% relative phone error rate reductions respectively (averaged over all testing languages). The transferability of UniSpeech is also verified on a domain-shift speech recognition task, i.e., a relative word error rate reduction of 6\% against the previous approach.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at ICML 2021 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

04/07/2020

Investigating the effect of auxiliary objectives for the automated grading of learner English speech transcriptions

Hannah Craighead, Andrew Caines, Paula Buttery, Helen Yannakoudakis

Keywords Paper

automated transcriptions, automatically speech, multi-task learning, inductive transfer

0

0

0

0

11:37

04/07/2020

MultiQT: Multimodal learning for real-time question tracking in speech

Jakob D. Havtorn, Jan Latko, Joakim Edin and
Lars Maaløe, Lasse Borgholt, Lorenzo Belgrano, Nicolai Jacobsen, Regitze Sdun, Željko Agić

Keywords Paper

real-time speech, labeling speech, emergency services, real-time labeling

0

0

0

0

11:07

19/08/2021

Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation

Yasheng Sun, Hang Zhou, Ziwei Liu, Hideki Koike

Keywords Paper

Computer Vision, 2D and 3D Computer Vision, Speech

0

0

0

0

4:50

04/07/2020

Towards end-2-end learning for predicting behavior codes from spoken utterances in psychotherapy conversations

Karan Singla, Zhuohao Chen, David Atkins, Shrikanth Narayanan

Keywords Paper

predicting codes, Spoken tasks, voice detection, speaker diarization

0

0

0

0

7:16

01/07/2020

End-to-End Simultaneous Translation System for IWSLT2020 Using Modality Agnostic Meta-Learning

Hou Jeung Han, Mohd Abbas Zaidi, Sathish Reddy Indurthi and
Nikhil Kumar Lakumarapu, Beomseok Lee, Sangha Kim

Keywords Paper

0

0

0

0

7:29

02/02/2021

Consecutive Decoding for Speech-to-text Translation

Qianqian Dong, Mingxuan Wang, Hao Zhou and
Shuang Xu, Bo Xu, Lei Li

Keywords Paper

0

0

0

0

14:20

12/07/2020

Word-Level Speech Recognition With a Letter to Word Encoder

Ronan Collobert, Awni Hannun, Gabriel Synnaeve

Keywords Paper

Applications - Language, Speech and Dialog

0

0

0

0

14:53

01/07/2020

End-to-End Speech Translation with Adversarial Training

Xuancai Li, Chen Kehai, Tiejun Zhao, Muyun Yang

Keywords Paper

0

0

0

0

8:53

16/11/2020

Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models

Ethan Wilcox, Peng Qian, Richard Futrell and
Ryosuke Kohita, Roger Levy, Miguel Ballesteros

Keywords Paper

learning outcomes, syntactic representations, neural models, n-gram baseline

0

0

0

0

11:29

19/08/2021

A Streaming End-to-End Framework For Spoken Language Understanding

Nihal Potdar, Anderson Raymundo Avila, Chao Xing and
Dong Wang, Yiran Cao, Xiao Chen

Keywords Paper

Natural Language Processing, Dialogue, Speech

0

0

0

0

14:09

19/04/2021

El volumen louder por favor: Code-switching in task-oriented semantic parsing

Arash Einolghozati, Abhinav Arora, Lorena Sainz-Maza Lecanda and
Anuj Kumar, Sonal Gupta

Keywords Paper

0

0

0

0

11:39

08/12/2020

Attentively Embracing Noise for Robust Latent Representation in BERT

Gwenaelle Cunha Sergio, Dennis Singh Moirangthem, Minho Lee

Keywords Paper

0

0

0

0

12:55

01/07/2020

Robust Neural Machine Translation with ASR Errors

Haiyang Xue, Yang Feng, Shuhao Gu, Wei Chen

Keywords Paper

0

0

0

0

8:15

02/02/2021

TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis

Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai

Keywords Paper

0

0

0

0

19:58

04/07/2020

SimulSpeech: End-to-End Simultaneous Speech to Text Translation

Yi Ren, Jinglin Liu, Xu Tan and
Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

Keywords Paper

simultaneous translation, simultaneous recognition, ASR, NMT

0

0

0

0

5:51

06/12/2021

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Yichong Leng, Xu Tan, Linchen Zhu and
Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li, Edward Lin, Tie-Yan Liu

Keywords Paper

0

0

0

0

13:44

08/12/2020

Joint Training for Learning Cross-lingual Embeddings with Sub-word Information without Parallel Corpora

Ali Hakimi Parizi, Paul Cook

Keywords Paper

0

0

0

0

12:38

26/04/2020

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

David Harwath, Wei-Ning Hsu, James Glass

Keywords Paper

visually-grounded speech, self-supervised learning, discrete representation learning, vision and language, vision and speech, hierarchical representation learning

0

0

0

0

13:42

02/02/2021

Multilingual Transfer Learning for QA using Translation as Data Augmentation

Mihaela Bornea, Lin Pan, Sara Rosenthal and
Radu Florian, Avirup Sil

Keywords Paper

0

0

0

0

15:44

06/12/2021

Unsupervised Speech Recognition

Alexei Baevski, Wei-Ning Hsu, Alexis CONNEAU, Michael Auli

Keywords Paper

deep learning, adversarial robustness and security, self-supervised learning, generative model

0

0

0

0

19:16

25/07/2020

Leveraging adversarial training in self-learning for cross-lingual text classification

Xin Dong, Yaxin Zhu, Yupeng Zhang and
Zuohui Fu, Dongkuan Xu, Sen Yang, Gerard Melo

Keywords Paper

multilingual, semantics, text classification, cross-lingual

0

0

0

0

9:19

06/12/2021

Diversity Enhanced Active Learning with Strictly Proper Scoring Rules

Wei Tan, Lan Du, Wray Buntine

Keywords Paper

machine learning, active learning

0

0

0

0

13:21

18/07/2021

Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation

Renjie Zheng, Junkun Chen, Mingbo Ma, Liang Huang

Keywords Paper

Applications, Natural Language Processing

0

0

0

0

5:19

03/05/2021

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

Jiahui Yu, Wei Han, Anmol Gulati and
Chung-Cheng Chiu, Bo Li, Tara Sainath, Yonghui Wu, Ruoming Pang

Keywords Paper

Dual-mode ASR, Low-latency ASR, Streaming ASR, Speech Recognition

0

0

0

0

5:11

04/07/2020

Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation

Shun-Po Chuang, Tzu-Wei Sung, Alexander H. Liu, Hung-yi Lee

Keywords Paper

Speech translation, Word Embedding, ST, multitask learning

0

0

0

0

6:45

01/07/2020

Adapting End-to-End Speech Recognition for Readable Subtitles

Danni Liu, Jan Niehues, Gerasimos Spanakis

Keywords Paper

0

0

0

0

22:16

02/02/2021

UWSpeech: Speech to Speech Translation for Unwritten Languages

Chen Zhang, Xu Tan, Yi Ren and
Tao Qin, Kejun Zhang, Tie-Yan Liu

Keywords Paper

0

0

0

0

15:14

19/04/2021

WER-BERT: Automatic WER estimation with BERT in a balanced ordinal classification paradigm

Akshay Krishna Sheshadri, Anvesh Rao Vijjini, Sukhdeep Kharbanda

Keywords Paper

0

0

0

0

11:45

03/05/2021

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

Rafael Valle, Kevin J Shih, Ryan Prenger, Bryan Catanzaro

Keywords Paper

normalizing flows, deep learning, Text to speech synthesis

0

0

0

0

5:11

18/07/2021

EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

Chenfeng Miao, Liang Shuang, Zhengchen Liu and
Chen Minchuan, Jun Ma, Shaojun Wang, Jing Xiao

Keywords Paper

Applications, Audio and Speech Processing

0

0

0

0

5:13

08/12/2020

Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks

Lichao Sun, Congying Xia, Wenpeng Yin and
Tingting Liang, Philip Yu, Lifang He

Keywords Paper

0

0

0

0

9:52

04/07/2020

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords Paper

Cross-Lingual Classification, sentiment classification, unsupervised system, classification

0

0

0

0

12:23

16/11/2020

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

Hao Tan, Mohit Bansal

Keywords Paper

speaking, writing, text-only self-supervision, pure-language tasks

0

0

0

0

11:59

19/04/2021

Joint energy-based model training for better calibrated natural language understanding models

Tianxing He, Bryan McCann, Caiming Xiong, Ehsan Hosseini-Asl

Keywords Paper

0

0

0

0

5:58

19/08/2021

MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering

Chenyu You, Nuo Chen, Yuexian Zou

Keywords Paper

Natural Language Processing, Question Answering, Sentiment Analysis and Text Mining, Speech

0

0

0

0

12:23

04/07/2020

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, Kartikay Khandelwal, Naman Goyal and
Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

Keywords Paper

cross-lingual tasks, XNLI, MLQA, NER

0

0

0

0

12:15

08/12/2020

Transliteration of Judeo-Arabic Texts into Arabic Script Using Recurrent Neural Networks

Ori Terner, Kfir Bar, Nachum Dershowitz

Keywords Paper

0

0

0

0

9:50

16/11/2020

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

Alex Warstadt, Yian Zhang, Xiaocheng Li and
Haokun Liu, Samuel R. Bowman

Keywords Paper

self-supervised tasks, language understanding, ambiguous tasks, finetuning

0

0

0

0

12:04

04/07/2020

Transferring Monolingual Model to Low-Resource Language: The Case of Tigrinya

Abrhalei Frezghi Tela, Abraham Woubie Zewoudie, Ville Hautamäki

Keywords Paper

natural tasks, NLP, downstream task, pre-training

0

0

0

0

8:47

16/11/2020

Pre-tokenization of Multi-word Expressions in Cross-lingual Word Embeddings

Naoki Otani, Satoru Ozaki, Xingyuan Zhao and
Yucen Li, Micaelah St Johns, Lori Levin

Keywords Paper

word-translation task, mwe translation, single-word translations, cross-lingual algorithms

0

0

0

0

11:56