vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

26/04/2020

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

Alexei Baevski, Steffen Schneider, Michael Auli

Keywords: speech recognition, speech representation learning

Abstract: We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task. The algorithm uses either a gumbel softmax or online k-means clustering to quantize the dense representations. Discretization enables the direct application of algorithms from the NLP community which require discrete inputs. Experiments show that BERT pre-training achieves a new state of the art on TIMIT phoneme classification and WSJ speech recognition.

0

Share

ICLR

This is an embedded video. Talk and the respective paper are published at ICLR 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet