25/07/2020

Auto-annotation for voice-enabled entertainment systems

Wenyan Li, Ferhan Ture

Keywords: unsupervised, voice-enabled entertainment systems, automatic speech recognition, error detection and evaluation, auto-annotation

Abstract: Voice-activated intelligent entertainment systems are prevalent in modern TVs. These systems require accurate automatic speech recognition (ASR) models to transcribe voice queries for downstream language understanding tasks. Currently, labeling audio data for training is the main bottleneck in deploying accurate machine learning ASR models, especially when these models require up-to-date training data to adapt to shifting customer needs. We present an auto-annotation system that provides high-quality training data without any hand-labeled audio, by detecting speech recognition errors and proposing possible fixes. Through our algorithm, the auto-annotated training data reaches an overall word error rate (WER) of 0.002; furthermore, we obtained a reduction of 0.907 in WER after applying the auto-suggested fixes.
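The WER figures reported above are computed as the word-level edit distance between a reference transcript and the ASR hypothesis, normalized by the reference length. As a minimal sketch of that metric (a generic illustration, not the authors' auto-annotation pipeline):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table: d[i][j] = edit distance between the
    # first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)


# One substituted word out of five -> WER of 0.2.
print(wer("play the big bang theory", "play the big band theory"))  # 0.2
```

A "reduction of 0.907 in WER" then means the auto-suggested fixes lowered this ratio on the evaluation set by that absolute amount.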

The video of this talk cannot be embedded. You can watch it here:
https://dl.acm.org/doi/10.1145/3397271.3401241#sec-supp
The talk and the respective paper are published at the SIGIR 2020 virtual conference.
