Seeing wake words: Audio-visual Keyword Spotting

07/09/2020

Seeing wake words: Audio-visual Keyword Spotting

Liliane Momeni, Triantafyllos Afouras, Themos Stafylakis, Samuel Albanie, Andrew Zisserman

Keywords: keyword spotting, wake word recognition, zero-shot, audio-visual, lip reading, speech recognition, retrieval

Abstract Paper Code Similar Papers

Abstract: The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio. We propose a zero-shot method suitable for "in the wild" videos. Our key contributions are: (1) a novel convolutional architecture, KWS-Net, that uses a similarity map intermediate representation to separate the task into (i) sequence matching, and (ii) pattern detection, to decide whether the word is there and when; (2) we demonstrate that if audio is available, visual keyword spotting improves the performance both for a clean and noisy audio signal. Finally, (3) we show that our method generalises to other languages, specifically French and German, and achieves a comparable performance to English with less language specific data, by fine-tuning the network pre-trained on English. The method exceeds the performance of the previous state-of-the-art visual keyword spotting architecture when trained and tested on the same benchmark, and also that of a state-of-the-art lip reading method.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at BMVC 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2020

Incorporating BERT into Parallel Sequence Decoding with Adapters

Junliang Guo, Zhirui Zhang, Linli Xu and
Hao-Ran Wei, Boxing Chen, Enhong Chen

Keywords Paper

0

0

0

0

3:17

02/02/2021

TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis

Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai

Keywords Paper

0

0

0

0

19:58

02/02/2021

PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network

Pengfei Wang, Chengquan Zhang, Fei Qi and
Shanshan Liu, Xiaoqiang Zhang, Pengyuan Lyu, Junyu Han, Jingtuo Liu, Errui Ding, Guangming Shi

Keywords Paper

0

0

0

0

18:06

01/07/2020

End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020

Marco Gaido, Mattia A. Di Gangi, Matteo Negri, Marco Turchi

Keywords Paper

0

0

0

0

10:50

02/02/2021

Exploring Transfer Learning For End-to-End Spoken Language Understanding

Subendhu Rongali, Beiye Liu, Liwei Cai and
Konstantine Arkoudas, Chengwei Su, Wael Hamza

Keywords Paper

0

0

0

0

19:30

16/11/2020

Unsupervised Text Style Transfer with Padded Masked Language Models

Eric Malmi, Aliaksei Severyn, Sascha Rothe

Keywords Paper

style transfer, sentence fusion, sentiment transfer, masker

0

0

0

0

6:53

08/12/2020

Attentively Embracing Noise for Robust Latent Representation in BERT

Gwenaelle Cunha Sergio, Dennis Singh Moirangthem, Minho Lee

Keywords Paper

0

0

0

0

12:55

03/05/2021

Multiscale Score Matching for Out-of-Distribution Detection

Ahsan Mahmood, Junier Oliva, Martin A Styner

Keywords Paper

out-of-distribution detection, deep learning, score matching, outlier detection

0

0

0

0

5:13

19/04/2021

Self-training pre-trained language models for zero- and few-shot multi-dialectal Arabic sequence labeling

Muhammad Khalifa, Muhammad Abdul-Mageed, Khaled Shaalan

Keywords Paper

0

0

0

0

8:10

14/06/2020

Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

Runfa Chen, Wenbing Huang, Binghui Huang and
Fuchun Sun, Bin Fang

Keywords Paper

nice-gan, reusing discriminators for encoding, unsupervised image-to-image translation, decoupled training, multi-scale discriminators, adversarial loss, no independent component for encoding, shared layers, residual attention, cyclegan

0

0

0

0

1:01

04/07/2020

Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning

Hongliang Fei, Ping Li

Keywords Paper

Cross-Lingual Classification, sentiment classification, unsupervised system, classification

0

0

0

0

12:23

16/11/2020

Tackling the Low-resource Challenge for Canonical Segmentation

Manuel Mager, Özlem Çetinoğlu, Katharina Kann

Keywords Paper

morphological generation, canonical segmentation, lstm pointer-generator, sequence-to-sequence model

0

0

0

0

11:55

08/12/2020

Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks

Lichao Sun, Congying Xia, Wenpeng Yin and
Tingting Liang, Philip Yu, Lifang He

Keywords Paper

0

0

0

0

9:52

02/02/2021

Non-Autoregressive Coarse-to-Fine Video Captioning

Bang Yang, Yuexian Zou, Fenglin Liu, Can Zhang

Keywords Paper

0

0

0

0

18:21

18/07/2021

EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

Chenfeng Miao, Liang Shuang, Zhengchen Liu and
Chen Minchuan, Jun Ma, Shaojun Wang, Jing Xiao

Keywords Paper

Applications, Audio and Speech Processing

0

0

0

0

5:13

04/07/2020

On the Cross-lingual Transferability of Monolingual Representations

Mikel Artetxe, Sebastian Ruder, Dani Yogatama

Keywords Paper

zero-shot setting, Cross-lingual Representations, unsupervised models, joint training

0

0

0

0

11:28

26/04/2020

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

David Harwath, Wei-Ning Hsu, James Glass

Keywords Paper

visually-grounded speech, self-supervised learning, discrete representation learning, vision and language, vision and speech, hierarchical representation learning

0

0

0

0

13:42

06/12/2020

Make One-Shot Video Object Segmentation Efficient Again

Tim Meinhardt, Laura Leal-Taixé

Keywords Paper

0

0

0

0

3:17

30/11/2020

Watch, read and lookup: learning to spot signs from multiple supervisors

Liliane Momeni, Gul Varol, Samuel Albanie and
Triantafyllos Afouras, Andrew Zisserman

Keywords Paper

0

0

0

0

9:58

08/12/2020

Bridging the Gap in Multilingual Semantic Role Labeling: a Language-Agnostic Approach

Simone Conia, Roberto Navigli

Keywords Paper

0

0

0

0

14:40

05/01/2021

Legacy Photo Editing With Learned Noise Prior

Yuzhi Zhao, Lai-Man Po, Tingyu Lin and
Xuehui Wang, Kangcheng Liu, Yujia Zhang, Wing-Yin Yu, Pengfei Xian, Jingjing Xiong

Keywords Paper

0

0

0

0

4:51

02/02/2021

Self-supervised Pre-training and Contrastive Representation Learning for Multiple-choice Video QA

Seonhoon Kim, Seohyeong Jeong, Eunbyul Kim and
Inho Kang, Nojun Kwak

Keywords Paper

0

0

0

0

15:23

01/07/2020

Neural Simultaneous Speech Translation Using Alignment-Based Chunking

Patrick Wilken, Tamer Alkhouli, Evgeny Matusov, Pavel Golik

Keywords Paper

0

0

0

0

20:12

18/07/2021

Self-supervised and Supervised Joint Training for Resource-rich Machine Translation

Yong Cheng, Wei Wang, Lu Jiang, Wolfgang Macherey

Keywords Paper

Applications, Natural Language Processing

0

0

0

0

5:21

04/07/2020

NAT: Noise-Aware Training for Robust Neural Sequence Labeling

Marcin Namysl, Sven Behnke, Joachim Köhler

Keywords Paper

Noise-Aware Training, Robust Labeling, Sequence systems, noisy problem

0

0

0

0

10:07

19/04/2021

Neural data-to-text generation with LM-based text augmentation

Ernie Chang, Xiaoyu Shen, Dawei Zhu and
Vera Demberg, Hui Su

Keywords Paper

0

0

0

0

7:32

08/12/2020

Detecting Urgency Status of Crisis Tweets: A Transfer Learning Approach for Low Resource Languages

Efsun Sarioglu Kayi, Linyong Nan, Bohan Qu and
Mona Diab, Kathleen McKeown

Keywords Paper

0

0

0

0

14:37

05/01/2021

AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by Reconstructing Label Features From Multi-Modal Embeddings

Pratik Mazumder, Pravendra Singh, Kranti Kumar Parida, Vinay P. Namboodiri

Keywords Paper

0

0

0

0

4:46

06/12/2020

Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, Zihang Dai, Eduard Hovy and
Thang Luong, Quoc V Le

Keywords Paper

0

0

0

0

3:29

06/12/2020

Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation

Guoliang Kang, Yunchao Wei, Yi Yang and
Yueting Zhuang, Alexander Hauptmann

Keywords Paper

0

0

0

0

3:16

19/08/2021

Multi-Scale Selective Feedback Network with Dual Loss for Real Image Denoising

Xiaowan Hu, Yuanhao Cai, Zhihong Liu and
Haoqian Wang, Yulun Zhang

Keywords Paper

Computer Vision, Computational Photography, Photometry, Shape from X, Deep Learning

0

0

0

0

9:52

19/04/2021

Modeling context in answer sentence selection systems on a latency budget

Rujun Han, Luca Soldaini, Alessandro Moschitti

Keywords Paper

0

0

0

0

7:03

16/11/2020

HABERTOR: An Efficient and Effective Deep Hatespeech Detector

Thanh Tran, Yifan Hu, Changwei Hu and
Kevin Yen, Fei Tan, Kyumin Lee, Se Rim Park

Keywords Paper

downstream task, hatespeech classification, habertor model, bert model

0

0

0

0

11:46

06/12/2021

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

Cheng-I Jeff Lai, Yang Zhang, Alexander Liu and
Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, Jim Glass

Keywords Paper

self-supervised learning, representation learning

0

0

0

0

13:57

06/12/2021

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Yichong Leng, Xu Tan, Linchen Zhu and
Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li, Edward Lin, Tie-Yan Liu

Keywords Paper

0

0

0

0

13:44

19/08/2021

Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching

Bofeng Wu, Guocheng Niu, Jun Yu and
Xinyan Xiao, Jian Zhang, Hua Wu

Keywords Paper

Computer Vision, Language and Vision, Multi-instance; Multi-label; Multi-view learning

0

0

0

0

12:03

06/12/2021

Learning to Generate Realistic Noisy Images via Pixel-level Noise-aware Adversarial Training

Yuanhao Cai, Xiaowan Hu, Haoqian Wang and
Yulun Zhang, Hanspeter Pfister, Donglai Wei

Keywords Paper

deep learning, adversarial robustness and security, vision, generative model, graph learning

0

0

0

0

3:05

16/11/2020

MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale

Andreas Rücklé, Jonas Pfeiffer, Iryna Gurevych

Keywords Paper

answer tasks, zero-shot transfer, text models, self-supervised training

0

0

0

0

10:07

14/06/2020

Visual Grounding in Video for Unsupervised Word Translation

Gunnar A. Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh and
Lucas Smaira, Mateusz Malinowski, João Carreira, Phil Blunsom, Andrew Zisserman

Keywords Paper

video, translation, multimodal learning, unsupervised learning, unsupervised translation, youtube, howto100m, multilingual, language, deep learning

0

0

0

0

1:01

02/02/2021

Joint Demosaicking and Denoising in the Wild: The Case of Training Under Ground Truth Uncertainty

Jierun Chen, Song Wen, S.-H. Gary Chan

Keywords Paper

0

0

0

0

18:17