Joint training of guided learning and mean teacher models for sound event detection

02/11/2020

Hao Yen, Pin-Jui Ku, Ming-Chi Yen, Hung-Shin Lee, Hsin-Min Wang

Abstract: In this paper, we present our system for sound event detection and separation in domestic environments for DCASE 2020. The task is to determine which sound events occur in a clip and the time ranges they occupy. The system is trained on weakly labeled and unlabeled real data together with strongly annotated synthetic data. The proposed model consists of a feature-level front-end based on convolutional neural networks (CNNs), followed by embedding-level and instance-level back-end attention modules. To make full use of the large amount of unlabeled data, we jointly adopt the Guided Learning and Mean Teacher approaches for weakly supervised and semi-supervised learning. In addition, a set of adaptive median windows for individual sound events is used to smooth the frame-level predictions in post-processing. On the public evaluation set of DCASE 2019, the best event-based F1-score achieved by our system is 48.50%, a relative improvement of 27.16% over the official baseline (38.14%). On the development set of DCASE 2020, our best system achieves a relative improvement of 32.91% over the baseline (45.68% vs. 34.37%).
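
The post-processing step mentioned in the abstract, smoothing frame-level predictions with a separate median window per sound event, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the zero padding at the clip boundaries, the window lengths, and the example predictions are all assumptions made for illustration only.

```python
from statistics import median


def median_smooth(frames, window):
    """Apply a sliding median filter of odd length `window` to a 0/1 sequence."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    # Zero-pad both ends so the output has the same length as the input
    # (an assumption; other boundary treatments are possible).
    padded = [0] * half + list(frames) + [0] * half
    return [int(median(padded[i:i + window])) for i in range(len(frames))]


def smooth_per_event(predictions, windows):
    """Smooth each event class with its own window length (in frames)."""
    return {
        event: median_smooth(seq, windows[event])
        for event, seq in predictions.items()
    }


# Hypothetical window lengths and binarized frame-level predictions.
windows = {"Speech": 3, "Vacuum_cleaner": 7}
predictions = {
    "Speech": [0, 1, 0, 1, 1, 1, 0, 0],
    "Vacuum_cleaner": [1, 1, 1, 0, 1, 1, 1, 1],
}
smoothed = smooth_per_event(predictions, windows)
```

In this sketch a short window for a brief event such as speech removes an isolated one-frame detection, while a longer window for a sustained event fills a one-frame gap. How the window lengths are adapted to each event is not specified in the abstract, so they are fixed by hand here.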

The talk and the corresponding paper were published at the DCASE 2020 virtual conference.

Similar Papers

02/11/2020

Forward-backward convolutional recurrent neural networks and tag-conditioned convolutional neural networks for weakly labeled semi-supervised sound event detection

Janek Ebbers, Reinhold Haeb-Umbach

14:47

02/11/2020

Sound event localization and detection based on CRNN using rectangular filters and channel rotation data augmentation

Francesca Ronchini, Daniel Arteaga, Andrés Pérez-López

12:51

02/11/2020

Conformer-based sound event detection with semi-supervised learning and data augmentation

Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda

14:29

02/11/2020

Guided multi-branch learning systems for sound event detection with sound separation

Yuxin Huang, Liwei Lin, Shuo Ma, Xiangdong Wang, Hong Liu, Yueliang Qian, Min Liu, Kazushige Ouchi

12:52

02/11/2020

Self-supervised classification for detecting anomalous sounds

Ritwik Giri, Srikanth V. Tenneti, Fangzhou Cheng, Karim Helwani, Umut Isik, Arvindh Krishnaswamy

13:28

26/04/2020

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

David Harwath, Wei-Ning Hsu, James Glass

Keywords: visually-grounded speech, self-supervised learning, discrete representation learning, vision and language, vision and speech, hierarchical representation learning

13:42

02/02/2021

Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation

Qianqian Dong, Rong Ye, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, Lei Li

14:09

02/11/2020

Two-stage domain adaptation for sound event detection

Liping Yang, Junyong Hao, Zhenwei Hou, Wang Peng

13:16

18/07/2021

SoundDet: Polyphonic Moving Sound Event Detection and Localization from Raw Waveform

Yuhang He, Niki Trigoni, Andrew Markham

Keywords: Applications, Audio and Speech Processing

4:34

02/11/2020

Event-independent network for polyphonic sound event localization and detection

Yin Cao, Turab Iqbal, Qiuqiang Kong, Yue Zhong, Wenwu Wang, Mark D. Plumbley

13:45

02/11/2020

Temporal sub-sampling of audio feature sequences for automated audio captioning

Khoa Nguyen, Konstantinos Drossos, Tuomas Virtanen

14:09

02/11/2020

Task-aware separation for the DCASE 2020 task 4 sound event detection and separation challenge

Samuele Cornell, Michel Olvera, Manuel Pariente, Giovanni Pepe, Emanuele Principi, Leonardo Gabrielli, Stefano Squartini

12:30

02/11/2020

Group masked autoencoder based density estimator for audio anomaly detection

Ritwik Giri, Fangzhou Cheng, Karim Helwani, Srikanth V. Tenneti, Umut Isik, Arvindh Krishnaswamy

15:43

19/04/2021

Joint energy-based model training for better calibrated natural language understanding models

Tianxing He, Bryan McCann, Caiming Xiong, Ehsan Hosseini-Asl

5:58

08/12/2020

Multi-task Learning of Spoken Language Understanding by Integrating N-Best Hypotheses with Hierarchical Attention

Mingda Li, Xinyue Liu, Weitong Ruan, Luca Soldaini, Wael Hamza, Chengwei Su

14:43

30/11/2020

Do We Need Sound for Sound Source Localization?

Takashi Oya, Shohei Iwase, Ryota Natsume, Takahiro Itazuri, Shugo Yamaguchi, Shigeo Morishima

8:43

04/07/2020

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

Yu-An Chung, James Glass

Keywords: Training objectives, downstream tasks, generalization task, phonetic classification

4:42

14/06/2020

Telling Left From Right: Learning Spatial Correspondence of Sight and Sound

Karren Yang, Bryan Russell, Justin Salamon

Keywords: audio-visual learning in video, self-supervision, video dataset, spatial audio, localization, spatialization, upmixing, source separation

4:41

04/07/2020

Multimodal and Multiresolution Speech Recognition with Transformers

Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram

Keywords: Multimodal Recognition, ASR, multiresolution ASR, Transformers

6:48

06/12/2021

Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems

Menoua Keshishian, Samuel Norman-Haignere, Nima Mesgarani

Keywords: deep learning, machine learning

10:28

06/12/2021

TriBERT: Human-centric Audio-visual Representation Learning

Tanzila Rahman, Mengyu Yang, Leonid Sigal

Keywords: transformers, representation learning

13:54

05/12/2020

English intermediate-task training improves zero-shot cross-lingual transfer too

Jason Phang, Iacer Calixto, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Katharina Kann, Samuel R. Bowman

14:13

03/05/2021

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen

Keywords: Position Encoding, Attention, Natural Language Processing, Language Model Pre-training, Transformer

6:06

01/07/2020

End-to-End Speech Translation with Adversarial Training

Xuancai Li, Chen Kehai, Tiejun Zhao, Muyun Yang

8:53

06/12/2020

Unsupervised Sound Separation Using Mixture Invariant Training

Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron Weiss, Kevin Wilson, John R. Hershey

3:20

06/12/2021

Improved Regularization and Robustness for Fine-tuning in Neural Networks

Dongyue Li, Hongyang Zhang

Keywords: deep learning, machine learning, robustness, vision, transfer learning

12:03

02/11/2020

Lightweight convolutional neural networks on binaural waveforms for low complexity acoustic scene classification

Nicolas Pajusco, Richard Huang, Nicolas Farrugia

11:50

08/12/2020

Attentively Embracing Noise for Robust Latent Representation in BERT

Gwenaelle Cunha Sergio, Dennis Singh Moirangthem, Minho Lee

12:55

02/11/2020

Effects of word-frequency based pre- and post- processings for audio captioning

Daiki Takeuchi, Yuma Koizumi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

13:56

02/11/2020

Ensemble of sequence matching networks for dynamic sound event localization, detection, and tracking

Thi Ngoc Tho Nguyen, Douglas L. Jones, Woon Seng Gan

11:06

03/05/2021

Multi-timescale Representation Learning in LSTM Language Models

Shivangi Mahto, Vy Vo, Javier Turek, Alexander Huth

Keywords: LSTM, timescales, Language Model

4:57

02/11/2020

Audio tag representation guided dual attention network for acoustic scene classification

Ju-Ho Kim, Jee-Weon Jung, Hye-Jin Shim, Ha-Jin Yu

12:46

22/11/2021

Audio-Visual Synchronisation in the wild

Triantafyllos Afouras, Honglie Chen, Weidi Xie, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Keywords: multimodal learning, self supervision, audio-visual synchronisation, dataset

3:02

02/02/2021

TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis

Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai

19:58

16/11/2020

Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training

Hai Ye, Qingyu Tan, Ruidan He, Juntao Li, Hwee Tou Ng, Lidong Bing

Keywords: unsupervised adaptation, self-training, pre-trained models, bert

10:33

04/07/2020

Towards end-2-end learning for predicting behavior codes from spoken utterances in psychotherapy conversations

Karan Singla, Zhuohao Chen, David Atkins, Shrikanth Narayanan

Keywords: predicting codes, Spoken tasks, voice detection, speaker diarization

7:16

02/02/2021

Denoising Distantly Supervised Named Entity Recognition via a Hypergeometric Probabilistic Model

Wenkai Zhang, Hongyu Lin, Xianpei Han, Le Sun, Huidan Liu, Zhicheng Wei, Nicholas Yuan

19:22

08/12/2020

Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks

Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip Yu, Lifang He

9:52

02/11/2020

ID-conditioned auto-encoder for unsupervised anomaly detection

Sławomir Kapka

13:51

30/11/2020

Watch, read and lookup: learning to spot signs from multiple supervisors

Liliane Momeni, Gul Varol, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

9:58