Listen carefully and tell: An audio captioning system based on residual learning and gammatone audio representation

02/11/2020

Listen carefully and tell: An audio captioning system based on residual learning and gammatone audio representation

Sergi Perez-Castanos, Javier Naranjo-Alcazar, Pedro Zuccarello, Maximo Cobos

Keywords:

Abstract Paper Similar Papers

Abstract: Automated audio captioning is machine listening task whose goal is to describe an audio using free text. An automated audio captioning system has to be implemented as it accepts an audio as input and outputs as textual description, that is, the caption of the signal. This task can be useful in many applications such as automatic content description or machine-to-machine interaction. In this work, an automatic audio captioning based on residual learning on the encoder phase is proposed. The encoder phase is implemented via different Residual Networks configurations. The decoder phase (create the caption) is run using recurrent layers plus attention mechanism. The audio representation chosen has been Gammatone. Results show that the framework proposed in this work surpass the baseline system in challenge results.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at DCASE 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

02/02/2021

Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation

Qianqian Dong, Rong Ye, Mingxuan Wang and
Hao Zhou, Shuang Xu, Bo Xu, Lei Li

Keywords Paper

0

0

0

0

14:09

26/04/2020

High Fidelity Speech Synthesis with Adversarial Networks

Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman and
Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan

Keywords Paper

texttospeech, speechsynthesis, audiosynthesis, gans, generativeadversarialnetworks, implicitgenerativemodels

0

0

0

0

15:07

02/11/2020

Audio captioning based on transformer and pre-trained CNN

Kun Chen, Yusong Wu, Ziyue Wang and
Xuan Zhang, Fudong Nian, Shengchen Li, Xi Shao

Keywords Paper

0

0

0

0

12:40

02/02/2021

Audio-Oriented Multimodal Machine Comprehension via Dynamic Inter- and Intra-modality Attention

Zhiqi Huang, Fenglin Liu, Xian Wu and
Shen Ge, Helin Wang, Wei Fan, Yuexian Zou

Keywords Paper

0

0

0

0

14:47

02/11/2020

Self-supervised classification for detecting anomalous sounds

Ritwik Giri, Srikanth V. Tenneti, Fangzhou Cheng and
Karim Helwani, Umut Isik, Arvindh Krishnaswamy

Keywords Paper

0

0

0

0

13:28

03/05/2021

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

Rafael Valle, Kevin J Shih, Ryan Prenger, Bryan Catanzaro

Keywords Paper

normalizing flows, deep learning, Text to speech synthesis

0

0

0

0

5:11

02/11/2020

Temporal sub-sampling of audio feature sequences for automated audio captioning

Khoa Nguyen, Konstantinos Drossos, Tuomas Virtanen

Keywords Paper

0

0

0

0

14:09

19/08/2021

Exemplification Modeling: Can You Give Me an Example, Please?

Edoardo Barba, Luigi Procopio, Caterina Lacerra and
Tommaso Pasini, Roberto Navigli

Keywords Paper

Natural Language Processing, Natural Language Semantics, Resources and Evaluation

0

0

0

0

14:47

26/04/2020

Neural Machine Translation with Universal Visual Representation

Zhuosheng Zhang, Kehai Chen, Rui Wang and
Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Keywords Paper

Neural Machine Translation, Visual Representation, Multimodal Machine Translation, Language Representation

0

0

0

0

4:50

06/12/2020

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon

Keywords Paper

0

0

0

0

3:11

12/07/2020

Non-Autoregressive Neural Text-to-Speech

Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

Keywords Paper

Applications - Language, Speech and Dialog

0

0

0

0

15:12

08/12/2020

Attentively Embracing Noise for Robust Latent Representation in BERT

Gwenaelle Cunha Sergio, Dennis Singh Moirangthem, Minho Lee

Keywords Paper

0

0

0

0

12:55

30/11/2020

Image Captioning through Image Transformer

Sen He, Wentong Liao, Hamed R. Tavakoli and
Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

Keywords Paper

0

0

0

0

9:49

12/07/2020

On Variational Learning of Controllable Representations for Text without Supervision

Peng Xu, Jackie Chi Kit Cheung, Yanshuai Cao

Keywords Paper

Representation Learning

0

0

0

0

14:51

19/04/2021

Joint energy-based model training for better calibrated natural language understanding models

Tianxing He, Bryan McCann, Caiming Xiong, Ehsan Hosseini-Asl

Keywords Paper

0

0

0

0

5:58

01/07/2020

KIT’s IWSLT 2020 SLT Translation System

Ngoc-Quan Pham, Felix Schneider, Tuan-Nam Nguyen and
Thanh-Le Ha, Thai Son Nguyen, Maximilian Awiszus, Sebastian Stüker, Alexander Waibel

Keywords Paper

0

0

0

0

14:58

19/08/2021

A Streaming End-to-End Framework For Spoken Language Understanding

Nihal Potdar, Anderson Raymundo Avila, Chao Xing and
Dong Wang, Yiran Cao, Xiao Chen

Keywords Paper

Natural Language Processing, Dialogue, Speech

0

0

0

0

14:09

06/12/2020

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

Jing Shi, Xuankai Chang, Pengcheng Guo and
Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie

Keywords Paper

0

0

0

0

3:23

07/09/2020

Robust Scene Text Recognition Through Adaptive Image Enhancement

Ye Qian, Yuyang Wang, Feng Su

Keywords Paper

text recognition, image enhancement, spatial rectification, end-to-end, scene text

0

0

0

0

7:50

06/12/2021

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Hassan Akbari, Liangzhe Yuan, Rui Qian and
Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

Keywords Paper

machine learning, self-supervised learning, transformers, vision, contrastive learning

0

0

0

0

15:59

14/06/2020

Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Canjie Luo, Yuanzhi Zhu, Lianwen Jin, Yongpan Wang

Keywords Paper

data augmentation, text recognition, joint training

0

0

0

0

0:59

19/04/2021

Modelling context emotions using multi-task learning for emotion controlled dialog generation

Deeksha Varshney, Asif Ekbal, Pushpak Bhattacharyya

Keywords Paper

0

0

0

0

9:50

14/06/2020

ActBERT: Learning Global-Local Video-Text Representations

Linchao Zhu, Yi Yang

Keywords Paper

actbert, cross-modal pretraining, video and language, transformer, tangled transformer, instructional videos

0

0

0

0

4:58

19/08/2021

Progressive Open-Domain Response Generation with Multiple Controllable Attributes

Haiqin Yang, Xiaoyuan Yao, Yiqun Duan and
Jianping Shen, Jie Zhong, Kun Zhang

Keywords Paper

Machine Learning, Learning Generative Models, Dialogue

0

0

0

0

14:43

08/12/2020

Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity

Hamza Harkous, Isabel Groves, Amir Saffari

Keywords Paper

0

0

0

0

14:37

18/07/2021

EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

Chenfeng Miao, Liang Shuang, Zhengchen Liu and
Chen Minchuan, Jun Ma, Shaojun Wang, Jing Xiao

Keywords Paper

Applications, Audio and Speech Processing

0

0

0

0

5:13

06/12/2021

Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation

Qiming Hu, Xiaojie Guo

Keywords Paper

deep learning

0

0

0

0

12:25

08/12/2020

Extreme Model Compression for On-device Natural Language Understanding

Kanthashree Mysore Sathyendra, Samridhi Choudhary, Leah Nicolich-Henkin

Keywords Paper

0

0

0

0

15:00

22/11/2021

From Seq2Seq Recognition to Handwritten Word Embeddings

George Retsinas, Giorgos Sfikas, Christophoros Nikou, Petros Maragos

Keywords Paper

keyword spotting, handwritten text recognition, sequence-to-sequence

0

0

0

0

2:59

26/04/2020

Residual Energy-Based Models for Text Generation

Yuntian Deng, Anton Bakhtin, Myle Ott and
Arthur Szlam, Marc'Aurelio Ranzato

Keywords Paper

energy-based models, text generation

0

0

0

0

4:59

18/07/2021

Latent Space Energy-Based Model of Symbol-Vector Coupling for Text Generation and Classification

Bo Pang, Ying Nian Wu

Keywords Paper

Algorithms, Unsupervised Learning

0

0

0

0

5:17

06/12/2021

TriBERT: Human-centric Audio-visual Representation Learning

Tanzila Rahman, Mengyu Yang, Leonid Sigal

Keywords Paper

transformers, representation learning

0

0

0

0

13:54

04/07/2020

Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order

Yi Liao, Xin Jiang, Qun Liu

Keywords Paper

Autoregressive Generation, natural tasks, natural generation, natural NLG

0

0

0

0

12:11

06/12/2020

Incorporating BERT into Parallel Sequence Decoding with Adapters

Junliang Guo, Zhirui Zhang, Linli Xu and
Hao-Ran Wei, Boxing Chen, Enhong Chen

Keywords Paper

0

0

0

0

3:17

30/11/2020

Show, Conceive and Tell: Image Captioning with Prospective Linguistic Information

Yiqing Huang, Jiansheng Chen

Keywords Paper

0

0

0

0

7:08

16/11/2020

DAGA: Data Augmentation with a Generation Approach forLow-resource Tagging Tasks

Bosheng Ding, Linlin Liu, Lidong Bing and
Canasai Kruengkrai, Thien Hai Nguyen, Shafiq Joty, Luo Si, Chunyan Miao

Keywords Paper

machine learning, generalization, low-resource tasks, named recognition

0

0

0

0

11:09

02/02/2021

Merging Statistical Feature via Adaptive Gate for Improved Text Classification

Xianming Li, Zongxi Li, Haoran Xie, Qing Li

Keywords Paper

0

0

0

0

14:56

26/04/2020

Plug and Play Language Models: A Simple Approach to Controlled Text Generation

Sumanth Dathathri, Andrea Madotto, Janice Lan and
Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu

Keywords Paper

controlled text generation, generative models, conditional generative models, language modeling, transformer

0

0

1

1

4:58

26/04/2020

Semi-Supervised Generative Modeling for Controllable Speech Synthesis

Raza Habib, Soroosh Mariooryad, Matt Shannon and
Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, David Kao, Tom Bagby

Keywords Paper

TTS, Speech Synthesis, Semi-supervised Models, VAE, disentanglement

0

0

0

0

5:44

04/07/2020

SimulSpeech: End-to-End Simultaneous Speech to Text Translation

Yi Ren, Jinglin Liu, Xu Tan and
Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

Keywords Paper

simultaneous translation, simultaneous recognition, ASR, NMT

0

0

0

0

5:51