05/01/2021

Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention

Bin Duan, Hao Tang, Wei Wang, Ziliang Zong, Guowei Yang, Yan Yan

Abstract: The major challenge in the audio-visual event localization task lies in how to fuse information from multiple modalities effectively. Recent works have shown that attention mechanisms are beneficial to the fusion process. In this paper, we propose a novel joint attention mechanism with multimodal fusion methods for audio-visual event localization. In particular, we present a concise yet effective architecture that learns representations from multiple modalities in a joint manner. First, visual features are combined with auditory features to form joint representations. Next, the joint representations are used to attend to the visual features and the auditory features, respectively. This joint co-attention produces new visual and auditory features, so that each modality benefits from the other. Notably, the joint co-attention unit is recursive, meaning it can be applied multiple times to progressively obtain better joint representations. Extensive experiments on the public AVE dataset show that the proposed method achieves significantly better results than state-of-the-art methods.
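
Below is a minimal PyTorch sketch of the recursive joint co-attention described in the abstract. It is an illustrative assumption of the design, not the authors' implementation: the module names (JointCoAttention, RecursiveFusion), the tanh fusion layer, the residual connections, the weight sharing across recursion steps, and the use of nn.MultiheadAttention are all choices made for the sake of a short, runnable example.

import torch
import torch.nn as nn

class JointCoAttention(nn.Module):
    """One joint co-attention step: fuse audio and visual features into a
    joint representation, then use it to attend back to each modality."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)  # joint representation (assumed design)
        self.attn_v = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, v, a):
        # v, a: (batch, time, dim) segment-level visual/audio features
        joint = torch.tanh(self.fuse(torch.cat([v, a], dim=-1)))
        # The joint representation queries each modality (co-attention).
        v_new, _ = self.attn_v(joint, v, v)
        a_new, _ = self.attn_a(joint, a, a)
        return v_new + v, a_new + a  # residual connections (assumption)

class RecursiveFusion(nn.Module):
    """Apply the joint co-attention unit recursively for several steps."""
    def __init__(self, dim, steps=3):
        super().__init__()
        self.unit = JointCoAttention(dim)  # one shared unit across steps
        self.steps = steps

    def forward(self, v, a):
        for _ in range(self.steps):
            v, a = self.unit(v, a)
        return v, a

# Usage: a batch of 2 clips, 10 one-second segments, 256-d features per modality.
fusion = RecursiveFusion(dim=256, steps=3)
v = torch.randn(2, 10, 256)  # visual features (batch, time, dim)
a = torch.randn(2, 10, 256)  # audio features
v_out, a_out = fusion(v, a)

Sharing one unit's weights across recursion steps keeps the parameter count constant no matter how many fusion steps are performed; whether the paper shares or unties these weights is an assumption in this sketch.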

Talk and paper published at the WACV 2021 virtual conference.
