Attention Bottlenecks for Multimodal Fusion

06/12/2021


Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun

Keywords: machine learning, transformers


Abstract: Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio. Machine perception models, in stark contrast, are typically modality-specific and optimised for unimodal benchmarks. A common approach for building multimodal models is to simply combine several of these modality-specific architectures using late-stage fusion of final representations or predictions ('late-fusion'). Instead, we introduce a novel transformer-based architecture that uses 'attention bottlenecks' for modality fusion at multiple layers. Compared to traditional pairwise self-attention, these bottlenecks force information between different modalities to pass through a small number of 'bottleneck' latent units, requiring the model to collate and condense the most relevant information in each modality and share only what is necessary. We find that such a strategy improves fusion performance while also reducing computational cost. We conduct thorough ablation studies and achieve state-of-the-art results on multiple audio-visual classification benchmarks, including AudioSet, Epic-Kitchens and VGGSound. All code and models will be released.
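
As a rough illustration of the idea described in the abstract, the sketch below shows one fusion layer in NumPy. It is not the authors' implementation: the function names, the identity query/key/value projections, the token counts, and the averaging of the two bottleneck updates are all assumptions made for this example. Each modality attends only over its own tokens plus a few shared bottleneck tokens, so cross-modal information can travel only through the bottleneck.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # Single-head self-attention with identity Q/K/V projections
    # (a simplification; a real transformer layer learns these projections).
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens


def bottleneck_fusion_layer(video, audio, bottleneck):
    # Each modality attends over its own tokens plus the shared bottleneck
    # tokens; video and audio tokens never attend to each other directly.
    n_v, n_a = len(video), len(audio)
    v_out = self_attention(np.concatenate([video, bottleneck]))
    a_out = self_attention(np.concatenate([audio, bottleneck]))
    # Assumption: merge the two bottleneck updates by averaging.
    new_bottleneck = 0.5 * (v_out[n_v:] + a_out[n_a:])
    return v_out[:n_v], a_out[:n_a], new_bottleneck


def attention_score_counts(n_v, n_a, n_b):
    # Number of pairwise attention scores per layer:
    # full attention over all tokens vs. two bottleneck-restricted attentions.
    full = (n_v + n_a) ** 2
    restricted = (n_v + n_b) ** 2 + (n_a + n_b) ** 2
    return full, restricted


rng = np.random.default_rng(0)
video = rng.normal(size=(16, 8))       # 16 hypothetical video tokens, dim 8
audio = rng.normal(size=(10, 8))       # 10 hypothetical audio tokens, dim 8
bottleneck = rng.normal(size=(4, 8))   # 4 shared bottleneck tokens

v, a, b = bottleneck_fusion_layer(video, audio, bottleneck)
print(v.shape, a.shape, b.shape)
print(attention_score_counts(16, 10, 4))
```

In this sketch the video output of a single layer does not depend on the audio tokens at all; audio information can reach the video stream only in a subsequent layer, through the updated bottleneck tokens. With 16 video tokens, 10 audio tokens and 4 bottleneck tokens, full pairwise attention computes 26^2 = 676 scores, whereas the two restricted attentions compute 20^2 + 14^2 = 596.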


The talk and the accompanying paper were published at the NeurIPS 2021 virtual conference.


Similar Papers

18/07/2021

Perceiver: General Perception with Iterative Attention

Andrew Jaegle, Felix Axel Gimeno Gil, Andy Brock, Oriol Vinyals, Andrew Zisserman, Joao Carreira

Keywords: Deep Learning, Architectures

5:13

01/07/2020

Low Rank Fusion based Transformers for Multimodal Sequences

Saurav Sahay, Eda Okur, Shachi H Kumar, Lama Nachman


14:12

05/01/2021

Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention

Bin Duan, Hao Tang, Wei Wang, Ziliang Zong, Guowei Yang, Yan Yan

4:11

06/12/2021

Associating Objects with Transformers for Video Object Segmentation

Zongxin Yang, Yunchao Wei, Yi Yang

Keywords: transformers

12:29

19/04/2021

Adaptive fusion techniques for multimodal data

Gaurav Sahu, Olga Vechtomova


11:45

16/11/2020

Multistage Fusion with Forget Gate for Multimodal Summarization in Open-Domain Videos

Nayu Liu, Xian Sun, Hongfeng Yu, Wenkai Zhang, Guangluan Xu

Keywords: multimodal summarization, multimodal tasks, multiencoder-decoder frameworks, multistage network

11:24

23/08/2020

Time-aware user embeddings as a service

Martin Pavlovski, Jelena Gligorijevic, Ivan Stojkovic, Shubham Agrawal, Shabhareesh Komirishetty, Djordje Gligorijevic, Narayan Bhamidipati, Zoran Obradovic

Keywords: sequential models, user representation, neural embeddings

19:42

05/01/2021

Attentional Feature Fusion

Yimian Dai, Fabian Gieseke, Stefan Oehmcke, Yiquan Wu, Kobus Barnard

4:58

18/07/2021

OmniNet: Omnidirectional Representations from Transformers

Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Don Metzler

Keywords: Deep Learning, Predictive Models, Algorithms, Representation Learning, Neuroscience and Cognitive Science, Problem Solving, Architectures

17:00

18/07/2021

Bayesian Attention Belief Networks

Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou

Keywords: Applications, Program Understanding and Generation, Deep Learning, Bayesian Deep Learning

4:28

16/11/2020

Transformer Based Multi-Source Domain Adaptation

Dustin Wright, Isabelle Augenstein

Keywords: unsupervised adaptation, cnns, rnns, domain classifiers

11:30

06/12/2021

Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning

Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang (Shane) Gu

Keywords: reinforcement learning and planning

10:00

19/08/2021

Progressive Open-Domain Response Generation with Multiple Controllable Attributes

Haiqin Yang, Xiaoyuan Yao, Yiqun Duan, Jianping Shen, Jie Zhong, Kun Zhang

Keywords: Machine Learning, Learning Generative Models, Dialogue

14:43

07/09/2020

Multimodal Image Translation with Stochastic Style Representations and Mutual Information Loss

Sanghyeon Na, Seungjoo Yoo, Jaegul Choo

Keywords: image-to-image translation, generative adversarial network

9:52

02/02/2021

Dual-Octave Convolution for Accelerated Parallel MR Image Reconstruction

Chun-Mei Feng, Zhanyuan Yang, Geng Chen, Yong Xu, Ling Shao

14:05

26/04/2020

Multiplicative Interactions and Where to Find Them

Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu

Keywords: multiplicative interactions, hypernetworks, attention

5:34

18/07/2021

Explaining Time Series Predictions with Dynamic Masks

Jonathan Crabbé, Mihaela van der Schaar

Keywords: Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

5:17

03/05/2021

Rethinking Attention with Performers

Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Richard Song, Georgiana-Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Q Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J Colwell, Adrian Weller

Keywords: attention, transformer, sparsity, softmax, linear, approximation, performer, bert, bidirectional, unidirectional, orthogonal, random, features, FAVOR, kernel, generalized, reformer, linformer, protein, trembl, uniprot

12:28

06/12/2021

Discrete-Valued Neural Communication

Dianbo Liu, Alex Lamb, Kenji Kawaguchi, Anirudh Goyal ALIAS PARTH GOYAL, Chen Sun, Michael Mozer, Yoshua Bengio

Keywords: deep learning, robustness, transformers, generative model, graph learning

11:09

06/12/2021

Estimating the Unique Information of Continuous Variables

Ari Pakman, Amin Nejatbakhsh, Dar Gilboa, Abdullah Makkeh, Luca Mazzucato, Michael Wibral, Elad Schneidman

Keywords: deep learning, optimization, generative model

12:39

14/06/2020

What Makes Training Multi-Modal Classification Networks Hard?

Weiyao Wang, Du Tran, Matt Feiszli

Keywords: video classification, multi-modal, overfitting, action recognition, acoustic event detection

1:01

26/08/2020

BasisVAE: Translation-invariant feature-level clustering with Variational Autoencoders

Kaspar Märtens, Christopher Yau


15:30

06/12/2020

Deep reconstruction of strange attractors from time series

William Gilpin


3:21

26/04/2020

Learning Nearly Decomposable Value Functions Via Communication Minimization

Tonghan Wang, Jianhao Wang, Chongyi Zheng, Chongjie Zhang

Keywords: Multi-agent reinforcement learning, Nearly decomposable value function, Minimized communication

5:00

03/05/2021

Domain-Robust Visual Imitation Learning with Mutual Information Constraints

Edoardo Cetin, Oya Celiktutan

Keywords: Domain Adaption, Third-Person Imitation, Observational Imitation, Reinforcement Learning, Machine Learning, Mutual Information, Imitation Learning

4:51

15/06/2020

Reconciling enumerative and deductive program synthesis

Kangjing Huang, Xiaokang Qiu, Peiyuan Shen, Yanjun Wang

Keywords: divide-and-conquer, enumerative synthesis, syntax-guided synthesis, deductive synthesis

16:00

14/06/2020

RoutedFusion: Learning Real-Time Depth Map Fusion

Silvan Weder, Johannes Schönberger, Marc Pollefeys, Martin R. Oswald

Keywords: depth map fusion, online 3d reconstruction, deep learning, real-time applications, 3d geometry

5:00

18/07/2021

Evolving Attention with Residual Convolutions

Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, JING YU, Ce Zhang, Gao Huang, Yunhai Tong

Keywords: Deep Learning, Architectures

4:36

04/07/2020

Multi-Domain Dialogue Acts and Response Co-Generation

Kai Wang, Junfeng Tian, Rui Wang, Xiaojun Quan, Jianxing Yu

Keywords: Generating responses, task-oriented systems, response generation, automatic evaluations

10:01

05/01/2021

Facial Emotion Recognition With Noisy Multi-Task Annotations

Siwei Zhang, Zhiwu Huang, Danda Pani Paudel, Luc Van Gool


4:48

06/12/2020

Shared Space Transfer Learning for analyzing multi-site fMRI data

Tony Yousefnezhad, Alessandro Selvitella, Daoqiang Zhang, Andrew Greenshaw, Russell Greiner

3:06

06/12/2021

MLP-Mixer: An all-MLP Architecture for Vision

Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Keywords: deep learning, machine learning, transformers, vision, transfer learning

11:18

19/08/2021

Details (Don't) Matter: Isolating Cluster Information in Deep Embedded Spaces

Lukas Miklautz, Lena G. M. Bauer, Dominik Mautz, Sebastian Tschiatschek, Christian Böhm, Claudia Plant

Keywords: Machine Learning, Deep Learning, Explainable/Interpretable Machine Learning, Clustering

14:37

02/02/2021

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network

Jiayi Ji, Yunpeng Luo, Xiaoshuai Sun, Fuhai Chen, Gen Luo, Yongjian Wu, Yue Gao, Rongrong Ji

14:13

25/07/2020

Multiplex behavioral relation learning for recommendation via memory augmented transformer network

Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Bo Zhang, Liefeng Bo

Keywords: collaborative filtering, transformer network, recommendation, multi-behavior learning, deep neural networks

19:21

06/12/2020

GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network

Prune Truong, Martin Danelljan, Luc V Gool, Radu Timofte


3:18

14/06/2020

WaveletStereo: Learning Wavelet Coefficients of Disparity Map in Stereo Matching

Menglong Yang, Fangrui Wu, Wei Li

Keywords: stereo matching, wavelet coefficients, inverse wavelet transform, supervised learning, deep representation, multi-scale features, multi-resolution cost volume, wavelet regression, disparity reconstruction, disparity refinement

1:01

04/07/2020

SimulSpeech: End-to-End Simultaneous Speech to Text Translation

Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

Keywords: simultaneous translation, simultaneous recognition, ASR, NMT

5:51

06/12/2020

Sparse Graphical Memory for Robust Planning

Scott Emmons, Ajay Jain, Misha Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak

3:23

30/11/2020

Robust High Dynamic Range (HDR) Imaging with Complex Motion and Parallax

Zhiyuan Pu, Peiyao Guo, M. Salman Asif, Zhan Ma


7:38