26/04/2020

On the Relationship between Self-Attention and Convolutional Layers

Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi

Keywords: self-attention, attention, transformers, convolution, CNN, image, expressivity, capacity

Abstract: Recent trends of incorporating attention mechanisms in vision have led researchers to reconsider the supremacy of convolutional layers as a primary building block. Beyond helping CNNs to handle long-range dependencies, Ramachandran et al. (2019) showed that attention can completely replace convolution and achieve state-of-the-art performance on vision tasks. This raises the question: do learned attention layers operate similarly to convolutional layers? This work provides evidence that attention layers can perform convolution and, indeed, that they often learn to do so in practice. Specifically, we prove that a multi-head self-attention layer with a sufficient number of heads is at least as expressive as any convolutional layer. Our numerical experiments then show that self-attention layers attend to pixel-grid patterns similarly to CNN layers, corroborating our analysis. Our code is publicly available.
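The expressivity claim can be checked numerically with a small sketch of the construction described in the abstract: a multi-head self-attention layer with K^2 heads, where each head's (relative-positional) attention is a one-hot distribution over one fixed pixel shift of a K x K neighborhood, reproduces a K x K convolution. This is not the authors' released code; the function names (`conv2d_same`, `mhsa_as_convolution`) and the numpy setup are illustrative assumptions.

```python
# Minimal sketch (assumed names, not the paper's released code): K^2 attention
# heads with hard one-hot positional attention reproduce a K x K convolution.
import numpy as np

rng = np.random.default_rng(0)
H, W, C_in, C_out, K = 6, 6, 3, 4, 3           # feature-map size, channels, kernel size
shifts = [(dy, dx) for dy in range(-(K // 2), K // 2 + 1)
                   for dx in range(-(K // 2), K // 2 + 1)]   # one relative shift per head

X = rng.normal(size=(H, W, C_in))              # input feature map
kernel = rng.normal(size=(K, K, C_in, C_out))  # convolution weights

def conv2d_same(X, kernel):
    """Plain K x K convolution with zero padding, stride 1."""
    pad = K // 2
    Xp = np.pad(X, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((H, W, C_out))
    for i in range(H):
        for j in range(W):
            patch = Xp[i:i + K, j:j + K]                    # (K, K, C_in)
            out[i, j] = np.einsum('abc,abcd->d', patch, kernel)
    return out

def mhsa_as_convolution(X, kernel):
    """Multi-head self-attention with K^2 heads: head h attends with
    probability 1 to the pixel at relative shift (dy, dx), and its value
    plus output projection equals the matching C_in x C_out kernel slice."""
    pad = K // 2
    Xp = np.pad(X, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((H, W, C_out))
    for dy, dx in shifts:
        # One-hot attention: every query pixel (i, j) gathers pixel (i+dy, j+dx).
        gathered = Xp[pad + dy:pad + dy + H, pad + dx:pad + dx + W]  # (H, W, C_in)
        W_h = kernel[dy + pad, dx + pad]                             # (C_in, C_out)
        out += gathered @ W_h
    return out

assert np.allclose(conv2d_same(X, kernel), mhsa_as_convolution(X, kernel))
print("K^2-head attention with one-hot positional attention matches the convolution.")
```

Learned attention need not be one-hot, of course; the sketch only shows that the convolutional layer lies inside the attention layer's expressible function class, which is the direction of inclusion the paper proves.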

The talk and the corresponding paper are published at the ICLR 2020 virtual conference.
