MLP-Mixer: An all-MLP Architecture for Vision

06/12/2021

Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Keywords: deep learning, machine learning, transformers, vision, transfer learning

Abstract: Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them is necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e. "mixing" the per-location features), and one with MLPs applied across patches (i.e. "mixing" spatial information). When trained on large datasets, or with modern regularization schemes, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models. We hope that these results spark further research beyond the realms of well-established CNNs and Transformers.
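
The two layer types named in the abstract can be illustrated with a small sketch. This is a minimal pure-Python illustration, not the authors' implementation: the helper names (`mlp`, `channel_mix`, `token_mix`), the ReLU nonlinearity, the random weights, and the tiny sizes are all assumptions chosen for readability. It treats an image as a table with one row per patch and one column per channel, and leaves out everything the abstract does not mention.

```python
import random


def linear(rows, weights):
    """Multiply every row vector by a weight matrix of shape (n_in, n_out)."""
    return [[sum(v * w for v, w in zip(row, col)) for col in zip(*weights)]
            for row in rows]


def mlp(rows, w1, w2):
    """Two-layer perceptron applied independently to every row.

    ReLU is an illustrative assumption; the abstract names no nonlinearity.
    """
    hidden = [[max(h, 0.0) for h in row] for row in linear(rows, w1)]
    return linear(hidden, w2)


def transpose(rows):
    return [list(col) for col in zip(*rows)]


def channel_mix(x, w1, w2):
    """MLP applied independently to each patch: mixes per-location features."""
    return mlp(x, w1, w2)


def token_mix(x, w1, w2):
    """MLP applied across patches: mixes spatial information per channel."""
    return transpose(mlp(transpose(x), w1, w2))


def random_matrix(rng, n_in, n_out):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]


# Hypothetical toy sizes: 4 patches, 3 channels, hidden width 5.
rng = random.Random(0)
num_patches, channels, hidden = 4, 3, 5
x = random_matrix(rng, num_patches, channels)

tw1 = random_matrix(rng, num_patches, hidden)   # token-mixing weights
tw2 = random_matrix(rng, hidden, num_patches)
cw1 = random_matrix(rng, channels, hidden)      # channel-mixing weights
cw2 = random_matrix(rng, hidden, channels)

y = channel_mix(token_mix(x, tw1, tw2), cw1, cw2)
print(len(y), len(y[0]))  # prints "4 3": the patches x channels shape is preserved
```

In this sketch the only difference between the two layer types is a transpose: the same kind of MLP runs over the rows of the table (one patch at a time) or over its columns (one channel at a time, across all patches).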

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

Container: Context Aggregation Networks

Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Keywords: deep learning, self-supervised learning, transformers, vision, language

8:50

22/11/2021

Rethinking Token-Mixing MLP for MLP-based Vision Backbone

Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li

Keywords: vision backbone, MLP, image recognition

1:59

26/04/2020

Scale-Equivariant Steerable Networks

Ivan Sosnovik, Michał Szmaja, Arnold Smeulders

Keywords: Scale Equivariance, Steerable Filters

5:48

06/12/2021

Do Vision Transformers See Like Convolutional Neural Networks?

Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy

Keywords: deep learning, machine learning, transformers, vision, representation learning, transfer learning

13:13

06/12/2021

Intriguing Properties of Vision Transformers

Muhammad Muzammal Naseer, Kanchana Ranasinghe, Salman H Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

Keywords: deep learning, machine learning, robustness, transformers, vision, few shot learning

12:32

26/04/2020

Network Deconvolution

Chengxi Ye, Matthew Evanusa, Hua He, Anton Mitrokhin, Tom Goldstein, James A. Yorke, Cornelia Fermuller, Yiannis Aloimonos

Keywords: convolutional networks, network deconvolution, whitening

4:59

03/05/2021

Robust and Generalizable Visual Representation Learning via Random Convolutions

Zhenlin Xu, Deyi Liu, Junlin Yang, Colin Raffel, Marc Niethammer

Keywords: robustness, domain generalization, representation learning, data augmentation

5:06

02/02/2021

Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation

Sam Sattarzadeh, Mahesh Sudhakar, Anthony Lem, Shervin Mehryar, Konstantinos N Plataniotis, Jongseong Jang, Hyunwoo Kim, Yeonjeong Jeong, Sangmin Lee, Kyunghoon Bae

19:59

02/02/2021

DenserNet: Weakly Supervised Visual Localization Using Multi-Scale Feature Aggregation

Dongfang Liu, Yiming Cui, Liqi Yan, Christos Mousas, Baijian Yang, Yingjie Chen

16:15

03/05/2021

Attentional Constellation Nets for Few-Shot Learning

Weijian Xu, Yifan Xu, Huaijin Wang, Zhuowen Tu

Keywords: few-shot learning, constellation models

5:10

14/06/2020

Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment

Qiuyu Chen, Wei Zhang, Ning Zhou, Peng Lei, Yi Xu, Yu Zheng, Jianping Fan

Keywords: image aesthetics assessment, kernel embedding, adaptive convolution, parameter-free, aspect ratio

1:01

26/04/2020

On the Relationship between Self-Attention and Convolutional Layers

Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi

Keywords: self-attention, attention, transformers, convolution, CNN, image, expressivity, capacity

5:18

22/11/2021

SamplingAug: On the Importance of Patch Sampling Augmentation for Single Image Super-Resolution

Shizun Wang, Ming Lu, Kaixin Chen, Jiaming Liu, Xiaoqi Li, Chuang Zhang, Ming Wu

Keywords: Super-Resolution, Patch Sampling

2:18

22/11/2021

Searching for TrioNet: Combining Convolution with Local and Global Self-Attention

Huaijin Pi, Huiyu Wang, Yingwei Li, Zizhang Li, Alan Yuille

Keywords: Self-Attention, Neural Architecture Search

2:56

02/02/2021

Patch-Wise Attention Network for Monocular Depth Estimation

Sihaeng Lee, Janghyeon Lee, Byungju Kim, Eojindl Yi, Junmo Kim

14:15

30/11/2020

Attention-Aware Feature Aggregation for Real-time Stereo Matching on Edge Devices

Jia-Ren Chang (National Chiao Tung University, aetherAI), Pei-Chun Chang, Yong-Sheng Chen

9:53

12/07/2020

VideoOneNet: Bidirectional Convolutional Recurrent OneNet with Trainable Data Steps for Video Processing

Zoltán Milacski, Barnabás Póczos, Andras Lorincz

Keywords: Sequential, Network, and Time-Series Modeling

14:17

06/12/2021

Dynamic Normalization and Relay for Video Action Recognition

Dongqi Cai, Anbang Yao, Yurong Chen

Keywords: deep learning, representation learning

10:42

19/10/2020

Deep adaptive feature aggregation in multi-task convolutional neural networks

Zhen Shen, Chaoran Cui, Jin Huang, Jian Zong, Meng Chen, Yilong Yin

Keywords: convolutional neural networks, multi-task learning, adaptive feature aggregation

6:36

14/06/2020

Unified Dynamic Convolutional Network for Super-Resolution With Variational Degradations

Yu-Syuan Xu, Shou-Yao Roy Tseng, Yu Tseng, Hsien-Kai Kuo, Yi-Min Tsai

Keywords: super-resolution, dynamic convolution, variational degradations, multiple degradations

1:00

07/09/2020

Making a Case for 3D Convolutions for Object Segmentation in Videos

Sabarinath Mahadevan, Ali Athar, Aljosa Osep, Laura Leal-Taixé, Bastian Leibe, Sebastian Hennen

Keywords: object tracking, video segmentation, video object segmentation, video scene understanding, object segmentation

8:16

03/05/2021

Domain Generalization with MixStyle

Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang

Keywords: Style Mixing, Domain Generalization

4:28

19/08/2021

Few-shot Neural Human Performance Rendering from Sparse RGBD Videos

Anqi Pang, Xin Chen, Haimin Luo, Minye Wu, Jingyi Yu, Lan Xu

Keywords: Computer Vision, 2D and 3D Computer Vision, Biometrics, Face and Gesture Recognition, Motion and Tracking

11:02

06/12/2021

Adaptive Denoising via GainTuning

Sreyas Mohan, Joshua L Vincent, Ramon Manzorro, Peter Crozier, Carlos Fernandez-Granda, Eero P Simoncelli

Keywords: deep learning

15:08

14/06/2020

Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization

Jinjie Mai, Meng Yang, Wenfeng Luo

Keywords: weakly supervised, object localization, adversarial erasing

5:00

14/06/2020

Non-Local Neural Networks With Grouped Bilinear Attentional Transforms

Lu Chi, Zehuan Yuan, Yadong Mu, Changhu Wang

Keywords: attention, non-local, bilinear, image classification, video classification, grouped, data-adaptive

1:01

05/01/2021

Hierarchical Generative Adversarial Networks for Single Image Super-Resolution

Weimin Chen, Yuqing Ma, Xianglong Liu, Yi Yuan

4:46

06/12/2020

Sparse Graphical Memory for Robust Planning

Scott Emmons, Ajay Jain, Misha Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak

3:23

07/09/2020

Few-Shot Learning with Complex-valued Neural Networks

Zhen Liu, Baochang Zhang, Guodong Guo

Keywords: few-shot learning, complex-valued network, metric-learning, image classification

7:15

06/12/2021

Encoding Robustness to Image Style via Adversarial Feature Perturbations

Manli Shu, Zuxuan Wu, Micah Goldblum, Tom Goldstein

Keywords: deep learning, machine learning, robustness, adversarial robustness and security, domain adaptation

7:36

14/06/2020

Improving Convolutional Networks With Self-Calibrated Convolutions

Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Changhu Wang, Jiashi Feng

Keywords: self-calibrated, feature transformation, image classification, network architecture, convolutional neural networks

1:00

22/11/2021

SwinFGHash: Fine-grained Image Retrieval via Transformer-based Hashing Network

Di Lu, Jinpeng Wang, Ziyun Zeng, Bin Chen, Shudeng Wu, Shu-Tao Xia

Keywords: Image Retrieval, Deep Hashing, Fine-grained, Transformer

2:57

14/06/2020

Regularizing CNN Transfer Learning With Randomised Regression

Yang Zhong, Atsuto Maki

Keywords: transfer learning, network regularization, randomised regression, pseudo task regularization, limited samples

0:58

14/06/2020

Deep Optics for Single-Shot High-Dynamic-Range Imaging

Christopher A. Metzler, Hayato Ikoma, Yifan Peng, Gordon Wetzstein

Keywords: high-dynamic-range imaging, point-spread-function engineering, end-to-end learning, computational imaging, deep learning, optics, photography

5:01

18/07/2021

Exploiting Shared Representations for Personalized Federated Learning

Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai

Keywords: Algorithms, Multitask, Transfer, and Meta Learning

5:09

30/11/2020

CLASS: Cross-Level Attention and Supervision for Salient Objects Detection

Lv Tang, Bo Li

7:04

06/12/2021

Neural Routing by Memory

Kaipeng Zhang, Zhenqiang Li, Zhifeng Li, Wei Liu, Yoichi Sato

Keywords: deep learning

6:41

14/06/2020

Self-Supervised Monocular Trained Depth Estimation Using Self-Attention and Discrete Disparity Volume

Adrian Johnston, Gustavo Carneiro

Keywords: self-supervised depth estimation, self-supervised learning, self-attention, depth estimation, uncertainty

1:01

06/12/2020

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning

Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Remi Munos, Michal Valko

3:27

14/06/2020

Orthogonal Convolutional Neural Networks

Jiayun Wang, Yubei Chen, Rudrasis Chakraborty, Stella X. Yu

Keywords: orthogonal convolution, orthogonality, regularization, filter redundancy, robustness, classification, retrieval, semi-supervised, gans, inpainting

1:00