Scaling Vision with Sparse Mixture of Experts

06/12/2021

Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, Neil Houlsby

Keywords: transformers, vision, language

Abstract: Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated excellent scalability in Natural Language Processing. In Computer Vision, however, almost all performant networks are "dense", that is, every input is processed by every parameter. We present a Vision MoE (V-MoE), a sparse version of the Vision Transformer, that is scalable and competitive with the largest dense networks. When applied to image recognition, V-MoE matches the performance of state-of-the-art networks, while requiring as little as half of the compute at inference time. Further, we propose an extension to the routing algorithm that can prioritize subsets of each input across the entire batch, leading to adaptive per-image compute. This allows V-MoE to trade off performance and compute smoothly at test time. Finally, we demonstrate the potential of V-MoE to scale vision models, and train a 15B parameter model that attains 90.35% on ImageNet.
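
To make the routing idea in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a top-k softmax router with a fixed per-expert capacity, and it models "prioritizing subsets of each input across the batch" as processing tokens in descending order of their largest gate weight. The function name `route_tokens`, the `capacity` parameter, and the toy logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(router_logits, k=1, capacity=1, prioritize=False):
    """Assign each token to at most k experts under a per-expert capacity.

    router_logits: one list of expert logits per token (hypothetical input).
    Returns {token_index: [(expert_index, gate_weight), ...]}.
    A token whose chosen expert is already full is dropped for that choice.
    """
    gates = [softmax(row) for row in router_logits]
    num_experts = len(router_logits[0])
    # Top-k experts per token, best first.
    topk = [sorted(range(num_experts), key=lambda e: -g[e])[:k] for g in gates]

    order = list(range(len(gates)))
    if prioritize:
        # Assumption: "importance" of a token is its largest gate weight.
        order.sort(key=lambda t: -max(gates[t]))

    load = [0] * num_experts
    assignment = {t: [] for t in order}
    for rank in range(k):          # fill first choices before second choices
        for t in order:
            e = topk[t][rank]
            if load[e] < capacity:
                load[e] += 1
                assignment[t].append((e, gates[t][e]))
    return assignment


# Toy batch: 3 tokens, 2 experts, each expert can take one token.
logits = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
plain = route_tokens(logits, k=1, capacity=1, prioritize=False)
prio = route_tokens(logits, k=1, capacity=1, prioritize=True)
```

In this toy batch, tokens 0 and 1 both prefer expert 0. Without prioritization the earlier token 0 takes the slot and token 1 is dropped; with prioritization the more confidently routed token 1 takes the slot and token 0 is dropped. Lowering `capacity` is one way such a scheme could trade accuracy for compute at test time.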

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2020

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin

3:22

06/12/2021

Long-Short Transformer: Efficient Transformers for Language and Vision

Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro

Keywords: machine learning, transformers

11:44

26/08/2020

'Bring Your Own Greedy'+Max: Near-Optimal 1/2-Approximations for Submodular Knapsack

Grigory Yaroslavtsev, Samson Zhou, Dmitrii Avdiukhin

13:14

18/07/2021

Bayesian Attention Belief Networks

Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou

Keywords: Applications, Program Understanding and Generation, Deep Learning, Bayesian Deep Learning

4:28

12/07/2020

Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks

Mark Kurtz, Justin Kopinsky, Rati Gelashvili, Alexander Matveev, John Carr, Michael Goin, William Leiserson, Sage Moore, Nir Shavit, Dan Alistarh

Keywords: Deep Learning - Algorithms

14:41

12/07/2020

Variable Skipping for Autoregressive Range Density Estimation

Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Peter Chen

Keywords: Deep Learning - Generative Models and Autoencoders

13:01

30/11/2020

SDCNet: Size Divide and Conquer Network for Salient Object Detection

Senbo Yan, Xiaowen Song, Chuer Yu

7:28

14/06/2020

Rethinking Differentiable Search for Mixed-Precision Neural Networks

Zhaowei Cai, Nuno Vasconcelos

Keywords: mixed-precision network, bit allocation, differentiable, architecture search

1:01

14/06/2020

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

Hao Chen, Kunyang Sun, Zhi Tian, Chunhua Shen, Yongming Huang, Youliang Yan

Keywords: instance segmentation, fully-convolutional, object detection, real-time

4:39

02/02/2021

Explicitly Modeled Attention Maps for Image Classification

Andong Tan, Duc Tam Nguyen, Maximilian Dax, Matthias Nießner, Thomas Brox

16:59

14/06/2020

X3D: Expanding Architectures for Efficient Video Recognition

Christoph Feichtenhofer

Keywords: video classification, action recognition, video detection, video understanding, deep learning, neural networks

4:56

14/06/2020

Resolution Adaptive Networks for Efficient Inference

Le Yang, Yizeng Han, Xi Chen, Shiji Song, Jifeng Dai, Gao Huang

Keywords: adaptive inference, efficient deep learning, multi-scale feature learning, budgeted batch classification

0:59

06/12/2021

Scatterbrain: Unifying Sparse and Low-rank Attention

Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré

Keywords: transformers, generative model

13:15

18/07/2021

OmniNet: Omnidirectional Representations from Transformers

Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Don Metzler

Keywords: Deep Learning, Predictive Models, Algorithms, Representation Learning; Neuroscience and Cognitive Science, Problem Solving; Deep Learning, Architectures

17:00

18/07/2021

ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

Wonjae Kim, Bokyung Son, Ildoo Kim

Keywords: Algorithms, Multimodal Learning

19:03

03/05/2021

Fast Geometric Projections for Local Robustness Certification

Aymeric Fromherz, Klas Leino, Matt Fredrikson, Bryan Parno, Corina Pasareanu

Keywords: verification, robustness, safety

11:54

14/06/2020

ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning

Weiwei Sun, Wei Jiang, Eduard Trulls, Andrea Tagliasacchi, Kwang Moo Yi

Keywords: point clouds, attention, normalization, iterative reweighted least squares, permutation-equivariance, wide-baseline stereo, point cloud classification, neural line fitting, pointnet, context normalization

1:01

05/01/2021

Splatty- a Unified Image Demosaicing and Rectification Method

Pranav Verma, Dominique E. Meyer, Hanyang Xu, Falko Kuester

4:43

19/08/2021

GSPL: A Succinct Kernel Model for Group-Sparse Projections Learning of Multiview Data

Danyang Wu, Jin Xu, Xia Dong, Meng Liao, Rong Wang, Feiping Nie, Xuelong Li

Keywords: Machine Learning, Learning Sparse Models, Multi-instance; Multi-label; Multi-view learning, Unsupervised Learning

11:48

06/12/2021

Dataset Distillation with Infinitely Wide Convolutional Networks

Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee

Keywords: deep learning, machine learning, vision, meta learning

14:56

14/06/2020

Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan Asari

Keywords: 3d human pose, attention mechanism, multi-scale dilation convolution, monocular motion reconstruction

5:01

12/07/2020

Sparse Sinkhorn Attention

Yi Tay, Dara Bahri, Liu Yang, Don Metzler, Da-Cheng Juan

Keywords: Deep Learning - Algorithms

12:12

26/04/2020

Reducing Transformer Depth on Demand with Structured Dropout

Angela Fan, Edouard Grave, Armand Joulin

Keywords: reduction, regularization, pruning, dropout, transformer

5:01

14/06/2020

MemNAS: Memory-Efficient Neural Architecture Search With Grow-Trim Learning

Peiye Liu, Bo Wu, Huadong Ma, Mingoo Seok

Keywords: neural architecture search, recurrent neural network, memory optimization

0:59

04/07/2020

GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples

Danilo Croce, Giuseppe Castellucci, Roberto Basili

Keywords: Robust Classification, Natural tasks, image processing, generative setting

6:48

22/11/2021

One-Step Pixel-Level Perturbation-Based Saliency Detector

Vinnam Kim, Hyunsouk Cho, Sehee Chung

Keywords: explainable ai, saliency map

3:30

26/04/2020

Computation Reallocation for Object Detection

Feng Liang, Chen Lin, Ronghao Guo, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang

Keywords: Neural Architecture Search, Object Detection

5:29

06/12/2020

Top-KAST: Top-K Always Sparse Training

Sid Jayakumar, Razvan Pascanu, Jack Rae, Simon Osindero, Erich Elsen

3:18

14/06/2020

FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions

Alvin Wan, Xiaoliang Dai, Peizhao Zhang, Zijian He, Yuandong Tian, Saining Xie, Bichen Wu, Matthew Yu, Tao Xu, Kan Chen, Peter Vajda, Joseph E. Gonzalez

Keywords: nas, dnas, fbnet, state-of-the-art, imagenet, mobilenetv3, efficientnet, classification, neural architecture search, differentiable neural architecture search

1:01

06/12/2021

Global Filter Networks for Image Classification

Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie Zhou

Keywords: machine learning, robustness, transformers, vision

9:28

14/06/2020

Improved Few-Shot Visual Classification

Peyman Bateni, Raghav Goyal, Vaden Masrani, Frank Wood, Leonid Sigal

Keywords: meta-learning, few-shot classification, transfer learning, mahalanobis metric, bregman divergences

1:01

04/07/2020

A Mixture of h - 1 Heads is Better than h Heads

Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith

Keywords: natural tasks, machine translation, language modeling, Multi-head architectures

11:59

04/07/2020

Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence

Xiaoyu Shen, Ernie Chang, Hui Su, Cheng Niu, Dietrich Klakow

Keywords: Neural Generation, Segmentation, data-to-text tasks, neural model

9:09

15/06/2020

Learning fast and precise numerical analysis

Jingxuan He, Gagandeep Singh, Markus Püschel, Martin Vechev

Keywords: Abstract interpretation, Performance optimization, Machine learning, Numerical domains

14:20

26/08/2020

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Yuxuan Song, Ning Miao, Hao Zhou, Lantao Yu, Mingxuan Wang, Lei Li

12:32

06/12/2020

Efficient Clustering Based On A Unified View Of $K$-means And Ratio-cut

Shenfei Pei, Feiping Nie, Rong Wang, Xuelong Li

3:16

06/12/2021

Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices

Aliakbar Panahi, Seyran Saeedi, Tom Arodz

Keywords: transformers

13:06

14/09/2020

Squeezing Correlated Neurons for Resource-Efficient Deep Neural Networks

Elbruz Ozen, Alex Orailoglu

Keywords: deep learning, information redundancy, pruning

14:48

03/05/2021

VA-RED$^2$: Video Adaptive Redundancy Reduction

Bowen Pan, Rameswar Panda, Camilo L Fosco, Chung-Ching Lin, Alex J Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

5:02

26/08/2020

Naive Feature Selection: Sparsity in Naive Bayes

Armin Askari, Alexandre d'Aspremont, Laurent El Ghaoui

14:32