DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

06/12/2021

Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, Maheswaran Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed Chi

Keywords: deep learning, optimization

Abstract: The Mixture-of-Experts (MoE) architecture is showing promising results in improving parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks. State-of-the-art MoE models use a trainable "sparse gate" to select a subset of the experts for each input example. While conceptually appealing, existing sparse gates, such as Top-k, are not smooth. The lack of smoothness can lead to convergence and statistical performance issues when training with gradient-based methods. In this paper, we develop DSelect-k: a continuously differentiable and sparse gate for MoE, based on a novel binary encoding formulation. The gate can be trained using first-order methods, such as stochastic gradient descent, and offers explicit control over the number of experts to select. We demonstrate the effectiveness of DSelect-k on both synthetic and real MTL datasets with up to 128 tasks. Our experiments indicate that DSelect-k can achieve statistically significant improvements in prediction and expert selection over popular MoE gates. Notably, on a real-world, large-scale recommender system, DSelect-k achieves over 22% improvement in predictive performance compared to Top-k. We provide an open-source implementation of DSelect-k.
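
To make the abstract's point about smoothness concrete, the following is a minimal sketch of a generic Top-k style gate, not the authors' DSelect-k gate or their open-source implementation. The function names (`softmax`, `top_k_gate`, `moe_output`) and the toy logits are assumptions made for illustration. It shows how a tiny change in the gate logits can move a large share of the weight from one expert to another, which is the kind of non-smooth behavior the abstract refers to.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(logits, k):
    """Illustrative Top-k gate: softmax over the k largest logits, zero elsewhere."""
    # Indices of the k largest logits (ties broken by lower index).
    keep = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]
    weights = softmax([logits[i] for i in keep])
    gate = [0.0] * len(logits)
    for i, w in zip(keep, weights):
        gate[i] = w
    return gate

def moe_output(expert_outputs, gate):
    """Weighted sum of scalar expert outputs; zero-weight experts are skipped."""
    return sum(w * y for w, y in zip(gate, expert_outputs) if w != 0.0)

# Two nearly identical logit vectors: experts 1 and 2 swap rank.
gate_a = top_k_gate([2.0, 1.001, 1.000, -1.0], k=2)
gate_b = top_k_gate([2.0, 1.000, 1.001, -1.0], k=2)

print(gate_a)  # expert 1 is selected, expert 2 gets weight 0
print(gate_b)  # expert 2 is selected, expert 1 gets weight 0
```

Here a perturbation of 0.001 in two logits switches which expert is active, so the gate output jumps rather than changing gradually. A continuously differentiable gate, as the abstract describes for DSelect-k, is intended to avoid such jumps during gradient-based training.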

The talk and the respective paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness

Jongheon Jeong, Sejun Park, Minkyu Kim, Heung-Chang Lee, Do-Guk Kim, Jinwoo Shin

Keywords: deep learning, machine learning, robustness, adversarial robustness and security

12:23

08/12/2020

DoLFIn: Distributions over Latent Features for Interpretability

Phong Le, Willem Zuidema

9:47

18/07/2021

Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks

Nezihe Merve Gürel, Xiangyu Qi, Luka Rimanic, Ce Zhang, Bo Li

Keywords: Algorithms, Adversarial Examples

5:46

03/05/2021

Theoretical bounds on estimation error for meta-learning

James Lucas, Mengye Ren, Irene Raissa KAMENI KAMENI, Toniann Pitassi, Richard Zemel

Keywords: meta learning, minimax risk, few-shot, lower bounds, learning theory

4:46

14/06/2020

Overcoming Multi-Model Forgetting in One-Shot NAS With Diversity Maximization

Miao Zhang, Huiqi Li, Shirui Pan, Xiaojun Chang, Steven Su

Keywords: automl, neural architecture search, catastrophic forgetting, novelty search, continual learning

1:01

18/07/2021

Sparsifying Networks via Subdifferential Inclusion

Sagar Verma, Jean-Christophe Pesquet

Keywords: Optimization, Convex Optimization

5:10

16/11/2020

Transformer Based Multi-Source Domain Adaptation

Dustin Wright, Isabelle Augenstein

Keywords: unsupervised adaptation, cnns, rnns, domain classifiers

11:30

06/12/2020

Model Fusion via Optimal Transport

Sidak Pal Singh, Martin Jaggi

3:10

22/06/2020

Efficiently learning structured distributions from untrusted batches

Sitan Chen, Jerry Li, Ankur Moitra

Keywords: sum-of-squares, federated learning, VC complexity, Robust statistics

24:38

06/12/2021

AugMax: Adversarial Composition of Random Augmentations for Robust Training

Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Anima Anandkumar, Zhangyang Wang

Keywords: deep learning, robustness, adversarial robustness and security

11:19

02/02/2021

Deterministic Mini-batch Sequencing for Training Deep Neural Networks

Subhankar Banerjee, Shayok Chakraborty

16:00

26/08/2020

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Yuxuan Song, Ning Miao, Hao Zhou, Lantao Yu, Mingxuan Wang, Lei Li

12:32

02/02/2021

Longitudinal Deep Kernel Gaussian Process Regression

Junjie Liang, Yanting Wu, Dongkuan Xu, Vasant G Honavar

16:27

18/07/2021

Efficient Statistical Tests: A Neural Tangent Kernel Approach

Sheng Jia, Ehsan Nezhadarya, Yuhuai Wu, Jimmy Ba

Keywords: Deep Learning

5:13

26/08/2020

Learning in Gated Neural Networks

Ashok Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath

14:42

03/05/2021

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit

Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

Keywords: Deep Learning, Bayesian Neural Networks, Neural Network Gaussian Process, Infinite-Width Limit, Uncertainty, Gaussian Process

4:34

22/11/2021

Model Composition: Can Multiple Neural Networks Be Combined into a Single Network Using Only Unlabeled Data?

Amin Banitalebi-Dehkordi, Xinyu Kang, Yong Zhang

Keywords: Model Composition, Combining Neural Networks, Pseudo Label, Self Training, Label Aggregation, Combining Models

2:58

23/08/2020

Diverse rule sets

Guangyi Zhang, Aristides Gionis

Keywords: sampling, classifier, pattern mining, rule learning, diversification, rule sets

9:41

04/07/2020

Estimating the influence of auxiliary tasks for multi-task learning of sequence tagging tasks

Fynn Schröder, Chris Biemann

Keywords: multi-task tasks, MTL, TL, MTL setups

12:02

02/02/2021

DIBS: Diversity Inducing Information Bottleneck in Model Ensembles

Samarth Sinha, Homanga Bharadhwaj, Anirudh Goyal, Hugo Larochelle, Animesh Garg, Florian Shkurti

16:26

03/05/2021

Neurally Augmented ALISTA

Freya Behrens, Jonathan Sauder, Peter Jung

Keywords: learned ISTA, unrolled algorithms, compressed sensing, sparse reconstruction

5:18

18/07/2021

Path Planning using Neural A* Search

Ryo Yonetani, Tatsunori Taniai, Mohammadamin Barekatain, Mai Nishimura, Asako Kanezaki

Keywords: Reinforcement Learning and Planning, Planning and Control

5:01

05/01/2021

G2D: Generate to Detect Anomaly

Masoud Pourreza, Bahram Mohammadi, Mostafa Khaki, Samir Bouindour, Hichem Snoussi, Mohammad Sabokrou

5:12

02/02/2021

Token-Aware Virtual Adversarial Training in Natural Language Understanding

Linyang Li, Xipeng Qiu

12:49

14/06/2020

Conditional Gaussian Distribution Learning for Open Set Recognition

Xin Sun, Zhenning Yang, Chi Zhang, Keck-Voon Ling, Guohao Peng

Keywords: open set recognition, conditional variational auto-encoder, gaussian distribution learning, probabilistic ladder architecture

1:01

06/12/2021

Towards Sample-efficient Overparameterized Meta-learning

Yue Sun, Adhyyan Narang, Ibrahim Gulluk, Samet Oymak, Maryam Fazel

Keywords: theory, machine learning, meta learning, representation learning, few shot learning

13:54

06/12/2021

Self-Supervised Learning of Event-Based Optical Flow with Spiking Neural Networks

Jesse Hagenaars, Federico Paredes-Valles, Guido de Croon

Keywords: deep learning, optimization, self-supervised learning

13:28

06/12/2020

PLANS: Neuro-Symbolic Program Learning from Videos

Raphaël Dang-Nhu

3:52

22/11/2021

DISCO: accurate Discrete Scale Convolutions

Ivan Sosnovik, Artem Moskalev, Arnold W.M. Smeulders

Keywords: equivariance, symmetry, invariance, scale, convolutions, dilation, tracking, image classification

8:38

14/06/2020

Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End

Abdelrahman Eldesokey, Michael Felsberg, Karl Holmquist, Michael Persson

Keywords: uncertainty, sparsity, depth completion, bayesian deep learning, normalized convolution, real-time

1:00

03/05/2021

Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity

Jang-Hyun Kim, Wonho Choo, Hosan Jeong, Hyun Oh Song

Keywords: Supervised Learning, Discrete Optimization, Data Augmentation, Deep Learning

14:43

18/07/2021

On Linear Identifiability of Learned Representations

Geoffrey Roeder, Luke Metz, Durk Kingma

Keywords: Deep Learning, Embedding and Representation learning

5:11

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

11:27

05/12/2020

Towards a better understanding of label smoothing in neural machine translation

Yingbo Gao, Weiyue Wang, Christian Herold, Zijian Yang, Hermann Ney

13:37

18/07/2021

Double-Win Quant: Aggressively Winning Robustness of Quantized Deep Neural Networks via Random Precision Training and Inference

Yonggan Fu, Qixuan Yu, Meng Li, Vikas Chandra, Yingyan Lin

Keywords: Algorithms, Adversarial Examples

5:20

03/05/2021

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

Keywords: sparsity, efficient training and inference

5:09

12/07/2020

Optimization and Analysis of the pAp@k Metric for Recommender Systems

Gaurush Hiranandani, Warut Vijitbenjaronk, Sanmi Koyejo, Prateek Jain

Keywords: Learning Theory

16:11

06/12/2021

Network-to-Network Regularization: Enforcing Occam's Razor to Improve Generalization

Rohan Ghosh, Mehul Motani

Keywords: theory, deep learning, machine learning

14:07

20/07/2020

Gating creates slow modes and controls phase-space complexity in GRUs and LSTMs

Tankut Can, Kamesh Krishnamurthy, David J. Schwab

21:00

02/02/2021

Any-Precision Deep Neural Networks

Haichao Yu, Haoxiang Li, Humphrey Shi, Thomas S. Huang, Gang Hua

14:26