Hash Layers For Large Sparse Models

06/12/2021

Stephen Roller, Sainbayar Sukhbaatar, Arthur D Szlam, Jason Weston

Keywords: transformers

Abstract: We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models. Specifically, we modify the feedforward layer to hash to different sets of weights depending on the current token, over all tokens in the sequence. We show that this procedure either outperforms or is competitive with learning-to-route mixture-of-expert methods such as Switch Transformers and BASE Layers, while requiring no routing parameters or extra terms in the objective function such as a load balancing loss, and no sophisticated assignment algorithm. We study the performance of different hashing techniques, hash sizes and input features, and show that balanced and random hashes focused on the most local features work best, compared to either learning clusters or using longer-range context. We show our approach works well both on large language modeling and dialogue tasks, and on downstream fine-tuning tasks.
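The routing idea in the abstract can be illustrated with a minimal sketch. It assumes a fixed random hash from token id to expert index and a plain two-layer ReLU feedforward block per expert; the class name `HashFFN`, its parameters, and the initialization scale are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


class HashFFN:
    """Toy hash-routed feedforward layer (illustrative sketch, not the authors' code)."""

    def __init__(self, vocab_size, d_model, d_hidden, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: a fixed random hash, token id -> expert index.
        # It is drawn once and never trained, so there are no routing parameters.
        self.token_to_expert = rng.integers(0, num_experts, size=vocab_size)
        # One independent set of feedforward weights per expert.
        self.w1 = rng.normal(0.0, 0.02, size=(num_experts, d_model, d_hidden))
        self.w2 = rng.normal(0.0, 0.02, size=(num_experts, d_hidden, d_model))

    def __call__(self, token_ids, hidden):
        # token_ids: (seq_len,) integer array; hidden: (seq_len, d_model) float array.
        experts = self.token_to_expert[token_ids]
        out = np.empty_like(hidden)
        for e in np.unique(experts):
            mask = experts == e
            # Apply expert e's weights only to the positions hashed to it.
            h = np.maximum(hidden[mask] @ self.w1[e], 0.0)  # ReLU
            out[mask] = h @ self.w2[e]
        return out


layer = HashFFN(vocab_size=50, d_model=8, d_hidden=16, num_experts=4)
ids = np.array([3, 7, 3, 11])
x = np.random.default_rng(1).normal(size=(4, 8))
y = layer(ids, x)  # same shape as x; positions 0 and 2 share an expert
```

Because the expert is chosen by table lookup on the token id alone, the sketch needs no load-balancing loss or assignment step. A balanced hash, which the abstract reports working best, would replace the uniform random table with one that evens out expert load.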

The talk and the respective paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

Channel Permutations for N:M Sparsity

Jeff Pool, Chong Yu

Keywords: optimization

12:41

03/05/2021

Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online

Yangchen Pan, Kirby Banman, Martha White

Keywords: natural sparsity, Reinforcement learning, fuzzy tiling activation function, sparse representation

6:22

03/05/2021

MetaNorm: Learning to Normalize Few-Shot Batches Across Domains

Yingjun Du, Xiantong Zhen, Ling Shao, Cees G Snoek

Keywords: batch normalization, Meta-learning, few-shot domain generalization

5:48

18/07/2021

BASE Layers: Simplifying Training of Large, Sparse Models

Mike Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer

Keywords: Applications, Natural Language Processing

5:09

22/11/2021

One-Shot Deep Model for End-to-End Multi-Person Activity Recognition

Shuhei Tarashima

Keywords: Group Activity Recognition, Action Recognition, Multi-Object Tracking, Multi-task Learning

2:50

18/07/2021

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Zhanpeng Zeng, Yunyang Xiong, Sathya Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh

Keywords: Applications, Natural Language Processing

5:16

06/12/2021

Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices

Aliakbar Panahi, Seyran Saeedi, Tom Arodz

Keywords: transformers

13:06

26/04/2020

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

Keywords: Natural Language Processing, Representation Learning

5:12

02/02/2021

Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision

Xingchao Liu, Mao Ye, Dengyong Zhou, Qiang Liu

15:18

03/05/2021

Discovering Non-monotonic Autoregressive Orderings with Variational Inference

Xuanlin Li, Brandon Trabucco, Dong Huk Park, Michael Luo, Sheng Shen, Trevor Darrell, Yang Gao

Keywords: reinforcement learning, computer vision, natural language processing, optimization, variational inference, unsupervised learning

4:56

14/09/2020

An algorithmic framework for decentralised matrix factorisation

Erika Duriakova, Weipeng Huang, Elias Tragos, Aonghus Lawlor, Barry Smyth, James Geraci, Neil Hurley

Keywords: recommender systems, distributed learning, decentralised matrix factorisation, latent factor models, matrix factorisation, communication efficiency, convergence proof

13:30

06/12/2021

One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective

Jiun Tian Hoe, Kam Woh Ng, Tianyu Zhang, Chee Seng Chan, Yi-Zhe Song, Tao Xiang

Keywords: machine learning

11:39

03/05/2021

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies

Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed

Keywords: text classification, recommendation systems, large vocabularies, sparse embeddings, language modeling

7:03

04/07/2020

Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport

Kyle Swanson, Lili Yu, Tao Lei

Keywords: Rationalizing Matching, text matching, downstream prediction, constrained problem

11:59

06/12/2021

Learning Large Neighborhood Search Policy for Integer Programming

Yaoxin Wu, Wen Song, Zhiguang Cao, Jie Zhang

Keywords: deep learning, reinforcement learning and planning

8:54

03/05/2021

WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic

Renkun Ni, Hong-Min Chu, Oscar Castaneda, Ping-yeh Chiang, Christoph Studer, Tom Goldstein

Keywords: efficient inference, quantization

5:11

06/12/2020

Distributed Distillation for On-Device Learning

Ilai Bistritz, Ariana Mann, Nicholas Bambos

3:17

12/07/2020

Sparse Sinkhorn Attention

Yi Tay, Dara Bahri, Liu Yang, Don Metzler, Da-Cheng Juan

Keywords: Deep Learning - Algorithms

12:12

06/12/2021

Sequence-to-Sequence Learning with Latent Neural Grammars

Yoon Kim

Keywords: deep learning

14:31

19/08/2021

CIMON: Towards High-quality Hash Codes

Xiao Luo, Daqing Wu, Zeyu Ma, Chong Chen, Minghua Deng, Jinwen Ma, Zhongming Jin, Jianqiang Huang, Xian-Sheng Hua

Keywords: Computer Vision, Recognition, Information Retrieval

14:20

06/12/2021

Continual Learning via Local Module Composition

Oleksiy Ostapenko, Pau Rodriguez, Massimo Caccia, Laurent Charlin

Keywords: continual learning, transfer learning

14:32

03/05/2021

Towards Impartial Multi-task Learning

Liyang Liu, Yi Li, Zhanghui Kuang, Jing-Hao Xue, Yimin Chen, Wenming Yang, Qingmin Liao, Wei Zhang

Keywords: Scene Understanding, Impartial Learning, Multi-task Learning

5:06

04/07/2020

Generative Semantic Hashing Enhanced via Boltzmann Machines

Lin Zheng, Qinliang Su, Dinghan Shen, Changyou Chen

Keywords: Generative Hashing, large-scale retrieval, training, Boltzmann Machines

11:26

18/07/2021

Data-Free Knowledge Distillation for Heterogeneous Federated Learning

Zhuangdi Zhu, Junyuan Hong, Jiayu Zhou

Keywords: Algorithms

5:15

06/12/2020

MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures

Jeong Un Ryu, JWoong Shin, Hae Beom Lee, Sung Ju Hwang

3:32

14/06/2020

Conditional Channel Gated Networks for Task-Aware Continual Learning

Davide Abati, Jakub Tomczak, Tijmen Blankevoort, Simone Calderara, Rita Cucchiara, Babak Ehteshami Bejnordi

Keywords: continual learning, channel gating, conditional computation, incremental learning, lifelong learning, hard attention

5:01

01/07/2020

Zero-Resource Cross-Domain Named Entity Recognition

Zihan Liu, Genta Indra Winata, Pascale Fung

5:15

06/12/2020

Supermasks in Superposition

Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Ani Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi

3:03

15/06/2020

BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning

Chengliang Zhang, Suyi Li, Junzhe Xia, Wei Wang, Feng Yan, Yang Liu

22:38

02/02/2021

AutoDropout: Learning Dropout Patterns to Regularize Deep Networks

Hieu Pham, Quoc Le

19:48

03/05/2021

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

Keywords: Deep learning, Locally supervised training

5:03

26/04/2020

Learned step size quantization

Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha

Keywords: deep learning, low precision, classification, quantization

4:40

03/05/2021

BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization

Huanrui Yang, Lin Duan, Yiran Chen, Hai Li

Keywords: DNN compression, bit-level sparsity, Mixed-precision quantization

4:58

06/12/2020

LoCo: Local Contrastive Representation Learning

Yuwen Xiong, Mengye Ren, Raquel Urtasun

3:18

26/04/2020

Stochastic Conditional Generative Networks with Basis Decomposition

Ze Wang, Xiuyuan Cheng, Guillermo Sapiro, Qiang Qiu

4:00

06/12/2021

POODLE: Improving Few-shot Learning via Penalizing Out-of-Distribution Samples

Duong Le, Khoi Duc Nguyen, Khoi Nguyen, Quoc-Huy Tran, Rang Nguyen, Binh-Son Hua

Keywords: few shot learning

6:48

03/05/2021

ChipNet: Budget-Aware Pruning with Heaviside Continuous Approximations

Rishabh Tiwari, Udbhav Bamba, Arnav Chavan, Deepak Gupta

Keywords: Budget constraints, Budget-Aware Pruning, Structured Pruning, Sparsity Learning

6:01

06/12/2020

Bayesian Bits: Unifying Quantization and Pruning

Mart van Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling

3:15

22/11/2021

Feature Fusion Vision Transformer for Fine-Grained Visual Categorization

Jun Wang, Xiaohan Yu, Yongsheng Gao

Keywords: Fine-grained visual categorization, Vision transformer, Self-attention, Feature Fusion

3:02

16/11/2020

Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models

Thuy-Trang Vu, Dinh Phung, Gholamreza Haffari

Keywords: combinatorial problem, unsupervised tasks, named recognition, broad-coverage models

11:57