DeFINE: Deep Factorized Input Token Embeddings for Neural Sequence Modeling

26/04/2020

Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi

Keywords: sequence modeling, input representations, language modeling, word embedding

Abstract: For sequence models with large vocabularies, a majority of network parameters lie in the input and output layers. In this work, we describe a new method, DeFINE, for learning deep token representations efficiently. Our architecture uses a hierarchical structure with novel skip-connections which allows for the use of low dimensional input and output layers, reducing total parameters and training time while delivering similar or better performance versus existing methods. DeFINE can be incorporated easily in new or existing sequence models. Compared to state-of-the-art methods including adaptive input representations, this technique results in a 6% to 20% drop in perplexity. On WikiText-103, DeFINE reduces the total parameters of Transformer-XL by half with minimal impact on performance. On the Penn Treebank, DeFINE improves AWD-LSTM by 4 points with a 17% reduction in parameters, achieving comparable performance to state-of-the-art methods with fewer parameters. For machine translation, DeFINE improves the efficiency of the Transformer model by about 1.4 times while delivering similar performance.
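The parameter saving from a low-dimensional input layer can be illustrated with simple arithmetic. The sketch below is not the DeFINE architecture (which, per the abstract, uses a hierarchical structure with skip-connections); it substitutes a single linear projection as a simpler stand-in. The vocabulary size and dimensions are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
def embedding_params(vocab_size: int, dim: int) -> int:
    """Parameters in a standard embedding table of shape (vocab_size, dim)."""
    return vocab_size * dim


def factorized_embedding_params(vocab_size: int, low_dim: int, model_dim: int) -> int:
    """Parameters in a low-dimensional table plus one linear projection.

    The projection maps low_dim -> model_dim and has a weight matrix and a bias.
    This is a simplified stand-in, NOT the hierarchical DeFINE transformation.
    """
    table = vocab_size * low_dim
    projection = low_dim * model_dim + model_dim
    return table + projection


# Illustrative (assumed) sizes: large vocabulary, wide model, narrow input layer.
VOCAB_SIZE = 250_000
MODEL_DIM = 1024
LOW_DIM = 128

full = embedding_params(VOCAB_SIZE, MODEL_DIM)
factorized = factorized_embedding_params(VOCAB_SIZE, LOW_DIM, MODEL_DIM)

print(f"full table:       {full:,} parameters")
print(f"factorized input: {factorized:,} parameters")
print(f"reduction factor: {full / factorized:.2f}x")
```

With these assumed sizes, almost all of the remaining parameters sit in the low-dimensional table; the projection itself is comparatively small, which is why shrinking the table width dominates the saving.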

The talk and the corresponding paper were published at the ICLR 2020 virtual conference.

Similar Papers

06/12/2021

Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices

Aliakbar Panahi, Seyran Saeedi, Tom Arodz

Keywords: transformers

13:06

03/05/2021

WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic

Renkun Ni, Hong-Min Chu, Oscar Castaneda, Ping-yeh Chiang, Christoph Studer, Tom Goldstein

Keywords: efficient inference, quantization

5:11

14/06/2020

Structured Multi-Hashing for Model Compression

Elad Eban, Yair Movshovitz-Attias, Hao Wu, Mark Sandler, Andrew Poon, Yerlan Idelbayev, Miguel Á. Carreira-Perpiñán

Keywords: compression, weight hashing, on device

1:01

14/06/2020

HRank: Filter Pruning Using High-Rank Feature Map

Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao

Keywords: network pruning, neural network compression and acceleration, high-rank feature map, efficient deep learning computing

4:57

05/12/2020

Compressing pre-trained language models by matrix decomposition

Matan Ben Noach, Yoav Goldberg

7:38

06/12/2021

Memory-efficient Patch-based Inference for Tiny Deep Learning

Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

Keywords: deep learning, machine learning, vision

11:14

14/06/2020

Resolution Adaptive Networks for Efficient Inference

Le Yang, Yizeng Han, Xi Chen, Shiji Song, Jifeng Dai, Gao Huang

Keywords: adaptive inference, efficient deep learning, multi-scale feature learning, budgeted batch classification

0:59

04/07/2020

Efficient Second-Order TreeCRF for Neural Dependency Parsing

Yu Zhang, Zhenghua Li, Min Zhang

Keywords: Neural Parsing, deep era, context representation, direct operation

11:19

04/07/2020

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian

Keywords: Faster Answering, question-independent processing, DeFormer, Decomposing Transformers

11:06

14/06/2020

Semantic Drift Compensation for Class-Incremental Learning

Lu Yu, Bartłomiej Twardowski, Xialei Liu, Luis Herranz, Kai Wang, Yongmei Cheng, Shangling Jui, Joost van de Weijer

Keywords: incremental learning, metric learning, semantic drift, deep neural networks, image classification, embedding networks, classification networks, catastrophic forgetting, task agnostic, nearest class mean classifier

0:59

03/05/2021

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

Keywords: Deep learning, Locally supervised training

5:03

06/12/2021

One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective

Jiun Tian Hoe, Kam Woh Ng, Tianyu Zhang, Chee Seng Chan, Yi-Zhe Song, Tao Xiang

Keywords: machine learning

11:39

06/12/2020

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

Yaodong Yu, Ryan Chan, Chong You, Chaobing Song, Yi Ma

3:20

26/04/2020

Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations

Yichi Zhang, Ritchie Zhao, Weizhe Hua, Nayun Xu, G. Edward Suh, Zhiru Zhang

Keywords: deep learning, neural network, dynamic quantization, dual precision, efficient gating

5:04

30/11/2020

Dense Dual-Path Network for Real-time Semantic Segmentation

Xinneng Yang, Yan Wu, Junqiao Zhao, Feilin Liu

5:43

14/06/2020

Conditional Channel Gated Networks for Task-Aware Continual Learning

Davide Abati, Jakub Tomczak, Tijmen Blankevoort, Simone Calderara, Rita Cucchiara, Babak Ehteshami Bejnordi

Keywords: continual learning, channel gating, conditional computation, incremental learning, lifelong learning, hard attention

5:01

14/06/2020

Structured Compression by Weight Encryption for Unstructured Pruning and Quantization

Se Jung Kwon, Dongsoo Lee, Byeongwook Kim, Parichay Kapoor, Baeseong Park, Gu-Yeon Wei

Keywords: model compression, quantization, pruning, xor gate, parallelism, memory bandwidth, sparse matrix, structured format

0:59

16/11/2020

Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems

Jindřich Libovický, Alexander Fraser

Keywords: transformer architecture, segmentation, subword model, neural model

6:28

04/07/2020

pyBART: Evidence-based Syntactic Transformations for IE

Aryeh Tiktinsky, Yoav Goldberg, Reut Tsarfaty

Keywords: IE, machine-learned tasks, downstream applications, data-driven transformations

14:02

06/12/2021

Hash Layers For Large Sparse Models

Stephen Roller, Sainbayar Sukhbaatar, Arthur D. Szlam, Jason Weston

Keywords: transformers

14:29

23/06/2021

Logical Bytecode Reduction

Christian Gram Kalhauge, Jens Palsberg

Keywords: input reduction, type-safe code transformation

19:40

26/04/2020

One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation

Shunshi Zhang, Bradly C. Stadie

Keywords: Pruning, RNNs, Sparsity

5:02

06/12/2021

Channel Permutations for N:M Sparsity

Jeff Pool, Chong Yu

Keywords: optimization

12:41

05/04/2021

Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick

Isak Edo Vivancos, Sayeh Sharify, Daniel Ly-Ma, Ameer Abdelhadi, Ciaran Bannon, Milos Nikolic, Mostafa Mahmoud, Alberto Delmas Lascorz, Gennady Pekhimenko, Andreas Moshovos

19:54

05/04/2021

Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick

Isak Edo Vivancos, Sayeh Sharify, Daniel Ly-Ma, Ameer Abdelhadi, Ciaran Bannon, Milos Nikolic, Mostafa Mahmoud, Alberto Delmas Lascorz, Gennady Pekhimenko, Andreas Moshovos

5:15

04/11/2020

Fast RDMA-based Ordered Key-Value Store using Remote Learned Cache

Xingda Wei, Rong Chen, Haibo Chen

18:58

06/12/2020

Compositional Generalization via Neural-Symbolic Stack Machines

Xinyun Chen, Chen Liang, Adams Wei Yu, Dawn Song, Denny Zhou

Keywords: Applications -> Computer Vision; Applications -> Visual Scene Analysis and Interpretation; Deep Learning -> Adversarial Network, Deep Learning -> Generative Models

3:26

14/06/2020

ViewAL: Active Learning With Viewpoint Entropy for Semantic Segmentation

Yawar Siddiqui, Julien Valentin, Matthias Nießner

Keywords: active learning, semantic segmentation, deep learning, view consistency

1:01

06/12/2020

Rethinking Learnable Tree Filter for Generic Feature Transform

Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Xiangyu Zhang, Hongbin Sun, Jian Sun, Nanning Zheng

Keywords: Neuroscience and Cognitive Science -> Memory; Optimization -> Combinatorial Optimization; Optimization -> Submodular Optimizati, Neuroscience and Cognitive Science -> Human or Animal Learning

3:11

06/12/2021

DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh

Keywords: transformers

7:36

18/07/2021

Accurate Post Training Quantization With Small Calibration Sets

Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, Daniel Soudry

Keywords: Algorithms, AutoML

5:16

19/08/2021

DeepME: Deep Mixture Experts for Large-scale Image Classification

Ming He, Guangyi Lv, Weidong He, Jianping Fan, Guihua Zeng

Keywords: Computer Vision, Recognition, Deep Learning, Classification

12:22

15/06/2020

Blended, precise semantic program embeddings

Ke Wang, Zhendong Su

Keywords: Static and Dynamic Program Features, Attention Network, Semantic Program Embedding

15:39

08/12/2020

Regularized Graph Convolutional Networks for Short Text Classification

Kshitij Tayal, Nikhil Rao, Saurabh Agarwal, Xiaowei Jia, Karthik Subbian, Vipin Kumar

8:27

05/04/2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

Steve Dai, Rangha Venkatesan, Mark Ren, Brian Zimmer, William Dally, Brucek Khailany

Keywords: Deep Learning -> Generative Models, Algorithms -> Similarity and Distance Learning

5:01

05/04/2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

Steve Dai, Rangha Venkatesan, Mark Ren, Brian Zimmer, William Dally, Brucek Khailany

Keywords: Deep Learning -> Generative Models, Algorithms -> Similarity and Distance Learning

19:08

12/07/2020

DropNet: Reducing Neural Network Complexity via Iterative Pruning

Chong Min John Tan, Mehul Motani

Keywords: Deep Learning - General

15:13

12/07/2020

Aligned Cross Entropy for Non-Autoregressive Machine Translation

Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy

Keywords: Applications - Language, Speech and Dialog

14:43

06/12/2020

Modular Meta-Learning with Shrinkage

Yutian Chen, Abe Friesen, Feryal Behbahani, Arnaud Doucet, David Budden, Matthew Hoffman, Nando de Freitas

3:21

25/07/2020

Reranking for efficient transformer-based answer selection

Yoshitomo Matsubara, Thuy Vu, Alessandro Moschitti

Keywords: natural language processing, question answering, transformer models, neural networks, information retrieval, reranking

9:45