03/05/2021

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies

Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed

Keywords: text classification, recommendation systems, large vocabularies, sparse embeddings, language modeling

Abstract: Learning continuous representations of discrete objects such as text, users, movies, and URLs lies at the heart of many applications including language and user modeling. When using discrete objects as input to neural networks, we often ignore the underlying structures (e.g., natural groupings and similarities) and embed the objects independently into individual vectors. As a result, existing methods do not scale to large vocabulary sizes. In this paper, we design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix. We call our method Anchor & Transform (ANT) as the embeddings of discrete objects are a sparse linear combination of the anchors, weighted according to the transformation matrix. ANT is scalable, flexible, and end-to-end trainable. We further provide a statistical interpretation of our algorithm as a Bayesian nonparametric prior for embeddings that encourages sparsity and leverages natural groupings among objects. By deriving an approximate inference algorithm based on Small Variance Asymptotics, we obtain a natural extension that automatically learns the optimal number of anchors instead of having to tune it as a hyperparameter. On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes and demonstrates stronger performance with fewer parameters (up to 40x compression) as compared to existing compression baselines.
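To make the factorization described in the abstract concrete, here is a minimal PyTorch sketch of an anchor-and-transform style embedding layer. This is not the authors' implementation: the class name, initialization, and the plain L1 penalty (standing in for the paper's sparsity-inducing Bayesian nonparametric prior) are illustrative assumptions.

    import torch
    import torch.nn as nn

    class AnchorTransformEmbedding(nn.Module):
        """Sketch of an ANT-style embedding: each of the |V| objects is
        represented as a sparse linear combination of k << |V| anchors,
        i.e. rows of E = T @ A, with T encouraged to be sparse."""

        def __init__(self, vocab_size, num_anchors, embed_dim, l1_weight=1e-4):
            super().__init__()
            # A: k x d dense anchor embeddings
            self.anchors = nn.Parameter(0.01 * torch.randn(num_anchors, embed_dim))
            # T: |V| x k transformation; sparsity is only encouraged here,
            # via the L1 penalty below (an assumption, not the paper's prior)
            self.transform = nn.Parameter(0.01 * torch.rand(vocab_size, num_anchors))
            self.l1_weight = l1_weight

        def forward(self, ids):
            # Look up anchor weights for the given object ids, then mix anchors
            weights = self.transform[ids]     # (batch, k)
            return weights @ self.anchors     # (batch, d)

        def l1_penalty(self):
            # Add this to the task loss to push T toward sparsity
            return self.l1_weight * self.transform.abs().sum()

A usage sketch: with vocab_size=100_000, num_anchors=500, and embed_dim=128, the layer stores 500*128 + 100_000*500 weights before sparsification, and the learned T becomes cheap to store once most of its entries are driven to zero. The total training objective would be the task loss plus emb.l1_penalty().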
