Extreme Model Compression for On-device Natural Language Understanding

08/12/2020

Kanthashree Mysore Sathyendra, Samridhi Choudhary, Leah Nicolich-Henkin

Abstract: In this paper, we propose and experiment with techniques for extreme compression of neural natural language understanding (NLU) models, making them suitable for execution on resource-constrained devices. We propose a task-aware, end-to-end compression approach that performs word-embedding compression jointly with NLU task learning. We show our results on a large-scale, commercial NLU system trained on a varied set of intents with huge vocabulary sizes. Our approach outperforms a range of baselines and achieves a compression rate of 97.4% with less than 3.7% degradation in predictive performance. Our analysis indicates that the signal from the downstream task is important for effective compression with minimal degradation in performance.

The video of this talk cannot be embedded. You can watch it here:

https://underline.io/lecture/6123-extreme-model-compression-for-on-device-natural-language-understanding

The talk and the respective paper are published at the COLING Workshops 2020 virtual conference.

Similar Papers

06/12/2021

DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning

Hang Xu, Kelly Kostopoulou, Aritra Dutta, Xin Li, Alexandros Ntoulas, Panos Kalnis

Keywords: deep learning, federated learning

12:15

03/05/2021

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation

Biao Zhang, Ankur Bapna, Rico Sennrich, Orhan Firat

Keywords: multilingual transformer, multilingual translation, language-specific modeling, conditional computation

15:04

19/08/2021

A Compression-Compilation Framework for On-mobile Real-time BERT Applications

Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang

Keywords: Knowledge Representation and Reasoning, General, General

10:44

06/12/2020

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Wei Zhang, Kailash Gopalakrishnan

3:06

14/06/2020

Adaptive Loss-Aware Quantization for Multi-Bit Networks

Zhongnan Qu, Zimu Zhou, Yun Cheng, Lothar Thiele

Keywords: quantization, binary neural networks, adaptive bitwidth, loss-aware

1:01

12/07/2020

Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding

Yibo Yang, Robert Bamler, Stephan Mandt

Keywords: Deep Learning - General

15:08

02/02/2021

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

Peng Hu, Xi Peng, Hongyuan Zhu, Mohamed M. Sabry Aly, Jie Lin

13:26

05/01/2021

A Variational Information Bottleneck Based Method to Compress Sequential Networks for Human Action Recognition

Ayush Srivastava, Oshin Dutta, Jigyasa Gupta, Sumeet Agarwal, Prathosh AP

4:29

18/07/2021

Unsupervised Representation Learning via Neural Activation Coding

Yookoon Park, Sangho Lee, Gunhee Kim, David Blei

Keywords: Deep Learning, Embedding and Representation learning

13:50

26/08/2020

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Yuxuan Song, Ning Miao, Hao Zhou, Lantao Yu, Mingxuan Wang, Lei Li

12:32

08/12/2020

Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data

Ankit Arun, Soumya Batra, Vikas Bhardwaj, Ashwini Challa, Pinar Donmez, Peyman Heidari, Hakan Inan, Shashank Jain, Anuj Kumar, Shawn Mei, Karthik Mohan, Michael White

15:01

02/02/2021

DenserNet: Weakly Supervised Visual Localization Using Multi-Scale Feature Aggregation

Dongfang Liu, Yiming Cui, Liqi Yan, Christos Mousas, Baijian Yang, Yingjie Chen

16:15

26/04/2020

Data-Independent Neural Pruning via Coresets

Ben Mussay, Margarita Osadchy, Vladimir Braverman, Samson Zhou, Dan Feldman

Keywords: coresets, neural pruning, network compression

4:23

06/12/2021

Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems

Menoua Keshishian, Samuel Norman-Haignere, Nima Mesgarani

Keywords: deep learning, machine learning

10:28

06/12/2021

Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition

Lucas Liebenwein, Alaa Maalouf, Dan Feldman, Daniela Rus

Keywords: deep learning, optimization

14:34

07/09/2020

Paying more Attention to Snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation

Duong Le, Nhan Vo, Nam Thoai

Keywords: network pruning, knowledge distillation, ensemble learning

8:30

08/12/2020

Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity

Hamza Harkous, Isabel Groves, Amir Saffari

14:37

26/04/2020

Improving Neural Language Generation with Spectrum Control

Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu

4:58

03/05/2021

Neural Topic Model via Optimal Transport

He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine

Keywords: optimal transport, document analysis, topic modelling

9:29

12/07/2020

Operation-Aware Soft Channel Pruning using Differentiable Masks

Minsoo Kang, Bohyung Han

Keywords: Applications - Computer Vision

14:56

06/12/2020

All Word Embeddings from One Embedding

Sho Takase, Sosuke Kobayashi

3:11

26/04/2020

Deep probabilistic subsampling for task-adaptive compressed sensing

Iris A.M. Huijben, Bastiaan S. Veeling, Ruud J.G. van Sloun

4:57

12/07/2020

Differentiable Product Quantization for Learning Compact Embedding Layers

Ting Chen, Lala Li, Yizhou Sun

Keywords: Applications - Language, Speech and Dialog

10:16

02/02/2021

Continuous Self-Attention Models with Neural ODE Networks

Jing Zhang, Peng Zhang, Baiwen Kong, Junqiu Wei, Xin Jiang

15:25

26/04/2020

Neural Epitome Search for Architecture-Agnostic Network Compression

Daquan Zhou, Xiaojie Jin, Qibin Hou, Kaixin Wang, Jianchao Yang, Jiashi Feng

Keywords: Network Compression, Classification, Deep Learning, Weights Sharing

6:22

19/04/2021

Zero-shot neural passage retrieval via domain-targeted synthetic question generation

Ji Ma, Ivan Korotkov, Yinfei Yang, Keith Hall, Ryan McDonald

12:47

22/11/2021

Contextual Convolution Blocks

David Marwood, Shumeet Baluja

Keywords: spatially selective features, convolutional layer, cc-block, self-attention, se-block, squeeze and excitation, excitation map

2:45

30/11/2020

Lossless Image Compression Using a Multi-Scale Progressive Statistical Model

Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Nannan Zou, Emre Aksu, Miska M. Hannuksela

9:33

22/11/2021

Quality Level Prediction of Image Compression using Block-wise Confidence-aware CNN

Kyuwon Kim, Chulju Yang

Keywords: convolutional neural network (CNN), compression artifacts removal, compression quality prediction, confidence estimation

3:02

14/06/2020

Forward and Backward Information Retention for Accurate Binary Neural Networks

Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, Jingkuan Song

Keywords: model compression, binary neural networks, deep learning, quantization, computer vision

1:00

14/06/2020

Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach

Haichuan Yang, Shupeng Gui, Yuhao Zhu, Ji Liu

Keywords: model compression, pruning, quantization, structured projection

1:01

06/12/2021

PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Yi Ren, Jinglin Liu, Zhou Zhao

Keywords: generative model

10:15

30/11/2020

Gaussian Vector: An Efficient Solution for Facial Landmark Detection

Yilin Xiong, Zijian Zhou, Yuhao Dou, Zhizhong Su

8:53

14/06/2020

The Knowledge Within: Methods for Data-Free Model Compression

Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry

Keywords: compression, data-free, quantization, fine-tuning, knowledge-distillation, feature-visualization, computer-vision, statistics, similarity-measure, privacy

0:51

06/12/2021

On the Out-of-distribution Generalization of Probabilistic Image Modelling

Mingtian Zhang, Andi Zhang, Steven McDonagh

Keywords: generative model

10:06

06/12/2021

Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training

Sheng Liu, Xiao Li, Yuexiang Zhai, Chong You, Zhihui Zhu, Carlos Fernandez-Granda, Qing Qu

Keywords: deep learning, machine learning, robustness, generative model

6:45

12/07/2020

FedBoost: A Communication-Efficient Algorithm for Federated Learning

Jenny Hamer, Mehryar Mohri, Ananda Theertha Suresh

Keywords: General Machine Learning Techniques

15:14

02/11/2020

Lightweight convolutional neural networks on binaural waveforms for low complexity acoustic scene classification

Nicolas Pajusco, Richard Huang, Nicolas Farrugia

11:50

06/12/2021

AugMax: Adversarial Composition of Random Augmentations for Robust Training

Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Anima Anandkumar, Zhangyang Wang

Keywords: deep learning, robustness, adversarial robustness and security

11:19

02/02/2021

Accelerating Neural Machine Translation with Partial Word Embedding Compression

Fan Zhang, Mei Tu, Jinyao Yan

14:53