Accelerating Neural Machine Translation with Partial Word Embedding Compression

02/02/2021

Fan Zhang, Mei Tu, Jinyao Yan

Abstract: Large model size and high computational complexity prevent neural machine translation (NMT) models from being deployed on low-resource devices (e.g., mobile phones). Because of the large vocabulary, the word embedding matrix in NMT models requires a large amount of storage, and constructing the word probability distribution introduces high latency. Because the word embedding matrix can be reused in the softmax layer, both problems caused by the large vocabulary can be addressed at the same time. In this paper, we propose Partial Vector Quantization (P-VQ) for NMT models, which both compresses the word embedding matrix and accelerates word probability prediction in the softmax layer. With P-VQ, the word embedding matrix is split into two low-dimensional matrices, namely the shared part and the exclusive part. We compress the shared part by vector quantization and leave the exclusive part unchanged to maintain the uniqueness of each word. For acceleration, we replace most of the multiplication operations in the softmax layer with efficient lookup operations based on our compression, which reduces the computational complexity. Furthermore, we adopt curriculum learning and compact the word embedding matrix gradually to improve the compression quality. Experimental results on the Chinese-to-English translation task show that our method reduces the parameters of the word embedding by 74.35% and the FLOPs of the softmax layer by 74.42%, while the average BLEU score on the WMT test sets drops by only 0.04.
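The split-and-lookup idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the sizes, the function names, and the way the codebook is built (sampling a few rows instead of learning it by vector quantization with curriculum learning) are all assumptions made for illustration. It only shows why a tied, partially quantized embedding lets the softmax layer swap most multiplications for table lookups.

```python
import random

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def nearest(vec, codebook):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, codebook[k])))

def compress(embedding, d_shared, codebook):
    """Quantize the shared part of each row; keep the exclusive part unchanged."""
    codes = [nearest(row[:d_shared], codebook) for row in embedding]
    exclusive = [row[d_shared:] for row in embedding]
    return codes, exclusive

def logits_lookup(hidden, codes, exclusive, codebook, d_shared):
    """Logits with the compressed matrix reused in the softmax layer."""
    # One dot product per codeword, computed once per decoding step.
    shared_scores = [dot(hidden[:d_shared], c) for c in codebook]
    # Per word: a table lookup plus a short dot product on the exclusive part.
    return [shared_scores[codes[w]] + dot(hidden[d_shared:], exclusive[w])
            for w in range(len(codes))]

def logits_full(hidden, codes, exclusive, codebook):
    """Reference: rebuild every full row, then take the full dot product."""
    return [dot(hidden, codebook[codes[w]] + exclusive[w])
            for w in range(len(codes))]

# Hypothetical toy sizes: vocabulary, embedding dim, shared dim, codebook size.
random.seed(0)
V, D, D_SHARED, K = 50, 8, 6, 4
embedding = [[random.gauss(0, 1) for _ in range(D)] for _ in range(V)]
# Assumption: the codebook is a random sample of rows, not a learned one.
codebook = [row[:D_SHARED] for row in random.sample(embedding, K)]

codes, exclusive = compress(embedding, D_SHARED, codebook)
hidden = [random.gauss(0, 1) for _ in range(D)]

fast = logits_lookup(hidden, codes, exclusive, codebook, D_SHARED)
slow = logits_full(hidden, codes, exclusive, codebook)
max_diff = max(abs(a - b) for a, b in zip(fast, slow))

# Multiplications per decoding step for the logits, in this toy setting only.
dense_mults = V * D
lookup_mults = K * D_SHARED + V * (D - D_SHARED)
print(max_diff < 1e-9, dense_mults, lookup_mults)
```

Both functions score the same compressed matrix, so their logits agree up to floating-point rounding; the lookup version simply avoids repeating the shared-part dot product for every word. The counts printed here describe only these toy sizes and are unrelated to the percentages reported in the abstract.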

The video of this talk can be watched here:

https://slideslive.com/38948867

The talk and the respective paper were published at the AAAI 2021 virtual conference.

Similar Papers

19/04/2021

ProFormer: Towards on-device LSH projection based transformers

Chinnadhurai Sankar, Sujith Ravi, Zornitsa Kozareva

4:21

02/02/2021

Continuous Self-Attention Models with Neural ODE Networks

Jing Zhang, Peng Zhang, Baiwen Kong, Junqiu Wei, Xin Jiang

15:25

06/12/2020

All Word Embeddings from One Embedding

Sho Takase, Sosuke Kobayashi

3:11

06/12/2021

PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Yi Ren, Jinglin Liu, Zhou Zhao

Keywords: generative model

10:15

02/02/2021

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

Peng Hu, Xi Peng, Hongyuan Zhu, Mohamed M. Sabry Aly, Jie Lin

13:26

06/12/2020

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Amir Gholami, Michael Mahoney, Kurt Keutzer

3:21

06/12/2020

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Wei Zhang, Kailash Gopalakrishnan

3:06

02/02/2021

Compressing Deep Convolutional Neural Networks by Stacking Low-dimensional Binary Convolution Filters

Weichao Lan, Liang Lan

14:38

03/05/2021

Neural Topic Model via Optimal Transport

He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine

Keywords: optimal transport, document analysis, topic modelling

9:29

02/02/2021

Plug-and-Play Domain Adaptation for Cross-Subject EEG-based Emotion Recognition

Li-Ming Zhao, Xu Yan, Bao-Liang Lu

18:10

16/11/2020

F2-Softmax: Diversifying Neural Text Generation via Frequency Factorized Softmax

Byung-Ju Choi, Jimin Hong, David Park, Sang Wan Lee

Keywords: neural generation, sub-optimal generation, learning model, mefmax

11:37

02/02/2021

LREN: Low-Rank Embedded Network for Sample-Free Hyperspectral Anomaly Detection

Kai Jiang, Weiying Xie, Jie Lei, Tao Jiang, Yunsong Li

12:56

16/11/2020

From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers

Anne Lauscher, Vinit Ravishankar, Ivan Vulić, Goran Glavaš

Keywords: zero-shot transfer, downstream transfer, resource-lean scenarios, pos tagging

11:45

16/11/2020

Sparse Text Generation

Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Keywords: story completion, dialogue generation, text generators, language models

11:27

06/12/2021

Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices

Aliakbar Panahi, Seyran Saeedi, Tom Arodz

Keywords: transformers

13:06

02/02/2021

HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation

Xiaoyang Lyu, Liang Liu, Mengmeng Wang, Xin Kong, Lina Liu, Yong Liu, Xinxin Chen, Yi Yuan

12:10

07/09/2020

Learning Effectively from Noisy Supervision for Weakly Supervised Semantic Segmentation

Wenbin Xie, Qiaoqiao Wei, Zheng Li, Hui Zhang

Keywords: Semantic Segmentation, Weakly Supervised Semantic Segmentation, Self Attention

3:46

03/05/2021

Learnable Embedding sizes for Recommender Systems

Siyi Liu, Chen Gao, Yihong Chen, Depeng Jin, Yong Li

Keywords: Deep Learning, Embedding Size, Recommender Systems

5:29

02/02/2021

DPFPS: Dynamic and Progressive Filter Pruning for Compressing Convolutional Neural Networks from Scratch

Xiaofeng Ruan, Yufan Liu, Bing Li, Chunfeng Yuan, Weiming Hu

14:38

26/04/2020

Plug and Play Language Models: A Simple Approach to Controlled Text Generation

Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu

Keywords: controlled text generation, generative models, conditional generative models, language modeling, transformer

4:58

04/07/2020

Masked Language Model Scoring

Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff

Keywords: Masked Scoring, NLP tasks, domain adaptation, language scoring

11:24

06/12/2021

Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Joseph Naor, Daniel Soudry

Keywords: deep learning

11:02

05/01/2021

A Variational Information Bottleneck Based Method to Compress Sequential Networks for Human Action Recognition

Ayush Srivastava, Oshin Dutta, Jigyasa Gupta, Sumeet Agarwal, Prathosh AP

4:29

26/04/2020

The Curious Case of Neural Text Degeneration

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi

Keywords: generation, text, NLG, NLP, natural language, natural language generation, language model, neural, neural language model

4:57

06/12/2021

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective

Geoff Pleiss, John Cunningham

Keywords: deep learning, kernel methods

6:59

30/11/2020

Imbalance Robust Softmax for Deep Embedding Learning

Hao Zhu, Yang Yuan, Guosheng Hu, Xiang Wu, Neil Robertson

7:16

05/01/2021

Spike-Thrift: Towards Energy-Efficient Deep Spiking Neural Networks by Limiting Spiking Activity via Attention-Guided Compression

Souvik Kundu, Gourav Datta, Massoud Pedram, Peter A. Beerel

5:22

26/04/2020

AutoQ: Automated Kernel-Wise Neural Network Quantization

Qian Lou, Feng Guo, Minje Kim, Lantao Liu, Lei Jiang

Keywords: AutoML, Kernel-Wise Neural Networks Quantization, Hierarchical Deep Reinforcement Learning

4:54

06/12/2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth

Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

2:59

01/07/2020

Span-Based LCFRS-2 Parsing

Miloš Stanojević, Mark Steedman

14:37

06/12/2021

LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes

Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi

Keywords: machine learning, representation learning

15:05

06/12/2021

SOFT: Softmax-free Transformer with Linear Complexity

Jiachen Lu, Jinghan Yao, Junge Zhang, Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing Xu, Tao Xiang, Li Zhang

Keywords: robustness, transformers, language

8:04

15/06/2020

MatrixKV: Reducing Write Stalls and Write Amplification in LSM-tree Based KV Stores with Matrix Container in NVM

Ting Yao, Yiwen Zhang, Jiguang Wan, Qiu Cui, Liu Tang, Hong Jiang, Changsheng Xie, Xubin He

20:43

19/08/2021

Deep Unified Cross-Modality Hashing by Pairwise Data Alignment

Yimu Wang, Bo Xue, Quan Cheng, Yuhui Chen, Lijun Zhang

Keywords: Computer Vision, Recognition, Information Retrieval, Deep Learning

13:11

08/12/2020

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

14:42

02/11/2020

Lightweight convolutional neural networks on binaural waveforms for low complexity acoustic scene classification

Nicolas Pajusco, Richard Huang, Nicolas Farrugia

11:50

03/05/2021

Mirostat: A Neural Text Decoding Algorithm That Directly Controls Perplexity

Sourya Basu, Govardana Sachithanandam Ramachandran, Nitish Shirish Keskar, Lav R Varshney

Keywords: cross-entropy, incoherence, repetitions, sampling algorithms, Neural text decoding

5:07

03/05/2021

SOLAR: Sparse Orthogonal Learned and Random Embeddings

Tharun Medini, Beidi Chen, Anshumali Shrivastava

Keywords: Embedding Models, Learning to Hash, Inverted Index, Sparse Embedding

4:24

19/08/2021

Learning Deeper Non-Monotonic Networks by Softly Transferring Solution Space

Zheng-Fan Wu, Hui Xue, Weimin Bai

Keywords: Machine Learning, Kernel Methods, Deep Learning, Classification

12:50

06/12/2021

Aligned Structured Sparsity Learning for Efficient Image Super-Resolution

Yulun Zhang, Huan Wang, Can Qin, Yun Fu

Keywords: deep learning

13:23