Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications

16/11/2020

Matthew Khoury, Rumen Dangovski, Longwu Ou, Preslav Nakov, Yichen Shen, Li Jing

Keywords: natural applications, neural translation, neural nmt, neural

Abstract: Deep neural networks have become the standard approach to building reliable Natural Language Processing (NLP) applications, ranging from Neural Machine Translation (NMT) to dialogue systems. However, improving accuracy by increasing the model size requires a large number of hardware computations, which can slow down NLP applications significantly at inference time. To address this issue, we propose a novel vector-vector-matrix architecture (VVMA), which greatly reduces the latency at inference time for NMT. This architecture takes advantage of specialized hardware that has low-latency vector-vector operations and higher-latency vector-matrix operations. It also reduces the number of parameters and FLOPs for virtually all models that rely on efficient matrix multipliers without significantly impacting accuracy. We present empirical results suggesting that our framework can reduce the latency of sequence-to-sequence and Transformer models used for NMT by a factor of four. Finally, we show evidence suggesting that our VVMA extends to other domains, and we discuss novel hardware for its efficient use.
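The abstract does not spell out the construction, so the following is only a minimal sketch under an assumption: each layer's weight matrix is replaced by one matrix shared across layers, applied after a cheap per-layer element-wise scaling of the input. The names shared_matrix, scale_vectors, vvma_layer and parameter_counts are hypothetical, chosen for illustration, and the code is plain Python rather than anything tied to the specialized hardware mentioned above.

```python
# Illustrative sketch only. Assumption (not stated on this page): a layer's
# dense product W_i x is approximated by M (v_i * x), where M is shared by
# all layers and v_i is a per-layer vector. All names here are hypothetical.

def elementwise(v, x):
    """Vector-vector operation: element-wise product of two equal-length lists."""
    if len(v) != len(x):
        raise ValueError("vectors must have the same length")
    return [a * b for a, b in zip(v, x)]


def matvec(m, x):
    """Vector-matrix operation: multiply matrix m (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def vvma_layer(shared_matrix, scale_vector, x):
    """Scale the input element-wise first, then apply the shared matrix."""
    return matvec(shared_matrix, elementwise(scale_vector, x))


def parameter_counts(n, num_layers):
    """Return (dense, shared) parameter counts for num_layers n-by-n layers."""
    dense = num_layers * n * n            # one full matrix per layer
    shared = n * n + num_layers * n       # one shared matrix plus one vector per layer
    return dense, shared


shared_matrix = [[1.0, 2.0], [0.0, 1.0]]
scale_vectors = [[1.0, 0.5], [2.0, -1.0]]  # one vector per layer
x = [3.0, 4.0]

# Each layer reuses shared_matrix and differs only in its scaling vector.
outputs = [vvma_layer(shared_matrix, v, x) for v in scale_vectors]
print(outputs)
print(parameter_counts(512, 6))
```

Under this assumption, six 512 x 512 layers would need 265,216 parameters instead of 1,572,864, which gives a sense of where a parameter saving could come from. It says nothing about measured latency, which depends on the hardware.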

The talk and the corresponding paper were published at the EMNLP 2020 virtual conference.

Similar Papers

18/07/2021

Winograd Algorithm for AdderNet

Wenshuo Li, Hanting Chen, Mingqiang Huang, Xinghao Chen, Chunjing Xu, Yunhe Wang

Keywords: Deep Learning

5:02

23/08/2020

Rethinking pruning for accelerating deep inference at the edge

Dawei Gao, Xiaoxi He, Zimu Zhou, Yongxin Tong, Ke Xu, Lothar Thiele

Keywords: automatic speech recognition, deep learning, named entity recognition, network pruning, sequence labelling

13:43

14/06/2020

F-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation

Konstantin Sofiiuk, Ilia Petrov, Olga Barinova, Anton Konushin

Keywords: interactive segmentation, interactive, instance segmentation, segmentation, backpropagating refinement, refinement

4:56

19/08/2021

Pruning of Deep Spiking Neural Networks through Gradient Rewiring

Yanqi Chen, Zhaofei Yu, Wei Fang, Tiejun Huang, Yonghong Tian

Keywords: Humans and AI, Brain Sciences, Cognitive Modeling, Classification

12:58

03/05/2021

AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights

Byeongho Heo, Sanghyuk Chun, Seong Joon Oh, Dongyoon Han, Sangdoo Yun, Gyuwan Kim, Youngjung Uh, Jung-Woo Ha

Keywords: effective learning rate, normalize layer, scale-invariant weights, momentum optimizer

5:16

05/01/2021

Dynamic Routing Networks

Shaofeng Cai, Yao Shu, Wei Wang

4:52

05/01/2021

OverNet: Lightweight Multi-Scale Super-Resolution With Overscaling Network

Parichehr Behjati, Pau Rodriguez, Armin Mehri, Isabelle Hupont, Carles Fernandez Tena, Jordi Gonzalez

4:24

14/09/2020

Squeezing Correlated Neurons for Resource-Efficient Deep Neural Networks

Elbruz Ozen, Alex Orailoglu

Keywords: deep learning, information redundancy, pruning

14:48

06/12/2021

Heavy Ball Neural Ordinary Differential Equations

Hedi Xia, Vai Suliafu, Hangjie Ji, Tan Nguyen, Andrea Bertozzi, Stanley Osher, Bao Wang

Keywords: deep learning, optimization, machine learning, vision

4:08

06/12/2021

Efficient Equivariant Network

Lingshen He, Yuxuan Chen, Zhengyang Shen, Yiming Dong, Yisen Wang, Zhouchen Lin

Keywords: deep learning, vision

8:20

01/07/2020

Compressing Neural Machine Translation Models with 4-bit Precision

Alham Fikri Aji, Kenneth Heafield

9:35

02/02/2021

TRQ: Ternary Neural Networks With Residual Quantization

Yue Li, Wenrui Ding, Chunlei Liu, Baochang Zhang, Guodong Guo

15:21

26/04/2020

Minimizing FLOPs to Learn Efficient Sparse Representations

Biswajit Paria, Chih-Kuan Yeh, Ian E.H. Yen, Ning Xu, Pradeep Ravikumar, Barnabás Póczos

Keywords: sparse embeddings, deep representations, metric learning, regularization

4:41

14/06/2020

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

Qilong Wang, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, Qinghua Hu

Keywords: channel attention, efficient, adaptive 1d convolution, deep cnns, image classification, object detection, instance segmentation

0:57

14/06/2020

Frequency Domain Compact 3D Convolutional Neural Networks

Hanting Chen, Yunhe Wang, Han Shu, Yehui Tang, Chunjing Xu, Boxin Shi, Chao Xu, Qi Tian, Chang Xu

Keywords: model compression, 3d convolutional neural networks, network pruning, frequency domain transform

1:02

14/06/2020

Fast Sparse ConvNets

Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan

Keywords: vision, convolutional networks, cnns, efficient inference, sparsity, mobile, edge, tensorflow, xnnpack

1:01

14/06/2020

Π-nets: Deep Polynomial Neural Networks

Grigorios G. Chrysos, Stylianos Moschoglou, Giorgos Bouritsas, Yannis Panagakis, Jiankang Deng, Stefanos Zafeiriou

Keywords: polynomial neural networks, tensor decompositions, high-order polynomials, generative models, discriminative models, stylegan, resnet, 3d mesh representation learning, activation functions

1:00

02/02/2021

Linearly Replaceable Filters for Deep Network Channel Pruning

Donggyu Joo, Eojindl Yi, Sunghyun Baek, Junmo Kim

15:47

12/07/2020

Boosting Deep Neural Network Efficiency with Dual-Module Inference

Liu Liu, Lei Deng, Zhaodong Chen, Yuke Wang, Shuangchen Li, Jingwei Zhang, Yihua Yang, Zhenyu Gu, Yufei Ding, Yuan Xie

Keywords: Deep Learning - General

8:04

12/07/2020

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

Mike Dusenberry, Ghassen Jerfel, Yeming Wen, Yian Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran

Keywords: Deep Learning - General

14:29

14/06/2020

Factorized Higher-Order CNNs With an Application to Spatio-Temporal Emotion Estimation

Jean Kossaifi, Antoine Toisoul, Adrian Bulat, Yannis Panagakis, Timothy M. Hospedales, Maja Pantic

Keywords: tensor methods, deep learning, spatiotemporal, emotion, cnn, tensor decomposition, low-rank, valence, arousal

1:01

06/12/2021

Boost Neural Networks by Checkpoints

Feng Wang, Guoyizhe Wei, Qiao Liu, Jinxiang Ou, Xian Wei, Hairong Lv

Keywords: deep learning

4:45

07/09/2020

Paying more Attention to Snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation

Duong Le, Nhan Vo, Nam Thoai

Keywords: network pruning, knowledge distillation, ensemble learning

8:30

18/07/2021

A Novel Sequential Coreset Method for Gradient Descent Algorithms

Jiawei Huang, Ruomin Huang, Wenjie Liu, Nikolaos Freris, Hu Ding

Keywords: Optimization

5:15

02/02/2021

Amata: An Annealing Mechanism for Adversarial Training Acceleration

Nanyang Ye, Qianxiao Li, Xiao-Yun Zhou, Zhanxing Zhu

14:30

03/05/2021

A Block Minifloat Representation for Training Deep Neural Networks

Sean Fox, Seyedramin Rasoulinezhad, Julian Faraone, David Boland, Philip Leong

5:15

03/05/2021

Reweighting Augmented Samples by Minimizing the Maximal Expected Loss

Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma

Keywords: sample reweighting, data augmentation

4:58

18/07/2021

Learn-to-Share: A Hardware-friendly Transfer Learning Framework Exploiting Computation and Parameter Sharing

Cheng Fu, Hanxian Huang, Xinyun Chen, Yuandong Tian, Jishen Zhao

Keywords: Applications, Natural Language Processing

16:03

03/05/2021

Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers

Kaidi Xu, Huan Zhang, Shiqi Wang, Yihan Wang, Suman Jana, Xue Lin, Cho-Jui Hsieh

Keywords: branch and bound, neural network verification

5:08

14/06/2020

AANet: Adaptive Aggregation Network for Efficient Stereo Matching

Haofei Xu, Juyong Zhang

Keywords: stereo matching, cost aggregation, edge-preserving, deformable convolution, cost volume, dense correspondences

1:01

12/07/2020

Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks

Mark Kurtz, Justin Kopinsky, Rati Gelashvili, Alexander Matveev, John Carr, Michael Goin, William Leiserson, Sage Moore, Nir Shavit, Dan Alistarh

Keywords: Deep Learning - Algorithms

14:41

06/12/2021

Collapsed Variational Bounds for Bayesian Neural Networks

Marcin Tomczak, Siddharth Swaroop, Andrew Foong, Richard Turner

Keywords: deep learning, optimization, generative model

5:44

19/04/2021

Incremental beam manipulation for natural language generation

James Hargreaves, Andreas Vlachos, Guy Emerson

10:34

06/12/2020

GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification

John Halloran, David M Rocke

3:33

06/12/2021

Efficient Neural Network Training via Forward and Backward Propagation Sparsification

Xiao Zhou, Weizhong Zhang, Zonghao Chen, Shizhe Diao, Tong Zhang

Keywords: deep learning, optimization

7:48

06/12/2021

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective

Geoff Pleiss, John Cunningham

Keywords: deep learning, kernel methods

6:59

25/07/2020

Reranking for efficient transformer-based answer selection

Yoshitomo Matsubara, Thuy Vu, Alessandro Moschitti

Keywords: natural language processing, question answering, transformer models, neural networks, information retrieval, reranking

9:45

05/01/2021

Exploiting the Redundancy in Convolutional Filters for Parameter Reduction

Kumara Kahatapitiya, Ranga Rodrigo

5:10

06/12/2021

AC-GC: Lossy Activation Compression with Guaranteed Convergence

R David Evans, Tor Aamodt

Keywords: deep learning, optimization, graph learning

14:39

14/06/2020

Meta-Transfer Learning for Zero-Shot Super-Resolution

Jae Woong Soh, Sunwoo Cho, Nam Ik Cho

Keywords: zero-shot super-resolution, meta learning, transfer learning

0:59