Hard-Coded Gaussian Attention for Neural Machine Translation

04/07/2020

Hard-Coded Gaussian Attention for Neural Machine Translation

Weiqiu You, Simeng Sun, Mohit Iyyer

Keywords: Neural Translation, Hard-Coded Attention, variant, encoder

Abstract Paper Similar Papers

Abstract: Recent work has questioned the importance of the Transformer's multi-headed attention for achieving high translation quality. We push further in this direction by developing a ``hard-coded'' attention variant without any learned parameters. Surprisingly, replacing all learned self-attention heads in the encoder and decoder with fixed, input-agnostic Gaussian distributions minimally impacts BLEU scores across four different language pairs. However, additionally, hard-coding cross attention (which connects the decoder to the encoder) significantly lowers BLEU, suggesting that it is more important than self-attention. Much of this BLEU drop can be recovered by adding just a single learned cross attention head to an otherwise hard-coded Transformer. Taken as a whole, our results offer insight into which components of the Transformer are actually important, which we hope will guide future work into the development of simpler and more efficient attention-based models.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at ACL 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

06/12/2021

Scatterbrain: Unifying Sparse and Low-rank Attention

Beidi Chen, Tri Dao, Eric Winsor and
Zhao Song, Atri Rudra, Christopher Ré

Keywords Paper

transformers, generative model

0

0

0

0

13:15

06/12/2021

SOFT: Softmax-free Transformer with Linear Complexity

Jiachen Lu, Jinghan Yao, Junge Zhang and
Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing XU, Tao Xiang, Li Zhang

Keywords Paper

robustness, transformers, language

0

0

0

0

8:04

06/12/2021

IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers

Bowen Pan, Rameswar Panda, Yifan Jiang and
Zhangyang Wang, Rogerio Feris, Aude Oliva

Keywords Paper

deep learning, transformers, vision, interpretability

0

0

0

0

11:25

08/12/2020

Emergent Communication Pretraining for Few-Shot Machine Translation

Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen

Keywords Paper

0

0

0

0

14:42

05/01/2021

Self Supervision for Attention Networks

Badri N. Patro, Kasturi G.S., Ansh Jain, Vinay P. Namboodiri

Keywords Paper

0

0

0

0

5:01

04/07/2020

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian

Keywords Paper

Faster Answering, question-independent processing, DeFormer, Decomposing Transformers

0

0

0

0

11:06

08/12/2020

Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism

Pan Xie, Zhi Cui, Xiuying Chen and
XiaoHui Hu, Jianwei Cui, Bin Wang

Keywords Paper

0

0

0

0

6:43

17/08/2020

Learning temporal coherence via self-supervision for GAN-based video generation

Mengyu Chu, You Xie, Jonas Mayer and
Laura Leal-Taixé, Nils Thuerey

Keywords Paper

self-supervision, temporal cycle-consistency, video super-resolution, generative adversarial network, unpaired video translation

0

0

0

0

16:59

03/05/2021

Sharper Generalization Bounds for Learning with Gradient-dominated Objective Functions

Yunwen Lei, Yiming Ying

Keywords Paper

generalization bounds, non-convex learning

0

0

0

0

5:09

06/12/2020

Modular Meta-Learning with Shrinkage

Yutian Chen, Abe Friesen, Feryal Behbahani and
Arnaud Doucet, David Budden, Matthew Hoffman, Nando de Freitas

Keywords Paper

0

0

0

0

3:21

06/12/2020

Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples

Samarth Sinha, Zhengli Zhao, Anirudh Goyal ALIAS PARTH GOYAL and
Colin A Raffel, Augustus Odena

Keywords Paper

0

0

0

0

3:20

16/11/2020

Revisiting Modularized Multilingual NMT to Meet Industrial Demands

Sungwon Lyu, Bokyung Son, Kichang Yang, Jaekyoung Bae

Keywords Paper

multilingual translation, multilingual industries, -, multilingual model

0

0

0

0

11:38

06/12/2021

Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning

Jongjin Park, Younggyo Seo, Chang Liu and
Li Zhao, Tao Qin, Jinwoo Shin, Tie-Yan Liu

Keywords Paper

reinforcement learning and planning, causality

0

0

0

0

12:12

02/02/2021

Tempered Sigmoid Activations for Deep Learning with Differential Privacy

Nicolas Papernot, Abhradeep Thakurta, Shuang Song and
Steve Chien, Úlfar Erlingsson

Keywords Paper

0

0

0

0

15:38

26/04/2020

Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML

Aniruddh Raghu, Maithra Raghu, Samy Bengio, Oriol Vinyals

Keywords Paper

deep learning analysis, representation learning, meta-learning, few-shot learning

0

0

0

0

5:25

06/12/2020

Differentiable Augmentation for Data-Efficient GAN Training

Shengyu Zhao, Zhijian Liu, Ji Lin and
Jun-Yan Zhu, Song Han

Keywords Paper

0

0

0

0

3:22

14/06/2020

Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking

Jin Gao, Weiming Hu, Yan Lu

Keywords Paper

online learning, visual tracking, continual learning, recursive least-squares estimation, deep learning, memory retention, recursive learning, mini-batch sgd, normal equation, mlp layer

0

0

0

0

5:01

03/05/2021

AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights

Byeongho Heo, Sanghyuk Chun, Seong Joon Oh and
Dongyoon Han, Sangdoo Yun, Gyuwan Kim, Youngjung Uh, Jung-Woo Ha

Keywords Paper

effective learning rate, normalize layer, scale-invariant weights, momentum optimizer

0

0

0

0

5:16

12/07/2020

More Information Supervised Probabilistic Deep Face Embedding Learning

Ying Huang, Shangfeng Qiu, Wenwei Zhang and
Xianghui Luo, Jinzhuo Wang

Keywords Paper

Applications - Computer Vision

0

0

0

0

12:10

06/12/2021

One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective

Jiun Tian Hoe, Kam Woh Ng, Tianyu Zhang and
Chee Seng Chan, Yi-Zhe Song, Tao Xiang

Keywords Paper

machine learning

0

0

0

0

11:39

06/12/2021

ReSSL: Relational Self-Supervised Learning with Weak Augmentation

Mingkai Zheng, Shan You, Fei Wang and
Chen Qian, Changshui Zhang, Xiaogang Wang, Chang Xu

Keywords Paper

self-supervised learning, contrastive learning

0

0

0

0

6:35

14/06/2020

Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

Runfa Chen, Wenbing Huang, Binghui Huang and
Fuchun Sun, Bin Fang

Keywords Paper

nice-gan, reusing discriminators for encoding, unsupervised image-to-image translation, decoupled training, multi-scale discriminators, adversarial loss, no independent component for encoding, shared layers, residual attention, cyclegan

0

0

0

0

1:01

18/07/2021

Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision

Johan Björck, Xiangyu Chen, Christopher De Sa and
Carla Gomes, Kilian Weinberger

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:19

14/09/2020

Squeezing Correlated Neurons for Resource-Efficient Deep Neural Networks

Elbruz Ozen, Alex Orailoglu

Keywords Paper

deep learning, information redundancy, pruning

0

0

0

0

14:48

18/07/2021

Barlow Twins: Self-Supervised Learning via Redundancy Reduction

Jure Zbontar, Li Jing, Ishan Misra and
yann lecun, Stephane Deny

Keywords Paper

Algorithms, Unsupervised Learning

0

0

0

0

4:21

05/12/2020

Self-supervised learning for pairwise data refinement

Gustavo Hernandez Abrego, Bowen Liang, Wei Wang and
Zarana Parekh, Yinfei Yang, Yunhsuan Sung

Keywords Paper

0

0

0

0

15:17

12/07/2020

Aligned Cross Entropy for Non-Autoregressive Machine Translation

Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy

Keywords Paper

Applications - Language, Speech and Dialog

0

0

0

0

14:43

19/04/2021

Multilingual neural machine translation with deep encoder and multiple shallow decoders

Xiang Kong, Adithya Renduchintala, James Cross and
Yuqing Tang, Jiatao Gu, Xian Li

Keywords Paper

0

0

0

0

10:26

06/12/2020

SMYRF - Efficient Attention using Asymmetric Clustering

Giannis Daras, Nikita Kitaev, Augustus Odena, Alex Dimakis

Keywords Paper

0

0

0

0

3:28

12/07/2020

Hierarchically Decoupled Morphological Transfer

Donald Hejna, Lerrel Pinto, Pieter Abbeel

Keywords Paper

Reinforcement Learning - Deep RL

0

0

0

0

15:14

04/07/2020

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

Biao Zhang, Philip Williams, Ivan Titov, Rico Sennrich

Keywords Paper

Massively Translation, Zero-Shot Translation, neural translation, NMT

0

0

0

0

11:47

16/11/2020

If beam search is the answer, what was the question?

Clara Meister, Ryan Cotterell, Tim Vieira

Keywords Paper

language tasks, beam search, decoding, maximum decoding

0

0

0

0

12:18

03/05/2021

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Yulin Wang, Zanlin Ni, Shiji Song and
Le Yang, Gao Huang

Keywords Paper

Deep learning, Locally supervised training

1

0

0

1

5:03

19/04/2021

PPT: Parsimonious parser transfer for unsupervised cross-lingual adaptation

Kemal Kurniawan, Lea Frermann, Philip Schulz, Trevor Cohn

Keywords Paper

0

0

0

0

11:52

06/12/2020

SuperLoss: A Generic Loss for Robust Curriculum Learning

Thibault Castells, Philippe Weinzaepfel, Jerome Revaud

Keywords Paper

, Probabilistic Methods -> MCMC

0

0

0

0

3:26

06/12/2021

Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning

Yiqin Yang, Xiaoteng Ma, Li Chenghao and
Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, Qianchuan Zhao

Keywords Paper

reinforcement learning and planning

0

0

0

0

13:36

06/12/2020

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

Kihyuk Sohn, David Berthelot, Nicholas Carlini and
Zizhao Zhang, Han Zhang, Colin A Raffel, Dogus Cubuk, Alexey Kurakin, Chun-Liang Li

Keywords Paper

0

0

0

0

3:17

14/06/2020

Cross-Batch Memory for Embedding Learning

Xun Wang, Haozhi Zhang, Weilin Huang, Matthew R. Scott

Keywords Paper

embedding learning, hard mining, memory module

0

0

0

0

4:49

06/12/2020

Hard Negative Mixing for Contrastive Learning

Yannis Kalantidis, Mert Bulent Sariyildiz, Noe Pion and
Philippe Weinzaepfel, Diane Larlus

Keywords Paper

0

0

0

0

3:17

07/09/2020

Non-Probabilistic Cosine Similarity Loss for Few-Shot Image Classification

Joonhyuk Kim, Inug Yoon, Gyeong-Moon Park, Jong-Hwan Kim

Keywords Paper

few-shot learning, image classification, NPC loss

0

0

0

0

4:59