Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems

16/11/2020

Jindřich Libovický, Alexander Fraser

Keywords: transformer architecture, segmentation, subword model, neural model

Abstract: Applying the Transformer architecture at the character level usually requires very deep architectures that are difficult and slow to train. These problems can be partially overcome by incorporating a segmentation into tokens in the model. We show that by initially training a subword model and then finetuning it on characters, we can obtain a neural machine translation model that works at the character level without requiring token segmentation. We use only the vanilla 6-layer Transformer Base architecture. Our character-level models better capture morphological phenomena and show more robustness to noise, at the expense of somewhat worse overall translation quality. Our study is a significant step towards high-performance, easy-to-train character-based models that are not extremely large.
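To make the contrast between the two segmentation levels concrete, the following is a minimal sketch of how the same sentence looks as subwords and as characters. It is not the authors' pipeline: the greedy longest-match segmenter, the `▁` boundary marker, the function names, and the toy vocabulary are all assumptions chosen for illustration. The abstract only states that a model is first trained on subwords and then finetuned on characters.

```python
def char_segment(sentence, space="▁"):
    """Split a sentence into single characters, marking spaces with `space`."""
    return list(sentence.replace(" ", space))


def subword_segment(sentence, vocab, space="▁"):
    """Greedy longest-match segmentation over a fixed, hypothetical vocabulary."""
    text = space + sentence.replace(" ", space)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


# Toy vocabulary (an assumption, not taken from the paper).
toy_vocab = {"▁trans", "lation", "▁model", "s"}
sentence = "translation models"

subwords = subword_segment(sentence, toy_vocab)
chars = char_segment(sentence)

print(subwords)                   # subword tokens seen in the first training stage
print(len(subwords), len(chars))  # character input is several times longer
```

In this toy case the character sequence is several times longer than the subword sequence, which is one reason character-level Transformers are usually slower to train.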

The talk and the corresponding paper were published at the EMNLP 2020 virtual conference.

Similar Papers

05/01/2021

Weakly-Supervised Object Representation Learning for Few-Shot Semantic Segmentation

Xiaowen Ying, Xin Li, Mooi Choo Chuah

5:00

26/08/2020

How fine can fine-tuning be? Learning efficient language models

Evani Radiya-Dixit, Xin Wang

13:05

14/06/2020

Robust Object Detection Under Occlusion With Context-Aware CompositionalNets

Angtian Wang, Yihong Sun, Adam Kortylewski, Alan L. Yuille

Keywords: object detection, partial occlusion, compositional models, analysis by synthesis, out of distribution, robustness

1:01

05/01/2021

Multimodal Prototypical Networks for Few-Shot Learning

Frederik Pahde, Mihai Puscas, Tassilo Klein, Moin Nabi

4:56

30/11/2020

Few-Shot Zero-Shot Learning: Knowledge Transfer with Less Supervision

Nanyi Fei, Jiechao Guan, Zhiwu Lu, Yizhao Gao

7:37

29/06/2020

Embedding Java classes with Code2vec: Improvements from variable obfuscation

Rhys Compton, Eibe Frank, Panos Patros, Abigail Koay

Keywords: code2vec, machine learning, code obfuscation, source code, neural networks

14:20

18/07/2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation

Xiang Lin, Simeng Han, Shafiq Joty

Keywords: Applications, Natural Language Processing

16:00

14/06/2020

A Disentangling Invertible Interpretation Network for Explaining Latent Representations

Patrick Esser, Robin Rombach, Björn Ommer

Keywords: interpretability, inn, disentangling, generative models, invertible neural networks, autoencoders, normalizing flows, vae, explainable, xai

1:01

06/12/2021

Efficient Training of Visual Transformers with Small Datasets

Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco Nadai

Keywords: robustness, transformers, vision

8:23

14/06/2020

RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real

Kanishka Rao, Chris Harris, Alex Irpan, Sergey Levine, Julian Ibarz, Mohi Khansari

Keywords: robotics, sim2real, cyclegan, reinforcement learning, grasping, q-learning

4:55

03/05/2021

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

Keywords: Deep learning, Locally supervised training

5:03

06/12/2020

Towards Learning Convolutions from Scratch

Behnam Neyshabur

3:21

19/08/2021

Local Representation is Not Enough: Soft Point-Wise Transformer for Descriptor and Detector of Local Features

Zihao Wang, Xueyi Li, Zhen Li

Keywords: Computer Vision, 2D and 3D Computer Vision, Recognition

14:56

16/11/2020

Self-Supervised Object-in-Gripper Segmentation from Robotic Motions

Wout Boerdijk, Martin Sundermeyer, Maximilian Durner, Rudolph Triebel

5:03

14/06/2020

Modeling the Background for Incremental Learning in Semantic Segmentation

Fabio Cermelli, Massimiliano Mancini, Samuel Rota Bulò, Elisa Ricci, Barbara Caputo

Keywords: incremental, learning, semantic, segmentation, continual, catastrophic, forgetting, scene, parsing

1:01

02/02/2021

LREN: Low-Rank Embedded Network for Sample-Free Hyperspectral Anomaly Detection

Kai Jiang, Weiying Xie, Jie Lei, Tao Jiang, Yunsong Li

12:56

14/06/2020

HRank: Filter Pruning Using High-Rank Feature Map

Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao

Keywords: network pruning, neural network compression and acceleration, high-rank feature map, efficient deep learning computing

4:57

06/12/2021

Few-Shot Object Detection via Association and DIscrimination

Yuhang Cao, Jiaqi Wang, Ying Jin, Tong Wu, Kai Chen, Ziwei Liu, Dahua Lin

Keywords: deep learning, machine learning, vision

10:31

02/02/2021

Adversarial Turing Patterns from Cellular Automata

Nurislam Tursynbek, Ilya Vilkoviskiy, Maria Sindeeva, Ivan Oseledets

14:50

02/02/2021

Looking Wider for Better Adaptive Representation in Few-Shot Learning

Jiabao Zhao, Yifan Yang, Xin Lin, Jing Yang, Liang He

16:58

23/06/2021

Logical Bytecode Reduction

Christian Gram Kalhauge, Jens Palsberg

Keywords: input reduction, type-safe code transformation

19:40

06/12/2020

CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching

Zeping Yu, Wenxin Zheng, Jiaqi Wang, Qiyi Tang, Sen Nie, Shi Wu

3:00

06/12/2020

Modular Meta-Learning with Shrinkage

Yutian Chen, Abe Friesen, Feryal Behbahani, Arnaud Doucet, David Budden, Matthew Hoffman, Nando de Freitas

3:21

02/02/2021

Token-Aware Virtual Adversarial Training in Natural Language Understanding

Linyang Li, Xipeng Qiu

12:49

06/12/2021

Continual Learning via Local Module Composition

Oleksiy Ostapenko, Pau Rodriguez, Massimo Caccia, Laurent Charlin

Keywords: continual learning, transfer learning

14:32

03/05/2021

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

Keywords: sparsity, efficient training and inference

5:09

19/08/2021

Fast Multi-label Learning

Xiuwen Gong, Dong Yuan, Wei Bao

Keywords: Machine Learning, Multi-instance; Multi-label; Multi-view learning

15:18

19/08/2021

Few-Shot Partial-Label Learning

Yunfeng Zhao, Guoxian Yu, Lei Liu, Zhongmin Yan, Lizhen Cui, Carlotta Domeniconi

Keywords: Machine Learning, Multi-instance; Multi-label; Multi-view learning, Transfer, Adaptation, Multi-task Learning, Weakly Supervised Learning

14:12

06/12/2021

Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices

Aliakbar Panahi, Seyran Saeedi, Tom Arodz

Keywords: transformers

13:06

03/05/2021

Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online

Yangchen Pan, Kirby Banman, Martha White

Keywords: natural sparsity, Reinforcement learning, fuzzy tiling activation function, sparse representation

6:22

03/05/2021

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, Shi Gu

Keywords: Second-order analysis, Mixed Precision, Post Training Quantization

4:36

14/06/2020

Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax

Yu Li, Tao Wang, Bingyi Kang, Sheng Tang, Chunfeng Wang, Jintao Li, Jiashi Feng

Keywords: object detection, long-tail, lvis, weight norm, classifier imbalance, balanced group softmax, bags, instance segmentation

4:57

22/11/2021

Few-shot Semantic Segmentation with Self-supervision from Pseudo-classes

Yiwen Li, Gratianus Wesley Putra Data, Yunguan Fu, Yipeng Hu, Victor Adrian Prisacariu

Keywords: few-shot semantic segmentation, self-supervision

2:46

03/05/2021

Neural Pruning via Growing Regularization

Huan Wang, Can Qin, Yulun Zhang, Yun Fu

Keywords: deep neural network pruning, regularization, Hessian matrix, model compression

6:15

14/06/2020

NAS-FCOS: Fast Neural Architecture Search for Object Detection

Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang

Keywords: neural architecture search, object detection

1:00

06/12/2021

LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes

Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi

Keywords: machine learning, representation learning

15:05

02/02/2021

Train a One-Million-Way Instance Classifier for Unsupervised Visual Representation Learning

Yu Liu, Lianghua Huang, Pan Pan, Bin Wang, Yinghui Xu, Rong Jin

15:15

06/12/2020

Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?

Shen Yan, Yu Zheng, Wei Ao, Xiao Zeng, Mi Zhang

3:13

06/12/2021

Learning Compact Representations of Neural Networks using DiscriminAtive Masking (DAM)

Jie Bu, Arka Daw, M. Maruf, Anuj Karpatne

Keywords: deep learning, machine learning, vision, graph learning, representation learning

13:59

02/02/2021

Instance Mining with Class Feature Banks for Weakly Supervised Object Detection

Yufei Yin, Jiajun Deng, Wengang Zhou, Houqiang Li

14:57