Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation

01/07/2020

Mitchell Gordon, Kevin Duh

Abstract: We explore best practices for training small, memory-efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely used, their interaction remains poorly understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest distilling twice for best performance: once using general-domain data and again using in-domain data with an adapted teacher.
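The two-stage recipe in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the word-count "models", the helper names `train`, `translate`, and `distill`, the `weight` parameter, and the example sentences are all assumptions made for illustration. The sketch only shows the order of steps: distill on general-domain data, adapt the teacher, then distill again on in-domain data.

```python
from collections import Counter, defaultdict

def train(pairs, init=None, weight=1):
    """Toy stand-in for NMT training: count word-aligned source/target pairs.

    `init` copies counts from an existing model (a stand-in for fine-tuning);
    `weight` is a hypothetical knob that lets new data outweigh old counts.
    """
    counts = defaultdict(Counter)
    if init is not None:
        for src_word, tgt_counts in init.items():
            counts[src_word].update(tgt_counts)
    for src_sent, tgt_sent in pairs:
        for s, t in zip(src_sent.split(), tgt_sent.split()):
            counts[s][t] += weight
    return counts

def translate(model, src_sent):
    """Translate word by word, copying words the model has never seen."""
    out = []
    for s in src_sent.split():
        best = model.get(s)
        out.append(best.most_common(1)[0][0] if best else s)
    return " ".join(out)

def distill(teacher, sources, init=None, weight=1):
    """Sequence-level distillation: train the student on teacher outputs."""
    pseudo_parallel = [(src, translate(teacher, src)) for src in sources]
    return train(pseudo_parallel, init=init, weight=weight)

# Hypothetical data: "bank" is translated differently in the two domains.
general_pairs = [("the bank", "die Bank"), ("a bank", "eine Bank")]
in_domain_pairs = [("the bank", "das Ufer")]

# Step 1 (distill): general-domain teacher -> general-domain student.
teacher = train(general_pairs)
student = distill(teacher, [src for src, _ in general_pairs])

# Step 2 (adapt): continue training the teacher on in-domain data.
adapted_teacher = train(in_domain_pairs, init=teacher, weight=5)

# Step 3 (distill again): adapted teacher -> student, on in-domain sources.
final_student = distill(
    adapted_teacher, [src for src, _ in in_domain_pairs], init=student, weight=5
)

print(translate(student, "the bank"))        # die Bank
print(translate(final_student, "the bank"))  # das Ufer
```

In a real system each call to `train` would be a full neural training run and the student would be a smaller network than the teacher; the count-based model here has no notion of capacity and serves only to make the data flow visible.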

The talk and the corresponding paper were published at the ACL Workshops virtual conference.

Similar Papers

06/12/2021

Think Big, Teach Small: Do Language Models Distil Occam’s Razor?

Gonzalo Jaimovitch-Lopez, David Castellano Falcón, Cesar Ferri, José Hernández-Orallo

Keywords: machine learning, interpretability, few-shot learning

12:12

06/12/2020

Learning Sparse Prototypes for Text Generation

Junxian He, Taylor Berg-Kirkpatrick, Graham Neubig

3:22

16/11/2020

Sequence-Level Mixed Sample Data Augmentation

Demi Guo, Yoon Kim, Alexander Rush

Keywords: sequence-to-sequence problems, scan, semantic parsing, neural networks

5:58

19/04/2021

Active learning for sequence tagging with deep pre-trained models and Bayesian uncertainty estimates

Artem Shelmanov, Dmitri Puzyrev, Lyubov Kupriyanova, Denis Belyakov, Daniil Larionov, Nikita Khromov, Olga Kozlova, Ekaterina Artemova, Dmitry V. Dylov, Alexander Panchenko

11:47

16/11/2020

Self-Paced Learning for Neural Machine Translation

Yu Wan, Baosong Yang, Derek F. Wong, Yikai Zhou, Lidia S. Chao, Haibo Zhang, Boxing Chen

Keywords: neural, curriculum learning, translation tasks, nmt

6:03

18/07/2021

Self-supervised and Supervised Joint Training for Resource-rich Machine Translation

Yong Cheng, Wei Wang, Lu Jiang, Wolfgang Macherey

Keywords: Applications, Natural Language Processing

5:21

02/02/2021

Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation

Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

14:20

04/07/2020

Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation

Junliang Guo, Linli Xu, Enhong Chen

Keywords: Non-Autoregressive Translation, natural tasks, non-autoregressive translation (NAT), non-autoregressive

10:47

04/07/2020

Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering

Ming Yan, Hao Zhang, Di Jin, Joey Tianyi Zhou

Keywords: Multi-source Transfer, Low Answering, Multiple-choice answering, machine comprehension

7:40

16/11/2020

TeaForN: Teacher-Forcing with N-grams

Sebastian Goodman, Nan Ding, Radu Soricut

Keywords: machine benchmark, news benchmarks, sequence models, teacher-forcing

12:02

02/02/2021

Future-Guided Incremental Transformer for Simultaneous Translation

Shaolei Zhang, Yang Feng, Liangyou Li

14:44

16/11/2020

Iterative Domain-Repaired Back-Translation

Hao-Ran Wei, Zhirui Zhang, Boxing Chen, Weihua Luo

Keywords: domain-specific translation, domain adaptation, back-translation method, out-of-domain systems

11:35

02/02/2021

Semi-supervised Sequence Classification through Change Point Detection

Nauman Ahad, Mark A. Davenport

14:21

04/07/2020

Deep Contextualized Self-training for Low Resource Dependency Parsing

Guy Rotman, Roi Reichart

Keywords: Low Parsing, sequence tasks, Deep Self-training, Neural parsing

11:41

02/02/2021

Adaptive Teaching of Temporal Logic Formulas to Preference-based Learners

Zhe Xu, Yuxin Chen, Ufuk Topcu

19:42

12/07/2020

Unsupervised Transfer Learning for Spatiotemporal Predictive Networks

Zhiyu Yao, Yunbo Wang, Mingsheng Long, Jianmin Wang

Keywords: Sequential, Network, and Time-Series Modeling

15:19

05/12/2020

Self-supervised learning for pairwise data refinement

Gustavo Hernandez Abrego, Bowen Liang, Wei Wang, Zarana Parekh, Yinfei Yang, Yunhsuan Sung

15:17

16/11/2020

Simulated multiple reference training improves low-resource machine translation

Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn

Keywords: machine mt, mt, simulated training, simulated

6:56

30/11/2020

Large-Scale Cross-Domain Few-Shot Learning

Jiechao Guan, Manli Zhang, Zhiwu Lu

7:26

16/11/2020

Zero-Shot Cross-Lingual Transfer with Meta Learning

Farhad Nooralahzadeh, Giannis Bekoulis, Johannes Bjerva, Isabelle Augenstein

Keywords: strategic knowledge, downstream task, multilingual applications, natural tasks

11:42

04/07/2020

Learning a Multi-Domain Curriculum for Neural Machine Translation

Wei Wang, Ye Tian, Jiquan Ngiam, Yinfei Yang, Isaac Caswell, Zarana Parekh

Keywords: Neural Translation, data selection, machine translation, multi-domain curriculum

11:44

08/12/2020

Dynamic Curriculum Learning for Low-Resource Neural Machine Translation

Chen Xu, Bojie Hu, Yufan Jiang, Kai Feng, Zeyang Wang, Shen Huang, Qi Ju, Tong Xiao, Jingbo Zhu

13:28

02/02/2021

LRSC: Learning Representations for Subspace Clustering

Changsheng Li, Chen Yang, Bo Liu, Ye Yuan, Guoren Wang

15:09

26/04/2020

Synthesizing Programmatic Policies that Inductively Generalize

Jeevana Priya Inala, Osbert Bastani, Zenna Tavares, Armando Solar-Lezama

Keywords: Program synthesis, reinforcement learning, inductive generalization

4:42

08/12/2020

Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning

Daniel Grießhaber, Johannes Maucher, Ngoc Thang Vu

11:06

08/12/2020

Optimizing Transformer for Low-Resource Neural Machine Translation

Ali Araabi, Christof Monz

10:02

16/11/2020

Domain Adaptation of Thai Word Segmentation Models using Stacked Ensemble

Peerat Limkonchotiwat, Wannaphong Phatthiyaphaibun, Raheem Sarwar, Ekapol Chuangsuwanich, Sarana Nutanong

Keywords: natural tasks, thai segmentation, transfer learning, filter-and-refine solution

6:28

19/04/2021

Keep learning: Self-supervised meta-learning for learning from inference

Akhil Kedia, Sai Chetan Chinthakindi

11:27

18/07/2021

Training Data Subset Selection for Regression with Controlled Generalization Error

Durga S, Rishabh Iyer, Ganesh Ramakrishnan, Abir De

Keywords: Algorithms, Online Learning, Algorithms, Supervised Learning

4:15

16/11/2020

Efficient Meta Lifelong-Learning with Limited Memory

Zirui Wang, Sanket Vaibhav Mehta, Barnabas Poczos, Jaime Carbonell

Keywords: lifelong learning, local adaptation, text benchmarks, multi-task learning

12:03

04/07/2020

Paraphrase Generation by Learning How to Edit from Samples

Amirhossein Kazemnejad, Mohammadreza Salehi, Mahdieh Soleymani Baghshah

Keywords: Paraphrase Generation, Neural sequence, sequence generation, retrieval-based method

12:20

04/07/2020

Learning to Customize Model Structures for Few-shot Dialogue Generation Tasks

Yiping Song, Zequn Liu, Wei Bi, Rui Yan, Ming Zhang

Keywords: Few-shot Tasks, open-domain systems, generative models, meta-learning framework

11:43

16/11/2020

An Imitation Game for Learning Semantic Parsers from User Interaction

Ziyu Yao, Yiqi Tang, Wen-tau Yih, Huan Sun, Yu Su

Keywords: bootstrapping, fine-tuning parsers, theoretical analysis, text-to-sql problem

11:49

22/11/2021

Few-shot Action Recognition with Prototype-centered Attentive Learning

Xiatian Zhu, Antoine S Toisoul, Juan-Manuel Perez-Rua, Li Zhang, Brais Martinez, Tao Xiang

Keywords: Few-shot learning, Video recognition, Action classification, Small training data, Model pre-training, Meta-learning, Transformer, Self-attention learning, Cross-attention learning, Prototype learning, Prototype-centered learning, Hybrid-attention learning

2:22

26/08/2020

Data Generation for Neural Programming by Example

Judith Clymo, Adria Gascon, Brooks Paige, Nathanael Fijalkow, Haik Manukian

14:31

14/06/2020

Meshed-Memory Transformer for Image Captioning

Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara

Keywords: transformer, image captioning, vision and language, fully-attentive models, mesh connectivity, memory vectors, self-attention

1:00

16/11/2020

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Trapit Bansal, Rishikesh Jha, Tsendsuren Munkhdalai, Andrew McCallum

Keywords: nlp applications, fine-tuning, meta-learning problem, supervised tasks

11:49

14/06/2020

TransMatch: A Transfer-Learning Scheme for Semi-Supervised Few-Shot Learning

Zhongjie Yu, Lin Chen, Zhongwei Cheng, Jiebo Luo

Keywords: few-shot learning, semi-supervised learning, meta-learning

1:01

06/12/2020

What is being transferred in transfer learning?

Behnam Neyshabur, Hanie Sedghi, Chiyuan Zhang

3:20

18/07/2021

Provable Meta-Learning of Linear Representations

Nilesh Tripuraneni, Chi Jin, Michael Jordan

Keywords: Theory, Statistical Learning Theory

5:09