08/12/2020

Query Distillation: BERT-based Distillation for Ensemble Ranking

Wangshu Zhang, Junhong Liu, Zujie Wen, Yafang Wang, Gerard de Melo

Keywords:

Abstract: Recent years have witnessed substantial progress in the development of neural ranking networks, but also an increasingly heavy computational burden due to growing numbers of parameters and the adoption of model ensembles. Knowledge Distillation (KD) is a common solution for balancing effectiveness and efficiency. However, it is not straightforward to apply KD to ranking problems. Ranking Distillation (RD) has been proposed to address this issue, but has only been shown to be effective on recommendation tasks. We present a novel two-stage distillation method for ranking problems that allows a smaller student model to be trained while benefiting from the better performance of the teacher model, providing better control of the inference latency and computational burden. We design a novel BERT-based ranking model structure for list-wise ranking to serve as our student model. All ranking candidates are fed to the BERT model simultaneously, such that the self-attention mechanism can enable joint inference to rank the document list. Our experiments confirm the advantages of our method, not only with regard to inference latency but also in terms of higher-quality rankings compared to the original teacher model.
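
The abstract's central architectural idea, feeding the query together with all candidate documents into a single BERT input so that self-attention can compare candidates jointly, can be sketched in a few lines of code. The snippet below is a minimal, hypothetical illustration rather than the authors' implementation: the class name ListwiseBertRanker, the mean-pooling of each candidate's tokens, and the temperature-softened KL-divergence distillation loss are all assumptions made for the sake of the example.

```python
# Minimal sketch (not the authors' code) of a list-wise BERT student ranker.
# The query and all candidate documents are assumed to be packed into one
# token sequence, so self-attention can compare candidates against each other;
# each candidate's tokens are mean-pooled and scored, and the scores are
# distilled from a teacher (e.g., an ensemble) with a softened KL divergence.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel


class ListwiseBertRanker(nn.Module):
    """Hypothetical student model: scores a whole candidate list in one pass."""

    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.scorer = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask, doc_token_mask):
        # doc_token_mask: (batch, num_docs, seq_len), 1 where a token belongs
        # to the given candidate document, 0 for query/padding/other candidates.
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        mask = doc_token_mask.float()
        # Mean-pool each candidate's contextualized tokens, then score it.
        doc_repr = torch.einsum("bds,bsh->bdh", mask, hidden)
        doc_repr = doc_repr / mask.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return self.scorer(doc_repr).squeeze(-1)  # (batch, num_docs)


def listwise_distillation_loss(student_scores, teacher_scores, temperature=2.0):
    # Match the student's score distribution over the candidate list to the
    # teacher's, using the usual temperature-softened KL objective.
    t = temperature
    return F.kl_div(F.log_softmax(student_scores / t, dim=-1),
                    F.softmax(teacher_scores / t, dim=-1),
                    reduction="batchmean") * (t * t)


# Toy usage with random tensors standing in for a tokenized (query, doc list).
model = ListwiseBertRanker()
batch, num_docs, seq_len = 2, 4, 64
input_ids = torch.randint(1000, 2000, (batch, seq_len))
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)
doc_token_mask = torch.zeros(batch, num_docs, seq_len, dtype=torch.long)
for d in range(num_docs):
    doc_token_mask[:, d, 16 + d * 12: 16 + (d + 1) * 12] = 1  # fake doc spans
student_scores = model(input_ids, attention_mask, doc_token_mask)
teacher_scores = torch.randn(batch, num_docs)  # from the teacher ensemble
loss = listwise_distillation_loss(student_scores, teacher_scores)
```

Packing the whole candidate list into one sequence is what distinguishes this list-wise setup from the usual one-query-one-document BERT ranker; how the paper's two distillation stages are scheduled is not specified in the abstract and is therefore left out of the sketch.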

The video of this talk cannot be embedded. You can watch it here:
https://underline.io/lecture/6113-query-distillation-bert-based-distillation-for-ensemble-ranking
The talk and the accompanying paper were published at the COLING 2020 Workshops virtual conference.

