Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

06/12/2021

Jiong Zhang, Wei-Cheng Chang, Hsiang-Fu Yu, Inderjit Dhillon

Keywords: machine learning, transformers

Abstract: Extreme multi-label text classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input. Many real-world applications can be formulated as XMC problems, such as recommendation systems, document tagging, and semantic search. Recently, transformer-based XMC methods, such as X-Transformer and LightXML, have shown significant improvement over other XMC methods. Despite leveraging pre-trained transformer models for text representation, fine-tuning transformer models on a large label space still takes a long time, even with powerful GPUs. In this paper, we propose a novel recursive approach, XR-Transformer, which accelerates the procedure by recursively fine-tuning transformer models on a series of multi-resolution objectives related to the original XMC objective function. Empirical results show that XR-Transformer takes significantly less training time than other transformer-based XMC models while yielding better state-of-the-art results. In particular, on the public Amazon-3M dataset with 3 million labels, XR-Transformer is not only 20x faster than X-Transformer but also improves Precision@1 from 51% to 54%.
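
The abstract reports results in terms of Precision@1. For readers unfamiliar with the metric, the following is a minimal sketch of how Precision@k is commonly computed for multi-label predictions. The function name `precision_at_k` and the toy arrays are illustrative assumptions, not code from the paper.

```python
import numpy as np

def precision_at_k(scores, relevant, k=1):
    """Mean Precision@k over a batch (illustrative helper, not from the paper).

    scores:   (n_samples, n_labels) array of predicted label scores
    relevant: (n_samples, n_labels) binary array, 1 where a label is relevant
    """
    scores = np.asarray(scores, dtype=float)
    relevant = np.asarray(relevant)
    # Indices of the k highest-scoring labels for each sample.
    topk = np.argsort(-scores, axis=1)[:, :k]
    # 1 where a top-k label is actually relevant, 0 otherwise.
    hits = np.take_along_axis(relevant, topk, axis=1)
    # Fraction of the top-k labels that are relevant, averaged over samples.
    return float(hits.sum(axis=1).mean() / k)

# Toy example: two samples, three labels (made-up numbers).
toy_scores = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.5]]
toy_relevant = [[1, 0, 0], [0, 0, 1]]
p1 = precision_at_k(toy_scores, toy_relevant, k=1)
```

With k=1 the metric is simply the fraction of inputs whose single highest-scoring label is relevant, which is what the reported 51% and 54% figures measure.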

The talk and the corresponding paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

Long-Short Transformer: Efficient Transformers for Language and Vision

Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro

Keywords: machine learning, transformers

11:44

02/02/2021

TRQ: Ternary Neural Networks With Residual Quantization

Yue Li, Wenrui Ding, Chunlei Liu, Baochang Zhang, Guodong Guo

15:21

02/02/2021

Robust PDF Document Conversion using Recurrent Neural Networks

Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, Peter Staar

20:33

14/06/2020

ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network

Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang

Keywords: bezier curve, scene text, end-to-end, detection, recognition, arbitrarily shaped, one stage, align, sampling, deep neural network

5:01

23/08/2020

Taming pretrained transformers for extreme multi-label text classification

Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, Inderjit S. Dhillon

Keywords: extreme multi-label text classification, transformer models

2:23

06/12/2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

Keywords: optimization, transformers, language

10:53

26/08/2020

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Yuxuan Song, Ning Miao, Hao Zhou, Lantao Yu, Mingxuan Wang, Lei Li

12:32

06/12/2021

Searching for Efficient Transformers for Language Modeling

David So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V Le

Keywords: transformers, language

13:29

06/12/2020

LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-resolution and Beyond

Wenbo Li, Kun Zhou, Lu Qi, Nianjuan Jiang, Jiangbo Lu, Jiaya Jia

3:09

30/11/2020

Fast and Differentiable Message Passing on Pairwise Markov Random Fields

Zhiwei Xu, Thalaiyasingam Ajanthan, Richard Hartley

9:41

15/06/2020

Learning fast and precise numerical analysis

Jingxuan He, Gagandeep Singh, Markus Püschel, Martin Vechev

Keywords: Abstract interpretation, Performance optimization, Machine learning, Numerical domains

14:20

03/05/2021

NAS-Bench-ASR: Reproducible Neural Architecture Search for Speech Recognition

Abhinav Mehrotra, Alberto Gil Couto Pimentel Ramos, Sourav Bhattacharya, Łukasz Dudziak, Ravichander Vipperla, Thomas C Chau, Mohamed Abdelfattah, Samin Ishtiaq, Nic Lane

Keywords: Benchmark, NAS, ASR

4:50

06/12/2020

Approximate Cross-Validation with Low-Rank Data in High Dimensions

Will Stephenson, Madeleine Udell, Tamara Broderick

3:02

03/05/2021

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah Smith

Keywords: Machine Translation, Sequence Modeling, Natural Language Processing

5:04

18/07/2021

ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

Wonjae Kim, Bokyung Son, Ildoo Kim

Keywords: Algorithms, Multimodal Learning

19:03

06/12/2020

GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification

John Halloran, David M Rocke

3:33

08/12/2020

Model-agnostic Methods for Text Classification with Inherent Noise

Kshitij Tayal, Rahul Ghosh, Vipin Kumar

8:46

04/07/2020

The Cascade Transformer: an Application for Efficient Answer Sentence Selection

Luca Soldaini, Alessandro Moschitti

Keywords: Efficient Selection, Answer Selection, classification tasks, classification

13:39

06/12/2020

Convolutional Tensor-Train LSTM for Spatio-Temporal Learning

Jiahao Su, Wonmin Byeon, Jean Kossaifi, Furong Huang, Jan Kautz, Anima Anandkumar

3:29

26/08/2020

Prior-aware Composition Inference for Spectral Topic Models

Moontae Lee, David Bindel, David Mimno

14:46

06/12/2021

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, Steven Chu Hong Hoi

Keywords: transformers, vision, representation learning

9:40

04/07/2020

GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples

Danilo Croce, Giuseppe Castellucci, Roberto Basili

Keywords: Robust Classification, Natural tasks, image processing, generative setting

6:48

06/12/2021

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder

Keywords: optimization

14:16

06/12/2021

Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

Ge Yang, Edward Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Keywords: deep learning, transformers

9:55

06/12/2020

Wavelet Flow: Fast Training of High Resolution Normalizing Flows

Jason Yu, Konstantinos Derpanis, Marcus Brubaker

3:23

03/05/2021

Better Fine-Tuning by Reducing Representational Collapse

Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta

Keywords: nlp, glue, representational learning, finetuning

5:06

19/04/2021

Progressively pretrained dense corpus index for open-domain question answering

Wenhan Xiong, Hong Wang, William Yang Wang

12:15

02/02/2021

DPFPS: Dynamic and Progressive Filter Pruning for Compressing Convolutional Neural Networks from Scratch

Xiaofeng Ruan, Yufan Liu, Bing Li, Chunfeng Yuan, Weiming Hu

14:38

07/09/2020

Paying more Attention to Snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation

Duong Le, Nhan Vo, Nam Thoai

Keywords: network pruning, knowledge distillation, ensemble learning

8:30

06/12/2021

Sparse is Enough in Scaling Transformers

Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Łukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva

Keywords: machine learning, transformers

8:28

06/12/2021

SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement

Heyang Qin, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He

Keywords: machine learning

11:23

26/04/2020

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Keywords: Natural Language Processing, BERT, Representation Learning

4:59

04/07/2020

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian

Keywords: Faster Answering, question-independent processing, DeFormer, Decomposing Transformers

11:06

06/12/2020

AdaTune: Adaptive Tensor Program Compilation Made Efficient

Menghao Li, Minjia Zhang, Chi Wang, Mingqin Li

3:16

26/04/2020

Reducing Transformer Depth on Demand with Structured Dropout

Angela Fan, Edouard Grave, Armand Joulin

Keywords: reduction, regularization, pruning, dropout, transformer

5:01

11/08/2020

A computational approach to packet classification

Alon Rashelbach, Ori Rottenstreich, Mark Silberstein

Keywords: Neural Networks, Virtual Switches, Packet Classification

16:56

06/12/2021

Controllable and Compositional Generation with Latent-Space Energy-Based Models

Weili Nie, Arash Vahdat, Anima Anandkumar

Keywords: deep learning, generative model

13:13

26/04/2020

Plug and Play Language Models: A Simple Approach to Controlled Text Generation

Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu

Keywords: controlled text generation, generative models, conditional generative models, language modeling, transformer

4:58

08/12/2020

Domain Transfer based Data Augmentation for Neural Query Translation

Liang Yao, Baosong Yang, Haibo Zhang, Boxing Chen, Weihua Luo

10:57

05/01/2021

DynaVSR: Dynamic Adaptive Blind Video Super-Resolution

Suyoung Lee, Myungsub Choi, Kyoung Mu Lee

4:56