Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation

Abstract: Automated machine learning (AutoML) can produce complex model ensembles by stacking, bagging, and boosting many individual models like trees, deep networks, and nearest neighbor estimators. While highly accurate, the resulting predictors are large, slow, and opaque compared to their constituents. To improve the deployment of AutoML on tabular data, we propose FAST-DAD to distill arbitrarily complex ensemble predictors into individual models like boosted trees, random forests, and deep networks. At the heart of our approach is a data augmentation strategy based on Gibbs sampling from a self-attention pseudolikelihood estimator. Across 30 datasets spanning regression and binary/multiclass classification tasks, FAST-DAD distillation produces significantly better individual models than one obtains through standard training on the original data. Our individual distilled models are over 10x faster and more accurate than ensemble predictors produced by AutoML tools like H2O/AutoSklearn.
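
The augmentation-then-distillation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a k-nearest-neighbor lookup stands in for the self-attention pseudolikelihood estimator, the "ensemble" is an average of three hand-written linear models, and every function name here (`teacher_predict`, `sample_conditional`, `gibbs_augment`, `fit_linear_student`) is hypothetical. It only shows the shape of the loop: resample one feature at a time conditioned on the others, label the synthetic rows with the teacher, and fit a single simple student on the enlarged set.

```python
import random


def teacher_predict(x):
    """Toy stand-in for a large ensemble: the average of three linear models."""
    members = [
        lambda v: 2.0 * v[0] + 1.0 * v[1],
        lambda v: 1.8 * v[0] + 1.2 * v[1],
        lambda v: 2.2 * v[0] + 0.8 * v[1],
    ]
    return sum(m(x) for m in members) / len(members)


def sample_conditional(data, x, i, rng, k=3):
    """Draw feature i given the other features of x.

    Assumption: a k-nearest-neighbor lookup replaces the learned
    self-attention conditional described in the abstract.
    """
    def dist(row):
        return sum((row[j] - x[j]) ** 2 for j in range(len(x)) if j != i)

    neighbors = sorted(data, key=dist)[:k]
    return rng.choice(neighbors)[i]


def gibbs_augment(data, n_rounds, rng):
    """Run a few Gibbs sweeps from each training row to get synthetic rows."""
    augmented = []
    for row in data:
        x = list(row)
        for _ in range(n_rounds):
            for i in range(len(x)):  # resample one feature at a time
                x[i] = sample_conditional(data, x, i, rng)
        augmented.append(x)
    return augmented


def fit_linear_student(rows, targets):
    """Least-squares fit of y = w0*x0 + w1*x1 via the 2x2 normal equations."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows)
    p = sum(r[0] * y for r, y in zip(rows, targets))
    q = sum(r[1] * y for r, y in zip(rows, targets))
    det = a * d - b * b
    return [(d * p - b * q) / det, (a * q - b * p) / det]


rng = random.Random(0)
data = [[rng.random(), rng.random()] for _ in range(20)]
augmented = gibbs_augment(data, n_rounds=2, rng=rng)

# The student sees original plus synthetic rows, all labeled by the teacher.
train_rows = data + augmented
train_targets = [teacher_predict(r) for r in train_rows]
student_weights = fit_linear_student(train_rows, train_targets)
print(student_weights)
```

Because the toy teacher is itself linear in the two features, the student weights should come out close to 2 and 1. With a real ensemble the student would only approximate the teacher, and the quality of the conditional sampler would determine how useful the synthetic rows are.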

06/12/2021

Rasool Fakoor, Jonas Mueller, Nick Erickson, Pratik Chaudhari, Alex Smola

Similar Papers

Batch Active Learning at Scale

Gui Citovsky, Giulia DeSalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar

active learning

Run-Sort-ReRun: Escaping Batch Size Limitations in Sliced Wasserstein Generative Models

José Lezama, Wei Chen, Qiang Qiu

Deep Learning

Improving Robustness using Generated Data

Sven Gowal, Sylvestre-Alvise Rebuffi, Olivia Wiles, Florian Stimberg, Dan Andrei Calian, Timothy A Mann

machine learning, robustness, adversarial robustness and security, generative model

Incremental Sensitivity Analysis for Kernelized Models

Hadar Sivan, Moshe Gabel, Assaf Schuster

Efficiently sampling functions from Gaussian process posteriors

James Wilson, Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, Marc Deisenroth

Gaussian Processes

Perturb-and-max-product: Sampling and learning in discrete energy-based models

Miguel Lazaro-Gredilla, Antoine Dedieu, Dileep George

generative model, graph learning

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

Feng Liu, Wenkai Xu, Jie Lu, [deadname] J Sutherland

meta learning, kernel methods

Dataset Distillation with Infinitely Wide Convolutional Networks

Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee

deep learning, machine learning, vision, meta learning

Paying more Attention to Snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation

Duong Le, Nhan Vo, Nam Thoai

network pruning, knowledge distillation, ensemble learning

Marginalized Stochastic Natural Gradients for Black-Box Variational Inference

Geng Ji, Debora Sujono, Erik Sudderth

Probabilistic Methods, Approximate Inference

Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits

Reda Ouhamma, Rémy Degenne, Vianney Perchet, Pierre Gaillard

bandits, online learning

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Natural Language Processing, BERT, Representation Learning

GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning

Krishnateja Killamsetty, Durga Sivasubramanian, Ganesh Ramakrishnan, Rishabh Iyer

Variable Skipping for Autoregressive Range Density Estimation

Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Peter Chen

Deep Learning - Generative Models and Autoencoders

Reducing Transformer Depth on Demand with Structured Dropout

Angela Fan, Edouard Grave, Armand Joulin

reduction, regularization, pruning, dropout, transformer

Large-Scale Meta-Learning with Continual Trajectory Shifting

JaeWoong Shin, Hae Beom Lee, Boqing Gong, Sung Ju Hwang

Algorithms, Multitask, Transfer, and Meta Learning

On-the-fly Rectification for Robust Large-Vocabulary Topic Inference

Moontae Lee, June Cho, Kun Dong, David Mimno, David Bindel

Applications, Natural Language Processing

Faster & more reliable tuning of neural networks: Bayesian optimization with importance sampling

Setareh Ariafar, Zelda Mariet, Dana Brooks, Jennifer Dy, Jasper Snoek

Combining Human Predictions with Model Probabilities via Confusion Matrices and Calibration

Gavin Kerrigan, Padhraic Smyth, Mark Steyvers

machine learning, vision

Long-Short Transformer: Efficient Transformers for Language and Vision

Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro
