Coresets for Decision Trees of Signals

06/12/2021

Ibrahim Jubran, Ernesto Evgeniy Sanches Shayda, Ilan I Newman, Dan Feldman

Keywords: machine learning

Abstract: A $k$-decision tree $t$ (or $k$-tree) is a recursive partition of a matrix (2D-signal) into $k\geq 1$ block matrices (axis-parallel rectangles, leaves), where each rectangle is assigned a real label. Its regression or classification loss with respect to a given matrix $D$ of $N$ entries (labels) is the sum, over every entry of $D$, of the squared difference between that entry's label and the label assigned to it by $t$. Given an error parameter $\varepsilon\in(0,1)$, a $(k,\varepsilon)$-coreset $C$ of $D$ is a small summarization that provably approximates this loss for \emph{every} such tree, up to a multiplicative factor of $1\pm\varepsilon$. In particular, the optimal $k$-tree of $C$ is a $(1+\varepsilon)$-approximation to the optimal $k$-tree of $D$. We provide the first algorithm that outputs such a $(k,\varepsilon)$-coreset for \emph{every} such matrix $D$. The size $|C|$ of the coreset is polynomial in $k\log(N)/\varepsilon$, and its construction takes $O(Nk)$ time. This is achieved by forging a link between decision trees from machine learning and partition trees from computational geometry. Experimental results on \texttt{sklearn} and \texttt{lightGBM} show that applying our coresets to real-world data-sets speeds up the computation of random forests and their parameter tuning by up to $10\times$, while keeping similar accuracy. Full open source code is provided.
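To make the loss defined in the abstract concrete, here is a minimal Python sketch, assuming NumPy and a hypothetical representation of a $k$-tree by its $k$ leaves, each given as half-open row and column ranges plus a real label. It is not the authors' coreset construction, which the abstract does not describe; it only evaluates the loss of one tree on $D$ and checks the $1\pm\varepsilon$ condition for a single pair of loss values. The names `tree_loss`, `fit_leaf_means` and `within_factor` are illustrative, and the abstract does not say how the loss is evaluated on $C$ (for example, whether $C$ carries weights), so that part is left abstract.

```python
from typing import List, Sequence, Tuple

import numpy as np

# Hypothetical leaf representation, assumed for illustration only:
# (row_start, row_end, col_start, col_end, label), half-open index ranges.
Leaf = Tuple[int, int, int, int, float]
Block = Tuple[int, int, int, int]


def tree_loss(D: np.ndarray, leaves: Sequence[Leaf]) -> float:
    """Sum over every entry of D of (entry - label of its leaf)^2."""
    coverage = np.zeros(D.shape, dtype=int)
    loss = 0.0
    for r0, r1, c0, c1, label in leaves:
        block = D[r0:r1, c0:c1]
        loss += float(np.sum((block - label) ** 2))
        coverage[r0:r1, c0:c1] += 1
    # A k-tree partitions D, so every entry must lie in exactly one leaf.
    if not np.all(coverage == 1):
        raise ValueError("leaves do not partition D")
    return loss


def fit_leaf_means(D: np.ndarray, blocks: Sequence[Block]) -> List[Leaf]:
    """For fixed rectangles, the loss-minimizing label of each leaf is the
    mean of the entries it covers."""
    return [(r0, r1, c0, c1, float(D[r0:r1, c0:c1].mean()))
            for r0, r1, c0, c1 in blocks]


def within_factor(loss_D: float, loss_C: float, eps: float) -> bool:
    """Check the (1 +/- eps) multiplicative guarantee for one tree."""
    return (1 - eps) * loss_D <= loss_C <= (1 + eps) * loss_D


if __name__ == "__main__":
    D = np.array([[1.0, 1.0, 5.0],
                  [1.0, 1.0, 5.0]])
    two_leaves = fit_leaf_means(D, [(0, 2, 0, 2), (0, 2, 2, 3)])
    one_leaf = fit_leaf_means(D, [(0, 2, 0, 3)])
    print(tree_loss(D, two_leaves))  # 0.0
    print(tree_loss(D, one_leaf))    # 64/3, about 21.333
```

Only the leaves matter for the loss, so the recursive splitting structure is not modelled here. A coreset in the sense of the abstract must satisfy `within_factor` for every $k$-tree simultaneously, not just for the trees tried in an example.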

The talk and the accompanying paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

18/07/2021

Near-Optimal Algorithms for Explainable k-Medians and k-Means

Kostya Makarychev, Liren Shan

Keywords: Algorithms, Unsupervised Learning

06/12/2021

Nearly-Tight and Oblivious Algorithms for Explainable Clustering

Buddhima Gamlath, Xinrui Jia, Adam Polak, Ola Svensson

Keywords: optimization, clustering, interpretability

06/12/2021

Instance-Dependent Bounds for Zeroth-order Lipschitz Optimization with Error Certificates

Francois Bachoc, Tom Cesari, Sébastien Gerchinovitz

Keywords: theory, optimization

06/12/2021

Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces

Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

Keywords: clustering

06/12/2021

PLUGIn: A simple algorithm for inverting generative models with recovery guarantees

Babhru Joshi, Xiaowei Li, Yaniv Plan, Ozgur Yilmaz

Keywords: deep learning, optimization, generative model

12/07/2020

Near-optimal sample complexity bounds for learning Latent $k$-polytopes and applications to Ad-Mixtures

Chiranjib Bhattacharyya, Ravindran Kannan

Keywords: Learning Theory

06/12/2021

List-Decodable Mean Estimation in Nearly-PCA Time

Ilias Diakonikolas, Daniel Kane, Daniel Kongsgaard, Jerry Li, Kevin Tian

Keywords: theory, clustering

12/07/2020

On Efficient Low Distortion Ultrametric Embedding

Vincent Cohen-Addad, Karthik C. S., Guillaume Lagarde

Keywords: Unsupervised and Semi-Supervised Learning

04/08/2021

Breaking The Dimension Dependence in Sparse Distribution Estimation under Communication Constraints

Wei-Ning Chen, Peter Kairouz, Ayfer Ozgur

06/12/2020

Extrapolation Towards Imaginary 0-Nearest Neighbour and Its Improved Convergence Rate

Akifumi Okuno, Hidetoshi Shimodaira

18/07/2021

Improving Ultrametrics Embeddings Through Coresets

Vincent Cohen-Addad, Rémi de Joannis de Verclos, Guillaume Lagarde

Keywords: Algorithms, Clustering

06/12/2020

A Novel Approach for Constrained Optimization in Graphical Models

Sara Rouhani, Tahrima Rahman, Vibhav Gogate

12/07/2020

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

Haoran Sun, Songtao Lu, Mingyi Hong

Keywords: Optimization - Non-convex

06/12/2020

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang

06/12/2021

A Faster Maximum Cardinality Matching Algorithm with Applications in Machine Learning

Nathaniel Lahn, Sharath Raghvendra, Jiacheng Ye

Keywords: optimization, machine learning, graph learning

03/05/2021

Deep Learning meets Projective Clustering

Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman

Keywords: NLP, Compressing Deep Networks, Matrix Factorization, SVD

18/07/2021

Meta Learning for Support Recovery in High-dimensional Precision Matrix Estimation

Qian Zhang, Yilin Zheng, Jean Honorio

Keywords: Algorithms, Meta-Learning, Algorithms, Few-Shot Learning; Algorithms, Multitask and Transfer Learning, Theory, Statistical Learning Theory

18/07/2021

Streaming and Distributed Algorithms for Robust Column Subset Selection

Shuli Jiang, Dongyu Li, Irene Mengze Li, Arvind Mahankali, David Woodruff

Keywords: Algorithms, Deep Learning, Generative Models, Deep Learning, Predictive Models; Deep Learning, Recurrent Networks

06/12/2020

Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

Arun Jambulapati, Jerry Li, Kevin Tian

09/07/2020

How to trap a gradient flow

Dan Mikulincer, Sebastien Bubeck

Keywords: Non-convex optimization

06/12/2020

On Adaptive Distance Estimation

Yeshwanth Cherapanamjeri, Jelani Nelson

03/08/2020

Simple, deterministic, constant-round coloring in the congested clique

Artur Czumaj, Peter Davies, Merav Parter

Keywords: derandomization, congested clique, massively parallel computation, coloring

06/12/2021

Better Algorithms for Individually Fair $k$-Clustering

Maryam Negahbani, Deeparnab Chakrabarty

Keywords: theory, self-supervised learning, clustering, fairness

06/12/2021

Dimensionality Reduction for Wasserstein Barycenter

Zachary Izzo, Sandeep Silwal, Samson Zhou

Keywords: machine learning

06/12/2021

Robustifying Algorithms of Learning Latent Trees with Vector Variables

Fengzhuo Zhang, Vincent Tan

Keywords: theory, graph learning

09/07/2020

An O(m/eps^3.5)-Cost Algorithm for Semidefinite Programs with Diagonal Constraints

Swati Padmanabhan, Yin Tat Lee

Keywords: Convex optimization, Approximation algorithms, Combinatorial optimization

18/07/2021

Dimensionality Reduction for the Sum-of-Distances Metric

Zhili Feng, Praneeth Kacham, David Woodruff

Keywords: Neuroscience and Cognitive Science, Deep Learning, Biologically Plausible Deep Networks; Neuroscience and Cognitive Science, Connectomics; Neuroscience and Cog, Algorithms, Dimensionality Reduction

06/12/2020

Decision trees as partitioning machines to characterize their generalization properties

Jean-Samuel Leboeuf, Frédéric LeBlanc, Mario Marchand

06/12/2021

On the Sample Complexity of Privately Learning Axis-Aligned Rectangles

Menachem Sadigurschi, Uri Stemmer

Keywords: theory, privacy

18/07/2021

Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering

Shyam Narayanan, Sandeep Silwal, Piotr Indyk, Or Zamir

Keywords: Algorithms, Dimensionality Reduction

06/12/2020

Universal guarantees for decision tree induction via a higher-order splitting criterion

Guy Blanc, Neha Gupta, Jane Lange, Li-Yang Tan

06/12/2020

From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

Ines Chami, Albert Gu, Vaggos Chatziafratis, Chris Ré

08/07/2020

The Strahler Number of a Parity Game

Laure Daviaud, Marcin Jurdzinski, K. S. Thejaswini

Keywords: parity game, attractor decomposition, progress measure, universal tree, Strahler number

09/07/2020

Consistent recovery threshold of hidden nearest neighbor graphs

Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang

Keywords: Learning from complex/structured data (e.g. networks, time series), Information theory, Learning with algebraic or combinatorial structure

09/07/2020

Tree-projected gradient descent for estimating gradient-sparse parameters on graphs

Sheng Xu, Zhou Fan, Sahand Negahban

Keywords: High-dimensional statistics, Combinatorial optimization, Learning from complex/structured data (e.g. networks, time series), Non-convex optimization

04/08/2021

Exact Recovery of Clusters in Finite Metric Spaces Using Oracle Queries

Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

18/07/2021

On the price of explainability for some clustering problems

Eduardo Laber, Lucas Murtinho

Keywords: Optimization, Combinatorial Optimization

06/12/2021

A Domain-Shrinking based Bayesian Optimization Algorithm with Order-Optimal Regret Performance

Sudeep Salgia, Sattar Vakili, Qing Zhao

Keywords: optimization, bandits, kernel methods

04/08/2021

Quantifying Variational Approximation for Log-Partition Function

Romain Cosson, Devavrat Shah

06/12/2021

Contextual Recommendations and Low-Regret Cutting-Plane Algorithms

Sreenivas Gollapudi, Guru Guruganesh, Kostas Kollias, Pasin Manurangsi, Renato Leme, Jon Schneider

Keywords: bandits, online learning