13/04/2021

Sharp analysis of a simple model for random forests

Jason Klusowski

Keywords:

Abstract: Random forests have become an important tool for improving accuracy in regression and classification problems since their inception by Leo Breiman in 2001. In this paper, we revisit a historically important random forest model, called centered random forests, originally proposed by Breiman in 2004 and later studied by Gérard Biau in 2012, in which a feature is selected at random and the split occurs at the midpoint of the node along the chosen feature. If the regression function is $d$-dimensional and Lipschitz, we show that, given access to $n$ observations, the mean-squared prediction error is $O\big((n(\log n)^{(d-1)/2})^{-\frac{1}{d\log 2+1}}\big)$. This positively answers an outstanding question of Biau about whether the rate of convergence for this random forest model could be improved beyond $O\big(n^{-\frac{1}{(4/3)d\log 2+1}}\big)$. Furthermore, by a refined analysis of the approximation and estimation errors for linear models, we show that our new rate cannot be improved in general. Finally, we generalize our analysis and improve current prediction error bounds for another random forest model, called median random forests, in which each tree is constructed from subsampled data and the splits are performed at the empirical median along a chosen feature.
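To make the splitting rule concrete, below is a minimal sketch (not the author's code) of a centered random forest regressor, assuming the features lie in $[0,1]^d$; the function names build_centered_tree and predict_forest and all parameter defaults are illustrative. Each tree recursively picks a feature uniformly at random and splits the current cell at its midpoint along that feature, and the forest prediction averages the leaf means over independently grown trees. The median random forest variant would instead grow each tree on a subsample and split at the empirical median of the node's data along the chosen feature.

import numpy as np

def build_centered_tree(low, high, depth, rng):
    # Recursively partition the axis-aligned cell [low, high].
    # At each node, a feature is chosen uniformly at random and the cell
    # is split at the midpoint of its side along that feature.
    if depth == 0:
        return [(low, high)]
    j = rng.integers(len(low))                      # random split feature
    mid = 0.5 * (low[j] + high[j])                  # midpoint of the node
    left_high, right_low = high.copy(), low.copy()
    left_high[j], right_low[j] = mid, mid
    return (build_centered_tree(low, left_high, depth - 1, rng)
            + build_centered_tree(right_low, high, depth - 1, rng))

def predict_forest(X_train, y_train, X_test, n_trees=50, depth=8, seed=0):
    # Average the leaf means of n_trees independently grown centered trees.
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    preds = np.zeros(len(X_test))
    for _ in range(n_trees):
        leaves = build_centered_tree(np.zeros(d), np.ones(d), depth, rng)
        for low, high in leaves:
            # points with a coordinate exactly equal to 1 are ignored here
            in_tr = np.all((X_train >= low) & (X_train < high), axis=1)
            in_te = np.all((X_test >= low) & (X_test < high), axis=1)
            if in_tr.any():                         # empty leaves predict 0
                preds[in_te] += y_train[in_tr].mean()
    return preds / n_trees

The depth parameter governs the bias-variance trade-off; in the paper's analysis it is tuned as a function of n and d, and the default above is arbitrary.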

Talk and paper published at the AISTATS 2021 virtual conference.

