23/07/2020

CaliForest: calibrated random forest for health data

Yubin Park, Joyce C. Ho

Keywords: Applied computing, Life and medical sciences, Health informatics, Computing methodologies, Machine learning, Machine learning algorithms, Ensemble methods, Bagging, Machine learning approaches, Classification and regression trees, General and reference, Cross-computing tools and techniques, Empirical studies

Abstract: Real-world predictive models in healthcare should be evaluated in terms of discrimination, the ability to differentiate between high- and low-risk events, and calibration, the accuracy of the risk estimates. Unfortunately, calibration is often neglected and only discrimination is analyzed. Calibration is crucial for personalized medicine, where risk estimates play an increasing role in the decision-making process. Since random forest is a popular model for many healthcare applications, we propose CaliForest, a new calibrated random forest. Unlike existing calibration methodologies, CaliForest utilizes the out-of-bag samples to avoid the explicit construction of a calibration set. We evaluated CaliForest on two risk prediction tasks obtained from the publicly available MIMIC-III database. Evaluation on these binary prediction tasks demonstrates that CaliForest can achieve the same discriminative power as random forest while obtaining a better-calibrated model, as evaluated across six different metrics. CaliForest will be published on the standard Python software repository, and the code will be openly available on GitHub.
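The idea of calibrating on out-of-bag samples can be illustrated with a minimal sketch. This is not the authors' CaliForest implementation or its API: it assumes scikit-learn is available, uses synthetic data in place of MIMIC-III, and swaps in plain isotonic regression as the calibrator. The helper names `fit_oob_calibrated_forest` and `predict_calibrated` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression


def fit_oob_calibrated_forest(X, y, n_estimators=200, random_state=0):
    """Fit a random forest, then fit a calibrator on its out-of-bag scores.

    Hypothetical helper for illustration only.
    """
    forest = RandomForestClassifier(
        n_estimators=n_estimators,
        bootstrap=True,
        oob_score=True,  # exposes oob_decision_function_ after fitting
        random_state=random_state,
    )
    forest.fit(X, y)

    # Each row's out-of-bag score comes only from trees that did not
    # see that row during training, so no separate calibration split
    # has to be held out.
    oob_prob = forest.oob_decision_function_[:, 1]

    # With very few trees a row may never be out-of-bag (NaN); skip those.
    mask = ~np.isnan(oob_prob)

    # Assumption: isotonic regression as the calibrator.
    calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    calibrator.fit(oob_prob[mask], y[mask])
    return forest, calibrator


def predict_calibrated(forest, calibrator, X):
    """Map raw forest probabilities through the fitted calibrator."""
    raw = forest.predict_proba(X)[:, 1]
    return calibrator.predict(raw)


# Synthetic binary task standing in for a risk prediction dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
forest, calibrator = fit_oob_calibrated_forest(X, y)
calibrated = predict_calibrated(forest, calibrator, X[:5])
```

Because the calibrator is fit on out-of-bag scores, every training row contributes to both the forest and the calibration step, which is the property the abstract highlights.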

The talk and the corresponding paper were published at the ACM-CHIL 2020 virtual conference.


Similar Papers