25/07/2020

Evidence weighted tree ensembles for text classification

Md Zahidul Islam, Jixue Liu, Jiuyong Li, Lin Liu, Wei Kang

Keywords: decision semantics, ensemble methods, text classification

Abstract: Text documents are often mapped to binary vectors in which 1 indicates the presence of a word and 0 its absence. These vectors are then used to train predictive models. In tree-based ensemble models, some decision trees may make predictions purely from absent words. Such predictions should be trusted less, because the absence of a word can be interpreted in multiple ways. In this work, we propose to improve the comprehensibility and accuracy of ensemble models by distinguishing word presence from word absence. The presented method weights predictions based on word presence. Experimental results on 35 real text datasets indicate that our method outperforms state-of-the-art ensemble methods on various text classification tasks.
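
To make the idea in the abstract concrete, here is a minimal sketch of presence-aware voting in a tree ensemble. It is not the authors' algorithm: the abstract does not give the weighting formula, so the rule below (full weight if at least one test on the decision path was satisfied by a present word, a reduced weight otherwise) is an assumption made purely for illustration. The names `Leaf`, `Node`, `predict_with_evidence`, `weighted_vote`, `majority_vote`, and the parameter `absent_only_weight` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    word: str          # the word tested at this node
    absent: "Tree"     # branch taken when the word is absent (feature = 0)
    present: "Tree"    # branch taken when the word is present (feature = 1)


Tree = Union[Leaf, Node]


def predict_with_evidence(tree, doc_words):
    """Return (label, number of path tests satisfied by a present word)."""
    present_hits = 0
    node = tree
    while isinstance(node, Node):
        if node.word in doc_words:
            present_hits += 1
            node = node.present
        else:
            node = node.absent
    return node.label, present_hits


def weighted_vote(trees, doc_words, absent_only_weight=0.25):
    """Sum per-tree votes, down-weighting predictions made purely from absent words.

    Assumed rule (not taken from the paper): weight 1.0 when the path used at
    least one present word, otherwise `absent_only_weight`.
    """
    scores = defaultdict(float)
    for tree in trees:
        label, hits = predict_with_evidence(tree, doc_words)
        scores[label] += 1.0 if hits > 0 else absent_only_weight
    return max(scores, key=scores.get)


def majority_vote(trees, doc_words):
    """Plain unweighted voting, for comparison."""
    return weighted_vote(trees, doc_words, absent_only_weight=1.0)


# Three hand-built one-split trees (toy data, not from the paper).
TREES = [
    Node("goal", absent=Leaf("politics"), present=Leaf("sport")),
    Node("stadium", absent=Leaf("politics"), present=Leaf("sport")),
    Node("league", absent=Leaf("politics"), present=Leaf("sport")),
]
DOC = {"goal", "match"}

plain = majority_vote(TREES, DOC)      # "politics": two trees vote from absence alone
weighted = weighted_vote(TREES, DOC)   # "sport": the one presence-based vote outweighs them
```

In this toy case, two of the three trees predict "politics" only because "stadium" and "league" are missing from the document, so plain majority voting follows them. Once absence-only predictions are down-weighted, the single tree that actually matched the word "goal" decides the outcome.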

The video of this talk cannot be embedded. You can watch it here:
https://dl.acm.org/doi/10.1145/3397271.3401229#sec-supp
The talk and the corresponding paper were published at the SIGIR 2020 virtual conference.
