23/08/2020

Imputing various incomplete attributes via distance likelihood maximization

Shaoxu Song, Yu Sun

Keywords: distance likelihood, incomplete data, data imputation

Abstract: Missing values may appear in various attributes. By "various", we mean (1) different types of values in a tuple, such as numerical or categorical, and (2) different attributes in a tuple, either the dependent or determinant attributes of regression models or dependency rules. Such varieties unfortunately prevent the imputation performing. In this paper, we propose to study the distance models that predict distances between tuples for missing data imputation. The immediate benefits are in two aspects, (1) uniformly processing and collaboratively utilizing the distances on all the attributes with various types of values, and (2) rather than enumerating the combinations of imputation candidates on various attributes, we can directly calculate the most likely distances of missing values to other complete ones and thus infer the corresponding imputations. Our major technical highlights include (1) introducing the imputation statistically explainable by the likelihood on distances, (2) proving NP-hardness of finding the maximum likelihood imputation, and (3) devising the approximation algorithm with performance guarantees. Experiments over datasets with real missing values demonstrate the superiority of the proposed method compared to 11 existing approaches in 5 categories. Our proposal improves not only the imputation accuracy but also the downstream applications such as classification, clustering and record matching.

The video of this talk cannot be embedded. You can watch it here:
https://dl.acm.org/doi/10.1145/3394486.3403096#sec-supp
(Link will open in new window)
 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at KDD 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd Characters remaining: 140

Similar Papers