26/08/2020

A Theoretical and Practical Framework for Regression and Classification from Truncated Samples

Andrew Ilyas, Emmanouil Zampetakis, Constantinos Daskalakis

Keywords:

Abstract: Machine learning and statistics are invaluable for extracting insights from data. A key assumption of most methods, however, is that they have access to independent samples from the distribution of relevant data. As such, these methods often perform poorly in the face of {\em biased data} which breaks this assumption. In this work, we consider the classical challenge of bias due to truncation, wherein samples falling outside of an ``observation window'' cannot be observed. We present a general framework for regression and classification from samples that are truncated according to the value of the dependent variable. The framework argues that stochastic gradient descent (SGD) can be efficiently executed on the population log-likelihood of the truncated sample. Our framework is broadly applicable, and we provide end-to-end guarantees for the well-studied problems of truncated logistic and probit regression, where we argue that the true model parameters can be identified computationally and statistically efficiently from truncated data, extending recent work on truncated linear regression. We also provide experiments to illustrate the practicality of our framework on synthetic and real data.

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at AISTATS 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd

Similar Papers