19/10/2020

WebLens: Towards interactive large-scale structured data profiling

Rituparna Khan, Michael Gubanov

Keywords: relational model, large-scale structured data, metadata profiling

Abstract: Data profiling is a "set of statistical data analysis activities and processes to determine properties of a given dataset". Historically, most data profiling tasks were aimed at the data itself. At scale, when a dataset has millions of tables, their metadata (i.e., titles, attribute names, and types) becomes as abundant as the data instances, and profiling it starts to play a vital role. Here we demonstrate our work on WebLens, an interactive, scalable metadata profiler for large-scale structured data. At its core is a new data structure, the Metadata-profile, coupled with Machine/Deep-Learning models trained to construct it. A Metadata-profile represents a metadata summary of a specific real-world object collected over millions of data sources. Such profiles significantly simplify access to large-scale structured datasets for both data scientists and end users. Finally, we performed a user study with 20 students and found that WebLens-trained models significantly outperform the 20 participants on the task of constructing metadata-profiles for 10 objects from different domains. For demonstration and evaluation we used a large-scale dataset of 15 million relational English tables from the Web.

The video of this talk cannot be embedded. You can watch it here:
https://dl.acm.org/doi/10.1145/3340531.3417443#sec-supp
The talk and the corresponding paper were published at the CIKM 2020 virtual conference.
