19/10/2020

Schema-agnostic entity matching using pre-trained language models

Kai-Sheng Teong, Lay-Ki Soon, Tin Tin Su

Keywords: language models, schema agnostic, entity matching

Abstract: Entity matching (EM) is the process of linking records from different data sources. While extensive research has been done on various aspects of EM, most of these studies assume the task is schema-specific, attempting to match record pairs at the attribute level. Unfortunately, in the real world, tables that undergo EM may not have aligned schemas, and often the schema or metadata of the tables and attributes is not known beforehand. In view of this challenge, this paper presents an effective approach for schema-agnostic EM, where having schema-aligned tables is not compulsory. The proposed method stems from the idea of treating tuple pairs in EM like the sentence-pair classification problem in natural language processing (NLP). A pre-trained language model, BERT, is adopted and fine-tuned on a labeled dataset. The proposed method was evaluated on benchmark datasets and compared against two state-of-the-art approaches, namely DeepMatcher and Magellan. The experimental results show that our proposed solution outperforms both by an average of 9
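The following is a minimal sketch of the idea described in the abstract, not the authors' released code: each record is serialized into a flat string without relying on schema alignment, and a pre-trained BERT model is fine-tuned as a sentence-pair classifier over record pairs using the Hugging Face Transformers library. The model name, serialization format, example records, and hyperparameters are illustrative assumptions.

```python
# Sketch: schema-agnostic EM as BERT sentence-pair classification.
# Assumes: pip install torch transformers
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

def serialize(record: dict) -> str:
    # Schema-agnostic serialization: concatenate attribute values only,
    # so no alignment between the two tables' schemas is required.
    return " ".join(str(v) for v in record.values() if v is not None)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy labeled record pair (hypothetical data): 1 = match, 0 = non-match.
left = {"title": "iPhone 12 64GB black", "price": "699"}
right = {"name": "Apple iPhone 12 (64 GB, Black)", "cost": "699.00"}
label = torch.tensor([1])

# Encode the two serialized records as one sentence pair: [CLS] a [SEP] b [SEP].
inputs = tokenizer(serialize(left), serialize(right),
                   truncation=True, padding=True, return_tensors="pt")

# One fine-tuning step; in practice, iterate over the full labeled dataset.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**inputs, labels=label)
outputs.loss.backward()
optimizer.step()

# At inference time, the argmax over the two logits gives match / non-match.
```

Because the serialization discards attribute names, the same pipeline applies whether or not the two tables share a schema, which is the schema-agnostic property the paper targets.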

The video of this talk cannot be embedded. You can watch it here:
https://dl.acm.org/doi/10.1145/3340531.3412131#sec-supp
The talk and the respective paper are published at the CIKM 2020 virtual conference.
