Abstract:
For over a decade, coreference resolution systems have been developed to find simple one-to-one equivalence mappings (owl:sameAs relations) between instances of different linked datasets and knowledge graphs. Comparative evaluations of instance matching systems can show how such systems perform on artificial benchmarks or on real-world data challenges. However, the lack of real data for evaluating these systems is currently a bottleneck. In this paper, we propose using the Cruise entities in the GeoLink data repository as a real-world instance matching benchmark for linked data and knowledge graphs. The GeoLink project has brought together seven datasets related to geoscience research. Both the ontology (T-box) and the instance data (A-box) of GeoLink are significantly larger than those of current benchmarks, and they pose particularly interesting challenges, such as geospatial and temporal data. The proposed benchmark consists of two real-world GeoLink datasets, R2R and BCO-DMO, and includes manually curated owl:sameAs links between more than 900 Cruise entities of these two datasets. The reference alignment was discussed and generated by domain experts from different institutions and is expressed in the Alignment API format.
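As an illustration of how a reference alignment in the Alignment API format might be used, here is a minimal sketch that reads the RDF/XML serialization and scores a matcher's output against it. It assumes the standard layout of that format (`Alignment`/`map`/`Cell` elements with `entity1`, `entity2` and `relation` children). The URIs in `SAMPLE` and the helper names `read_alignment` and `evaluate` are hypothetical and not part of the GeoLink release. Scoring by precision, recall and F1 over `=` correspondences is the usual practice in instance matching, but it is an assumption here.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the Alignment API RDF/XML serialization.
ALIGN = "http://knowledgeweb.semanticweb.org/heterogeneity/alignment#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical reference alignment (illustrative URIs, not real GeoLink data).
SAMPLE = """<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Alignment>
    <xml>yes</xml>
    <level>0</level>
    <type>11</type>
    <map><Cell>
      <entity1 rdf:resource="http://example.org/r2r/cruise/A"/>
      <entity2 rdf:resource="http://example.org/bco-dmo/cruise/X"/>
      <relation>=</relation>
      <measure>1.0</measure>
    </Cell></map>
    <map><Cell>
      <entity1 rdf:resource="http://example.org/r2r/cruise/B"/>
      <entity2 rdf:resource="http://example.org/bco-dmo/cruise/Y"/>
      <relation>=</relation>
      <measure>1.0</measure>
    </Cell></map>
    <map><Cell>
      <entity1 rdf:resource="http://example.org/r2r/cruise/A"/>
      <entity2 rdf:resource="http://example.org/bco-dmo/cruise/W"/>
      <relation>&lt;</relation>
      <measure>1.0</measure>
    </Cell></map>
  </Alignment>
</rdf:RDF>"""


def read_alignment(xml_text):
    """Return the set of (entity1, entity2) URI pairs whose relation is '='."""
    root = ET.fromstring(xml_text)
    pairs = set()
    for cell in root.iter(f"{{{ALIGN}}}Cell"):
        e1 = cell.find(f"{{{ALIGN}}}entity1").get(f"{{{RDF}}}resource")
        e2 = cell.find(f"{{{ALIGN}}}entity2").get(f"{{{RDF}}}resource")
        relation = cell.findtext(f"{{{ALIGN}}}relation", default="=").strip()
        if relation == "=":  # keep only equivalence (sameAs-style) cells
            pairs.add((e1, e2))
    return pairs


def evaluate(system, reference):
    """Precision, recall and F1 of a system's pairs against the reference pairs."""
    true_pos = len(system & reference)
    precision = true_pos / len(system) if system else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    reference = read_alignment(SAMPLE)
    system = {
        ("http://example.org/r2r/cruise/A", "http://example.org/bco-dmo/cruise/X"),
        ("http://example.org/r2r/cruise/C", "http://example.org/bco-dmo/cruise/Z"),
    }
    print(evaluate(system, reference))
```

Here the `<` cell is ignored because only equivalence correspondences count as sameAs links, so the reference contains two pairs and the hypothetical system output, which finds one of them, scores 0.5 on all three measures.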