19/10/2020

Two test collections for retrieval using named entity markup

Jacob Bremerman, Dawn Lawrie, James Mayfield, Douglas W. Oard

Keywords: test collection, topic aspects, entity-based search

Abstract: Studying the effects of semantic analysis on retrieval effectiveness can be difficult using standard test collections because both queries and documents typically lack semantic markup. This paper describes extensions to two test collections, CLEF 2003/2004 Russian and TDT-3 Chinese, to support the study of the utility of named entity annotation. A new set of topic aspects that were expected to benefit from named entity markup was defined for topics in those test collections, with two queries for each aspect. One of these queries uses named entities as bag-of-words query terms or as semantic constraints on a free-text query term; the other is a bag-of-words baseline query without named entity markup. Exhaustive judgment of the documents annotated by CLEF or TDT as relevant to each corresponding topic was performed, resulting in relevance judgments for 133 Russian and 33 Chinese topic aspects that each have at least one relevant document. Named entity tags were automatically generated for the documents in both collections. Use of the test collections is illustrated with some preliminary experiments.

The video of this talk cannot be embedded. You can watch it here:
https://dl.acm.org/doi/10.1145/3340531.3417452#sec-supp
The talk and the corresponding paper were published at the CIKM 2020 virtual conference.
