25/07/2020

Crowdsourced text sequence aggregation based on hybrid reliability and representation

Jiyi Li

Keywords: text sequence aggregation, crowdsourcing, reliability

Abstract: The crowd is cheaper and easier to access than the oracle to collect the ground truth data for training and evaluating models. To ensure the quality of the crowdsourced data, people can assign multiple crowd workers to one question and then aggregate the multiple answers with diverse quality into a golden one. In the areas of IR and NLP, the ground truth data of many tasks are text sequences. To aggregate multiple crowdsourced text sequences with diverse quality, the methods adapted from the existing answer aggregation methods which are proposed for labels (e.g., categories) only focus on one-sided reliability and do not fully utilize the rich information in text sequences. We thus propose a crowdsourced text sequence aggregation method which can capture the hybrid reliability information, i.e., the local question-wise reliability of text answers and global dataset-wise reliability of crowd workers. For the local reliability, it also incorporates the text similarities from hybrid representation, i.e., the text embeddings and word sequences. The experiments based on real crowdsourced datasets show that our method outperforms the baselines which only utilize one-sided reliability and one-sided representation. Our method can effectively leverage the rich information of text sequences.

The video of this talk cannot be embedded. You can watch it here:
https://dl.acm.org/doi/10.1145/3397271.3401239#sec-supp
(Link will open in new window)
 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at SIGIR 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd

Similar Papers