25/07/2020

On understanding data worker interaction behaviors

Lei Han, Tianwa Chen, Gianluca Demartini, Marta Indulska, Shazia Sadiq

Keywords: interaction behavior, search pattern, data curation

Abstract: Understanding how data workers interact with data and various pieces of information (e.g., code snippet examples) is key to designing systems that can better support them in exploring a given dataset. To date, however, there is a paucity of research studying information seeking patterns and the strategies adopted by data workers as they carry out data curation activities. In this work, we aim to understand the behaviors of data workers in discovering data quality issues, and how these behavioral observations relate to their performance. Specifically, we investigate how data workers use information resources and tools to support their task completion. To this end, we collect a multi-modal dataset through a data-driven experiment that relies on the use of eye-tracking technology with a purpose-designed platform built on top of iPython Notebook. The collected data reveals that: (i) searching in external resources is a prevalent action that can be leveraged to achieve better performance; (ii) 'copy-paste-modify' is a typical strategy for writing code to complete tasks; (iii) providing sample code within the system could help data workers to get started with their task; and (iv) surfacing underlying data is an effective way to support exploration. By investigating the behaviors prior to each search action, we also find that the most common reasons that trigger external search actions are the need to seek assistance in writing or debugging code and to search for relevant code to reuse. Our findings provide insights into patterns of interactions with various system components and information resources to perform data curation tasks. This has implications for the design of domain-specific IR systems for data workers, such as code-base search.

The video of this talk cannot be embedded. You can watch it here:
https://dl.acm.org/doi/10.1145/3397271.3401059#sec-supp
The talk and the corresponding paper were published at the SIGIR 2020 virtual conference.

