25/04/2020

TRACTUS: Understanding and Supporting Source Code Experimentation in Hypothesis-Driven Data Science

Krishna Subramanian, Johannes Maas, Jan Borchers

Keywords: data science, programming IDE, exploratory programming, information visualization, observational study

Abstract: Data scientists experiment heavily with their code, compromising code quality to obtain insights faster. We observed ten data scientists performing hypothesis-driven data science tasks, and analyzed their coding, commenting, and analysis practices. We found that they have difficulty keeping track of their code experiments. When revisiting exploratory code to write production code later, they struggle to retrace their steps and to capture the decisions made and insights obtained, and they have to rerun code frequently. To address these issues, we designed TRACTUS, a system extending the popular RStudio IDE that detects, tracks, and visualizes code experiments in hypothesis-driven data science tasks. TRACTUS helps users recall decisions and insights by grouping code experiments into hypotheses and by structuring information such as code execution output and documentation. Our user studies show how TRACTUS improves data scientists' workflows and suggest additional opportunities for improvement. TRACTUS is available as an open-source RStudio IDE addin at http://hci.rwth-aachen.de/tractus.

Video of the talk: https://www.youtube.com/watch?v=CCMwLeYMewQ
The talk and the accompanying paper were published at the CHI 2020 virtual conference.
