29/06/2020

A dataset of enterprise-driven open source software

Diomidis Spinellis, Zoe Kotti, Konstantinos Kravvaritis, Georgios Theodorou, Panos Louridas

Keywords: SEC 10-K, EDGAR, software ecosystems, dataset, SEC 20-F, open source software in business, Software engineering economics, Fortune Global 500

Abstract: We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on open source business software development. Based on the premise that an enterprise’s employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17 264 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89

 0
 0
 0
 0
This is an embedded video. Talk and the respective paper are published at MSR 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment
no comments yet
code of conduct: tbd

Similar Papers