Authors
Aaditya Bhatia, Ellis E Eghan, Manel Grichi, William G Cavanagh, Zhen Ming Jiang, Bram Adams
Publication date
2023/5
Journal
Empirical Software Engineering
Volume
28
Issue
3
Pages
60
Publisher
Springer US
Description
Machine Learning (ML) academic publications commonly provide open-source implementations on GitHub, allowing their audience to replicate, validate, or even extend the ML algorithms, data sets and metadata. However, thus far little is known about the degree of collaboration activity happening on such ML research repositories, in particular regarding (1) the degree to which such repositories receive contributions from forks, (2) the nature of such contributions (i.e., the types of changes), and (3) the nature of changes that are not contributed back by forks, which might represent missed opportunities. In this paper, we empirically study contributions to 1,346 ML research repositories and their 67,369 forks, both quantitatively and qualitatively, by building on Hindle et al.’s seminal taxonomy of code changes. We found that while ML research repositories are heavily forked, only 9% of the forks made modifications to the …
Total citations
[Citation chart: 2023, 2024; trailing value 43]