Authors
Sadika Amreen, Audris Mockus, Russell Zaretzki, Christopher Bogart, Yuxia Zhang
Publication date
2020/3
Journal
Empirical Software Engineering
Volume
25
Pages
1136-1167
Publisher
Springer US
Description
An accurate determination of developer identities is important for software engineering research and practice. Without it, even simple questions such as “how many developers does a project have?” cannot be answered. The commonly used version control data from Git is full of identity errors, and the existing approaches to correcting these errors are difficult to validate at large scale and cannot be easily improved. We, therefore, aim to develop a scalable, highly accurate, easy-to-use, and easy-to-improve approach to correct software developer identity errors. We first amalgamate developer identities from version control systems in open source software repositories and investigate the nature and prevalence of these errors, design corrective algorithms, and estimate the impact of the errors on networks inferred from this data. We investigate these questions using a collection of over 1B Git commits with over 23M …
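The abstract does not describe the corrective algorithms themselves. As a minimal illustration of why identity errors distort a simple developer count, the sketch below implements a naive baseline, not the authors' method: it treats each Git author string of the form `Name <email>` as an identity and uses union-find to merge identities that share a lowercased email address or an identical multi-word name. The function names `parse_identity` and `merge_identities`, the sample author strings, and the "multi-word names only" rule are all hypothetical choices made for this example.

```python
import re
from collections import defaultdict


def parse_identity(raw: str) -> tuple[str, str]:
    """Split a Git author string 'Name <email>' into a lowercased (name, email) pair."""
    m = re.fullmatch(r"\s*(.*?)\s*<\s*([^>]*?)\s*>\s*", raw)
    if m is None:
        # No angle-bracketed email: treat the whole string as the name.
        return raw.strip().lower(), ""
    return m.group(1).lower(), m.group(2).lower()


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_identities(raw_ids: list[str]) -> list[list[str]]:
    """Naive baseline (hypothetical): merge identities sharing an email or a multi-word name."""
    ids = list(dict.fromkeys(raw_ids))  # drop exact duplicates, keep first-seen order
    uf = UnionFind(len(ids))
    first_seen: dict[tuple[str, str], int] = {}  # matching key -> first identity index
    for i, raw in enumerate(ids):
        name, email = parse_identity(raw)
        keys = []
        if email:
            keys.append(("email", email))
        if " " in name:
            # Hypothetical rule: single-token names like "root" are too ambiguous to match on.
            keys.append(("name", name))
        for key in keys:
            if key in first_seen:
                uf.union(first_seen[key], i)
            else:
                first_seen[key] = i
    groups: dict[int, list[str]] = defaultdict(list)
    for i, raw in enumerate(ids):
        groups[uf.find(i)].append(raw)
    return list(groups.values())


# Hypothetical author strings: three spellings of one person, plus a placeholder identity.
commits = [
    "Jane Doe <jane@example.com>",
    "jane doe <JDoe@users.noreply.example>",
    "J. Doe <jane@example.com>",
    "root <root@localhost>",
]
groups = merge_identities(commits)
print(f"{len(commits)} author strings -> {len(groups)} merged identities")
# prints "4 author strings -> 2 merged identities"
```

Counting distinct author strings would report four developers here, while the merged count is two. The same heuristic also shows why such rules are hard to trust at scale: a placeholder email such as `root@localhost` used by many unrelated people would wrongly merge them all, and a person who changes both name and email would never be merged.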
Total citations
[Chart: citations per year, 2019 to 2024]