Authors
Christopher S Corley, Kostadin Damevski, Nicholas A Kraft
Publication date
2018/10/9
Journal
IEEE Transactions on Software Engineering
Volume
46
Issue
10
Pages
1068-1080
Publisher
IEEE
Description
The standard approach to applying text retrieval models to code repositories is to train models on documents representing program elements. However, code changes lead to model obsolescence and to the need to retrain the model from the latest snapshot. To address this, we previously introduced an approach that trains a model on documents representing changesets from a repository and demonstrated its feasibility for feature location. In this paper, we expand our work by investigating: a second task (developer identification), the effects of including different changeset parts in the model, the repository characteristics that affect the accuracy of our approach, and the effects of the time invariance assumption on evaluation results. Our results demonstrate that our approach is as accurate as the standard approach for projects with most changes localized to a subset of the code, but less accurate when changes are …
Total citations
Citations per year, 2019-2024 (chart)
Scholar articles
CS Corley, K Damevski, NA Kraft - IEEE Transactions on Software Engineering, 2018