Changeset-based topic modeling of software repositories

Authors

Christopher S Corley, Kostadin Damevski, Nicholas A Kraft

Publication date

2018/10/9

Journal

IEEE Transactions on Software Engineering

Pages

1068-1080

Publisher

IEEE

Description

The standard approach to applying text retrieval models to code repositories is to train models on documents representing program elements. However, code changes lead to model obsolescence and to the need to retrain the model from the latest snapshot. To address this, we previously introduced an approach that trains a model on documents representing changesets from a repository and demonstrated its feasibility for feature location. In this paper, we expand our work by investigating: a second task (developer identification), the effects of including different changeset parts in the model, the repository characteristics that affect the accuracy of our approach, and the effects of the time invariance assumption on evaluation results. Our results demonstrate that our approach is as accurate as the standard approach for projects with most changes localized to a subset of the code, but less accurate when changes are …

Total citations

Cited by 18

Citations per year: 2019: 1, 2020: 2, 2021: 3, 2022: 7, 2023: 4, 2024: 1
