Sample size vs. bias in defect prediction

Authors

Foyzur Rahman, Daryl Posnett, Israel Herraiz, Premkumar Devanbu

Publication date

2013/8/18

Book

Proceedings of the 2013 9th joint meeting on foundations of software engineering

Pages

147-157

Description

Most empirical disciplines promote the reuse and sharing of datasets, as it leads to greater possibility of replication. While this is increasingly the case in Empirical Software Engineering, some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of the studies based on biased datasets may be suspect. This issue has raised considerable consternation in the ESE literature in recent years. However, there is a confounding factor of these datasets that has not been examined carefully: size. Biased datasets are sampling only some of the data that could be sampled, and doing so in a biased fashion; but biased samples could be smaller, or larger. Smaller data sets in general provide less reliable bases for estimating models, and thus could lead to inferior model …

Total citations

Cited by 147

Citations per year: 2014: 8, 2015: 15, 2016: 19, 2017: 12, 2018: 16, 2019: 11, 2020: 12, 2021: 16, 2022: 15, 2023: 12, 2024: 7
