Authors
Tim Menzies, Burak Turhan, Ayse Bener, Gregory Gay, Bojan Cukic, Yue Jiang
Publication date
2008/5/12
Book
Proceedings of the 4th international workshop on Predictor models in software engineering
Pages
47-54
Description
Context
There are many methods that input static code features and output a predictor for faulty code modules. These data mining methods have hit a "performance ceiling"; i.e., there appears to be an inherent upper bound on the amount of information that, say, static code features can offer when identifying modules which contain faults.
Objective
We seek an explanation for this ceiling effect. Perhaps static code features have "limited information content"; i.e., their information can be quickly and completely discovered by even simple learners.
Method
An initial literature review documents the ceiling effect in other work. Next, using three sub-sampling techniques (under-, over-, and micro-sampling), we look for the lower useful bound on the number of training instances.
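The three sub-sampling schemes named above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the synthetic 900/100 class split, and the choice of 25 instances per class for micro-sampling are all assumptions made for the example; only the under-/over-/micro-sampling ideas come from the abstract.

```python
import random

def undersample(majority, minority, rng):
    """Balance classes by discarding majority-class instances
    until the two classes are the same size."""
    return rng.sample(majority, len(minority)) + minority

def oversample(majority, minority, rng):
    """Balance classes by duplicating minority-class instances
    until the two classes are the same size."""
    return majority + [rng.choice(minority) for _ in range(len(majority))]

def microsample(majority, minority, m, rng):
    """Train on only m instances drawn from each class
    (m small; the paper explores very small training sets)."""
    return rng.sample(majority, m) + rng.sample(minority, m)

rng = random.Random(0)
clean  = [("clean", i) for i in range(900)]   # non-defective modules (majority)
faulty = [("faulty", i) for i in range(100)]  # defective modules (minority)

print(len(undersample(clean, faulty, rng)))      # 200 instances
print(len(oversample(clean, faulty, rng)))       # 1800 instances
print(len(microsample(clean, faulty, 25, rng)))  # 50 instances
```

With 25 instances per class, micro-sampling yields a 50-instance training set, which matches the scale of the result reported below.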
Results
Using micro-sampling, we find that as few as 50 instances yield as much information as larger training sets.
Conclusions
We have found much evidence for the …