On the relative value of data resampling approaches for software defect prediction

Authors

Kwabena Ebo Bennin, Jacky W Keung, Akito Monden

Publication date

2019/4/15

Journal

Empirical Software Engineering

Volume

Pages

602-636

Publisher

Springer US

Description

Software defect data sets are typically characterized by an unbalanced class distribution where the defective modules are fewer than the non-defective modules. Prediction performances of defect prediction models are detrimentally affected by the skewed distribution of the faulty minority modules in the data set since most algorithms assume both classes in the data set to be equally balanced. Resampling approaches address this concern by modifying the class distribution to balance the minority and majority class distribution. However, very little is known about the best distribution for attaining high performance especially in a more practical scenario. There are still inconclusive results pertaining to the suitable ratio of defect and clean instances (Pfp), the statistical and practical impacts of resampling approaches on prediction performance and the more stable resampling approach across several …

Total citations

Cited by 84

2018: 2, 2019: 2, 2020: 9, 2021: 16, 2022: 16, 2023: 27, 2024: 12
