Authors
Kwabena Ebo Bennin, Jacky W Keung, Akito Monden
Publication date
2019/4/15
Journal
Empirical Software Engineering
Volume
24
Pages
602-636
Publisher
Springer US
Description
Software defect data sets are typically characterized by an imbalanced class distribution in which defective modules are fewer than non-defective modules. The performance of defect prediction models is degraded by this skewed distribution, since most learning algorithms assume that the two classes in the data set are roughly balanced. Resampling approaches address this concern by modifying the class distribution to balance the minority and majority classes. However, very little is known about the best distribution for attaining high performance, especially in a more practical scenario. Results remain inconclusive regarding the suitable ratio of defective to clean instances (Pfp), the statistical and practical impact of resampling approaches on prediction performance, and the more stable resampling approach across several …
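To make the idea of resampling toward a chosen class ratio concrete, here is a minimal Python sketch of random oversampling. It is an illustration only, not the procedure evaluated in the paper. The function name `oversample_to_pfp` and its parameters are hypothetical, and it assumes that Pfp can be read as the fraction of defective instances in the resampled training set, with labels encoded as 1 (defective) and 0 (clean).

```python
import random


def oversample_to_pfp(labels, target_pfp, seed=0):
    """Return row indices of a resampled data set.

    Assumption (not from the source): target_pfp is the desired fraction
    of defective instances (label 1) after resampling. Defective rows are
    duplicated at random until that fraction is reached; clean rows
    (label 0) are kept unchanged.
    """
    if not 0 < target_pfp < 1:
        raise ValueError("target_pfp must lie strictly between 0 and 1")

    rng = random.Random(seed)
    defective = [i for i, y in enumerate(labels) if y == 1]
    clean = [i for i, y in enumerate(labels) if y == 0]
    if not defective:
        raise ValueError("no defective instances to oversample")

    # Solve n_def / (n_def + n_clean) = target_pfp for n_def.
    needed = round(target_pfp * len(clean) / (1 - target_pfp))
    extra = max(0, needed - len(defective))

    # Keep every original row and append randomly chosen defective rows.
    duplicates = [rng.choice(defective) for _ in range(extra)]
    return defective + clean + duplicates


# Usage: 10 defective and 90 clean modules, resampled to an even split.
labels = [1] * 10 + [0] * 90
indices = oversample_to_pfp(labels, target_pfp=0.5)
resampled_labels = [labels[i] for i in indices]
```

Resampling of this kind is normally applied to the training split only, so that the test data keeps its original class distribution.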