Authors
Shuo Feng, Jacky Keung, Xiao Yu, Yan Xiao, Kwabena Ebo Bennin, Md Alamgir Kabir, Miao Zhang
Publication date
2021/1/1
Journal
Information and Software Technology
Volume
129
Pages
106432
Publisher
Elsevier
Description
Context: Datasets used for software defect prediction (SDP) generally contain more non-defective instances than defective ones, which is referred to as the class imbalance problem. Oversampling techniques are frequently adopted to alleviate this problem by generating new synthetic defective instances. Existing techniques generate either near-duplicated instances, which result in overgeneralization (a high probability of false alarm, pf), or overly diverse instances, which hurt the prediction model's ability to find defects (a low probability of detection, pd). Furthermore, when existing oversampling techniques are applied in SDP, they do not take into consideration the effort needed to inspect instances of different complexity. Objective: In this study, we introduce the Complexity-based OverSampling TEchnique (COSTE), a novel oversampling technique that can achieve low pf and high pd …
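The abstract frames the trade-off in terms of pd and pf. As a rough illustration only, the sketch below computes both from binary predictions using the usual confusion-matrix definitions (pd = TP/(TP+FN), pf = FP/(FP+TN)) and pairs them with a generic SMOTE-style interpolating oversampler. The oversampler is a swapped-in, well-known baseline technique, not COSTE, whose complexity-based procedure is not described in this excerpt. The helper names `pd_pf` and `smote_like` and their parameters `k` and `seed` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def pd_pf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (pd, pf) for binary labels where 1 = defective, 0 = non-defective."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean modules correctly passed
    pd = tp / (tp + fn) if tp + fn else 0.0  # probability of detection (recall)
    pf = fp / (fp + tn) if fp + tn else 0.0  # probability of false alarm
    return pd, pf


def smote_like(defective: List[List[float]], n_new: int, k: int = 5,
               seed: int = 0) -> List[List[float]]:
    """Generic SMOTE-style oversampling (hypothetical helper, NOT COSTE).

    Each synthetic instance lies on the segment between a randomly chosen
    defective instance and one of its k nearest defective neighbours.
    """
    if len(defective) < 2:
        raise ValueError("need at least two defective instances")
    rng = random.Random(seed)  # seeded for reproducible output
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(defective)
        # Rank the other defective instances by squared Euclidean distance.
        others = [d for d in defective if d is not base]
        others.sort(key=lambda d: sum((x - y) ** 2 for x, y in zip(base, d)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # in [0, 1): a small gap gives a near-duplicate of base
        synthetic.append([x + gap * (y - x) for x, y in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # 3 defective and 4 clean modules: pd = 2/3, pf = 0.25
    print(pd_pf([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1]))
    print(smote_like([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], n_new=3, k=1))
```

The random `gap` controls how far a synthetic instance sits from its seed: values near 0 produce near-duplicates, while drawing neighbours from a wider pool (larger `k`) spreads instances further apart. This loosely mirrors the two failure modes the abstract attributes to existing techniques (higher pf and lower pd, respectively).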
Total citations
Citations per year chart, 2020-2024