"Better Data" is Better than "Better Data Miners" (Benefits of Tuning SMOTE for Defect Prediction)

Authors

Amritanshu Agrawal, Tim Menzies

Publication date

2018/5/20

Conference

International Conference on Software Engineering, 2018

Description

We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria, nor did they (b) study how variations in the data affect the results. Hence, this paper applies (a) multiple performance criteria while (b) fixing the weaker regions of the training data (using SMOTUNED, an auto-tuning version of SMOTE). This approach leads to dramatically large improvements in software defect prediction. When applied in a 5*5 cross-validation study of 3,681 Java classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20%, respectively. These improvements are independent of the classifier used to predict defects. The same pattern of improvement was observed when a comparative analysis of SMOTE and SMOTUNED was done against the most …
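
For readers unfamiliar with SMOTE, the oversampling step that SMOTUNED builds on can be sketched as below. This is a minimal illustration of basic SMOTE only, not the authors' implementation: the function name `smote`, the parameters `k` and `n_new`, and the toy data are assumptions made for this example, and the description above does not say which settings SMOTUNED tunes or how it searches for them.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbors.

    Hypothetical helper for illustration; `k` and `n_new` are the kind of
    settings an auto-tuner could search over.
    """
    rng = random.Random(seed)
    if len(minority) < 2:
        raise ValueError("need at least two minority rows")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by squared Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Toy minority class (e.g. feature vectors of defective classes); values are made up.
minority_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote(minority_rows, n_new=6, k=2)
```

Each synthetic row lies on the line segment between a minority row and one of its nearest minority neighbors, so the new rows stay inside the region the minority class already occupies. Synthetic rows would be added to the training folds only, leaving test folds untouched.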

Total citations

Cited by 231

Citations per year: 2017: 3, 2018: 17, 2019: 32, 2020: 37, 2021: 51, 2022: 46, 2023: 24, 2024: 21

Scholar articles

Is "better data" better than "better data miners"? On the benefits of tuning SMOTE for defect prediction

A Agrawal, T Menzies - Proceedings of the 40th International Conference on …, 2018