Authors
Gustavo EAPA Batista, Ronaldo C Prati, Maria C Monard
Publication date
2005/9/8
Book
International symposium on intelligent data analysis
Pages
24-35
Publisher
Springer Berlin Heidelberg
Description
Several studies have pointed out that class imbalance is a bottleneck in the performance achieved by standard supervised learning systems. However, a complete understanding of how this problem affects learning performance is still lacking. In previous work we identified that performance degradation is not caused solely by class imbalance, but is also related to the degree of class overlap. In this work, we take this research a step further by investigating sampling strategies that aim to balance the training set. Our results show that these sampling strategies usually lead to a performance improvement for highly imbalanced data sets with highly overlapped classes. In addition, over-sampling methods seem to outperform under-sampling methods.
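To make the two families of sampling strategies concrete, here is a minimal sketch of plain random over-sampling and random under-sampling in Python. The description above does not say which specific sampling methods the paper evaluates, so this is an illustration of the general idea only, not the authors' method. The function names (`random_oversample`, `random_undersample`) and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(group) for group in groups.values())
    out_samples, out_labels = [], []
    for label, group in groups.items():
        # Draw with replacement to fill the gap to the majority class size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Discard randomly chosen samples until every class matches the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(group) for group in groups.values())
    out_samples, out_labels = [], []
    for label, group in groups.items():
        # Draw without replacement so no sample is kept twice.
        kept = rng.sample(group, target)
        out_samples.extend(kept)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy imbalanced training set (assumed for illustration): 8 negatives, 2 positives.
samples = [[float(i)] for i in range(10)]
labels = ["neg"] * 8 + ["pos"] * 2

over_samples, over_labels = random_oversample(samples, labels)
under_samples, under_labels = random_undersample(samples, labels)

print(dict(Counter(over_labels)))   # {'neg': 8, 'pos': 8}
print(dict(Counter(under_labels)))  # {'neg': 2, 'pos': 2}
```

Over-sampling keeps every original example and adds duplicates of the minority class, while under-sampling throws away majority-class examples. Either way, only the training set should be resampled; evaluation data should keep its original class distribution.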
Total citations
[Chart: citations per year, 2006 to 2024]
Scholar articles
GE Batista, RC Prati, MC Monard - International symposium on intelligent data analysis, 2005