Authors
L Mathieson, A Mendes, J Marsden, J Pond, P Moscato
Publication date
2004
Description
This paper introduces a new method for knowledge extraction from databases. Its aim is to find a set of features that discriminates between classes and is also robust for within-class classification. The method is generic; we introduce it here in the field of breast cancer diagnosis from digital mammography data. The mathematical formalism is based on a generalization of the k-Feature Set problem, called the (α, β)-k-Feature Set problem, which was recently introduced by Cotta, Sloper and Moscato [1]. We work in two steps: first, an optimal (α, β)-k-Feature Set of minimum cardinality is identified; then, a set of classification rules using these features is obtained. The (α, β)-k-Feature Set is itself obtained in two phases. First, a series of powerful reduction techniques that never discard the optimal solution is applied. Second, a metaheuristic search attempts to decide which of the remaining features should be included and which discarded. Two algorithms were tested on a public-domain digital mammography dataset of 71 malignant and 75 benign cases. Based on the results of these algorithms, we derive classification rules that use only a subset of the features present in the dataset.
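As a rough illustration only, the Python sketch below assumes the (α, β)-k-Feature Set definition as we understand it from [1]. A feature subset S qualifies if every pair of examples from different classes differs on at least α features of S, and every pair from the same class agrees on at least β features of S; the optimization goal is a minimum-cardinality S, with |S| ≤ k. All function names and the four-example toy dataset are hypothetical. The function `forced_features` implements one simple safe reduction: if a pair has exactly as many candidate features as its threshold, all of them must be in every solution. This rule is not necessarily one of the paper's reductions. `greedy_complete` is a plain greedy heuristic used as a stand-in for the paper's metaheuristic, and it does not guarantee minimum cardinality.

```python
from itertools import combinations


def pair_requirements(examples, labels, alpha, beta):
    """Return one (candidate_features, threshold) requirement per example pair.

    Different-class pair: candidates are features whose values differ; at
    least `alpha` of them must be selected. Same-class pair: candidates are
    features whose values agree; at least `beta` must be selected.
    """
    n_feat = len(examples[0])
    reqs = []
    for i, j in combinations(range(len(examples)), 2):
        a, b = examples[i], examples[j]
        if labels[i] != labels[j]:
            cover = frozenset(f for f in range(n_feat) if a[f] != b[f])
            reqs.append((cover, alpha))
        else:
            cover = frozenset(f for f in range(n_feat) if a[f] == b[f])
            reqs.append((cover, beta))
    return reqs


def is_feature_set(selected, reqs):
    """True if `selected` meets every pair's threshold."""
    s = set(selected)
    return all(len(cover & s) >= t for cover, t in reqs)


def forced_features(reqs):
    """Hypothetical safe reduction: a pair with exactly t candidates forces
    all of them into every feasible solution. Returns None if some pair has
    fewer than t candidates, since the instance is then infeasible."""
    forced = set()
    for cover, t in reqs:
        if len(cover) < t:
            return None
        if len(cover) == t:
            forced |= cover
    return forced


def greedy_complete(reqs, start, n_feat):
    """Greedy stand-in for the metaheuristic: repeatedly add the feature that
    most reduces the total unmet demand. Assumes the instance is feasible
    (forced_features did not return None), so the loop always terminates."""
    s = set(start)

    def deficit(sel):
        return sum(max(0, t - len(cover & sel)) for cover, t in reqs)

    while deficit(s) > 0:
        best = min((f for f in range(n_feat) if f not in s),
                   key=lambda f: deficit(s | {f}))
        s.add(best)
    return s


# Hypothetical toy data: four examples, four binary features, classes B and M.
examples = [(0, 1, 0, 1), (0, 1, 1, 1), (1, 0, 0, 1), (1, 0, 1, 0)]
labels = ["B", "B", "M", "M"]

reqs = pair_requirements(examples, labels, alpha=2, beta=1)
forced = forced_features(reqs)
print(sorted(forced))                                   # [0, 1]
print(sorted(greedy_complete(reqs, forced, n_feat=4)))  # [0, 1]
```

Here the reduction alone already yields a feasible set, so the heuristic phase adds nothing; on realistic data the forced features would typically be only a starting point. A bound k can be checked afterwards by comparing the size of the returned set with k.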