Authors
Andrea Bommert, Jörg Rahnenführer, Michel Lang
Publication date
2017/8/1
Journal
Computational and mathematical methods in medicine
Volume
2017
Publisher
Hindawi
Description
Finding a good predictive model for a high‐dimensional data set can be challenging. For genetic data, it is important not only to find a model with high predictive accuracy, but also that this model uses only a few features and that the selection of these features is stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing biological conclusions, which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high‐dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating the stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. Also, we find that for the stability assessment …
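To make the stability criterion concrete, here is a minimal sketch of one way a Pearson-correlation stability score could be computed. The abstract does not give the exact definition, so the details are assumptions: each run's selected features are encoded as a 0/1 indicator vector, and the score is the mean Pearson correlation over all pairs of runs. The names `pearson` and `selection_stability` are hypothetical and not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A run that selects no features or all features has a constant
        # indicator vector, so the correlation is undefined.
        raise ValueError("correlation undefined for a constant indicator vector")
    return cov / sqrt(var_x * var_y)


def selection_stability(selections, n_features):
    """Mean pairwise Pearson correlation between selection indicator vectors.

    selections: list of sets of selected feature indices, one set per run
                (for example, one per resampling iteration).
    n_features: total number of candidate features.
    Assumption: averaging over all pairs of runs; the paper may define
    the aggregate differently.
    """
    if len(selections) < 2:
        raise ValueError("need at least two runs to assess stability")
    indicators = [
        [1.0 if j in chosen else 0.0 for j in range(n_features)]
        for chosen in selections
    ]
    pairs = list(combinations(indicators, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, identical selections across runs score 1, and selections that overlap no more than chance would predict score near 0. The score is undefined when a run selects none or all of the features, which the sketch reports as an error rather than guessing a value.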
Total citations
[Bar chart: citations per year, 2017 to 2024]