Authors
Andrea Martina Bommert
Publication date
2021/1/20
Institution
TU Dortmund University
Description
In this thesis, four aspects connected to feature selection are analyzed. First, a benchmark of filter methods for feature selection is conducted. Second, measures for the assessment of feature selection stability are compared both theoretically and empirically; some of these stability measures are newly defined. Third, a multi-criteria approach for obtaining desirable models with respect to predictive accuracy, feature selection stability, and sparsity is proposed and evaluated. Fourth, an approach for finding desirable models for data sets with many similar features is suggested and evaluated. For the benchmark of filter methods, 20 filter methods are analyzed. First, the filter methods are compared with respect to the order in which they rank the features and with respect to their scaling behavior, identifying groups of similar filter methods. Next, the predictive accuracy of the filter methods when combined with a predictive model and their run time are analyzed, resulting in recommendations on filter methods that work well on many data sets. To identify suitable measures for stability assessment, 20 stability measures are compared based on both their theoretical properties and their empirical behavior. Five of the measures are newly proposed by us. Groups of stability measures that consider the same feature sets as stable or unstable are identified, and the impact of the number of selected features on the stability values is studied. Additionally, the run times for calculating the stability measures are analyzed. Based on all analyses, recommendations on which stability measures should be used in future analyses are made. When searching for a good …
Total citations
2021: 3, 2022: 2, 2023: 4, 2024: 3