Authors
Yun Xu, Royston Goodacre
Publication date
2018/7/1
Journal
Journal of Analysis and Testing
Volume
2
Issue
3
Pages
249-262
Publisher
Springer Singapore
Description
Model validation is the most important part of building a supervised model. For building a model with good generalization performance one must have a sensible data splitting strategy, and this is crucial for model validation. In this study, we conducted a comparative study of various reported data splitting methods. The MixSim model was employed to generate nine simulated datasets with different probabilities of mis-classification and variable sample sizes. Then partial least squares for discriminant analysis and support vector machines for classification were applied to these datasets. Data splitting methods tested included variants of cross-validation, bootstrapping, bootstrapped Latin partition, Kennard-Stone algorithm (K-S) and sample set partitioning based on joint XY distances algorithm (SPXY). These methods were employed to split the data into training and validation sets. The estimated …
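Two of the splitting methods named above, Kennard-Stone (K-S) and SPXY, are deterministic distance-based algorithms. The sketch below is a minimal, stdlib-only Python illustration of their commonly described formulations. It is not the authors' code. K-S starts from the two most distant samples and repeatedly adds the sample whose distance to its nearest already-selected sample is largest. SPXY runs the same selection on a joint distance, d_x/max(d_x) + d_y/max(d_y). Several details are assumptions: Euclidean distance, Y supplied as numeric vectors (for classification this could be one-hot class codes), ties broken by lowest index, and the hypothetical function names `euclidean_distances`, `kennard_stone` and `spxy`.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def euclidean_distances(rows: Sequence[Sequence[float]]) -> Matrix:
    """Full symmetric matrix of pairwise Euclidean distances."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(rows[i], rows[j])
    return d


def kennard_stone(dist: Matrix, n_train: int) -> Tuple[List[int], List[int]]:
    """Kennard-Stone selection on a precomputed distance matrix.

    Returns (training indices in selection order, remaining indices),
    where the remaining samples form the validation set.
    """
    n = len(dist)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Step 1: seed the training set with the two most distant samples.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: dist[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    # Distance from each unselected sample to its nearest selected sample.
    min_d = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}
    # Step 2: add the sample farthest from the current training set (max-min rule).
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_d[k] = min(min_d[k], dist[k][nxt])
    return selected, remaining


def spxy(X: Sequence[Sequence[float]], Y: Sequence[Sequence[float]],
         n_train: int) -> Tuple[List[int], List[int]]:
    """SPXY: Kennard-Stone on X and Y distances, each scaled by its maximum."""
    dx, dy = euclidean_distances(X), euclidean_distances(Y)
    # Guard against division by zero when all samples coincide in X or Y.
    mx = max(max(row) for row in dx) or 1.0
    my = max(max(row) for row in dy) or 1.0
    n = len(dx)
    dxy = [[dx[i][j] / mx + dy[i][j] / my for j in range(n)] for i in range(n)]
    return kennard_stone(dxy, n_train)


if __name__ == "__main__":
    # Hypothetical toy data: six samples with two variables each.
    X = [[0, 0], [1, 0], [0, 1], [5, 5], [2, 2], [4, 4]]
    train, valid = kennard_stone(euclidean_distances(X), 3)
    print(train, valid)  # [0, 3, 4] [1, 2, 5]
```

On the toy data, samples 0 and 3 are the farthest pair, and sample 4 is then the point farthest from both. The split therefore concentrates boundary samples in the training set. This systematic, non-random behaviour sets K-S and SPXY apart from cross-validation and bootstrapping, which draw training sets at random.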
Total citations
2019: 19, 2020: 54, 2021: 133, 2022: 223, 2023: 191, 2024: 126