Feature selection stability and accuracy of prediction models for genomic prediction of residual feed intake in pigs using machine learning

Authors

Miriam Piles, Rob Bergsma, Daniel Gianola, Hélène Gilbert, Llibertat Tusell

Publication date

2021/2/22

Journal

Frontiers in genetics

Volume

Pages

611506

Publisher

Frontiers Media SA

Description

Feature selection (FS, i.e., selection of a subset of predictor variables) is essential in high-dimensional datasets to prevent overfitting of prediction/classification models and reduce computation time and resources. In genomics, FS allows identifying relevant markers and designing low-density SNP chips to evaluate selection candidates. In this research, several univariate and multivariate FS algorithms combined with various parametric and non-parametric learners were applied to the prediction of feed efficiency in growing pigs from high-dimensional genomic data. The objective was to find the best combination of feature selector, SNP subset size, and learner leading to accurate and stable (i.e., less sensitive to changes in the training data) prediction models. Genomic best linear unbiased prediction (GBLUP) without SNP pre-selection was the benchmark. Three types of FS methods were implemented: (i) filter methods: univariate (univ.dtree, spearcor) or multivariate (cforest, mrmr), with random selection as benchmark; (ii) embedded methods: elastic net and least absolute shrinkage and selection operator (LASSO) regression; (iii) combination of filter and embedded methods. Ridge regression, support vector machine (SVM), and gradient boosting (GB) were applied after pre-selection performed with the filter methods. Data represented 5,708 individual records of residual feed intake to be predicted from the animal’s own genotype. Accuracy (stability of results) was measured as the median (interquartile range) of the Spearman correlation between observed and predicted data in a 10-fold cross-validation. The best prediction in terms of accuracy …
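The accuracy and stability summary described above (median and interquartile range of the per-fold Spearman correlation between observed and predicted values) could be computed roughly as in the following minimal sketch. It is an illustration, not the authors' code: the function names, the interleaved fold assignment, the quartile method, and the synthetic data are all assumptions, and the predictions are taken to be out-of-fold predictions that a learner has already produced.

```python
import random
import statistics


def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def summarize_folds(fold_correlations):
    """Return (median, interquartile range) of per-fold correlations."""
    q1, _, q3 = statistics.quantiles(fold_correlations, n=4)
    return statistics.median(fold_correlations), q3 - q1


def cross_validated_summary(observed, predicted, n_folds=10):
    """Split records into interleaved folds (an assumed scheme) and summarize."""
    fold_correlations = []
    for fold in range(n_folds):
        idx = range(fold, len(observed), n_folds)
        fold_correlations.append(
            spearman([observed[i] for i in idx], [predicted[i] for i in idx])
        )
    return summarize_folds(fold_correlations)


# Synthetic stand-in data (hypothetical, not the pig dataset).
random.seed(0)
observed = [random.gauss(0, 1) for _ in range(200)]
predicted = [o + random.gauss(0, 2) for o in observed]
accuracy, stability = cross_validated_summary(observed, predicted)
print(f"median Spearman = {accuracy:.3f}, IQR = {stability:.3f}")
```

A higher median indicates more accurate predictions, while a smaller interquartile range indicates results that vary less across training folds, which is the sense of stability used in the description.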

Total citations

Cited by 40

2022: 16, 2023: 15, 2024: 9
