Authors
Mei Dong, Longhai Li, Man Chen, Anthony Kusalik, Wei Xu
Publication date
2020/8/24
Journal
PLoS One
Volume
15
Issue
8
Pages
e0237779
Publisher
Public Library of Science
Description
Microbiome data consists of operational taxonomic unit (OTU) counts characterized by zero-inflation, over-dispersion, and grouping structure among samples. Currently, statistical testing methods are commonly performed to identify OTUs that are associated with a phenotype. The limitations of statistical testing methods include that the validity of p-values/q-values depend sensitively on the correctness of models and that the statistical significance does not necessarily imply predictivity. Predictive analysis using methods such as LASSO is an alternative approach for identifying associated OTUs and for measuring the predictability of the phenotype variable with OTUs and other covariate variables. We investigate three strategies of performing predictive analysis: (1) LASSO: fitting a LASSO multinomial logistic regression model to all OTU counts with specific transformation; (2) screening+GLM: screening OTUs with q-values returned by fitting a GLMM to each OTU, then fitting a GLM model using a subset of selected OTUs; (3) screening+LASSO: fitting a LASSO to a subset of OTUs selected with GLMM. We have conducted empirical studies using three simulation datasets generated using Dirichlet-multinomial models and a real gut microbiome data related to Parkinson’s disease to investigate the performance of the three strategies for predictive analysis. Our simulation studies show that the predictive performance of LASSO with appropriate variable transformation works remarkably well on zero-inflated data. Our results of real data analysis show that Parkinson’s disease can be predicted based on selected OTUs after the binary transformation, age …
Total citations
20212022202320249895