Authors
Muthukumaran Panchaksaram, Lucas Freitas, Mario dos Reis
Publication date
2024
Journal
bioRxiv
Pages
2024.04.10.588547
Publisher
Cold Spring Harbor Laboratory
Description
In Bayesian molecular-clock dating of species divergences, rate models are used to construct the prior on the molecular evolutionary rates for branches in the phylogeny, with independent and autocorrelated rate models being commonly used. The two classes of models, however, can result in markedly different divergence time estimates for the same dataset, and thus Bayesian model selection appears necessary to select the best rate model and obtain reliable inferences of divergence times. However, the properties of Bayesian rate model selection are not well understood, in particular when the number of sequence partitions analysed increases and when fossil calibrations are misspecified. Furthermore, Bayesian rate model selection is computationally expensive as it requires calculation of marginal likelihoods by MCMC sampling, and therefore methods that can speed up the model selection procedure without compromising its accuracy are desirable. In this study, we use a combination of computer simulations and real data analysis to investigate the statistical behaviour of Bayesian rate model selection, and we also explore approximations of the likelihood to improve computational efficiency in large phylogenomic datasets. Our simulations demonstrate that the posterior probability for the correct rate model converges to one as more molecular sequence partitions are analysed and when no fossil calibrations are used, as expected from asymptotic Bayesian model selection theory. Furthermore, we also show that the model selection procedure is robust to slight misspecification of fossil calibrations, and reliable inference of the correct rate model …
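As an illustration of the model selection step described in the abstract, the sketch below shows how posterior model probabilities can be obtained from log marginal likelihoods once these have been estimated. It is a minimal example, not the authors' implementation: the function name `posterior_model_probs`, the equal prior model probabilities, and the numerical values are all assumptions made for illustration, and the marginal likelihoods themselves would come from a separate MCMC-based estimator.

```python
import math


def posterior_model_probs(log_marginal_likelihoods, prior_probs=None):
    """Convert log marginal likelihoods into posterior model probabilities.

    Hypothetical helper: assumes equal prior model probabilities unless
    prior_probs is given. Works in log space to avoid numerical underflow.
    """
    n = len(log_marginal_likelihoods)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n

    # log of (marginal likelihood x prior) for each model
    log_weights = [
        lml + math.log(p)
        for lml, p in zip(log_marginal_likelihoods, prior_probs)
    ]

    # Subtract the maximum before exponentiating (log-sum-exp trick)
    largest = max(log_weights)
    unnormalised = [math.exp(w - largest) for w in log_weights]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Made-up log marginal likelihoods for two rate models (illustrative only)
log_ml = {"independent": -10234.7, "autocorrelated": -10230.2}
probs = posterior_model_probs(list(log_ml.values()))

for model, prob in zip(log_ml, probs):
    print(f"{model}: posterior probability = {prob:.4f}")
```

With equal priors, the ratio of the two posterior probabilities equals the Bayes factor, so the model with the higher log marginal likelihood receives the larger posterior probability.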