Modeling gene content across a phylogeny to determine when genes become associated

Diao J, O’Reilly MM, and Holland BR

Stochastic Models
https://doi.org/10.1080/15326349.2024.2330082

Abstract

We consider a model for inferring functional links between genes. We begin with the simple case of two genes whose presence or absence evolves stochastically along a phylogenetic tree. We develop a hidden Markov model where the hidden states of the model correspond to whether or not the genes perform a joint function. In the case that two genes do perform a joint function, the rates of gain or loss of each gene depend on the presence or absence of the other gene. Otherwise, those two genes are assumed to be gained and lost independently. Using simulation, we investigate the conditions under which the package corHMM can infer the hidden state correctly, and we also investigate when the Akaike information criterion (AIC) has the power to reject the simpler model when it is incorrect. We find that we can more accurately determine the dependent and independent rate class regimes when the trees have more tips and when the differences in the two rate classes are larger. We find the accuracy of corHMM is not overly affected by whether or not there are multiple transitions between the rate classes or just a single transition. We show how the two-gene case can be extended to a more general n-gene model with a level-dependent quasi-birth-and-death (LD-QBD) framework. We assume that the level n of the QBD corresponds to the number of genes that are required for some beneficial function, and the phases within each level record the presence/absence of particular genes.
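
To make the two-gene hidden Markov structure concrete, the following is a minimal sketch, not the authors' implementation and not the corHMM parameterization. It assumes four observed presence/absence states for the gene pair and two hidden rate classes, giving an 8-state continuous-time Markov chain. The names `build_generator`, `flip_rate`, and `EXAMPLE_PARAMS`, the single switching rate per class, and all numerical rates are hypothetical choices made for illustration only.

```python
from itertools import product

# Observed states: (gene 1 present?, gene 2 present?) -> 00, 01, 10, 11
OBSERVED = list(product((0, 1), repeat=2))
# Hidden rate classes: genes evolve independently, or share a joint function
CLASSES = ("independent", "dependent")
# Full state space: 2 hidden classes x 4 observed states = 8 states
STATES = [(c, s) for c in CLASSES for s in OBSERVED]

# Hypothetical rates, chosen only for illustration (not taken from the paper).
EXAMPLE_PARAMS = {
    # Independent class: gain/loss rates ignore the other gene.
    "independent": {"gain": 0.1, "loss": 0.1},
    # Dependent class: rates are keyed by whether the OTHER gene is present (0/1).
    "dependent": {
        "gain": {0: 0.05, 1: 0.5},
        "loss": {0: 0.5, 1: 0.05},
    },
    # Rate of leaving each hidden class (assumed not to depend on observed state).
    "switch": {"independent": 0.01, "dependent": 0.01},
}


def flip_rate(rate_class, state, gene, params):
    """Rate at which `gene` (index 0 or 1) changes presence in `state`."""
    event = "loss" if state[gene] else "gain"
    if rate_class == "independent":
        return params["independent"][event]
    other_present = state[1 - gene]
    return params["dependent"][event][other_present]


def build_generator(params):
    """Return the 8x8 generator matrix Q as a list of lists (rows sum to zero)."""
    n = len(STATES)
    Q = [[0.0] * n for _ in range(n)]
    for i, (ci, si) in enumerate(STATES):
        for j, (cj, sj) in enumerate(STATES):
            if i == j:
                continue
            differing = [g for g in (0, 1) if si[g] != sj[g]]
            if ci == cj and len(differing) == 1:
                # One gene gained or lost within the same hidden class.
                Q[i][j] = flip_rate(ci, si, differing[0], params)
            elif ci != cj and si == sj:
                # Hidden class changes while gene content stays the same.
                Q[i][j] = params["switch"][ci]
            # All other moves (two simultaneous changes) have rate zero.
        Q[i][i] = -sum(Q[i])  # diagonal is still 0.0 here, so this sums off-diagonals
    return Q


Q = build_generator(EXAMPLE_PARAMS)
```

In this sketch the independent class uses the same gain and loss rate for a gene whether or not its partner is present, while the dependent class looks up the rate by the partner's state. Simultaneous changes of both genes, or of a gene and the hidden class, are given rate zero, which is a common modeling assumption rather than something stated in the abstract.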
