A subfunctionalisation model of gene family evolution predicts balanced tree shapes

Diao J, O’Reilly MM and Holland B

Molecular Phylogenetics and Evolution


We consider a subfunctionalisation model of gene family evolution. A family of n genes that perform z functions is represented by an n×z binary matrix Yt, where a 1 in the ijth position indicates that gene i can perform function j. Yt evolves according to a continuous time Markov chain (CTMC) that represents the processes of gene duplication, coding region loss and regulatory region loss. The chain is restricted so that each function is protected by selection, meaning that each column in the matrix must contain at least one 1.
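The state space and the selection constraint described above can be illustrated with a minimal Python sketch of one Gillespie-style step of such a chain. The details are assumptions made for illustration and are not taken from the paper: the function names, the per-gene and per-entry rates, the reading of coding region loss as removal of a whole row and regulatory region loss as flipping a single 1 to 0, the treatment of forbidden events as having rate zero, and the removal of genes left with no functions.

```python
import random

def column_covered_elsewhere(Y, i, j):
    """True if some gene other than gene i can perform function j."""
    return any(row[j] for k, row in enumerate(Y) if k != i)

def allowed_events(Y, dup_rate, coding_loss_rate, reg_loss_rate):
    """List (rate, event) pairs that keep at least one 1 in every column.

    Assumption: events that would leave a function unperformed have rate zero.
    """
    events = []
    for i, row in enumerate(Y):
        events.append((dup_rate, ("duplicate", i, None)))
        # Assumed: coding region loss removes the whole gene (row i).
        if all(column_covered_elsewhere(Y, i, j) for j, v in enumerate(row) if v):
            events.append((coding_loss_rate, ("coding_loss", i, None)))
        # Assumed: regulatory region loss removes one function from gene i.
        for j, v in enumerate(row):
            if v and column_covered_elsewhere(Y, i, j):
                events.append((reg_loss_rate, ("regulatory_loss", i, j)))
    return events

def apply_event(Y, event):
    """Return a new matrix with the event applied; Y itself is not modified."""
    kind, i, j = event
    new_Y = [row[:] for row in Y]
    if kind == "duplicate":
        new_Y.append(new_Y[i][:])
    elif kind == "coding_loss":
        del new_Y[i]
    else:
        new_Y[i][j] = 0
    # Assumed: a gene left with no functions is dropped from the family.
    return [row for row in new_Y if any(row)]

def gillespie_step(Y, rates, rng):
    """Draw an exponential waiting time and the next state (rates assumed positive)."""
    events = allowed_events(Y, *rates)
    total = sum(r for r, _ in events)
    wait = rng.expovariate(total)
    chosen = rng.choices([e for _, e in events], weights=[r for r, _ in events])[0]
    return wait, apply_event(Y, chosen)
```

Starting from a single gene that performs every function, for example `[[1, 1, 1]]`, only duplication is possible until a second copy exists, which is one way to read the restriction that each function is protected by selection.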

We generate gene trees based on the CTMC {Yt,t⩾0}. We analyse the long-run behaviour of the model and specify the conditions under which we expect gene trees to continue to grow and those under which we expect them to have a stable number of genes. We show that different choices of rate parameters for the processes of duplication and loss lead to different tree shapes as measured by two common tree-shape statistics: the β-statistic for measuring tree balance and the γ-statistic for assessing diversification rate. We use an extension of β that is estimated from sets of trees. This extension is less biased than averaging the β values of individual trees.
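As an illustration of how the γ-statistic summarises the timing of branching events, the following sketch computes γ in its commonly used form (due to Pybus and Harvey) from the internode intervals of an ultrametric tree. The function name and the input convention, a list g_2, ..., g_n where g_k is the length of time during which the tree has k lineages, are assumptions made for this example and not notation from the paper.

```python
import math

def gamma_statistic(intervals):
    """Compute gamma from internode intervals [g_2, ..., g_n].

    intervals[k - 2] is the time during which the tree has k lineages,
    measured from the root towards the tips (assumed input convention).
    """
    n = len(intervals) + 1  # number of tips
    if n < 3:
        raise ValueError("gamma requires a tree with at least 3 tips")
    weighted = [k * g for k, g in zip(range(2, n + 1), intervals)]
    total = sum(weighted)  # T = sum_{j=2}^{n} j * g_j
    # Sum over i = 2..n-1 of the partial sums sum_{k=2}^{i} k * g_k.
    running = 0.0
    accumulated = 0.0
    for w in weighted[:-1]:
        running += w
        accumulated += running
    mean_partial = accumulated / (n - 2)
    return (mean_partial - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))
```

With this convention, short early intervals followed by a long final interval give a negative γ, which corresponds to internal nodes lying closer to the root.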

When the rate of gene duplication dominates the rate of gene loss, the process is unstable and the distribution of tree shapes is close to the uniform ranked tree shape (URT) distribution. However, when the process is stable, gene trees are predicted to have positive values of β, indicating balanced trees, and negative values of γ, indicating that diversification occurs more towards the root of the tree. The results of our analyses suggest that comparing the tree-shape statistics of empirical gene trees to the predictions presented here will provide a test of the subfunctionalisation model.