Ensemble-based genomic prediction for maize flowering time improves prediction accuracy and reveals novel insights into trait genetic variation
Tomura S, Powell O, Wilkinson MJ and Cooper M
G3 Genes|Genomes|Genetics
Abstract
While various genomic prediction models have been evaluated for their potential to accelerate genetic gain for multiple traits, no individual genomic prediction model has outperformed all others across all applications. As an alternative, ensembles of multiple individual genomic prediction models can exploit the complementary strengths of the individual models and offset their individual prediction errors. We used the EasiGP (Ensemble AnalySis with Interpretable Genomic Prediction) pipeline to investigate the performance of an ensemble approach for flowering-time traits measured in two maize nested association mapping (NAM) datasets. For both datasets, the ensemble-based approach achieved higher prediction accuracy and lower prediction error across the flowering-time traits than any of the individual models. SNPs from multiple genomic regions known to contain key flowering-time genes were repeatedly included as features across the individual genomic prediction models, indicating that the models successfully captured SNP features associated with known flowering-time loci. Although repeatability was high for some genomic regions, the estimated marker effects for many genomic regions differed among the models, suggesting that the models may also have captured different aspects of the genetic variation underlying the traits. Combining these diverse views in the ensemble likely contributed to its improved prediction performance over the individual models. Ensemble-based prediction offers a way around the limitations of the ongoing search for a single genomic prediction model that consistently achieves the highest prediction performance, and could thereby contribute to improved prediction accuracy in crop breeding applications.
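To illustrate how an ensemble can offset the errors of its members, the sketch below averages the predictions of two hypothetical models genotype by genotype and scores each set of predictions by Pearson correlation (prediction accuracy) and root-mean-square error (prediction error). It assumes a simple unweighted mean of model predictions; the combination rule EasiGP actually uses is not specified in this abstract, and the function names and flowering-time values below are invented for illustration only.

```python
import math
from typing import Sequence


def ensemble_mean(predictions: Sequence[Sequence[float]]) -> list[float]:
    """Unweighted mean of several models' predictions for the same genotypes.

    Assumption: a naive average; weighted or stacked ensembles are alternatives.
    """
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same set of genotypes")
    return [sum(p[i] for p in predictions) / len(predictions) for i in range(n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, used here as the prediction-accuracy metric."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error, used here as the prediction-error metric."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


if __name__ == "__main__":
    # Hypothetical days-to-anthesis values for four genotypes (not real data).
    observed = [70.0, 72.0, 75.0, 78.0]
    model_a = [70.5, 71.5, 75.5, 77.5]  # errors: +0.5, -0.5, +0.5, -0.5
    model_b = [69.6, 72.4, 74.7, 78.4]  # errors partly cancel model_a's
    ensemble = ensemble_mean([model_a, model_b])

    for name, pred in [("A", model_a), ("B", model_b), ("ensemble", ensemble)]:
        print(f"{name}: r={pearson(observed, pred):.4f} "
              f"RMSE={rmse(observed, pred):.3f}")
```

Because the two hypothetical models err in largely opposite directions, their average lands closer to the observed values than either model alone. When member errors are strongly correlated, averaging gains little, which is why diversity among the individual models matters.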

