Optimization of weighted ensembles of genomic prediction models in maize

Tomura S, Powell O, Wilkinson MJ, Lefevre J and Cooper M

in silico Plants
https://doi.org/10.1093/insilicoplants/diag010

Abstract

Ensembles of multiple genomic prediction models have demonstrated improved prediction performance over the individual models contributing to the ensemble. The outperformance of ensemble models is expected from the Diversity Prediction Theorem, which states that for ensembles constructed with diverse prediction models, the ensemble prediction error becomes lower than the mean prediction error of the individual models. While a naïve ensemble-average model provides baseline performance improvement by aggregating all individual prediction models with equal weights, optimizing weights for each individual model could further enhance ensemble prediction performance. The weights can be optimized based on each model's level of informativeness regarding prediction error and diversity. Here, we evaluated weighted ensemble-average models with three possible weight optimization approaches (linear transformation, Nelder–Mead and Bayesian) using flowering time and tillering traits from two maize nested association mapping (NAM) datasets: TeoNAM and MaizeNAM. The three proposed weighted ensemble-average approaches improved prediction performance in several of the prediction scenarios investigated. In particular, the weighted ensemble models enhanced prediction performance when the adjusted weights differed substantially from the equal weights used by the naïve ensemble models. For performance comparisons among the weighted ensembles, there was no clear superiority among the proposed approaches in either prediction accuracy or error across the prediction scenarios. Weight optimization for ensembles warrants further investigation to explore the opportunities to improve their prediction performance; for example, integration of a weighted ensemble with a simultaneous hyperparameter tuning process may offer a promising direction for further research.
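
As an illustration of the Diversity Prediction Theorem referred to in the abstract, the following minimal sketch checks the identity for a single observation: squared ensemble error equals the (weighted) mean squared error of the individual models minus the (weighted) diversity of their predictions around the ensemble. The function name `diversity_decomposition`, the three prediction values, the observed value and the example weights are all hypothetical and chosen only for illustration; this is not the authors' implementation, and the weights shown are not the output of any of the three optimization approaches evaluated in the paper.

```python
def diversity_decomposition(predictions, truth, weights=None):
    """Return (ensemble_error, mean_model_error, diversity) for one observation.

    predictions: predictions from the individual models for one individual
    truth:       observed phenotype value
    weights:     optional non-negative weights that sum to 1
                 (equal weights, i.e. the naive ensemble, if omitted)
    """
    n = len(predictions)
    if weights is None:
        weights = [1.0 / n] * n

    # Weighted ensemble-average prediction
    ensemble = sum(w * p for w, p in zip(weights, predictions))

    # Squared error of the ensemble prediction
    ensemble_error = (ensemble - truth) ** 2
    # Weighted mean of the individual models' squared errors
    mean_model_error = sum(
        w * (p - truth) ** 2 for w, p in zip(weights, predictions)
    )
    # Weighted spread of the individual predictions around the ensemble
    diversity = sum(
        w * (p - ensemble) ** 2 for w, p in zip(weights, predictions)
    )
    return ensemble_error, mean_model_error, diversity


# Hypothetical predictions from three models for one individual
preds = [70.0, 74.0, 78.0]
observed = 73.0

# Naive ensemble: equal weights
naive = diversity_decomposition(preds, observed)

# Hypothetical unequal weights (assumed for illustration only)
weighted = diversity_decomposition(preds, observed, weights=[0.5, 0.3, 0.2])

for label, (ens_err, avg_err, div) in (("naive", naive), ("weighted", weighted)):
    print(label, ens_err, avg_err, div, avg_err - div)
```

In both cases the last printed value matches the ensemble error up to floating-point rounding, and the ensemble error cannot exceed the mean model error because the diversity term is non-negative. In this toy example the unequal weights happen to give a lower ensemble error than equal weights, which is the kind of gain weight optimization aims for; whether such a gain is realized on real data depends on the models and the dataset.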
