Phenotypic prediction based on genotypic make-up is a key challenge in plant improvement. It relies on access to large datasets from breeding populations and on advanced genome-to-phenome (G2P) prediction methods. One challenge for public-sector plant breeding research is the limited scale of investment and experimentation that is possible. Universities lack the funding to test large breeding populations across hundreds of environments, and companies cannot afford to share data from such experiments openly with the public sector. This has limited the capacity of government-funded research, in Australia and worldwide, to support breeding research and to deliver innovation in our region and beyond. The Centre has addressed this issue for the research community by producing realistic synthetic datasets for phenotypic prediction in large breeding populations.
A synthetic sorghum dataset has been developed by Owen Powell, Greg McLean, Jason Brider, Graeme Hammer, and Mark Cooper, and is freely available for researchers to test analytical methods and different G2P prediction strategies. The dataset contains performance (phenotypic) predictions for 1,500 genotypes, along with their synthetic genotypic fingerprint data based on trait variation. Phenotypic predictions were generated for two Australian production sites with contrasting environmental conditions, with a particular focus on patterns of water use through the growing season. This was made possible by using the sorghum crop growth and development model in the Agricultural Production Systems sIMulator (APSIM), linked to synthetic genetic variation in the many traits and intermediate traits represented in the sorghum model. The dataset accounts for the effects of genomic and genotypic variation in phenology (the timing of developmental events such as flowering), leaf size, and propensity to tiller (branch), across a range of plant densities and in environments contrasting in temperature, radiation, rainfall, and soil attributes. Because underlying biological interactions contribute substantially to plant performance, the dataset poses challenging prediction problems that are relevant to Australian cropping systems. This open-access resource will enable the Genotype x Environment x Management (GxExM) community to develop and evaluate prediction algorithms that can accelerate rates of crop improvement.
This dataset aims to help researchers identify the most appropriate tools for specific research questions and to develop predictive tools that get the right answers for the right reasons. It is a powerful resource for understanding the key ingredients of information-rich datasets for predictive breeding, and it should have a major impact as a teaching tool, including in population genetics. The hope is that many in the GxExM community will use the dataset to test their methods, so that the efficacy of a range of methods can be compared on a common dataset. In this way, researchers can use what is known to expose what is not known. The dataset will continue to be improved and updated based on community feedback.
Owen Powell and Mark Cooper have also made another dataset available as a case study of shoot branching in a model species. As with the sorghum dataset, performance records are available for intermediate and end-point traits, which were under divergent selection for 30 generations. The experiments were replicated five times, resulting in 1,200 populations with genetic fingerprint, genotype, and phenotype data.
ACCESS THE DATASETS: