Chimeric reference panels for genomic imputation
Zhou M, James ME, Engelstädter J and Ortiz-Barrientos D
Abstract
Despite transformative advances in genomic technologies, missing data remain a fundamental constraint on genomic research across biological systems. Genotype imputation offers a remedy by inferring unobserved genotypes from observed data. However, conventional imputation methods typically rely on external reference panels built from the complete genome sequences of hundreds of individuals, a costly approach that is largely inaccessible for nonmodel organisms. Moreover, these methods generally overlook novel genomic positions not captured in existing panels. To overcome these limitations, we developed Retriever, a method that enables genotype imputation without an external reference panel. Retriever constructs a chimeric reference panel directly from the target samples, using a sliding-window approach to identify and retrieve genomic partitions with complete data. By exploiting the complementary distribution of missing data across samples, Retriever assembles a panel that preserves local patterns of linkage disequilibrium and captures novel variants. When Retriever-constructed panels are used with Beagle for genotype imputation, accuracy consistently exceeds 95% across diverse datasets, including plants, animals, and fungi. By eliminating the need for costly external panels, Retriever provides an accessible and cost-effective solution that broadens the application of genomic analyses across species.
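The window-based retrieval described in the abstract can be illustrated with a minimal Python sketch. This is not Retriever's actual implementation: the function names, the use of `None` to mark a missing genotype, the non-overlapping windows, and the rule for stitching segments into chimeric rows (the k-th complete donor in each window contributes to row k) are all assumptions made for illustration.

```python
from typing import List, Optional

# Assumed encoding: alternate-allele count (0/1/2); None marks a missing call.
Genotype = Optional[int]


def complete_samples(genotypes: List[List[Genotype]], start: int, end: int) -> List[int]:
    """Return indices of samples with no missing calls at sites [start, end)."""
    return [
        i for i, row in enumerate(genotypes)
        if all(g is not None for g in row[start:end])
    ]


def build_chimeric_panel(genotypes: List[List[Genotype]], window: int) -> List[List[int]]:
    """Assemble chimeric rows from complete window segments (hypothetical rule).

    Windows here are non-overlapping for simplicity. The panel depth is limited
    by the window with the fewest complete donors.
    """
    n_sites = len(genotypes[0])
    segments = []  # one list of complete segments per window
    for start in range(0, n_sites, window):
        end = min(start + window, n_sites)
        donors = complete_samples(genotypes, start, end)
        segments.append([genotypes[i][start:end] for i in donors])

    depth = min(len(s) for s in segments)
    panel = []
    for k in range(depth):
        row = []
        for seg in segments:
            row.extend(seg[k])  # k-th donor segment from each window
        panel.append(row)
    return panel


# Toy input: 4 samples x 4 sites with complementary missingness.
toy = [
    [0, 1, None, None],
    [None, None, 2, 0],
    [1, 1, 0, None],
    [None, 0, 1, 1],
]
panel = build_chimeric_panel(toy, window=2)
```

In the toy input no sample is complete across all four sites, yet every window has at least two complete donors, so the sketch still yields two fully observed chimeric rows. A real panel would also need to handle phasing and windows with no complete donor, which this sketch ignores.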

