Agronomy and Horticulture Department


Date of this Version



G3: Genes|Genomes|Genetics Early Online, published on May 31, 2016 as doi:10.1534/g3.116.031443


© The Author(s) 2013. Published by the Genetics Society of America.


The identification and mobilization of useful genetic variation from germplasm banks for use in breeding programs is critical for future genetic gain and protection against crop pests. Plummeting costs of next-generation sequencing and genotyping is revolutionizing the way in which researchers and breeders interface with plant germplasm collections. An example of this is the high density genotyping of the entire USDA Soybean Germplasm Collection. We assessed the usefulness of 50K SNP data collected on 18,480 domesticated soybean (G. max) accessions and vast historical phenotypic data for developing genomic prediction models for protein, oil, and yield. Resulting genomic prediction models explained an appreciable amount of the variation in accession performance in independent validation trials, with correlations between predicted and observed reaching up to 0.92 for oil and protein and 0.79 for yield. The optimization of training set design was explored using a series of cross-validation schemes. It was found that the target population and environment need to be well represented in the training set. Secondly, genomic prediction training sets appear to be robust to the presence of data from diverse geographical locations and genetic clusters. This finding, however, depends on the influence of shattering and lodging, and may be specific to soybean with its presence of maturity groups. The distribution of 7,608 non-phenotyped accessions was examined through the application of genomic prediction models. The distribution of predictions of phenotyped accessions was representative of the distribution of predictions for non-phenotyped accessions, with no non-phenotyped accessions being predicted to fall far outside the range of predictions of phenotyped accessions.