Agronomy and Horticulture Department


Date of this Version



Genetics 163:3 (March 2003), pp. 1123–1134


U.S. Government Work


Single-nucleotide polymorphisms (SNPs) provide an abundant source of DNA polymorphisms in a number of eukaryotic species. Information on the frequency, nature, and distribution of SNPs in plant genomes is limited. Thus, our objectives were (1) to determine SNP frequency in coding and noncoding soybean (Glycine max (L.) Merr.) DNA sequence amplified from genomic DNA using PCR primers designed to complete genes, cDNAs, and random genomic sequence; (2) to characterize haplotype variation in these sequences; and (3) to provide initial estimates of linkage disequilibrium (LD) in soybean. Approximately 28.7 kbp of coding sequence, 37.9 kbp of noncoding perigenic DNA, and 9.7 kbp of random noncoding genomic DNA were sequenced in each of 25 diverse soybean genotypes. Across the more than 76 kbp sequenced, mean nucleotide diversity expressed as Watterson's θ was 0.00097. Nucleotide diversity was 0.00053 in coding DNA and 0.00111 in noncoding perigenic DNA, both lower than estimates in the autogamous model species Arabidopsis thaliana. Haplotype analysis of SNP-containing fragments revealed a deficiency of haplotypes relative to the number that would be anticipated at linkage equilibrium. In 49 fragments with three or more SNPs, five haplotypes were present in one fragment while four or fewer were present in the remaining 48, thereby supporting the suggestion of relatively limited genetic variation in cultivated soybean. Squared allele-frequency correlations (r²) among haplotypes at 54 loci with two or more SNPs indicated low genome-wide LD. The low level of LD and the limited haplotype diversity suggested that the genome of any given soybean accession is a mosaic of three or four haplotypes.
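The two statistics named above can be illustrated with a minimal sketch. It uses the standard textbook definitions, Watterson's θ per site as S / (a_n · L) with a_n = 1 + 1/2 + ... + 1/(n-1), and r² for a pair of biallelic sites as D² / (p_A(1-p_A) p_B(1-p_B)). The function names and the toy haplotypes are hypothetical and are not taken from the paper, whose exact computational procedure is not described here.

```python
def harmonic_term(n_sequences):
    """a_n = sum of 1/i for i = 1 .. n-1, the scaling term in Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites, n_sequences, n_bases):
    """Per-site Watterson's theta: S / (a_n * L)."""
    return segregating_sites / (harmonic_term(n_sequences) * n_bases)


def r_squared(haplotypes):
    """Squared allele-frequency correlation between two biallelic sites.

    `haplotypes` is a list of (allele_at_site_1, allele_at_site_2) pairs,
    one pair per genotype. Phase is assumed known, which is reasonable for
    homozygous inbred lines (an assumption of this sketch).
    """
    n = len(haplotypes)
    ref_a, ref_b = haplotypes[0]  # arbitrary reference alleles
    p_a = sum(h[0] == ref_a for h in haplotypes) / n
    p_b = sum(h[1] == ref_b for h in haplotypes) / n
    p_ab = sum(h[0] == ref_a and h[1] == ref_b for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


if __name__ == "__main__":
    # Illustrative inputs only; these are not the paper's counts.
    print(watterson_theta(segregating_sites=100, n_sequences=25, n_bases=50_000))
    print(r_squared([("A", "C"), ("A", "C"), ("G", "T"), ("G", "T")]))
```

With 25 sequences the scaling term a_n is about 3.776, so θ is roughly the number of segregating sites per base divided by 3.776.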
To facilitate SNP discovery and the development of a transcript map, we identified subsets of four to six diverse genotypes whose sequence analysis would permit the discovery of at least 75% of all SNPs present in the 25 genotypes, as well as 90% of the common SNPs (frequency greater than 0.10).
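One way to picture the subset search is as an exhaustive scan over all genotype subsets of a given size, scoring each by the fraction of SNPs at which the subset carries more than one allele. This is only a sketch under stated assumptions: the abstract does not say how the subsets were chosen, the function names and toy data below are hypothetical, and every column of the input is assumed to be polymorphic in the full panel.

```python
from itertools import combinations


def discovered_fraction(alleles, subset):
    """Fraction of SNPs at which the chosen genotypes show at least two alleles.

    `alleles` maps genotype name -> string with one allele per SNP position.
    Every position is assumed to be polymorphic across the full panel.
    """
    n_snps = len(next(iter(alleles.values())))
    found = sum(
        len({alleles[g][i] for g in subset}) > 1 for i in range(n_snps)
    )
    return found / n_snps


def best_subset(alleles, size):
    """Exhaustively pick the `size`-genotype subset that discovers the most SNPs."""
    return max(
        combinations(sorted(alleles), size),
        key=lambda subset: discovered_fraction(alleles, subset),
    )


if __name__ == "__main__":
    # Toy panel of four genotypes and four SNPs (illustrative data only).
    panel = {"G1": "AAAA", "G2": "ATAA", "G3": "AACA", "G4": "TTCG"}
    chosen = best_subset(panel, 2)
    print(chosen, discovered_fraction(panel, chosen))
```

For 25 genotypes, subsets of four to six give at most 177,100 combinations per size, so an exhaustive scan remains practical. Restricting the score to common SNPs would only require filtering the columns by allele frequency first.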