U.S. Department of Commerce
Date of this Version
2008
Citation
Can. J. Fish. Aquat. Sci. 65: 1475–1486 (2008); doi:10.1139/F08-049
Abstract
Estimating the accuracy of genetic stock identification (GSI) that can be expected given a previously collected baseline requires simulation. The conventional method involves repeatedly simulating mixtures by resampling from the baseline, simulating new baselines by resampling from the baseline, and analyzing the simulated mixtures with the simulated baselines. We show that this overestimates the predicted accuracy of GSI. The bias is profound for closely related populations and increases as more genetic data (loci and (or) alleles) are added to the analysis. We develop a new method based on leave-one-out cross validation and show that it yields essentially unbiased estimates of GSI accuracy. Applying both our method and the conventional method to a coastwide baseline of 166 Chinook salmon (Oncorhynchus tshawytscha) populations shows that the conventional method provides severely biased predictions of accuracy for some individual populations. The bias for reporting units (aggregations of closely related populations) is moderate, but still present.
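The source of the bias can be illustrated with a minimal sketch. It is not the authors' implementation, and it simplifies the conventional method to plain self-assignment of baseline individuals, without resampled mixtures or bootstrapped baselines. The sketch assumes two hypothetical populations (`pop_A`, `pop_B`) with identical true allele frequencies at biallelic loci, Hardy-Weinberg proportions, and a 0.5 pseudocount on allele counts; all function names and parameter values are illustrative. Because the two populations are genetically identical, no method can do better than chance (0.5) in expectation. Any accuracy above that level comes from assigning individuals with allele frequencies that they themselves helped to estimate.

```python
import math
import random


def simulate_baseline(freqs, n_per_pop, rng):
    """Draw diploid genotypes (0, 1 or 2 copies of allele A per locus)."""
    return {
        pop: [[(rng.random() < p) + (rng.random() < p) for p in pop_freqs]
              for _ in range(n_per_pop)]
        for pop, pop_freqs in freqs.items()
    }


def log_likelihood(genotype, counts, n_alleles):
    """Log-probability of a genotype under Hardy-Weinberg proportions.

    Allele frequencies are estimated from the counts with a 0.5 pseudocount
    (an illustrative choice) so they are never exactly 0 or 1.
    """
    total = 0.0
    for g, c in zip(genotype, counts):
        p = (c + 0.5) / (n_alleles + 1.0)
        total += (math.log(math.comb(2, g))
                  + g * math.log(p)
                  + (2 - g) * math.log(1.0 - p))
    return total


def self_assignment_accuracy(baseline, leave_one_out):
    """Fraction of baseline individuals assigned to their own population.

    With leave_one_out=True, each individual's alleles are removed from its
    home population's counts before that population's likelihood is computed.
    """
    counts = {pop: [sum(col) for col in zip(*genos)]
              for pop, genos in baseline.items()}
    correct = total = 0
    for home, genos in baseline.items():
        for genotype in genos:
            scores = {}
            for pop in baseline:
                pop_counts = counts[pop]
                n_alleles = 2 * len(baseline[pop])
                if leave_one_out and pop == home:
                    pop_counts = [c - g for c, g in zip(pop_counts, genotype)]
                    n_alleles -= 2
                scores[pop] = log_likelihood(genotype, pop_counts, n_alleles)
            correct += max(scores, key=scores.get) == home
            total += 1
    return correct / total


# Hypothetical setup: two populations sharing the same true frequencies.
rng = random.Random(1)
n_loci = 20
shared = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
freqs = {"pop_A": shared, "pop_B": shared}
baseline = simulate_baseline(freqs, n_per_pop=30, rng=rng)

naive_accuracy = self_assignment_accuracy(baseline, leave_one_out=False)
loo_accuracy = self_assignment_accuracy(baseline, leave_one_out=True)
print(f"without leave-one-out: {naive_accuracy:.2f}")
print(f"with leave-one-out:    {loo_accuracy:.2f}")
```

In this sketch the accuracy without leave-one-out can never fall below the leave-one-out accuracy, because including an individual in its home population's counts can only raise the likelihood of its genotype there. Increasing `n_loci` should widen the gap, which matches the abstract's observation that the bias grows as more genetic data are added.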