U.S. Department of Justice


Date of this Version



Forensic Science International: Genetics 34 (2018) 197–205


© 2018 The Authors.

Open access



Some of the expected advantages of next generation sequencing (NGS) for short tandem repeat (STR) typing include enhanced mixture detection and genotype resolution via sequence variation among non-homologous alleles of the same length. However, at the same time that NGS methods for forensic DNA typing have advanced in recent years, many caseworking laboratories have implemented or are transitioning to probabilistic genotyping to assist the interpretation of complex autosomal STR typing results. Current probabilistic software programs are designed for length-based data, and were not intended to accommodate sequence strings as the product input. Yet to leverage the benefits of NGS for enhanced genotyping and mixture deconvolution, the sequence variation among same-length products must be utilized in some form. Here, we propose use of the longest uninterrupted stretch (LUS) in allele designations as a simple method to represent sequence variation within the STR repeat regions and facilitate – in the nearterm – probabilistic interpretation of NGS-based typing results. An examination of published population data indicated that a reference LUS region is straightforward to define for most autosomal STR loci, and that using repeat unit plus LUS length as the allele designator can represent greater than 80% of the alleles detected by sequencing. A proof of concept study performed using a freely available probabilistic software demonstrated that the LUS length can be used in allele designations when a program does not require alleles to be integers, and that utilizing sequence information improves interpretation of both single-source and mixed contributor STR typing results as compared to using repeat unit information alone. The LUS concept for allele designation maintains the repeat-based allele nomenclature that will permit backward compatibility to extant STR databases, and the LUS lengths themselves will be concordant regardless of the NGS assay or analysis tools employed. Further, these biologically based, easy-to-derive designations uphold clear relationships between parent alleles and their stutter products, enabling analysis in fully continuous probabilistic programs that model stutter while avoiding the algorithmic complexities that come with string based searches. Though using repeat unit plus LUS length as the allele designator does not capture variation that occurs outside of the core repeat regions, this straightforward approach would permit the large majority of known STR sequence variation to be used for mixture deconvolution and, in turn, result in more informative mixture statistics in the near term. Ultimately, the method could bridge the gap from current length-based probabilistic systems to facilitate broader adoption of NGS by forensic DNA testing laboratories.