To contribute to a comprehensive understanding of the diversity and population distribution of sequence-based short tandem repeat (STR) alleles, the current study analyzed 786 samples from individuals of different population groups, including four of the groups most commonly encountered in the United States.
DNA samples were amplified with the PowerSeq™ Auto/Y System Prototype Kit (Promega Corp.), and sequencing was performed on an Illumina® MiSeq instrument. Sequence data were analyzed with the Altius bioinformatics processing tool. For further analysis and profile comparison, a subset of individuals was also genotyped by size-based capillary electrophoresis (CE) and, where possible, with a second commercially available massively parallel sequencing (MPS) STR assay. Autosomal STR loci were analyzed, and allele frequencies were calculated based on sequence composition. Population genetic analyses were also performed, assessing Hardy–Weinberg equilibrium, polymorphic information content (PIC), and observed and expected heterozygosity. Overall, sequence-based variants of the repeat region were observed at 20 of the 22 STR loci commonly used in forensic DNA genotyping, with the greatest number of sequence variants at locus D12S391. Sequence-based genotyping produced the largest gains in allelic diversity and PIC at loci D3S1358 and D8S1179. Detailed sequence analyses such as the one performed in the present study are important for understanding the diversity of sequence-based STR alleles across populations and for demonstrating how such allelic variation can improve the statistics used in forensic casework. (publisher abstract modified)
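The diversity statistics named above (observed and expected heterozygosity, PIC) are computed from allele frequencies. This makes it easy to see why splitting one CE length allele into several sequence alleles raises them. The following is a minimal sketch, not the study's pipeline. It assumes expected heterozygosity He = 1 - Σp² (without small-sample correction) and the PIC formula of Botstein et al. (1980), and it omits the Hardy–Weinberg test. The locus, genotypes, allele labels ("13a", "13b"), and function names are hypothetical and chosen only for illustration. They are not data from the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical locus: CE length allele "13" contains two sequence variants, "13a" and "13b".
# These toy genotypes are illustrative only, not data from the study.
GENOTYPES_SEQ = [("13a", "13b"), ("13a", "14"), ("13b", "15"),
                 ("14", "15"), ("13a", "13a"), ("13b", "14")]
LENGTH_OF = {"13a": "13", "13b": "13", "14": "14", "15": "15"}


def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid genotypes (pairs of allele labels)."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs):
    # He = 1 - sum(p_i^2), with no small-sample correction (assumption for this sketch)
    return 1.0 - sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes):
    # Fraction of individuals whose two alleles carry different labels
    return sum(a != b for a, b in genotypes) / len(genotypes)


def pic(freqs):
    # Botstein et al. (1980): PIC = 1 - sum p_i^2 - sum_{i<j} 2 * p_i^2 * p_j^2
    p = list(freqs.values())
    homozygosity = sum(x * x for x in p)
    pair_term = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1.0 - homozygosity - pair_term


def to_length_based(genotypes, length_of):
    # Collapse sequence alleles to their size-based (CE) designation
    return [(length_of[a], length_of[b]) for a, b in genotypes]


if __name__ == "__main__":
    length_genotypes = to_length_based(GENOTYPES_SEQ, LENGTH_OF)
    for label, g in (("sequence-based", GENOTYPES_SEQ), ("length-based", length_genotypes)):
        f = allele_frequencies(g)
        print(f"{label}: alleles={len(f)} Ho={observed_heterozygosity(g):.3f} "
              f"He={expected_heterozygosity(f):.3f} PIC={pic(f):.3f}")
```

On these toy genotypes, the sequence-based designation yields four alleles instead of three and higher Ho, He, and PIC values. Observed heterozygosity also rises because a "13a"/"13b" individual appears as a 13/13 homozygote under size-based typing.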