Description of original award (Fiscal Year 2021, $484,798)
Deconvoluting mixture samples is one of the most challenging problems confronting DNA forensic laboratories. Efforts have been made to provide solutions for mixture interpretation. Probabilistic interpretation of Short Tandem Repeat (STR) profiles can increase the number of complex mixtures that can be analyzed. Even so, a portion of complex mixture profiles, particularly for mixtures with a high number of contributors, is still deemed uninterpretable. Novel forensic markers, such as Single Nucleotide Polymorphisms (SNPs), Insertion-Deletions (Indels), and microhaplotypes, have also been proposed to allow for better mixture deconvolution. However, these markers either have lower discrimination power than STRs or are not compatible with CODIS. Short-read sequencing (SRS) technologies can facilitate mixture interpretation by identifying the intra-allelic variations within STRs. Unfortunately, the limited sizes of STR markers and the short read lengths limit the number of alleles that can be attained per STR.
The latest long-read sequencing (LRS) technologies (e.g., Pacific Biosciences or PacBio) can overcome this limit in some samples and sequence larger DNA fragments (including STRs, SNPs, and Indels) with definitive phasing. Based on the high-fidelity PacBio SMRT sequencing technologies, the proposal herein will develop a novel CODIS-compatible forensic marker, the macrohaplotype, which combines a CODIS STR with its flanking variants. This combination offers extremely high numbers of haplotypes, and hence very high discrimination power per marker, and can substantially improve mixture interpretation capabilities. In addition, Unique Molecular Identifiers (UMIs) will be tagged to DNA templates during library preparation, which can reduce errors and quantitative bias introduced by amplification. In this proposed project, once the macrohaplotype markers are designed, a PacBio SMRT sequencing workflow and a bioinformatics pipeline will be developed and optimized with cell line samples of known genomes. Then, a macrohaplotype database will be built with public data, imputed haplotypes, and cell line sequences. The statistical performance of this macrohaplotype panel for mixture interpretation will be evaluated with simulation studies and empirical samples (both mock and real casework mixtures). The outcome of this effort will improve the capability to interpret DNA mixture evidence, particularly complex mixtures with high numbers of contributors. Thus, more biological evidence will be analyzed successfully, which in turn will result in more and better investigative leads to help solve crimes.
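As an illustration of how UMI tagging can reduce amplification errors and quantitative bias, the following is a minimal sketch, not the project's pipeline. It assumes each read has already been reduced to a (UMI, haplotype call) pair; the function name `collapse_by_umi`, the UMI strings, and the haplotype labels `H1` and `H2` are hypothetical. Reads sharing a UMI are treated as copies of one template molecule, so a majority vote gives one consensus call per molecule, and molecules are counted instead of reads.

```python
from collections import Counter, defaultdict


def collapse_by_umi(reads):
    """Count distinct template molecules per haplotype.

    reads: iterable of (umi, haplotype) pairs (hypothetical input format).
    Returns a dict mapping haplotype -> number of UMIs whose consensus
    call is that haplotype.
    """
    # Group the haplotype calls of all reads that carry the same UMI.
    calls_by_umi = defaultdict(Counter)
    for umi, haplotype in reads:
        calls_by_umi[umi][haplotype] += 1

    molecule_counts = Counter()
    for calls in calls_by_umi.values():
        # Majority vote within one UMI family; on a tie, the call seen
        # first among the tied ones is kept (a simplification).
        consensus, _ = calls.most_common(1)[0]
        molecule_counts[consensus] += 1
    return dict(molecule_counts)


# Toy data: molecule "AAT" was amplified six-fold and one copy carries
# an erroneous call; molecule "CGG" was amplified only two-fold.
reads = [
    ("AAT", "H1"), ("AAT", "H1"), ("AAT", "H1"),
    ("AAT", "H1"), ("AAT", "H1"), ("AAT", "H2"),
    ("CGG", "H2"), ("CGG", "H2"),
]

raw_counts = dict(Counter(haplotype for _, haplotype in reads))
molecule_counts = collapse_by_umi(reads)
print(raw_counts)       # read-level counts, skewed by amplification
print(molecule_counts)  # molecule-level counts after UMI collapsing
```

In this toy input the read-level counts favor `H1` five to three, while the molecule-level counts show one template for each haplotype. A real workflow would also need to handle sequencing errors within the UMIs themselves, which this sketch ignores.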