Description of original award (Fiscal Year 2013, $1,064,073)
As submitted by the proposer: The goal of this basic research proposal is the development and initial testing of a highly parallel approach for the genetic analysis of biological mixtures. In forensic casework, there are many examples where multiple individuals contribute cellular material to a sample of interest. As a result, the sample's DNA is a complex genetic mixture containing multiple contributors' genotypes with varying allelic ratios. For any forensic laboratory, the analysis of a mixture and identification of the contributors is a major challenge. Our proposal addresses the complexity of mixtures by using a novel next-generation sequencing approach and by determining the metrics of next-generation sequencing that will be required for its forensic application. First, we will develop the statistical framework and algorithms to analyze biological mixtures with expanded sets of genetic markers. Our statistical analysis will consider the complex characteristics of a biological mixture and the utility of additional short tandem repeats (STRs), single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) in delineating the composition of mixtures. Statistical modeling will factor in multiple contributors of varying ratios, the allelic content of different classes of genetic markers, the frequency of heterozygotes and experimental variance related to low template amounts. Overall, this will establish a rigorous statistical basis to improve and assess the performance benefits of expanded genetic marker sets in biological mixtures. In particular, we show that to precisely identify individuals contributing <10% to DNA mixtures, a much larger set of markers is required than the current standard. Second, we will develop and apply an innovative next-generation DNA sequencing methodology to genotype tens of thousands of STRs and other markers across the entire human genome. This analysis will include the CODIS and European Standard Set markers.
We will use a 'whole genome' bioinformatics approach to identify additional candidate genetic markers, with a special emphasis on characterizing the haplotypes of dual sets of proximal STRs and SNPs/indels. These candidate marker sets must fulfill the requisites of sequencing primer specificity, adequate heterozygote frequency and other critical metrics. Once we have identified these markers and potential primer sequences, we will sequence thousands of candidate marker loci, assess the performance of primer sets targeting these loci, determine genotypes and generate allele frequency data from a large population. We will also produce experimentally vetted primers for tens of thousands of genetic markers across the human genome. These primer sequences can be used for either standard multiplexed DNA typing assays or highly parallel analysis with next-generation DNA sequencing. Third, integrating our statistical algorithms for complex genetic mixtures and highly parallel genetic marker analysis, we will conduct a pilot study of biological mixtures. We will analyze a series of test mixtures, each having multiple contributors of varying ratios and DNA template amounts. Our analysis will use an expanded marker set based on our previous results. For rigorous performance assessment, we will use this highly parallel analysis on a series of blinded test mixtures. The results will be compared to both the predicted composition and experimental data generated by a collaborating forensic laboratory. The analysis of these mixtures will provide validation metrics for the highly parallel sequencing analysis. This study will also provide experimental insight into the potential advantages and issues with next-generation sequencing in mixture analysis. The resulting validation data will be made openly available to forensic researchers.
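
The statement that contributors below 10% of a mixture require a much larger marker set can be illustrated with a back-of-the-envelope calculation. The Python sketch below is not the proposers' statistical framework. It is a simplified illustration under assumptions introduced here: a two-person mixture, biallelic SNPs in Hardy-Weinberg equilibrium, a fixed read depth per marker, and no sequencing error, stutter or allele dropout. All function and parameter names are hypothetical.

```python
from math import comb


def informative_probability(p):
    """Probability that the minor contributor carries an allele the major
    contributor lacks at a biallelic SNP with allele frequency p
    (assumes Hardy-Weinberg equilibrium and unrelated contributors)."""
    q = 1.0 - p
    # Major is homozygous for one allele and the minor carries the other.
    return q * q * (1.0 - q * q) + p * p * (1.0 - p * p)


def detection_probability(depth, allele_fraction, min_reads):
    """P(X >= min_reads) for X ~ Binomial(depth, allele_fraction):
    the chance of observing enough reads of the minor-specific allele."""
    below = sum(
        comb(depth, k) * allele_fraction**k * (1.0 - allele_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - below


def expected_detected_markers(n_markers, p, minor_fraction, depth, min_reads):
    """Expected number of markers at which the minor contributor is both
    distinguishable from the major and observed in the reads.
    Conservative assumption: the minor-specific allele is heterozygous in
    the minor contributor, so it makes up minor_fraction / 2 of the reads."""
    read_fraction = minor_fraction / 2.0
    return (
        n_markers
        * informative_probability(p)
        * detection_probability(depth, read_fraction, min_reads)
    )


if __name__ == "__main__":
    # Compare a small panel with a large one for a 5% minor contributor.
    for n in (20, 1000, 10000):
        expected = expected_detected_markers(
            n_markers=n, p=0.5, minor_fraction=0.05, depth=200, min_reads=5
        )
        print(f"{n:>6} markers -> about {expected:.1f} informative and detected")
```

Under these assumptions only a fraction of markers separates the two contributors, and a smaller fraction still yields enough supporting reads when the minor contributor is rare. The expected count of usable markers therefore scales with panel size, which is the intuition behind expanding the marker set. A realistic model would also need to account for sequencing error, stutter, dropout and more than two contributors.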