Description of original award (Fiscal Year 2014, $34,999)
As Submitted By Proposer:
Repetitive sequences in the human genome called Short Tandem Repeats (STRs) are used in human identification for forensic purposes. An STR profile developed from a biological sample collected at a crime scene is compared with that of a person of interest or run against a database to check for a match. Interpretation of STR profiles is problematic because of dropout, allele overlap and PCR amplification artifacts such as stutter. The goal of this research is to develop computational methods and tools for analysis of STR profiles that are robust to these phenomena and that utilize the quantitative peak height information captured in profiles. These methods are expected to improve significantly on existing methods for analysis of STR profiles, particularly in cases with low amounts of template DNA or many contributors. The first aim of the proposed thesis research is to characterize the signal, noise and stutter peak heights and study their dependence on template DNA amount. Our second aim is to develop a method to identify the number of contributors (NOC) to a DNA sample. The NOC is needed to determine whether a known individual should be included as a contributor and to calculate the Likelihood Ratio (LR) in the case of an inclusion. In preliminary work, we developed a computational method called NOCIt that calculates the a posteriori probability (APP) on the number of contributors to a forensic sample. NOCIt takes into account signal peak heights, population allele frequencies, allele dropout and stutter. On the samples tested, NOCIt had an accuracy of 86% and was 16% more accurate than the best pre-existing method to identify the number of contributors. In the proposed project, we will reduce the running time of NOCIt by developing a faster method based on a Metropolis-Hastings algorithm. The ultimate objective of mixture interpretation is to determine whether a person of interest contributed to the sample.
Though methods have been developed to tackle this problem by deconvolving mixtures, they are generally not suitable for complex mixtures that contain significant amounts of dropout, stutter and allele sharing. Our third aim is to develop a computational tool (MatchIt) that directly calculates the LR for a person of interest, treating other contributors, if any, as interference, and that computes a p-value for the LR: the probability that a randomly chosen person would yield an LR at least as large as the one observed for the person of interest. We will develop user-friendly interfaces to run NOCIt and MatchIt and distribute them online to facilitate their use in the forensics community.
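The two quantities named above, the APP on the number of contributors and the p-value for the LR, can be illustrated with a minimal Python sketch. This is not the NOCIt or MatchIt implementation. The function names, the uniform prior, and all numbers are hypothetical, and the sketch assumes that per-NOC log-likelihoods and a set of LRs simulated for random non-contributors have already been computed by some other means.

```python
import math
from typing import Dict, Sequence


def posterior_over_noc(log_likelihoods: Dict[int, float],
                       priors: Dict[int, float]) -> Dict[int, float]:
    """Bayes' rule: normalize prior * likelihood over candidate contributor counts.

    Hypothetical helper; the likelihood model itself is outside this sketch.
    """
    # Work in log space and subtract the maximum so exp() does not underflow.
    log_joint = {n: log_likelihoods[n] + math.log(priors[n])
                 for n in log_likelihoods}
    peak = max(log_joint.values())
    weights = {n: math.exp(v - peak) for n, v in log_joint.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def empirical_p_value(observed_lr: float, random_lrs: Sequence[float]) -> float:
    """Fraction of random-person LRs at least as large as the observed LR."""
    if not random_lrs:
        raise ValueError("need at least one simulated LR")
    hits = sum(1 for lr in random_lrs if lr >= observed_lr)
    return hits / len(random_lrs)


# Made-up inputs, for illustration only.
app = posterior_over_noc(
    log_likelihoods={1: -120.0, 2: -100.0, 3: -101.0},
    priors={1: 1 / 3, 2: 1 / 3, 3: 1 / 3},  # assumed uniform prior
)
p_value = empirical_p_value(50.0, [0.01, 0.2, 3.0, 75.0, 0.5])
```

With these made-up inputs, the posterior concentrates on two contributors, and one of the five simulated LRs reaches the observed value. In practice, a far larger set of simulated random persons would be needed for the empirical fraction to be a useful estimate of a small p-value.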