Description of original award (Fiscal Year 2014, $673,087)
As submitted by the proposer: High-throughput DNA sequencing platforms have the potential to revolutionize forensic DNA analysis. These platforms offer low sequencing costs, rapid turnaround, large capacity, and the ability to analyze a large number of DNA regions simultaneously. However, the field has yet to develop robust methods to accurately call Short Tandem Repeats (STRs) from these technologies for DNA profiling. We recently devised an algorithm, called lobSTR, to profile STRs from high-throughput sequencing data, and we demonstrated its power for sample identification: we used lobSTR to profile STRs on the Y chromosome of DNA samples from anonymous individuals and inferred their surnames by querying online genealogical databases.
Here, we propose to leverage our experience to devise robust algorithms for forensic STR profiling from high-throughput DNA sequencing. Our plan has three aims. First, we will create a gold standard dataset of STR profiles for 384 individuals from populations around the world using capillary electrophoresis (6 months). These individuals have already been subject to whole genome sequencing, which saves substantial resources and accelerates the proposed studies. Second, we will use the gold standard dataset to further develop lobSTR to accurately call forensically relevant STRs from high-throughput sequencing data (6 months). Third, we will leverage the ability of sequencing platforms to genotype SNPs in addition to STRs to develop a genetic imputation method that infers STRs missing because of allelic dropout (1 year). This imputation algorithm has the potential to make major strides in obtaining DNA profiles for forensic samples with low copy number (LCN) DNA. We plan to make the algorithms and datasets freely available to the entire forensic DNA community. Overall, success in the proposed studies will facilitate the integration of high-throughput sequencing methods in crime labs, opening new technological possibilities for forensic DNA analysis.