An official website of the United States government, Department of Justice.

Statistical evaluation of forensic sequencing profiles

Award Information

Award #
Funding Category
Competitive Discretionary
Congressional District
Past Project Period End Date
Funding First Awarded
Total funding (to date)

Description of original award (Fiscal Year 2020, $107,205)

Forensic DNA interpretation currently centers on the analysis of short tandem repeats (STRs), relying on capillary electrophoresis (CE) to determine the allele numbers contained in a DNA sample. To evaluate such DNA evidence profiles, match probabilities can be calculated and incorporated into a likelihood ratio (LR) model, an approach that is widely accepted in forensic evidence evaluation. Next generation sequencing (NGS) provides more discrimination because it reveals variation within the STRs, effectively detecting more alleles than is possible with standard CE approaches. A further degree of variation can be observed in the form of single nucleotide polymorphisms (SNPs) in the flanking regions adjacent to the repeat motif. Since the observed data for NGS-based methods, as well as the underlying biological processes, differ from CE-based results, population genetic parameters and statistical models need to be re-evaluated to facilitate the implementation of NGS methods.
Although the likelihood ratio is the recommended metric of evaluation, a random match probability is often reported instead. Three main evaluation frameworks are possible: binary, semi-continuous, and fully continuous. Fully continuous models can be hard to interpret because of their complexity, and confusion persists about the difference between random match probabilities and likelihood ratios. There is therefore a need to address this confusion and to develop less complex models before implementing fully continuous models for NGS data. This research will focus on the impact of sequencing data on statistical models and underlying parameters.
We will evaluate random match probabilities and likelihood ratios and characterize the impact for different approaches, with the goal of providing sensible models for NGS data that allow a better understanding of the underlying principles and offer guidance in gradually adapting to such data. The focus will be on autosomal data, but some attention will also be given to SNP data and Y-chromosomal markers. Specifically, the proposed research will focus on 1) providing estimates of population genetic parameters for sequencing data; 2) investigating the impact of sequencing data on random match probabilities and likelihood ratios; and 3) developing recommendations on how to incorporate additional markers, such as flanking region SNPs, in statistical evaluations.
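As an illustration of why sequencing data can change a random match probability, the following minimal Python sketch compares a single locus typed at CE resolution with the same locus typed at sequence resolution. It is not the project's model. It assumes a single-source profile, independence between loci, and the widely used theta-corrected (Balding-Nichols) genotype match probabilities; the function names, the allele frequencies, and the split of one CE allele into two sequence variants are all hypothetical values chosen for illustration.

```python
from math import prod


def genotype_match_probability(p_a, p_b=None, theta=0.0):
    """Theta-corrected match probability for one genotype (assumed model).

    p_a, p_b: allele frequencies; omit p_b for a homozygote.
    theta:    coancestry coefficient; theta = 0 reduces to p^2 or 2pq.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:  # homozygote A,A
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    # heterozygote A,B
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


def profile_rmp(genotypes, theta=0.0):
    """Multiply per-locus match probabilities (assumes independent loci).

    genotypes: iterable of (p_a,) or (p_a, p_b) tuples, one per locus.
    """
    return prod(genotype_match_probability(*g, theta=theta) for g in genotypes)


def single_source_lr(genotypes, theta=0.0):
    """For a single-source matching profile, the LR is the reciprocal of the RMP."""
    return 1.0 / profile_rmp(genotypes, theta=theta)


# Hypothetical locus: CE reports a homozygote 12,12 with allele frequency 0.20.
ce_rmp = genotype_match_probability(0.20)

# Hypothetical sequencing result: the two "12" alleles are distinct sequence
# variants with frequencies 0.15 and 0.05, so the genotype is a heterozygote.
ngs_rmp = genotype_match_probability(0.15, 0.05)

if __name__ == "__main__":
    print(f"CE  RMP = {ce_rmp:.4f}, LR = {1 / ce_rmp:.1f}")
    print(f"NGS RMP = {ngs_rmp:.4f}, LR = {1 / ngs_rmp:.1f}")
```

With these made-up frequencies and theta set to zero, the match probability drops from about 0.04 to about 0.015 once the sequence variants are distinguished, which is the kind of gain in discrimination described above. Passing a nonzero `theta` makes both values more conservative.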
Note: This project contains a research and/or development component, as defined in applicable law, and complies with Part 200 Uniform Requirements - 2 CFR 200.210(a)(14). CA/NCF

Date Created: September 18, 2020