An official website of the United States government, Department of Justice.

Advancing the Statistical Interpretation of Forensic DNA Data Samples

Key questions have arisen about how DNA data are to be interpreted statistically. Two NIJ grants have supported research that has far-reaching implications for testing hypotheses using DNA evidence and expressing confidence in the conclusions reached.
Date Published: October 31, 2019

When University of Washington biostatistician Bruce Weir and his team began their NIJ-supported research into population genetics, they had three goals. First, the researchers wanted to better understand and describe human population structure, which is the degree to which people are genetically differentiated among populations. Second, they wanted to improve their understanding of lineage markers, which are parts of the genome inherited only through one sex, like the Y chromosome. And finally, they wanted to generate more sophisticated interpretations of DNA STR profiles (the characterizations found in CODIS) than are commonly used.

Estimating the probability that a matching pattern could be the result of chance is a critical component of forensic statements, including those involving DNA. A sample of DNA from a crime scene may have a series of features that match those of a suspect, but there’s some probability that they also happen to match those of a number of other people. Thus, a necessary foundation of the use of DNA as a forensic tool has been understanding the natural frequency of each possible profile, as well as knowing how these frequencies change in different populations.
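To make the idea of a profile's "natural frequency" concrete, here is a minimal sketch of the textbook product-rule calculation. It assumes Hardy-Weinberg proportions within each locus and independence between loci. The article does not spell out this calculation, and the allele frequencies below are hypothetical values chosen for illustration, not data from the research.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency at one locus.

    p, q are allele frequencies; pass only p for a homozygote.
    """
    if q is None:
        return p * p          # homozygote: p^2
    return 2 * p * q          # heterozygote: 2pq


def profile_frequency(loci):
    """Multiply per-locus genotype frequencies (assumes independent loci)."""
    return prod(genotype_frequency(*alleles) for alleles in loci)


# Hypothetical three-locus profile: heterozygote, homozygote, heterozygote.
example_profile = [(0.10, 0.20), (0.05,), (0.15, 0.30)]
rmp = profile_frequency(example_profile)  # roughly 9e-06
```

Because the per-locus frequencies multiply, even modest allele frequencies yield a very small profile frequency after a few loci, which is why the accuracy of the underlying population estimates matters.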

However, such measures are difficult to obtain directly and are more commonly based on estimates of genetic diversity and differentiation between subpopulations, using limited samples of DNA sequences. Weir’s team refined the methods for calculating these estimates. They then examined 250 published reports of DNA profile frequencies across 446 global populations and generated diversity and structure measures that can be used in forensic statistical calculations.
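As an illustration of how a population structure measure can enter a forensic calculation, the sketch below uses the widely cited Balding-Nichols match probability, in which a parameter commonly written as theta adjusts genotype probabilities for subpopulation relatedness. This is a standard formula from the forensic genetics literature, not necessarily the refined method developed under these grants, and the frequency and theta values are hypothetical.

```python
def match_probability(p_i, p_j=None, theta=0.0):
    """Balding-Nichols single-locus match probability.

    p_i, p_j: allele frequencies (pass only p_i for a homozygote).
    theta: population structure parameter; theta = 0 recovers
           the Hardy-Weinberg values p^2 and 2pq.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_j is None:
        # Homozygote A_i A_i
        return (2 * theta + (1 - theta) * p_i) * (3 * theta + (1 - theta) * p_i) / denom
    # Heterozygote A_i A_j
    return 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j) / denom


# Hypothetical values: allele frequency 0.10, theta of 0.01.
unstructured = match_probability(0.10, theta=0.0)   # equals p^2 = 0.01
structured = match_probability(0.10, theta=0.01)    # larger than p^2
```

With these illustrative inputs, a positive theta raises the match probability, so ignoring population structure would overstate the strength of a match.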

The researchers also examined the population genetic aspects of the Y chromosome, which is inherited only through the paternal line (father to son), and examined the calculations behind profile probabilities that are based on short tandem repeats (STRs) found on the Y-chromosome (Y-STRs). These calculations could be greatly simplified if the different Y-STR locations on the Y chromosome could be treated as independently evolving and inherited units. However, the Y chromosome does not have a counterpart in the cell (there’s only one version, from the father), and so it does not undergo the mixing process that other chromosomes experience (“recombination”).

Nonetheless, after examining three publicly available Y-STR databases, Weir’s team showed that Y-STR profiles appear to have frequencies that behave as if the Y-STR locations were independent. That is, it appears that mutations at each Y-STR happen frequently and independently enough that Y-STRs can be treated mathematically like other STRs. Moreover, Weir showed that using only 10 STRs across the Y chromosome, one could confidently determine “membership in a common male lineage.”

Weir’s team did extensive work on interpreting DNA profiles quantitatively and building mathematical models for calculating the probabilities of different genotypes. Profiles are most commonly based on STRs, portions of the genome in which short sequences are repeated a different number of times in different people. Characterizing the STRs in a forensic sample becomes complicated when it contains mixtures of DNA from an unknown number of people, extremely low concentrations of DNA, or damaged and degraded DNA. Moreover, the nature of STRs makes them susceptible to erroneous molecular copying processes in the lab that generate false counts of the number of repeated units, a problem called “stutter.”

Weir’s team examined these effects on generating STR profiles, both empirically and theoretically, and in light of available laboratory tools. They also investigated the mathematics of commonly used statistical calculations and the effects of further complications, such as the presence of family members in DNA sample mixtures. On stutter, one of the team’s papers demonstrated that the problem is driven in part by the number of sequence repeats, but that even after accounting for this factor, stutter varies significantly between different STRs.

Together these awards have resulted in more than 50 peer-reviewed scientific papers in both forensic and general science journals. Papers have covered a wide range of subjects, including a 2015 research article in Science that used the new population diversity and structure metrics, in conjunction with DNA sequences from confiscated ivory, to determine elephant poaching hotspots.

About this article

The research described in this article was funded by NIJ grants 2011-DN-BX-K541 and 2014-DN-BX-K028, awarded to the University of Washington. This article is based on the grantee reports “Population Genetic Issues for Forensic DNA Profiles” by Bruce Weir, principal investigator, Department of Biostatistics, the University of Washington.
