PACE: Probabilistic Assessment for Contributor Estimation - A Machine Learning-Based Assessment of the Number of Contributors in DNA Mixtures

NCJ Number
Date Published: March 2017
Length: 10 pages
This study proposes a probabilistic approach for estimating the number of contributors in a DNA mixture that leverages the strengths of machine learning.
The deconvolution of DNA mixtures remains one of the most critical challenges in forensic DNA analysis. Of all the data features required to perform such deconvolution, the number of contributors in the sample is widely considered the most important and, if incorrectly chosen, the most likely to negatively influence the interpretation of a mixed DNA profile. Unfortunately, most current approaches to mixture deconvolution assume that the analyst knows the number of contributors, an assumption that can prove especially faulty for increasingly complex mixtures of three or more contributors. To assess the proposed approach, researchers compared the classification performance of six machine learning algorithms and evaluated the model from the top-performing algorithm against the current state of the art in contributor number classification. Overall, results showed just over 98 percent accuracy in identifying the number of contributors in DNA mixtures of up to four contributors. Compared with the current best-in-field methodology, classification accuracy improved by just over 6 percent for three-person mixtures and by just over 20 percent for four-person mixtures. The Probabilistic Assessment for Contributor Estimation (PACE) also classified mixtures of up to four contributors in less than 1 second on a standard laptop or desktop computer. Given the high classification accuracy and the significant time required by the current state-of-the-art model, versus the seconds required by a machine-learning-derived model, the approach described in this article provides a promising means of estimating the number of contributors and, in turn, of improving DNA mixture interpretation. (Publisher abstract modified)
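The evaluation design described above (score several candidate classifiers on the same data, then keep the top performer) can be illustrated with a minimal sketch. This is not the PACE implementation: the abstract does not name the six algorithms or the features used, so the three simple classifiers, the two features (`max_alleles`, `mean_alleles`), and the synthetic data generator below are all hypothetical stand-ins chosen only to keep the example self-contained.

```python
import random
from collections import Counter


def make_toy_profiles(n_per_class=60, seed=0):
    """Synthetic stand-in data, NOT real DNA profiles.

    Each row is ((max_alleles, mean_alleles), number_of_contributors).
    Both features are hypothetical illustrations.
    """
    rng = random.Random(seed)
    data = []
    for noc in (1, 2, 3, 4):
        for _ in range(n_per_class):
            max_alleles = min(2 * noc, max(1, round(rng.gauss(1.7 * noc, 0.6))))
            mean_alleles = rng.gauss(1.2 * noc + 0.3, 0.35)
            data.append(((float(max_alleles), mean_alleles), noc))
    rng.shuffle(data)
    return data


def _sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_majority(train):
    """Baseline: always predict the most common label in the training set."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_centroid(train):
    """Predict the class whose mean feature vector is closest."""
    sums, counts = {}, Counter()
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    return lambda x: min(centroids, key=lambda y: _sq_dist(x, centroids[y]))


def fit_knn(train, k=5):
    """Predict by majority vote among the k nearest training rows."""
    def predict(x):
        nearest = sorted(train, key=lambda row: _sq_dist(x, row[0]))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict


def cross_val_accuracy(fit, data, folds=5):
    """Fraction of rows classified correctly across all held-out folds."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        predict = fit(train)
        correct += sum(predict(x) == y for x, y in test)
    return correct / len(data)


def compare(data):
    """Score every candidate on the same folds and return the top performer."""
    candidates = {
        "majority": fit_majority,
        "nearest_centroid": fit_nearest_centroid,
        "knn": fit_knn,
    }
    scores = {name: cross_val_accuracy(fit, data) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = compare(make_toy_profiles())
    for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {acc:.3f}")
    print("top performer:", best)
```

Scoring every candidate on identical folds keeps the comparison fair, and the majority-class baseline gives a floor against which any real gain can be judged. The accuracies printed for this toy data say nothing about the results reported in the study.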

Date Published: March 1, 2017