A fully continuous machine learning approach to predict the number of contributors in sequence-based DNA profiles

Award Information

Description of original award (Fiscal Year 2018, $431,917)

Over the past decade, forensic DNA analysis has seen significant technological advances, including new methods that increase detection sensitivity, software for assessing the contributors to mixtures, and probabilistic genotyping. Despite these advances, the challenge of mixture interpretation remains. This challenge is independent of laboratory method and instrument: it exists both in currently validated techniques, such as fragment analysis by capillary electrophoresis, and in emerging technologies such as massively parallel sequencing (MPS), also called next-generation sequencing (NGS). Mixture interpretation has thus far received little attention in the context of emerging, soon-to-be-validated MPS methods. Within the larger challenge of mixture interpretation lies a single critical component: the assessment of the number of contributors (NOC). The assumed NOC underlies the vast majority of subsequent assumptions, such as the presence of allelic dropout and the pairing of sister alleles. The primary objective of the proposed project is to develop a fully continuous, probabilistic, machine learning-based tool to predict the NOC in sequence-based data sets. The project’s use of machine learning, which is ideally suited to complex, high-dimensional classification problems, complements the impending transition to NGS. Proactively developing software tools and solutions specifically tailored to NGS-based mixture interpretation will give the forensic science community the time and resources to scrutinize, optimize, and potentially implement the methods proposed in this project. Final deliverables include (1) a machine learning-based tool to predict NOC in MPS data and (2) a DNA sequence and mixture simulation tool, both with Windows-compatible graphical user interfaces.
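
To make the NOC problem concrete, the following is a minimal illustrative sketch, not the project's method or either of its deliverables. It simulates a toy mixture and applies the simple maximum allele count lower bound, which rests on the fact that a diploid contributor carries at most two alleles per locus. The locus names, allele labels, and frequencies are hypothetical, and the simulation assumes no allelic dropout, drop-in, or stutter.

```python
import math
import random


def simulate_mixture(allele_freqs, n_contributors, rng):
    """Return the set of distinct alleles seen at each locus in a toy mixture.

    allele_freqs maps a locus name to {allele: frequency}. Every contributor
    draws two alleles per locus. Assumption: no dropout, drop-in, or stutter.
    """
    profile = {}
    for locus, freqs in allele_freqs.items():
        alleles = list(freqs)
        weights = [freqs[a] for a in alleles]
        observed = set()
        for _ in range(n_contributors):
            observed.update(rng.choices(alleles, weights=weights, k=2))
        profile[locus] = observed
    return profile


def min_contributors(profile):
    """Maximum allele count lower bound on the number of contributors."""
    max_alleles = max(len(alleles) for alleles in profile.values())
    return math.ceil(max_alleles / 2)


if __name__ == "__main__":
    # Hypothetical loci and frequencies, invented purely for illustration.
    freqs = {
        "LOCUS_A": {"a1": 0.4, "a2": 0.3, "a3": 0.2, "a4": 0.1},
        "LOCUS_B": {"b1": 0.5, "b2": 0.25, "b3": 0.15, "b4": 0.1},
    }
    rng = random.Random(2018)
    for true_noc in (1, 2, 3, 4):
        mixture = simulate_mixture(freqs, true_noc, rng)
        print(true_noc, min_contributors(mixture))
```

Because contributors can share alleles, this bound can fall below the true NOC, and it ignores dropout entirely. Limitations of this kind are part of why the NOC is treated as a classification problem over richer features.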

This project contains a research and/or development component, as defined in applicable law, and complies with Part 200 Uniform Requirements - 2 CFR 200.210(a)(14).


Date Created: September 27, 2018