An official website of the United States government, Department of Justice.

Hybrid Machine Learning Approach for DNA Mixture Interpretation

NCJ Number
Date Published: June 2016
Author(s): Michael A. Marciano; Kevin S. Sweder
Publication Type: Research (Applied/Empirical), Report (Study/Research), Report (Grant Sponsored), Program/Project Description
Grant Number(s)
The primary goal of this project was to assess a new machine learning-based method of DNA mixture deconvolution, with the secondary goal of developing a superior model for estimating the number of contributors in a DNA mixture, which is a vital component of DNA mixture deconvolution.
Machine learning refers to the development of systems that can learn from data. After exposure to an initial set of "training" data, a machine learning algorithm can evaluate new, previously unseen examples and relate them to that training data. It is well suited to classification problems that involve implicit patterns, and it is most effective when large amounts of data are available. Although machine learning has not previously been used in DNA mixture analysis, two characteristics of the problem make it a good fit. First, there is a large repository of human DNA mixture data in electronic format. Second, patterns in such data are often obscure and beyond the capability of manual analysis; however, they can be evaluated statistically using one or more machine learning algorithms. The system was trained, tested, and validated using electronic data obtained from 1,405 non-simulated DNA mixture samples composed of one to four contributors and generated from a combination of 16 individuals. This report concludes that the proposed method for DNA mixture deconvolution, including determining the number of contributors, is robust and reproducible; it was developed using an expansive set of data from the AmpFlSTR Identifiler PCR Amplification Kit. The description of materials and methods covers data acquisition and exportation, the locus-sample-specific threshold (LSST) calculation, data partitioning, feature scaling, feature selection, and machine learning algorithms. The optimized system will be discussed in more detail in the Final Report. 10 figures, 8 tables, and 21 references.
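To make the partitioning, scaling, and classification steps named above concrete, the following is a minimal sketch of how the number of contributors might be treated as a classification problem. It is not the report's method: the two features (`max_alleles`, `mean_alleles`), the toy data generator, and the k-nearest-neighbors classifier are all assumptions chosen for illustration, the data are synthetic rather than drawn from the 1,405 samples, and the LSST calculation and feature selection are not reproduced.

```python
import random
from collections import Counter


def make_toy_samples(per_class=40, seed=1):
    """Build SYNTHETIC samples for 1-4 contributors (illustration only).

    Hypothetical features: the largest allele count seen at any locus
    (at most 2 per contributor) and an arbitrary mean allele count.
    """
    rng = random.Random(seed)
    samples = []
    for n in range(1, 5):
        for _ in range(per_class):
            max_alleles = 2 * n - rng.choice([0, 1])
            mean_alleles = 1.5 * n + rng.gauss(0, 0.2)  # arbitrary toy model
            samples.append(([float(max_alleles), mean_alleles], n))
    return samples


def partition(samples, test_fraction=0.25, seed=0):
    """Data partitioning: shuffle, then split into training and test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_min_max(rows):
    """Feature scaling, step 1: learn per-feature bounds from training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(row, lows, highs):
    """Feature scaling, step 2: map each feature using the training bounds."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)]


def knn_predict(train, query, k=3):
    """Return the majority label among the k nearest training samples."""
    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], query))
    nearest = sorted(train, key=squared_distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


samples = make_toy_samples()
train, test = partition(samples)

# Bounds come from the training set only, so the test set stays unseen.
lows, highs = fit_min_max([features for features, _ in train])
scaled_train = [(scale(features, lows, highs), label) for features, label in train]

correct = sum(
    knn_predict(scaled_train, scale(features, lows, highs)) == label
    for features, label in test
)
accuracy = correct / len(test)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

Fitting the scaling bounds on the training partition alone keeps the held-out samples from influencing the model, which is the usual reason partitioning is done before scaling. Accuracy on this toy data says nothing about performance on real mixture profiles.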
Date Created: January 9, 2019