
An official website of the United States government, Department of Justice.

Development of a probabilistic multiclass model selection algorithm for high-dimension and complex data encountered in forensic pattern and trace evidence.

Award Information

Award #:
Funding Category: Competitive Discretionary
Congressional District:
Funding First Awarded:
Total funding (to date):

Description of original award (Fiscal Year 2018, $149,942)

The push to develop quantifiable measures of uncertainty in forensic conclusions has led to the emergence of several ad hoc methods for approximating the weight of evidence (WoE). In particular, following developments in biometrics in the 1990s, forensic researchers have used similarity measures, or scores, to simplify the approximation of the WoE for high-dimensional and complex evidential data.
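For reference, in the usual Bayesian framing of forensic statistics (assumed here; the award description does not spell it out), the WoE for evidence $e$ is quantified by the likelihood ratio, or Bayes factor, comparing the prosecution proposition $H_p$ with the defense proposition $H_d$ given background information $I$. It is the factor that updates prior odds into posterior odds:

$$
\mathrm{LR} = \frac{p(e \mid H_p, I)}{p(e \mid H_d, I)},
\qquad
\frac{\Pr(H_p \mid e, I)}{\Pr(H_d \mid e, I)} = \mathrm{LR} \times \frac{\Pr(H_p \mid I)}{\Pr(H_d \mid I)}
$$

For high-dimensional, complex $e$, the densities in the numerator and denominator are hard to model directly, which is the difficulty that score-based and kernel-based approximations try to sidestep.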

Score-based methods have been proposed for numerous evidence types, such as fingerprints, handwriting, inks, controlled substances, firearms, and voice. Researchers have designed different score-based statistics to approximate the WoE. In general, these methods treat the score as a projection of the data onto the real line and focus on modeling different sampling distributions of the score.
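A minimal sketch of one common score-based statistic, a score-based likelihood ratio (SLR), may make the idea concrete. It assumes a hypothetical similarity score (negative Euclidean distance between feature vectors) and Gaussian kernel density estimates of the score distribution for same-source and different-source reference pairs, with Silverman's rule-of-thumb bandwidth. The function names and the toy score values are made up for illustration; they are not data or code from this project.

```python
import math
from statistics import stdev

def similarity_score(u, v):
    """Hypothetical score: negative Euclidean distance between feature vectors."""
    return -math.dist(u, v)

def kde(samples, x, bandwidth=None):
    """Univariate Gaussian kernel density estimate of `samples`, evaluated at x."""
    n = len(samples)
    if bandwidth is None:
        # Silverman's rule-of-thumb bandwidth (an assumption, not a project choice)
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

def score_based_lr(score, same_source_scores, diff_source_scores):
    """Density of the observed score under same-source vs different-source pairs."""
    return kde(same_source_scores, score) / kde(diff_source_scores, score)

# Toy, made-up score samples standing in for reference pairs of known origin.
same_source = [-0.10, -0.20, -0.15, -0.30, -0.05, -0.25]
diff_source = [-1.50, -2.00, -1.20, -2.50, -1.80, -0.90]

trace, suspect = (1.0, 2.0), (1.1, 2.1)
s = similarity_score(trace, suspect)
slr = score_based_lr(s, same_source, diff_source)
print(f"score = {s:.3f}, SLR = {slr:.1f}")
```

Everything about the trace/suspect pair is reduced to the single number `s` before any probability is computed; that reduction is the projection onto the real line described above.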

While it has been shown that score-based methods for assigning the WoE do not converge to, or share the basic properties of, the "true" WoE, another class of statistics has recently emerged. Kernel-based methods preserve the necessary properties of the true WoE while retaining the data-reduction advantages that similarity scores give score-based methods. Rather than projecting the data onto the real line, kernel methods use a score as a kernel function, which implicitly transforms the entire feature space.
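To illustrate the distinction, here is a minimal sketch assuming a Gaussian (RBF) similarity score with a hypothetical `gamma` parameter. A score can act as a kernel only if every Gram matrix it produces is symmetric positive semidefinite, so the sketch builds Gram matrices for a few made-up points and tests positive definiteness with a Cholesky factorization. The RBF score passes; negative Euclidean distance, a perfectly usable score, fails, because its Gram matrix has a zero diagonal with nonzero off-diagonal entries.

```python
import math

def rbf_score(x, y, gamma=0.5):
    """Gaussian (RBF) similarity score; positive definite for distinct points."""
    return math.exp(-gamma * math.dist(x, y) ** 2)

def neg_distance_score(x, y):
    """Negative Euclidean distance: a usable score, but not a valid kernel."""
    return -math.dist(x, y)

def gram_matrix(points, score):
    """Pairwise score matrix K[i][j] = score(points[i], points[j])."""
    return [[score(p, q) for q in points] for p in points]

def is_positive_definite(K, tol=1e-12):
    """Attempt a Cholesky factorization K = L L^T; success means K is positive definite."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:  # non-positive pivot: factorization fails
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

# Hypothetical 2-D feature vectors.
points = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (0.5, 2.0)]
K_rbf = gram_matrix(points, rbf_score)
K_neg = gram_matrix(points, neg_distance_score)
print(is_positive_definite(K_rbf))  # True
print(is_positive_definite(K_neg))  # False
```

Positive semidefiniteness is what lets a kernel behave as an inner product in an implicit feature space, which is the sense in which kernel methods transform the whole feature space instead of collapsing each pair to one number.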

This three-year research project proposes to complete the class of kernel-based algorithms initiated under NIJ Awards 2009-DN-BX-K234 (which addressed the "outlier detection" problem) and 2015-R2-CX-0028 (which addressed the "common source" problem) by proposing a fully probabilistic model for the "specific source" problem. The project will address the "specific source" problem through a progressive series of models: first, for two fixed classes; next, for multiple (more than two) fixed sources; and finally, through a kernel-based model selection algorithm that considers a single fixed source (the suspected source) plus multiple random sources. This class of algorithms will use kernels to handle the high-dimensional and complex data commonly encountered in forensic pattern and trace evidence.
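As a rough illustration of the multiple-fixed-sources step only, the sketch below substitutes an isotropic Gaussian kernel density estimate for each candidate source (an assumption; it is not the fully probabilistic kernel model this project proposes) and converts the per-source likelihoods of a trace into posterior source probabilities, with equal priors by default. The source names, reference samples, and bandwidth are all hypothetical.

```python
import math

def kde_log_density(x, samples, bandwidth=0.5):
    """Log of an isotropic Gaussian kernel density estimate evaluated at x."""
    d = len(x)
    log_norm = -d * math.log(bandwidth * math.sqrt(2 * math.pi))
    logs = [log_norm - 0.5 * (math.dist(x, s) / bandwidth) ** 2 for s in samples]
    m = max(logs)
    # log-mean-exp, shifted by the max for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs) / len(logs))

def source_posteriors(trace, sources, priors=None):
    """Posterior probability that `trace` came from each fixed candidate source."""
    if priors is None:
        priors = {name: 1 / len(sources) for name in sources}
    log_post = {name: math.log(priors[name]) + kde_log_density(trace, samples)
                for name, samples in sources.items()}
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {name: math.exp(v - m) / z for name, v in log_post.items()}

# Hypothetical reference samples (2-D features) from three fixed sources.
sources = {
    "A": [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0)],
    "B": [(2.0, 2.1), (2.2, 1.9), (1.9, 2.0)],
    "C": [(0.0, 2.0), (0.1, 2.2), (-0.2, 1.9)],
}
post = source_posteriors((0.1, 0.0), sources)
for name, p in post.items():
    print(f"P(source = {name} | trace) = {p:.4f}")
```

With exactly two sources, the posterior odds reduce to the prior odds multiplied by a likelihood ratio, matching the two-class case. The final step in the plan, one fixed source against random sources, would additionally require a model of the population of sources and is not covered by this sketch.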

The success of this project will have major implications for the forensic science, criminal justice, and statistical communities. The proposed class of algorithms will enable the quantification of most evidence types for the benefit of the forensic science and legal communities in the U.S. and worldwide. In addition, these algorithms will provide a naturally probabilistic, multiclass, and compact alternative to current kernel-based pattern recognition methods such as support vector machines, relevance vector machines, and approximate Bayesian computation methods.

Note: "This project contains a research and/or development component, as defined in applicable law," and complies with Part 200 Uniform Requirements - 2 CFR 200.210(a)(14).

Date Created: September 20, 2018