An official website of the United States government, Department of Justice.

Development of a Probabilistic Multi-Class Model Selection Algorithm for High-Dimensional and Complex Data

NCJ Number: 310345
Date Published: 2021
Length: 205 pages
Annotation

In this study, researchers developed a probabilistic multi-class model selection algorithm for high-dimensional and complex data.

Abstract

This dissertation introduces a probabilistic multi-class model selection algorithm that completes the class of kernel-based algorithms initiated under NIJ Awards 2009-DN-BX-K234 and 2015-R2-CX-0028, which addressed the “outlier detection” and “common source” problems. The authors propose a fully probabilistic model for the “specific source” problem, developed in three progressive stages: first, the problem is addressed for a pair of fixed sources; next, the two-class model is extended to multiple fixed sources; finally, a kernel-based model selection algorithm is developed for a single fixed source juxtaposed with multiple random sources.

This class of algorithms relates pairs of high-dimensional, complex objects through a kernel function to obtain a vector of within-source and between-source scores, and it capitalizes on the variability that exists within and between these sets of scores. The model makes no assumptions about the type or dimension of the data to which it is applied and can be tailored to any type of data by modifying the kernel function at its core. In addition, the algorithm provides a naturally probabilistic, multi-class, and compact alternative to current kernel-based pattern recognition methods such as support vector machines, relevance vector machines, and approximate Bayesian computation methods.

Efforts to develop quantifiable measures of uncertainty in forensic conclusions have produced several ad hoc methods for approximating the weight of evidence (WoE). In particular, forensic researchers have attempted to use similarity measures, or scores, to approximate the WoE for evidence characterized by high-dimensional and complex data. Score-based methods have been proposed to approximate the WoE for numerous evidence types. In general, score-based methods treat the score as a projection onto the real line.
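As a rough illustration of the score construction described in the abstract, the sketch below computes a kernel score for every pair of objects and separates the results into within-source and between-source sets. It is not the dissertation's algorithm: the Gaussian (RBF) kernel, the `gamma` value, the function names, and the toy data are all assumptions chosen for illustration, and the probabilistic model built on top of the scores is not shown.

```python
import math
from itertools import combinations


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; an assumed stand-in for a data-specific kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def pairwise_scores(objects, labels, kernel=rbf_kernel):
    """Score every pair of objects and split the scores by source agreement."""
    within, between = [], []
    for i, j in combinations(range(len(objects)), 2):
        score = kernel(objects[i], objects[j])
        if labels[i] == labels[j]:
            within.append(score)   # both objects come from the same source
        else:
            between.append(score)  # objects come from different sources
    return within, between


def mean(values):
    return sum(values) / len(values)


# Toy data (hypothetical): two sources, three two-dimensional objects each.
objects = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
           (3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]
labels = ["A", "A", "A", "B", "B", "B"]

within, between = pairwise_scores(objects, labels)
print("within-source scores:", len(within), "mean", round(mean(within), 3))
print("between-source scores:", len(between), "mean", round(mean(between), 3))
```

Swapping `rbf_kernel` for another similarity function is the only change needed to apply the same scoring step to a different data type, which mirrors the abstract's point that the kernel is the part tailored to the data.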

Date Published: January 1, 2021