An official website of the United States government, Department of Justice.

Developing Guidelines for the Application of Multivariate Statistical Analysis to Forensic Evidence

175 pages

This report reviews research investigating and demonstrating the application of multivariate statistical procedures to forensically relevant data, and highlights the advantages and limitations that must be considered before these procedures can be used in forensic investigations.


During this research, three separate and diverse data sets were generated. The first contained chromatographic data from ignitable liquid reference standards and simulated fire debris samples. The second contained spectral data from controlled substance reference standards and simulated street samples. The third contained ribosomal RNA gene sequence data from bacteria in soil samples from different habitats. For each data set, statistical procedures were used to associate, or classify, samples with the corresponding reference standard. The chromatographic and spectral data sets were initially probed using principal components analysis (PCA) and hierarchical cluster analysis (HCA). These two procedures are exploratory in nature and are used to identify patterns in the data, so that similar samples can be associated with one another and distinguished from different samples. While each procedure has advantages and disadvantages, HCA was more successful in associating the simulated samples with the appropriate reference standards. The same two data sets were further probed using two classification procedures: soft independent modeling of class analogy (SIMCA) and k-nearest neighbors (k-NN). The sequencing data were analyzed with PCA and nonmetric multidimensional scaling (NMDS). NMDS was able to cluster replicate samples of each soil within standard error, whereas only mild association of replicates was possible using PCA. Aspects of this research have been disseminated to the wider forensic community through poster and oral presentations; manuscripts for each data type are in preparation; and tutorials outlining the application, interpretation, and considerations for these data analysis procedures are currently being developed.
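
As a rough illustration of the workflow described above for the chromatographic and spectral data, the following minimal Python sketch applies PCA, HCA, and k-nearest neighbors to synthetic data. It is not the report's method or data: the profiles, noise levels, Ward linkage, the choice of k = 3, and all variable and function names are assumptions made for illustration, and SIMCA and NMDS are not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical data: 3 reference classes, each with a distinct 50-point profile
# (a stand-in for a chromatogram or spectrum; not data from the report).
n_classes, n_features, n_replicates = 3, 50, 5
profiles = rng.uniform(0.0, 1.0, size=(n_classes, n_features))

def simulate(profile, n, noise):
    """Return n noisy replicates of a profile (assumed Gaussian noise)."""
    return profile + rng.normal(0.0, noise, size=(n, profile.size))

X_ref = np.vstack([simulate(p, n_replicates, 0.02) for p in profiles])
y_ref = np.repeat(np.arange(n_classes), n_replicates)
X_sim = np.vstack([simulate(p, 2, 0.05) for p in profiles])  # "simulated samples"
y_sim = np.repeat(np.arange(n_classes), 2)

# PCA via SVD of the mean-centered reference data; keep two components.
mean = X_ref.mean(axis=0)
_, _, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
loadings = Vt[:2]
scores_ref = (X_ref - mean) @ loadings.T
scores_sim = (X_sim - mean) @ loadings.T  # project samples onto the same axes

# HCA: agglomerative clustering (Ward linkage, Euclidean distance) of all
# samples, then cut the dendrogram into n_classes clusters.
X_all = np.vstack([X_ref, X_sim])
tree = linkage(X_all, method="ward")
clusters = fcluster(tree, t=n_classes, criterion="maxclust")

def knn_predict(X_train, y_train, X_new, k=3):
    """Classify each row of X_new by majority vote of its k nearest neighbors."""
    dist = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])

predicted = knn_predict(X_ref, y_ref, X_sim, k=3)
print("k-NN predictions:", predicted)
print("HCA cluster labels:", clusters)
```

In this sketch PCA and HCA are used without class labels, in keeping with their exploratory role, while k-NN uses the labeled reference standards to assign each simulated sample to a class. With real forensic data, preprocessing (for example normalization or retention-time alignment) and the choice of distance metric and linkage would need to be considered and validated.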

Date Published: January 1, 2014