Forensic Footwear Reliability: Part II - Range of Conclusions, Accuracy, and Consensus

Journal: Journal of Forensic Sciences, September 2020
Date Published: September 2020
Length: 12 pages
This article, the second of a three-part series, presents the findings of a reliability study that assessed the extent of agreement among forensic footwear examiners in the United States.

Between February 2017 and August 2018, West Virginia University conducted a reliability study to assess expert performance among forensic footwear examiners in the United States. Over the course of the study, 70 examiners each performed 12 comparisons, for a total of 840 reported conclusions. To assess the accuracy of conclusions, the similarities and differences between mated and non-mated pairs were evaluated according to three criteria: (i) inherent agreement/disagreement in class, wear, and randomly acquired features; (ii) limitations as a function of questioned impression quality, clarity, and totality; and (iii) adherence to the Scientific Working Group for Shoeprint and Tire Tread Evidence (SWGTREAD) 2013 conclusion standard. Using these criteria, acceptable/expected categorical conclusions were defined. Preliminary results from this study are divided into a series of three summaries. This manuscript (Part II) reports accuracy and reproducibility. For mated pairs, accuracy equals 76.3 percent +/- 13.0 percent (median of 78.6 percent; 90 percent confidence interval of 72.2 percent to 80.0 percent). For non-mated pairs, accuracy equals 87.4 percent +/- 9.24 percent (median of 91.4 percent; 90 percent confidence interval of 84.7 percent to 89.8 percent). In addition, the community-assessed agreement (denoted by IQR) of reported results matches the research team's acceptable/expected conclusions for 10 out of 12 comparisons. In terms of reproducibility, the 90 percent confidence interval for consensus was found to be 0.71-0.86 (median of 0.77) for the combined dataset. Although based on a limited sample size, these results provide a baseline.
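
The accuracy figures above take the form "mean +/- spread, median, 90 percent confidence interval." As a rough illustration of how summaries of that form can be produced from per-examiner accuracy scores, the following sketch may help. It rests on several assumptions: the abstract does not state how the spread or the interval was computed, so a sample standard deviation and a percentile bootstrap of the mean are assumed here; the function name summarize and its parameters are hypothetical; and the scores are synthetic placeholders, not study data.

```python
import random
import statistics


def summarize(scores, n_boot=2000, level=0.90, seed=0):
    """Return mean, sample SD, median, and a percentile-bootstrap CI of the mean.

    Assumption: the source does not name its interval method; the percentile
    bootstrap is used here purely for illustration.
    """
    rng = random.Random(seed)
    n = len(scores)
    # Resample the scores with replacement and record each resample's mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    lower = boot_means[int(alpha * n_boot)]
    upper = boot_means[int((1.0 - alpha) * n_boot) - 1]
    return {
        "mean": statistics.fmean(scores),
        "sd": statistics.stdev(scores),
        "median": statistics.median(scores),
        "ci": (lower, upper),
    }


# Study design counts taken from the abstract.
N_EXAMINERS, N_COMPARISONS = 70, 12
total_conclusions = N_EXAMINERS * N_COMPARISONS  # 70 * 12 = 840

# Synthetic per-examiner accuracies (fraction correct); NOT the study's data.
data_rng = random.Random(1)
scores = [data_rng.uniform(0.5, 1.0) for _ in range(N_EXAMINERS)]

summary = summarize(scores)
print(total_conclusions)
print(summary)
```

With real per-examiner scores substituted for the synthetic list, the same call would yield the four quantities in the form the abstract reports, although the numbers would match the published values only if the study used the same estimators.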

Date Published: September 1, 2020