Expert Algorithm for Substance Identification Using Mass Spectrometry: Statistical Foundations in Unimolecular Reaction Rate Theory

NCJ Number: 307429
Journal: Journal of the American Society for Mass Spectrometry, Volume 34, Issue 7, 2023, Pages 1248-1262
Date Published: 2023
Length: 15 pages
Annotation

This report describes a study on how to accurately identify an organic substance from its mass spectrum. It is divided into two parts: the first describes how Rice-Ramsperger-Kassel-Marcus (RRKM) theory predicts that many branching ratios in replicate electron-ionization mass spectra will remain approximately linearly correlated when analysis conditions change within or between instruments; the second describes how general linear regression modeling can serve as the basis for binary classification and reliable identification of cocaine, distinguishing it from its diastereomers and all other known negatives.

Abstract

This study aims to resolve one of the longest-standing problems in mass spectrometry: how to accurately identify an organic substance from its mass spectrum when a spectrum of the suspected substance has not been analyzed contemporaneously on the same instrument. Part one of this two-part report describes how Rice-Ramsperger-Kassel-Marcus (RRKM) theory predicts that many branching ratios in replicate electron-ionization mass spectra will remain approximately linearly correlated when analysis conditions change within or between instruments. Here, proof-of-concept general linear modeling is based on the 20 most abundant fragments in a database of 128 training spectra of cocaine collected over six months in an operational crime laboratory. The statistical validity of the approach is confirmed through both analysis of variance (ANOVA) of the regression models and assessment of the distributions of the models' residuals. The general linear models typically explain more than 90 percent of the variance in normalized abundances. When the linear models from the training set were applied to 175 additional known positive cocaine spectra from more than 20 different laboratories, they predicted ion abundances with an accuracy of better than two percent, relative to the base peak, even though the measured abundances varied by more than 30 percent. The same models were also applied to 716 known negative spectra, including the diastereomers of cocaine (allococaine, pseudococaine, and pseudoallococaine), and the residual errors were larger for the known negatives than for the known positives. The second part of the manuscript describes how general linear regression modeling can serve as the basis for binary classification and reliable identification of cocaine, distinguishing it from its diastereomers and all other known negatives. (Publisher abstract provided)
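The modeling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the data are synthetic, the five fragments, their base abundances and slopes, and the helper names (simulate_spectra, fit_models, mean_abs_residual) are all assumptions made for illustration. Each fragment's abundance is regressed on the abundances of the other fragments using ordinary least squares, and the models fitted to "known positive" training spectra are then applied to new positives and to a differently patterned "known negative" set to compare residuals.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_spectra(base, slopes, n, noise=0.5):
    """Synthetic abundances (percent of base peak): each fragment drifts
    linearly with one hidden 'instrument condition' t, plus random noise."""
    t = rng.normal(size=(n, 1))
    return base + slopes * t + rng.normal(scale=noise, size=(n, len(base)))

def design_matrix(spectra, j):
    """Intercept column plus every fragment except fragment j."""
    return np.column_stack([np.ones(len(spectra)), np.delete(spectra, j, axis=1)])

def fit_models(train):
    """One least-squares linear model per fragment, predicting that
    fragment's abundance from the abundances of the other fragments."""
    models = []
    for j in range(train.shape[1]):
        coef, *_ = np.linalg.lstsq(design_matrix(train, j), train[:, j], rcond=None)
        models.append(coef)
    return models

def mean_abs_residual(models, spectra):
    """Mean absolute prediction error per spectrum, averaged over fragments."""
    pred = np.column_stack(
        [design_matrix(spectra, j) @ coef for j, coef in enumerate(models)]
    )
    return np.abs(spectra - pred).mean(axis=1)

# Hypothetical fragment pattern for the target compound (illustrative values only).
base = np.array([60.0, 40.0, 25.0, 15.0, 10.0])
slopes = np.array([10.0, -8.0, 5.0, -3.0, 2.0])

train = simulate_spectra(base, slopes, 128)       # training positives
models = fit_models(train)

positives = simulate_spectra(base, slopes, 175)   # unseen positives
# Hypothetical "negative" compound: same fragments, different relative abundances.
negatives = simulate_spectra(np.array([45.0, 55.0, 15.0, 25.0, 5.0]), slopes, 716)

pos_err = mean_abs_residual(models, positives)
neg_err = mean_abs_residual(models, negatives)
print("mean residual, positives:", pos_err.mean())
print("mean residual, negatives:", neg_err.mean())
```

In this toy setup the residuals for the negative set come out larger than for the positive set, which is the property a residual-based binary classifier would rely on. Choosing a decision threshold, and handling real spectra with 20 fragments, are outside the scope of the sketch.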

Date Published: January 1, 2023