This paper describes the preparation of in-silico fire debris samples and the preprocessing of the data for machine learning (ML) methods, briefly discusses those methods, reports their results on an in-silico total ion spectrum (TIS) dataset and an experimental TIS dataset, and discusses the implications of those results.
A large dataset of 240,000 fire debris samples has been generated in-silico using a data augmentation method at the National Center for Forensic Science. The in-silico samples are balanced: 50 percent contain ignitable liquid residue and 50 percent contain only substrate components. In the big data era, this large dataset is useful for researchers who want to develop and implement new machine learning methods. In this paper, the authors split the data into a training dataset and a test dataset. They then trained seven machine learning methods (logistic regression, linear discriminant analysis, quadratic discriminant analysis, support vector machine, random forest, XGBoost, and neural network) on the in-silico training dataset. The predictive accuracy and area under the ROC curve (AUC) of the models were evaluated and compared on both an in-silico test dataset and an experimental fire debris dataset. In addition, the authors analyzed both total ion spectrum (TIS) and total ion chromatogram (TIC) datasets. For the TIS dataset, the neural network provides the highest AUC on both the in-silico test dataset and the experimental fire debris dataset. Random forest shows the highest performance for the TIC dataset when the retention index is binned. (Published Abstract Provided)
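
The two evaluation metrics named in the abstract, predictive accuracy and AUC, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration. AUC is computed here as the probability that a randomly chosen positive sample (ignitable liquid residue present) receives a higher score than a randomly chosen negative sample (substrate only), with ties counted as one half.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label.

    The 0.5 threshold is an assumption for this sketch.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive outscores a random
    negative; tied scores contribute one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the paper:
# 1 = ignitable liquid residue present, 0 = substrate only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC      = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two can rank models differently. The pairwise loop is quadratic in the number of samples, so a rank-based implementation from an established library would be preferable for a dataset of this size.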