Reliable Model Selection without Reference Values by Utilizing Model Diversity with Prediction Similarity

NCJ Number
304940
Author(s)
R. C. Spiers; J. H. Kalivas
Date Published
2021
Length
11 pages
Annotation

Since tuning parameter selection typically depends on only one model quality measure assessing model bias using prediction accuracy, this article reports the development of a generic model selection process using concepts from consensus modeling and quantitative structure–activity relationship (QSAR) activity landscapes.

Abstract

Predictive modeling (calibration or training) with various data formats, such as near-infrared (NIR) spectra and quantitative structure–activity relationship (QSAR) data, provides essential information if a proper model is selected. Similarly, with a general model selection approach, spectral model maintenance (updating) from original modeling conditions to new conditions can be performed for dynamic modeling. Fundamental modeling (partial least-squares (PLS) and others) and maintenance processes (domain adaptation or transfer learning and others) require selection of tuning parameter value(s) to isolate models that can accurately predict new samples or molecules, e.g., the number of PLS latent variables to predict analyte concentration. Regardless of the modeling task, model selection is complex and lacks a reliable protocol. Tuning parameter selection typically depends on only one model quality measure assessing model bias using prediction accuracy. Developed in this paper is a generic model selection process using concepts from consensus modeling and QSAR activity landscapes. It is a consensus filtering approach (MDPS) that prioritizes model diversity (MD) while conserving prediction similarity (PS), fused with a common bias-variance trade-off measure. A significant feature of MDPS is that a cross-validation scheme is not needed because models are selected relative to predicting new samples or molecules, i.e., model selection uses unlabeled samples (without reference values) for active predictions. The versatility and reliability of MDPS model selection are shown using four NIR data sets and a QSAR data set. The study also substantiates the Rashomon effect, in which there is no single best model tuning parameter value that provides accurate predictions. (Publisher Abstract Provided)
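
The abstract does not give the MDPS formulas, so the following is only a minimal illustrative sketch of the general idea (favoring models that differ from one another yet agree in their predictions on unlabeled samples), not the authors' implementation. Everything in it is an assumption made for illustration: ridge regression stands in for PLS as the tuned model, model diversity is taken as one minus the absolute cosine between regression vectors, prediction similarity is taken as the clipped correlation between prediction vectors, the two are fused by a simple product, and the bias-variance measure mentioned in the abstract is omitted. The function names (ridge_vectors, cosine_matrix, select_models) and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_vectors(X, y, lambdas):
    """Return one ridge regression vector per tuning parameter value (rows)."""
    p = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    return np.array([np.linalg.solve(XtX + lam * np.eye(p), Xty)
                     for lam in lambdas])

def cosine_matrix(V):
    """Pairwise cosine similarity between the rows of V."""
    U = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ U.T

def select_models(B, X_new, keep=3):
    """Score each model by how often it is diverse from, yet predicts like,
    the other candidates. No reference values for X_new are used.
    Assumed scoring rule for illustration, not the published MDPS measure."""
    preds = B @ X_new.T                               # (n_models, n_new)
    ps = np.clip(np.corrcoef(preds), 0.0, 1.0)        # prediction similarity
    md = np.clip(1.0 - np.abs(cosine_matrix(B)), 0.0, 1.0)  # model diversity
    scores = (md * ps).sum(axis=1) / (len(B) - 1)     # diagonal contributes 0
    chosen = np.argsort(scores)[::-1][:keep]          # highest scores first
    return scores, chosen

# Synthetic stand-in for calibration data and unlabeled new samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=40)
X_new = rng.normal(size=(15, 10))                     # no reference values

lambdas = np.logspace(-3, 3, 7)                       # candidate tuning values
B = ridge_vectors(X, y, lambdas)
scores, chosen = select_models(B, X_new, keep=3)
print("selected tuning parameter values:", lambdas[chosen])
```

Returning several tuning parameter values rather than a single winner mirrors the abstract's point that more than one value can yield accurate predictions; how many to keep is a free choice in this sketch.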

Date Published: January 1, 2021