Processes
  • Article
  • Open Access

12 December 2025

Machine Learning-Driven QSAR Modeling for Predicting Short-Term Exposure Limits of Hydrocarbons and Their Derivatives

School of Safety Science and Engineering, Changzhou University, Changzhou 213164, China
*
Author to whom correspondence should be addressed.
This article belongs to the Section AI-Enabled Process Engineering

Abstract

The scarcity of reliably determined short-term exposure limits (STELs) for many chemicals severely impedes occupational health risk assessment. To address this gap, this study establishes and validates a suite of quantitative structure–activity relationship (QSAR) models for predicting the STELs of hydrocarbons and their derivatives. A dataset of 60 compounds was partitioned using Affinity Propagation clustering, and the validity of this division was verified using Tanimoto similarity analysis and Uniform Manifold Approximation and Projection (UMAP). Four molecular descriptors, indicative of molecular size and spatial configuration, were selected using a genetic algorithm. These descriptors served as inputs for one linear model, multiple linear regression (MLR), and three nonlinear models: support vector machine (SVM), back-propagation artificial neural network (BP-ANN), and extreme gradient boosting (XGBoost). All models were validated according to OECD principles. The XGBoost model performed best, with the key metrics \(R^{2}\), \(Q^{2}_{\text{loo}}\), and \(Q^{2}_{\text{ext}}\) all exceeding 0.9. Interpretability analysis using SHAP (SHapley Additive exPlanations) revealed that the molecular size and symmetry descriptors (E3u, G2m) correlate positively with STEL, whereas the degree of unsaturation (n = CHR) has a significant negative influence, providing mechanistic insight into the structure–toxicity relationship. Notably, 96% of the predictions fell within the defined applicability domain, supporting the model's reliability. The resulting models therefore offer a rapid, accurate, and interpretable computational tool that can inform occupational health and safety decision-making, especially for novel or data-poor chemicals.
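The validation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a single descriptor and ordinary least squares in place of the four-descriptor models, the helper names (`fit_line`, `r2`, `q2_loo`, `q2_ext`) are hypothetical, the numbers are toy values rather than STEL data, and the external metric follows one common convention (residuals on the test set scaled by deviations from the training-set mean), which may differ from the definition used in the paper.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares on one descriptor; returns a predict function."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def r2(xs, ys, fit=fit_line):
    """Goodness of fit: the model is fitted and scored on the same data."""
    predict = fit(xs, ys)
    y_bar = mean(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def q2_loo(xs, ys, fit=fit_line):
    """Leave-one-out Q^2: each compound is predicted by a model refit without it."""
    y_bar = mean(ys)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - predict(xs[i])) ** 2
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - press / ss_tot


def q2_ext(train_x, train_y, test_x, test_y, fit=fit_line):
    """External Q^2 (assumed convention): test residuals vs. training-set mean."""
    predict = fit(train_x, train_y)
    y_train_bar = mean(train_y)
    press = sum((y - predict(x)) ** 2 for x, y in zip(test_x, test_y))
    ss_ext = sum((y - y_train_bar) ** 2 for y in test_y)
    return 1 - press / ss_ext


if __name__ == "__main__":
    # Toy descriptor values and responses, purely illustrative.
    train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
    train_y = [2.1, 3.9, 6.2, 7.8, 10.1]
    test_x = [1.5, 3.5, 4.5]
    test_y = [3.0, 7.1, 9.0]
    print("R2     =", round(r2(train_x, train_y), 4))
    print("Q2_loo =", round(q2_loo(train_x, train_y), 4))
    print("Q2_ext =", round(q2_ext(train_x, train_y, test_x, test_y), 4))
```

Because `q2_loo` and `q2_ext` accept any `fit` callable that returns a predictor, the same functions could in principle wrap a nonlinear model. For least-squares fits with an intercept, the leave-one-out value never exceeds the fitted \(R^{2}\), which is why the two are usually reported together.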
