High concentrations of tropospheric ozone (O₃) in urban areas pose a significant risk to human health. This study proposes an evaluation framework based on the XGBoost algorithm to predict O₃ concentration, assessing the model’s capacity for seasonal extrapolation and spatial transferability. The experiment uses hourly air pollution data (O₃, NO, NO₂, and NOx) and meteorological factors (temperature, relative humidity, barometric pressure, wind speed, and wind direction) from six monitoring stations in the Monterrey Metropolitan Area, Mexico (from 22 September 2022 to 21 September 2023). In the preprocessing phase, the datasets were extended via feature engineering, including cyclic variables, rolling windows, and lag features, to capture temporal dynamics. The prediction models were optimized using a random search, with time-series cross-validation to prevent data leakage. The models were evaluated across a concentration range of 0.001 to 0.122 ppm, demonstrating high predictive accuracy, with a coefficient of determination (R²) of up to 0.96 and a root-mean-square error (RMSE) of 0.0034 ppm when predicting summer O₃ concentrations without prior knowledge. Spatial generalization was robust in residential areas (R² > 0.90), but performance decreased in the industrial corridor (AQMS-NL03). We identified that this decrease is related to local complexity through the quantification of domain shift (Kolmogorov–Smirnov test) and Shapley additive explanations (SHAP) diagnostics, since the model effectively learns atmospheric inertia in stable areas but struggles with the stochastic effects of NOx titration driven by industrial emissions. These findings position the proposed approach as a reliable tool for “virtual detection” while highlighting the crucial role of environmental topology in model implementation.
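
The preprocessing and validation ideas named in the abstract (cyclic variables, lag features, rolling windows, and time-ordered cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the function names, window sizes, and the six-value ozone series are hypothetical, and only the Python standard library is used. Libraries such as pandas and scikit-learn offer equivalent tools.

```python
import math


def cyclic_encode(value, period):
    """Map a periodic quantity (e.g. hour of day, wind direction) onto the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def lag_feature(series, lag):
    """Value observed `lag` steps earlier; None where no history exists."""
    return [series[i - lag] if i >= lag else None for i in range(len(series))]


def rolling_mean_past(series, window):
    """Mean of the `window` values strictly before each index.

    The current value is excluded so the target never leaks into its own feature.
    """
    out = []
    for i in range(len(series)):
        if i < window:
            out.append(None)
        else:
            out.append(sum(series[i - window:i]) / window)
    return out


def expanding_time_splits(n_samples, n_splits):
    """Yield (train, test) index lists where each test fold follows its training data."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = fold * k
        # The last fold absorbs any leftover samples at the end of the series.
        test_end = train_end + fold if k < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))


# Hypothetical hourly ozone readings in ppm (illustrative values only).
o3 = [0.020, 0.030, 0.050, 0.040, 0.030, 0.020]

hour_sin, hour_cos = cyclic_encode(6, 24)   # 06:00 on a 24-hour cycle
o3_lag1 = lag_feature(o3, 1)
o3_roll2 = rolling_mean_past(o3, 2)
splits = list(expanding_time_splits(len(o3), 2))
```

The cyclic encoding places hour 23 next to hour 0, which a raw integer hour would not. Both the lag and rolling features look only backward in time, and every test fold lies strictly after its training indices, which is the property that guards against leakage in time-series validation.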