This study presents a novel application of a hybrid regression–kriging (RK) and machine learning (ML) framework to impute missing tropospheric NO₂ data from the TROPOMI satellite over Taiwan during the winter months of January, February, and December 2022. The proposed approach combines geostatistical interpolation with nonlinear modeling by integrating RK with ML models, specifically comparing gradient boosting regression (GBR), random forest (RF), and K-nearest neighbors (KNN), to determine the most suitable auxiliary predictor. This structure enables the framework to capture both spatial autocorrelation and complex relationships between NO₂ concentrations and environmental drivers. Model performance was evaluated using the coefficient of determination (r²), computed against observed TROPOMI NO₂ column values filtered by quality assurance criteria. GBR achieved the highest validation r² values of 0.83 for January and February, while RF yielded 0.82 and 0.79 in January and December, respectively. These results demonstrate the model’s robustness in capturing intra-seasonal patterns and nonlinear trends in NO₂ distribution. In contrast, models using only static land cover inputs performed poorly (r² < 0.58), emphasizing the limited predictive capacity of such variables in isolation. Interpretability analysis using the SHapley Additive exPlanations (SHAP) method revealed temperature as the most influential meteorological driver of NO₂ variation, particularly during winter, while forest cover consistently emerged as a key land-use factor mitigating NO₂ levels through dry deposition. By integrating dynamic meteorological variables and static land cover features, the hybrid RK–ML framework enhances the spatial and temporal completeness of satellite-derived air quality datasets. As the first RK–ML application for TROPOMI data in Taiwan, this study establishes a regional benchmark and offers a transferable methodology for satellite data imputation. Future research should explore ensemble-based RK variants, incorporate real-time auxiliary data, and assess transferability across diverse geographic and climatological contexts.
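
The RK–ML idea summarized above (an ML trend model on auxiliary predictors, plus kriging of its residuals) can be illustrated with a minimal sketch. This is not the authors' implementation: the KNN trend is hand-written in NumPy, the exponential variogram parameters (`sill`, `corr_range`) are assumed rather than fitted, the data are synthetic, the predictor names are hypothetical, and r² is taken here as 1 − SS_res/SS_tot, which may differ from the definition used in the study.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Trend model: mean of the k nearest training targets in predictor space."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

def exp_variogram(h, sill, corr_range):
    """Exponential semivariogram without nugget (parameters assumed, not fitted)."""
    return sill * (1.0 - np.exp(-3.0 * h / corr_range))

def ordinary_kriging(coords, values, query, sill=1.0, corr_range=0.3):
    """Ordinary kriging: solve [[Gamma, 1], [1^T, 0]] [w; mu] = [gamma0; 1]."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, sill, corr_range)
    A[n, n] = 0.0
    d0 = np.linalg.norm(coords[:, None, :] - query[None, :, :], axis=2)
    b = np.ones((n + 1, len(query)))
    b[:n, :] = exp_variogram(d0, sill, corr_range)
    w = np.linalg.solve(A, b)      # one column of weights per query point
    return w[:n].T @ values        # drop the Lagrange multiplier row

def regression_kriging(coords, X, y, coords_q, X_q, k=5):
    """Prediction = ML trend at the query + kriged training residuals."""
    resid = y - knn_predict(X, y, X, k)
    return knn_predict(X, y, X_q, k) + ordinary_kriging(coords, resid, coords_q)

def r2_score(obs, pred):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data: NOT TROPOMI observations.
gen = np.random.default_rng(0)
n = 80
coords = gen.uniform(0.0, 1.0, size=(n, 2))    # pixel locations
temperature = gen.normal(size=n)               # hypothetical dynamic predictor
forest = gen.uniform(0.0, 1.0, size=n)         # hypothetical static predictor
X = np.column_stack([temperature, forest])
y = (2.0 - 0.8 * temperature - 1.5 * forest
     + np.sin(6.0 * coords[:, 0]) + 0.1 * gen.normal(size=n))

train = np.arange(n) < 60                      # pixels treated as observed
test = ~train                                  # pixels treated as missing
pred = regression_kriging(coords[train], X[train], y[train],
                          coords[test], X[test])
score = r2_score(y[test], pred)
print(f"held-out r2 on synthetic data: {score:.3f}")
```

Because the variogram has no nugget, the kriged residuals reproduce the training residuals exactly, so the combined prediction matches the observed value at every training location and only the gap pixels are genuinely estimated. Swapping `knn_predict` for a gradient boosting or random forest regressor would leave the residual-kriging step unchanged.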