1. Introduction
1.1. Urban Air Quality Monitoring and Sustainability Challenges
Air pollution is one of the most significant environmental health risks worldwide [1]. According to the World Health Organization (WHO), 99% of the global population lives in areas where air pollution levels exceed recommended guideline limits, and only 17% of cities worldwide complied with these guidelines [2]. This situation poses a major challenge for urban sustainability, particularly in megacities characterized by high population density, complex topography, and limited atmospheric ventilation.
In Mexico, air quality remains a critical concern, driven by high emissions, adverse topographic conditions, and limited atmospheric dispersion [3]. Mexico City, in particular, exhibits pronounced spatial heterogeneity in pollutant concentrations, with several zones presenting moderate-to-high risk levels for sensitive populations. This variability underscores the need for continuous and spatially resolved air quality monitoring. Despite more than three decades of sustained regulatory and environmental management efforts, and despite improvements in some indicators, pollutants such as ozone and particulate matter continue to exceed air quality standards during several periods of the year [4].
Air quality monitoring in megacities plays a fundamental role in protecting public health, mitigating climate change impacts, and supporting evidence-based urban planning and policy-making [5]. Moreover, although difficult to quantify, previous studies suggest that effective air quality management can generate substantial long-term economic benefits by reducing health-related costs and productivity losses [6].
However, conventional air quality monitoring networks face important limitations related to cost, spatial coverage, and their ability to capture fine-scale spatial–temporal variability. Factors such as traffic intensity, industrial activity, and population density can lead to significant concentration differences over very short distances (≈0.5 km) [4,7]. While complementary approaches, such as satellite-based observations and atmospheric models, are increasingly used, they require complex processing and integration to provide reliable information at the urban scale [6]. Similarly, low-cost sensor deployments have demonstrated potential for short-term urban studies [1,8,9], but their application is often constrained by limited deployment periods and challenges related to long-term data stability and sensor maintenance.
In this context, spatial interpolation emerges as a complementary and cost-effective approach to estimate pollutant concentrations in areas without fixed monitoring stations by exploiting short- to medium-range spatial dependence inherent in urban environmental data. Geostatistical methods such as Ordinary Kriging (OK) are particularly attractive because they generate continuous spatial estimates while explicitly quantifying prediction uncertainty, a feature essential for decision-making in complex urban environments characterized by heterogeneous emission sources and variable atmospheric mixing [10].
1.2. Spatial Interpolation in Air Quality Studies
The application of geostatistical methods, particularly kriging, in air quality research can be broadly grouped into three main strands: (1) spatial analysis and classification of polluted areas, (2) methodological developments and comparisons of interpolation techniques, and (3) health risk assessment associated with air pollutant exposure. Together, these studies highlight the versatility of kriging as a spatial analysis tool, while also revealing important limitations regarding its temporal sensitivity and pollutant-specific behavior.
The first group of studies focuses on the spatial characterization and classification of air pollution across urban or regional scales. For example, Hopkins et al. [11] applied geospatial techniques to air quality data collected in India between 2018 and 2023. In this study, the concentrations of PM2.5 (particulate matter < 2.5 µm), PM10 (particulate matter < 10 µm), NO2, and SO2 decreased during the COVID-19 lockdown period, and special emphasis was placed on classifying zones according to dominant pollutants. Similarly, Thakur et al. [12] investigated the spatial distribution of O3 in China using hourly monitoring data from 2020 and kriging interpolation, identifying peak and low concentration periods and showing relationships between O3 levels, rainfall, and co-pollutants such as SO2 and NO2. Long-term spatial analyses have also been conducted in major Chinese cities. Li et al. [13] analyzed air quality trends in Beijing from 2013 to 2019, reporting lower levels of SO2, CO, PM2.5, PM10, and NO2, while ozone exhibited a more variable behavior. Zhang et al. [14] focused on PM2.5 in Beijing, finding a decreasing trend over time influenced by humidity and wind speed, whereas the study in [15] examined multiple pollutants in Hangzhou and identified strong relationships between air pollution, land-use patterns, seasonal variability, and spatial distribution. At a broader scale, Denby et al. [16] mapped spatial trends in air quality across Europe using kriging combined with multiple linear regression. In the Mexican context, Ramirez et al. [17] described the spatial and temporal distribution of CO, NO2, SO2, PM10, and O3 in the city of Guadalajara from 2000 to 2005 using ordinary kriging, while the study in [18] evaluated whether monitoring networks were adequately representative and sufficient in the Rhön region of Germany through data analysis and geostatistical methods.
A second body of literature focuses on methodological developments and comparative evaluations of air pollution interpolation techniques. Dai et al. [19] proposed a hybrid modeling framework that integrates vector autoregression (VAR), ordinary kriging, and XGBoost to predict ozone (O3) concentrations across multiple seasons in China, demonstrating superior performance relative to traditional machine learning approaches such as decision trees. Dharmalingam et al. [20] conducted a comprehensive comparison of nine prediction methods applied to eight air pollutants, including Site Average, Inverse Distance Weighting, Kriging, and Random Forest. Their findings indicate that predictive efficiency varies substantially depending on pollutant type, highlighting the importance of pollutant-specific methodological selection.
Earlier comparative work by [21] evaluated different air quality prediction models used by the U.S. Environmental Protection Agency and showed that model performance varies substantially by pollutant. Beelen et al. [22] compared ordinary kriging, universal kriging, and regression-based approaches for developing air pollution maps, concluding that the relative efficiency of these methods depends not only on pollutant type but also on key characteristics of the study area, such as urban density, topography, emission source distribution, meteorological conditions, and monitoring network coverage. While these studies contribute to methodological advancement, they generally emphasize predictive accuracy rather than systematically assessing how kriging performance changes across different temporal conditions (e.g., hourly and monthly variability).
The third group of studies employs kriging as a supporting tool in health risk assessment. Xue et al. [23] predicted the occurrence of fibrotic interstitial lung diseases using pollutants such as CO, O3, SO2, and NO2, applying different methods including nearest-neighbor weighting and kriging, and finding that SO2 and NO2 were better estimated using kriging. Farhi et al. [24] analyzed cancer risk in newborns associated with prenatal exposure to SO2, PM10, NOx, and O3 using GIS (Geographic Information System) and kriging, concluding that no direct relationship was detected. Malek et al. [25] examined the association between exposure to gaseous pollutants (NOx, NO2, SO2, O3) and particulate matter (PM2.5 and PM10) with neurodegenerative disorders in menopausal women, finding that long-term risks were mainly associated with particulate matter. Additional studies have reported increased risks of chronic inflammatory airway diseases associated with NO2 and SO2 exposure [26], potential associations between O3 and SO2 exposure and type 2 diabetes [27], and mixed or inconclusive results regarding tuberculosis [28]. Other research has highlighted negative impacts of PM10 on children’s lung function [29], associations between multiple pollutants and chronic obstructive pulmonary disease [30], and increased risks of hypertension, hypercholesterolemia, and diabetes under long-term exposure conditions [31]. Exposure to air pollutants has also been associated with kidney failure [32]. Cohort studies have further linked increased PM10 and NOx concentrations during pregnancy with congenital malformations [24].
Although these studies demonstrate the widespread use of kriging as a spatial interpolation tool, it is typically employed as part of a broader analytical framework to characterize polluted areas or estimate exposure for health assessments. Across the three strands of literature, there is limited evaluation of the temporal sensitivity of kriging performance itself, particularly with respect to hourly and seasonal variability. Moreover, explicit comparisons between primary pollutants, which are directly emitted from local sources such as vehicles and industrial activities (e.g., SO2), and secondary pollutants, which are formed through atmospheric chemical reactions and tend to exhibit broader regional patterns (e.g., O3), remain scarce under a unified geostatistical validation framework. This gap is especially evident in the Mexican context, where relatively few studies have applied kriging to interpolate urban air pollution, underscoring the need for systematic analyses that account for both temporal dynamics and pollutant-specific spatial behavior.
1.3. Research Gap and Contributions
Although geostatistical interpolation has been widely applied in urban air quality studies, most existing research relies on fixed temporal snapshots or temporally aggregated data, with limited attention to how predictive performance varies across different hours of the day and months of the year. Moreover, many studies focus on a single pollutant or analyze multiple pollutants without explicitly accounting for their distinct physical behavior, particularly the contrast between primary pollutants dominated by local emissions and secondary pollutants formed through atmospheric chemical processes. As a result, the temporal sensitivity of kriging performance and its dependence on pollutant type remain insufficiently explored, especially in the context of megacities in Latin America. To address these gaps, this study evaluates the temporal sensitivity of Ordinary Kriging (OK) for urban air quality mapping in the Mexico City Metropolitan Area. Kriging performance is assessed across representative hourly blocks (09:00, 15:00, and 21:00) and seasonal periods (February, April, May, and August), explicitly comparing a secondary pollutant (O3) and a primary pollutant (SO2) under a unified geostatistical validation framework. The analysis combines leave-one-out cross-validation and external hold-out validation to quantify prediction accuracy, bias, and uncertainty.
Accordingly, this study addresses the following research question: How do hourly and monthly temporal scales influence the performance and reliability of Ordinary Kriging for primary and secondary air pollutants in a megacity? By systematically identifying when and for which pollutants kriging provides more robust spatial estimates, this work offers practical insights for the sustainable and physically consistent use of spatial interpolation as a complement to urban air quality monitoring networks.
3. Data Preprocessing
3.1. Data Cleaning and Quality Control
Hourly air quality data for ozone (O3) and sulfur dioxide (SO2) from 2024 were obtained from SIMAT [40]. The dataset includes station identifiers, geographic coordinates (latitude and longitude), hourly averaged pollutant concentrations, date, and time.
A systematic data-cleaning and quality-control process was applied before geostatistical modeling. First, station coordinates were reviewed and converted to numeric format, correcting textual inconsistencies and decimal separators. Records with missing, inconsistent, or physically implausible concentration values were removed. All variables were homogenized to ensure a consistent format across stations and time periods. Stations with insufficient data coverage during the selected periods were excluded from the analysis.
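The cleaning steps described above can be illustrated with a minimal sketch. It is not the authors' code: the record layout, the field names (`station`, `lat`, `lon`, `value`), the coordinates, and the plausibility ceiling `max_ppb` are all hypothetical and only show the kind of checks involved (decimal-comma repair, numeric conversion, removal of missing or implausible values).

```python
def parse_coordinate(raw):
    """Convert a coordinate stored as text to float; accept decimal commas."""
    text = str(raw).strip().replace(",", ".")
    try:
        return float(text)
    except ValueError:
        return None  # textual inconsistency that cannot be repaired


def clean_records(records, max_ppb):
    """Keep records with valid coordinates and a plausible concentration."""
    cleaned = []
    for rec in records:
        lat = parse_coordinate(rec["lat"])
        lon = parse_coordinate(rec["lon"])
        value = rec["value"]
        if lat is None or lon is None or value is None:
            continue  # missing or unparseable field
        if not (0.0 <= value <= max_ppb):
            continue  # negative sentinel codes, NaN, or implausible spikes
        cleaned.append({**rec, "lat": lat, "lon": lon})
    return cleaned


# Hypothetical records (coordinates and values are illustrative only).
records = [
    {"station": "UIZ", "lat": "19,3600", "lon": "-99.0700", "value": 85.0},
    {"station": "CUA", "lat": "19.3650", "lon": " -99.2900 ", "value": -999.0},
    {"station": "CAM", "lat": "n/a", "lon": "-99.1800", "value": 40.0},
    {"station": "CHO", "lat": "19.2700", "lon": "-98.8900", "value": None},
]
cleaned = clean_records(records, max_ppb=500.0)  # max_ppb is an assumed ceiling
```

In this toy input only the first record survives: the second carries a negative sentinel value, the third has an unparseable latitude, and the fourth has no concentration.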
Given the relatively compact spatial extent of the monitoring network (maximum inter-station separation ≈ 56 km), geographic coordinates were transformed into local Cartesian coordinates (x, y) expressed in kilometers to allow the use of Euclidean distances in variogram modeling. The conversion was performed using local approximations of 1° latitude ≈ 110.6 km and 1° longitude ≈ 111.3 km × cos (mean latitude). To ensure temporal stationarity for geostatistical analysis, representative months and hours were selected based on descriptive statistics of pollutant concentrations. For O3, May and 15:00 were identified as the peak month and hour, corresponding to conditions of maximum photochemical activity. For SO2, February and morning hours were identified as representative peak periods. The same preprocessing procedure was applied to both pollutants.
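As an illustration of the coordinate transformation, the sketch below applies the two approximations quoted above (1° latitude ≈ 110.6 km, 1° longitude ≈ 111.3 km × cos(mean latitude)). Taking the network centroid as the origin is an assumption made here for convenience, and the station coordinates are hypothetical.

```python
import math

KM_PER_DEG_LAT = 110.6      # approximation quoted in the text
KM_PER_DEG_LON_EQ = 111.3   # equatorial value, scaled by cos(mean latitude)


def to_local_km(lats, lons):
    """Convert latitude/longitude in degrees to local (x, y) offsets in km."""
    lat0 = sum(lats) / len(lats)  # assumed origin: centroid of the stations
    lon0 = sum(lons) / len(lons)
    km_per_deg_lon = KM_PER_DEG_LON_EQ * math.cos(math.radians(lat0))
    xs = [(lon - lon0) * km_per_deg_lon for lon in lons]
    ys = [(lat - lat0) * KM_PER_DEG_LAT for lat in lats]
    return xs, ys


# Hypothetical station coordinates, roughly at Mexico City latitudes.
lats = [19.30, 19.40, 19.50]
lons = [-99.20, -99.10, -99.00]
xs, ys = to_local_km(lats, lons)
```

Because only differences between stations enter the variogram, the choice of origin does not affect the Euclidean distances used later.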
All data preprocessing, statistical analyses, and geostatistical modeling were performed using Python 3.10 in Google Colab (cloud-based Jupyter Notebook environment). The main libraries employed included NumPy, Pandas, SciPy, Matplotlib, and PyKrige for Ordinary Kriging implementation and variogram modeling.
Google Colab provides a Linux-based cloud execution environment with access to virtual CPUs and high-memory configurations, ensuring computational reproducibility and stability. No commercial or proprietary geostatistical software was used.
Code development was carried out on a 16-inch MacBook Pro (Apple Silicon M-series processor, 16 GB RAM, macOS Sonoma); however, all computational processing was performed in the Google Colab cloud environment rather than locally.
3.2. Geostatistical Modeling Using Ordinary Kriging
Spatial interpolation was performed using OK, a geostatistical method widely applied in environmental and air quality studies to estimate pollutant concentrations at unsampled locations based on spatial autocorrelation among observations [10,22]. Kriging has a long-standing theoretical foundation and has been extensively developed and applied over the past five decades, becoming a cornerstone of spatial statistics in the geosciences (Chilès & Desassis [41]). OK assumes an unknown but locally constant mean within the neighborhood of estimation, making it appropriate for environmental variables when no explicit deterministic spatial trend is specified.
Under this framework, the pollutant concentration at an unsampled location $s_0$ is estimated as a weighted linear combination of observed values at surrounding monitoring stations:
$$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i)$$
where $Z(s_i)$ denotes the observed concentration at the location $s_i$, $\lambda_i$ are the kriging weights [41], and $n$ is the number of neighboring observations used in the estimation. The kriging weights are obtained by solving a system of linear equations derived using Lagrange multipliers, ensuring unbiased predictions while minimizing estimation variance, subject to the spatial autocorrelation structure described by the fitted variogram model.
Spatial dependence among observations was characterized using the experimental semivariogram, which describes how variance between observation pairs changes as a function of separation distance. The experimental semivariogram was computed as [42]:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ Z(s_i) - Z(s_i + h) \right]^2$$
where $\gamma(h)$ represents the semivariance at lag distance $h$, and $N(h)$ is the number of data pairs separated by that distance. To obtain a continuous representation of spatial dependence suitable for kriging interpolation, the experimental semivariogram was fitted using three commonly adopted theoretical models in air quality applications: spherical, exponential, and Gaussian. Although recent methodological advances have proposed regularized and hybrid kriging formulations to improve estimation under complex conditions [43], this study adopts classical variogram models to maintain interpretability and comparability with previous air pollution studies.
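To make the semivariogram step concrete, the following minimal sketch bins station pairs by separation distance and evaluates the three theoretical models. It is an illustration, not the PyKrige implementation used in the study: the binning scheme, the "practical range" parametrization (factor 3), the use of `sill` as the total sill, and the toy data are all assumptions, and PyKrige parametrizes its models slightly differently.

```python
import math
from itertools import combinations


def experimental_semivariogram(xs, ys, zs, lag_width, n_lags):
    """Half the mean squared difference of station pairs, per distance bin."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(zs)), 2):
        h = math.hypot(xs[i] - xs[j], ys[i] - ys[j])
        k = int(h // lag_width)  # index of the distance bin
        if k < n_lags:
            sums[k] += (zs[i] - zs[j]) ** 2
            counts[k] += 1
    lags = [(k + 0.5) * lag_width for k in range(n_lags)]  # bin centers
    gammas = [s / (2 * c) if c else float("nan") for s, c in zip(sums, counts)]
    return lags, gammas, counts


# Theoretical models; each returns the nugget as the limit at h = 0.
def spherical(h, nugget, sill, rng):
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, sill, rng):
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))


def gaussian(h, nugget, sill, rng):
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * (h / rng) ** 2))


# Toy data: four hypothetical stations on a line (distances in km).
xs, ys, zs = [0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 4.0, 7.0]
lags, gammas, counts = experimental_semivariogram(xs, ys, zs, lag_width=1.5, n_lags=2)
```

The pair counts returned alongside the semivariances are what a weighted fit would use to give more influence to well-populated bins.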
The parameters of the theoretical variogram models (nugget, sill, and range) were estimated using weighted least squares (WLS) fitting to the experimental semivariogram. The fitting procedure minimized the squared difference between experimental and modeled semivariance values across lag distances, assigning greater weight to bins with more point pairs. Automatic fitting routines implemented in the PyKrige library were used to ensure objective, reproducible, and algorithmically consistent parameter estimation. No manual or purely visual fitting was applied.
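The weighted least squares criterion described above can be sketched as follows. This is a simplified stand-in, not PyKrige's optimizer: it uses an exhaustive grid search over candidate parameters, weights each lag bin by its pair count, and the exponential model form, the grid values, and the synthetic semivariances are assumptions chosen for illustration.

```python
import math
from itertools import product


def exponential(h, nugget, sill, rng):
    """Exponential variogram model (assumed practical-range form)."""
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))


def wls_loss(params, lags, gammas, counts, model):
    """Squared residuals weighted by the number of pairs in each lag bin."""
    return sum(n * (g - model(h, *params)) ** 2
               for h, g, n in zip(lags, gammas, counts))


def fit_by_grid(lags, gammas, counts, model, nuggets, sills, ranges):
    """Return the (nugget, sill, range) triple with the smallest WLS loss."""
    return min(product(nuggets, sills, ranges),
               key=lambda p: wls_loss(p, lags, gammas, counts, model))


# Synthetic experimental semivariogram generated from known parameters.
true_params = (0.0, 2.0, 10.0)
lags = [2.0, 6.0, 10.0, 14.0, 18.0]
counts = [12, 30, 41, 35, 20]  # hypothetical pair counts per bin
gammas = [exponential(h, *true_params) for h in lags]

best = fit_by_grid(lags, gammas, counts, exponential,
                   nuggets=[0.0, 0.5], sills=[1.0, 2.0, 3.0],
                   ranges=[5.0, 10.0, 20.0])
```

A continuous optimizer would normally replace the grid search; the point here is only how the pair counts enter the objective.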
In addition to point estimates, OK provides a quantitative measure of prediction uncertainty through the kriging variance, which reflects the reliability of spatial estimates given the spatial configuration of monitoring stations:
$$\sigma_{OK}^2(s_0) = \sum_{i=1}^{n} \lambda_i \gamma(s_i, s_0) + \mu$$
where $\sigma_{OK}^2(s_0)$ denotes the kriging variance at location $s_0$, $\gamma(s_i, s_0)$ is the modeled semivariance between observation and prediction locations, and $\mu$ is the Lagrange multiplier enforcing the unbiasedness constraint. The weights $\lambda_i$ and the multiplier $\mu$ are obtained from the kriging system described above.
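For readers who want to see how the weights, the Lagrange multiplier, and the kriging variance arise from a single linear system, a minimal NumPy sketch follows. It is a hand-written illustration rather than the PyKrige routine used in the study; the exponential variogram form, its parameter values, and the station layout are hypothetical.

```python
import numpy as np


def exponential_variogram(h, nugget=1.0, sill=41.0, rng_km=20.0):
    """Assumed exponential model; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    model = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / rng_km))
    return np.where(h > 0.0, model, 0.0)


def ordinary_kriging_point(xy, z, xy0, variogram):
    """Solve the OK system for one target; return estimate, variance, weights."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)      # semivariances between stations
    A[n, n] = 0.0                 # corner of the unbiasedness constraint
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(xy - xy0, axis=1))  # station to target
    solution = np.linalg.solve(A, b)
    weights, mu = solution[:n], solution[n]
    estimate = float(weights @ z)
    variance = float(weights @ b[:n] + mu)  # sum(lambda_i * gamma_i0) + mu
    return estimate, variance, weights


# Hypothetical stations (km) and concentrations (ppb).
xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [5.0, 5.0]])
z = np.array([60.0, 55.0, 70.0, 65.0, 62.0])
estimate, variance, weights = ordinary_kriging_point(
    xy, z, np.array([3.0, 4.0]), exponential_variogram)
```

The last row and column of the matrix encode the constraint that the weights sum to one, which is what makes the estimator unbiased under a constant unknown mean.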
To ensure temporal stationarity in the geostatistical modeling, kriging was applied to datasets corresponding to specific combinations of month and hour, as defined in Section 3.1. For O3, spatial interpolation focused on May at 15:00, representing conditions of maximum photochemical activity and peak concentrations. For SO2, representative peak conditions were selected in February and during the morning hours. This temporal stratification enables consistent comparison of spatial patterns across different atmospheric regimes while avoiding mixing heterogeneous temporal processes. The same geostatistical framework was applied to both pollutants, allowing a direct comparison of kriging performance for a secondary pollutant dominated by photochemical processes (O3) and a primary pollutant primarily driven by local emissions and dispersion conditions (SO2).
3.3. Model Validation and Performance Assessment
The performance of the geostatistical models was evaluated using a combination of internal and external validation strategies [44], thereby enabling a robust assessment of prediction accuracy and uncertainty under varying temporal conditions. Two complementary approaches were adopted: leave-one-out cross-validation (LOOCV) and hold-out validation using independent monitoring stations.
LOOCV was employed as an internal validation method to assess model robustness using the full dataset. In this procedure, each monitoring station was sequentially removed from the dataset, and its pollutant concentration was estimated using OK based on the remaining stations [44,45]. The predicted values were then compared with the observed concentrations at the omitted locations. This approach allows evaluation of model sensitivity to individual observations while preserving the spatial configuration of the monitoring network. For each variogram model (spherical, exponential, and Gaussian), prediction errors were quantified using three standard performance metrics: root-mean-square error (RMSE), mean absolute error (MAE), and mean bias error (bias). RMSE was used to emphasize larger deviations, MAE to provide a robust measure of average error magnitude, and bias to identify systematic over- or underestimation by the model.
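The LOOCV loop and the three metrics can be sketched as below. To keep the example short and dependency-free, the predictor is an inverse-distance-weighting placeholder, not Ordinary Kriging; any function with the same signature (including a kriging predictor) could be passed instead. The sign convention for bias (predicted minus observed, so negative values indicate underestimation) and the toy data are assumptions.

```python
import math


def error_metrics(observed, predicted):
    """Return (RMSE, MAE, bias) with bias defined as mean(predicted - observed)."""
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    bias = sum(errors) / n
    return rmse, mae, bias


def loocv(xs, ys, zs, predict):
    """Predict each station from all the others, one at a time."""
    predictions = []
    for i in range(len(zs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        train_z = zs[:i] + zs[i + 1:]
        predictions.append(predict(train_x, train_y, train_z, xs[i], ys[i]))
    return predictions


def idw_predict(xs, ys, zs, x0, y0, power=2.0):
    """Placeholder predictor (inverse distance weighting), NOT kriging."""
    num = den = 0.0
    for x, y, z in zip(xs, ys, zs):
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return z
        w = d ** -power
        num += w * z
        den += w
    return num / den


# Hypothetical stations (km) and concentrations (ppb).
xs = [0.0, 10.0, 0.0, 10.0, 5.0]
ys = [0.0, 0.0, 10.0, 10.0, 5.0]
zs = [60.0, 55.0, 70.0, 65.0, 62.0]
predictions = loocv(xs, ys, zs, idw_predict)
rmse, mae, bias = error_metrics(zs, predictions)
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE; a large gap between the two points to a few stations with large errors.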
To complement LOOCV and evaluate the models’ predictive capability at truly unobserved locations, an external hold-out validation was conducted. A subset of monitoring stations was randomly selected and excluded from the model calibration process, while the remaining stations were used to fit the variogram and perform kriging interpolation [42,44]. Pollutant concentrations at the hold-out stations were then predicted and compared against observed values. This validation strategy is particularly relevant for assessing the applicability of spatial interpolation in areas with limited monitoring coverage, as it simulates the practical scenario of estimating pollutant concentrations at locations where no measurements are available. The same performance metrics (RMSE, MAE, and bias) were computed to ensure consistency with the LOOCV evaluation.
Approximately 20% of monitoring stations were randomly selected using a fixed random seed to ensure reproducibility of the validation procedure. A single hold-out split was applied per temporal scenario (month–hour combination) to maintain methodological consistency and comparability across pollutants and time periods.
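A reproducible hold-out split of this kind can be sketched in a few lines. The seed value, the station identifiers, and the network size below are hypothetical (the text does not report the seed), so the stations selected here will not match those used in the study.

```python
import random


def holdout_split(station_ids, fraction=0.2, seed=42):
    """Randomly reserve about `fraction` of stations using a fixed seed."""
    rng = random.Random(seed)  # local generator, so the split is repeatable
    n_hold = max(1, round(fraction * len(station_ids)))
    held_out = set(rng.sample(sorted(station_ids), n_hold))
    train = [s for s in station_ids if s not in held_out]
    test = [s for s in station_ids if s in held_out]
    return train, test


# Hypothetical network of 15 stations; 20% corresponds to 3 hold-out stations.
station_ids = [f"S{i:02d}" for i in range(1, 16)]
train_ids, test_ids = holdout_split(station_ids)
```

Sorting the identifiers before sampling makes the result independent of the order in which stations happen to be listed in the input.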
Model performance was systematically compared across the three theoretical variogram models considered in this study. The selection of the most appropriate variogram model for each pollutant and temporal scenario was based on a combined evaluation of predictive accuracy (RMSE and MAE), bias minimization, and the spatial plausibility of the resulting interpolated surfaces. Rather than relying on a single metric, this multi-criteria evaluation allowed identification of models that balance accuracy and stability across different validation approaches.
In addition to accuracy metrics, kriging variance was analyzed to assess spatial patterns of prediction uncertainty. Areas with higher kriging variance were interpreted as locations where estimates are less reliable due to sparse monitoring coverage or increased spatial variability. This uncertainty information complements point predictions and provides valuable insights for identifying regions where additional monitoring could improve air quality assessments.
The validation framework was applied consistently across different temporal scenarios, including selected hours and months representative of peak and non-peak conditions for O3 and SO2. This approach enables a direct comparison of model performance across temporal regimes and supports the evaluation of how spatial interpolation accuracy varies with atmospheric conditions and pollutant type. All validation procedures were implemented within a fully reproducible computational workflow, ensuring consistent parameter estimation, station selection, and performance metric calculation across all temporal scenarios.
The methodological flowchart of the Ordinary Kriging framework applied in this study is summarized in Figure 3. The flowchart illustrates the sequential process from data preprocessing and temporal stratification to geostatistical modeling, validation, and uncertainty assessment, ensuring transparency and reproducibility of the proposed framework.
4. Results
4.1. Descriptive Temporal Patterns of O3 and SO2
Descriptive statistics for ozone (O3) concentrations in 2024 are presented in Table 1 and Table 2. Table 1 summarizes monthly statistics, indicating that May exhibits the highest mean O3 concentrations, while February shows the largest variability, as reflected by the standard deviation. Regarding diurnal behavior, Table 2 shows that ozone (O3) concentrations increase during the late morning and reach their highest levels in the afternoon, with peak mean values typically occurring between 14:00 and 18:00. At the station level, May is the most frequently identified peak month across the monitoring network. Figure 4 illustrates the hourly variation of ozone concentrations at the UIZ station as a representative example.
For SO2, February exhibits the highest mean concentrations across the monitoring network, with greater variability observed during the winter months (January to March). As shown in Figure 4, the morning hours between 09:00 and 11:00 consistently exhibit the highest SO2 levels, reflecting the combined influence of primary emissions and reduced atmospheric mixing under stable boundary-layer conditions. Across all stations in the network, February is the most frequent peak month for SO2 (Table 3), although localized deviations in peak timing and magnitude were observed. To illustrate the typical diurnal pattern, Table 4 and Figure 5 present the hourly variation of SO2 concentrations at the CUA station as a representative example; however, the reported temporal trends are derived from the full set of monitoring stations.
Descriptive analysis revealed clear and contrasting temporal patterns for both pollutants. O3 concentrations exhibited strong diurnal variability, with consistent afternoon peaks driven by photochemical activity, whereas SO2 showed morning maxima associated with primary emissions and limited atmospheric mixing. Across the monitoring network, May was identified as the peak month for O3, with maximum concentrations occurring between 14:00 and 18:00, while February and morning hours (09:00–11:00) were predominant for SO2. Although peak periods were generally consistent across stations, some local deviations were observed, reflecting site-specific emission and dispersion conditions. Interquartile ranges (p25–p75) further confirm higher temporal variability in O3 during the warm season and in SO2 during the morning hours. These temporal patterns were used to define representative scenarios for subsequent geostatistical modeling.
4.2. Variogram Analysis and Model Selection
Variogram-based model selection was conducted using OK for the representative temporal scenarios identified in the descriptive analysis. Three theoretical variogram models, spherical, exponential, and Gaussian, were evaluated to characterize spatial dependence and assess predictive performance. Model evaluation was performed using leave-one-out cross-validation (LOOCV) and an external hold-out validation, with the root-mean-square error (RMSE), mean absolute error (MAE), and bias as performance metrics.
Table 5 presents the LOOCV and hold-out results for the representative O3 scenario (May, 15:00 h). Under LOOCV, the Gaussian model achieved the lowest prediction errors (RMSE = 14.55 ppb; MAE = 8.82 ppb), outperforming the spherical and exponential models. Although the Gaussian variogram exhibited a moderately negative bias (−2.63 ppb), indicating a slight tendency to underestimate O3 concentrations, its overall predictive accuracy was superior. Consequently, the Gaussian model was selected as the most appropriate variogram for O3 spatial interpolation and uncertainty assessment.
To further assess model robustness, external validation was conducted using three randomly selected monitoring stations (CAM, CUT, and CHO) as a hold-out set (Table 5). While the spherical model yielded marginally lower RMSE and MAE values in the hold-out evaluation, differences among models were small, and all exhibited consistent negative bias. These results indicate moderate sensitivity to the validation approach but confirm the stability of the Gaussian model for representing O3 spatial patterns under peak photochemical conditions.
For comparative purposes, the same validation framework was applied to SO2. For each pollutant, representative scenarios were defined based on descriptive peak conditions to capture the dominant physical and chemical regimes: May at 15:00 h for O3 (photochemical peak) and February at 10:00 h for SO2 (primary emissions under stable morning conditions). The corresponding LOOCV and hold-out results for SO2 are shown in Table 6.
In contrast to O3, SO2 exhibited higher sensitivity to variogram model selection. The exponential model yielded the lowest RMSE and MAE values for both LOOCV and hold-out validation, indicating superior predictive performance. The spherical model showed comparable accuracy, while the Gaussian model performed poorly, with substantially higher errors and instability. These results are consistent with the predominantly primary nature of SO2 emissions, which generate sharper spatial gradients and shorter correlation ranges, better captured by exponential-type variogram structures. Although a small bias was observed for the exponential and spherical models, its magnitude was negligible, indicating stable and reliable performance.
Overall, the results demonstrate that optimal variogram selection depends on both temporal conditions and pollutant characteristics. O3, as a secondary pollutant, is better represented by smoother spatial structures, whereas SO2, as a primary pollutant, exhibits stronger local variability requiring models with shorter-range spatial dependence. These findings highlight the importance of selecting pollutant-specific variograms when applying kriging-based interpolation in urban air quality studies.
4.3. Spatial Interpolation and Uncertainty Assessment
Spatial interpolation was performed using OK to estimate pollutant concentrations across the Mexico City Metropolitan Area for the representative temporal scenarios identified in the descriptive analysis. Interpolations were conducted separately for O3 and SO2, enabling a comparative assessment of spatial patterns of secondary and primary pollutants within a unified methodological framework.
For O3, the interpolated concentration fields exhibited relatively smooth spatial gradients across the basin, with higher concentrations generally observed during the afternoon peak period. This spatial structure is consistent with the regional formation of O3 through photochemical processes and its subsequent transport under prevailing wind conditions. The resulting kriging maps reveal coherent spatial patterns that extend beyond individual monitoring stations, highlighting areas of elevated O3 exposure that are not directly captured by the fixed monitoring network.
Figure 6a presents the spatial interpolation of ozone concentrations using OK with a Gaussian variogram for the representative peak scenario (May, 15:00 h). Higher concentrations are observed predominantly in the southern sector of the basin, whereas lower values are observed toward the northern areas. The absence of abrupt local gradients reflects the secondary nature of O3 and supports the suitability of smooth variogram structures for this pollutant.
Figure 6b shows the corresponding kriging variance map, which quantifies the spatial uncertainty in predictions. Lower uncertainty is observed in areas with higher station density, whereas uncertainty increases toward the basin’s peripheral regions. Notably, regions of elevated O3 concentration do not necessarily coincide with areas of highest uncertainty, underscoring the importance of jointly analyzing concentration and uncertainty fields when interpreting interpolated air quality surfaces.
In contrast, SO2 exhibited more localized spatial patterns characterized by sharper gradients and higher spatial variability. Elevated SO2 concentrations were primarily confined to specific regions of the basin, reflecting the influence of local combustion sources and industrial emissions combined with limited atmospheric dispersion during morning hours.
Figure 7a presents the SO2 interpolation for the representative scenario (February, 10:00 h), showing spatial patterns dominated by localized hotspots rather than smooth regional gradients. This behavior is consistent with the predominantly primary nature of SO2 emissions and their stronger dependence on proximity to sources and local meteorological conditions.
The corresponding uncertainty map for SO2 (Figure 7b) indicates higher spatial variability in prediction confidence, particularly in areas distant from monitoring stations. Compared to O3, SO2 uncertainty patterns are more heterogeneous, reflecting both the sharper spatial gradients and the greater sensitivity of primary pollutants to local emission and dispersion processes.
To further illustrate the applicability of the proposed framework, point-based predictions were generated at locations without monitoring stations. As an illustrative example, O3 concentration was estimated at the Torre Latinoamericana, a central urban landmark. The predicted O3 concentration was 54.79 ppb, with an associated kriging variance of 47.36 ppb², corresponding to a standard deviation of approximately 6.88 ppb. For SO2, the predicted concentration at the same location was 1.99 ppb, with an estimated standard deviation of 0.46 ppb. These results demonstrate the method’s ability to provide both concentration estimates and uncertainty bounds at unmonitored locations, supporting its potential use in exposure assessment and urban air quality analysis.
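The step from kriging variance to an uncertainty bound can be sketched in a few lines. The example below takes the square root of the reported variance and builds an approximate 95% interval. The Gaussian-error assumption and the multiplier z = 1.96 are introduced here for illustration and are not part of the reported analysis; the function name `prediction_interval` is likewise hypothetical.

```python
import math


def prediction_interval(estimate, variance, z=1.96):
    """Return (std_dev, lower, upper) for a kriging estimate.

    Assumes approximately Gaussian prediction errors, which is an
    illustrative assumption and not a result of the study.
    """
    std_dev = math.sqrt(variance)
    half_width = z * std_dev
    return std_dev, estimate - half_width, estimate + half_width


# O3 at the Torre Latinoamericana: estimate and kriging variance from the text.
o3_sd, o3_lo, o3_hi = prediction_interval(54.79, 47.36)

# SO2: the text reports a standard deviation, so it is squared to get a variance.
so2_sd, so2_lo, so2_hi = prediction_interval(1.99, 0.46 ** 2)

print(f"O3:  sd = {o3_sd:.2f} ppb, interval = [{o3_lo:.2f}, {o3_hi:.2f}] ppb")
print(f"SO2: sd = {so2_sd:.2f} ppb, interval = [{so2_lo:.2f}, {so2_hi:.2f}] ppb")
```

Under these assumptions the O3 interval spans roughly ±13.5 ppb around the estimate. Kriging variance depends only on the station geometry and the fitted variogram, so such intervals should be read as indicative rather than exact.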
Overall, the spatial interpolation results highlight clear differences in the spatial behavior of O3 and SO2, reflecting their distinct physical and chemical characteristics. While O3 exhibits broader regional patterns with smoother spatial transitions, SO2 is dominated by localized hotspots and higher spatial heterogeneity. These contrasting behaviors underscore the importance of pollutant-specific considerations and uncertainty assessment when applying geostatistical methods to urban air-quality monitoring.
4.4. Sensitivity Analysis
To evaluate the robustness and temporal sensitivity of the proposed geostatistical framework, a sensitivity analysis was conducted by varying both the hour of day and the month for each pollutant. This analysis aims to assess how changes in atmospheric conditions affect the spatial dependence structure and the predictive performance of OK, beyond the choice of the variogram model itself.
For O3, sensitivity analysis focused on representative stages of the diurnal photochemical cycle during May, the month with the highest average concentrations. Three characteristic time blocks were selected:
Morning (09:00 h): Relatively low ozone levels, influenced by residual nocturnal conditions and limited photochemical activity.
Afternoon peak (15:00 h): Maximum photochemical production under strong solar radiation.
Evening (21:00 h): Ozone decay phase, with reduced radiation and changing ventilation conditions.
Table 7 presents the leave-one-out cross-validation (LOOCV) results for these three hours. A clear increase in interpolation error from morning to evening was observed across all variogram models. RMSE values increased from approximately 8.4 ppb at 09:00 h to 14.5 ppb at 15:00 h and 18.2 ppb at 21:00 h, indicating increasing spatial heterogeneity during the evening decay period.
Notably, the Gaussian variogram exhibited unstable behavior during the morning period (09:00 h), with unrealistically high RMSE and MAE values. This result reflects the incompatibility of highly smooth spatial assumptions with weakly structured O3 fields dominated by local variability and low concentrations. In contrast, spherical and exponential models showed more stable performance during the morning hours, suggesting that smoother variogram structures are more appropriate under peak photochemical conditions than under low-concentration regimes.
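The leave-one-out procedure behind these error metrics can be sketched as follows. This is a minimal NumPy illustration, not the implementation used in the study: the variogram parameterization, the fixed (not refitted) parameters, the function names, and the synthetic station data are all assumptions made for the example. Bias is computed as prediction minus observation, so negative values indicate underestimation.

```python
import numpy as np


def variogram(h, model, nugget, sill, corr_range):
    """Semivariance at lag h; `sill` is the total sill (assumed convention)."""
    h = np.asarray(h, dtype=float)
    if model == "spherical":
        r = np.minimum(h / corr_range, 1.0)
        g = 1.5 * r - 0.5 * r**3
    elif model == "exponential":
        g = 1.0 - np.exp(-h / corr_range)
    elif model == "gaussian":
        g = 1.0 - np.exp(-((h / corr_range) ** 2))
    else:
        raise ValueError(f"unknown model: {model}")
    # gamma(0) = 0 by definition; the nugget applies only for h > 0.
    return np.where(h > 0, nugget + (sill - nugget) * g, 0.0)


def ok_predict(xy, z, target, **vparams):
    """Ordinary kriging estimate and kriging variance at one target point."""
    n = len(z)
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))      # bordered system with Lagrange multiplier
    A[:n, :n] = variogram(dists, **vparams)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(xy - target, axis=1), **vparams)
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    return float(weights @ z), float(weights @ b[:n] + mu)


def loocv(xy, z, **vparams):
    """Leave each station out in turn and predict it from the others."""
    errors = np.empty(len(z))
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        pred, _ = ok_predict(xy[keep], z[keep], xy[i], **vparams)
        errors[i] = pred - z[i]
    return {
        "rmse": float(np.sqrt(np.mean(errors**2))),
        "mae": float(np.mean(np.abs(errors))),
        "bias": float(np.mean(errors)),
    }


# Synthetic stations and concentrations (illustrative only, not study data).
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 50.0, size=(25, 2))
z = 40.0 + 0.3 * xy[:, 1] + rng.normal(0.0, 2.0, size=25)

for model in ("spherical", "exponential", "gaussian"):
    scores = loocv(xy, z, model=model, nugget=1.0, sill=30.0, corr_range=20.0)
    print(model, {k: round(v, 2) for k, v in scores.items()})
```

In practice the variogram parameters would be fitted to the empirical variogram for each hour and month before cross-validation; they are held fixed here only to keep the sketch short.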
Seasonal sensitivity for O3 was evaluated by fixing the hour at 15:00 h and comparing February (winter), April (transition), May (photochemical maximum), and August (rainy season). Results shown in Table 8 indicate substantial seasonal variability in spatial dependence. February and May exhibited comparatively lower RMSE values (≈13.5–14.5 ppb), suggesting more coherent spatial fields under winter stagnation and peak photochemical production. In contrast, April and August presented higher interpolation errors (≈16–18 ppb), reflecting increased atmospheric variability associated with transitional conditions and convective mixing. Across all months, differences among variogram models were moderate, indicating that hourly and monthly effects exert a stronger influence on ozone spatial structure than the specific variogram form.
For SO2, a predominantly primary pollutant, sensitivity analysis was designed to capture conditions of emission accumulation and dispersion. Hourly sensitivity was evaluated during February, the month with the highest SO2 levels, by selecting four representative periods: 06:00 h (stable morning conditions), 10:00 h (peak emissions), 15:00 h (enhanced mixing), and 21:00 h (evening stabilization).
Table 9 summarizes the LOOCV results for these hours. The highest prediction errors were consistently observed during early-morning conditions (06:00 h), when stable atmospheric layers favor local accumulation of emissions and the development of sharp spatial gradients. The lowest errors occurred at 15:00 h, coinciding with stronger atmospheric mixing and a more homogeneous spatial field. Evening hours (21:00 h) exhibited intermediate error levels, reflecting partial re-stabilization of the boundary layer.
Across all hours, spherical and exponential variogram models exhibited comparable, stable performance. In contrast, the Gaussian variogram consistently yielded unrealistically high RMSE and MAE values, indicating numerical instability and physical inconsistency in the SO2 spatial fields. This behavior confirms that highly smooth variogram structures are unsuitable for modeling pollutants with short spatial correlation ranges and strong local emission control.
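One plausible mechanism for this kind of numerical instability is ill-conditioning of the kriging system: without a nugget, a Gaussian variogram can make the system matrix nearly singular when stations are close together relative to the range. The sketch below compares condition numbers for the two models on synthetic station locations. The coordinates, range value, and function name `ok_matrix` are assumptions for illustration, and the study does not state that this was the cause of the observed errors.

```python
import numpy as np


def ok_matrix(xy, model, sill=1.0, corr_range=1.0):
    """Ordinary kriging system matrix in semivariogram form, no nugget."""
    n = len(xy)
    h = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    if model == "exponential":
        gamma = sill * (1.0 - np.exp(-h / corr_range))
    elif model == "gaussian":
        gamma = sill * (1.0 - np.exp(-((h / corr_range) ** 2)))
    else:
        raise ValueError(f"unknown model: {model}")
    A = np.ones((n + 1, n + 1))   # last row/column enforce weights summing to 1
    A[:n, :n] = gamma
    A[n, n] = 0.0
    return A


# Synthetic station layout in a unit square (illustrative only).
rng = np.random.default_rng(42)
stations = rng.uniform(0.0, 1.0, size=(30, 2))

cond = {
    model: np.linalg.cond(ok_matrix(stations, model, corr_range=0.5))
    for model in ("exponential", "gaussian")
}
for model, value in cond.items():
    print(f"{model:12s} condition number: {value:.3e}")
```

A very large condition number means small changes in the data or variogram parameters can produce large swings in the kriging weights. Adding a small nugget is a common way to stabilize the Gaussian model, at the cost of no longer interpolating the observations exactly.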
Seasonal sensitivity for SO2 was evaluated by fixing the hour at 10:00 h and comparing February, April, and August (Table 10). A progressive decrease in prediction error from winter to the rainy season was observed, reflecting enhanced dispersion and wet deposition processes during summer months. As with the hourly analysis, model performance was more strongly governed by atmospheric conditions than by the choice of variogram.
Overall, the sensitivity analysis demonstrates that temporal factors (hour and month) dominate the spatial dependence structure for both pollutants, while the choice of variogram model plays a secondary role. However, the nature of this sensitivity differs substantially between pollutants. O3 exhibits strong hour- and season-dependent variability associated with photochemical production and regional transport, requiring hour- and month-specific spatial models. In contrast, SO2 displays greater sensitivity to atmospheric stability and mixing conditions, with simpler variogram models providing more robust performance. These findings highlight the importance of pollutant-specific temporal stratification when applying geostatistical interpolation to urban air quality data, and caution against the use of static spatial models across heterogeneous atmospheric regimes.
Figure 8 shows the leave-one-out cross-validation (LOOCV) RMSE for O3 in May and SO2 in February at representative hours of the day. The results highlight the strong temporal sensitivity of spatial interpolation accuracy. For O3, the Gaussian variogram performs best during peak photochemical conditions (15:00 h) but fails under weakly structured morning fields. For SO2, primary emission dominance leads to higher errors under stable morning conditions, whereas improved performance is observed under enhanced atmospheric mixing. These results confirm that hour-specific variogram selection is essential for physically consistent geostatistical modeling.
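Hour-specific variogram selection of this kind can be expressed as a simple lookup over cross-validation scores. The sketch below picks, for each (pollutant, month, hour) stratum, the model with the lowest finite RMSE. The function name, the record layout, and the placeholder scores are hypothetical; the numbers are not taken from Tables 7 to 10.

```python
import math


def select_variograms(cv_results):
    """Pick the lowest-RMSE variogram model for each stratum.

    cv_results: iterable of (pollutant, month, hour, model, rmse) records.
    Records with a non-finite RMSE (e.g. a failed fit) are skipped.
    """
    best = {}
    for pollutant, month, hour, model, rmse in cv_results:
        if not math.isfinite(rmse):
            continue
        key = (pollutant, month, hour)
        if key not in best or rmse < best[key][1]:
            best[key] = (model, rmse)
    return best


# Placeholder scores for a generic pollutant (illustrative values only).
cv_results = [
    ("pollutant_a", "month_1", 9, "spherical", 2.0),
    ("pollutant_a", "month_1", 9, "exponential", 1.5),
    ("pollutant_a", "month_1", 9, "gaussian", float("inf")),  # unstable fit
    ("pollutant_a", "month_1", 15, "spherical", 3.0),
    ("pollutant_a", "month_1", 15, "exponential", 2.5),
    ("pollutant_a", "month_1", 15, "gaussian", 1.0),
]

best = select_variograms(cv_results)
for stratum, (model, rmse) in sorted(best.items()):
    print(stratum, "->", model, rmse)
```

Selecting on RMSE alone is a simplification. A fuller rule could also weigh bias and MAE, or reject models whose errors are implausibly large relative to the observed concentration range.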
5. Discussion
For O3, the Gaussian variogram model achieved the lowest prediction errors under the leave-one-out cross-validation (LOOCV) scheme, indicating superior overall predictive performance for the representative peak photochemical scenario. However, this model exhibited a consistent negative bias, reflecting a slight tendency to underestimate O3 concentrations. Under the hold-out validation scheme, differences in RMSE and MAE across variogram models were relatively small, and all models showed negative bias. These results indicate that, although the Gaussian model provides the best overall fit, uncertainty and bias remain non-negligible and should be explicitly acknowledged when interpreting interpolated O3 fields.
The O3 interpolation maps generated using the Gaussian model displayed smooth spatial patterns without abrupt gradients, consistent with the secondary nature of O3 formation and its regional-scale behavior. Sensitivity analysis further revealed that kriging performance for O3 is strongly dependent on both the hour of day and season. Smoother variogram structures performed better under conditions of intense photochemical activity (e.g., afternoon peak hours), whereas during periods of lower concentrations and weaker spatial structure (morning and evening), prediction errors increased. This confirms that O3 spatial dependence is highly dynamic and temporally sensitive, reinforcing the need for hour-specific and month-specific modeling strategies.
The improved performance of the Gaussian variogram during afternoon peak hours can be physically interpreted in terms of boundary-layer dynamics and photochemical production. During midday and early afternoon, enhanced solar radiation promotes photochemical reactions leading to regional-scale O3 formation, while convective mixing deepens the planetary boundary layer. This vertical mixing reduces localized concentration gradients and produces smoother horizontal spatial structures, which are better captured by variogram models imposing gradual spatial continuity. Conversely, during morning and evening transition periods, boundary-layer stability increases, vertical mixing weakens, and spatial heterogeneity becomes more pronounced, thereby reducing the predictive stability of smooth correlation structures.
SO2 behaved markedly differently. As a predominantly primary pollutant, it showed greater spatial heterogeneity and sharper local gradients, leading to greater sensitivity to variogram model selection. Across both LOOCV and hold-out validation schemes, the exponential variogram consistently yielded lower RMSE and MAE values, whereas the Gaussian model exhibited instability and unrealistically high errors, particularly during the morning hours. This behavior reflects the incompatibility of overly smooth variogram structures with pollutants characterized by short correlation ranges and strong local emission influences.
The higher prediction errors observed during early morning hours are consistent with nocturnal boundary-layer stabilization and reduced atmospheric mixing. Under stable stratification, emissions from local combustion sources accumulate near the surface, generating sharp concentration gradients and short correlation ranges. Such conditions amplify spatial discontinuities that are poorly represented by smooth variogram structures. In contrast, during midday periods characterized by increased turbulence and boundary-layer growth, dispersion processes reduce spatial heterogeneity, leading to more stable interpolation performance. Monthly differences, including lower errors during the rainy season, may also reflect enhanced dispersion and wet deposition processes that moderate the contrast in extreme concentrations.
Spatial interpolation maps for SO2 further support these findings, revealing localized concentration patterns and increased variability, particularly during early morning hours when stable atmospheric conditions favor pollutant accumulation and limit dispersion. During midday and afternoon periods, the maps show more homogeneous spatial fields, consistent with the improved model stability observed under enhanced mixing.
The instability of the Gaussian model for SO2 indicates that it may overestimate spatial continuity where the true correlation range is short and the concentration field is highly heterogeneous. In contrast, the exponential model allows sharper changes at short distances and better captures localized emission-driven variability, resulting in more stable and physically consistent predictions. These findings highlight the importance of aligning variogram structure with pollutant-specific spatial behavior, particularly when modeling primary pollutants under stable atmospheric conditions.
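The difference in short-distance behavior can be made explicit with a small-lag expansion. The forms below use a common parameterization with partial sill c, range parameter a, and no nugget; this is an assumption, and the exact scaling used in the study (for example, a practical-range factor) may differ.

```latex
\begin{align}
\gamma_{\mathrm{exp}}(h) &= c\left(1 - e^{-h/a}\right) \approx c\,\frac{h}{a},
  & h \ll a, \\
\gamma_{\mathrm{gau}}(h) &= c\left(1 - e^{-h^{2}/a^{2}}\right) \approx c\,\frac{h^{2}}{a^{2}},
  & h \ll a.
\end{align}
```

The exponential model rises linearly at the origin, so nearby stations are allowed to differ appreciably. The Gaussian model rises quadratically, which implies a very smooth field at short lags and leaves little room for sharp, source-driven gradients.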
Taken together, these results indicate that temporal context (hour of day and month) exerts a stronger influence on kriging performance than the choice of variogram model alone. While Gaussian structures are more appropriate for secondary pollutants such as O3 under strong photochemical regimes, exponential models better capture the spatial behavior of primary pollutants such as SO2. Importantly, no single variogram model can be considered universally optimal across pollutants and temporal conditions.
From an algorithmic perspective, these results demonstrate that variogram model selection is not merely a parametric adjustment but a structural decision that directly affects numerical stability and predictive reliability. The sensitivity of prediction error to variogram functional form underscores the need for temporally stratified and pollutant-aware model specification within geostatistical workflows.
Limitations and Implications
Several limitations of this study should be acknowledged. First, data availability was restricted to a subset of monitoring stations in the Mexico City network, which may limit spatial representativeness in some areas of the metropolitan region. Second, the interpolation framework relied exclusively on concentration measurements and did not explicitly incorporate emission inventories or meteorological variables (e.g., wind speed and direction, humidity, temperature), which are known to influence pollutant dispersion, particularly for primary pollutants.
From a methodological perspective, only three theoretical variogram structures (spherical, exponential, and Gaussian) were evaluated. Although these models are widely used, more flexible covariance formulations (e.g., Matérn models, anisotropic structures, or non-stationary kernels) could better capture complex spatial processes in heterogeneous urban environments. Additionally, the temporal component was handled through discrete hourly and monthly stratification rather than a fully spatial–temporal model, so temporal autocorrelation is not explicitly represented.
The analysis focused exclusively on O3 and SO2, leaving other relevant pollutants such as NOx, PM2.5, PM10, and CO for future research. Furthermore, the study was limited to a single megacity; extending the framework to a multi-city context would improve the generalizability of the findings. Future work could also evaluate whether similar temporal sensitivity patterns emerge for other secondary pollutants relative to O3 and for other primary pollutants relative to SO2.
Despite these limitations, the results demonstrate that Ordinary Kriging, when applied with explicit consideration of pollutant type and temporal context, can serve as a valuable complement to fixed air quality monitoring networks. From a sustainability perspective, this approach supports more informed use of interpolated maps for urban planning and exposure assessment, while emphasizing the importance of responsible interpretation of spatial predictions and associated uncertainty. Moreover, integrating geostatistical methods with emerging low-cost sensors and IoT-based monitoring systems offers promising opportunities to enhance spatial coverage in cities with limited monitoring infrastructure.
6. Conclusions
This study evaluated the temporal sensitivity of Ordinary Kriging (OK) for urban air-quality mapping in the Mexico City Metropolitan Area, explicitly comparing a secondary pollutant (O3) and a primary pollutant (SO2) under a unified geostatistical validation framework. By integrating descriptive analysis, leave-one-out cross-validation, external hold-out validation, and sensitivity analysis across hours of the day and months of the year, this work provides a comprehensive assessment of when and for which pollutants spatial interpolation yields reliable results.
The findings demonstrate that interpolation performance is strongly conditioned by both temporal context and pollutant characteristics. For O3, a secondary pollutant governed by photochemical processes, smoother variogram structures, particularly the Gaussian model, yielded superior predictive performance under peak photochemical conditions, producing spatial fields with gradual gradients and coherent regional patterns. In contrast, SO2, as a predominantly primary pollutant, exhibited sharper spatial gradients, stronger local variability, and shorter correlation ranges. For this pollutant, the exponential variogram consistently yielded more stable and accurate predictions, whereas overly smooth models led to instability and unrealistic estimates, especially during the morning hours.
Sensitivity analyses revealed that interpolation accuracy varies substantially across both hourly and monthly scales. For O3, prediction errors increased during evening decay periods, reflecting greater spatial heterogeneity, whereas for SO2, the largest errors were associated with early morning stable atmospheric conditions. These results confirm that no single variogram model is universally optimal and that geostatistical predictions at unmonitored locations should be performed using hour-specific and month-specific configurations to ensure physical consistency and predictive reliability.
From a sustainability perspective, this study highlights that spatial interpolation can effectively complement fixed air quality monitoring networks only when temporal sensitivity and pollutant-specific behavior are explicitly accounted for. When appropriately applied, geostatistical mapping can support urban planning, exposure assessment, and environmental decision-making by extending information beyond existing monitoring coverage, while requiring responsible interpretation and explicit communication of associated uncertainty.
Finally, the integration of geostatistical methods with emerging low-cost sensors and IoT-based air quality monitoring systems represents a promising pathway to expand spatial coverage in cities with limited monitoring infrastructure. Combined with hourly and monthly temporal stratification and pollutant-aware modeling, such approaches can contribute to more resilient, equitable, and sustainable urban air quality management.