1. Introduction
In recent years, the study of air pollution, its environmental drivers, and its health consequences has become a central research topic in both atmospheric science and public health. A substantial body of evidence demonstrates the profound impact of chronic and acute exposure to particulate matter on human well-being. Numerous epidemiological studies have documented a wide spectrum of adverse outcomes, including respiratory diseases [1,2,3,4,5,6,7], neurodegenerative disorders, cardiometabolic impairments, as well as elevated risks of premature mortality and cancer [8]. This growing corpus of research underscores the necessity of precise, high-resolution air quality monitoring and the development of modern analytical methods capable of capturing both spatial and temporal processes.
Parallel to health impact studies, numerous investigations have focused on long-term trend analysis and descriptive characterization of air pollution using both reference-grade stations (typically operating on gravimetric principles) and a rapidly expanding network of low-cost sensors (LCSs). Although LCS devices exhibit lower precision, their affordability and scalability enable dense spatial coverage, producing high-granularity datasets that support localized assessments of pollution variability. Studies of this type have been conducted across diverse geographical contexts, from Poland (e.g., focusing on the Kraków metropolitan area [9,10,11]) to the Czech Republic [12], Serbia [13], China [14], and the United States [15], reflecting a global shift toward distributed sensing and data-driven environmental monitoring.
Another vibrant line of research involves forecasting pollution concentrations using a wide range of modeling paradigms, from classical time-series approaches such as ARIMA models to advanced machine learning and hybrid statistical–computational frameworks [16]. Complementary to these predictive efforts is a growing methodological emphasis on diagnostic and explanatory analyses leveraging Explainable AI (XAI) or geostatistical models. For example, Danek et al. [8] employed Geographically Weighted Regression (GWR) to characterize the spatially varying influence of meteorological drivers, while Zareba and Danek [17] introduced an integrated deep learning and geostatistical–XAI framework capable of capturing seasonal variability and temporal dynamics of meteorological mechanisms responsible for smog episodes.
More recently, the field has witnessed accelerated growth in applications of large language models (LLMs), primarily within literature synthesis, decision support pipelines, and health-oriented recommendation systems. A particularly innovative contribution in this domain is the work of Cogiel et al. [18], who demonstrated the use of multimodal large language models (MLLMs) combined with visual context engineering to automate the interpretation of pollution intensity maps. This methodological advancement is especially important given the explosive growth of IoT-based LCS networks: as sensor density and sampling frequency increase, the manual interpretation of spatiotemporal pollution maps becomes impractical. These findings highlight that for MLLMs to generalize effectively, the cartographic and visual quality of input maps (including interpolation techniques, color encoding, normalization, and spatial representation) is crucial.
Traditional geostatistical methods, particularly kriging, remain widely used for spatial interpolation; however, their practical application in contemporary environmental monitoring workflows is often constrained by several methodological and computational limitations. Kriging requires explicit modeling of the spatial covariance structure through a variogram [19], a process that is both statistically delicate and highly dependent on expert judgment. Inaccurate or poorly parameterized variograms can propagate substantial errors into interpolated surfaces, especially in heterogeneous or meteorologically dynamic environments. Moreover, kriging’s underlying assumptions of second-order stationarity and isotropy, while mathematically convenient, rarely reflect the actual complexity of urban atmospheric systems. When spatial gradients or directional transport processes (e.g., valley-channeled winds or boundary layer inversions) violate these assumptions, kriging may produce overly smoothed estimates. Its computational cost also increases rapidly with dataset size, making the method less suitable for high-density sensor networks or real-time mapping applications. Even advanced variants such as universal kriging or anisotropic kriging only partially mitigate these issues and often introduce additional layers of model complexity. As a result, despite its theoretical elegance, kriging may produce misleading spatial patterns when applied in contexts characterized by strong nonstationarity, irregular sampling geometry, or rapidly evolving pollution phenomena.
In contrast, Random Forest (RF) [20] regression has emerged as a robust and flexible complementary data-driven approach for spatial prediction. Multiple methodological extensions have been proposed, including the foundational approach by Hengl et al. [21], spatially attenuated variants accounting for distance-dependent relationships [22], and formulations designed to accommodate multi-resolution predictors and high-dimensional feature spaces [23]. In this study, the approach of Hengl et al. [21] is adopted due to the relatively limited spatial extent of the study area, the uniform distribution of point measurements, and the consistent raster data resolution. RF models provide stable predictions across nonlinear feature interactions, and their ability to incorporate a wide array of explanatory variables, such as topographic morphology or meteorological conditions (temperature, humidity, and wind speed), offers a substantial advantage over classical geostatistics. Furthermore, embedded measures of predictor importance align RF with the broader goals of Explainable AI, allowing for transparent assessment of environmental drivers. This study tests the hypothesis that Random Forest-based spatial mapping differs from ordinary kriging in its ability to preserve local-scale spatial variability in sparse sensor observations.
Within the regional air pollution typology developed by Morawiec et al. [24], Kraków is classified as part of the southern urban–industrial macroregion, which is characterized by persistently elevated pollutant concentrations and intensive anthropogenic pressure. This classification provides a structured spatial context for the present study and identifies Kraków as a representative and critical case for air quality analysis. Despite the implementation of stringent local mitigation measures, including prohibitions on coal and wood combustion within municipal boundaries, the city has experienced chronic air pollution exposure over multiple decades. In contrast, adjacent municipalities operate under less restrictive regulatory regimes, resulting in uneven emission controls and fostering complex transboundary pollution processes that significantly affect the spatial and temporal variability of air quality within Kraków. Spatial pollution mapping is therefore essential for disentangling local emission contributions from regional inflow patterns. Kraków constitutes a particularly compelling case study not only because it lies within the highly regulated environmental framework of the European Union but also because it is situated within a national energy system whose characteristics resemble those of fast-developing regions in Asia or South America [25]. The city’s topography, a valley bordered by several towns lacking strict emission controls, creates a natural testbed for investigating pollutant dispersion, cold-air pooling, and boundary layer dynamics. The application of RF-based mapping to time-series data clearly illustrates how these processes modulate the spatial and temporal distributions of particulate matter across the metropolitan region.
The contribution of this study lies in the application of a rigorously validated Random Forest-based mapping framework for generating high-resolution air pollution maps in a complex urban environment. Specifically, the study contributes (i) a validated spatial mapping framework integrating dense low-cost sensor data with meteorological and topographic predictors, (ii) an explicit assessment of intra-day temporal variability and spatial robustness using leave-one-out validation, and (iii) a transparent evaluation of predictor importance to enhance interpretability. Together, these elements enable the identification of fine-scale spatial patterns that are not readily captured by classical geostatistical interpolation. The study provides a concise and reproducible workflow for RF-based air quality mapping and illustrates its practical value for spatial analysis of PM2.5 in Kraków. Rather than introducing a completely new prediction algorithm, the study focuses on application-level validation, spatial robustness, and interpretability in the specific context of dense urban sensor networks.
2. Materials and Methods
2.1. Sensors and Area Characterization
Air quality assessment in Poland is embedded within the regulatory framework defined by the European Union’s (EU) Ambient Air Quality Directive (2008/50/EC) [26]. This legislation specifies permissible concentrations of particulate matter and prescribes standardized measurement protocols to ensure comparability across member states. While the EU maintains a substantial network of reference air monitoring stations, their spatial distribution is intentionally sparse due to the high cost and strict maintenance requirements of reference instruments. Kraków exemplifies this limitation: despite chronic pollution challenges, the city operates only a handful of reference-grade monitors for PM10 and PM2.5, too few to capture fine-scale spatial variability across the urban landscape.
To overcome these spatial gaps, the current study integrates a dense network of optical Airly LCS devices (www.airly.org, accessed on 1 September 2025). These sensors operate on light-scattering principles to estimate particulate concentrations and can be deployed extensively across the city, providing high-resolution coverage impossible with reference stations alone. Although LCS measurements are not certified for regulatory reporting under EU law, numerous intercomparison studies have demonstrated that once corrected for environmental influences such as humidity, temperature, and local microclimate, LCS data can approximate reference station readings with high accuracy. For instance, Airly sensors used in this study have been reported to achieve a root mean square error (RMSE) of 3–6 µg/m³ for PM2.5 and 5–8 µg/m³ for PM10, with coefficients of determination (R²) ranging from 0.85 to 0.93 under typical urban conditions.
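The two agreement metrics quoted above can be computed from paired sensor and reference readings as in the following minimal sketch. The function names and the four example values are hypothetical and serve only to illustrate the arithmetic; they are not data from the cited intercomparison studies. R² is computed here as 1 − SS_res/SS_tot, whereas some evaluations report the squared Pearson correlation instead, so the convention should be checked against each source.

```python
import math

def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (here µg/m³)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination, defined as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical paired hourly PM2.5 values (µg/m³), for illustration only.
reference = [10.0, 20.0, 30.0, 40.0]
sensor = [12.0, 18.0, 33.0, 37.0]

sensor_rmse = rmse(reference, sensor)
sensor_r2 = r_squared(reference, sensor)
print(f"RMSE = {sensor_rmse:.2f} µg/m³, R² = {sensor_r2:.3f}")
```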
The Airly sensors used in this study employ an MCERTS-certified optical system for real-time measurement of PM10, PM2.5, and PM1.0. While the manufacturer does not publicly disclose the internal sensor model [27], independent field evaluations report the integration of a Plantower PMS5003 laser-based particulate matter sensor [28]. Measurements are calibrated at the system level against reference-grade stations and disseminated after the application of proprietary internal correction and quality control procedures [29].
Each Airly device is continuously recalibrated using proprietary machine learning models that account for local environmental conditions, nearby reference measurements, and temporal drift. While the precise calibration algorithms are not publicly disclosed, similar ML-based correction techniques have been shown to reduce raw measurement errors by 30–50% and substantially improve agreement with reference stations. The data accessed through the Airly API are already postprocessed, ensuring that the measurements used in this study represent corrected and reliable particulate concentrations rather than raw sensor outputs.
Dense LCS networks have previously enabled analyses that would be impossible using reference stations alone. For example, during the spring 2020 COVID-19 lockdown, detailed studies combining LCS observations with chemical composition analyses revealed that the majority of particulate matter reaching Kraków originated from solid fuel heating in surrounding municipalities rather than from transportation, which was drastically reduced. Carbonaceous aerosols from coal combustion accounted for approximately 50% of PM10 during the heating season, with secondary inorganic aerosols contributing around 20%, metals 3–4%, and other unidentified components the remainder. Seasonal variability was observed, with transportation emissions playing a larger role during summer months. These findings align with regional emission inventories, which identify residential solid fuel combustion as the primary contributor to approximately half of annual PM10 and PM2.5 emissions.
In this study, the high spatial density and quantitative reliability of LCS data were essential for the RF-based mapping framework. The network comprised 51 Airly sensors distributed across the Kraków metropolitan area (Figure 1). Topographic information was integrated from the Copernicus Digital Elevation Model (DEM) [30], while meteorological variables including temperature, humidity, and wind speed were obtained from the Open-Meteo database [31]. Meteorological data, originally at a 5 km × 5 km resolution, were rescaled to a 500 m × 500 m grid to match the model resolution. The combination of LCS, topographic, and meteorological data allowed for fine-grained, data-driven reconstruction of particulate matter distribution, capturing microscale spatial patterns and enabling a robust assessment of predictor importance in an urban setting. By providing dense, temporally resolved measurements, the LCS network formed the backbone of this study’s high-resolution air pollution mapping approach, bridging the gap between sparse reference stations and the needs of modern machine learning models.
The analysis was conducted for 23–24 March 2022, focusing on five representative hours (23 March at 6:00, 16:00, 20:00, and 23:00 and 24 March at 1:00). The selected study period represents a typical heating-related winter smog episode, characterized by diurnal temperature fluctuations around the freezing point, low wind, and intensified residential combustion. It was used to validate the methodological performance of the proposed approach by illustrating its temporal robustness and spatial stability under different daily emission regimes, rather than to draw wider climatological conclusions. Extension to longer time periods is beyond the scope of this study.
It is important to note that air quality data were retrieved via the Airly API as hourly averaged PM2.5 concentrations, using values already corrected by the data provider. Raw sensor data and API keys cannot be shared due to provider restrictions, but the described workflow is transferable to other dense sensor networks.
2.2. Random Forest
Random Forest regression was applied as a data-driven framework for spatial mapping of PM2.5 concentrations. The method was selected due to its ability to model nonlinear relationships and to integrate heterogeneous predictors without requiring assumptions of stationarity or predefined spatial covariance structures.
For each analyzed hour, point measurements of PM2.5 from low-cost sensors were combined with coincident meteorological observations, including air temperature, relative humidity, and wind speed. In addition, elevation was extracted from a Digital Elevation Model at sensor locations and included as a topographic predictor. All variables were assembled into a tabular dataset representing the conditions at a given time step.
Spatial dependency was introduced explicitly through distance-based predictors. For each sensor location, Euclidean distances to all other sensors were computed and added as individual predictors. This approach allows the Random Forest model to learn spatial relationships directly from the data rather than relying on an explicit variogram model as in classical geostatistics. Similar distance-based representations have been used in established Random Forest frameworks for spatial prediction to represent spatial proximity and autocorrelation [21]. While this increases the dimensionality of the predictor space, Random Forest is relatively robust to correlated features, and model performance was evaluated using leave-one-out validation to reduce potential location-specific effects. In this study, the distance-based representation provides a simple and reproducible way to account for spatial proximity in a dense sensor network. More compact spatial encodings were not explored and remain beyond the scope of this work.
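The construction of distance-based predictors described above can be sketched as follows. This is an illustrative Python fragment rather than the code used in the study; the function name `distance_features`, the three sensor coordinates, and the four grid points are hypothetical, and coordinates are assumed to be in a projected system with metre units.

```python
import math

def distance_features(targets, sensors):
    """For each target (x, y), return Euclidean distances to every sensor.

    Row i holds one predictor per sensor: distance to sensor 0, sensor 1, ...
    Coordinates are assumed to be projected, in metres.
    """
    return [[math.dist(t, s) for s in sensors] for t in targets]

# Hypothetical sensor coordinates in metres (not the study's locations).
sensors = [(0.0, 0.0), (3000.0, 4000.0), (6000.0, 0.0)]

# Training rows: distances from each sensor to all sensors (zero on the diagonal).
train_dist = distance_features(sensors, sensors)

# Prediction rows: the same predictors evaluated at grid-cell centres.
grid = [(x, y) for y in (0.0, 500.0) for x in (0.0, 500.0)]
grid_dist = distance_features(grid, sensors)
```

With 51 sensors this yields 51 distance columns per row, which are then joined with the meteorological and elevation predictors before model fitting.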
Meteorological variables were prepared as raster layers for each time step. Gridded temperature, humidity, and wind speed fields were projected to a common coordinate system, resampled to the target grid resolution, and spatially smoothed to reduce local artifacts. These raster predictors were then used as inputs for spatial prediction across the interpolation grid.
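The resampling and smoothing steps can be illustrated with a minimal sketch. The text does not state which resampling or smoothing method was applied, so nearest-neighbour replication and a moving-average filter are assumed here purely for illustration; the function names, the filter radius, and the 2 × 2 example field are likewise hypothetical.

```python
def upsample_nearest(coarse, factor):
    """Replicate each coarse cell into factor x factor fine cells."""
    fine = []
    for row in coarse:
        fine_row = [value for value in row for _ in range(factor)]
        fine.extend(list(fine_row) for _ in range(factor))
    return fine

def box_smooth(grid, radius=1):
    """Moving-average filter; windows are clipped at the grid edges."""
    n_rows, n_cols = len(grid), len(grid[0])
    smoothed = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            out_row.append(sum(window) / len(window))
        smoothed.append(out_row)
    return smoothed

# Hypothetical 2 x 2 temperature field (deg C) on a 5 km grid.
coarse_temperature = [[1.0, 3.0], [5.0, 7.0]]
fine = upsample_nearest(coarse_temperature, factor=10)  # 5 km -> 500 m cells
smooth = box_smooth(fine, radius=2)  # softens the blocky cell boundaries
```

Replication alone leaves sharp steps at the former 5 km cell boundaries, which is one kind of local artifact that a smoothing pass reduces.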
The Random Forest model was fitted using the ranger implementation [32] with 500 trees, one-third of the available predictors considered at each split, and a minimum node size of five. Model training was performed separately for each selected hour to capture short-term variability in pollution patterns. Predictions were generated for all grid cells by combining raster-based meteorological predictors with distance-based spatial features, producing continuous PM2.5 concentration surfaces. All variables used as inputs to the Random Forest model, including the target variable and meteorological, topographic, and spatial predictors, are summarized in Table 1.
Model performance at measurement locations was evaluated using the coefficient of determination R². Spatial robustness was assessed using leave-one-out cross-validation, where models were repeatedly trained with individual sensors removed and the variability of predictions was analyzed across the domain.
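The leave-one-out robustness procedure can be sketched as below. To keep the fragment self-contained, inverse distance weighting is used as a stand-in predictor; it is not the Random Forest model of this study, and any fitted model with the same call signature could be passed in its place. The function names, sensor positions, concentrations, and grid points are hypothetical.

```python
import math
import statistics

def idw_predict(point, sensors, values, power=2.0):
    """Stand-in interpolator (inverse distance weighting), NOT Random Forest."""
    total_weight, weighted_sum = 0.0, 0.0
    for location, value in zip(sensors, values):
        d = math.dist(point, location)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        total_weight += w
        weighted_sum += w * value
    return weighted_sum / total_weight

def loo_spread(grid, sensors, values, predict=idw_predict):
    """Std. dev. of predictions at each grid point across leave-one-out refits."""
    runs = []
    for left_out in range(len(sensors)):
        kept_s = sensors[:left_out] + sensors[left_out + 1:]
        kept_v = values[:left_out] + values[left_out + 1:]
        runs.append([predict(p, kept_s, kept_v) for p in grid])
    # Transpose so that each entry collects all predictions for one grid point.
    return [statistics.pstdev(cell) for cell in zip(*runs)]

# Hypothetical sensors (metres) and PM2.5 values (µg/m³); one sensor reads high.
sensors = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0), (1000.0, 1000.0)]
values = [20.0, 20.0, 20.0, 60.0]
grid = [(500.0, 500.0), (900.0, 900.0)]
spread = loo_spread(grid, sensors, values)
```

In this toy setting the spread is larger at the grid point closest to the high-reading sensor, which is the kind of locally influential observation that a map of the LOO standard deviation is meant to reveal.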
To enhance interpretability and support explainable spatial analysis, permutation-based variable importance was calculated for non-spatial predictors [33]. For each variable, importance was calculated by randomly permuting its values and quantifying the resulting decrease in model performance while keeping all other predictors unchanged. Final importance values were estimated by repeatedly refitting the model and averaging permutation scores for temperature, humidity, wind speed, and elevation. This procedure allowed the identification of dominant environmental controls and their temporal variability.
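The permutation step can be illustrated with the following minimal sketch. It simplifies the procedure described above in two ways that should be read as assumptions: a fixed, hypothetical prediction function (`toy_model`) replaces the fitted Random Forest, and the score is averaged over repeated shuffles of one model rather than over repeated refits. The synthetic predictor values are likewise illustrative.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of model predictions against targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, column, repeats=20, seed=0):
    """Mean increase in MSE after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Replace only the chosen column; all other predictors stay unchanged.
        permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
        increases.append(mse(model, permuted, targets) - baseline)
    return sum(increases) / repeats

def toy_model(row):
    """Hypothetical model: uses elevation and temperature, ignores wind speed."""
    elevation, temperature, _wind_speed = row
    return 80.0 - 0.1 * elevation - 2.0 * temperature

# Synthetic predictors: elevation (m), temperature (deg C), wind speed (m/s).
rows = [[200.0 + 10.0 * i, float(i % 5), float(i % 3)] for i in range(30)]
targets = [toy_model(r) for r in rows]

names = ["elevation", "temperature", "wind_speed"]
scores = {name: permutation_importance(toy_model, rows, targets, col)
          for col, name in enumerate(names)}
```

A predictor the model never uses receives a score of zero, while predictors that carry more of the prediction variance receive larger scores.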
The resulting Random Forest maps were finally compared with surfaces generated using ordinary kriging to evaluate differences in spatial structure, smoothness, and interpretability.
2.3. Ordinary Kriging
Ordinary kriging (OK) is the most commonly applied variant of kriging, which is a geostatistical interpolation method. It provides estimates of values at unsampled locations within a study area based on a known variogram and the information from neighboring observations [34]. The OK estimator is expressed as:
$$Z^{*}(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i)$$
where $Z^{*}$ is the predicted value at location $x_0$, $Z(x_i)$ is the observed value at location $x_i$, $\lambda_i$ denotes the kriging weight associated with the $i$-th observation, and $n$ is the number of neighboring sample points used in the estimation.
Ordinary kriging assumes a constant but unknown mean across the study area, which makes it particularly suitable for environmental applications such as air pollution mapping [35,36,37,38].
Kriging interpolations for all maps were generated using the Geostatistical Wizard tool in ArcGIS Pro 3.0.3, which served solely as an implementation environment, while the final interpolation relied on manually specified parameters rather than default settings. A single spatial covariance structure was assumed for all analyzed time steps. An exponential semivariogram model was used with a nugget of 20, partial sill of 314, and range of 20 km. To ensure temporal comparability, identical color scales were applied to all maps using the global minimum and maximum PM2.5 concentrations across the analyzed time steps.
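For orientation, the ordinary kriging system with the semivariogram parameters stated above can be sketched in a few lines of Python. This is not the ArcGIS Pro implementation used in the study. The exponential model is written with a practical-range factor of 3, which is an assumption about the parameterization, and the sample coordinates and concentrations are hypothetical. The weights are obtained from the semivariogram form of the system, with a Lagrange multiplier enforcing that they sum to one.

```python
import math

# Semivariogram parameters as stated in the text (range converted to metres).
NUGGET, PARTIAL_SILL, RANGE_M = 20.0, 314.0, 20000.0

def semivariogram(h):
    """Exponential model; the practical-range factor 3 is an assumption."""
    if h == 0.0:
        return 0.0
    return NUGGET + PARTIAL_SILL * (1.0 - math.exp(-3.0 * h / RANGE_M))

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(x0, points, values):
    """Return (estimate, weights); the last unknown is the Lagrange multiplier."""
    n = len(points)
    a = [[semivariogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to one
    b = [semivariogram(math.dist(p, x0)) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * z for w, z in zip(weights, values)), weights

# Hypothetical sample locations (metres) and PM2.5 values (µg/m³).
points = [(0.0, 0.0), (5000.0, 0.0), (0.0, 5000.0)]
pm25 = [10.0, 20.0, 30.0]
estimate, weights = ordinary_kriging((1000.0, 1000.0), points, pm25)
```

Because a single semivariogram is assumed for all time steps, only the observed values change between hours; the weights for a given grid cell depend solely on the sensor geometry.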
3. Results
Figure 2 shows the diurnal evolution of meteorological conditions at all sensor locations on the analyzed day, with individual time series colored according to site elevation. Air temperature exhibits a coherent daily cycle across the study area, with minimum values during early morning hours and a rapid increase after sunrise, reaching a pronounced maximum in the afternoon. Differences between sites are relatively small during daytime but become more distinct during nighttime and early morning, indicating elevation-dependent thermal stratification and the presence of local inversion conditions. Relative humidity follows an inverse pattern to temperature, with high and spatially consistent values during nighttime and early morning hours, followed by a marked decrease around midday. Evening hours show renewed divergence between sites, particularly between lower and higher elevations. Wind speed displays the highest spatial variability among the analyzed variables. Overall wind conditions remain weak to moderate throughout most of the day, with a distinct minimum around midday and a pronounced increase during evening hours. This evening intensification is spatially heterogeneous and more pronounced at higher elevations, suggesting locally driven circulation patterns. The combined temporal behavior of temperature, humidity, and wind speed reflects meteorological conditions favorable for pollutant accumulation and limited dispersion, providing important context for the spatial patterns of PM2.5 modeled in subsequent analyses.
The maps were generated using the Random Forest algorithm for 23 March at 6:00, 16:00, 20:00, and 23:00 and 24 March at 1:00. Figure 3a–e illustrate the model predictions of PM2.5 concentrations at these specific hours. To better understand the meteorological background of these processes, Figure 4, Figure 5 and Figure 6 present smoothed spatial distributions of temperature, humidity, and wind speed at the same hours, as used as predictors in the main spatial modeling runs. The role of meteorological and topographic predictors in shaping pollution patterns was further assessed using Random Forest permutation importance analysis (Figure 7). Among the non-spatial predictors, elevation was consistently the most influential variable across all hours, reflecting the significance of terrain-induced accumulation. Temperature and humidity also played important roles, particularly in the afternoon and evening, while wind speed contributed less to the model but still showed localized effects. All maps were generated using ArcGIS Pro 3.0.3 (ESRI).
Geostatistical maps produced for the same hours exhibit smoother interpolations; however, they lack the capability to effectively incorporate additional predictors. These maps are shown in Figure 8, and they represent PM2.5 concentrations generated using ordinary kriging.
The agreement between Random Forest predictions and point measurements was high, with R² values typically exceeding 0.9 across the analyzed hours. These R² values refer to the agreement between predicted and observed PM2.5 concentrations at sensor locations and are reported as a measure of overall model fit rather than out-of-sample predictive performance. Out-of-sample evaluation was performed separately using leave-one-out cross-validation, with the focus on spatial robustness and stability of the predicted concentration fields rather than pointwise error metrics. As illustrated in Figure 9 for the example of 6:00, the standard deviation of LOO predictions remains low over most of the study area, indicating that the resulting concentration fields are robust to the removal of individual sensors. Slightly higher uncertainty is confined to localized zones, mainly near isolated measurement points or areas with lower sensor density. Overall, the spatial distribution of LOO error confirms that the RF-based mapping framework produces stable and consistent PM2.5 patterns while allowing for the identification of locally influential observations.
4. Discussion
The spatiotemporal patterns of PM2.5 obtained with the RF model reflect the combined effects of topography, meteorological conditions, and emission dynamics. The analysis of five selected time points provides insight into how predictors interact to shape air quality in the Kraków region under inversion and low-wind conditions. In the context of this study, emission dynamics are interpreted indirectly through observed concentration patterns, while the model itself is designed for spatial reconstruction under given meteorological and topographic conditions rather than detailed emission modeling.
At 6:00, elevated PM2.5 concentrations were observed in river valleys, indicating potential pollution transport pathways. Mountainous areas remained relatively clean despite intensive fossil fuel use in these regions, likely because residential heating activities had not yet started in the early morning. Elevation emerged as the dominant predictor, with the model highlighting increased pollution in terrain depressions. Temperature ranked second in importance, shaping finer details of the spatial distribution. By 16:00, despite unusually high temperatures for March, the first signs of the so-called “bagel” appeared, consistent with typical evening emission peaks reported for residential heating in the Kraków region [39]. Elevation remained the most important predictor, but the role of temperature increased, and humidity gained significantly in importance, influencing the emerging spatial patterns. At 20:00, the “bagel” became more pronounced. The evening temperature drop favored intensified fossil fuel combustion, while the city center of Kraków remained relatively clean. Humidity was the most influential predictor at this time, likely shaping PM2.5 values in less polluted areas. Elevation continued to be important, but the importance of wind speed increased, suggesting that even weak airflow patterns may have contributed to the redistribution of pollutants. By 23:00, pollutant levels had decreased in regions dominated by manual heating systems, and pollutants began to flow into Kraków along the Vistula River valley from the west. Elevation once again dominated predictor importance, while temperature and humidity jointly ranked second. Their influence likely differed by location, with humidity shaping conditions in Kraków as the receptor area and temperature shaping conditions in emission-producing regions. Interestingly, their spatial distributions were highly correlated during this time, reinforcing their combined role in pollution dynamics. At 1:00, pollutant transport along the Skawinka Valley led to accumulation within Kraków. The situation closely resembled that of 6:00 on the previous day, with a nearly identical hierarchy of predictor importance.
RF models provide not only a high agreement with point measurements but also spatial stability, remaining robust to the removal of individual measurement points (Figure 9). Traditional maps, although effective for interpolation, do not account for important predictors, which limits their interpretability.
The present study has several limitations that should be acknowledged. First, the analysis is based on selected hours representing a single winter smog episode and therefore does not aim to provide generalized conclusions across multiple seasons or pollution regimes. Second, the modeling framework does not explicitly incorporate traffic intensity or detailed emission inventories; emission dynamics are interpreted indirectly through observed concentration patterns and meteorological data. Third, PM2.5 measurements were obtained from low-cost sensors using the manufacturer’s correction algorithms, and raw data processing procedures are proprietary. While these constraints do not affect the methodological demonstration of the spatial mapping framework, they limit the extent to which source-specific or long-term conclusions can be drawn.
5. Conclusions
This study demonstrated that RF-based spatial modeling provides an effective and interpretable framework for high-resolution mapping of PM2.5 concentrations in complex urban terrain. By integrating dense low-cost sensor measurements with meteorological and topographic predictors, the proposed approach achieved high predictive accuracy. Comparison with ordinary kriging showed that traditional geostatistical interpolation produces smoother concentration maps, but it cannot include additional environmental predictors and therefore provides limited insight into the processes driving pollution patterns.
RF preserved fine-scale spatial structures, particularly along river valleys, whereas kriging produced smoother and less detailed concentration fields. The RF model reproduced observed concentrations with coefficients of determination of 0.85–0.95 and showed high spatial robustness in leave-one-out validation (error below 5%).
The presented results demonstrate that spatial prediction based on machine learning can complement classical geostatistical mapping. The methodology is transferable to other cities with dense sensor networks, despite restrictions on raw data sharing.