Winter and Summer PM 2.5 Land Use Regression Models for the City of Novi Sad, Serbia

: In this study, we describe the development of seasonal winter and summer (heating and non-heating season) land use regression (LUR) models for PM 2.5 mass concentration for the city of Novi Sad, Serbia. The PM 2.5 data were obtained through an extensive seasonal measurement cam-paign conducted at 21 locations in urban, urban/industrial, industrial and background areas in the period from February 2020– July 2021. At each location, PM 2.5 samples were collected on quartz ﬁbre ﬁlters for 10 days per season using a reference gravimetric pump. The developed heating season model h ad two predictors, the ﬁrst can be associated with domestic heating over a larger area and the second with local traﬃc. These predictors contributed to the adjusted R 2 of 0.33 and 0.55, respectively. The developed non-heating season model had one predictor which can be associated with local traﬃc, which contributed to the adjusted R 2 of 0.40. Leave-one-out cross-validation determined RMSE /mean absolute error for the heating and non-heating season model were 4.04/4.80 µg/m 3 and 2.80/3.17 µg/m 3 , respectively. For purposes of completeness, developed LUR models were also compared to a simple linear model which utilizes satellite aerosol optical depth data for PM 2.5 estimation, and showed superior performance. The developed LUR models can help with quantiﬁcation of differences between seasonal levels of air pollution, and, consequently, air pollution exposure and association between seasonal long-term exposure and possible health risk implications.


Introduction
Particulate matter pollution is known to be the single largest environmental health risk in Europe [1] and various adverse health effects are linked to the exposure to particulate matter (PM) [2,3].With the continuous rapid growth of urban areas, dense air quality monitoring within these environments becomes increasingly important since significant contrasts in ambient air quality and level of PM2.5 can exist within different city microenvironments and consequently manifest different degree of health risks [4,5].
However, sparse referent monitoring networks and limited geographic coverage, which makes capturing intra-city variabilities difficult, are the reality of most countries in the world [6].Nowadays there is a wide palette of different techniques used for processing air quality data with the goal of estimating spatial variations in measured pollutants, and assessing exposure, each having different requirements for data availability, going from more simple spatial interpolation methods to the more advanced dispersion models that integrate chemical transport modelling.
Data-driven statistical modelling of air pollution generally has several advantages in the specific context of land use regression (LUR) modelling.LUR modelling has a solid foundation since it relies on directly measuring air pollution using typical reference instruments, and it allows for approximate modelling of pollution sources/sinks or their proxies on both smaller and larger scales of spatial variation but does so without introducing complexities of the physical-chemical modelling and dependencies of extensive datasets.This makes, in a way, LUR models robust and capable of quantifying the level of air pollution and air pollution exposure even at the personal level [6][7][8][9][10][11][12][13][14][15][16][17][18][19].Many LUR models were developed for densely populated cities where high levels of air pollution were detected.Those studies focused on the development of either local or regional LUR models [20][21][22].A well-known LUR model targeting multiple European cities was developed in the framework of the European Study of Cohorts for Air Pollution Effects-ES-CAPE project [14,23,24].The most commonly used target variables for LUR modelling in the context of air pollution are the PM2.5 fraction of particulate matter pollution, and the concentration of gaseous pollutant, NO2 [25].These models can often provide a high percentage of explained variability.Recent publications that utilized LUR models for predicting PM2.5 have implemented different advanced machine learning algorithms, satellite data or other approaches in combination with basic LUR modelling principles in order to enhance model prediction efficiency and ability to be used for a larger study area [15,17,18,[26][27][28][29][30][31][32].
Monitoring air pollutants for LUR development is often time-limited to several sampling campaigns during the year, lasting one or two weeks [33][34][35].Seasonal PM2.5 LUR models have been developed in Asia [36], Canada [37], Iran [38] and Colombia [34].Previously, in Serbia, adaptations of the European LUR model developed by Wang et al. in 2014 within the ESCAPE study [24] and its validation with the development of supporting average daily traffic intensity models, and data fusion simulation were performed in Belgrade for PM2.5 by Davidović et al. [39] and for NO2 by Davidović et al. [40].Some aspects of the application of the European LUR model developed by Wang et al. [24] in the Novi Sad municipality were shown by Dmitrašinović et al. [41].LUR models are usually developed based on 20-80 sites [24]; however, in the ESCAPE project, 40-80 and 20-40 sites in 36 study areas in Europe were used for NOX and PM2.5 model development [42].The smaller the number of measurement sites used for assessing the level of pollutant concentration, the more the accuracy of the developed LUR model will depend on the position of the individual measuring points [43].
LUR model development typically has the following components: the target variable specifications (specific air quality measurements (e.g., NO2, PM2.5) we want to predict or estimate), specific air pollution campaign data or existent air pollution monitoring network data, land use and land cover maps (e.g., forests, residential areas, roads within the study area), distance to feature/buffer zones (used to model the correlations those features have with pollutant concentrations at a specific location), and other predictor variables [44].Once a statistically significant relationship between predictor variables and a target pollutant is established, this relationship can be used to estimate pollutant concentrations at various unmonitored sites.This is possible as long as all the predictor variables can be calculated for those sites [18,26,31,[44][45][46].
In this study, we had a goal to develop seasonal winter and summer (heating and non-heating season) PM2.5 land use regression (LUR) models for the city of Novi Sad, Serbia.One of the main motivations behind such a modelling effort is filling in the knowledge gap on spatial and seasonal variations in PM2.5 concentrations since the network of reference monitoring stations in the city is rather unevenly distributed.In addition, in most cities in Serbia, PM concentrations show drastic differences in heating and non-heating seasons, thus the origin and quantification of this potential difference in the city of Novi Sad needs confirmation.Once in place, these models can help with the quantification of the level of air pollution, air pollution exposure, and the association between long-term exposure and various health risks, even in microenvironments not covered by regulatory monitoring.Furthermore, these models can also be used to obtain high-resolution seasonal base maps for PM2.5 concentrations in the city of Novi Sad.

Study Area
Novi Sad, Serbia's second largest city after the capital, Belgrade, is an urban-industrial agglomeration divided by the Danube River into two distinct regions.The city encompasses an area of 702.7 km 2 and experiences a regional climate that transitions from moderately continental to continental.Novi Sad sits at an elevation of approximately 80 m above sea level [47].According to the latest publicly available population census data from 2022 [48], the city has 367,121 inhabitants.
Urban air quality in Novi Sad is influenced by several factors, including residential heating using wood, coal or natural gas, intensive traffic in urban areas, low energy efficiency of plants in the energy sector and industry, diffuse pollution from agriculture, and others [49].Fine particulate matter pollution in Novi Sad frequently exceeds the proposed daily and annual WHO limits of 15 µg/m 3 and 5 µg/m 3 , respectively [50].In Novi Sad, PM2.5 is measured by two networks, the national network running under the Serbian Environmental Protection Agency (SEPA) (one measuring site, urban) and a local network established by the City Administration for Environmental Protection (two measuring sites, urban and industrial) [51].The study area and sampling sites were categorized into urban zone (URB), industrial zone (IND), border of urban and industrial zone (URB/IND) and background zone (BCG).Figure 1 shows a photo of each of the sampling location types.Table 1 provides more details on each of the measuring sites.

Sampling Campaign
The sampling campaign was conducted during winter (15 February-18 March 2020, 8 December 2020-19 February 2021) and summer (24 July-18 September 2020, 6-16 July 2021) periods and PM2.5 was measured at 21 locations.At each location, PM2.5 was collected for 10 days using a reference gravimetric pump Sven Leckel Model LVS3, with standard PM2.5 inlet and 2.3 lpm flow rate.Gravimetric pumps were set to sample air for 48 h and collect particles on quartz fiber filters (Whatman QM-A, 47 mm).We had a total of 4 gravimetric pumps, thus particle concentrations during the winter and summer seasons were collected in batches of 4 sites simultaneously, for 10 days at each site per season.Each sample was collected for 48 h, resulting in 5 samples at each site per season.This cycle was performed until all measurements were completed at all 21 sites.Based on these results, seasonal averages were calculated according to the ESCAPE exposure manual [52].
Figure 2a shows 21 measuring locations, divided by colour according to type (URB, IND, BCG, URB/IND).Of these locations, 18 can be classified as URB-type locations with different levels of traffic intensity.During the heating season, air quality strongly depends on heating modalities present in urban microenvironments (individual heating (IH) using wood, coal, gas and electricity, or district heating by thermal power plants (DH), see Table 1).Location no.7 is in the IND zone, location no. 3 is on the very border of the URB/IND zone, and location no.17 is in the vicinity of the local background station, thus it is treated as a BCG site.The background location is situated near the Danube River, a residential area heated by a DH and near a pedestrian zone, with no traffic or domestic heating nearby.In Figure 2b, DH area borders and coverage areas (orange, pink, blue, and yellow colour) are shown and every study site is numbered (from 1-21), following the site sequence given in Table 1.Sampling sites are coloured green and white, that is, sites marked with the green colour are within areas that are within the city DH area, and sites marked with the white colour, are within areas with mixed means of individual heating (denoted as IH area in subsequent text).Thermal power plant (TPP) sites (green triangle) and district heating (DH) area borders (figure inset).Numbering of sampling sites corresponds to Table 1.

Model Development and Evaluation
The database containing all PM2.5 measurements from the 21 sites and a set of potential predictor variables was used for LUR model development.ESCAPE methodology [24,38,53,54] was applied, including enforcement of potential predictor direction of effect, as listed in Table A1.The initial step in model development was calculating univariate regression between each potential predictor variable (Table A1) and measured PM2.5 target variable (Table 2) in order to determine the strength of their correlation by observing the Pearson correlation coefficient (R) and p-value.The predictor variable with the highest R value was used in the model, and the supervised stepwise forward procedure was continued in an iterative manner.Predictor variables that were not correlated with PM2.5 measurements, had p > 0.05, did not have all observations for the study sample (N = 21) or had a predictor direction of effect opposite to the initially defined direction of effect were rejected.Additionally, for the buffer type predictor variables, if in the set of variables (e.g., corresponding to road length variable ROADLENGTH_50m) one had the highest correlation with PM2.5, then the variable with the smaller buffer radius was not considered since values of this variable were already included within the larger buffer (e.g., variable ROAD-LENGTH_25m was not taken into account since the larger buffer, variable ROAD-LENGTH_50m, is included).One other criterion was if, e.g., variable ROAD-LENGTH_100m also had a significant correlation with PM2.5, then in the regression model, an additional ring buffer was defined as the difference between ROADLENGTH_100m and ROADLENGTH_50m.After all remaining suitable predictor variables were added one by one into the set of possible predictor variables, the change in the percentage of explained variance of PM2.5 (adjusted R 2 , adj.R 2 ) was observed.During the stepwise iterative procedure for the LUR model development, the following steps were followed in order to avoid overfitting and to ensure the choice of a stable model.In each step, the best candidate predictor is included in the model only if the adjusted R 2 increased by more than 1%, otherwise, if there is no such predictor, the iterative procedure stops and the model is finalized.The direction of effect (i.e., sign of the coefficient corresponding to the particular predictor) for the newly included predictor must be in accordance with the predetermined sign, and furthermore, all coefficients in the new regression equation must remain the same, i.e., in accordance with their predetermined direction of effect [53].Adding predictors to the model stops when there are no remaining predictors that can increase adj.R 2 by at least 1% compared to the previous regression model.As a final step, variables that are not statistically significant (p > 0.10), were eliminated from the model, while those with p ≤ 0.10 were kept in the model.Evaluation of the final winter and summer PM2.5 LUR models was conducted by examining Cook's distance (Cook's D), heteroscedasticity (White test) and normality of residual distribution.
For model validation, we used the Leave-One-Out Cross-validation (LOOCV), in which data from one site is left out, the LUR model is rebuilt based on the remaining points, the error for the left-out point is calculated, and then the total error is averaged for all observation sites.The benefit of such a leave-one-out analysis is that, in a way, all measuring points are used for both model development and model validation.The performance of the developed model was assessed by observing adj.R 2 , RMSE and MAE values [38,54,55].

PM2.5 Measurements
Mean seasonal PM2.5 concentrations during heating and non-heating seasons and their descriptive statistics are given in Tables 2 and 3. Note that, as mentioned in the Materials and Methods section, each location had a total of five 48 h measurements per season that were used for inferring seasonal concentrations.Some details and comments about these individual measurements are stated in Appendix D. Among all the potential predictor variables offered for the development of the winter PM2.5 LUR model (PM2.5_LURWINTER)from Table A1, variables ROADLENGTH_50m, park_area_3000m, suburban_area_3000m, class3_area_1000m and resident_area_300m had the highest R; however, class3_area_1000m and resident_area_300m were not used in further calculation due to insufficient number of observations (different from zero).Correlations between PM2.5 and potential predictors expressed through the R values and the number of observations for these potential predictors are given in Table 4.The distinction between moderate and strong correlations was made based on p-value using the following criteria: p < 0.05 (moderate correlation, marked by *), and p < 0.01 (strong correlation, marked by **).
Tested models considered within a forward stepwise regression process along with values of R 2 , adj.R 2 , F-test, DF, p-value, VIF and RMSE are given in Table 5. Considering the values of adj.R 2 and RMSE, the third model in which the predictor variables are sub-urban_area_3000m and ROADLENGTH_50m is noticeably better than the first and second proposed models, and as such it is optimal for predicting winter PM2.5.The values of B (unstandardized) coefficient and the p-value for optimal model final predictors are B = 0.0000019, p = 0.001 for suburban_area_3000m and B = 0.024, p < 0.01 for ROAD-LENGTH_50m.Unstandardized coefficients in the final regression equations are used due to different measuring scales of predictors.The VIF criteria for multicollinearity and Cook's D criteria used for model development were <3 and <1 [56].The regression equation for this final PM2.5_LURWINTER model is given by Equation (1).By applying the final PM2.5_LURWINTER regression formula, it is possible to explain about 55% of the variance in the PM2.5 concentration.

PM2.5_LURWINTER
The evaluation process for the model gave Cook's D observation values below 1 and White's test was not statistically significant (χ 2 (5) = 1.21, p = 0.944), with a p-value higher than the level of significance (0.05) [57], showing the absence of heteroscedasticity from the tested data (Figure 3b).Values of skewness (SK = −0.077)and kurtosis coefficients (KU = −1.09)are within the acceptable range ±1 [58,59], thus the distribution of residuals for the winter model can be treated as near normal as presented on residual plots in Figure 3a-d Validation of the model using LOOCV is presented in Appendix B.

Summer LUR Model Development and Evaluation
For the development of a summer LUR model (PM2.5_LURSUMMER),all potential predictor variables, their correlations with target PM2.5 expressed through their R values and the number of observations are shown in Table 6.The same elimination criteria used in the winter model were applied to these potential predictors.Buffer-type predictor variables that significantly correlate with the PM2.5 but also have a corresponding predictor with a smaller buffer radius that has a lower correlation were not included in the model.For example, the variable SECONDARYROADLENGTH_500m was not included in the analysis as a predictor because the variable SECONDARYROADLENGTH_1000m is more strongly correlated to PM2.5 and has a larger buffer radius.When a variable of the same class has a weaker correlation with the dependent PM2.5 variable compared to the same class with a smaller buffer radius, then a new ring buffer variable is derived.For example, from the difference of variables ROADLENGTH_300m and ROADLENGTH_50m, the variable ROADLENGTH_50_300 is considered for inclusion in the model.The distinction between moderate and strong correlations was made based on p-value using the following criteria: p < 0.05 (moderate correlation, marked by *) and p < 0.01 (strong correlation, marked by **).
Tested models considered within a forward stepwise regression process are given in Table 7.According to the values of adj.R 2 , adj.ΔR 2 , RMSE and VIF coefficients, the third model, which included the predictor variables SECONDARYROADLENGTH_1000m, ROADLENGTH_50m and ROADLENGTH_50_300, was optimal.This model had lower RMSE and higher adj.R 2 values compared with the first two models, and multicollinearity was not present, as in the fourth model.The fifth model has a negative adj.ΔR 2 coefficient value (compared to the third model), while in the sixth and seventh models, there are B coefficient values that are not in accordance with the direction of effect.By applying the final step in model development, that is, statistical significance criteria, variables ROAD-LENGTH_50_300 and SECONDARYROADLENGTH_1000m were removed from the PM2.5_LURSUMMER model because their p > 0.05.Thus, in the third optimal model, ROAD-LENGTH_50m is the only significant predictor of the positive direction, with a B coefficient of 0.020 and p < 0.01.This final PM2.5_LURSUMMER model was shown to be statistically significant (F-test = 14.49, p = 0.001), with the regression equation represented by Equation (2).By applying this regression equation, the predictor explained 40.3% of the variation in the summer PM2.5 concentration.

PM2.5_LURSUMMER
Evaluation of the PM2.5_LURSUMMER model also showed that there were no observations with Cook's distances greater than 1.00, that White's test was not statistically significant (χ 2 (2) = 3.28, p = 0.198) (Figure 4b) and that values of skewness (SK = 0.853) and kurtosis (KU = −0.803)coefficients were within the acceptable range of ±1 [58,59], implying near normal residual distribution as can be seen on the residual plots (Figure 4a-d).Validation of the model using LOOCV is described in Appendix B.

Simple Seasonal PM2.5 Model Based on Aerosol Optical Depth Data
In this section, we show a simple and direct linear regression approach for connecting aerosol optical depth (AOD) data for the city of Novi Sad and PM2.5 ground-level data.
In recent years, successful efforts have been made to utilize satellite-based aerosol information like observations of Aerosol Optical Depth (AOD) to improve the spatial mapping and prediction of ground-level PM2.5.The relationship between AOD and PM is complex and depends on aerosol type and size, relative humidity, planetary boundary layer height and the vertical structure of aerosol distribution.Statistical approaches including single and multiple linear regression and more complex machine learning approaches for estimating PM based on satellite data have demonstrated some advantages compared to more traditional physical retrieval methods [60].Various satellite products including Copernicus Atmosphere Monitoring Service (CAMS), Global Near-Real-Time provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) [61] and MCD19A2.061:Terra & Aqua MAIAC Land Aerosol Optical Depth provided by NASA LP DAAC at the USGS EROS Center [62], can be used to infer ground level PM2.5 concentrations using aerosol optical depth measurements originating from satellites.Such estimates of PM2.5 can be conducted using statistical approaches, which can range in complexity from linear regression models (e.g., generalized linear model [63] multiple linear regression [64]) to more complex models based on machine learning approaches [65].
Google Earth Engine was used to access and process the ECMWF/CAMS and MODIS datasets.The spatial resolution of the CAMS data is approximately 0.1 degrees (~10 km at the equator), while the MODIS AOD data from the MCD19A2 product has a native spatial resolution of 1 km.The process involved filtering the data by date, selecting relevant bands for AOD, averaging the data over the specified periods and resampling the data using bilinear interpolation.Figure 5 shows a scatterplot and regression line for seasonal models that use simple univariate linear regression with no intercept to quantify the relation between AOD data and seasonal PM2.5 ground-level data (which was also used for the seasonal LUR models presented in this paper).Validation of the model using the LOOCV approach is described in Appendix C.

Discussion
In most cities in Serbia, PM concentrations show drastic differences during heating and non-heating seasons [66], and the origin and simple quantification of this difference in the city of Novi Sad can be seen using results from reference monitoring and emission sources [67]; however, these lack spatial resolution.This difference was also analysed using a mobile monitoring approach (this increases spatial resolution) for PM10, PM2.5 and ultrafine aerosol fraction, and was performed using research-grade instruments, TSI Nanoscan 3910 and TSI OPS 3330 [68,69].These underpinning reports and research studies gave motivation for the development of the seasonal LUR model, which used reference instruments and enables quantification of differences between seasonal levels of air pollution, and consequently air pollution exposure and association between seasonal longterm exposure and possible health risk implications over the whole city of Novi Sad.
The initial set of predictors for the development of the Novi Sad winter PM2.5 LUR model (Table 4) had a mixture of source and sink predictors.The initial set contained the following predictors that model PM2.5 sources: ROADLENGTH_50m, which is a proxy for traffic intensity on nearby roads; suburban_area_3000m, which can be used as a proxy for individual heating sources in a site's surrounding area, and similarly, resident_area_300m.The initial set contained the following PM2.5 sinks: Class3_1000m, which is the number of forests and seminatural areas in a 1000 m buffer that models the amount of aerosol deposition, and similarly, park_area_3000m.Target PM2.5 seasonal concentration was inferred based on the measuring campaign that was conducted during the heating season months from December to March.These raw measurements were corrected to obtain seasonal averages using the ESCAPE procedure [52].The iterative development procedure resulted in the three models shown in Table 5.It is interesting to note that the best model in terms of explained variance contains only source predictors (suburban_area_3000m, ROADLENGTH_50m), while the second-best model contains a source (suburban_area_3000m) and a sink (park_area_3000m) predictor.While it is evident that the seasonal concentration depends on a complex interplay of source and sink processes, the extent of available data and applied statistical procedure for the model development favoured the source-only model.
The predictor group suburban area can be used as a proxy of air pollution sources related predominantly to household areas that are situated next to the urban city centre and DH area and quantifies the contribution of IH during the winter.The predictor that explained the most variance in PM2.5 concentration during the winter season in Novi Sad was suburban area in a buffer of 3000m, with adj.R 2 = 0.328.Buffers with smaller radii did not show statistical significance when explaining PM2.5 pollution.The second significant predictor variable, total road length within a buffer of 50 m, is a very rough proxy for variations in traffic density in the close vicinity, adding to the total explained variance of adj.R 2 = 0.548.
Generally, each developed LUR model contains unique definitions of potential independent variables, depending on the study area specifications, available data, and pollutants of interest [70].In cases where data on traffic intensity are not available or do not exist and thus cannot be linked to the local and central road bases, road classification and their lengths within the research area can be used as proxies for the potential predictor variables [23,36,71].Similarly, traffic emissions can be used as a proxy for major arterials, railroads or street lengths in cases where these data do not exist [72].Seasonal LUR models that model higher winter concentrations of PM2.5 and the potential influence of the heating combustion process were developed for several cities in China [35,73,74].A winter model developed by Shi et al. [35] explained 61% of winter PM2.5 variability, and predictors in the final LUR model were average building floor areas within 3000 m that explained the most variation, followed by water area within 3000 m and average building heights within 50 m (predictors with negative coefficients).Other types of predictors such as meteorological factors, roads, land use types, various points of interest for better temporal resolution and elevation were used for LUR model development in Beijing [73] and explained between 54-86% of PM2.5 winter variability.The highest percentage of explained PM2.5 variability among the mentioned heating season models was 94%, with the explanatory variables precipitation, density of roads, elevation, average air pressure and vegetation [74].The LUR model created by Saucy et al. [75] explained 29% of the variance with the use of population/building density within 25 and 300 m buffers, construction sites within a 100 m buffer, length of bus routes in a 300 m buffer and distance to the nearest waste burning sites as explanatory variables.Road lengths, as significant predictors within smaller buffer zones, were also reported in winter LUR models developed for Calgary, Canada [37], where the total length of major roads within a buffer of 200 m was significant, noting that in that study buffer zones started from 100 m due to local city fabric.The value of adj.R 2 for this winter model was 45%, similar to [36].The best-explained variability of PM2.5 in the winter season was 90%, reported in [38], which used six independent predictor variables.
The initial step of the model development produced a list of predictors for the Novi Sad summer PM2.5 LUR model (Table 6) that had only source predictors serving as proxies for traffic intensity.Target PM2.5 seasonal concentration was inferred based on a measuring campaign that was conducted during the non-heating season months of July-September.These raw measurements were corrected to obtain seasonal averages using the ES-CAPE procedure [52].
The iterative development procedure resulted in the seven models shown in Table 7.The procedure started with a univariate model with SECONDARYROADLENGTH_1000m as a predictor.This predictor was calculated over a large buffer area and varied with the number of secondary roads in the part of the city encircled by the buffer.The iterative procedure then continued and arrived at the bivariate model with SECONDARYROADLENGTH_1000m and ROADLENGTH_50m (number of roads in a very small buffer around the site) as predictors.Note that so far, all models derived from the stepwise procedure have statistically significant predictors; however, also note that VIF is increasing.Finally, following the stepwise ESCAPE procedure, we arrived at the trivariate model with SECONDARYROADLENGTH_1000m, ROADLENGTH_50m and ROADLENGTH_50_300 (number of roads in a circular ring with inner and outer diameters of 50 and 300 m, respectively) and explains a total of adj.R 2 = 0.570.Here, we stop the ESCAPE procedure since no new predictors can be added that increase adj.R 2 .Note also that the VIF parameter is further increased, indicating a mild collinearity between the predictors.It is known that such collinearity between the predictors does not reduce the explained variance; however, it does reduce the statistical significance of the predictors.The ESCAPE procedure mitigates this by omitting, in the final step of the model development, predictors that are not statistically significant.For purposes of completeness, we have also, in Appendix B, added validation of the first two models in the stepwise procedure, since these were fully statistically significant but were not reached by direct application of the ESCAPE procedure.
In the summer seasonal model developed for the city of Novi Sad, ROAD-LENGTH_50 m was the only significant predictor variable included in the model, with an explained variance of PM2.5 concentration of 40%.The road length predictor within a buffer of 50 m had a higher value of R = 0.658 than in the winter.This can be explained by the fact that small-scale traffic variations are more easily observed due to fewer variations in emission sources of PM2.5 in urban environments in the summer season.Few existing studies have efficiently used road length without linked traffic, but good road classification is needed for their application as potential predictor variables [70,71].
In comparison to our PM2.5_LURSUMMERmodel and achieved percentage of explained variance, models in [32,37,38] explained 72%, 50% and 59% of PM2.5 variance, respectively.In all three studies, predictor variable total lengths of local roads were included in the final LUR models.The summer model developed by Bertazzon et al. [37] included the sum of several road segment lengths in a buffer of 750 m as a final predictor.Contrary to previously mentioned models and our LURSUMMER model, the model developed by Jeffrey [72] used explanatory variables based on road classes and explained 97% of PM summer variability using additional predictor variables, including temperature, average air pressure, altitude, humidity and precipitation.The winter model's final predictors developed by [35] were the same as for their summer season model, which explained 52% of summer variability.The lowest explained PM2.5 variability for summer among reviewed articles was 36% and the explanatory variables included in the final model were the length of railways in a 1000 m buffer, distance to the nearest open grills, population/building density, waste burning sites and distance to nearest refuse transfer station [75].
Both the winter and summer LUR models outperform the AOD satellite data-driven approach, as evidenced by the consistently lower RMSE values for the winter and summer datasets (see Appendices B and C).This suggests that incorporating LUR predictors enables better capturing of local-scale variations in PM2.5 concentrations compared to AOD satellite-derived data.While satellite observations can offer broader coverage, their limited ability to resolve fine-scale variations results in higher RMSE values, highlighting the importance of integrating LUR-specific predictors for more accurate air quality assessments.

Conclusions
The developed PM2.5_LURWINTER and PM2.5_LURSUMMER models are seasonal land use regression models.These models represent an advancement in the field of land use regression (LUR) for PM2.5 in the region.
The PM2.5_LURWINTER model includes two predictors; the first one can be associated with individual heating sources in the city.Along with the suburb_3000m predictor as a proxy for combustion processes in individual heating sources, predictor ROADLENGTH_50m stood out as the predictor of traffic originating PM2.5 in the winter model, quantifying the influence on PM2.5 particulate matter emission during the winter due to close vicinity roads.The PM2.5_LURSUMMER model emphasizes the influence of traffic as the most important emission source in the summer, though only one predictor ROAD-LENGTH_50 m was statistically significant following the final step of the ESCAPE procedure.The lack of street-level traffic data probably affected the achieved percentage of variability explained by both models and including currently unavailable street-level traffic model would be the straightforward path for improvement of the model.
The results obtained in this study could be of direct use in the assessment of air pollution-related health risks in the municipality and contribute scientifically and practically to local authorities dealing with air quality control.Since models showed good predictive accuracy, developed seasonal PM2.5 LUR models for Novi Sad can be used in information products as base maps of fine particulate matter pollution.
As part of further research opportunities, we plan to refine developed models through additional measurement campaigns, utilizing low-cost sensor data and adding more exact traffic and heating predictors for emissions instead of proxies, thus increasing the interpretability of the model.Low-cost sensor data can be used to increase the spatial and temporal resolution of the seasonal models by fusing measurement data with modelled pollution data.Another possible direction of future research that may bring improvements to heating and non-heating season model predictive efficiency is the inclusion of machine learning algorithms and more complex models utilizing satellite data since a simple aerosol optical depth model gave promising results in this study.traffic density, road length, classes, distance from the river and population.Next to each predictor, in the last column in Table A1, is the potential direction of effect.The direction of effect plays a role during model development and will be discussed further in the following section.
Table A1.List of potential predictor variables, units, buffer size and direction of effect of target variable PM2.5.Note.Class1-artificial area, Class2-agricultural areas, Class3-forest and semi natural areas, Class4-wetlands, Class5-water bodies, suburban_area-area of households, outside strictly urban areas, with household heating on solid fuels and private central heating, WIN_A-mean daily winter traffic densities for class A on all roads in a buffer (sum (traffic density x length of all road segments)), WIN_B-mean daily winter traffic densities for class B on all roads in a buffer (sum (traffic density x length of all road segments)), WIN_C-mean daily winter traffic densities for class C on all roads in a buffer (sum (traffic density x length of all road segments)), ROADLENGTHlength of all roads in a buffer, PRIMARYROADLENGTH-length of all primary roads in a buffer, SECONDARYROADLENGTH-length of all secondary roads in a buffer, TERTIARYROAD-LENGTH-length of all tertiary roads in a buffer, RESIDENTROADLENGTH-length of all resident roads in a buffer, DISTINVMAJOR1 and DISTINVMAJOR2-distance to the nearest primary road, DISTINVSECOND1 and DISTINVSECOND2-distance to the nearest secondary road, DISTINVTERT1 and DISTINVTERT2-distance to the nearest tertiary road, DISTINVNEAR1 and DISTINVNEAR2-distance to the nearest road.NA-Not applicable [53].

Land
Different land cover classes, especially land use patterns, existing within the city have different impacts on urban air pollution and are strongly correlated with PM2.5 concentration levels [77,78], along with traffic density, road length and road classes [79,80].
The selection of potential predictors must be in accordance with the specifics of the city and based on the main known emission sources and other factors that may influence PM2.5 concentrations.The definitions of the majority of the predictors are equivalent to those in the ESCAPE study [24].In this study, potential predictor directions of effect were kept the same for both seasons.
The land cover group of predictors, artificial surfaces and agricultural areas are given a positive direction of effect, while forests, wetlands and water areas have a negative direction.The agricultural area can be treated as a seasonal predictor, with a greater possible impact on PM2.5 during the summer than the winter; however, in both seasons, certain agricultural activities such as straw burning are possible, resulting in a sharp increase in PM2.5 concentration within a short time [77,78].As a predictor, artificial surface can be divided into many subclasses; however, in this study, it was used collectively and treated as a contributor to PM2.5 pollution due to its potential as a proxy for many primary and secondary anthropogenic emission sources and its stronger spatial association at small scales [81,82].Besides the previously mentioned variables, every artificial surface, particularly larger ones, can cause urban heat islands, which can further transport air pollutants [82].The predictors forest, wetland and water were assigned a negative direction of effect, based on the assumption of their potential to decrease PM2.5 pollution.Green and (some) natural surfaces can be considered sinks for PM2.5 due to the process of aerosol surface deposition [78]; however, vegetation capacity for deposition may change from season to season [81].Proximity to water surfaces often provides better conditions for pollutant dispersion [78].
The group of land use predictors consists of buildings, parks and industrial, forest, suburban, and resident areas, of which only park and forest areas are associated with PM2.5 concentration decrease, similar to forest, wetland, and water areas in the land cover predictors.Building areas can be treated as the main urban elements that affect PM2.5 dispersion [79].In the last few decades, Novi Sad has been in the phase of intensive urbanization characterized by changes in the urban structure of the city, with the construction of many new buildings, a mix of high-rise and low-rise buildings, and underground and above-ground garages.Considering that building areas are treated as a factor that increases PM pollution, especially because the populations of small areas with multi-story buildings create conditions for canyon street effects and possible increase in PM pollution.Industrial areas have similarities with artificial surfaces and generate a certain amount of PM2.5 pollution.The residential area is also assigned a positive direction of effect and reflects the influence of urban life activities of inhabitants on PM2.5 concentrations.The suburban area as defined in Table A1 is particularly used in this research as a proxy for the heating season's impact on PM2.5 during winter and various characteristic activities of household areas such as agriculture and burning of certain biomass during the summer.Individual heating systems have a major influence on local air quality generally [83], as well as in Novi Sad, and with this predictor, we aimed to model pollution generated over a larger area, outside the local variations within the DH area.While "landuse = farmland" signifies rural areas, it can also be indirectly helpful for separating urban areas from suburbs.Since suburbs typically have lower population density compared to the city core and might have pockets of farmland on their fringes, the presence of "landuse = farmland" surrounding a designated urban area could indicate a suburban zone.
The predictor group local traffic density has a positive direction of effect on PM2.5 pollution since 18 of the 21 measuring locations are traffic sites or sites that are near traffic.Direct traffic impacts include emissions from tailpipe exhausts, brake wear and tyre wear [84] as well as the impact of resuspended dust from roads [85].
Road length and classes as potential predictors consisting of the different road classes are used as a proxy for traffic density and are therefore assigned a positive direction of effect.Road length as a proxy for traffic density has been used effectively in many scientific studies [86][87][88].Land use data (Figure A2) were downloaded from the Open Street Map (OSM) website via a publicly available source [90].Data were classified into building area, park area, industrial area, forest area, farm area and resident area.The results of the PM2.5_LURSUMMER model validation, in which the only predictor was ROADLENGTH_50 m, generated using the LOOCV method, are presented in Table A3.The value of adj.R 2 ranges from 31 to 45% for the training set.The RMSE coefficient ranges from 2.74 to 3.01 for the training set, and on left-out observation (ytrue), the RMSE and MAE are 2.80 and 3.17 µg/m 3 , respectively.Note how the RMSE for the left-out observation falls within the RMSE range for the training set.Based on the obtained results, the tested winter model is stable and is not sensitive to excluded observations.

Appendix C
The validation results for the PM2.5AODWINTER and PM2.5AODSUMMER models are presented in Table A6.The obtained RMSE values ranged from 5.25 to 6.80 µg/m 3 for winter and from 3.44 to 3.83 for summer.Based on the obtained results, both the winter and summer models are stable and are not sensitive to excluded observations.on site 17.At sites 4 (URB), 6 (URB), 5 (URB) and 21 (URB), which had very low traffic intensities, the lowest PM2.5 concentrations were in the range 6.77-12.48µg/m 3 .

Figure 1 .
Figure 1.Example of measuring sites: (a) site 13 in the urban zone, (b) site 3 at the border of the urban and industrial zone, (c) site 7 in the industrial zone and (d) site 17 in the background zone.Table1provides more details on each of the measuring sites.

Figure 2 .
Figure 2. (a) Map of the background (BCG), industrial (IND), urban (URB) and urban/industrial (URB/IND) sampling sites in Novi Sad used for developing the seasonal LUR models.(b) City

Figure 3 .
Figure 3. Residual plots for winter: (a) Quantile plot of the residuals; (b) Residuals vs. predicted values; (c) Histogram of the residuals; (d) Chart of residuals.

Figure 4 .
Figure 4. Residual plots for summer: (a) Quantile plot of the residuals; (b) Residuals vs. predicted values; (c) Histogram of the residuals; (d) Chart of residuals.

Figure 5 .
Figure 5. Seasonal PM2.5 concentration averages at 21 locations in the city of Novi Sad vs. aerosol optical depth (AOD) (a) during winter and (b) during summer.

Appendix A. 1 .
Land Cover, Land Use, Distance from the River and Population Density Land cover data (Figure A1) in the form of vector and 100 m raster data were downloaded from the Corine Land Cover (CLC) website [89].The data were categorized into five main CLC classes: artificial surfaces, agricultural areas, forest and seminatural land, wetlands and water bodies.

Figure A1 .
Figure A1.Corine Land Cover classes for Novi Sad, Petrovaradin and Sremska Kamenica and all settlements.

Figure A2 .
Figure A2.Open Street Map land use classes for Novi Sad, Petrovaradin and Sremska Kamenica.

Table 1 .
Locations used for PM2.5 sampling in two seasons: location types, coordinates and sampling periods.

Table 2 .
Mean seasonal PM2.5 concentrations (µg/m 3 ) for all sites during heating and non-heating seasons. .URB-urban location, IND-industrial location, URB/IND-location at the border of the urban and industrial zone, BCG-background location. Note
3.2.Development and Evaluation of Seasonal LUR Models3.2.1.Development and Evaluation of Winter LUR Model

Table 4 .
Correlation coefficient, R, and the number of available observations (different from zero), N, for the initial set of possible predictors for the winter LUR model.

Table 5 .
Models tested for the winter season.
Note.R-coefficient of multiple correlation, adj.R 2 -corrected coefficient of multiple determination-percentage of explained variance, F-test-value of the F quotient, DF-number of degrees of freedom, VIF-variance inflation factor (multicollinearity index), RMSE-root mean square error.

Table 6 .
Correlation coefficient, R, and the number of available observations (different from zero), N, for the initial set of possible predictors for the summer LUR model.

Table 7 .
Tested models for the summer season.-coefficient of multiple correlation, adj.R 2 -corrected coefficient of multiple determination-percentage of explained variance, F-test-value of the F quotient, DF-number of degrees of freedom, VIF-variance inflation factor (multicollinearity index), RMSE-root mean square error.
Note.R 2-coefficient of determination, adj.R 2 -adjusted R 2 , p-value-probability of F-statistic, RMSE-root mean square error of the model, MAE-mean absolute error of the model, ypredictedmodel prediction at the left-out location, ytrue-measurement at the left-out location, Δy = ypredictedytrue error of the prediction.

Table A3 .
Validation results of the final PM2.5_LURSUMMER model.Units for the RMSE, MAE, ypredicted, ytrue, and Δy values are in µg/m 3 . .R 2-coefficient of determination, adj.R 2 -adjusted R 2 , p-value-probability of F-statistic, RMSE-root mean square error, MAE-mean absolute error of the model, ypredicted-model prediction at the left-out location, ytrue-measurement at the left-out location, Δy = ypredicted − ytrue error of the prediction. Note

Table A4 .
Validation results of the first LUR summer model 1 (with predictor SECONDARYROAD-LENGTH_1000m).Units for the RMSE, MAE, ypredicted, ytrue, and Δy values are in µg/m 3 . .R 2 -coefficient of determination, adj.R 2 -adjusted R 2 , p-value-probability of F-statistic, RMSE-root mean square error, MAE-mean absolute error of the model, ypredicted-model prediction at the left-out location, ytrue-measurement at the left-out location, Δy = ypredicted − ytrue error of the prediction. Note
Note.R 2 -coefficient of determination, adj.R 2 -adjusted R 2 , p-value-probability of F-statistic, RMSE-root mean square error, MAE-mean absolute error of the model, ypredicted-model prediction at the left-out location, ytrue-measurement at the left-out location, Δy = ypredicted − ytrue error of the prediction.

Table A6 .
Validation results of the PM2.5_AODWINTER and PM2.5_AODSUMMER models.All units in the table are in µg/m 3 .RMSE-root mean square error, MAE-mean absolute error of the model.ypredicted-model prediction at the left-out location, ytrue-measurement at the left-out location, Δy = ypredicted − ytrue error of the prediction. Note: