Spatio-Temporal Modeling of Small-Scale Ultrafine Particle Variability Using Generalized Additive Models

High-resolution measurements of ultrafine particle concentrations in ambient air are needed for the study of health human effects of long-term exposure. This work, carried out in the framework of the VIEPI project (Integrated Evaluation of Indoor Particulate Exposure), aims to extend current knowledge on small-scale spatio-temporal variability of Particle Number Concentration (PNC, considered a proxy of the ultrafine particles) at a local scale domain (1 km × 1 km). PNC measurements were made in the university district of San Lorenzo in Rome using portable condensation particle counters for 7 consecutive days at 21 sites in November 2017 and June 2018. Generalized Additive Models (GAMs) were performed in the area for winter, summer and the overall period. The log-transformed two-hour PNC averages constitute the response variable, and covariates were grouped by urban morphology, land use, traffic and meteorology. Winter PNC values were about twice the summer ones. PNC recorded in the university area were significantly lower than those observed in the external routes. GAMs showed a rather satisfactory result in order to capture the spatial variability, in accordance with those of other previous studies: variances were equal to 71.1, 79.7 and 84%, respectively, for winter, summer and the overall period.


Introduction
Several studies have assessed the health effects of particles in the ultrafine mode (PM 0.1 ). The main concern is related to their capability to go deeper into the lung, where they are retained longer than the coarser particles, leading to pulmonary inflammation and endothelial dysfunction [1]. Moreover, it has been shown that ultrafine particles (UFP) exposure led to coagulation change that can be a risk factor for hypertension and cardiovascular disease. Among others, they have been linked to diabetes and cancer as well as cerebral dysfunctions due to direct translocation to the brain through the olfactory nerve [1,2].
Since ultrafine particles contribute largely to the airborne total particle number concentration (PNC), it was often used as a proxy for PM 0.1 exposure studies [3][4][5].
The outdoor PNCs spatial and temporal variability assessment and related population exposure gradients are key topics in order to evaluate the related health effects.
Some studies have attempted to evaluate the PNC spatial variability at an urban scale using Land Use Regression models (LURs). The LURs have been applied to large urban areas such as Amsterdam [6], Vancouver [7], New Delhi [8], Boston [9], Toronto [10], Rome [11], Berlin [12], London [13] and El Paso [14] including, at times, also the surrounding municipalities. In other cases, the method has been extended to wider territories of Girona (Spain, [15]) to include the entire national territory (Switzerland, [16]; the Netherlands [17]). These studies were based on multiple regression between the concentrations measured in different points (usually ranging between 20 and 80) and spatial explanatory variables extracted within zones of influence around each monitoring site, mainly starting from the assumption that the functional link between the regressors and the response variable was linear.
Although LUR models can be effective tools for evaluating the spatial distribution of PNC, leading to the identification of high-resolution concentration domains, actually the hypothesis that the functional link between the regressors and the response variable is linear seems weak, also after the PNCs log transformation.
In particular, in urban areas it has been observed that the PNC shows much more pronounced variations than that of the mass concentration of PM 10 as the distance from the road increases [18][19][20], and that the nanometric fraction of UFP is predominant in the measurement sites close to road traffic or hot spots [21,22]. Indeed, several studies have identified an exponential decrease in PNC as the distance from the road increases, for the unobstructed topographic settings [23,24]. It was also found a relative maximum in the PNC for distances around 90 m from heavy traffic roads such as motorways and expressways [18,25]. Moreover, if not only long-term exposure must be taken into account, the temporal variability also has to be contemporarily assessed.
PNC, in the field of ultrafine particles, varies over time both due to the processes of formation and growth of particles [26][27][28] and as a function of meteorological and micrometeorological physical-chemical parameters [25,[29][30][31][32][33]. The formation mechanism (nucleation) leads to the generation of new UFP in the atmosphere starting from gaseous compounds; the volumetric growth of UFP is linked both to the condensation of gases on the UFP surface or to their dissolution in the liquid film that covers the particles. The main removal processes reside in dry and wet deposition [27,28].
The relationship between PNC and the main meteorological variables has been generally described by complex, non-linear functions [25,34,35].
The Generalized Additive Models (GAMs), using smoothing functions, allow the evaluation of non-linear interactions between the covariates and the response variable even in the very frequent case in which there is no prior knowledge of the kind of functional bond [36,37].
The use of smoothing functions in place of deterministic functions based, in linear models, on the estimation of regression parameters, has produced excellent results in the analysis of complex ecological systems [38,39].
The use of splines as smoothing functions allows both for reproducing the global trend of the contribution that the covariate provides to the response variable and to better approximate any local trends in particular intervals of the domain of existence of the explanatory variable [34,40].
GAMs have been used to assess the temporal relationships between PNC measured at fixed sites and meteorological parameters, allowing for positive results in terms of statistical performance [34,35,41].
However, both LUR models and GAMs have been applied to local (representative of a neighborhood or suburb) or microscales (representative of individual roads) only in a few studies. At such small spatial scale, spatio-temporal assessment is particularly challenging due to highly heterogeneous conditions [42][43][44]. Actually, concentrations in small urban domain are strongly dependent on short-term traffic variability [45,46]. The configurations of building and streets can play a relevant role by modifying local airflow patterns [47][48][49].
To the best of the authors' knowledge, only a few studies [12,50] have addressed the goal to capture and effectively represent UFP urban hotspots spatial variability at a local or microscale through statistical models. The aim of our work, carried out in the framework of the VIEPI project (Integrated Evaluation of Indoor Particulate Exposure) [51], was to further extend the current knowledge on small-scale spatio-temporal variability of PNC at local scale domain (1 km × 1 km).
To this purpose, we developed empirical regression models, GAMs, accounting for seasonal and diurnal variability. We demonstrate quantitatively the modeling approach effectiveness, while the PNC variability main descriptors and their contribution to the PNC-explained variance were assessed. The complexity of local urban design features was also explored using GIS derived urban morphology explanatory variables.

Study Domain and Monitoring Campaigns
The study domain has an area of 1 km 2 centered on the E. Fermi Physics building of Sapienza University, one of the largest and oldest universities in Europe, accounting for more than 100,000 students and about 8000 workers, in the district of San Lorenzo in Rome (Italy). The university campus extends over an area of 0.22 km 2 where parking is allowed to a limited number of cars.
The sampling points were chosen with the aim of obtaining the maximum spatial gradient. It was assumed from a previous study [11] that spatial variability mainly depends on the traffic on the main roads and on the building shape and density.
Moreover, the sampling points within the university were chosen considering the effects of buildings on the wind. For this reason, a series of preliminary simulations with high spatial resolution were carried out using the three-dimensional, non-hydrostatic ENVIMET numerical model, able to reproduce the fluid dynamic field in correspondence with groups of obstacles [52]. Four scenarios were simulated, corresponding to the four prevailing wind directions in the area under consideration [53], i.e., North (0 • ), Northeast (45 • ), South (180 • ), and Southwest (225 • ), with a constant wind intensity of 1.5 m/s (Supplementary Materials Figure S1). The reproduction of the prevailing anemological conditions allows the drawing of general conclusions not related to single events and, therefore, is particularly useful for the choice of sampling points.
In addition, the proximity to the surrounding nearest roads was taking into account. PNC was measured using handheld condensation particle counters (CPC3007, TSI inc., Shoreview, MN) which operated at a flow of 100 mL/min measuring particles within the range of 10-1000 nm with a time resolution of 1 s. Due to the large amount of sampling points along with the limited number of available devices, a rounding measurement strategy was adopted. We selected three paths ( Figure 1: A, blue line-inside the university, B, red line -all around the university fence, and C, green line-within the surrounding San Lorenzo district) with seven sampling points each; three operators carried out measurements contemporarily on each path. Each of the 21 selected measurement points was visited three times during the day at pre-established times, representing three specific time bands at local times 8:00-10:00; 12:00-14:00 and 16:00-18:00.
For each point, 10-min measurements were made. The CPCs were carried from an operator alongside the chosen path. Each sampling point was previously geolocated.  (Tables S1 and S2).  (Tables S1 and S2).
The concentration data measured in 10 min in the November and June campaigns were reconstructed for an average time of two hours: two hours and ten minutes was in fact the time needed to visit the 7 sites of each path with measurements lasting 10 min, so the two-hour average was considered the average temporal unit of the models. The averaged values over the 2 h were reconstructed using continuous and parallel measurements carried out on the roof of the Physics building for the duration of the campaigns in June and November.
The two-hour mean PNC determined at the various points was used as the response variable of the model. In order to follow a normal distribution of the response variable ( Figure 2), a logarithmic transformation was applied [34,54]. The concentration data measured in 10 min in the November and June campaigns were reconstructed for an average time of two hours: two hours and ten minutes was in fact the time needed to visit the 7 sites of each path with measurements lasting 10 min, so the twohour average was considered the average temporal unit of the models. The averaged values over the 2 h were reconstructed using continuous and parallel measurements carried out on the roof of the Physics building for the duration of the campaigns in June and November.
The two-hour mean PNC determined at the various points was used as the response variable of the model. In order to follow a normal distribution of the response variable ( Figure 2), a logarithmic transformation was applied [34,54].

Potential Spatial Explanatory Variables
The calculated spatial variables were the following:

Potential Spatial Explanatory Variables
The calculated spatial variables were the following:  Figure S3).

2.
Land use: Urban green area, Continuous Urban Fabric representative of continuous areas with high population density, Discontinuity Density representative of discontinuous areas with medium-low population density and Industrial Commercial Public representative of public areas, were calculated in the domain of interest, starting from the data available in the Urban Atlas 2012 section of the Copernicus Monitoring Land Service, at spatial resolution of 10 m. Imperviousness, which was representative of impermeable surfaces, was calculated at a spatial resolution of 100 m.

3.
Population: the number of inhabitants was considered in the corresponding census sections, with reference to the last ISTAT census of 2011 (https://www.istat.it/it/ archivio/104317; accessed on 16 November 2021).
Each spatial variable was calculated at the measurement points in buffers with a radius of 12.5 m and 25 m. GIS analyses and mapping were performed using ArcGIS (v.10.3; ESRI).

Potential Temporal Explanatory Variables-Meteorological Parameters
The time variables used as potential predictors were wind intensity and direction, air temperature, relative humidity, atmospheric pressure, height and intensity of rainfall and global solar radiation. Moreover, the standard deviation of wind intensity and turbulent kinetic energy (TKE) were taken into account since they are representative of the atmospheric turbulence [55].
The meteorological parameters were provided by the Regional Protection Agency of the Lazio (ARPA LAZIO). Measurements were carried out using a sonic anemometer placed on the roof of the ARPA LAZIO building located 2 km NW from the study domain center.
With the aim of providing information on the air dispersion volume, the planetary boundary layer height (PBLH) was estimated based on remote sensing measurements carried out using a Light Detection And Ranging (LiDAR) sensor belonging to the BAQUNIN supersite [56] and installed on the rooftop of the E. Fermi building, in the center of the domain. The measurements were carried out continuously during the summer measurement campaign, and the 10-min temporal resolution PBLH was retrieved following the procedure described in [57].
All time parameters were averaged over 2 h to be used in the empirical models.

Potential Spatial and Temporal Explanatory Variables-Traffic Flows
Traffic flows vary significantly both spatially and temporally. Traffic data were based on traffic count taken during the measurement campaigns at each point (10-min cumulative counts); the average hourly traffic measured (vehicles/hour) was compared with the modelled traffic flows provided by the Mobility Agency of the Municipality of Rome, referring to an average year, in the main roads around the campus. Linear regression between the traffic measurements and estimations, for the monitoring campaigns in the three time bands in the cold season (8-12 November 2017) and the warm season (17-23 June 2018), were performed. The regression coefficients ranging between 0.62-0.79 in the cold season and 0.59-0.84 in the warm season. We then calculated, based on the regression equations, the traffic in the hours in which the traffic measurements were not carried out.

Models Development
The explanatory variables described were considered as potential predictors in the development of Generalized Additive Model (GAM, [34]). The GAM equation is defined as follows [40]: where: yi = response variable µ i ≡ E(yi) = expected value of yi yi~EF(µ i , ϕ) = exponential distribution of yi A i γ = ith row of the model parameter matrix with its corresponding vector f j x ji = smoothing function for the covariates j All the independent temporal variables in input to the model described above were attributed to the time average of two hours.
Three GAMs were fitted for the cold and warm week and for the overall period, using the log-transformed two-hour PNC averages as the response variable.
Among potential predictors only those with at least 90% non-zero data were selected. Correlation matrices for each group of variables (temporal, spatial and spatio-temporal) were presented in Table S3 and Figures S7-S9 of the Supplementary Materials; starting from the best correlation between these variables and the natural logarithm of PNC, a stepwise forward procedure was followed.
In the variable stepwise selection, each variable was added if: it gave an incremental R 2 adjusted higher than 1%; the a priori direction of the effect was respected and it did not change the direction of effect of previously added variables.
Moreover, test checks were performed to select the best spline for each variable. The smoothing parameter was chosen controlling the balance between likelihood function and the overfitting, founding the convergence of the iteration algorithm (gam check routine of the R mgcv package). The K (or K' used in gam check routine) parameter represents the number of basis functions that determines how wiggly a smooth can be. Where not expressly indicated, it is assumed that the value of parameter K is equal to 10. The Effective Degrees of Freedom (EDF) represents the complexity of the smooth in terms of curve sinuosity (the higher the EDF value, the greater the spline non-linearity). If there are not enough basis functions, it may not be wiggly enough to capture the relationships in data (EDF is close to K'). The F parameter is used in an ANOVA test to confirm overall significance of the smooth.
Interactions between variables were also assessed using specific functions in the R mgcv package (e.g., "Tensor smooths") [58].
Standardized data adaptation criteria, Akaike Information Criterion (AIC), Generalized Cross Validation (GCV) and Bayesian Information Criterion (BIC) were calculated for the developed models to control the phenomena of improper adaptation due to overfitting. Finally, residual normality was graphically checked in order to confirm the basic assumption.
Graphical tools and statistical indicators (k-fold Root Mean Square Error, k-fold CV, Adjusted R-Squared) were used to evaluate the performance of the GAMs fitted. The R software ([59,60] R Development Core Team) and in particular the gam function implemented in the mgcv library was used.     Two-hour averages PNC at measurement sites ranged between 3846 and 32,483 particles/cm 3 in the warm season; the observed range was 5722-101,594 particles/cm 3 during the cold season. The temporal variation, assessed by means of coefficient of variation calculated in each site, ranged between 27% and 71%. The within-area contrast (expressed as the estimated average concentration range by median ratios) was 91.7% and 86.4% in the cold and warm season, respectively. This addresses for the existence of a high spatial contrast in the PNC despite the small study domain, with implications on population exposure [5].

Particle Number Concentration Measurements
The cold season average values were approximately double that of the warm period Two-hour averages PNC at measurement sites ranged between 3846 and 32,483 particles/cm 3 in the warm season; the observed range was 5722-101,594 particles/cm 3 during the cold season. The temporal variation, assessed by means of coefficient of variation calculated in each site, ranged between 27% and 71%. The within-area contrast (expressed as the estimated average concentration range by median ratios) was 91.7% and 86.4% in the cold and warm season, respectively. This addresses for the existence of a high spatial contrast in the PNC despite the small study domain, with implications on population exposure [5].
The cold season average values were approximately double that of the warm period (see Figure S4, for detailed site-by-site comparisons). Moreover, the PNC median was larger during working days than on Sundays (when traffic flows were reduced, see Figure S6). Saturday has intermediate concentration values between Sunday and the typical values of weekdays. These results confirmed that our monitoring strategy allows for sufficiently capturing the typical diurnal PNC variability (two rush hours maximum in the 8-10 and 16-18 time bands and the typical daily minimum in the 12-14 time band).
The seasonal and temporal variability observed were nicely in agreement with longterm PNC observations carried out within the same study domain at a traffic influenced site (located a few meters from the curbside [61,62], as well as with other studies across Europe e.g., [63]).
At the measurement points inside the university area (path A away from the roads), significantly lower values (p < 0.01) compared to those observed at the measurement spots along the two external paths, close to major roads with high traffic intensity (see B1, B2, B3, B4, B5, C1, C3 and C7) were observed. Conversely, the average concentrations at points B6, C2 and C6, relatively far from the high traffic roads, were comparable, in both seasons, with those within the university area.
These finding were consistent with the sampling point characterization showed in Figure S5, where the ratio between traffic flows on the nearest road and distance from the road traffic for each site was reported. With this regard, it should be noted that total traffic flows (calculated in directions AB + BA averaged over 2 h) were linearly correlated with PNC both in June (r = 0.48, p < 0.01) and in November (r = 0.39, p < 0.01).
The spatial contrast in measured PNC was represented by the ratio of the minimum and maximum value of the average two seasonal campaigns in the overall measurement points. The calculated parameter in overall 21 measurement points was 0.53 and was comparable with that estimated within the city of Rome (0.6) [11].

Model Evaluation
GAMs were performed to predict PNC in the study area for the cold and warm seasons (November 2017 and June 2018) and for the overall period (Overall model).
In the cold season model, the relative humidity (Urel), the traffic in the nearest street (Traff), the inverse distance squared to the nearest main/busy road (Distinv2), the difference between the maximum and minimum turbulent kinetic energy in two hours (DELTA_TKE) were selected as explanatory variables.   In the summer model, the scalar wind speed (Vscal), the traffic in the nearest road (Traff), the inverse distance squared to the nearest main/busy road (Distinv2) and the height minimum of the PBL (Hmix min ) were selected as explanatory variables.  Tables 3 and 4. In the overall model, built using data for both periods together, the temperature (Tdry), the scalar wind speed (Vscal), the traffic in the nearest road (Traff), the inverse distance squared to the main/busy road (Distinv2), the difference between the maximum and minimum turbulent kinetic energy in 2 h (DELTA_TKE), were selected as explanatory variables.  The smooth terms for the three models (Tables 1-6) are all significant (p-value < 0.001), and the choices of the K parameter for each variable return satisfactory results (k index > 1).
As shown in Figures S11, S13 and S15, for the three models the basic assumptions were respected: residuals were normally distributed and the constancy of variance, and independence of the variables were respected in accordance with the random distribution of the residuals versus estimated values.

Models Performance
The performance indices for the three models are shown in Table 7. In particular, a very good space-time prediction index is represented by the Root Mean Square Error (RMSE). K-fold cross-validation has been used to estimate the prediction error [64]. The cold season model explains 71.1% of the deviance (adjusted R 2 equal to 0.690). The summer model explains 79.7% of the deviance (adjusted R 2 equal to 0.779). The overall model explains 84% of the deviance (adjusted R 2 equal to 0.835). RMSE of the models showed rather satisfactory values. All performance parameters were consistent with those of other previous studies [34,35].

Influence of Explanatory Variables
To assess the influence of explanatory variables on estimated PNC levels, the percent relative effect (P) of each variable was calculated through the following equation [65,66]: where: α = ln (variable effect) P can be interpreted as PNC percentage change in relation to the baseline level. The frequency ratio of negative relative effect by meteorological covariate (NR) was calculated to better understand the sign of relative contributions, regardless of the magnitude. For each covariate NR parameter was calculated by the following equation: NR = Nneg/Npos where: -Nneg = n. of occurrences with negative p value -Npos = n. of occurrences with positive p value NR and the P median values, also disaggregated by time bands, are shown in Tables 8 and 9. In the summer model, Hmix min had the greatest percent relative effect, causing an increase in PNC of 45.8%, whereas Vscal exhibited a relevant negative effect (p = −27.6%). Meteorology variables proved to have a strong effect in the winter season too, even if smaller than the summer one: the biggest contribution to PNC increase (p = 18.2%) was caused by DELTA_TKE, while that to PNC decrease was due to Urel (p = −15.3%). The overall model results confirmed the significant effect of meteorological parameters.
Both Vscal and Urel had prevalence of negative effect occurrences. Hmix min showed a strong positive effect frequency. DELTA_TKE exhibited a large prevalence of negative effect occurrences in the overall model, instead a relevant prevalence of positive effect in the winter one.
Looking at the overall model in Table 9, for all variables (with the exception of DELTA_TKE), the smallest percentage contribution increase took place during the evening band. In the winter model, Urel exhibited a strong negative contribution during the noontime and evening bands. In the summer model, Vscal showed a great negative effect, especially in the evening. Traffic flow contribution during the winter period resulted bigger than in summer, and during the noontime band achieved a not-negligible 8.67% relative effect.

Discussion on Influence of Meteorological Parameters
The results of the cold season model showed a negative contribution of Urel (−15.3% in Table 8) while the trend of spline function ( Figure S10) was consistent with the studies that showed an increase of PNC at low temperatures and high humidity, due to the phenomena of new particles' formation [29,67]. Vehicle emissions consist of a mixture of hot organic gases and sulfur and nitrogen oxides, together with particles. The mixture of hot gases meets cold air, rapidly forming ultrafine particles in the nucleation mode (1-20 nm) in the first seconds after emission [32,68]. These particles are always formed, but their formation is favored by low temperature and high humidity and have been observed in traffic sites near the sources [25,69].
DELTA_TKE variable was a good interpreter of the standard deviation of the horizontal wind speed [70]. DELTA_TKE showed a different contribution during the cold season and the overall models (18.2% and −3.35%, respectively, see Table 8). The relationship with PNC was positive in the range 0.0-1.0 m 2 /s 2 (Figures S10 and S14) that was also the range that reproduce very well the daily temporal variability in the overall monitoring periods. Measured values in the chosen time slot (black dots in Figure 5a,b) seem well distributed related to daily variability in the overall period (continuous red line): out of that range, DELTA_TKE variable exhibited explanatory weakness as confirmed by confidence intervals of the spline functions.
Traffic flow contribution was positive with PNC in all the three models, although a greater contribution was evident in the winter period compared to the summer one. It is well known that the new particles' formation phenomena in the atmosphere by nucleation related to photochemical processes can be relevant in the summer period (the so-called midday nucleation events [18,71]), but particles formed through such events have been reported to fall mainly in the range 3-10 nm, which is outside the counting efficiency of the instruments we used. son and the overall models (18.2% and −3.35%, respectively, see Table 8). The relationship with PNC was positive in the range 0.0-1.0 m 2 /s 2 (Figures S10 and S14) that was also the range that reproduce very well the daily temporal variability in the overall monitoring periods. Measured values in the chosen time slot (black dots in Figure 5a,b) seem well distributed related to daily variability in the overall period (continuous red line): out of that range, DELTA_TKE variable exhibited explanatory weakness as confirmed by confidence intervals of the spline functions. Traffic flow contribution was positive with PNC in all the three models, although a greater contribution was evident in the winter period compared to the summer one. It is well known that the new particles' formation phenomena in the atmosphere by nucleation related to photochemical processes can be relevant in the summer period (the so-called midday nucleation events [18,71]), but particles formed through such events have been reported to fall mainly in the range 3-10 nm, which is outside the counting efficiency of the instruments we used.
Vscal had an important negative contribution in the warm season model, and less in the overall model (−27.6%, −7.10% in Table 8), in line with the trend of spline functions (Figures S12 and S14) that, with the limit of interval confidence, was consistent with several studies that showed a decrease of UFP with an increase of wind speed [25,34]; a decreasing exponential function was proposed with the minimum observed during a wind speed higher than 5 m/s [72]. In our study, the wind speed during summer (averaged over two hour) ranged between 1.0 and 2.9 m/s, and the spline has a maximum at around 2.0 m/s. However, it can be argued that, when the wind speed increases, the availability of particles on which the gases can condense is reduced, and therefore the probability that these lead to the formation of new particles increases (this also explains why in rural sites, Vscal had an important negative contribution in the warm season model, and less in the overall model (−27.6%, −7.10% in Table 8), in line with the trend of spline functions (Figures S12 and S14) that, with the limit of interval confidence, was consistent with several studies that showed a decrease of UFP with an increase of wind speed [25,34]; a decreasing exponential function was proposed with the minimum observed during a wind speed higher than 5 m/s [72]. In our study, the wind speed during summer (averaged over two hour) ranged between 1.0 and 2.9 m/s, and the spline has a maximum at around 2.0 m/s. However, it can be argued that, when the wind speed increases, the availability of particles on which the gases can condense is reduced, and therefore the probability that these lead to the formation of new particles increases (this also explains why in rural sites, where the concentration of particles is lower, the phenomena of formation of new particles have been shown to be more intense) [73]. The temporal variability is strongly dependent on the PBLH evolution over time and on the characteristics of the local wind. The trend of spline function of Hmix min ( Figure S12) showed a relationship with PNC that was consistent with what we generally expect in summer: the higher the atmospheric vertical turbulence, the lower the PNC as the other pollutants' concentration [25,74]. A typical summer trend in the aerosol vertical profile and PBLH was showed in Figures 6 and 7. The PBLH tends to increase with the increase in solar radiation, which favors convective phenomena until it reaches a maximum in the hottest hours of the day and then falls again in the afternoon.

Spatio-Temporal Predictions of PNC
Variables selected for each model were calculated at each study domain cell (25 m × 25 m). The models' equations were used to estimate average of PNCs at a two-hour time resolution. Inverse distance weighted interpolation of the resulting gridded PNCs was then used to generate PNC maps. Figures 8-10 show the spatial and temporal estimates of PNC in the study domain for a typically summer day, Wednesday 20 June 2018, in the three-time bands 8:00-10:00;

Spatio-Temporal Predictions of PNC
Variables selected for each model were calculated at each study domain cell (25 m × 25 m). The models' equations were used to estimate average of PNCs at a two-hour time resolution. Inverse distance weighted interpolation of the resulting gridded PNCs was then used to generate PNC maps. Figures 8-10 show the spatial and temporal estimates of PNC in the study domain for a typically summer day, Wednesday 20 June 2018, in the three-time bands 8:00-10:00;

Spatio-Temporal Predictions of PNC
Variables selected for each model were calculated at each study domain cell (25 m × 25 m). The models' equations were used to estimate average of PNCs at a two-hour time resolution.
Inverse distance weighted interpolation of the resulting gridded PNCs was then used to generate PNC maps. Figures 8-10 show the spatial and temporal estimates of PNC in the study domain for a typically summer day, Wednesday 20 June 2018, in the three-time bands 8:00-10:00; 10:00-12:00 and 16:00-18:00.      The maps show the model capability to capture the intra-day PNC variability PBLH temporal trend and the traffic flow pattern drive the PNC spatiotemporal var ity, since both traffic related variables and PBLH were important predictors in the w season model. PBLH tends to gradually increase in the morning until it reaches very high va which remain stable from 1 pm to about 7 pm (on average 2500 m-as shown in Fi 7). The estimated PNC were high in the morning (low PBL and high traffic values-Fi 8) continue to increase up to 12 (still low PBL- Figure 9) until they decrease over the e domain due to the phenomena of dispersion (high PBL- Figure 10).
Maximum PNC levels were found along the busiest streets, while large PNC de was observed just a few tens of meters away from the curbside.

Conclusions
We developed GAMs to predict both spatially and temporally resolved PNC v bility at a local (neighborhoods) scale.
The modeling approach effectiveness was demonstrated quantitatively. All the three models were fully compliant with the basic assumptions for the G models, as demonstrated by diagnostic GAM plot [34,35].
The stepwise procedure allows for developing generally robust models which form reasonably well in the cross-validation approach: the R 2 adjusted (0.690-0.835) higher or within the range of the best performing statistical models already develop the urban scale [34,41] and local scale [17,35].
Considering that the relationships between the target variable (PNC) and the p tial predictors were not linear, we demonstrated that including spline functions with s uncertainty levels significantly improved the models' performance (LUR R 2 adj = 0.52 GAM R 2 adj = 0.835).
To the best of the authors' knowledge, there are few studies [12,50] that have dressed the challenge to capture and effectively represent UFP urban hotspots spatial iability, at a local or microscale, through statistical models. Among them, ours is the The maps show the model capability to capture the intra-day PNC variability. The PBLH temporal trend and the traffic flow pattern drive the PNC spatiotemporal variability, since both traffic related variables and PBLH were important predictors in the warm season model. PBLH tends to gradually increase in the morning until it reaches very high values, which remain stable from 1 pm to about 7 pm (on average 2500 m-as shown in Figure 7). The estimated PNC were high in the morning (low PBL and high traffic values- Figure 8) continue to increase up to 12 (still low PBL- Figure 9) until they decrease over the entire domain due to the phenomena of dispersion (high PBL- Figure 10).
Maximum PNC levels were found along the busiest streets, while large PNC decline was observed just a few tens of meters away from the curbside.

Conclusions
We developed GAMs to predict both spatially and temporally resolved PNC variability at a local (neighborhoods) scale.
The modeling approach effectiveness was demonstrated quantitatively. All the three models were fully compliant with the basic assumptions for the GAM models, as demonstrated by diagnostic GAM plot [34,35].
The stepwise procedure allows for developing generally robust models which perform reasonably well in the cross-validation approach: the R 2 adjusted (0.690-0.835) was higher or within the range of the best performing statistical models already developed at the urban scale [34,41] and local scale [17,35].
Considering that the relationships between the target variable (PNC) and the potential predictors were not linear, we demonstrated that including spline functions with small uncertainty levels significantly improved the models' performance (LUR R 2 adj = 0.528 vs. GAM R 2 adj = 0.835). To the best of the authors' knowledge, there are few studies [12,50] that have addressed the challenge to capture and effectively represent UFP urban hotspots spatial variability, at a local or microscale, through statistical models. Among them, ours is the only one accounting for seasonal and diurnal variability, although the temporal distribution was only suitable within the limit of the measurement ranges of the descriptive variables of the models.
The PNC variability main descriptors and their contribution to the PNC-explained variance were assessed. Based on our best knowledge, this study was the first that demonstrated the usefulness of time-resolved traffic counts to predict both spatial and temporal PNCs while accounting for traffic load as well as meteorology, allowing for reliable spatiotemporal predictions of PNC. In particular, the influence of atmospheric parameters showed that, in the summer model, there was the greatest percent relative effect of a single meteorological variable (Hmix min ), whereas in the winter season and overall models the effects of meteorological parameters were generally comparable.
As far as we know, a few numerical simulations have been developed to predict PNC at a street/local scale [18,[75][76][77]; however, reliable emission inventories as well as monitoring networks for the models' validation are still sparse [76][77][78][79]. Thus, statistical models remain to date a reliable and computationally affordable way to address the challenge.
In this study, our attention was focused on an area where the main activities take place, on the Sapienza university. It is one of the largest and oldest universities in Europe and has more than 100,000 students and about 8000 workers.
From the point of view of exposure, it therefore represents a rather large population with unique characteristics in terms of age distribution (mainly young adults) and prevailing health status.
We have highlighted that there is a significant difference between the concentrations detected in the internal areas of the university campus and those detected in the immediately adjacent external areas. In the former, students and university staff spend a large part of their day studying or working, but in the latter, they spend a significant part of their day for recreational and leisure activities (given the presence of a dense network of clubs, pubs, bars and restaurants that welcome them during breaks and in the evening).
However, also in the areas outside the university campus there is a significant variability essentially linked to the distance from the main traffic arteries. This variability affects the student, worker and resident population in the area itself.
A significant difference was also observed in the average summer levels (lower) compared to the winter ones, even if the spatial variability in the two seasons remains almost unchanged. This aspect is also important considering that the level of attendance of the campus in the two seasons is very different [80].
The method to assess PNC temporal and spatial variability proposed in this study can therefore represent a useful tool for the development of future studies that aim to estimate the integrated daily residential exposure to ultrafine particles of people that live, work or study in the area.
Moreover, our monitoring and modelling strategy could be easily replicable, with little economic effort, in other contexts as well as whenever monitoring data are needed to validate numerical simulations.  Figure S3: H/L parameter values, representative of the urban canyon. Figure S4: Comparison between average PNC values, observed in winter and in summer: (a) within Sapienza borders, path A; (b) in the points most affected by the greater proximity to roads with high traffic flows, (points B1, B2, B3, B4, B5 and points C1, C3, C7); (c) in points relatively far from the busiest roads (B6, C2, C6). Figure S5: Site characterization: ratio between traffic flows on the nearest road and distance from the road. Figure S6: Distribution of average PNC values by day of the week and by season. Figure S7: Correlation matrix between the natural logarithm of PNC (ln(PNC)) and the main meteorological and micrometeorological variables, in the two seasons, winter (left), summer (right). Figure S8: Correlation matrix between the natural logarithm of PNC (ln(PNC)) and the main spatial variables. Figure S9: Correlation matrix between the natural logarithm of PNC (ln(PNC)) and the traffic flows, in winter (left) and in summer (right). Figure S10: November 2017 model: trends of the spline functions for Urel, Distinv2 and DELTA_TKE variables. Figure S11: Checks of the basic assumptions of the November 2017 model: residual analysis. Figure S12: June 2018 model: trends of the splines for Vscal, Distinv2, Hmix min and Traff variables. Figure S13: Checks of the basic assumptions of the June 2018 model: residual analysis. Figure S14. Overall model: spline trends for Tdry, Distinv2, Vscal and DELTA_TKE. Figure S15: Check of the basic assumptions of the overall model: residual analysis.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.