A Satellite-Based Land Use Regression Model of Ambient NO2 with High Spatial Resolution in a Chinese City

Abstract: Previous studies have reported that intra-urban variability of NO2 concentrations is even higher than inter-urban variability. In recent years, an increasing number of studies have developed satellite-derived land use regression (LUR) models to predict ground-level NO2 concentrations, though only a few have been conducted at a city scale. In this study, we developed a satellite-derived LUR model to predict seasonal NO2 concentrations at a city scale by including satellite-retrieved NO2 tropospheric column density, population density, traffic indicators, and NOx emission data. The R2 values of model fitting and 10-fold cross validation were 0.70 and 0.61, respectively, for the satellite-derived seasonal LUR model. The satellite-based LUR model captured seasonal patterns and fine gradients of NO2 variations at a 100 m × 100 m resolution and showed that NO2 concentrations in winter were 1.46 times those in summer. NO2 concentrations declined significantly with increasing distance from roads and with increasing distance from the city center. In Suzhou, 84% of the total population lived in areas with NO2 concentrations exceeding the annual-mean standard of 40 µg/m3 in 2014. This study demonstrated that satellite-retrieved data could help increase the accuracy and temporal resolution of traditional LUR models at a city scale. This application could support exposure assessment at a high resolution for future epidemiological studies and policy development pertaining to air quality control.


Introduction
Nitrogen dioxide (NO2) is not only a primary pollutant, mainly from fossil fuel emissions, but also a secondary pollutant arising in large part from the photochemical reaction of NO with O3 [1,2]. It is a common indicator of traffic-related air pollution and has been associated with a myriad of adverse health effects. NO2 was positively linked to lung cancer mortality in California in the American Cancer Society Cancer Prevention II Study [3]. In China, short-term exposure to NO2 was significantly associated with total natural-cause mortality and cardiorespiratory disease mortality across 272 cities [4]. Even at or below the current European air quality limit values, associations between NO2 exposure and adverse effects have been found for both short-term and long-term exposure in Europe [5]. In previous epidemiological studies, exposure to NO2 was mostly evaluated using ground-based fixed monitoring data, interpolation methods, or land use regression (LUR) models [6,7]. The concentrations of NO2 may decline within several hundred meters of emission sources [8], and the spatial distributions of NO2 differ significantly between, and especially within, cities [9,10]. In Canada, variations in NO2 concentrations within a city showed a stronger association with cause-specific mortality than variations between cities [11]. Thus, it is essential to evaluate intra-urban NO2 concentrations at a high spatial resolution for epidemiological studies. LUR models are among the most common assessment methods used to capture the spatial variability of NO2 at a high spatial resolution, and they have been applied in NO2-related cohort studies in Europe and the United States [9,12-15]. LUR models have also been developed for predicting NO2 concentrations in Chinese cities, including Shanghai, Tianjin, and Wuhan [16-18].
Traditional LUR models depend heavily on land use data, which have low temporal resolution, so they cannot meet the flexible temporal requirements of exposure assessment in epidemiological studies.
Satellite data have proven to be one of the key predictors for estimating ambient NO2 concentrations at a high temporal resolution [19-21]. Specifically, a study in Western Europe indicated that adding satellite data increased the adjusted R2 of LUR models by 0.02-0.06 compared with models without satellite data, which had R2 values of 0.48-0.56 [22]. Other studies showed that satellite-based LUR models could extend the temporal resolution of traditional LUR models for predicting air pollutant concentrations from the annual level to monthly or seasonal scales [19,23-25]. NO2 column density from the Ozone Monitoring Instrument (OMI) aboard the Aura satellite is the most commonly used dataset for establishing satellite-based LUR or machine learning models [26-28]. Satellite-based LUR models not only extended the temporal resolution of traditional ones [19] but also improved model performance [22,29,30]. However, in China, most of these studies were conducted at regional or national scales [21,31]; whether satellite data can improve the resolution and performance of LUR models at a city scale has not been fully evaluated. In addition, the row anomaly of OMI has led to a large amount of missing data at the daily level [32]. OMI NO2 column density data may therefore be unsuitable for directly assessing NO2 exposure levels within a city at a daily scale, and some studies have resampled the data at a seasonal scale [33].
Therefore, in this study, we developed a satellite-derived LUR model in a Chinese metropolis to capture intra-urban temporal variations of NO2 at a seasonal level with a high spatial resolution. The model is expected to capture finer gradients of NO2 within a city at a higher temporal resolution than traditional LUR models, which could provide more accurate exposure assessment for epidemiological studies.

Study Area
Suzhou is a city located in southeastern Jiangsu Province in East China (Figure 1). It includes five urban districts (Gusu, Huqiu, Wuzhong, Xiangcheng, and Wujiang) and four satellite cities (Changshu, Taicang, Kunshan, and Zhangjiagang). Suzhou is one of five urban locations in the China Kadoorie Biobank (CKB) cohort, which has focused on common chronic diseases since 2004 [34]. We developed a satellite-derived LUR model in Suzhou as a case study to establish the methodology for assessing NO2 exposure in the CKB cohort and to support the next phase of air pollution-related epidemiological studies. Suzhou covered 8488.42 km2 in 2018, and about 42.5% of the total area was covered by water bodies. The total registered population in Suzhou reached 7.04 million by the end of 2018 (http://tjj.suzhou.gov.cn/sztjj/tjnj/2019/zk/indexce.htm). Suzhou is located in a subtropical monsoon climate zone with four distinct seasons.

Data
The database included data on NO2 monitoring, NO2 tropospheric column density from the OMI instrument, population density, road network, land use parameters, and NOx emissions.

Monitoring Data
Daily NO2 monitoring data from 20 fixed air quality stations were obtained from the National Environmental Monitoring Network; the station locations are shown in Figure 1. In accordance with the Chinese Ambient Air Quality Standard (GB3095-2012), at least 20 hourly measurements were required to calculate a daily NO2 concentration, at least 27 daily values to calculate a monthly concentration (25 for February), and at least 324 daily values to calculate the annual concentration. Most of the fixed stations were located in areas with a relatively high population density to represent the averaged exposure levels for public health.
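The completeness thresholds above can be illustrated with a minimal Python sketch. The function names and the use of `None` for a missing value are assumptions made for illustration and are not part of the study's workflow.

```python
from statistics import mean

# Thresholds quoted in the text from GB3095-2012.
MIN_HOURS_PER_DAY = 20    # valid hourly values needed for a daily mean
MIN_DAYS_PER_MONTH = 27   # valid daily values needed for a monthly mean
MIN_DAYS_FEBRUARY = 25    # relaxed monthly threshold for February
MIN_DAYS_PER_YEAR = 324   # valid daily values needed for an annual mean


def _mean_if_complete(values, required):
    """Average the non-missing values, or return None if too few are valid."""
    valid = [v for v in values if v is not None]
    return mean(valid) if len(valid) >= required else None


def daily_mean(hourly_values):
    """Daily NO2 mean from up to 24 hourly values (None marks a gap)."""
    return _mean_if_complete(hourly_values, MIN_HOURS_PER_DAY)


def monthly_mean(daily_values, month):
    """Monthly NO2 mean; month is 1-12, and February uses the lower threshold."""
    required = MIN_DAYS_FEBRUARY if month == 2 else MIN_DAYS_PER_MONTH
    return _mean_if_complete(daily_values, required)


def annual_mean(daily_values):
    """Annual NO2 mean from the valid daily means of one year."""
    return _mean_if_complete(daily_values, MIN_DAYS_PER_YEAR)
```

Returning `None` rather than a partial average keeps incomplete periods out of later aggregation steps.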

Satellite Data
The OMI instrument is on board the National Aeronautics and Space Administration (NASA) Aura satellite, which was launched in 2004. It measures radiances across 270-500 nm in the ultraviolet and visible wavebands. Global tropospheric vertical column NO2 density data from the OMI level 2 (OMNO2) product, with a spatial resolution of 13 km × 24 km at nadir [35], are available online at a daily time step and were downloaded from the NASA Goddard Earth Sciences Data and Information Services Center (https://earthdata.nasa.gov/). Cloud cover and the dynamic row anomaly of OMI were responsible for a high rate of missing values in the daily data. The "row anomaly" is a technical fault of the OMI instrument that has produced invalid data in the center-right part of each swath of observations since 2008 [32]. Within a city, the high missing rate might cause low availability of OMI NO2 tropospheric column density data at a daily level. Therefore, seasonal resampling was done by averaging all daily OMI NO2 tropospheric column density data falling inside a 40 km × 40 km grid to fill the gaps caused by missing data and to smooth the noise [33]. The satellite data were then interpolated to the fixed monitoring stations using an inverse distance weighted (IDW) method.
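The seasonal averaging and IDW steps can be sketched as follows. This is a minimal illustration, not the study's implementation: the distance power of 2, the planar (projected) coordinates, and the function names are assumptions, since the text does not specify them.

```python
import math


def seasonal_mean(daily_columns):
    """Average the valid daily column densities of one grid cell in one season.

    Missing retrievals (cloud cover, row anomaly) are passed as None and skipped.
    Returns None when no valid retrieval exists.
    """
    valid = [v for v in daily_columns if v is not None]
    return sum(valid) / len(valid) if valid else None


def idw(target, samples, power=2.0):
    """Inverse distance weighted estimate at `target`.

    target  -- (x, y) of a monitoring station in projected coordinates
    samples -- list of ((x, y), value) pairs, e.g. grid-cell centers and
               their seasonal mean column densities
    power   -- distance exponent (2.0 is an assumed, commonly used default)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the station coincides with a sample point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

With a power of 2, a grid cell twice as far from the station receives a quarter of the weight.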

Other Predictors

Land Use Parameters
Land use data (agricultural, forest, grassland, waterbody, urban and built-up, and unused land) from 2014 were interpreted from the Landsat TM5 dataset (https://earthexplorer.usgs.gov/) with a 30 m spatial resolution (Figure 2). Specifically, agricultural land included dry land and paddy fields; forest land included dense forests, shrub forests, loose forests, and other forests; grassland included highly-covered grassland; waterbody included rivers, lakes, beaches, bottomlands, and reservoirs; urban and built-up land included urban and rural settlements and other built-up land; unused land included bare rock and sand. In Suzhou, the major land use types were urban and built-up land, agricultural land, and waterbody, and agricultural land mainly consisted of paddy fields. To optimize the correlation between NO2 measurements and land use predictors, different buffer distances were applied, from 100 m to 5000 m at 100-m intervals, around the 20 fixed monitoring sites [10,17,36]. The area of each land use type was then calculated within each of these buffer zones.

Road Network
Lengths of major roads and distances to the nearest major road were calculated as indicators of traffic emissions. Types of roads included expressways, national roads, provincial roads, urban expressways, county roads, town roads, and other roads. Then, expressways, national roads, provincial roads, and urban expressways were merged as major roads. Within the buffers from 100 m to 5000 m (at 100 m intervals) around the 20 fixed monitoring sites, the lengths of major roads were then calculated [6,17]. Distance from monitoring sites to the nearest major road, inverse of the distance, and logarithmic transformation of the inverse distance were also calculated as indicators of traffic emissions [6,10].
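The two kinds of traffic indicators described above can be sketched in plain Python. The sketch assumes projected coordinates in meters, roads given as polylines, a natural logarithm for the log transformation (the base is not stated in the text), and hypothetical function names.

```python
import math


def distance_indicators(distance_m):
    """Distance to the nearest major road plus its two transformations."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    inverse = 1.0 / distance_m
    return {
        "distance": distance_m,
        "inverse_distance": inverse,
        "log_inverse_distance": math.log(inverse),  # natural log is an assumption
    }


def segment_length_in_buffer(p0, p1, center, radius):
    """Length of the straight segment p0-p1 lying inside a circular buffer."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    fx, fy = p0[0] - center[0], p0[1] - center[1]
    a = dx * dx + dy * dy
    if a == 0.0:
        return 0.0  # degenerate segment
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return 0.0  # the line misses (or only touches) the circle
    root = math.sqrt(disc)
    # Parameters where the infinite line enters and leaves the circle,
    # clipped to the segment range [0, 1].
    t_in = max(0.0, (-b - root) / (2.0 * a))
    t_out = min(1.0, (-b + root) / (2.0 * a))
    if t_out <= t_in:
        return 0.0
    return (t_out - t_in) * math.sqrt(a)


def road_length_in_buffer(polyline, center, radius):
    """Total length of a road polyline inside the buffer around a site."""
    return sum(
        segment_length_in_buffer(p0, p1, center, radius)
        for p0, p1 in zip(polyline, polyline[1:])
    )
```

Calling `road_length_in_buffer` once per radius from 100 m to 5000 m would reproduce the set of buffer variables described in the text; in practice a GIS library would normally handle this step.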

Population Density
Population density data were obtained from the Oak Ridge National Laboratory (ORNL) LandScan 2014 global database at 30″ × 30″ resolution in raster format (http://www.ornl.gov/sci/landscan/) and were then interpolated to the NO2 monitoring stations using the IDW method. The population data, in ESRI binary raster format, have a resolution of approximately 1 km × 1 km, and each grid cell represents the average population count within the cell at an annual level (https://landscan.ornl.gov/documentation). Figure 3 shows the spatial distribution of the population in Suzhou in 2014, suggesting that more people tended to live in the centers of the five urban districts and four satellite cities.

NOx Emissions
NOx emission inventory data were collected from the Multiresolution Emission Inventory of China (MEIC, http://www.meicmodel.org) at a spatial resolution of 1 km × 1 km. The industrial NO2 emissions from power plants and non-power plants were computed separately within buffer zones of 1 km to 10 km, at 1-km intervals, around each monitoring site.

Model Development and Evaluation
In the first step, a traditional LUR model was developed to select the optimal predictors from all candidate parameters using linear regression [6,10,20,36]. Since the OMI NO2 tropospheric column density was aggregated at a seasonal level to fill the gaps caused by the high missing rate of the satellite data [32], this model was developed at a seasonal level [37,38]. First, we assigned each potential variable an a priori direction of effect. Second, manual supervised backward regression was conducted on seasonal NO2 concentrations to select the optimal predictor variables. Predictors were kept in the model if they satisfied the criteria proposed by previous studies [6,10,17]: (1) the variable improved the model R2 by at least 1%; (2) the effect direction of the variable was consistent with its prior direction; (3) the variables already in the model did not change their effect directions; and (4) the p value of the variable did not exceed 0.1 (variables with a p value greater than 0.1 were excluded). This process continued until no more variables met the criteria. Variance inflation factors (VIFs) were calculated as an indicator of multicollinearity. Variables with VIF values greater than three were removed from the satellite-based LUR model, and this step was repeated.
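The multicollinearity check can be illustrated with a short standard-library sketch. It uses the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix; the function names are hypothetical, and the study itself ran its analyses in R.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long predictor columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        m = sum(col) / n
        centered.append([v - m for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


def exceeds_vif_limit(columns, limit=3.0):
    """Flag predictors whose VIF is above the limit used in the text."""
    return [vif > limit for vif in variance_inflation_factors(columns)]
```

For two predictors with a correlation of 0.8, both VIFs equal 1 / (1 - 0.64), or about 2.78, which is below the threshold of three.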
In the second step, a linear mixed effects model was developed (see Equation (1)) by including random effects for OMI NO2 tropospheric column density [23,37]. The advantage of this model is that it allows the association between NO2 concentrations and OMI NO2 tropospheric column density to vary over time. Similar satellite-based models have been developed for predicting PM2.5 concentrations in a national assessment [37] and PM10 concentrations within the city of Shanghai [23]. In this model, the OMI NO2 tropospheric column density had both random effect and fixed effect coefficients, which represented, respectively, the seasonal variability in the association between NO2 measurements and OMI NO2 tropospheric column density and the average effect of satellite measurements on ground NO2 measurements over the whole year [23,37]. The model structure can be summarized as:

$$\mathrm{NO}_{2,st} = (\beta_0 + \beta_0') + (\beta_1 + \beta_1')\,\mathrm{OMI}_{st} + \sum_{i} \beta_i X_{is} + \varepsilon_{st} \tag{1}$$

where $\mathrm{NO}_{2,st}$ is the mean observed NO2 concentration (µg/m3) at fixed station $s$ in season $t$; $\mathrm{OMI}_{st}$, the only independent variable with both fixed and random effects, is the OMI NO2 tropospheric column density at station $s$ in season $t$; $\beta_0$ and $\beta_0'$ are the fixed and season-specific random intercepts, respectively; $\beta_1$ and $\beta_1'$ are the fixed and season-specific random slopes for $\mathrm{OMI}_{st}$, respectively; $X_{is}$ is the value at station $s$ of predictor $i$, one of the predictors selected by the criteria of the first step; $\beta_i$ is the fixed slope for predictor $i$; and $\varepsilon_{st}$ is the error term at station $s$ in season $t$.
In the third step, 10-fold cross validation (CV) was applied to evaluate the model performance [17,37]: 90% of the data were randomly selected for model development, the resulting model was used to predict NO2 concentrations for the remaining 10% of the data, and this process was repeated 10 times. Root mean squared error (RMSE) was calculated as the standard deviation of the residuals. RMSE and R2 were used to evaluate the model's performance by comparing measured and predicted NO2 concentrations during model development and 10-fold CV, respectively. The relative prediction error (RPE, defined as RMSE divided by the mean NO2 measurement) from 10-fold CV was then calculated to evaluate prediction accuracy.
In the fourth step, seasonal prediction maps of NO2 concentrations in Suzhou were produced from the satellite-derived LUR models at a 100 m × 100 m resolution. In addition, we calculated annual-mean and seasonal-mean population-weighted NO2 concentrations in Suzhou [39] (see Equation (2)):

$$C_{\mathrm{Pop}} = \frac{\sum_{i} \mathrm{Pop}_i \times C_i}{\sum_{i} \mathrm{Pop}_i} \tag{2}$$

where $C_{\mathrm{Pop}}$ is the annual-mean or seasonal-mean population-weighted NO2 exposure concentration in Suzhou; $\mathrm{Pop}_i$ is the population density of grid $i$; and $C_i$ is the estimated annual-mean or seasonal-mean NO2 concentration of grid $i$. Figure 4 shows the workflow for the development of the satellite-derived LUR model in our study. Statistical analyses were performed with the nlme package (https://www.rdocumentation.org/packages/nlme/versions/3.1-151/topics/nlme) in R 3.6.1.
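The evaluation metrics of the third step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's code: R2 is computed here as one minus the ratio of residual to total sum of squares (the text does not say which definition was used), RMSE is taken as the root of the mean squared residual, and the fold assignment is a simple seeded shuffle.

```python
import math
import random


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted values."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def relative_prediction_error(rmse_value, mean_measured):
    """RPE as defined in the text: RMSE divided by the mean measurement."""
    return rmse_value / mean_measured


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly split sample indices into k folds of near-equal size."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


# The reported CV RMSE (5.91) and annual mean (46.23) reproduce the RPE of 12.78%.
rpe_percent = round(100 * relative_prediction_error(5.91, 46.23), 2)
```

Each fold returned by `k_fold_indices` serves once as the held-out 10%, with the model refitted on the other nine folds.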

Remote Sens. 2021, 13

Descriptive Statistics Analyses
In 2014, the annual-mean NO2 concentration measured at the fixed monitoring sites in Suzhou was 46.23 µg/m3, with the lowest seasonal concentration of 36.52 µg/m3 recorded in summer and the highest of 53.22 µg/m3 in winter. Among all predictors, seasonal OMI NO2 tropospheric column density had the highest Pearson's correlation coefficient with seasonal NO2 measurements (0.65).

Model Development and Evaluation
After variable selection in the first step, the satellite-derived LUR model included four predictors: NO2 tropospheric column density from OMI, population density, the log-transformed inverse of the nearest distance to major roads (Log_distance), and NO2 emissions from non-power plants within a 10-km buffer zone (Table 1). The R2 and RMSE of this model were 0.63 and 5.76 µg/m3, respectively. The R2 and RMSE of the 10-fold CV were 0.59 and 6.09 µg/m3, respectively. The VIFs of the four variables were all less than 2, showing weak multicollinearity among them.
The results of the second step, including the estimated fixed-effect coefficients of the four predictor variables, are shown in Table 2. All predictors were positively and significantly associated with measured NO2 concentrations, with p values less than 0.05. The absolute contribution (IQR × β) of each predictor was calculated as its fixed-effect regression coefficient (β) multiplied by the inter-quartile range (IQR) of the predictor. The results indicated that non-power plant emissions within a 10-km buffer zone and OMI NO2 tropospheric column density contributed most to NO2 concentrations, because they had the highest IQR × β values (Table 2). The R2 and RMSE of the seasonal satellite-derived LUR model were 0.70 and 5.24 µg/m3, respectively. The R2 and RMSE of the 10-fold CV were 0.61 and 5.91 µg/m3, respectively, for the seasonal model (Figure 5). The RPE from 10-fold CV was 12.78%, which indicated a relatively high prediction accuracy at the seasonal level. The linear mixed effects model performed better than the traditional linear regression model, suggesting the importance of considering the seasonal variability of the association between ground NO2 measurements and OMI NO2 tropospheric column density.
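The absolute contribution (IQR × β) described above can be sketched as follows. The quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the text does not state how the IQR was computed, and the example values are hypothetical rather than taken from Table 2.

```python
from statistics import quantiles


def interquartile_range(values):
    """IQR using the inclusive quartile method (an assumed convention)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def absolute_contribution(values, beta):
    """Change in predicted NO2 for an IQR increase in one predictor."""
    return interquartile_range(values) * beta


# Hypothetical predictor values and coefficient, for illustration only.
example_predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
example_beta = 1.5
example_contribution = absolute_contribution(example_predictor, example_beta)
```

Scaling each coefficient by the IQR of its predictor puts predictors measured in different units on a common scale of µg/m3 per typical change, which is what makes the contributions comparable.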

Spatiotemporal Trends of Predicted NO2 Concentrations
Predictive maps of NO2 concentrations with a spatial resolution of 100 m × 100 m were produced at a seasonal timescale (Figure 6). The seasonal pattern of predicted NO2 concentrations agreed well with field measurements. The mean NO2 concentration in Suzhou was highest in winter (47.3 µg/m3), 1.46 times that in summer. The spatial patterns of NO2 predictions were similar across seasons. Maps with high spatial resolution showed that severe NO2 pollution occurred along the major roads and declined significantly with increasing distance from the road. Urban centers with high population density and an intensive road network also experienced higher NO2 concentrations than rural areas (Figure 6). For example, in summer, the maximum NO2 concentration (58.99 µg/m3), which occurred in urban areas, was 2.77 times the minimum value (21.33 µg/m3) in rural areas; in winter, the maximum concentration (76.93 µg/m3) was 2.03 times the lowest value (37.91 µg/m3) in rural areas. The results indicated that NO2 concentrations were generally higher in urban areas than in rural areas in both winter and summer.
The population-weighted annual mean NO2 concentration in Suzhou in 2014 was 44.94 µg/m3, higher than the annual-mean predicted concentration of 41.4 µg/m3 and also higher than the annual-mean NO2 standard of 40 µg/m3 defined in the Chinese National Ambient Air Quality Standards (GB 3095-2012). In winter, 99% of the total population of Suzhou lived in areas with NO2 concentrations exceeding 40 µg/m3 (Table 3).
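Both quantities reported here, the population-weighted mean of Equation (2) and the share of the population living above the 40 µg/m3 standard, reduce to simple weighted sums over grid cells. The sketch below uses hypothetical function names and toy grid values, not data from the study.

```python
def population_weighted_mean(population, concentration):
    """Population-weighted concentration: sum(Pop_i * C_i) / sum(Pop_i)."""
    total_population = sum(population)
    weighted = sum(p * c for p, c in zip(population, concentration))
    return weighted / total_population


def share_exceeding(population, concentration, limit=40.0):
    """Fraction of the population in grid cells above the limit (µg/m3)."""
    exposed = sum(p for p, c in zip(population, concentration) if c > limit)
    return exposed / sum(population)


# Toy example with two grid cells (illustrative values only).
toy_population = [100.0, 300.0]
toy_concentration = [30.0, 50.0]
toy_mean = population_weighted_mean(toy_population, toy_concentration)  # 45.0
toy_share = share_exceeding(toy_population, toy_concentration)          # 0.75
```

In the toy example the unweighted mean is 40 µg/m3 but the weighted mean is 45 µg/m3, because the more polluted cell holds more people; the same mechanism explains why the population-weighted value in the text exceeds the area-average prediction.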

Discussion
Our study built a satellite-derived LUR model with OMI NO2 tropospheric column density data to predict NO2 concentrations at seasonal timescales with a high spatial resolution (100 m × 100 m) in Suzhou. The R2 values of model fitting and 10-fold CV were 0.70 and 0.61 at seasonal timescales, respectively, reflecting the relatively high stability of the model.
The performance of our seasonal satellite-derived LUR model was comparable with that of previous satellite-based LUR models of NO2 concentrations at global, national, and regional scales. For the global satellite-based LUR model, the R2 and mean absolute error (MAE) were 0.54 and 3.7 ppb, respectively, at a 100 m × 100 m resolution [20]. The adjusted R2 values of models with satellite data were 0.48-0.58 in 17 contiguous countries of Western Europe [22]. The R2 values of model fitting and CV were 0.79 and 0.77, respectively, for the national satellite-derived LUR model in the United States [19]. Similarly, in China, Xu et al. and Yang et al. developed satellite-derived LUR models at national and regional scales, respectively [21,31]. The R2 of 10-fold CV was 0.78 for the national model in 2015 [31], and the R2 of model fitting was 0.61 for the regional model [21]. Although a growing number of studies have used machine learning methods with satellite data to estimate NO2 concentrations from large numbers of fixed-monitor measurements at regional or national scales [40-43], the training data available within a single city may be insufficient for machine learning models because of the limited number of fixed stations, as in this study. The comparison suggested that our satellite-derived LUR model, which included satellite-retrieved NO2 tropospheric column density, population density, traffic indicators, and NOx emission data, predicted ground NO2 concentrations with relatively high accuracy based on the fixed stations in Suzhou.
Our predicted NO2 concentrations exhibited significant spatial variability within the city at a fine spatial resolution (100 m × 100 m), with a distinct decline with increasing distance from roads and significant differences between urban and rural areas. This high intra-urban variability suggests that exposure assessments of NO2 might be inaccurate if they depend only on measurements from a limited number of fixed monitoring sites. The spatial heterogeneity likely reflects the distribution of NO2 sources: traffic and industrial emissions are known to be the main sources of NO2 and contribute to the high spatial heterogeneity of NO2 concentrations along roads and within a city. On one hand, NO2 is emitted as a primary pollutant from these sources. On the other hand, NO2 is also a secondary pollutant [1,2]. In our study, NO2 concentrations were significantly higher along roads and declined gradually with increasing distance from roads in Suzhou, consistent with previous results on NO2 spatial heterogeneity along roads [8]. The variables indicating traffic-related sources in our study have also been frequently used in previous LUR models of NO2 concentrations [6,17,36]. Additionally, industrial emissions, an important predictor in our model, have also been found to be an important variable in previous LUR models predicting ground NO2 concentrations within cities such as Shanghai and Tianjin [16,17]. A recent study in Suzhou observed a notable decrease in NO2 concentrations during the Chinese New Year holiday in 2020, caused by the novel coronavirus (COVID-19) lockdown, compared to the periods before and after [44].
A sharp decline in traffic emissions and a slight reduction in industrial emissions caused by the shut-down policies might be the main contributors to the decrease in NO2 concentrations during the lockdown period in Suzhou [44], suggesting that both traffic and industrial emissions are crucial sources of NO2 in Suzhou. Additionally, we found that mean NO2 concentrations were higher in winter than in summer, consistent with previous studies on the seasonal pattern of NO2 concentrations in China [24,45]. In winter, NO2-related emissions are stronger because of coal combustion for heating, while meteorological conditions are less favorable and can impede the dispersion and transport of NO2 pollution [44,46,47]. Both factors might contribute to the higher NO2 concentrations in winter. Figure 6 showed a lower ratio of urban to rural NO2 concentrations in winter than in summer (maximum-to-minimum ratios of 2.03 and 2.77, respectively). This might be due to more coal combustion for household heating in rural areas than in urban areas in winter [48].
As another influencing factor for NO 2 spatial heterogeneity, the spatial pattern of population density was highly consistent with that of the NO 2 predictions in Suzhou, suggesting that population density can serve as an indicator of anthropogenic emissions, reflecting a range of sources including traffic, industrial processes, and heating [6]. High population density not only intensified NO 2 pollution but also increased the exposure of the population to high NO 2 levels. In this study, 84% of the population in Suzhou were exposed in 2014 to NO 2 levels above the national annual-mean NO 2 standard (40 µg/m 3 ). By comparison, only 8% of the population in Western Europe was exposed to concentrations exceeding the World Health Organization (WHO) annual NO 2 guideline value (40 µg/m 3 ) [39], a much smaller proportion than in Suzhou. This might be because high population density and high concentrations of air pollution coexist in Chinese cities. For example, many residential buildings are located along major roads for the convenience of transportation, and residents of these buildings might be affected by both traffic-related emissions and household heating emissions, especially during winter in rural areas. Our results suggest that policy makers should urgently implement effective interventions in areas with higher NO 2 concentrations, especially urban regions with higher population density, to protect public health.
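The population share discussed above can be illustrated with a minimal sketch, assuming that gridded NO 2 predictions and population counts are available on the same 100 m × 100 m cells. The function name `share_population_exceeding` and the toy values are hypothetical; they are not the code or data used in this study.

```python
from typing import Sequence


def share_population_exceeding(
    no2: Sequence[float],
    population: Sequence[float],
    threshold: float = 40.0,
) -> float:
    """Return the fraction of the total population living in grid cells
    whose NO2 concentration (ug/m3) is strictly above `threshold`."""
    if len(no2) != len(population):
        raise ValueError("no2 and population must have the same length")
    total = sum(population)
    if total <= 0:
        raise ValueError("total population must be positive")
    # Sum the residents of every cell that exceeds the threshold.
    exposed = sum(p for c, p in zip(no2, population) if c > threshold)
    return exposed / total


# Hypothetical toy grid (five cells), for illustration only.
no2_cells = [55.0, 48.0, 41.0, 38.0, 30.0]   # annual-mean NO2, ug/m3
pop_cells = [500, 300, 200, 600, 400]        # residents per cell

share = share_population_exceeding(no2_cells, pop_cells)
print(f"{share:.0%} of the toy population exceeds 40 ug/m3")
```

Weighting by residents rather than by area is what distinguishes this exposure metric from a simple count of grid cells above the standard.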
The satellite-based LUR model also expanded the temporal resolution and improved the accuracy of seasonal NO 2 predictions. Land use data used in traditional LUR models, including land cover, road network, and population data, commonly have low temporal resolution, whereas the NO 2 tropospheric column density correlates strongly with ground-level NO 2 concentrations and can therefore represent their temporal variability. Previous studies mostly employed satellite data to extend LUR-based assessment of NO 2 concentrations to seasonal or monthly timescales at national or regional scales [19,21,30]; however, few satellite-based LUR models for NO 2 assessment in China have been developed at a city scale with local influencing factors and a flexible timescale. In this study, we developed a satellite-based LUR model in Suzhou to capture the fine gradients of NO 2 concentrations at a spatial resolution of 100 m × 100 m. More importantly, our predictions captured the significant seasonal variability of NO 2 concentrations within a city, which could not be achieved by traditional LUR models. These findings suggest that the satellite-derived model could provide exposure assessment of NO 2 at a flexible timescale for epidemiological studies, as well as scientific evidence for protecting residents from NO 2 pollution.
Our study has several limitations. First, the OMI NO 2 tropospheric column density used for spatial prediction had a relatively coarse resolution (13 km × 24 km). Satellite-based NO 2 data with a higher spatial resolution could help improve the model performance when they become available. Second, our model was developed at a seasonal level rather than a daily level. Cloud cover and the OMI row anomaly lead to missing data at a daily level within a city; therefore, we resampled the OMI data to a seasonal level to fill the gaps. Satellite-based NO 2 data with a lower missing rate might help improve the temporal resolution of our model in the future. Third, traffic counts are an ideal predictor of traffic emissions, but they were not accessible for this study. We used major road lengths and distance to the nearest major road as surrogates for traffic counts to indicate the influence of traffic emissions on NO 2 concentrations. Such surrogates were also used as traffic variables in NO 2 LUR models in the European Study of Cohorts for Air Pollution Effects (ESCAPE) project and in other NO 2 LUR studies [6,36].
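The seasonal resampling mentioned in the second limitation can be sketched as follows: daily column values that are missing (for example because of cloud cover or the row anomaly) are skipped, and the remaining values are averaged per season. This is only an illustrative sketch for a single grid cell. The season boundaries (December to February as winter, and so on), the function name `seasonal_means`, and the toy values are assumptions, not the processing chain used in this study.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

# Assumed meteorological seasons for the Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(
    daily: Iterable[Tuple[date, Optional[float]]],
) -> Dict[str, float]:
    """Average valid daily values per season; None marks a missing retrieval."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for day, value in daily:
        if value is None:
            continue  # skip days without a usable retrieval
        season = SEASON_BY_MONTH[day.month]
        sums[season] += value
        counts[season] += 1
    # Seasons with no valid day are omitted from the result.
    return {season: sums[season] / counts[season] for season in sums}


# Hypothetical daily column densities (arbitrary units) for one grid cell.
daily_columns = [
    (date(2014, 1, 5), 12.0),
    (date(2014, 1, 6), None),   # missing retrieval
    (date(2014, 2, 1), 8.0),
    (date(2014, 7, 3), 4.0),
    (date(2014, 7, 4), None),   # missing retrieval
]

means = seasonal_means(daily_columns)
print(means)
```

Averaging over a season trades temporal resolution for coverage: a cell needs only one valid day per season to receive a value, whereas a daily product would leave it empty on every missing day.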

Conclusions
In summary, the satellite-derived LUR model could predict seasonal NO 2 concentrations at a city scale at a 100 m × 100 m resolution with relatively high accuracy. Based on the satellite data, the model captured the fine gradients both along roads and between urban and rural areas for each season. According to the predictions, 84% of Suzhou's total population lived in areas with NO 2 concentrations exceeding the national annual NO 2 standard of 40 µg/m 3 in 2014. Hence, reducing NO 2 concentrations is urgently needed, especially in urban areas with a higher population density. This model and its predictions could support policy development for air quality control and accurate exposure assessment in future epidemiological studies.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.