Estimating Daily PM 2.5 Concentrations in Beijing Using 750-M VIIRS IP AOD Retrievals and a Nested Spatiotemporal Statistical Model

: Satellite-retrieved aerosol optical depth (AOD) data have been widely used to predict PM 2.5 concentrations. Most of their spatial resolutions (~1 km or greater), however, are too coarse to support PM 2.5 -related studies at ﬁne scales (e.g., urban-scale PM 2.5 exposure assessments). Space-time regression models have been widely developed and applied to predict PM 2.5 concentrations from satellite-retrieved AOD. Their accuracies, however, are not satisfactory particularly on days that lack a model dataset. The present study aimed to evaluate the effectiveness of recent high-resolution (i.e., ~750 m at nadir) AOD obtained from the Visible Infrared Imaging Radiometer Suite instrument (VIIRS) Intermediate Product (IP) in estimating PM 2.5 concentrations with a newly developed nested spatiotemporal statistical model. The nested spatiotemporal statistical model consisted of two parts: a nested time ﬁxed effects regression (TFER) model and a series of geographically weighted regression (GWR) models. The TFER model, containing daily, weekly, or monthly intercepts, used the VIIRS IP AOD as the main predictor alongside several auxiliary variables to predict daily PM 2.5 concentrations. Meanwhile, the series of GWR models used the VIIRS IP AOD as the independent variable to correct residuals from the ﬁrst-stage nested TFER model. The average spatiotemporal coverage of the VIIRS IP AOD was approximately 16.12%. The sample-based ten-fold cross validation goodness of ﬁt (R 2 ) for the ﬁrst-stage TFER models with daily, weekly, and monthly intercepts were 0.81, 0.66, and 0.45, respectively. The second-stage GWR models further captured the spatial heterogeneities of the PM 2.5 -AOD relationships. The nested spatiotemporal statistical model produced more daily PM 2.5 estimates and improved the accuracies of summer, autumn, and annual PM 2.5 estimates. This study contributes to the knowledge of how well VIIRS IP AOD can predict PM 2.5 concentrations at urban scales and offers strategies for improving the coverage and accuracy of daily PM 2.5 estimates on days that lack a model dataset.


Introduction
PM 2.5 (fine particles with an aerodynamic diameter ≤ 2.5 µm) produced by both natural and anthropogenic sources [1] is one of the main causes of ambient air pollution, which is thought to lead to approximately 3.3 million deaths per year worldwide [2,3]. Cities within China, India, and

Study Area and PM 2.5 Measurements
Beijing, the capital city of China, was chosen as the study area ( Figure 1). It is situated in the northern portion of the North China Plain and has 16 districts. The Beijing Municipal Environmental Monitoring Center (BMEMC) is responsible for measuring and distributing PM 2.5 concentrations. A total of 35 monitoring sites have been established across the domain of Beijing, with 23 sites for urban environmental assessment (UEA), six sites for regional background transmission (RBT), five sites for traffic pollution control (TPC), and one site for urban cleanliness comparison (UCC). UEA stations serve to assess the average level and variations in air quality in the urban environment. RBT stations serve to represent the regional background level of air quality and the regional transmission of atmospheric pollutants. RBT stations are distributed near the city boundaries to the northwest, northeast, east, south, and southwest. TPC stations are all located in downtown areas and monitor the influence of traffic on air quality. The sole UCC station represents the pollution level in urban areas assuming no influence of urban activities. In this study, hourly PM 2.5 measurements observed at all monitoring sites in 2014 were downloaded from the BMEMC air quality releasing platform (http://zx.bjmemc.com.cn), and daily averaged PM 2.5 concentrations were calculated as the dependent variable.
Remote Sens. 2019, 11, x FOR PEER REVIEW 3 of 19 traffic pollution control (TPC), and one site for urban cleanliness comparison (UCC). UEA stations serve to assess the average level and variations in air quality in the urban environment. RBT stations serve to represent the regional background level of air quality and the regional transmission of atmospheric pollutants. RBT stations are distributed near the city boundaries to the northwest, northeast, east, south, and southwest. TPC stations are all located in downtown areas and monitor the influence of traffic on air quality. The sole UCC station represents the pollution level in urban areas assuming no influence of urban activities. In this study, hourly PM2.5 measurements observed at all monitoring sites in 2014 were downloaded from the BMEMC air quality releasing platform (http://zx.bjmemc.com.cn), and daily averaged PM2.5 concentrations were calculated as the dependent variable.

VIIRS IP AOD Retrievals
VIIRS is a cross-track scanning radiometer onboard the Suomi National Polar-Orbiting Partnership (Suomi-NPP) satellite with a 1:30 pm local solar time ascending node. It is expected to continue the decade-long measurement series initiated by MODIS. It contains three types of bands: imagery bands (375 m at nadir), moderate bands (M bands, 750 m at nadir), and day/night bands (750 m across scan). The VIIRS aerosol algorithms, with a heritage from MODIS aerosol algorithms, use M bands to produce the full set of aerosol parameters contained in the IP, including AOD at 550 nm [36]. VIIRS IP AOD data (contained in IVAOT files named 'faot550') for 2014 in Beijing were collected from the Comprehensive Large Array-data Stewardship System (CLASS) (www.class.noaa.gov).  produce the full set of aerosol parameters contained in the IP, including AOD at 550 nm [36]. VIIRS IP AOD data (contained in IVAOT files named 'faot550') for 2014 in Beijing were collected from the Comprehensive Large Array-data Stewardship System (CLASS) (www.class.noaa.gov). Corresponding terrain-corrected geo-location files (contained in GMTCO files named 'Longitude' and 'Latitude') for these AOD data were also downloaded from CLASS for geometric correction purposes. To minimize the potential negative effects of deviations in VIIRS IP AOD retrievals on PM 2.5 estimates, only good AOD data (quality assurance flag = 0) were retained.

GEOS FP Meteorological Data
Meteorological conditions influence and hence can improve PM 2.5 -AOD relationships [37]. Global Earth Observing System Forward Processing (GEOS FP) gridded meteorological data for 2014 covering Beijing were collected from the Dalhousie data archive (ftp://rain.ucis.dal.ca/ctm). GEOS FP is the most recent validated GEOS system and provides meteorological fields at finer spatial (0.25 • latitude × 0.3125 • longitude) and temporal resolution (1-h or 3-h) over China [38,39]. Downloaded files named GEOSFP.YYYYMMDD.A1.025×03125.CH.nc, where YYYY is the year, MM is the two-digit month of the year, and DD is the two-digit day of the month, have a time interval of one hour. These files contain the planetary boundary layer height (PBLH) above the surface, temperature at 2 m above displacement height (T2M), total surface precipitation flux (PRCP), surface level pressure (PS), eastward wind at 10 m above displacement height (U10M), and northward wind at 10 m above displacement height (V10M) from 0: 30

Satellite-Retrieved NO 2 and NDVI Data
Anthropogenic emissions and dry deposition on natural surfaces such as vegetation increase or decrease PM 2.5 concentrations [40,41] and hence can adjust PM 2.5 -AOD relationships. This study selected NO 2 from the previous day (NO 2 _Lag) as a proxy for anthropogenic emissions [42] and Normalized Difference Vegetation Index (NDVI) as a reflection of variations in vegetation [43]. NO 2 data for 2014 covering Beijing with a resolution at 0.25 • × 0.25 • and 1-d based on the Ozone Monitoring Instrument were obtained from the Tropospheric Emission Monitoring Internet Service [44] (http: //www.temis.nl/airpollution/no2col/no2regioomi_v2.php). NDVI data for 2014 covering Beijing with a resolution of 1-km and 16-d based on MODIS sensors came from the Level-1 and Atmospheric Archive and Distribution System (LAADS) (https://ladsweb.modaps.eosdis.nasa.gov/search/). Terra (code: MOD13A2) and Aqua (code: MYD13A2) satellite observations were combined to improve the temporal resolution to 8-d.

Data Integration
These multi-source data can be divided into point data (PM 2.5 measurements) and raster data (VIIRS IP AOD retrievals, GEOS FP meteorological data, and satellite-retrieved NO 2 and NDVI data). A 750 m × 750 m grid was created over Beijing, and the raster data were resampled to this grid using the bilinear interpolation method. The cell values of the resampled raster data were then extracted to the PM 2.5 monitoring sites. Regression mapping and model datasets subsequently emerged.

Model Development and Validation
Based on previous two-stage models [11,17,35], a nested spatiotemporal statistical model was developed in hopes of improving the accuracy and coverage of PM 2.5 estimates as well as capturing the spatiotemporal heterogeneities of PM 2.5 -AOD relationships. The nested spatiotemporal statistical model was composed of two parts: a nested time fixed effects regression (TFER) model and a series where PM 2.5,sd , AOD sd , PBLH sd , RH_PBL sd , T2M sd , PRCP sd , PS sd , U10M sd , V10M sd , and NDVI sd are the PM 2.5 (µg/m 3 ), AOD (unitless), PBLH (km), RH_PBL (unitless), T2M (K), PRCP (kg/m 2 /s 2 ), PS (hPa), U10M (m/s), V10M (m/s), and NDVI (unitless) at monitoring site s during day d; except for PM 2.5,sd , the remaining have their corresponding coefficients β AOD , β PBLH , β RH_PBL , β T2M , β PRCP , β PS , β U10M , β V10M , and β NDVI ; NO 2 _Lag sd is the NO 2 (10 15 molec/cm 2 ) at monitoring site s during day d-1, and β NO2_Lag is the corresponding coefficient; λ d , λ w , and λ m are the nested intercepts for each day, week, and month; [] around the variable and its coefficient indicates that the variable and its coefficient might be incorporated according to the process of determining the best model structure as described below; and ε sd represents the error term at monitoring site s during day d.
To determine the best model structure for the TFER models with daily, weekly, and monthly intercepts, all possible combinations of variables were examined. At the significance level of α = 0.05, the TFER models with daily, weekly, and monthly intercepts whose slopes passed the statistical test were first retained. These models were then sorted according to the number of statistically significant intercepts and the adjusted goodness of fit (R 2 ) in descending order. The best TFER models with daily, weekly, and monthly intercepts that had the highest adjusted goodness of fit (R 2 ) were eventually determined from the top ten candidate models. Appendix A provides screenshots of these processes. The TFER model with daily intercepts was used to predict PM 2.5 concentrations for days that had a model dataset, the model with weekly intercepts to predict PM 2.5 concentrations for days that did not have a model dataset, and the model with monthly intercepts to predict PM 2.5 concentrations for weeks that did not have a model dataset.
The second-stage GWR model used the residuals from the first-stage nested TFER model as the dependent variable and the VIIRS IP AOD as the independent variable. Noting that the GWR model was more suitable for analyzing spatial data, the second-stage GWR model was constructed using the average values of dependent and independent variables during a specific period. The model structure is as follows: where Residual sp and AOD sp are the averaged residual from the first-stage TFER model and the AOD at monitoring site s during period p, respectively; β Intercept,sp and β AOD,sp are the location-specific intercept and slope; and ε sp is the error term at monitoring site s during period p. The value of p can be selected according to different purposes. It could be set to one day to produce daily PM 2.5 estimates with higher accuracy, assuming that sufficient model datasets are available for each day. This was not true in this study, and therefore a nested TFER model was developed in the first stage. Five GWR models were calibrated for each of four seasons and for the entire year to obtain an accurate spatial distribution of PM 2.5 concentrations during these periods. Because the potential maximum number of model datasets during this stage would be 35, which is a rather small number, the best fixed bandwidth for estimating coefficients was determined by minimizing the corrected Akaike Information Criterion (AICc). The bandwidth controls the size of the kernel function of the GWR model [45]. In other words, it specifies the spatial extent of the data points used to calibrate the model. AIC c was computed from a measure of the divergence between observed and fitted values and a measure of model complexity [46]. The lower the AIC c , the better was the model.
Sample-based ten-fold cross validation (CV) was used to assess the performance of the first-stage nested TFER model. For the TFER models with daily, weekly, and monthly intercepts, the model dataset was first randomly and equally split into ten subsets with the condition that all the days, weeks, and months of each subset must appear in the remaining subsets. This condition was incorporated so that predictions could be made for each subset using the model fitted from the other nine subsets. The rest of the process was similar: repeat using nine subsets to fit the model and use the fitted model to predict the remaining subset until all the subsets have been predicted; regress the predicted PM 2.5 on the observed PM 2.5 and calculate the R 2 , mean prediction error (MPE), and root mean square error (RMSE); and use these statistics to assess model performance.

PM 2.5 Prediction
Daily PM 2.5 estimates were readily obtained by combining the first-stage nested TFER model and the regression mapping dataset. Note that these daily PM 2.5 estimates had different accuracies because different TFER models with daily, weekly, or monthly intercepts were used on certain days depending on the availability of a model dataset. Therefore, different weights should be assigned to different daily estimates when using them to derive seasonal and annual PM 2.5 estimates. In this study, the sample-based ten-fold CV R 2 values of the TFER models with daily, weekly, and monthly intercepts were used as the weight for the daily PM 2.5 estimates calculated from those models. The weighted seasonal and annual PM 2.5 estimates were further corrected by the second-stage GWR models. Figure 2 illustrates the seasonal-spatial distribution of the VIIRS IP AOD coverage and average. Spatially, low spatiotemporal coverage and high VIIRS IP AOD values occurred in urban centers, whereas high spatiotemporal coverage and low VIIRS IP AOD values mainly appeared on the urban periphery. Temporally, autumn had much higher spatiotemporal coverage of VIIRS IP AOD than the other seasons, but its average AOD values were lower than those in spring and summer. Overall, the VIIRS IP AOD had an average spatiotemporal coverage and value of 16.12% and 0.4039, respectively, across Beijing in 2014. The missing data problem of the VIIRS IP AOD is somewhat severe, indicating the necessity of developing a nested spatiotemporal statistical model.         Table 1 shows the coefficients of the independent variables of the first-stage TFER models with daily, weekly, and monthly intercepts. PS and NDVI did not appear in any TFER models and were therefore omitted from Table 1 of the PRCP observations was only 0.00019, meaning that the products of the coefficient and the observations would not be very large and that the final PM 2.5 estimates would not be significantly biased. Figure 4 shows the variations of daily, weekly, and monthly intercepts. Clearly, the intercepts in winter and spring were always larger than those in summer and autumn. Because the remaining coefficients in each TFER model were fixed, such variations, to some extent, indicated that PM 2.5 concentrations during winter and spring were probably larger than those during summer and autumn under the same condition. The proportion of statistically significant intercepts approached 77.67%, 83.67%, and 100% for the TFER models with daily, weekly, and monthly intercepts, respectively, showing the goodness of model fitting.  Table 1 shows the coefficients of the independent variables of the first-stage TFER models with daily, weekly, and monthly intercepts. PS and NDVI did not appear in any TFER models and were therefore omitted from Table 1. Except for PRCP, the coefficients of the remaining variables had reasonable values. Although PRCP had an extremely large positive coefficient, the maximum value of the PRCP observations was only 0.00019, meaning that the products of the coefficient and the observations would not be very large and that the final PM2.5 estimates would not be significantly biased. Figure 4 shows the variations of daily, weekly, and monthly intercepts. Clearly, the intercepts in winter and spring were always larger than those in summer and autumn. Because the remaining coefficients in each TFER model were fixed, such variations, to some extent, indicated that PM2.5 concentrations during winter and spring were probably larger than those during summer and autumn under the same condition. The proportion of statistically significant intercepts approached 77.67%, 83.67%, and 100% for the TFER models with daily, weekly, and monthly intercepts, respectively, showing the goodness of model fitting.     Figure 5 illustrates the model fitting and sample-based ten-fold CV results for the first-stage nested TFER model. Considering the behavior of the R 2 and MPE (RMSE) of the model during ten-fold CV, from the TFER model with monthly intercepts to the TFER model with weekly intercepts and from this version of the model to the TFER model with daily intercepts, R 2 increased by 46.67% and 22.73%, respectively, and MPE (RMSE) decreased by 27.08% (22.11%) and 29.40% (27.47%), respectively. This indicates that the PM 2.5 -AOD relationship presents strong temporal heterogeneities. Varying intercepts with a shorter time interval could capture more temporal heterogeneities and hence might be more appreciated. Model overfitting problems existed to different extents in the first-stage nested TFER model. For days with a model dataset in which the TFER model with daily intercepts was used, the R 2 decreased by 4.71% from model fitting to ten-fold CV. The corresponding values were 8.33% and 4.26%, respectively, in the TFER model with weekly and monthly intercepts. These relatively small values indicate that the model overfitting problem was not very severe. The proportion of daily PM 2.5 estimates explained ranged from 0.45 to 0.81 according to the ten-fold CV results. Although 0.45 was not a high value, it yielded more information than if no PM 2.5 estimates were retrieved. In addition, the correlation coefficient between predicted and observed PM 2.5 could approach 0.67, which is not a bad result.

Results of Model Fitting and Validation
was not large because the values of the sample-based ten-fold CV R 2 of the AOD-only models shown in Figure 6 were relatively high. It is feasible to use only AOD itself to derive PM2.5 concentrations using a spatiotemporal statistical model (e.g., [16]) when high-coverage satellite-retrieved AOD are available, but high-resolution auxiliary variables are very difficult to access.
By removing the VIIRS IP AOD variable, a non-AOD nested TFER model was fitted. Table 2 lists the results of F tests between the full and non-AOD models. It shows that the incorporation of the VIIRS IP AOD retrievals significantly improved the performance of the TFER models with daily, weekly, and monthly intercepts at α = 0.05. In addition, the extent to which the improvement occurred follows an increasing order of the TFER models with daily, weekly, monthly intercepts according to the size of the F values. Figure 5. Model fitting and sample-based ten-fold CV results for the first-stage nested TFER model. Figure 5. Model fitting and sample-based ten-fold CV results for the first-stage nested TFER model. By removing the auxiliary variables, an AOD-only nested TFER model was fitted. Figure 6 illustrates its model fitting and sample-based ten-fold CV validation results. It is apparent that all the R 2 declined, whereas all the MPE and RMSE increased. Further calculations revealed that when moving from the AOD-only model to the full model, R 2 increased by 3.85%, 26.92%, and 40.63% for the TFER models with daily, weekly, and monthly intercepts, respectively, during sample-based ten-fold CV. These values indicated the relative role of varying intercepts and auxiliary variables on adjusting PM 2.5 -AOD relationships. When varying intercepts with shorter time intervals were incorporated, the role of auxiliary variables became less important. In this study, the auxiliary variables especially the meteorological data, did not have high spatial resolutions, but they were nevertheless used by interpolating them to the same spatial resolution as the VIIRS IP AOD. This may have introduced some uncertainties, but fortunately the degree to which uncertainties occurred was not large because the values of the sample-based ten-fold CV R 2 of the AOD-only models shown in Figure 6 were relatively high. It is feasible to use only AOD itself to derive PM 2.5 concentrations using a spatiotemporal statistical model (e.g., [16]) when high-coverage satellite-retrieved AOD are available, but high-resolution auxiliary variables are very difficult to access. Remote Sens. 2019, 11, x FOR PEER REVIEW 10 of 19 Figure 6. Model fitting and sample-based ten-fold CV results for the nested AOD-only TFER model.  Table 3 lists the estimated best bandwidth, minimum AICc, range of the intercepts, AOD coefficients, and local R 2 for each GWR model. Except for the bandwidth in spring and winter, which was almost double those in the other seasons and the entire year, all the other parameters had reasonable values and ranges. Spatially varying coefficients of the intercept and AOD indicated that the second-stage GWR captured the spatial heterogeneities of the PM2.5-AOD relationships. Spatially varying local R 2 showed the extent to which this improvement was achieved, up to a maximum of 0.672. Therefore, the performance of the first-stage nested TFER model was improved.  Figure 7 illustrates the seasonal-spatial distribution of satellite-based and ground measured PM2.5 concentrations. The PM2.5 predictions from the overall model were higher than those from the first-stage nested TFER model. These higher values were much closer to the real level shown in Figure  7C, indicating the necessity and effectiveness of adding the second-stage GWR models. In terms of the spatiotemporal pattern of PM2.5 pollution in Beijing, during summer, autumn, and the entire year, from the northwestern areas to the southeastern areas, a gradient of slight to severe PM2.5 pollution was observed. These gradual changes were consistent with previous studies [16,47] but this study Figure 6. Model fitting and sample-based ten-fold CV results for the nested AOD-only TFER model. By removing the VIIRS IP AOD variable, a non-AOD nested TFER model was fitted. Table 2 lists the results of F tests between the full and non-AOD models. It shows that the incorporation of the VIIRS IP AOD retrievals significantly improved the performance of the TFER models with daily, weekly, and monthly intercepts at α = 0.05. In addition, the extent to which the improvement occurred follows an increasing order of the TFER models with daily, weekly, monthly intercepts according to the size of the F values.  Table 3 lists the estimated best bandwidth, minimum AICc, range of the intercepts, AOD coefficients, and local R 2 for each GWR model. Except for the bandwidth in spring and winter, which was almost double those in the other seasons and the entire year, all the other parameters had reasonable values and ranges. Spatially varying coefficients of the intercept and AOD indicated that the second-stage GWR captured the spatial heterogeneities of the PM 2.5 -AOD relationships. Spatially varying local R 2 showed the extent to which this improvement was achieved, up to a maximum of 0.672. Therefore, the performance of the first-stage nested TFER model was improved.  Figure 7 illustrates the seasonal-spatial distribution of satellite-based and ground measured PM 2.5 concentrations. The PM 2.5 predictions from the overall model were higher than those from the first-stage nested TFER model. These higher values were much closer to the real level shown in Figure 7C, indicating the necessity and effectiveness of adding the second-stage GWR models. In terms of the spatiotemporal pattern of PM 2.5 pollution in Beijing, during summer, autumn, and the entire year, from the northwestern areas to the southeastern areas, a gradient of slight to severe PM 2.5 pollution was observed. These gradual changes were consistent with previous studies [16,47] but this study provides more spatial detail thanks to the high spatial resolution of the VIIRS IP AOD retrievals. However, similar gradual changes did not occur during spring and winter, but noisy distributions were found instead. This phenomenon may be attributed to very low spatiotemporal coverage (Figure 2) plus the insufficiently high accuracy [22][23][24] of the VIIRS IP AOD retrievals, which constitutes one of the deficiencies of these retrievals and deserves attention and improvement in the future. In addition, the fact that large quantities of coarse particles intrude into Beijing during spring [48] might also badly influence the accuracy of PM 2.5 estimates during this season. Seasonally, it was only possible to conclude roughly that PM 2.5 pollution during summer was relatively slight.

Discussion
To demonstrate the potential benefits and deficiencies of the nested spatiotemporal statistical model (hereafter referred to as the nested model), a non-nested spatiotemporal statistical model (hereafter referred to as the non-nested model) was developed and used to predict PM 2.5 concentrations. The difference between the nested and non-nested models was that the latter used the TFER model with daily intercepts only at the first stage. Consequently, daily intercepts and hence PM 2.5 concentrations could not be obtained for days without a model dataset. Figure 8 shows the PM 2.5 prediction maps from the non-nested model; they resemble those from the nested model, as shown in Figure 7. Nevertheless, one obvious difference was observed in winter (N.B. Also the remaining seasons and the entire year, shown in Figure 9B), during which the nested model produced more PM 2.5 estimates than the non-nested model. The absence of a model dataset on a certain day only meant that data integration at the PM 2.5 ground monitors failed, but the VIIRS IP AOD and other auxiliary variables may still have existed at other locations on that day. The nested model used weekly and monthly intercepts to predict PM 2.5 concentrations on those days, thus maximizing the use of existing datasets and leading to a better coverage of the final PM 2.5 estimates.

Discussion
To demonstrate the potential benefits and deficiencies of the nested spatiotemporal statistical model (hereafter referred to as the nested model), a non-nested spatiotemporal statistical model (hereafter referred to as the non-nested model) was developed and used to predict PM2.5 concentrations. The difference between the nested and non-nested models was that the latter used the TFER model with daily intercepts only at the first stage. Consequently, daily intercepts and hence PM2.5 concentrations could not be obtained for days without a model dataset. Figure 8 shows the PM2.5 prediction maps from the non-nested model; they resemble those from the nested model, as shown in Figure 7. Nevertheless, one obvious difference was observed in winter (N.B. Also the remaining seasons and the entire year, shown in Figure 9B), during which the nested model produced more PM2.5 estimates than the non-nested model. The absence of a model dataset on a certain day only meant that data integration at the PM2.5 ground monitors failed, but the VIIRS IP AOD and other auxiliary variables may still have existed at other locations on that day. The nested model used weekly and monthly intercepts to predict PM2.5 concentrations on those days, thus maximizing the use of existing datasets and leading to a better coverage of the final PM2.5 estimates. The predicted PM2.5 concentrations from the nested and non-nested models were further compared with the observed PM2.5 concentrations from the 35 monitoring sites. Figure 9B shows that the numbers of available daily PM2.5 estimates to calculate seasonal and annual PM2.5 estimates all increased, with an increase of approximately 50% in summer, autumn, and the entire year. Figure 9A shows that the accuracy of seasonal and annual PM2.5 estimates increased or declined during different seasons and the entire year. Generally, the accuracy of seasonal and annual PM2.5 estimates based on daily PM2.5 estimates is subject to two factors: accuracy and the number of daily PM2.5 estimates. More available daily PM2.5 estimates with higher accuracy produce more accurate seasonal and annual AOD retrievals during these two seasons were rather low, the advantages of which did not outweigh the disadvantages of the accumulation of lower accuracies, thereby leading to a decline, or more strictly speaking, a fluctuation, in the accuracies of winter and spring PM2.5 estimates. The nested model significantly increased the number of daily PM2.5 estimates in summer, autumn, and the entire year, the advantages of which outweighed the disadvantages of the accumulation of lower accuracies, thereby resulting in improved accuracies of summer, autumn, and annual PM2.5 estimates. The present work, to some extent, resembles the work by Ma et al. [15], who developed a nested linear mixed effects (LME) regression model to improve the accuracy of PM2.5 estimates on days without a model dataset. As stated in previous work by the authors [17], the LME model is more widely used for analyzing hierarchical data and accommodating complicated hierarchical correlations of observations, whereas the TFER model, which is derived from panel data regression models, is relatively easier to calibrate and to use for prediction. Because the sample size, variables, and study area differed between the work by Ma et al. [15] and the present work differed, it was difficult to compare directly the sample-based ten-fold CV R 2 values of these two studies. Nevertheless, it could be roughly concluded that a comparable model performance has been obtained without specifying varying slopes, which may increase the model's complexity, indicating the The predicted PM 2.5 concentrations from the nested and non-nested models were further compared with the observed PM 2.5 concentrations from the 35 monitoring sites. Figure 9B shows that the numbers of available daily PM 2.5 estimates to calculate seasonal and annual PM 2.5 estimates all increased, with an increase of approximately 50% in summer, autumn, and the entire year. Figure 9A shows that the accuracy of seasonal and annual PM 2.5 estimates increased or declined during different seasons and the entire year. Generally, the accuracy of seasonal and annual PM 2.5 estimates based on daily PM 2.5 estimates is subject to two factors: accuracy and the number of daily PM 2.5 estimates. More available daily PM 2.5 estimates with higher accuracy produce more accurate seasonal and annual PM 2.5 estimates. The situation was, however, slightly more complex in this study because the nested model provided more available daily PM 2.5 estimates with lower accuracies. Therefore, the eventual accuracy of the seasonal and annual PM 2.5 estimates was dependent on the tradeoff between these two opposing factors. The nested model increased the number of daily PM 2.5 estimates to a very limited degree in winter and spring given that the original spatiotemporal coverages of the VIIRS IP AOD retrievals during these two seasons were rather low, the advantages of which did not outweigh the disadvantages of the accumulation of lower accuracies, thereby leading to a decline, or more strictly speaking, a fluctuation, in the accuracies of winter and spring PM 2.5 estimates. The nested model significantly increased the number of daily PM 2.5 estimates in summer, autumn, and the entire year, the advantages of which outweighed the disadvantages of the accumulation of lower accuracies, thereby resulting in improved accuracies of summer, autumn, and annual PM 2.5 estimates.
The present work, to some extent, resembles the work by Ma et al. [15], who developed a nested linear mixed effects (LME) regression model to improve the accuracy of PM 2.5 estimates on days without a model dataset. As stated in previous work by the authors [17], the LME model is more widely used for analyzing hierarchical data and accommodating complicated hierarchical correlations of observations, whereas the TFER model, which is derived from panel data regression models, is relatively easier to calibrate and to use for prediction. Because the sample size, variables, and study area differed between the work by Ma et al. [15] and the present work differed, it was difficult to compare directly the sample-based ten-fold CV R 2 values of these two studies. Nevertheless, it could be roughly concluded that a comparable model performance has been obtained without specifying varying slopes, which may increase the model's complexity, indicating the feasibility of the combination of VIIRS IP AOD retrievals and the nested TFER model in Beijing city. In addition, the modeling steps can be easily repeated in other cities, meaning that the present work contributes not only a specific model for Beijing city, but also a modeling framework for similar cities, as long as high spatial resolution satellite-retrieved AOD and correlated auxiliary variables are accessible. Attention should be paid to selecting appropriate variables because factors influencing PM 2.5 -AOD relationships vary from one geographical area to another.
One limitation associated with this study is that no measures were taken to supplement the missing data in the VIIRS IP AOD retrievals. A nested model was developed to make the maximum use of these retrievals, but it was still difficult to produce daily PM 2.5 estimates with high accuracy for each day. Therefore, the choice was made to calibrate and validate the five GWR models at seasonal and annual scales in this study. One possible explanation of the low spatiotemporal coverage of the VIIRS IP AOD retrievals was that the current version of the VIIRS over-land algorithm, mostly based on the MODIS atmospheric correction algorithm [49], does not retrieve aerosol properties over bright surfaces, in cloud-affected pixels, over inland water such as the Great Lakes, or at night [36]. As such, a relatively low overall 16.12% spatiotemporal coverage occurred due to the limited number of clear sky days in Beijing in 2014. The value became even smaller during winter when leaves falling leading to more bright pixels. Developing specific algorithms for bright areas to retrieve AOD from the VIIRS will probably benefit mitigating this missing data problem. The LAADS Distributed Active Archive Center (DAAC), for instance, has recently released the Suomi-NPP VIIRS Deep Blue Aerosol products (https://ladsweb.modaps.eosdis.nasa.gov/alerts-and-issues/?id=25803). Integrating AOD retrievals from other sources such as 1-km MAIAC AOD could also be very helpful, thereby generating seasonal and annual PM 2.5 estimates as well as more days of PM 2.5 estimates with higher accuracy and coverage. Another limitation of this study is that a large number of independent variables proposed in previous studies were not tested here (e.g., the accuracy of the PM 2.5 estimates during spring could be potentially improved if variables representing coarse vs. fine fraction of aerosols were incorporated). The authors firmly believe that the model structure for the first-stage nested TFER model will prove to be more suitable and robust with more variables tested in the future.

Conclusions
According to the authors' knowledge, this study is one of the earliest applications of the most recent high-resolution VIIRS IP AOD to predict PM 2.5 concentrations across Beijing at a 750-m spatial resolution. The results were, on the whole, satisfactory despite some limitations. It could be concluded that the combination of the VIIRS IP AOD and the nested spatiotemporal statistical model could estimate urban PM 2.5 concentrations. The daily PM 2.5 estimates explained ranged from 0.45 to 0.81. More daily PM 2.5 estimates were derived, and the accuracies of the summer, autumn, and annual PM 2.5 estimates were improved based on the nested spatiotemporal statistical model. This study can surely benefit fine-scale PM 2.5 -related studies, such as urban-scale PM 2.5 exposure assessment.   Appendix A Figure A1. Screenshot of determining the best model structure for the TFER model with daily intercepts. Figure A2. Screenshot of determining the best model structure for the TFER model with weekly intercepts. Figure A2. Screenshot of determining the best model structure for the TFER model with weekly intercepts. Figure A2. Screenshot of determining the best model structure for the TFER model with weekly intercepts. Figure A3. Screenshot of determining the best model structure for the TFER model with monthly intercepts.