Satellite-Based Mapping of High-Resolution Ground-Level PM 2.5 with VIIRS IP AOD in China through Spatially Neural Network Weighted Regression

Satellite-retrieved aerosol optical depth (AOD) data are extensively integrated with ground-level measurements to obtain spatially continuous fine particulate matter (PM 2.5) distributions. However, current satellite-based methods struggle to produce highly accurate and reasonable PM 2.5 distributions because they cannot handle both spatial non-stationarity and complex non-linearity in the PM 2.5 -AOD relationship. High-resolution (<1 km) PM 2.5 products covering all of China, which are needed for fine-scale exposure assessment and health research, are also lacking. This study aimed to predict ground-level PM 2.5 at 750 m resolution across China from high-resolution Visible Infrared Imaging Radiometer Suite (VIIRS) intermediate product (IP) AOD data, using a newly developed geographically neural network weighted regression (GNNWR) model. Performance evaluations showed that GNNWR achieved higher prediction accuracy than widely used methods, with cross-validation and predictive R 2 of 0.86 and 0.85, respectively. Monthly satellite-derived PM 2.5 data at 750 m resolution were generated for China with robust prediction accuracy and almost complete coverage. PM 2.5 pollution in China was found to have improved greatly in 2018, with an annual mean concentration of 31.07 ± 17.52 µg/m 3. Nonetheless, fine-scale PM 2.5 exposures at multiple administrative levels suggest that PM 2.5 pollution in most urban areas needs further control, especially in southern Hebei Province. This work is the first to evaluate the potential of VIIRS IP AOD for modeling high-resolution PM 2.5 over a large area. The new satellite-derived PM 2.5 data, with high spatial resolution and high prediction accuracy at the national scale, are valuable for advancing environmental and health research in China.


Introduction
Fine particulate matter (e.g., PM 2.5, particles with an aerodynamic diameter below 2.5 µm) is directly harmful to the environment and human health [1,2]. With the continued economic and industrial development of recent decades, China is experiencing heavy PM 2.5 pollution caused by massive emissions of air pollutants [3][4][5]. The ground-level monitoring network built in China provides precise and stable observations, but its uneven and sparse measurements inevitably limit the evaluation of pollution exposure and health effects [6,7]. Portraying the complete spatiotemporal distribution of PM 2.5 is therefore crucial for environmental and health research in China.
Unlike ground-level measurements, satellite-based products provide datasets with wide coverage. Considering the significant correlation between PM 2.5 and aerosol optical depth … Gridded population count data were also included to assess fine-scale pollution exposure in China. Table S1 lists the details and sources of the datasets.

Monitoring PM 2.5 Data
Hourly surface PM 2.5 data from 1 June 2017 to 31 May 2018 were acquired from the China Environmental Monitoring Centre (CEMC). Data quality control followed the national standard GB3095-2012. Figure 1 shows the 1465 monitoring stations with valid observations in 338 cities of China.

VIIRS IP AOD Data
The VIIRS instrument was designed to continue the MODIS heritage. It flies in an orbit with a 13:30 ascending-node crossing time and achieves global coverage every day. The VIIRS IP AOD at 550 nm is retrieved from the moderate-resolution bands, and the AOD products for 2017-2018 were downloaded from the Comprehensive Large Array-data Stewardship System (CLASS). To reduce the impact of biases in VIIRS IP AOD on PM 2.5 modeling, only the highest-quality retrievals (flag = 0) were retained. VIIRS IP AOD data retrieved on the same day were mosaicked and clipped to the extent of China at 750 m resolution using ArcGIS 10.3. Data Interpolating Empirical Orthogonal Functions (DINEOF) is a self-consistent, parameter-free interpolation method that can reconstruct missing data in both spatial fields and time series [46], and it has been widely used to fill data gaps [47,48]. Because the daily VIIRS IP AOD data have low spatial coverage, we first used inverse distance weighting (IDW) to fill small gaps within each day and then applied DINEOF to reconstruct the full spatiotemporal coverage. Figure 1 depicts the annual mean VIIRS IP AOD, which is higher in eastern China than in western China.
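The DINEOF reconstruction step can be illustrated with a simplified sketch. The gappy AOD stack is arranged as a (pixels × days) matrix, and missing entries are repeatedly replaced by a truncated-SVD (EOF) reconstruction until they stop changing. Two simplifications are assumptions and do not come from the paper. First, the number of EOF modes `n_modes` is fixed here, whereas full DINEOF selects it by cross-validation on withheld valid pixels. Second, the preceding IDW step is not shown. The function name `dineof_fill` is hypothetical.

```python
import numpy as np


def dineof_fill(data, n_modes=5, max_iter=500, tol=1e-6):
    """Fill NaN gaps in a (n_pixels, n_times) matrix with a simplified DINEOF.

    Assumption: n_modes is fixed; full DINEOF picks it by cross-validation.
    """
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    mean = np.nanmean(x)
    x -= mean            # work with anomalies about the overall mean
    x[missing] = 0.0     # first guess for every gap: the mean value
    for _ in range(max_iter):
        # EOF decomposition of the current matrix via SVD, truncated to n_modes
        u, s, vt = np.linalg.svd(x, full_matrices=False)
        recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes, :]
        if missing.any():
            change = np.sqrt(np.mean((recon[missing] - x[missing]) ** 2))
        else:
            change = 0.0
        x[missing] = recon[missing]  # only gaps are updated; valid data stay fixed
        if change < tol:
            break
    return x + mean
```

The valid retrievals are never modified. Only the gaps converge to values consistent with the dominant spatiotemporal modes, which is what makes the method "self-consistent".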

Meteorological Data
ERA5 is the fifth-generation atmospheric reanalysis of the global climate and provides hourly atmospheric estimates worldwide at moderate resolution (0.1° or 0.25°).
Meteorological data generated by ERA5 at the surface level during the period 2017-2018 were downloaded in this study, including 10 m wind components, 2 m air temperature (TEMP), total precipitation (TP), surface pressure (SP), boundary layer height (BLH), evaporation, and relative humidity (RH). We also computed the wind speed (WS) and wind direction (WD) based on the two wind components.
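Wind speed and direction follow from the two ERA5 10 m wind components (eastward u10 and northward v10). The paper does not state which direction convention it used. This sketch assumes the meteorological convention, in which the direction is the one the wind blows *from*, measured clockwise from north. The function name is hypothetical.

```python
import numpy as np


def wind_speed_direction(u10, v10):
    """Return (WS, WD) from eastward (u) and northward (v) wind components.

    Assumption: WD uses the meteorological convention (direction the wind
    blows FROM, degrees clockwise from north: 0 = northerly, 90 = easterly).
    """
    u = np.asarray(u10, dtype=float)
    v = np.asarray(v10, dtype=float)
    ws = np.hypot(u, v)                                    # sqrt(u^2 + v^2)
    wd = np.mod(np.degrees(np.arctan2(-u, -v)), 360.0)     # map to [0, 360)
    return ws, wd
```

For example, a wind with u10 = 0 and v10 = -1 m/s blows southward, so it is reported as a northerly wind (WD = 0).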

Geographical Data and Land Cover Data
A digital elevation model (DEM) at 90 m spatial resolution was obtained from the Shuttle Radar Topography Mission (SRTM). The normalized difference vegetation index (NDVI), used as an indicator of land cover, was derived from MODIS at 500 m and 16-day resolution for China in 2017-2018 and obtained from the Level-1 and Atmosphere Archive and Distribution System (LAADS).

Population Data
The Gridded Population of the World, Version 4 (GPWv4) dataset, available at 5-year intervals for 2000-2020, is distributed by the National Aeronautics and Space Administration (NASA) Socioeconomic Data and Applications Center (SEDAC). Population counts are provided as global grids at 30 arc-second resolution (~1 km). The annualized rate of change r is computed as:
$$r = \frac{\ln(P_2/P_1)}{t} \tag{1}$$
where P 1 and P 2 are the population counts of the earlier and later census years, and t is the number of years between them. The 2017 population counts were then estimated from the 2015 and 2020 data as:
$$P_{2017} = P_{2015}\, e^{rt} = P_{2015}\, e^{\frac{\ln(P_{2020}/P_{2015})}{5} \times 2} \tag{2}$$
where r is obtained from Eq. (1) using the 5-year span between 2015 and 2020, and t = 2 is the number of years from 2015 to 2017.
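Eqs. (1) and (2) can be applied cell by cell to two population grids. This sketch uses a hypothetical function name. It also makes an assumption of its own: cells that are empty in either epoch, where the log ratio is undefined, get a growth rate of 0 and keep their earlier value. The paper does not say how such cells were handled.

```python
import numpy as np


def project_population(p_early, p_late, span_years, years_after_early):
    """Exponential interpolation of gridded population counts, Eqs. (1)-(2).

    Assumption: cells with a zero count in either epoch (log ratio 0/inf/NaN)
    get r = 0, i.e. they keep their early-epoch value.
    """
    p1 = np.asarray(p_early, dtype=float)
    p2 = np.asarray(p_late, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.log(p2 / p1) / span_years       # Eq. (1): annualized rate
    r = np.where(np.isfinite(r), r, 0.0)
    return p1 * np.exp(r * years_after_early)  # Eq. (2)


# 2017 counts from hypothetical 2015 and 2020 cell values: span 5 years, t = 2
p2017 = project_population([1000.0, 0.0, 500.0], [2000.0, 50.0, 500.0], 5, 2)
```

For a cell that doubles between 2015 and 2020, the 2017 value is P 2015 × 2^(2/5), about 1.32 × P 2015.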

Data Preprocessing and Analysis
The experimental data comprised point and raster datasets. The raster datasets were resampled by bilinear interpolation to a 750 m (0.00625°) grid covering China. The PM 2.5 measurements, VIIRS IP AOD data, and other auxiliary data were averaged to a monthly scale. Raster values at the PM 2.5 monitoring sites were then extracted to build regression datasets for the 12 months from 1 June 2017 to 31 May 2018. Correlation analysis and variance inflation factors (VIFs) were used to evaluate statistical significance and collinearity among the variables, and variables with lower VIFs and higher Pearson's r were retained for spatial modeling. Based on the exploratory analysis (Table S2), AOD, WS, TEMP, TP, BLH, RH, DEM, and NDVI were used as explanatory variables, because they were strongly correlated with PM 2.5 and their VIFs indicated weak multicollinearity.
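The variable screening above relies on two standard statistics. VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all other predictors with an intercept. Pearson's r measures each predictor's correlation with PM 2.5. A minimal NumPy sketch follows; the function names are hypothetical. The paper does not state its VIF cut-off, though VIF < 10 is a common rule of thumb.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n_samples, n_vars)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # regress column j on an intercept plus all other columns
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def pearson_with_target(X, y):
    """Pearson's r between each column of X and the target y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
```

Independent predictors give VIFs close to 1. A predictor that is nearly a linear combination of the others produces a very large VIF.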

Methods
This work aims to map highly accurate and reasonable PM 2.5 at 750 m resolution using a newly developed GNNWR model. To evaluate its performance, the widely used GWR and GRNN models were implemented for comparison, with OLR as a baseline. Previous PM 2.5 estimation studies evaluated model performance mainly by cross-validation (CV), which does not reliably guarantee predictive ability [40]. Hence, both a 10-fold CV and an external evaluation were adopted for each model. The experimental data were split into a training dataset (85%) for model calibration and a testing dataset (15%) for evaluation. Each of the 10-fold CV models was trained on 90% of the training samples, and the remaining 10% were used for validation. The final model was then built on all training samples using the best parameters found by the 10-fold CV, and the testing dataset was used for its external evaluation. Model performance was measured by the coefficient of determination (R 2), root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and mean error (Bias).
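The evaluation design (85/15 hold-out split, 10-fold CV within the training set, five metrics) can be sketched as follows. The paper does not give metric formulas, so several choices here are assumptions. R 2 is computed as 1 − SS_res/SS_tot (some studies use the squared Pearson correlation instead). Bias is the mean of (estimate − observation). MAPE assumes strictly positive observations. The random seed and function names are hypothetical.

```python
import numpy as np


def evaluation_metrics(obs, pred):
    """R2, RMSE, MAE, MAPE (%) and Bias of estimates against observations."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    return {
        "R2": 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err) / obs),  # assumes obs > 0
        "Bias": np.mean(err),                        # mean error
    }


def split_indices(n, test_frac=0.15, n_folds=10, seed=0):
    """Hold out a test set, then build 10 (fit, validation) folds from the rest."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(test_frac * n))
    test, train = idx[:n_test], idx[n_test:]
    folds = np.array_split(train, n_folds)
    cv = [(np.concatenate(folds[:k] + folds[k + 1:]), folds[k]) for k in range(n_folds)]
    return train, test, cv
```

The test indices never enter any CV fold, so the external evaluation stays independent of the parameter search.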

GNNWR Model
The structure of GNNWR for estimating the PM 2.5 -AOD relationship was built as follows:
$$PM_{2.5,i} = w_{i0}\beta_0 + \sum_{k=1}^{8} w_{ik}\beta_k x_{ik} + \varepsilon_i$$
where β = (β 0, β 1, ..., β 8) are the coefficients of the OLR model, which reflect the average PM 2.5 -AOD relationship over China. The ordinary least squares (OLS) estimates of β are:
$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$$
where X is the design matrix of the explanatory variables (with a leading column of ones) and Y is the vector of observed PM 2.5. The estimated value at point i is then:
$$\widehat{PM}_{2.5,i} = x_i W_i \hat{\beta} = x_i W_i (X^{T}X)^{-1}X^{T}Y$$
where x i = (1, x i1, ..., x i8) is the row vector of explanatory variables at point i, and W i is a spatial weight matrix calculated by the spatially weighted neural network (SWNN) of GNNWR [43]:
$$W_i = \mathrm{diag}(w_{i0}, w_{i1}, \ldots, w_{i8}) = \mathrm{SWNN}\left([d^{S}_{i1}, d^{S}_{i2}, \ldots, d^{S}_{in}]^{T}\right)$$
where d S i1, d S i2, ..., d S in denote the spatial distances between point i and all n training points. Figure 2 illustrates the PM 2.5 modeling process of GNNWR. The OLR coefficients representing the average PM 2.5 -AOD relationship in China were pre-calculated from the training dataset. The accuracy of the PM 2.5 estimates therefore depends on the non-linear fitting ability of the SWNN, which was optimized with a grid search strategy (Supplementary Materials, SM). The network structure and hyper-parameters of GNNWR are listed in Table S3.
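The GNNWR forward pass can be sketched in NumPy. The OLS coefficients are computed once. An SWNN then maps point i's distance vector to the diagonal of W i, and the estimate is x i W i β̂. The SWNN architecture below (one hidden layer with a ReLU activation and arbitrary weights) is a hypothetical stand-in: the actual structure is given in Table S3, and training the weights by SGD on MSE in TensorFlow is not shown. All function names are illustrative.

```python
import numpy as np


def ols_coefficients(X, y):
    """beta_hat = (X^T X)^{-1} X^T y; lstsq gives the same result for full-rank X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def swnn(dist, params):
    """Hypothetical one-hidden-layer SWNN: distances (n_train,) -> weights (p + 1,)."""
    W1, b1, W2, b2 = params
    hidden = np.maximum(0.0, dist @ W1 + b1)  # ReLU activation (an assumption)
    return hidden @ W2 + b2                   # diagonal of W_i: (w_i0, ..., w_ip)


def gnnwr_predict(x_i, dist_i, beta_hat, params):
    """PM2.5_hat_i = x_i W_i beta_hat with W_i = diag(SWNN(d_i))."""
    w_i = swnn(dist_i, params)
    return float(x_i @ (w_i * beta_hat))
```

When the SWNN outputs all ones, W i is the identity and GNNWR falls back to the global OLR estimate. The trained network learns how far, and in which direction, each coefficient should deviate locally.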
Moreover, we designed an adaptive learning-rate strategy for stochastic gradient descent (SGD) training (Figure S1). First, the learning rate was initialized at a small value lr start and increased slowly to a maximum lr max at rate α, so that the model learns data features gradually while avoiding gradient explosion. Training was then accelerated at the maximum learning rate. Finally, the learning rate decayed exponentially at rate γ until the optimal model was reached. In our experiments, lr start, lr max, α, and γ were set to 0.1, 0.35, 0.05, and 0.98, respectively. GNNWR used the mean square error (MSE) as its loss function and was implemented with TensorFlow 1.6.0 and Python 3.7; implementation details are given in the SM and Figure S2.
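The three-phase schedule can be written as a function of the epoch. The paper's exact formula is not reproduced in this text, so the following details are assumptions: the warm-up adds α per epoch (linear), the plateau length `hold_epochs` is hypothetical, and the decay multiplies by γ once per epoch. With the reported values, the warm-up takes 5 epochs (0.1 → 0.35 in steps of 0.05).

```python
def learning_rate(epoch, lr_start=0.1, lr_max=0.35, alpha=0.05, gamma=0.98,
                  hold_epochs=20):
    """Warm-up, plateau, exponential-decay schedule (a sketch under assumptions).

    Assumptions: linear warm-up of +alpha per epoch, a hypothetical plateau of
    hold_epochs at lr_max, then multiplicative decay by gamma per epoch.
    """
    warmup_epochs = int(round((lr_max - lr_start) / alpha))
    if epoch < warmup_epochs:
        return lr_start + alpha * epoch          # phase 1: slow increase
    if epoch < warmup_epochs + hold_epochs:
        return lr_max                            # phase 2: train at max rate
    return lr_max * gamma ** (epoch - warmup_epochs - hold_epochs)  # phase 3
```

The warm-up keeps early updates small while the network weights are still random. The long decay tail lets SGD settle into a minimum once the coefficients are roughly correct.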

GWR Model
The widely used GWR model was implemented as a benchmark for GNNWR, with the following form:
$$PM_{2.5,i} = \beta_{i0} + \sum_{k=1}^{8} \beta_{ik} x_{ik} + \varepsilon_i$$
where β ik denotes the kth coefficient at point i. GWR captures local effects through the spatially varying coefficients β i, which are estimated as:
$$\hat{\beta}_i = (X^{T}W_iX)^{-1}X^{T}W_iY$$
where W i is an n × n diagonal geographical weighting matrix whose diagonal elements are the spatially non-stationary weights of the training points relative to point i. A weighting kernel must be specified to compute this matrix; the widely used fixed Gaussian and adaptive bi-square kernels were adopted in our experiments. GWR used the corrected Akaike information criterion (AICc) to select the best model and was implemented in MATLAB 2013a.
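A local GWR fit at one point reduces to weighted least squares with kernel weights. The sketch uses common textbook forms of the two kernels; the paper does not print its kernel formulas. The fixed Gaussian kernel is w = exp(−0.5 (d/b)²) with a global bandwidth b. The adaptive bi-square kernel is w = (1 − (d/b_i)²)² for d < b_i and 0 otherwise, where b_i is the distance to the Nth nearest training point. AICc-based bandwidth selection is omitted, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(d, bandwidth):
    """Fixed Gaussian kernel with a single global bandwidth."""
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def adaptive_bisquare_kernel(d, n_neighbors):
    """Adaptive bi-square kernel; b_i is the distance to the n-th nearest point."""
    b = np.sort(d)[n_neighbors - 1]   # the point itself (d = 0) counts as one neighbour
    w = (1.0 - (d / b) ** 2) ** 2
    return np.where(d < b, w, 0.0)


def gwr_local_coefficients(X, y, d_i, kernel, param):
    """beta_hat_i = (X^T W_i X)^{-1} X^T W_i y for one regression point."""
    w = kernel(d_i, param)
    XtW = X.T * w                     # X^T W_i without forming the n x n matrix
    return np.linalg.solve(XtW @ X, XtW @ y)
```

The adaptive kernel shrinks its bandwidth where stations are dense and widens it where they are sparse. This is consistent with its better performance given China's uneven monitoring network.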

GRNN Model
GRNN contains four layers: input, hidden, summation, and output. The input layer, whose number of neurons equals the number of independent variables, passes the inputs directly to the hidden layer. The hidden layer has one neuron per training sample, with the transfer function [49]:
$$w_{ij} = \exp\left(-\frac{(x_i - x_j)^{T}(x_i - x_j)}{2\sigma^{2}}\right)$$
where w ij is the output of the jth hidden neuron for input i, σ is the smoothing parameter to be optimized, x i is the input vector of point i, and x j is the jth training sample. Two sums, S yw and S w, are then calculated in the summation layer, and the GRNN estimate in the output layer is:
$$\widehat{PM}_{2.5,i} = \frac{S_{yw}}{S_w} = \frac{\sum_{j=1}^{n} w_{ij}\, PM_{2.5,j}}{\sum_{j=1}^{n} w_{ij}}$$
where S yw and S w are the weighted and simple sums of the hidden-layer outputs, and PM 2.5j is the observed PM 2.5 of the training sample associated with the jth hidden neuron.
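The GRNN described above is a kernel-weighted average of the training targets. The sketch below maps each layer onto one line of NumPy. The function name is hypothetical. In practice the inputs are usually standardized first, and a very small σ can make every weight underflow to zero for inputs far from all training samples.

```python
import numpy as np


def grnn_predict(X_train, y_train, X_new, sigma):
    """GRNN estimate S_yw / S_w for each row of X_new."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    # hidden layer: squared Euclidean distance to every training sample
    d2 = np.sum((X_new[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))   # hidden-layer outputs w_ij
    s_yw = w @ y_train                     # summation layer: weighted sum
    s_w = w.sum(axis=1)                    # summation layer: simple sum
    return s_yw / s_w                      # output layer
```

σ controls the bias-variance trade-off. A small σ reproduces the nearest training sample (prone to overfitting), whereas a large σ tends towards the global mean. GRNN has no notion of geographic location unless coordinates are included among the inputs.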

Population Weighted PM 2.5
To explore fine-scale population exposure in China, the gridded population data were incorporated to calculate the population-weighted PM 2.5 (PPM 2.5) at multiple administrative levels (province, city, and county):
$$PPM_{2.5,i} = \frac{\sum_{j} PM_{2.5,ij}\, Pop_{ij}}{\sum_{j} Pop_{ij}}$$
where PPM 2.5i is the population-weighted PM 2.5 of administrative unit i, and PM 2.5ij and Pop ij are the PM 2.5 concentration and population count of pixel j in unit i.
The histograms of PM 2.5 and AOD were somewhat dissimilar, mainly because their complex association is strongly affected by geographical, meteorological, and seasonal conditions [6].
[Figure: Histograms and descriptive statistics of the experimental datasets. Green, yellow, and red denote the lowest 25%, middle 50%, and highest 25% of the data, respectively.]

Model Performance
Table 1 lists the mean performance of the monthly PM 2.5 estimates for the OLR, GWR, GRNN, and GNNWR models, and Figure 4 shows scatter plots of estimated versus observed PM 2.5. GWR, GRNN, and GNNWR were far superior to OLR on all statistical indicators. In the 10-fold CV, both the training and validation accuracies of GWR with an adaptive bi-square kernel (GWR-AB) were better than those of GWR with a fixed Gaussian kernel (GWR-FG) in terms of R 2, RMSE, MAE, and MAPE. This suggests that the adaptive kernel characterizes the spatially non-stationary PM 2.5 -AOD relationship better than the fixed kernel, likely because the monitoring stations are unevenly distributed across China [50].
Compared with GWR-AB, GRNN had higher training accuracy but inferior validation performance. For example, GRNN increased the training R 2 from 0.87 (GWR-AB) to 0.88 and reduced the training RMSE and MAE from 8.36 and 5.62 µg/m 3 to 7.84 and 5.26 µg/m 3. However, the CV R 2 of GWR-AB (0.81) was higher than that of GRNN (0.79), and its CV RMSE and MAE (9.87 and 6.47 µg/m 3) were lower than those of GRNN (10.42 and 6.97 µg/m 3). Notably, both the training and validation accuracies of GNNWR exceeded those of GWR and GRNN. Relative to the best comparison model for each indicator, GNNWR improved the training R 2 and RMSE from 0.88 and 7.83 µg/m 3 (GRNN) to 0.90 and 7.27 µg/m 3, and improved the CV R 2 and RMSE from 0.81 and 9.87 µg/m 3 (GWR-AB) to 0.86 and 8.50 µg/m 3.
The prediction performance assessed by external evaluation on the testing dataset agreed well with the 10-fold CV results (Table 1). Although OLR still had the worst predictive indicators, its prediction performance was better than its fitting performance, suggesting that OLR does not suffer from overfitting. Between GWR and GRNN, GWR-AB achieved the highest prediction performance, and GRNN ranked second, ahead of GWR-FG. Table 1 also shows that the prediction indicators of GNNWR were superior to those of GWR and GRNN: compared with the best comparison model (GWR-AB), GNNWR raised the predictive R 2 from 0.83 to 0.85 and reduced the predictive RMSE, MAE, and MAPE by at least 7%. Figure 4 shows similar differences among the methods, with the OLR scatter plot being the most dispersed, consistent with its poorest performance.
GWR and GRNN estimated low values accurately but commonly underestimated high values. GNNWR estimates showed the best fitting and prediction precision, with fewer unreasonable underestimations and overestimations.
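A near-zero overall bias can hide large errors of opposite sign. Evaluating errors within concentration bins (0-50, 50-100, and >100 µg/m 3, as in Table 2) exposes this. In the sketch below, the samples are assumed to be binned by their *observed* PM 2.5; the paper's wording ("subsets of estimates") leaves this open. The function name is hypothetical.

```python
import numpy as np


def binned_errors(obs, pred, edges=(0.0, 50.0, 100.0, np.inf)):
    """Bias and RMSE within PM2.5 bins [lo, hi), binned by observed value."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    stats = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (obs >= lo) & (obs < hi)
        if not in_bin.any():
            continue                     # skip empty bins
        err = pred[in_bin] - obs[in_bin]
        stats[(lo, hi)] = {"n": int(in_bin.sum()),
                           "Bias": float(err.mean()),
                           "RMSE": float(np.sqrt(np.mean(err ** 2)))}
    return stats


# toy data: +15 at low values and -15 at high values cancel in the overall bias
obs = [20.0, 30.0, 150.0, 160.0]
pred = [35.0, 45.0, 135.0, 145.0]
s = binned_errors(obs, pred)
```

Here the overall bias is exactly zero, even though every estimate is off by 15 µg/m 3. This mirrors the OLR pattern discussed in the following paragraph.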
Unexpectedly, the mean bias of OLR was very small and close to 0 (Table 1), much better than that of the other models. However, the OLR scatter plot (Figure 4) shows that it severely underestimated high values. To investigate why, we further evaluated model performance on subsets of the estimates with PM 2.5 concentrations of 0-50 µg/m 3, 50-100 µg/m 3, and >100 µg/m 3 (Table 2). All models tended to overestimate low values and underestimate high values. The subset biases of OLR were larger in magnitude than those of the other models at both low and high values, confirming its poor performance. We therefore infer that the positive bias of OLR at low values and its negative bias at high values largely cancel out, producing a small overall bias. GNNWR performed best for all subsets in terms of RMSE, MAE, and MAPE, and its bias was also smaller than that of the other models, especially for high values.
Figures S3-S5 show the monthly performance indicators of OLR, GWR, GRNN, and GNNWR. GNNWR had the highest R 2 and the lowest RMSE, MAE, and MAPE among the models in almost every month, demonstrating its ability to capture spatial non-stationarity and complex non-linearity in the PM 2.5 -AOD relationship robustly.
Figure 5 displays the annual mean satellite-estimated PM 2.5 from GNNWR alongside the ground-measured PM 2.5 in China. The spatial pattern of the satellite-derived estimates agreed well with the ground-level observations and was consistent with previous studies [16,17,51].
Notably, GNNWR achieved almost complete spatial coverage of PM 2.5 estimates thanks to the spatiotemporal interpolation of missing VIIRS IP AOD data. The annual mean PM 2.5 concentration was 31.07 ± 17.52 µg/m 3, and heavy pollution was concentrated mainly in Hebei, Henan, and Shandong Provinces, which is thought to result from rapid economic development and accelerated urbanization [6]. The Taihang Mountains in Hebei Province also hinder PM 2.5 dispersion, further aggravating air pollution in these areas [16]. High PM 2.5 concentrations also appeared in the Xinjiang Autonomous Region owing to large quantities of dust aerosol from the Taklimakan Desert [17,52,53]. By contrast, southern and northern provinces such as Hainan, Guangdong, and Heilongjiang had much better air quality, with PM 2.5 concentrations generally below 30 µg/m 3. Overall, ~30% of China's area exceeded the Chinese Level II air quality standard (35 µg/m 3).
Figure 6 shows the seasonal mean PM 2.5 maps from GNNWR, which also cover almost all of China. PM 2.5 varied markedly by season, with the heaviest pollution in winter and relatively low pollution in summer. The average winter PM 2.5 was 42.30 ± 25.21 µg/m 3, with >55% of the area exceeding the Level II standard, mainly because of coal burning for heating in northern China and weather unfavorable for pollutant dispersion in eastern China [54][55][56]. Summer had the best air quality (20.12 ± 11.54 µg/m 3), with only 8.92% of the area above the Level II standard, primarily because abundant clean air and sufficient precipitation greatly improved air quality [16].
In addition, dense vegetation cover in summer helps absorb air pollutants, which also reduces PM 2.5 concentrations to some extent [6]. Spring PM 2.5 (35.77 ± 28.74 µg/m 3) was higher than the annual mean, and autumn PM 2.5 averaged 26.82 ± 16.78 µg/m 3. Nevertheless, PM 2.5 pollution in China has been considerably reduced compared with levels reported by studies of earlier years [6,7,17,26].
Since the GNNWR model was built on a monthly scale, the monthly satellite-estimated and ground-measured PM 2.5 maps of China are presented in Figure 7. The satellite estimates agreed spatially well with the ground measurements in every month, and spatial differences in PM 2.5 pollution were noticeable at the monthly scale. The most severe pollution occurred in January (52.38 ± 46.16 µg/m 3), with ~60% of the country exceeding the Level II standard, whereas August had the lightest pollution (15.68 ± 11.56 µg/m 3), with only ~5% of the area exceeding the standard.
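The "share of China's area exceeding 35 µg/m 3" figures can be computed directly from the gridded product. The paper does not state whether pixel areas were weighted. On a regular 0.00625° latitude-longitude grid, however, pixel area shrinks with latitude, so this sketch weights each pixel by cos(latitude), which is an assumption. Pixels outside China are assumed to be NaN, and the function name is hypothetical.

```python
import numpy as np


def exceedance_area_fraction(pm25, lats, threshold=35.0):
    """Area-weighted fraction of valid pixels with PM2.5 above threshold.

    pm25: 2-D grid (n_lat, n_lon) on a regular lat/lon grid, NaN outside China.
    lats: 1-D pixel-centre latitudes in degrees (one per row).
    Assumption: pixel area is proportional to cos(latitude).
    """
    pm = np.asarray(pm25, dtype=float)
    coslat = np.cos(np.radians(np.asarray(lats, dtype=float)))
    area = np.broadcast_to(coslat[:, None], pm.shape)
    valid = ~np.isnan(pm)
    exceed = valid & (pm > threshold)
    return float(area[exceed].sum() / area[valid].sum())
```

Without the cos(latitude) weighting, high-latitude pixels, such as those in Heilongjiang, would be over-counted relative to their true ground area.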

Applicability and Superiority of the GNNWR Model
Satellite-retrieved AOD data are extensively integrated with ground-level measurements to obtain spatially continuous fine-scale PM 2.5 distributions [7][8][9][10][11][12][13][14][15][16]. However, current satellite-based methods struggle to produce highly accurate and reasonable PM 2.5 distributions because they cannot handle both spatial non-stationarity and complex non-linearity in the PM 2.5 -AOD relationship. This study therefore developed a satellite-based spatially neural network weighted regression model (GNNWR) to produce highly accurate and reasonable satellite-derived PM 2.5 maps at fine resolution (<1 km) across China. Table 1 shows that the GWR, GRNN, and GNNWR models were far superior to OLR, confirming that PM 2.5 modeling involves substantial non-stationarity and non-linearity. GRNN achieved higher training accuracy than GWR-AB but inferior validation performance. This suggests that although its non-linear fitting capability improves the training fit of the PM 2.5 -AOD relationship, GRNN may overfit severely because it does not account for spatial effects. Meanwhile, the higher training and validation accuracies of GNNWR relative to GWR and GRNN demonstrate that GNNWR fits the PM 2.5 -AOD relationship better than both. They also suggest that the SWNN in GNNWR constructs spatially non-stationary weights better than the spatial kernels of GWR [43]. Furthermore, the validation and prediction accuracies of GNNWR were comparable to the training accuracy of GWR-AB, and the gap between its validation and training accuracies was much smaller than that of GRNN. This demonstrates that GNNWR robustly captured spatial non-stationarity and complex non-linearity in the PM 2.5 -AOD relationship and achieved excellent predictive capacity for PM 2.5 modeling.
The higher estimation accuracies and smaller biases of GNNWR compared with the other models for all subsets in Table 2 further indicate its broad applicability in PM 2.5 modeling.
Compared with previous studies that used MODIS and MAIAC AOD products to map high-resolution PM 2.5 across China, the prediction performance of our GNNWR model (CV and predictive R 2 of 0.86 and 0.85) was superior to most earlier studies (CV R 2 of 0.64-0.83) [6,12,16,27,57,58] and comparable to recent studies (CV R 2 of 0.76-0.89) [17,36,42,59,60]. Since the data accuracy of VIIRS IP AOD is somewhat inferior to that of the MODIS and MAIAC AOD products, these results also demonstrate the robust predictive power of GNNWR for modeling the PM 2.5 -AOD relationship.

Accuracy and Reasonability of Satellite-Derived PM 2.5 Mappings
To assess the accuracy and reasonability of the satellite-derived PM 2.5 maps of GNNWR, annual mean PM 2.5 maps of China from OLR, GWR, and GRNN are shown for comparison (Figure S6). The spatial PM 2.5 patterns of these models were broadly consistent. The OLR estimates were spatially homogeneous because OLR ignores spatial non-stationarity, whereas the GRNN estimates showed some unreasonable variations in western China caused by overfitting. The maps of GWR-AB and GNNWR were more reasonable and more consistent with the ground-level measurements. The North China Plain is a highly developed and polluted region of great public concern [15,61], so the estimates of these models in this area are enlarged for detailed comparison (Figure 8). OLR and GRNN were unable to portray the high PM 2.5 concentrations (>60 µg/m 3) in heavily polluted areas because they severely underestimated high values. Although GWR-AB captured high PM 2.5 and produced a better distribution than OLR and GRNN, its spatial gradients were too steep and showed some unreasonable patterns, such as the aggregation of high values in Shandong Province and the abrupt changes in Shanxi Province. In contrast, GNNWR produced highly accurate estimates with few unreasonable variations, owing to its strong predictive ability.

Analysis of Fine-Scale Population Exposures at Multiple Levels
High-spatial-resolution PM 2.5 datasets allow population exposure (i.e., PPM 2.5) to be estimated more completely, with less bias from averaging within each pixel [62,63]. By resampling our 750 m PM 2.5 product to a 1 km grid and linking it with the GPWv4 population estimates, we calculated annual mean PPM 2.5 at the province level (Figure 9 and Table S4). Following the urbanization method proposed by [62], urban areas in China were defined as areas with a population density exceeding 600 people/km 2 (Figure S7). PM 2.5 exposures were then estimated separately for urban and rural areas to explore air pollution levels in China comprehensively. The annual mean PPM 2.5 was higher than the area-mean PM 2.5 in most provinces, and PPM 2.5 exceeded 50 µg/m 3 in Xinjiang, Hebei, Shanxi, Henan, and Beijing. Surprisingly, although the population of the Xinjiang Autonomous Region is small, its annual mean PPM 2.5 was the highest (63.27 µg/m 3), suggesting that most of its residents live in high-PM 2.5 areas. Similarly, PPM 2.5 in Sichuan Province (40.08 µg/m 3) was about double its area-mean PM 2.5 (20.09 µg/m 3), indicating that its population is concentrated mainly in highly polluted areas. Although urban areas are much smaller than rural areas in China (Figure S7), urban PPM 2.5 was significantly higher than rural PPM 2.5 in almost all provinces, except Xinjiang, Tianjin, and Jiangsu. Urban PPM 2.5 exceeded the Chinese Level II air quality standard in most provinces, and high PM 2.5 exposures were concentrated in the North China Plain (>55 µg/m 3). This indicates that urban residents in China face more severe PM 2.5 pollution and higher health risks than rural residents.
More efforts are urgently needed to further control the PM 2.5 emissions in Chinese urban areas.
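The exposure analysis combines two simple operations on co-registered 1 km grids. The first is the population-weighted mean PPM 2.5 = Σ(PM 2.5 × Pop) / Σ Pop within each administrative unit. The second is an urban mask defined by a population density above 600 people/km 2. This sketch assumes the PM 2.5, population, and unit-ID rasters are already aligned. It treats each pixel as 1 km 2 by default and skips pixels with missing data or zero population. The function names are hypothetical.

```python
import numpy as np


def population_weighted_pm25(pm25, pop, unit_ids):
    """PPM_i = sum_j(PM_ij * Pop_ij) / sum_j(Pop_ij) for each administrative unit i."""
    pm = np.ravel(np.asarray(pm25, dtype=float))
    p = np.ravel(np.asarray(pop, dtype=float))
    u = np.ravel(np.asarray(unit_ids))
    ok = ~np.isnan(pm) & ~np.isnan(p) & (p > 0)   # skip empty or missing pixels
    result = {}
    for unit in np.unique(u[ok]):
        m = ok & (u == unit)
        result[unit.item()] = float(np.sum(pm[m] * p[m]) / np.sum(p[m]))
    return result


def urban_mask(pop, pixel_area_km2=1.0, min_density=600.0):
    """Urban pixels: population density above min_density people/km^2."""
    return np.asarray(pop, dtype=float) / pixel_area_km2 > min_density
```

Urban and rural PPM 2.5 follow by applying the mask (or its negation) to all three arrays before calling `population_weighted_pm25`.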
The high-resolution PM 2.5 product also enabled us to investigate urban air pollution at a finer scale. Annual mean PPM 2.5 at the city and county levels in the North China Plain was calculated to study detailed exposures in the most polluted areas of China (Figure 10). At the city level, high-PPM 2.5 areas (>57.5 µg/m 3) were clustered mainly in southern Hebei, southern Shanxi, and northern Henan. Beijing and Tianjin showed lower PPM 2.5 than southern Hebei, mainly owing to the strict control policies of their local governments in recent years. The county-level PPM 2.5 revealed more detailed spatial patterns. Figure 10b shows that high-PPM 2.5 areas were more spatially clustered in southern Hebei Province than in Shanxi and Henan Provinces, indicating the heaviest PM 2.5 pollution and highest health risks there.

Conclusions
To obtain highly accurate and reasonable satellite-derived PM 2.5 maps at fine resolution (<1 km) across China, this study developed a satellite-based spatially neural network weighted regression model (GNNWR) that handles both spatial non-stationarity and complex non-linearity in PM 2.5 -AOD modeling. Our work is the first to use high-resolution VIIRS IP AOD to predict PM 2.5 concentrations at 750 m resolution over a large area of China. GNNWR showed clear advantages in PM 2.5 estimation because it overcomes the inability of GWR to capture complex non-linear features and enables the neural network to handle spatial effects in the PM 2.5 -AOD relationship.
GNNWR was thoroughly assessed through a 10-fold CV and an external evaluation against OLR, GWR, and GRNN. It achieved the highest performance for both fitting (training R 2 = 0.90, RMSE = 7.27 µg/m 3; CV R 2 = 0.86, RMSE = 8.50 µg/m 3) and prediction (R 2 = 0.85, RMSE = 8.16 µg/m 3). Detailed comparison of the spatial maps also showed that GNNWR achieved superior prediction accuracy and rarely exhibited unreasonable variations. Although VIIRS IP AOD is somewhat less accurate than MODIS and MAIAC AOD, the prediction performance of GNNWR was comparable to most previous studies conducted across China, further demonstrating its robust predictive power for PM 2.5 modeling. These results also validate the potential of VIIRS IP AOD for estimating PM 2.5 with complex spatial distributions over large areas.
Our new PM 2.5 dataset for China has a higher spatial resolution (750 m) than most existing products and almost complete spatial coverage, making it valuable for advancing environmental and health research in China. The PM 2.5 maps show that PM 2.5 pollution in China improved greatly in 2018, with an annual mean concentration of 31.07 ± 17.52 µg/m 3. Nonetheless, the fine-scale PM 2.5 exposures suggest that PM 2.5 pollution remains severe in most urban areas, especially in southern Hebei Province. Further efforts are therefore needed to control PM 2.5 emissions in these populous and developed areas.