Fusing Observational, Satellite Remote Sensing and Air Quality Model Simulated Data to Estimate Spatiotemporal Variations of PM2.5 Exposure in China

Estimating ground surface PM2.5 with fine spatiotemporal resolution is a critical technique for exposure assessments in epidemiological studies of its health risks. Previous studies have utilized monitoring, satellite remote sensing or air quality modeling data to evaluate the spatiotemporal variations of PM2.5 concentrations, but such studies rarely combined these data simultaneously. Through assembling techniques, including linear mixed effect regressions with a spatial-varying coefficient, a maximum likelihood estimator and the spatiotemporal Kriging together, we develop a three-stage model to fuse PM2.5 monitoring data, satellite-derived aerosol optical depth (AOD) and community multi-scale air quality (CMAQ) simulations together and apply it to estimate daily PM2.5 at a spatial resolution of 0.1◦ over China. Performance of the three-stage model is evaluated using a cross-validation (CV) method step by step. CV results show that the finally fused estimator of PM2.5 is in good agreement with the observational data (RMSE = 23.0 μg/m3 and R2 = 0.72) and outperforms either AOD-derived PM2.5 (R2 = 0.62) or CMAQ simulations (R2 = 0.51). According to step-specific CVs, in data fusion, AOD-derived PM2.5 plays a key role to reduce mean bias, whereas CMAQ provides spatiotemporally complete predictions, which avoids sampling bias caused by non-random incompleteness in satellite-derived AOD. Our fused products are more capable than either CMAQ simulations or AOD-based estimates in characterizing the polluting procedure during haze episodes and thus can support both chronic and acute exposure assessments of ambient PM2.5. Based on the products, averaged concentration of annual exposure to PM2.5 was 55.7 μg/m3, while averaged count of polluted days (PM2.5 > 75 μg/m3) was 81 across China during 2014. Fused estimates will be publicly available for future health-related studies.


Introduction
Many epidemiological studies have associated particulate matter with an aerodynamic diameter of ≤2.5 µm (PM 2.5 ) with adverse health outcomes, including cardiovascular and respiratory diseases [1,2], infant birth defects [3][4][5], DNA damages [6,7], cancer mortality [8,9] and many others. Severe PM 2.5 pollution in China has attracted considerable public attention [10][11][12] and inspired numerous epidemiological studies to investigate the health effects of air pollution in China since 2013 [13][14][15][16][17][18]. Accurately assessing PM 2.5 exposure is critical for estimating its health risks in such epidemiological studies. However, due to the limited number of ground monitors in China, previous studies generally ignored the spatial variation of PM 2.5 and assessed the ambient exposure uniformly using one monitor or averages of several monitors located within a city or a municipality [13][14][15][16], which causes exposure misclassification. Therefore, accurately estimating the fine-scale spatiotemporal variation of ground PM 2.5 may lay a foundation for future health-related studies of PM 2.5 in China.
Three types of numerical values have been applied in exposure assessments of ambient particles: (1) monitoring observations; (2) satellite remote sensing measurements of aerosol; and (3) air quality model simulations. Routine monitors were widely used to predict air pollution concentrations across an area using geostatistical methods such as Kriging [19,20] or land use regression (LUR) to incorporate external spatial covariates [21,22], but such monitors may be sparsely distributed in sub-urban or rural areas. The PM 2.5 monitoring network has been rapidly spreading over China. In 2013, only approximately 70 cities or municipalities were covered by official sites of the China Environmental Monitoring Center (CEMC), whereas by 2015, the number had increased to approximately 330. However, monitoring data remain inadequate for characterizing the national-scale spatial variability of PM 2.5 in China.
The satellite remote sensing technique can retrieve integrated column concentrations of gases and aerosol from the bottom to top of the atmosphere and has been applied to assess ground surface air pollution [23]. Satellite-derived aerosol optical depth (AOD) has been successfully associated with ground PM 2.5 [24] and has thus been used to generate spatiotemporal estimators of PM 2.5 by acting as a primary predictor in statistical models such as LUR [25][26][27] or being calibrated by ratios (PM 2.5 /AOD) simulated by a chemical transport model (e.g., GEOS-Chem) [28,29]. However, due to meteorological or geographical conditions, non-randomly missing values in satellite-derived AOD caused absent estimates of PM 2.5 in specific periods (e.g., winter [26]) or areas (e.g., deserts [28]).
Air quality models, such as the community multi-scale air quality model (CMAQ), simulate pollution concentrations based on emission inventories and chemical and physical processes driven by a meteorological model, such as the weather research and forecasting model (WRF) [30], and can provide exposure estimates with spatiotemporally complete coverage [31,32]. However, the accuracy of air quality models depends on the uncertainty of emission inventories and meteorological inputs and has thus been reported to vary with seasons and locations [33].
Hybrid models have been developed to combine different numerical values of air pollutants to improve exposure estimates. Beckerman et al. [34] estimated monthly PM 2.5 on an 8.9 km × 8.9 km grid over the contiguous United States (US) through combining LUR of monitors and satellite PM 2.5 derived from GEOS-Chem and AOD. Mcmillan et al. [35] developed a hierarchical Bayesian spatiotemporal model to bring monitors and CMAQ together and generated daily PM 2.5 and O 3 on both 36 km × 36 km and 12 km × 12 km grids over the US. Friberg et al. [36] introduced a method to fuse CMAQ simulations and monitoring observations for daily estimates of multiple air pollutants on a 12 km × 12 km grid over Georgia, US. Liu et al. [18] utilized the Ensemble Kalman Filter (EnKF) to assemble in situ observations with daily stimulations of PM 2.5 from an air quality model across China, and applied the analyzed products in risk assessment of chronic exposure to ambient pollution. Beloconi et al. [27] mixed spatiotemporal Kriging maps of monitoring data and satellite AOD to estimate fine-scale (1 km × 1 km) daily estimates of PM 2.5 and PM 10 during 2002-2012 over London.
To further increase the accuracy of exposure assessments of PM 2.5 via making full use of available data, this study aims to develop a fused estimator to join monitoring, satellite remote sensing and air quality modeling data together. Our approach incorporated: (1) monitoring records from routine sites as reference measurements of PM 2.5 ; (2) CMAQ simulations as prior knowledge, which provides completely spatiotemporal coverage of PM 2.5 ; and (3) satellite AOD as alternative observations with wider spatial coverage than monitors and higher accuracy than CMAQ. We applied a three-stage model for the fused estimator. In Step 1, we derived ground surface PM 2.5 from satellite AOD and calibrated CMAQ simulations by monitoring data using two separate regression models. In Step 2, we combined AOD-derived PM 2.5 and calibrated-CMAQ PM 2.5 using a maximum likelihood method. In Step 3, we incorporated the spatiotemporal autocorrelation of the monitoring data in the final estimator through interpolating the residuals in Step 2. We illustrated the three-stage model by a practice to develop daily maps of PM 2.5 on a regular grid of 0.1 • × 0.1 • over China. The performance of the statistical models was assessed using cross-validation (CV) methods by steps.  [38] and the sites of the Guangdong Environmental Monitoring Center [39]. Duplicate monitoring sites among these three networks were removed, leaving a total of 944 sites ( Figure 1). According to the Chinese National Ambient Air Quality Standard (CNAAQS, GB3095-2012) released in 2012, the ground-based PM 2.5 data are measured using the tapered element oscillating microbalance (TEOM) technique or the beta-attenuation method. For each monitor, we averaged PM 2.5 by day and excluded the dates with less than 19 hourly measurements. Figure 1 presents locations of the monitors. we combined AOD-derived PM2.5 and calibrated-CMAQ PM2.5 using a maximum likelihood method. In Step 3, we incorporated the spatiotemporal autocorrelation of the monitoring data in the final estimator through interpolating the residuals in Step 2. We illustrated the three-stage model by a practice to develop daily maps of PM2.5 on a regular grid of 0.1° × 0.1 ° over China. The performance of the statistical models was assessed using cross-validation (CV) methods by steps. . Duplicate monitoring sites among these three networks were removed, leaving a total of 944 sites ( Figure 1). According to the Chinese National Ambient Air Quality Standard (CNAAQS, GB3095-2012) released in 2012, the ground-based PM2.5 data are measured using the tapered element oscillating microbalance (TEOM) technique or the beta-attenuation method. For each monitor, we averaged PM2.5 by day and excluded the dates with less than 19 hourly measurements. Figure 1 presents locations of the monitors.

Satellite Remote Sensing of AOD
Satellite-derived AOD data during 2014 were obtained from moderate resolution imaging spectroradiometer (MODIS), equipped on two earth observing system satellites, Terra and Aqua, which are operated by US National Aeronautics and Space Administration (NASA). Terra (Aqua) scans the earth at 10:30 a.m. (1:30 p.m.) with a global coverage of 1-2 days. They retrieved AOD from visible and near-infrared electromagnetic signals at nadir. Level 2 MODIS AOD products (collection 6) with a spatial resolution of 10 km × 10 km were collected from Atmosphere Archive and

Satellite Remote Sensing of AOD
Satellite-derived AOD data during 2014 were obtained from moderate resolution imaging spectroradiometer (MODIS), equipped on two earth observing system satellites, Terra and Aqua, which are operated by US National Aeronautics and Space Administration (NASA). Terra (Aqua) scans the earth at 10:30 a.m. (1:30 p.m.) with a global coverage of 1-2 days. They retrieved AOD from visible and near-infrared electromagnetic signals at nadir. Level 2 MODIS AOD products (collection 6) with a spatial resolution of 10 km × 10 km were collected from Atmosphere Archive and Distribution System [40]. L2 swath data were resampled into a fixed grid, which covers the whole investigated regions with a spatial resolution of 0.1 • × 0.1 • via area-weighted averaging. The AOD_550_Dark_Target_Deep_Blue_Combined dataset with QA Flag equal to 2 or 3 were utilized in this study. According to Ma et al. [41], we combined Terra/Aqua MODIS AOD measurements together to increase the spatial coverage of AOD measurements. Satellite normalized difference vegetation index (NDVI) and fire spots (FS) were obtained from combined MODIS products. We aggregated monthly products of NDVI with a spatial resolution of 1 km × 1 km into seasonal averages over the regular grid of 0.1 • × 0.1 • . Daily counts of FS within a 75 km buffer around each centroid of the grid were calculated based on the location and time of fires, collected from MODIS burned area products. Integrated column concentrations of NO 2 were obtained from Ozone Monitoring Instrument (OMI), launched on Aura. Level 2 products of column NO 2 with a spatial resolution of 13 km × 24 km were prepared into seasonal means with a spatial resolution of 0.1 • × 0.1 • . The above data can be accessed from Land Process Distributed Active Archive Center [42], MODIS Active Fire and Burned Area Products [43] and Goddard Earth Sciences Data and Information Services Center [44].

WRF-CMAQ Simulation
In this study, the WRF model version v3.5.1 [45] and the CMAQ model version 5.1 were used to simulate the daily variations of PM 2.5 over China. The WRF model is driven by the National Centers for Environmental Prediction Final Analysis (NCEP-FNL) reanalysis data as initial and boundary conditions (ICs and BCs). Meteorological parameters simulated by WRF model were applied to drive CMAQ. Our CMAQ simulations utilized CB05 as the gas-phase mechanism, AERO6 as the aerosol module, and Regional Acid Deposition Model (RADM) as the aqueous-phase chemistry model. Boundary conditions for our CMAQ model were provided by dynamic GEOS-Chem simulation [46]. The anthropogenic emissions for Mainland China during 2014 are derived from the Multi-resolution Emission Inventory of China [47]. Detailed model configurations for WRF-CMAQ were presented in our previous study [48]. We simulated meteorological variables including ground wind speed (WS), planetary boundary layer height (PBL), ground ambient pressure (PS), and ground relative humidity (RH) by WRF and PM 2.5 by CMAQ with a spatial resolution of 36 km × 36 km, which were further downscaled to the 0.1 • × 0.1 • grid using an offline ordinary Kriging method [49]. The daily means of simulations were interpolated in spatial dimensions for each variable separately. The purpose of downscaling is to spatially match WRF-CMAQ simulations with the rest data. Validations for CMAQ-simulated PM 2.5 at both spatial resolutions (0.1 • and 36 km) were performed using monitoring data, which are presented in Figures S2 and S3 and briefly illustrated in the Discussion Section. After downscaling, CMAQ-simulated PM 2.5 covered 100% of spatiotemporal coordinates (99,351 pixels × 365 days), while the in situ observations or AOD measurements only covered 0.54% or 31.56% of spatiotemporal coordinates, respectively.

Statistical Analysis
The modeling framework of exposure assessment included three steps, which were presented in Figure 2. Briefly, we first developed two regression models (Steps 1.1 and 1.2) to associated AOD or CMAQ with in situ observations of PM 2.5 , separately; then, the estimates from the two models were combined based on a maximum likelihood (Step 2); and, finally, we incorporated spatiotemporal autocorrelations of the monitoring PM 2.5 (Step 3).  Based on the mature methodology developed by Ma et al. [41], we first derived PM 2.5 from satellite-retrieved AOD with the auxiliary variables, which were selected according to experimental findings (e.g., RH [50,51]) or empirical results on PM 2.5 -AOD associations (e.g., NO 2 [52] and FS [26]). Instead of using the linear mixed effect model (LME) [41], we developed an updated version, a linear mixed effect model with a spatial-varying coefficient (LME SC ) model: where In the LME SC , s, t and j denote spatial coordinates (longitude and latitude at the centroid of each pixel), daily and seasonal index, respectively; PM 2.5,st denotes in situ observations at spatial location s and date t; and µ, β 1 , · · · , β 8 denote fixed intercept and slopes for covariates including: (1) daily values of AOD, WS, PBL, PS, RH and FS; and (2) seasonal values of NDVI and NO 2 . β 2,j , . . . , β 8,j denote seasonally-specific random slopes for the other covariates than AOD. f (s) denotes a spatial-varying coefficient for AOD and is expanded by a given set of k-dimensional basis functions (e.g., local bisquare functions [53]) and daily-specific random slopes (b ,t ). In this study, for computing efficiency, we expanded f (s) by 2-D splines provided by R package mgcv [54]. ηs became known values depended on spatial coordinates (s), once the specific form of basis functions was determined. Thus, the inference of coefficients (b ·,t ) in f (s) was done in regression procedure, simultaneously with other parameters (e.g., βs) in Equation (1). If f (s) is simplified as a one-dimensional daily-specific random slope (β 1,t ), the LME SC will be reduced to a LME, which has been utilized in previous studies to generate AOD-derived PM 2.5 [41]. LME method has disadvantages in generating spatially smoothing predictors, especially near the provincial boundaries. Through introducing spatial-varying coefficients, LME SC fixed this problem and was evidenced to outperform LME by our cross-validation results. Detailed comparisons are presented in the Discussion Section. Spatial and temporal patterns for PM 2.5 -AOD associations (β 1 + f (s)) are presented in Figure S8. Fitted value and its standard deviation (SD) from Equation (1) are denoted by PM 2.5 AOD and SD AOD , respectively. We named PM 2.5 AOD as "AOD-derived PM 2.5 " in this study. Equation (1)  We calibrated CMAQ simulated PM 2.5 with the in situ observations by a similar LME SC model: where CMAQ st denotes downscaled CMAQ-simulated PM 2.5 with a spatial resolution of 0.
In Equation (2), we utilized original scale instead of log-scale of PM 2.5 in order to guarantee comparable error terms ( st ) to that in Equation (1), although logarithm transform was usually used to reduce the bias caused by violation of normality assumption of PM 2.5 in the regression analysis. Spatial and temporal variations of estimated coefficients of CMAQ-simulated PM 2.5 (β * 1 + f * (s)) are presented in Figure S9. Fitted value and its SD from Equation (2) For the places where the AOD is missing, the PM 2.5 ML is defined identically as PM 2.5 CMAQ .

Step 3: Spatiotemporal Kriging of the Residuals
Taking spatiotemporal autocorrelation of PM 2.5 into consideration, we interpolated the residuals (e st = PM 2.5,st − PM ML 2.5,st ) using spatiotemporal Kriging (S/T-Kriging) based on a product-sum covariance function [55]. Assuming a stationary multivariate normal distribution for the residuals (e st ), the variance-covariance matrix can be captured by a function (C) of the spatiotemporal coordinates, as shown in follows: where θ denotes the tuning parameters in the covariance function (C) and can be estimated using variogram approach. For a spatiotemporal point (s*, t*), where in situ observation of PM 2.5 does not exist, the residual can be interpolated asê s * t * = Cov(e s * t * , E)Σ −1 E. Therefore, the optimal estimates of PM 2.5 can be derived as PM For more details of S/T-Kriging, please refer to Chapter 6 in Cressie and Wikle [55].

Model Evaluation
Previous studies usually evaluated statistical performance of PM 2.5 estimators by the 10-fold cross validation (CV 10 ), which randomly divides the monitoring data into ten folds and iteratively leaves one fold out as the testing dataset to assess the predictions from a model trained by the remaining data. For independent data, the root of mean squared error (RMSE) has been considered an unbiased estimator for prediction accuracy [56]. However, for spatiotemporally auto-correlated PM 2.5 data, CV 10 may underestimate prediction errors [57]. To fairly evaluate the models, we designed isolated-site cross-validation (CV IS ), in which we held out about 10% of the monitoring sties and used all measurements from the testing sites to validate the modeling results based on the rest data. The testing sites were randomly selected with two constraints: (1) they should be separated from the training sites by more than 25 km; and (2) they should be universally spanned over the study domain, especially areas with dense population. In this study, we involved 91 testing sites with a minimum distance from the remained sites of 26.2 km, as shown in Figure 1. The testing set contained 27,800 samples out of 294,122 total daily values of monitoring measurements. We kept multiple testing values located within one grid at the same time point, because the discrepancy among those values represents the error caused by spatially aggregation, which should not be ignored in model evaluation. A comparison between CV 10 and CV IS was performed based on AOD-derived PM 2.5 from a LME model ( Figure S1), and more detailed rationale of CV IS is presented in the Discussion Section. We also evaluated the three-stage model using the same CV IS data step by step. The CV IS analysis of the intermediate estimators illustrated how the errors propagate in our data fusion model. Figure 3 presents the frequency distributions and summary statistics of in situ observations, CMAQ simulations and AOD-derived estimates of PM 2.5 . CMAQ simulated or AOD-derived PM 2.5 concentrations were extracted at the same spatiotemporal coordinates of monitoring data, in order to compare the three types of inputs in our model. During 2014, the overall mean of the monitoring PM 2.5 is 61.3 µg/m 3 , which is slightly higher that of CMAQ-simulated PM 2.5 (57.4 µg/m 3 ) but lower than that of AOD-derived PM 2.5 (66.4 µg/m 3 ), which suggests systematic bias in the latter two datasets. However, after excluding the observational PM 2.5 at the time points, when AOD is missing, the mean of monitoring data is increased to 66.6 µg/m 3 (Figure 3b), which is close to that of AOD-derived PM 2.5 . A Kolmogorov-Smirnov test indicated that monitoring data presented significantly different distributions depended on the missing status of AOD. According to our findings, in China AOD incompleteness occurred non-randomly and was influenced by the ambient concentrations of PM 2.5 , which leads to sampling errors in AOD-derived PM 2.5 . The systemic bias between frequency distribution of AOD-derived PM 2.5 and that of overall monitoring data was partially caused by the sampling errors of satellite-derived AOD.

Descriptive Statistics for Inputs of Data Fusion
Remote Sens. 2017, 9, x FOR PEER REVIEW 8 of 19 evaluation. A comparison between CV10 and CVIS was performed based on AOD-derived PM2.5 from a LME model ( Figure S1), and more detailed rationale of CVIS is presented in the Discussion Section. We also evaluated the three-stage model using the same CVIS data step by step. The CVIS analysis of the intermediate estimators illustrated how the errors propagate in our data fusion model. Figure 3 presents the frequency distributions and summary statistics of in situ observations, CMAQ simulations and AOD-derived estimates of PM2.5. CMAQ simulated or AOD-derived PM2.5 concentrations were extracted at the same spatiotemporal coordinates of monitoring data, in order to compare the three types of inputs in our model. During 2014, the overall mean of the monitoring PM2.5 is 61.3 μg/m 3 , which is slightly higher that of CMAQ-simulated PM2.5 (57.4 μg/m 3 ) but lower than that of AOD-derived PM2.5 (66.4 μg/m 3 ), which suggests systematic bias in the latter two datasets. However, after excluding the observational PM2.5 at the time points, when AOD is missing, the mean of monitoring data is increased to 66.6 μg/m 3 (Figure 3b), which is close to that of AODderived PM2.5. A Kolmogorov-Smirnov test indicated that monitoring data presented significantly different distributions depended on the missing status of AOD. According to our findings, in China AOD incompleteness occurred non-randomly and was influenced by the ambient concentrations of PM2.5, which leads to sampling errors in AOD-derived PM2.5. The systemic bias between frequency distribution of AOD-derived PM2.5 and that of overall monitoring data was partially caused by the sampling errors of satellite-derived AOD.   Optimal ) is in good agreement with the observational data (R 2 = 0.72). The root of mean squared error (RMSE) is 23.0 µg/m 3 , which accounts for 55% of the SD of observational PM 2.5 (defined as normalized root of mean squared error (NRMSE)) and 41% of the mean of observational PM 2.5 (defined as relative prediction error (RPE)). The mean bias is 4.9 µg/m 3 , which suggests that PM 2 (Figure 2c). Such weaknesses are partially overcome by data fusion (Figure 2c). The detailed CV IS scatterplots for the intermediate estimators are presented in the Figure 4b-d. We also explored temporal ( Figure S6) and spatial ( Figure S7) variations of CV IS results. To evaluate the temporal variation of CV IS errors, we calculated the statistics, including R 2 , RMSE and NRMSE by dates. The daily CV IS results reflected the final estimator's capacity to capture spatial variations of PM 2.5 . The CV IS RMSE for PM 2.5 Optimal is proportional to the observed value and thus was varied seasonally (higher in colder season, but lower in warmer season). However, we found significant trend neither in daily NRMSEs nor in daily R 2 s ( Figure S6), which indicates that the accuracy of PM 2.5 Optimal is temporally constant. Analogously, we also calculated CV IS statistics by sites to evaluate the final estimator's capacity to capture temporal variations of PM 2.5 . CV IS results by sites displayed significantly spatial patterns, which indicates that PM 2.5 Optimal is more accurate in eastern

Cross-Validation Results for the Estimates of the Three-Stage Model
China, but less in western China ( Figure S7). Partial reason is that the accuracy of PM 2.5 Optimal tends to increase with the density of training sites ( Figure 5), which are more clustered in eastern China, especially the urban areas (e.g., Yangtze River Delta or Pearl River Delta metropolitan region).

Exposure Assessments Based on the Fused Estimates
The fused estimator of PM2.5 (PM2.5 Optimal ) will support exposure assessments in future health-related studies. AOD or CMAQ based estimator of PM2.5 has been utilized to study long-term rather than short-term exposure to ambient pollution [18,58], because of data availability or data accuracy on daily scale. For example, we visualized spatiotemporal distributions of CMAQ simulations and AOD-derived PM2.5 with the corresponding monitoring data during an episode of haze around Beijing-Tianjin-Hebei region in Figure 8. According to the maps, AOD-based method overlooked some hotspots due to incompleteness and could not capture the whole polluting procedure; whereas CMAQ simulations underestimated the severity of haze due to systematic errors. Unlike them, the fused estimates accurately characterized the growth, expansion and elimination of the haze. Therefore, PM2.5 Optimal can serves as exposure estimates to study either acute or chronic effects of PM2.5. For example, combining PM2.5 Optimal with county-level data of China's sixth census, we assessed both annual and daily exposures to PM2.5 across China in 2014. Accordingly, population-weighted concentration of annual exposure to ambient PM2.5 was 55.7 μg/m 3 and 82% of total population inhabited in the places exceeding WHO Air Quality Interim Target-1 (35 μg/m 3 ), whereas population-weighted count of polluted and heavily-polluted days (defined as daily mean of PM2.5 > 75 μg/m 3 or 150 μg/m 3 by CNAAQS) was 81 and 14 days, respectively.

Exposure Assessments Based on the Fused Estimates
The fused estimator of PM 2.5 (PM 2.5 Optimal ) will support exposure assessments in future health-related studies. AOD or CMAQ based estimator of PM 2.5 has been utilized to study long-term rather than short-term exposure to ambient pollution [18,58], because of data availability or data accuracy on daily scale. For example, we visualized spatiotemporal distributions of CMAQ simulations and AOD-derived PM 2.5 with the corresponding monitoring data during an episode of haze around Beijing-Tianjin-Hebei region in Figure 8. According to the maps, AOD-based method overlooked some hotspots due to incompleteness and could not capture the whole polluting procedure; whereas CMAQ simulations underestimated the severity of haze due to systematic errors. Unlike them, the fused estimates accurately characterized the growth, expansion and elimination of the haze. Therefore, PM 2.5 Optimal can serves as exposure estimates to study either acute or chronic

Discussion
In this paper, we developed a three-stage model to estimate spatiotemporal variations of PM 2.5 through fusing CMAQ simulations, satellite remote sensing measurements and ground monitoring data together. We illustrated the method by a practice to generate daily PM 2.5 maps with a spatial resolution of 0.1 • × 0.1 • across China during 2014. The CV results evidenced that the fused estimator (PM 2.5 Optimal ) was in good agreement with the observational PM 2.5 , and outperformed the estimators based on either AOD or CMAQ data alone. AOD-based methods have been widely utilized to estimate PM 2.5 concentrations on regional [52], national [41] or global scale [29]. Among them, LME or its extension has been more widely used because of computing efficiency. For example, in China, Ma et al. [41] developed high-quality estimates of PM 2.5 covering a long-period from 2004 to 2013, through joining a LME and a generalized additive model (GAM) to retrieve PM 2.5 from MODIS AOD at 0.1 • resolution; and tested their estimates by a CV 10 of monitoring data in 2013 (RMSE = 27.99 µg/m 3 , R 2 = 0.78 and RPE = 36.3% for the first-stage estimator from LME; RMSE = 27.42 µg/m 3 , R 2 = 0.79 and RPE = 35.6% for the final estimator from LME + GAM). In this paper, we fitted a similar LME model through reducing the spatial-varying coefficient ( f (s)) in Equation (1) to a one-dimensional random slope; and evaluated it by CV 10 (RMSE = 23.3 µg/m 3 , R 2 = 0.69, RPE = 36.7%), which suggests that the LME method performs equally well on both our datasets and Ma and coworkers' [41]. However, LME has one disadvantage in modeling AOD in large scale. To incorporate geographical variations of the fitted parameters, LME were usually fitted separately by sub-regions (e.g., provinces), which resulted in spatially non-smoothing predictions near the boundaries of two sub-regions. Ma et al. [41] addressed this issue by creating buffer zones around each province and averaging the overlapped predictions from neighboring provinces. The buffer-zone-averaging method introduced a side effect that the uncertainty (standard errors) of predictions averaged from two different LME models could not be quantified directly. Our LME SC approach incorporated spatial variations of the modeling parameters by a nonlinear regression coefficient ( f (s)), rather than fitting separate models, so that it produced more spatially smoothed estimates than LME. Our CV IS analysis also confirmed that LME SC slightly outperformed LME in developing AOD-derived PM 2.5 (RMSE = 26.2 µg/m 3 and R 2 = 0.62 for LME SC ; RMSE = 26.8 µg/m 3 and R 2 = 0.61 for LME).
Another weakness of AOD-based estimators was caused by non-random incompleteness in satellite measurements. In other words, AOD-derived values are more likely to be absent, when estimating PM 2.5 concentrations within a specific range. At the testing sites of CV IS , AOD-derived PM 2.5 approximately covered 32%, 43%, 44% or 36% of unpolluted (PM 2.5 < 75 µg/m 3 ), lightly-polluted (75 µg/m 3 ≤ PM 2.5 < 115 µg/m 3 ), moderately-polluted (115 µg/m 3 ≤ PM 2.5 < 150 µg/m 3 ) or heavily-polluted (PM 2.5 ≥ 150 µg/m 3 ) days, respectively (as shown in Figure S5). In China, rainfalls AOD data tend to be missing during rainfalls, when particle concentrations are usually lower due to wet deposition. Such effect partially explains the lower sampling rate of AOD-derived PM 2.5 at unpolluted days ( Figure S5). Conversely, hazed episodes, especially in northern China, may be falsely classified as clouds by satellite and be neglected in current AOD algorithm, so that the sampling rate of AOD was also lower at heavily or severely polluted days. Long-termed averages of AOD-derived PM 2.5 can be biased from the truth due to the unevenly missing rates at different concentrations, which is known as sampling bias. Because the extreme values are less captured in their estimates, AOD-based methods may over-smooth the variability of PM 2.5 . Previous studies showed sampling bias of AOD may lead to ± 20% error in chronic exposure assessment of PM 2.5 [58]. Combining AOD-derived PM 2.5 with a spatiotemporally complete estimator, such as CMAQ simulations, can reduce the bias. Our step-specific CV IS results showed that comparing model performance before and after fusing with PM 2. Air quality modeling results have been utilized in risk assessment of ambient pollutants [32] but rarely in epidemiological studies because of their low accuracy and potential bias. Data assimilation methods have been applied to improve predictability of air quality models. In China, Tang et al. [59] first developed an EnKF to combine numerical outputs from the Nested Air Quality Prediction Modeling System (NAQPMS) [60] and in situ observations of Ozone; then Liu et al. [18] applied a similar method to estimate daily PM 2.5 across China during 2013 and reported a RMSE of 30.2 µg/m 3 by a five-fold CV, which is as accurate as our intermediate estimator, PM 2.5 CMAQ (CV IS RSME = 30.1 µg/m 3 ). In this study, we improved raw CMAQ estimates (CV IS RSME = 33.4 µg/m 3 , shown in Figure S3) is three aspects: (1) downscaling spatial resolution of CMAQ simulations to 0.1 • (CV IS RSME = 33.0 µg/m 3 , shown in Figure S2); (2) calibrating them with in situ observations (CV IS CV IS RSME = 30.1 µg/m 3 for PM 2.5 CMAQ , shown in Figure 4c); and (3)  Additionally, we also found that CV IS errors of PM 2.5 Optimal tended to be lower at the testing sites, which were surrounded by more training sites ( Figure 5). Similar findings have been reported in previous studies, which introduced spatial or spatiotemporal autocorrelations into PM 2.5 modeling [61]. Spatiotemporally autocorrelation benefits prediction of PM 2.5 , especially at the unmeasured locations but makes troubles for model evaluation. In CVs of auto-correlated variables, randomly selected testing data (e.g., CV 10 ) may not be independent of the training data, so that the predicting accuracy can be overestimated [57]. Through choosing the isolated monitoring sites in CV IS approach, we attempted to use the testing records, which were less correlated with training data. We compared performance of CV IS to that of CV 10 in evaluating the AOD-derived PM 2.5 from the LME model, as shown in Figure S1. In comparison, we used a subset of CV 10 to make sure that the two CVs were conducted on the same testing records. We found that CV 10 error of the LME was consistent with the previous studies [41], but considerably lower than CV IS error (CV 10 RMSE = 23.3 µg/m 3 vs. CV IS RMSE = 26.8 µg/m 3 ). The results suggested that CV 10 might overestimate the predicting accuracy. Lv et al. [57] addressed this issue through leaving out records from all monitors within a city simultaneously in CV, which is analogous to our approach, considering that monitors are usually clustered within cities but separated between different cities. Even though the models were evaluated by CV IS in this paper, the influence of spatiotemporal autocorrelations on CVs cannot be avoided completely. In other words, the true predicting error of the three-stage model may still be underestimated in this paper.
The uncertainty of our study sources from three aspects. First, during our study period, the routine monitoring networks for ambient particles were too sparsely distributed to characterize some polluted sub-urban areas, such as undeveloped cities in the provinces of Henan and Shannxi. Second, satellite-derived AOD measurements played a key role to control bias in our approach but were only available at approximately one third of the predicting points. Increasing the spatiotemporal coverage of AOD (e.g., combining AOD from multiple satellites) will be considered in our future studies to reduce modeling uncertainty. Finally, CMAQ-WRF simulating procedures and inputted emission inventories may also contribute to the uncertainty of the three-stage model.

Conclusions
We developed a three-stage statistical model to estimate PM 2.5 concentrations through fusing in situ observations, satellite-derived AOD measurements and CMAQ simulations. We applied the method to produce daily maps of PM 2.5 over China at a spatial resolution of 0.1 • . The final estimator of the three-stage model is shown to highly correlated with daily monitoring data (CV IS R 2 = 0.72) and to outperform CMAQ-simulated PM 2.5 (CV IS R 2 = 0.51) or AOD-derived PM 2.5 (CV IS R 2 = 0.62). Our estimates will support future health-related studies on either acute or chronic exposure to ambient PM 2.5 .