Application of Dynamically Constrained Interpolation Methodology in Simulating National-Scale Spatial Distribution of PM2.5 Concentrations in China

Numerous studies have revealed that the sparse spatiotemporal distribution of ground-level PM2.5 measurements affects the accuracy of PM2.5 simulation, especially in large geographical regions. However, the high precision and stability of ground-level PM2.5 measurements make their role irreplaceable in PM2.5 simulations. This article applies a dynamically constrained interpolation methodology (DCIM) to evaluate sparse PM2.5 measurements captured at scattered monitoring sites for national-scale PM2.5 simulations and spatial distributions. The DCIM takes a PM2.5 transport model as a dynamic constraint and provides the characteristics of the spatiotemporal variations of key model parameters using the adjoint method to improve the accuracy of PM2.5 simulations. From the perspective of interpolation accuracy and effect, kriging interpolation and orthogonal polynomial fitting using Chebyshev basis functions (COPF), which have been shown to have high PM2.5 simulation accuracy, were adopted for a comparative assessment of DCIM performance and accuracy. Results of the cross-validation confirm the feasibility of the DCIM. A comparison between the final interpolated values and observations shows that the DCIM is better for national-scale simulations than kriging or COPF. Furthermore, the DCIM presents smoother spatially interpolated distributions of the PM2.5 simulations with smaller simulation errors than the other two methods. Admittedly, the sparse PM2.5 measurements in a highly polluted region have a certain degree of influence on the accuracy and rationality of the interpolated distribution. To some extent, adding the right amount of observations can improve the effectiveness of the DCIM around existing monitoring sites. Compared with kriging interpolation and COPF, the results show that the DCIM used in this study would be more helpful for providing reasonable information for monitoring PM2.5 pollution in China.


Introduction
Air pollution, especially PM2.5 (particles with aerodynamic diameters less than 2.5 µm), has escalated to a serious level in China due to the combination of rapid industrialization and high population density [1]. High concentrations of PM2.5 have been identified as having the greatest impact on air quality and visibility [2]. Owing to recent efforts, air pollution levels in China have been greatly reduced by effectively controlling energy consumption, [...] to be an effective tool for improving the utilization of spatiotemporally sparse observations [21]. As opposed to dynamically unconstrained interpolation methods, the DCIM takes numerical models as dynamic constraints, which allows the dynamic information in a given dataset to be fully exploited and its estimates to be statistically consistent [22]. Thus, the DCIM is no longer a purely mechanical interpolation method that relies only on dense monitoring networks without considering surrounding environmental characteristics. Meanwhile, prior studies have suggested that the DCIM is more efficient at producing simulations with a satisfactory level of accuracy at unsampled location-time pairs for spatially and temporally observed data [21,23,24]. Therefore, it is more reasonable to employ the DCIM to reconstruct the national-scale distribution of PM2.5 observations. In this paper, the DCIM takes a PM2.5 transport model as the dynamic constraint and provides the characteristics of the spatiotemporally varying key model parameters using the adjoint method to improve the accuracy of PM2.5 simulations. The PM2.5 transport adjoint model has been shown to be able to combine observations and models to form an optimal estimate of the PM2.5 sources. Meanwhile, observations can be used to constrain estimates of model parameters that are both influential and uncertain in the DCIM [25-29].
The application of the DCIM is discussed in this paper, and its results are compared with those of kriging interpolation and the COPF method.
In view of the above, the objective of this article is to use the DCIM, together with ground-level observations, to simulate PM2.5 concentrations at national scale. The remainder of this paper is organized as follows. Section 2 describes the materials and methods in detail. Section 3 presents the implementation and verification of the DCIM. Section 4 summarizes the discussion and conclusions.

Study Region and Data
A simulation of ground-level PM2.5 concentrations was carried out over a grid covering China, with a spatial extent between 15° and 55° north latitude and 70° and 140° east longitude (Figure 1). The grid resolution was set to 0.5° × 0.5° for convenience of analysis and discussion of the results. In order to provide additional insight into the spatial aspects of air pollution across the country, 80 major cities were selected according to economic development, climate, industrial activity and topography. The majority of the sample cities were distributed in the three most developed areas: the Beijing-Tianjin-Hebei Region, the Yangtze River Delta, and the Pearl River Delta. The locations of these monitoring sites are plotted in Figure 1.

Full names of these cities can be found in [25]. The PM2.5 concentration observations in this paper were collected from the China National Environmental Monitoring Center, from January 2014 to November 2014. The experiments were carried out over two time periods chosen according to climate and topography. The first period was from 5 to 11 November 2014. This noteworthy period coincided with the 2014 APEC summit in Beijing. A total of 6385 effective PM2.5 observations were obtained during this winter period. The second period was from 14 to 20 May 2014. A total of 6298 effective PM2.5 observations were obtained during this spring period.

Dynamically Constrained Interpolation Methodology
Previous studies have shown that the DCIM combines dynamical constraints from the numerical model of dynamical processes with statistical information from observations, which can produce dynamically and statistically consistent simulation of the entire study area [21,22]. The observations are interpolated by the dynamic numerical model in the DCIM. Moreover, the interpolated results are optimized iteratively by adjusting the key model parameters through the adjoint method [21,24].

The Dynamic Model
Generally speaking, PM2.5 varies due to interactions among many processes, including emissions (anthropogenic emission and natural dust production), transport (as well as convection-influenced dispersion and dilution), photochemical transformation (new particle speciation and production of secondary PM2.5), and deposition (dry and wet), with meteorology playing an overarching role [24]. Because of the complexity of the sources and sinks, the general PM2.5 adjoint model includes a rich description of the photochemical oxidant cycle and a chemical mechanism to simulate the formation of secondary PM2.5. The adjoint method is useful for assimilating the observations to obtain reasonable model parameters and is used widely in meteorology [24-29].
However, such an adjoint model is very difficult to build because of the complex chemical mechanisms. To reduce the complexity of the model while still achieving the purpose of the simulation, we treated the primary and secondary PM2.5 as a whole, calling it the "source and sink term" (SAST), without considering the specifics. Because of the type of observations and the uncertainty of vertical variation, we established a two-dimensional (2D) PM2.5 chemical transport model (CTM) in rectangular coordinates. In this model, C represents the PM2.5 concentration; u and v are the horizontal wind velocities in the x and y directions, respectively; A_H is the horizontal viscosity coefficient; and S is the SAST. The model has initial conditions C_0 and is subject to constant boundary conditions at the inflow boundary Γ_IN and to no-gradient boundary conditions at the outflow boundary Γ_OUT. The detailed numerical scheme for this 2D PM2.5 CTM is the same as that in [18].
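The governing equation itself is not reproduced in this text. A form consistent with the symbols and boundary conditions described above is the 2D advection-diffusion equation with a source term, written below. This is an assumed reconstruction, not a quotation; the authoritative formulation and its numerical scheme are those of [18].

```latex
\frac{\partial C}{\partial t}
+ u \frac{\partial C}{\partial x}
+ v \frac{\partial C}{\partial y}
= \frac{\partial}{\partial x}\left( A_H \frac{\partial C}{\partial x} \right)
+ \frac{\partial}{\partial y}\left( A_H \frac{\partial C}{\partial y} \right)
+ S,
\qquad
C\big|_{t=0} = C_0, \quad
C\big|_{\Gamma_{IN}} = \text{const}, \quad
\frac{\partial C}{\partial n}\Big|_{\Gamma_{OUT}} = 0 .
```

Here n denotes the outward normal at the outflow boundary, a symbol introduced only for this sketch.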

Parameter Optimization by the Adjoint Method
In order to construct the adjoint model, a cost function and, based on the theory of the Lagrangian multiplier method, a Lagrangian function (Equation (6)) are defined in terms of the following quantities. Σ is the set of the observations; C and C_obs are the simulated and observed PM2.5 concentrations, respectively; p denotes the adjoint variable of C; and K is the weighting matrix, which theoretically should be the inverse of the observation error covariance matrix. Assuming the errors of the data are uncorrelated and equally weighted, K can be simplified, and in this study K is 1 if observations are available and 0 otherwise.
To minimize the cost function, the first-order derivatives of the Lagrangian function L with respect to all the variables and parameters are set to zero: ∂L/∂p = 0 (7), ∂L/∂C = 0 (8), and ∂L/∂k = 0 (9), where k denotes the key model parameters, which include the initial condition, A_H and S. From Equation (8), the adjoint model for the 2D PM2.5 CTM, which governs the evolution of the adjoint variable p, can be obtained. From Equation (9), the gradients of the cost function with respect to the key model parameters A_H and S can be deduced. A detailed description of the adjoint model and the gradients can be found in [18]. Based on the gradients obtained from Equation (9), the key model parameters can be optimized using the steepest descent method.
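The cost function and the Lagrangian are not reproduced in this text. Under the definitions above, and assuming the dynamic model is a 2D advection-diffusion equation with source term S, they would typically take the following form. This is a hedged sketch: the factor 1/2, the integration domain Ω (the full space-time domain, a symbol introduced here) and the sign convention are assumptions; the authors' exact expressions are in [18].

```latex
J = \frac{1}{2} \int_{\Sigma} K \left( C - C_{obs} \right)^2 \, d\sigma ,
\qquad
L = J + \int_{\Omega} p \left[
\frac{\partial C}{\partial t}
+ u \frac{\partial C}{\partial x}
+ v \frac{\partial C}{\partial y}
- \frac{\partial}{\partial x}\left( A_H \frac{\partial C}{\partial x} \right)
- \frac{\partial}{\partial y}\left( A_H \frac{\partial C}{\partial y} \right)
- S \right] d\Omega .
```

With this form, setting the derivative of L with respect to p to zero recovers the forward model, while the derivative with respect to C yields an equation for p that is integrated backward in time.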

Default Settings of the Dynamical Model
The grid resolution was set to 0.5° × 0.5°. There were 81 grid cells in the x direction and 141 cells in the y direction. The integration time step was 600 s. The first study period coincided exactly with the 2014 Asia-Pacific Economic Cooperation (APEC) summit in Beijing, and this period was divided into two parts: the initial simulation period and the target simulation period. The initial simulation period lasted 72 h, from 5 to 7 November 2014. The target simulation period lasted 96 h, from 8 to 11 November 2014. The initial meteorological fields adopted in our experiments were from the NCEP Climate Forecast System (CFS-R/CFSv2) with 0.5° × 0.5° spatial resolution and 1 h temporal resolution. The dynamic model was subject to constant boundary conditions at the inflow boundary and to no-gradient boundary conditions at the outflow boundary. According to the new ambient air quality standards announced by China's Ministry of Environmental Protection (MEP) in 2013, inflow boundary values were fixed at 35.0 µg/m³. The horizontal diffusion coefficient A_H was fixed at 100.0 m²/s. In view of the importance of initial conditions to the simulated results of a dynamic model [21,24-26], the initial conditions were obtained by interpolating the observed PM2.5 using surface spline interpolation [21].

The Process of DCIM
Previous studies have indicated that the DCIM is a feasible and effective method for interpolating sparse observations in both space and time, which can improve the utilization of these observations [21,24-29]. Based on the procedure for interpolating sparse observations using the DCIM in [21,24], the procedure for simulating the PM2.5 concentrations at national scale in this paper was as follows:
Step 1. Assign the initial guess values to all the model parameters in the dynamic model (i.e., the 2D PM2.5 CTM in this study). Divide the observation period into two parts: the initial simulation period and the target simulation period.
Step 2. Run the dynamic model in the initial simulation period and acquire the interpolated results.
Step 3. Calculate the cost function and then integrate the adjoint model backward in time to get the adjoint variables.
Step 4. Compute the gradients of the cost function with respect to the key model parameters (spatially varying initial conditions and S in this study).
Step 5. Adjust the key model parameters utilizing the steepest descent method, and update the values of the key model parameters in the dynamic model.
Step 6. Check whether the cost function has satisfied the requirements of minimization. If the minimization requirement is satisfied, then terminate the iteration and go to Step 7; otherwise, return to Step 2.
Step 7. Set the interpolated PM2.5 concentrations at the last time step obtained using the DCIM as the initial condition. Run the dynamic model with the optimized key model parameters to simulate the PM2.5 concentrations over the following two hours.
Step 8. Move the initial time of the initial simulation period forward by two hours. Take the interpolated results at this new initial time as the initial condition and assign the initial guess values to all the model parameters. Repeat Step 2 through Step 7 to simulate the PM2.5 concentrations over the next two hours. Repeat Step 2 through Step 8 until all the PM2.5 concentrations in the target simulation period are simulated.
In this study, the sparsely observed PM2.5 concentrations were interpolated in space and time to perform the simulations following the aforementioned procedure. We focused on the simulation results in the target period. When the DCIM was implemented, the key model parameters, including the spatially varying initial conditions and SAST, were adjusted synchronously while the other model parameters remained constant.
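Steps 1 through 6 can be illustrated with a deliberately small toy problem. The sketch below is not the authors' 2D CTM: it assumes a 1D periodic domain, a first-order upwind scheme, nondimensional units, synthetic observations, and a simple step-halving rule for the steepest descent step size. All function names (`build_operator`, `forward`, `cost_and_gradients`, `optimize`) are hypothetical and exist only for this illustration.

```python
import numpy as np


def build_operator(n, u, a_h, dx, dt):
    """Matrix M of one explicit step C[k+1] = M @ C[k] + dt * S
    (upwind advection with u >= 0, central diffusion, periodic domain)."""
    adv = u * dt / dx
    dif = a_h * dt / dx**2
    M = np.zeros((n, n))
    for i in range(n):
        M[i, i] = 1.0 - adv - 2.0 * dif
        M[i, (i - 1) % n] = adv + dif
        M[i, (i + 1) % n] = dif
    return M


def forward(M, c0, s, dt, n_steps):
    """Step 2: run the dynamic model; returns an array of shape (n_steps + 1, n)."""
    traj = [np.asarray(c0, dtype=float)]
    for _ in range(n_steps):
        traj.append(M @ traj[-1] + dt * s)
    return np.array(traj)


def cost_and_gradients(M, c0, s, dt, obs, mask):
    """Steps 3-4: cost function, backward adjoint integration, gradients."""
    n_steps = obs.shape[0] - 1
    resid = mask * (forward(M, c0, s, dt, n_steps) - obs)  # K = 1 where observed
    cost = 0.5 * np.sum(resid**2)
    p = resid[-1].copy()                  # adjoint variable at the final time
    grad_s = np.zeros_like(s)
    for k in range(n_steps - 1, -1, -1):
        grad_s += dt * p                  # contribution of p[k+1] to dJ/dS
        p = M.T @ p + resid[k]            # one adjoint step, backward in time
    return cost, p, grad_s                # p now holds dJ/dC0


def optimize(M, obs, mask, dt, n_iter=100):
    """Steps 1, 5 and 6: steepest descent with step halving."""
    n = obs.shape[1]
    c0, s = np.zeros(n), np.zeros(n)      # Step 1: initial guesses
    cost, g_c0, g_s = cost_and_gradients(M, c0, s, dt, obs, mask)
    history, step = [cost], 1.0
    for _ in range(n_iter):
        accepted = False
        while step > 1e-12:
            trial = cost_and_gradients(M, c0 - step * g_c0, s - step * g_s,
                                       dt, obs, mask)
            if trial[0] < cost:
                accepted = True
                break
            step *= 0.5
        if not accepted:
            break                         # no improving step: stop iterating
        c0, s = c0 - step * g_c0, s - step * g_s
        cost, g_c0, g_s = trial
        history.append(cost)
    return c0, s, history


# Synthetic "truth"; every 4th grid point acts as a monitoring site.
n, n_steps, dt = 40, 30, 0.1
x = np.arange(n)
M = build_operator(n, u=1.0, a_h=0.5, dx=1.0, dt=dt)
true_c0 = 50.0 + 30.0 * np.exp(-0.5 * ((x - 12) / 3.0) ** 2)
true_s = 5.0 * np.exp(-0.5 * ((x - 25) / 2.0) ** 2)
obs = forward(M, true_c0, true_s, dt, n_steps)
mask = np.zeros(n)
mask[::4] = 1.0
c0_opt, s_opt, history = optimize(M, obs, mask, dt)
print(f"cost: {history[0]:.1f} -> {history[-1]:.4f}")
```

The rolling two-hour forecast of Steps 7 and 8 would wrap `optimize` in an outer loop that shifts the observation window; that part is omitted here to keep the sketch short.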

The OPF Method Based on Chebyshev Basis Functions
The OPF method based on Chebyshev basis functions was an effective method for reconstructing the PM2.5 fields accurately in the central and southern regions of China [20]. In our study, the OPF method based on Chebyshev basis functions (COPF) followed the definitions given in [20]. Let x_i be the points on the x axis (i = 1, 2, ..., I). The Chebyshev polynomials in the x direction are defined for orders from zero to k, where k is the order of the polynomial, and P is the coefficient matrix of the Chebyshev polynomials, with P_{k,l} the l-th coefficient of the k-order polynomial. The Chebyshev polynomials in the y direction follow the same definition.
Here x is a rectangular coordinate converted from longitude based on the stereographic map projection; y is a rectangular coordinate converted from latitude based on the same projection; z is the PM2.5 observation; and I is the total number of data points. The PM2.5 observations Z(x_i, y_i) can be expressed as a truncated double expansion, Z(x_i, y_i) ≈ Σ_{k=0}^{K_0} Σ_{s=0}^{S_0} A_{k,s} Φ_k(x_i) ς_s(y_i), where i = 1, 2, ..., I, and k and s are the orders of the polynomials for the x and y directions, respectively. K_0 and S_0 are the corresponding cutoff orders. A_{k,s} are the expansion coefficients. Φ_k(x_i) is the k-order Chebyshev orthogonal polynomial in the x direction, and ς_s(y_i) is the s-order Chebyshev orthogonal polynomial in the y direction. The expansion coefficients are solved by the least-squares method.
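The least-squares fit described above can be sketched with NumPy. This is a stand-in rather than the procedure of [20]: it assumes classical Chebyshev polynomials of the first kind (as provided by `numpy.polynomial.chebyshev`), skips the stereographic projection by rescaling longitude and latitude linearly to [-1, 1], and uses synthetic values in place of PM2.5 observations. The helper names `to_unit_interval` and `fit_copf` and the cutoff orders are illustrative choices.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def to_unit_interval(v, lo, hi):
    """Map coordinates from [lo, hi] to [-1, 1], the Chebyshev domain."""
    return 2.0 * (np.asarray(v, dtype=float) - lo) / (hi - lo) - 1.0


def fit_copf(x, y, z, k0, s0):
    """Least-squares estimate of A[k, s] in
    z ~ sum_k sum_s A[k, s] * T_k(x) * T_s(y), with k <= k0 and s <= s0."""
    basis = cheb.chebvander2d(x, y, [k0, s0])   # one column per (k, s) pair
    coef, *_ = np.linalg.lstsq(basis, z, rcond=None)
    return coef.reshape(k0 + 1, s0 + 1)


rng = np.random.default_rng(0)
lon = rng.uniform(70.0, 140.0, size=80)          # 80 hypothetical site longitudes
lat = rng.uniform(15.0, 55.0, size=80)           # 80 hypothetical site latitudes
xs = to_unit_interval(lon, 70.0, 140.0)
ys = to_unit_interval(lat, 15.0, 55.0)
z = 60.0 + 20.0 * xs - 15.0 * ys**2              # synthetic "observations"

A = fit_copf(xs, ys, z, k0=2, s0=4)              # illustrative cutoff orders
z_fit = cheb.chebval2d(xs, ys, A)                # evaluate the fitted surface
print("max abs residual:", np.max(np.abs(z_fit - z)))
```

Higher cutoff orders add columns to the design matrix, so with only 80 sites the fit becomes poorly constrained quickly, which is one reason for capping the polynomial orders when observations are sparse.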

Verification and Evaluation of DCIM
In this part, the 8-fold cross-validation technique was applied to evaluate the simulation power of the DCIM [16,24]. The 80 sampled cities were randomly divided into 8 folds. Seven folds were selected as the training set and labeled as interpolated samples, the datasets of which were interpolated to obtain the corresponding simulations. The other fold was employed as the validation set, the datasets of which were not used for interpolation but only for verification and labeled as checked samples. This process was repeated 8 times until every fold was tested.
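The random split into interpolated and checked samples can be sketched as follows. The seed, the helper name `make_folds`, and the use of integer site indices are assumptions made for illustration; the interpolation and verification steps themselves are left as a comment.

```python
import numpy as np


def make_folds(n_sites, n_folds, seed=0):
    """Randomly partition site indices 0..n_sites-1 into n_folds groups."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_sites), n_folds)


folds = make_folds(n_sites=80, n_folds=8)
for i, checked in enumerate(folds, start=1):
    # The seven remaining folds form the interpolated samples.
    interpolated = np.concatenate(
        [f for j, f in enumerate(folds, start=1) if j != i]
    )
    # Here one would interpolate with `interpolated` and verify on `checked`.
    print(f"PE_{i}: {interpolated.size} interpolated, {checked.size} checked")
```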
Evaluation indices, including the mean absolute gross estimation error (MAGE), the mean normalized gross estimation error (MNGE), the correlation coefficient (R), and the index of agreement (IA), were adopted to evaluate the performance of the DCIM. The correlation coefficient indicates the correlation between simulations and observations. IA is a dimensionless indicator of accuracy between zero and one, which describes how well simulations and observations agree. The closer the value is to one, the more accurate the simulation. These indices were calculated from the simulations and observations, where N was the number of observations; P and O were the simulated PM2.5 concentrations and the observations; and P̄ and Ō were the mean simulations and mean observations, respectively.
Atmosphere 2021, 12, 272
For the DCIM method, we ran the evaluation procedure 8 times to check for potential bias from using the result of a single run. Therefore, a total of eight groups of experiments, named PE_1-PE_8, were implemented. The COPF method and kriging interpolation were used for comparison. Following [20], a spherical semivariogram model was chosen for the kriging interpolation in our study. The range and sill were fixed to 10° and 1°, respectively. Due to the limited number of available observations, the polynomial orders in the x and y directions were limited to at most 8 in the COPF method. PM2.5 distributions were calculated 81 times (polynomial orders in the x and y directions each increased from 0 to 8) in the cross-validation. The optimal orders were found when the order combinations reached the minimal average MAGEs and IA of the verification cities.
The 8-fold cross-validation experiments were implemented in the first target discussion period (from 8 to 11 November 2014). For the COPF, all the combinations of k and s (polynomial orders in the x and y directions) were tested first. The MAGEs and IA arrived at the minimum when s and k were set to 2 and 4, respectively. In this way, the optimal degree of COPF was found, and we denoted it by COPF24. The DCIM simulations were compared with the simulations obtained from kriging and COPF24.
The results of the 8-fold DCIM cross-validation experiments on 8-11 November 2014 are displayed in Tables 1 and 2 (Estimation errors for the 8-fold cross-validation DCIM experiments). For the interpolated samples, MAGEs between the observations and the corresponding 8-fold cross-validation DCIM simulations dropped by at least 60.89%, which was a reduction from 45.33 µg/m³ to 16.01 µg/m³. The corresponding MNGEs decreased by 60.89%, which occurred in PE_8 (see Table 1). Furthermore, values of R (from 0.93 to 0.96) and IA (from 0.96 to 0.98) showed significant accuracy and correlation between interpolated observations and the corresponding 8-fold cross-validation DCIM simulations. For the checked samples, the largest drops in the MAGEs and MNGEs between the observations and the final 8-fold cross-validation DCIM simulations were 49.19% and 17.69%, respectively. Compared with the results of the interpolated samples, the discrepancies between the simulations and the observations were slightly larger, but all much smaller than those at the initial iteration step. Meanwhile, the values of R (from 0.69 to 0.78) and IA (from 0.79 to 0.88) indicated a reasonable range for the national-scale DCIM simulations. The comparison results of the different methods are illustrated in Figure 2 (Interpolation errors in checked samples of different methods for the 8-fold cross-validation experiments) and Figure 3 (Scatter plots and performance of different methods for the 8-fold cross-validation experiments). As shown in Figure 2, the MAGEs and IA of the DCIM were lower than those of kriging interpolation and COPF24 in most of the cross-validation processes except for PE_7 and PE_8. The interpolation errors of kriging interpolation were larger than those of the COPF24 in most of the cross-validation processes except for PE_8. As illustrated in Figure 3, the averaged MAGEs of the DCIM, kriging and COPF24 were 21
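Because the defining equations of the four indices are not reproduced in this text, the sketch below uses their commonly used forms: MAGE as the mean absolute difference, MNGE as the mean absolute difference normalized by each observation (in percent), R as the Pearson correlation coefficient, and IA as Willmott's index of agreement. These are assumed to match the paper's definitions; the function name `evaluation_indices` and the sample values are hypothetical.

```python
import numpy as np


def evaluation_indices(sim, obs):
    """Return MAGE, MNGE (%), R and IA for paired simulations and observations."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    abs_err = np.abs(sim - obs)
    o_mean = obs.mean()
    # Willmott's index of agreement: 1 means perfect agreement.
    ia = 1.0 - np.sum((sim - obs) ** 2) / np.sum(
        (np.abs(sim - o_mean) + np.abs(obs - o_mean)) ** 2
    )
    return {
        "MAGE": abs_err.mean(),                  # same unit as the data
        "MNGE": 100.0 * (abs_err / obs).mean(),  # percent; requires obs > 0
        "R": np.corrcoef(sim, obs)[0, 1],
        "IA": ia,
    }


# Made-up values in µg/m³, for illustration only.
indices = evaluation_indices(sim=[12, 18, 33, 36], obs=[10, 20, 30, 40])
print(indices)
```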

Performance of the DCIM
During the data analysis, it was found that the checked samples showed a larger simulation error than the interpolated samples. That is to say, in the DCIM, interpolating all the observations may improve the simulation accuracy. In order to further validate the simulation results, all the observations were interpolated in the DCIM to simulate concentrations of PM2.5 with the strategy stated in Section 2.2.3. An error analysis over the target simulation period was adopted to further investigate DCIM performance.
For the first target period (8 to 11 November), the MAGE between the observations and the DCIM simulations decreased from 41.48 µg/m³ to 12.54 µg/m³, a decline of 69.77%. The corresponding MNGE decreased from 62.35% to 25.31%, a decline of 59.41%. Furthermore, the values of R (0.91) and IA (0.95) indicate that the DCIM is a feasible and accurate simulation method. For the DCIM performance evaluation, regression statistics along with the normalized mean bias (NMB) and another measure of error, the normalized mean error (NME) [26], were further calculated. The mean values of NMB and NME were 0.26% and 21.04%, respectively. Furthermore, the mean value of all the observations was 68.73 µg/m³, and the mean value of the simulated results was 68.34 µg/m³. On the other hand, it should be noted that the national-scale simulations of PM2.5 in China were almost always underestimated in the high PM2.5 concentration range (Figure 6a). This issue will be further discussed in Section 4. However, the correlation coefficient (0.94) and IA (0.96) between the observations and simulated results indicated that the underestimation by the DCIM stayed within a reasonable range. Meanwhile, Figure 6b showed the usefulness of the DCIM in time series analysis and simulation. The values of the mean, NMB, NME and correlation coefficient were calculated (spatial averages) and plotted as a two-hour time series (Figure 6b).
As can be seen, the mean values of the observations and simulated results were almost equal and had the same time-varying trend. The NMB values ranged from −17.99% to 20.21%, the NME values were approximately 19.50%, and the correlation coefficients ranged from 0.89 to 0.97, which showed that the DCIM was stable with time. Figure 6c,d showed the spatial distributions of the NMB and NME between the observations and the simulation results. The DCIM simulated PM2.5 well in most areas (85.64%), where the NMB value was less than 0.20 and the NME value was also less than 0.20.
To further verify the simulation ability, the DCIM was implemented in the spring, from 14 to 20 May 2014. The initial simulation period was from 14 to 16 May, and the target simulation period was from 17 to 20 May. All the observations over the initial simulation period were interpolated to simulate PM2.5 concentrations from 17 to 20 May, according to the strategy stated in Section 2.2.3. An error analysis over the target simulation period was adopted to further investigate DCIM performance. For the performance evaluation, regression statistics along with two measures of bias (the mean bias (MB) and the normalized mean bias (NMB)) and two measures of error (the root mean square error (RMSE) and the normalized mean error (NME)) were calculated. The mean values of MB and RMSE were −0.07 µg/m³ and 18.73 µg/m³, respectively, and those of NMB and NME were −0.12% and 20.94%, respectively. Furthermore, the correlation coefficient between the observations and simulated results was 0.89. Figure 7a shows a scatterplot comparing the observations and the DCIM simulations visually. The 1.25:1, 1:1 and 0.75:1 lines are shown for reference. For 75.63% of the observations, the ratio was between 0.75 and 1.25. In order to investigate the performance of the DCIM over time, the values of the mean, NMB, NME and correlation coefficient were calculated (spatial averages) and plotted as a time series (Figure 7b). As can be seen, the mean values of the observations and the simulated results were almost equal and had the same time-varying trend. The NMB values ranged from −14.37% to 16.90%. The NME values ranged from 14.82% to 24.56%, which showed that the DCIM was steady over time. Spatially, the DCIM estimated observed PM2.5 well in most areas (79.35%), where the NMB value was less than 0.2 and the NME value was also less than 0.2 (Figure 7c,d). To sum up, the DCIM provided effective dynamic information compared with kriging and COPF.
First, the time series analysis of the simulation values using mean, NMB, NME and correlation coefficients showed that the DCIM can achieve a more accurate real-time simulation, which the dynamically unconstrained interpolation techniques were unable to achieve. Second, the DCIM can also give the characteristics of the spatiotemporal variation parameters during the target simulation periods, which contributed to the existing relevant research in China to a certain extent.
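The bias and error statistics quoted in this section (MB, RMSE, NMB, NME, and the share of simulation-to-observation ratios within 0.75-1.25) can be computed as in the sketch below. It uses the customary definitions of these measures, which are assumed rather than taken from the paper; the function names and sample values are hypothetical.

```python
import numpy as np


def bias_and_error(sim, obs):
    """Return MB and RMSE (data units) and NMB and NME (percent)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    diff = sim - obs
    return {
        "MB": diff.mean(),
        "RMSE": np.sqrt(np.mean(diff**2)),
        "NMB": 100.0 * diff.sum() / obs.sum(),
        "NME": 100.0 * np.abs(diff).sum() / obs.sum(),
    }


def share_within_band(sim, obs, low=0.75, high=1.25):
    """Percentage of pairs whose ratio sim/obs lies within [low, high]."""
    ratio = np.asarray(sim, dtype=float) / np.asarray(obs, dtype=float)
    return 100.0 * np.mean((ratio >= low) & (ratio <= high))


# Made-up values in µg/m³, for illustration only.
sim_values = [12, 18, 33, 36]
obs_values = [10, 20, 30, 40]
stats = bias_and_error(sim_values, obs_values)
share = share_within_band(sim_values, obs_values)
print(stats, share)
```

Applying the same functions per time step, instead of over the whole period, would give the kind of time series that the text describes for Figures 6b and 7b.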

Mapping of the Mean PM2.5 Simulations
The mean observations of the 80 sample cities and the mean distributions of PM2.5 on 8-11 November 2014 are mapped in Figure 8. The mean distributions from the DCIM, kriging and COPF were similar to the map of mean observations. The DCIM results presented a smoother pattern with a smaller MAGE than the results from the other methods. As can be seen from the DCIM results, the levels of simulated PM2.5 were higher in the northern region than in the southern region. Shandong and Jiangsu provinces presented higher PM2.5 concentrations, consistent with their industrial development. Guangdong and Fujian provinces presented lower PM2.5 concentrations due to the low levels of anthropogenic emissions and favorable meteorological conditions for atmospheric dispersion [10]. Meanwhile, heavily polluted regions were identified in North and Northeast China, where PM2.5 concentrations rose to 120-160 µg/m³ during the target simulation period. Compared with the MODIS-derived seasonal mean 1 km-resolution PM2.5 maps averaged over the winter periods of 2000-2018 shown in [30], the DCIM results showed a similar spatial distribution but a higher level of PM2.5 concentrations in Northeast China. One major reason for the higher level might be the serious pollution from coal combustion in winter, as reported in a previous study [26]. Adverse weather conditions (i.e., long, dry winters that require indoor heating) further intensified the haze in northern cities. Another reason might be the atmospheric conditions of aerosols caused by stagnant weather, with weak wind and a weak planetary boundary layer [31]. It was noteworthy that air quality in Beijing and Hebei (60-80 µg/m³) was much improved, which benefited from the introduction of an emergency emissions-reduction strategy during the Asia-Pacific Economic Cooperation (APEC) summit in November.
The mean observations of the 80 sample cities and the mean distributions of PM2.5 from 17 to 20 May 2014 are mapped in Figure 9. In detail, the DCIM, kriging interpolation and COPF produced national-scale mean distributions similar to the map of mean observations for this period. The DCIM presented a smoother pattern with smaller errors. Figure 9 illustrates that the mean PM2.5 concentrations were much larger in central China, where PM2.5 emissions were very high, and smaller in southeast China, where precipitation was abundant. Compared with the research results for the same period in [18], the similar spatial distribution of PM2.5 concentrations indicated the rationality of the DCIM.

Discussions and Conclusions
This study introduced a PM2.5 transport model based on adjoint data assimilation to interpolate sparse observations, in order to simulate the hourly average PM2.5 concentrations two hours in advance and the national-scale spatial distribution of PM2.5 concentrations in China. From the perspective of interpolation accuracy and effect, this study selected kriging interpolation and COPF, which had been shown to have high PM2.5 simulation accuracy, for comparative analysis.
A comparison between the final interpolated values and the observations showed that the DCIM was better for national-scale simulation than kriging and COPF during the discussion periods. More importantly, the spatiotemporal variation of the simulated results was physically reasonable (see Figures 6 and 7). Our DCIM results showed a good correlation coefficient and index of agreement, which indicated that the PM2.5 simulations of the analysis period were highly spatially correlated. The mean distributions from the DCIM were similar to those from kriging and COPF, and the DCIM presented a smoother, reasonable pattern with smaller simulation errors than the other methods. Though seasonally high levels of PM2.5 concentrations during the analysis period are often recognized to bring significant biases, the final DCIM results in this study presented a relatively high simulation accuracy, which indicated its potential for providing insights into policies to mitigate air pollution in China.
Previous studies had shown that mapping the distribution of PM2.5 concentration in China faced many challenges because of the wide geographical range, complex terrain and sparse observations from unevenly distributed ground-level monitoring stations [12]. For that reason, the DCIM was adopted to improve the utilization of the sparsely observed PM2.5 concentrations in both space and time [21]. Owing to its rigorous mathematical basis and mature development, the transport model was chosen as the dynamical model among numerous models. This article sought to investigate the DCIM performance in the spatiotemporal simulation patterns of PM2.5 concentrations over China utilizing ground-level observations.
In addition, the PM2.5 transport model, which provided the dynamic constraints in the DCIM, had been shown to be a good solution for simulating national-scale ground-level PM2.5 in [25,26]. This adjoint model was easy to implement and had high practical operational performance, which effectively compensated for the simulation errors caused by sparse observations. Moreover, as a benefit of the adjoint data assimilation, estimates of the major parameters can be obtained from the iterative optimization. The ability to provide dynamic information from the final DCIM results cannot be matched by traditional interpolation methods. Therefore, the final simulations from the DCIM in this study were close to the observed values, and the method can be accepted as an effective approach to studying PM2.5 distribution and transport.
There were still some limitations to the approach used in this study. On one hand, our simulations tended to underestimate the observations, and such underestimation mostly occurred when the measured ground-level PM2.5 concentrations were high. In view of the seasonal emission effect in the analysis period, this underestimation may be mostly due to the coal-based industries in the north, such as coal-fired power plants and iron and steel manufacturing [17]. On the other hand, with an uneven distribution of monitoring sites and limited PM2.5 measurements, it may not be possible to simulate the spatial concentration with high accuracy. For the target simulation periods in winter, the sparse PM2.5 measurements in a highly polluted region have a certain degree of influence on the accuracy and rationality of the interpolated distribution. Therefore, three new cities in the North China Plain (Jinlin, Dandong and Fuxin) were added to the sampled cities. The observations of the new cities were interpolated in the DCIM for the mean distributions of PM2.5 on 8-11 November 2014, and the new mean distribution of PM2.5 was mapped in Figure 10. Compared with the mean distribution of PM2.5 in Figure 8, the new pattern was smoother in the North China Plain. In a highly polluted region, the spatial distribution of PM2.5 would be improved by adding the right amount of observations in the DCIM experiments.
The bias between the simulations and the observations should decrease as the number of monitoring sites in China and the amount of available PM 2.5 measurements gradually increase.
In addition, it is necessary to note that some aspects of the data assimilation and the dynamic model development are limited in this study and should be improved in future studies. Admittedly, for large-scale regions, the sparse PM 2.5 measurements in a highly polluted region have a certain degree of influence on the accuracy and rationality of the interpolated distribution. Adding the right amount of observations can improve this situation. Using only ground-level observations may result in relatively large observational errors in assimilation and independent verification. In future studies, contemporaneous satellite observations of PM 2.5 will be needed to supplement the data and provide a further quality check.
Despite the potential of spatial interpolation for PM 2.5 studies with sparse data, the simulation of PM 2.5 concentrations involves a large number of dynamic factors. We therefore applied the dynamically constrained interpolation methodology to give a better representation of the national-scale PM 2.5 simulations. Overall, compared with the kriging interpolation and COPF, the DCIM was promising for providing reasonable information for the spatiotemporal simulations of PM 2.5 variation in large geographical regions. To obtain more accurate spatial distributions in large-scale regions, we will continue to improve the details of the DCIM.
Data Availability Statement: The PM 2.5 concentration data used in this paper were collected from the China National Environmental Monitoring Center (CNEMC) and can be found here: http://www.cnemc.cn/sssj/. Meteorological data were taken from the NCEP Climate Forecast System (CFS-R/CFSv2); this dataset can be found here: http://apdrc.soest.hawaii.edu/las/v6/dataset?catitem=3041.