Prediction and Source Contribution Analysis of PM2.5 Using a Combined FLEXPART Model and Bayesian Method over the Beijing-Tianjin-Hebei Region in China

Fine particulate matter (PM2.5) has a serious impact on human health. Forecasting PM2.5 levels and analyzing the pollution sources of PM2.5 are of great significance. In this study, the Lagrangian particle dispersion (LPD) model was developed by combining the FLEXPART model and the Bayesian inventory optimization method. The LPD model has the capacity for real-time forecasting and determination of pollution sources of PM2.5, which refers to the contribution ratio and spatial distribution of each type of pollution (industry, power, residential, and transportation). In this study, we applied the LPD model to the Beijing-Tianjin-Hebei (BTH) region to optimize the a priori PM2.5 emission inventory estimates during 15–20 March 2018. The results show that (1) the a priori estimates have a certain degree of overestimation compared with the a posteriori flux of PM2.5 for most areas of BTH; (2) after optimization, the correlation coefficient (R) between the forecasted and observed PM2.5 concentration increased by an average of approximately 10%, the root mean square error (RMSE) decreased by 30%, and the IOA (index of agreement) index increased by 16% at four observation sites (Aotizhongxin_Beijing, Beichenkejiyuanqu_Tianjin, Dahuoquan_Xintai, and Renmingongyuan_Zhangjiakou); and (3) the main sources of pollution at the four sites mainly originated from industrial and residential emissions, while power factory and transportation pollution accounted for only a small proportion. The concentration of PM2.5 forecasts and pollution sources in each type of analysis can be used as corresponding reference information for environmental governance and protection of public health.


Introduction
Over the past several decades, industrialization and urbanization have caused serious PM 2.5 pollution in China. PM 2.5 refers to atmospheric particulates with aerodynamic diameters less than 2.5 µm in ambient air [1,2]. Ambient particles can affect air quality and climate by absorbing and scattering solar irradiation [3][4][5][6], and they also have adverse effects on human health. Studies have shown that high concentrations of PM 2.5 not only increase the morbidity and mortality of the public but also affect the cardiovascular system [7][8][9][10]. PM 2.5 is more harmful to human health than PM 10 because PM 2.5 has a smaller particle size and a larger specific surface area, which makes it easier to absorb toxic chemicals [11,12]. Therefore, increasing attention has been given to PM 2.5 in China, not only from the scientific community but also from the public. Overall, it is particularly important to accurately predict the PM 2.5 concentration. In addition, to reduce PM 2.5 concentrations, pollution source (industry, power, residence, and transportation) to the site concentration based on the posterior inventory.

Study Area
The Beijing-Tianjin-Hebei (BTH) region is located on the North China Plain. This region is the country's main high-tech and heavy industry base, and it is also the location of China's political and cultural center. These cities include Beijing, Tianjin, and 11 prefecturelevel cities of the Hebei Province. The land area is 218,000 square kilometers, and the resident population is approximately 110 million people, of which 17.5 million are from outside areas. Economic development in the BTH region is accompanied by serious air pollution. Some studies have found that the BTH region is the most seriously polluted region in China [42][43][44][45]. According to China's 2015 and 2016 Environmental Status Bulletin [46], PM 2.5 , as the main pollutant, has the largest number of days exceeding the standard in the BTH region, accounting for 68.4% and 63.1%, respectively [47]. Figure 1 is a sketch map of the study area in this research, and this area covers from 113.5 • E to 119.8 • E and from 36.0 • N to 42.6 • N.

Model Development
The LPD model is developed based on the FLEXPART-WRF model, which employs the FLEXPART model to calculate the forward Lagrangian particle dispersion with driving data from the WRF model [48][49][50]. Compared with an adjoint Eulerian model, the advantage of the LPD model is that there is no initial diffusion due to the release of the adjoint tracer into a finite-size grid cell, which is independent from the computational grid. Furthermore, long-range transport can be simulated more accurately as no artificial numerical diffusion is present. The LPD model does not consider dry deposition or chemical reactions in the simulation of PM 2.5 , so it has low computational costs. The trajectories of individual particles are computed using a zero-acceleration scheme, in which particle movements are dependent on the grid scale winds and the turbulent fluctuations, which is shown as follows: This equation is accurate to the first-order to integrate the trajectory equation [48] dX where t is the time step, ∆t is the time increment, X is the position vector, and v = v + v t + v m is the wind vector that is composed of the grid scale wind v, the turbulent wind fluctuations v t , and the mesoscale wind fluctuations v m . The LPD model makes it possible to obtain the contribution rate of each grid emission source to the receptors (observation sites), which provides the possibility to improve the prediction accuracy and analyze the contribution and spatial distribution of each type of pollution source (industry, power, residential, and transportation) to the site concentration of PM 2.5 . In addition, the forecast accuracy of the LPD model and the validity of the inventory inversion results have been verified and analyzed in previous studies, the prediction and inversion results are proved good [36,51]. In the following discussion, the receptors refer to the observation stations.

Methods
This section mainly includes two parts: inversion of the emission inventory of PM 2.5 and the pollution source analysis of receptors of PM 2.5 , and the flowchart is shown in Figure 2. The a priori emission inventory and grid emission ratio (industry, power, transportation, and residential) of

Inventory Inversion
In this research, the spatial distribution of the source-receptor relationship (SRR) was determined by the FLEXPART model. The SRR represents the contribution of each potential source to the receptors. In the present study, 30-day SRR was determined by releasing 10,000 particles from the potential sources per hour [51]. The Bayesian inversion method was used to combine the receptor observations to inverse the PM 2.5 inventory. The principle of inversion is to optimize the emission flux density by minimizing the mismatch between the observed and simulated concentrations. The basic equation of the inversion approach can be written as follows: where M is the matrix of SRR, y is a vector of PM 2.5 concentrations (µg/m 3 ) at the receptor, and x is the emission flux vector (Mg/grid/month). The equation can be written as follows: where m is the time series of observations at the receptor and n is the number of emission grids. Y m is a matrix that has m rows. Presuming that x = x − x a and y = y o − M.x a where x a , x, and y o are referred to as a priori, a posteriori source vector, and observation vector, the cost function is described as follows: The best estimate x was obtained by solving equation ∇ x J(x) = 0 as follows: where δ o and δ x represent the standard errors related to the observations and a priori values, respectively. Considering that the inventory cannot be a negative, we need to have the condition min(x) ≥ 0. To reduce the unrealistic emissions in the posteriori region, the calculation was iterated until all negative emissions were greater than or equal to 0. Finally, the posteriori inventory was used to predict the PM 2.5 concentration of receptors. where t represents the time series of simulations in the future.

Source Analysis
The LPD model can simulate the contribution of each potential source of pollution to the receptor sites in real time. The PM 2.5 emission inventory uses the 2016 MEIC products, which are divided into industrial, power, transportation, and residential emissions. The emission proportion of each source in each grid is calculated based on the MEIC PM 2.5 emission inventory of 2016 (Figure 3), and the time resolution is monthly; that is, it is assumed that the emissions proportion of each source does not change within one month. The contribution of each pollution source (industry, electricity, transportation, and residential emission ratio) and the source spatial distribution of those pollutants to the receptors can be analyzed through the pollution emission source contribution of the receptors multiplied by the emission ratio of each grid point, which is described as follows: Ft= Ft ind +Ft pow +Ft res +Ft tra (10) where Ft represents the forecasted concentration of PM 2.5 at time t; Ft ind represents the contribution of industrial sources to the forecasted concentration of the receptors at time t; Ft pow represents the contribution of power sources to the forecasted concentration of the receptors at time t; Ft res represents the contribution of residential sources to the forecasted concentration of the receptors at time t; and Ft tra represents the contribution of transportation sources to the forecasted concentration of the receptors at time t, where Ft ind , Ft pow , Ft res , and Ft tra can be obtained by Formula (11) as follow: where Per ind,n represents the proportion of industrial emissions from the nth emission grid, Per pow,n represents the proportion of power emissions from the nth emission grid, Per res,n represents the proportion of residential emissions from the nth emission grid, Per tra,n represents the proportion of transportation emissions from the nth emission grid, and St n represents the contribution of the nth emission grid to the concentration of receptors at time t, where Per ind,n , Per pow,n , Per res,n , and Per tra,n can be obtained by the MEIC PM 2.5 emission inventory in 2016, and St n can be obtained from the LPD model.

Statistical Analysis Methods
To evaluate the accuracy of the concentration forecast and inventory inversion of the LPD model for PM 2.5 , the main statistical methods used in this article include the Pearson product-moment coefficient of linear correction, the root mean square error (RMSE) [52,53], and the index of agreement (IOA). The IOA index is defined as follows: where i is the ith paired (model-observation) data point, N is the total number of paired data points, and C (m,i) and C (o,i) are the ith modelled and observed mixing ratios, respectively. Additionally, C o is the mean observed mixing ratio. The IOA value varies between 0.0 and 1.0, representing the limits of complete disagreement to perfect agreement, respectively, between the observations and simulations.

Evaluation of the Posteriori Inventory
The inversion results of the PM 2.5 inventory from the LPD model in January and February of 2018 will be used as the spin-up time and will not be part of the discussion of the results. The following series of discussions are based mainly on the results of the March 2018 period. Figure 4b shows the PM 2.5 emission inventory in March 2018 retrieved by the LPD model for the BTH region, whose spatial distribution is basically consistent with the a priori inventory (Figure 4a), and the emissions have a certain degree of reduction compared with the a priori inventory. To show the spatial difference before and after the optimization inventory in the BTH region, the spatial difference between the posterior inventory and the a priori inventory is processed (Figure 4c), where blue and red represent the negative and positive differences between the two, respectively. The green column in Figure 5 is the posteriori inventory of PM 2.5 statistics for the BTH region in March 2018, and the statistical results are basically consistent with the inventory of MEIC products (red column) in the corresponding months of 2016, which were 6.29 × 104 Mg and 6.13 × 104 Mg, respectively. In addition, Figure 5 shows that the annual emissions of PM 2.

Evaluation of Site Forecasts
To evaluate the forecast effect of the LPD model after inventory optimization, the 6-day forecast results from 15-20 March 2018, were compared and analyzed with the observations (Figure 7) based on four observation sites in the BTH region (Aotizhongxin_Beijing, Be-ichenkejiyuanqu_Tianjin, Dahuoquan_Xintai, Renmingongyuan_Zhangjiakou). In Figure 7, Sim_Raw represents the hourly average concentration of PM 2.5 that forecasts from the FLEXPART model based on the a priori inventory, Sim_Opt represents the hourly average concentration of PM 2.5 that forecasts from the LPD based on the a posteriori inventory, and Obs represents the hourly average concentration of PM 2.5 from the observation sites. The results show that the forecasted concentration of PM 2.5 after inventory optimization is significantly better than that before inventory optimization. In Figure 7, the forecasted PM 2.5 concentrations of the Aotizhongxin_Beijing and Beichenkejiyuanqu_Tianjin sites before inventory optimization are significantly higher than the observations. The Dahuo-quan_Xingtai and Renmingongyuan_Zhangjiakou results are opposite, and their forecast concentrations of PM 2.5 are significantly lower than the observations. The statistical results in Table 1 show that the correlation between the forecast concentration of PM 2.5 of the LPD model after the inventory optimization and observations is increased by 9.66% compared to before optimization on average, the RMSE is reduced by 28.74%, and the IOA index is increased by 16.26%. In Table 1, the prefix RAW represents before the optimization of the inventory, and the prefix OPT represents after the optimization of the inventory, where INCREMENT = (OPT-RAW)/RAW. This result is mainly due to the Bayesian optimization method, which makes the PM 2.5 emission inventory more reasonable and corrects part of the errors in the model simulation.    [61]. These errors are higher than the results of concentrations of PM 2.5 predicted by the LPD model using the posteriori inventory, in which RMSE with the observations is 23.41 µg/m 3 on average. In addition, the LPD model can capture the concentration change information of PM 2.5 at the hourly scale, which is more targeted for prevention and treatment. Guo et al., (2020) also compared the LPD forecasting model with the WRF-Chem and Camx models using data from monitoring stations in Xingtai, China, and the LPD forecasting model had higher accuracy than those models [51].

Analysis of the Spatiotemporal Forecasts of Pollution Sources
The LPD model improves the accuracies of the PM 2.5 concentration forecasts at stations and predicts the pollution sources (industry, power, residence, and transportation) at the stations and their spatial distributions. Figure 8 shows the main pollution sources that caused the change in the PM 2.5 concentrations at the stations from 15-20 March 2018. The black rendering represents the contribution of pollution sources outside the BTH region, and the red, blue, pink, and green renderings represent the contribution of industrial sources, power sources, residential sources, and transportation sources to the PM 2.5 concentrations at the stations (Aotizhongxin_Beijing, Beichenkejiyuanqu_Tianjin, Dahuo-quan_Xintai, Renmingongyuan_Zhangjiakou) in the BTH region, respectively. Figure 8 and the statistical analysis table (Table 2) show that the main pollution sources from 15-20 March 2018 originated from industrial and residential emissions, which are consistent with the previous sectoral analyzing results for other cities in BTH region [62]. In particular, residential emission sources contributed as much as 43.9% to the PM 2.5 concentration at four stations in the BTH region, which was similar to a previous report that used the WRF-CMAQ model in December 2015, representing 46% of the monthly average concentration [63]. In addition, the results are consistent with the analytical results of modelling methods, and show that emissions from the residential contribute more to the PM 2.5 concentration than the industrial sectors [64][65][66]. The result is reasonable and could be explained by an increase in residential coal usage due to heating during these colder seasons. In contrast, the power and transportation emission sources accounted for only a small proportion, especially the two monitoring stations in the Hebei Province (Dahuo-quan_Xintai and Renmingongyuan_Zhangjiakou stations), which had little influence from power sources, and the rates were 1.72% and 1.39%, respectively. The contribution rates of the surrounding pollution sources to the two monitoring stations (Dahuoquan_Xintai and Renmingongyuan_Zhangjiakou) in the Hebei Province for the PM 2.5 concentration were 16.44% and 19.39%, respectively, which were 8.9% higher than the Aotizhongxin_Beijing and Beichenkejiyuanqu_Tianjin sites, and the rates were 10.36% and 7.62%, respectively. A similar phenomenon was previously reported; for example,  used the WRF-Chem model to analyze the impact of the surrounding pollution sources on the BTH region, and the results showed that the surrounding pollution sources contributing to the PM 2.5 concentration in the BTH region were approximately 9.3% [67]. Although different models, simulation times, and input data have a great influence on the analytical results, the research results of previous studies and this paper show that the PM 2.5 concentration contribution in the BTH region mainly originates from local pollution sources, and this information can be targeted to control and prevent the occurrence of pollution events.   Figure 9 shows the spatial distribution of each type of pollution source that led to the PM 2.5 concentration change at the Aotizhongxin_Beijing station from 15-20 March 2018, simulated by the LPD model. This distribution map is the ratio between the grid contribution of the corresponding pollution source to the Aotizhongxin_Beijing station and the maximum grid contribution to the Aotizhongxin_Beijing station of this type of pollution source in the BTH region, with a spatial resolution of 0.25 • × 0.25 • . Red represents the grid values where the corresponding pollution sources (industry, power, residence, and transportation) contribute more to the site concentration of PM 2.5 , and blue represents the corresponding types of pollution sources that contribute less to the site concentration of PM 2.5 . During this forecast period (15)(16)(17)(18)(19)(20) March 2018), the surrounding industrial and residential emission sources have a greater influence on the Aotizhongxin_Beijing station over a wide range. In contrast, the power and transportation sources have a small influence on the Aotizhongxin_Beijing station. In particular, the influence of the power source on the Aotizhongxin_Beijing station is relatively scattered (Figure 9b).

Conclusions
LPD can improve PM 2.5 concentration forecast accuracy by optimizing the PM 2.5 emission inventory and by obtaining real-time analyses of the temporal and spatial distributions and pollution source contributions that lead to changes in the PM 2.5 concentration at the receptors. This model was applied to the BTH region, which has severe pollution, and the hourly forecast values over 6 days from 15-20 March 2018 were analyzed. The results show that the average correlation between the forecasted concentrations of PM 2.5 after the emission inventory was optimized and that the observations were as high as 0.82 at four observation sites (Aotizhongxin_Beijing, Beichenkejiyuanqu_Tianjin, Dahuo-quan_Xintai, Renmingongyuan_Zhangjiakou); the RMSE was~23.41, and the IOA index was~0.84, which was significantly improved compared to the values obtained before inventory optimization.
During this period (15-20 March 2018), the main pollution sources that led to the changes in the concentrations of PM 2.5 at the four observation sites originated from industrial and residential emissions, and power and transportation pollution sources accounted for only a small proportion, especially the Dahuoquan_Xintai and Renmin-gongyuan_Zhangjiakou stations in the Hebei Province; power pollution sources had little influence on these stations, and the values were 1.72% and 1.39%, respectively. The influence of the surrounding pollution sources from outside BTH on the Dahuoquan_Xintai and Renmingongyuan_Zhangjiakou stations in the Hebei Province was 16.44% and 19.39%, respectively. Compared with the Aotizhongxin_Beijing and Beichenkejiyuanqu_Tianjin sites, the average influence ratio increased by 8.9%.
Considering the actual situation and operating efficiency of the LPD model, the concentration forecast of PM 2.5 and the analysis of each type of pollution source are only at the site scale, without considering dry and wet sedimentation and chemical reactions of the particles, and this part of the error is attributed to the uncertainties in both the model and emission inventory. The reader should keep in mind that inverse approaches still have large uncertainties due to the limited observation sites and the method bias. Therefore, the LPD model needs to be further optimized and expanded in future research for the development of forecasting surface-level distributions of PM 2.5 concentration and each type of pollution source analysis at a regional scale. In addition, at present, we only quantitatively evaluate the reliability of the LPD model and input the parameters through indirect or by comparison with observations, and we need to further verify and analyze its reliability and adaptability.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.