Modelling Spatial Heterogeneity in the Effects of Natural and Socioeconomic Factors, and Their Interactions, on Atmospheric PM2.5 Concentrations in China from 2000–2015

In recent years, atmospheric PM2.5 pollution in China has become increasingly severe and exploring the relationships among its influencing factors is important in the prevention and control of air pollution. Although previous studies have identified complexity in variations in PM2.5 concentrations and recognized the interaction of multiple factors, little quantitative information is available on the evolution of the relationships among these factors, their spatial heterogeneity, and the multiscale interactions between them. In this study, geographical detector and multiscale geographically weighted regression models have been used to explore the multiscale interactions among natural and socioeconomic factors and PM2.5 concentration in China over the period 2000–2015. The results indicate that the relationship between natural factors and PM2.5 concentration is stronger than that for socioeconomic factors. The type of interaction between each factor is dominated by bivariate and nonlinear enhancement, exhibiting strong interactions between natural factors and anthropogenic factors. Although the effect of each factor on PM2.5 is complex, the relative influence of both human activities and social factors is shown to have gradually increased over time and population, agriculture, urbanization, and socioeconomic activities in general make important contributions to PM2.5. In addition, the scale of effects related to natural factors is smaller and more stable compared to the influence of human activities during the period 2000-2015. There are significant differences in the way natural factors and socioeconomic factors affect PM2.5, and there is strong non-stationarity of spatial relationships. Factors associated with topography, vegetation (NDVI), climate (temperature), natural sources, and agricultural activity are shown to be important determinants of PM2.5 across China and warrant significant attention in terms of managing atmospheric pollution. The study demonstrates that spatial differences in the direction, intensity, and scale of each factor should be accounted for to improve prevention and control measures and alleviate regional PM2.5 pollution.


Introduction
In the context of global climate change, with the rapid progress of global industrialization and urbanization, ecological and environmental issues such as air pollution and land degradation have become urgent problems for sustainable development [1][2][3]. PM 2.5 air pollution, with both regional and complex characteristics, has become a significant challenge as the main component of air particulate pollutants, which can reduce atmospheric visibility [4], effect local climate change [5], interfere with traffic, and have serious economic impacts [6]. More importantly, numerous investigations have revealed that excessive concentrations of PM 2.5 have been found to increase morbidity and mortality rates and are a major risk factor for global disease [7]. For example, in 2015, the number of premature deaths associated with PM 2.5 in 161 cities in China was approximately 65.2 × 10 4 , accounting for approximately 6.92% of the total number of deaths in China in that year [8]. Therefore, identifying the key factors influencing PM 2.5 and exploring their interactions is crucial for the prevention and control of air pollution.
Identifying the factors influencing PM 2.5 has become an important research focus because it can help to inform more precise responses and preventive measures for PM 2.5 pollution [9]. Atmospheric PM 2.5 concentrations are affected by factors associated with the sources of particulate matter, transmission and diffusion [10][11][12][13]. Both direct factors, such as natural and artificial sources, and indirect factors, such as diffusion and settlement conditions and human activity intensity, have been widely studied [14][15][16]. At the regional scale, factors affecting the speed and direction of atmospheric diffusion (topography, temperature, pressure, wind speed and direction) [17][18][19] and factors related to the deposition of fine particles (humidity and precipitation) [4,13,18,20] play an important role in the transmission and diffusion of PM 2.5 . In addition, urban morphology [21], urbanization [22], land use [23], and other factors associated with human activities are known to strongly impact PM 2.5 particle source, diffusion and depositional processes by altering local hydrothermal conditions [24] or the dynamics of source-sink landscapes [25]. The impacts of geospatial attributes and the regional economy on PM 2.5 pollution have been demonstrated to have clear spatial aggregation and diffusion effects [26]. For instance, there is a strong correlation between the spatial configuration of the urban landscape and PM 2.5 pollution in China, illustrating the effects of marked spatial differences in national, regional and urban agglomeration [27]. Therefore, spatial heterogeneity of PM 2.5 concentrations and the processes influencing regional spatial variations must be taken into consideration when studying the mechanisms influencing PM 2. 5 .
In recent years, researchers have quantified the relationship between various natural and socioeconomic factors and PM 2.5 concentrations using ordinary least squares (OLS), spatial econometric models based on spatial autocorrelation, and nonlinear response models such as generalized additive models and random forest models [28][29][30]. However, these simulations are still in essence based on global regression models [27,31], without considering the spatial non-stationarity caused by interactions among the measured variables [32]. To this end, Wang et al. [31] applied a geographical weighted regression (GWR) model to assess regional differences in influencing factors through measuring spatial variation in the parameters. However, the relationship between the influencing factors and PM 2.5 concentration varies according to spatial scale [33] which must be accounted for in analyzing the factors influencing PM 2.5 . Fotheringham et al. [34] emphasized the concept of multiscale geographic processes and proposed a multiscale geographically weighted regression model (MGWR) mode, based on a GWR model, which allows relationships between different variables to operate at different spatial scales by searching for different bandwidths to capture the effect of spatial scale on the different factors. We have therefore applied the MGWR model here to explore the interactions among atmospheric PM 2.5 influencing factors in China.
Despite the shift from considering only a single spatiotemporal scale towards a comparative study of different spatiotemporal scales, spatiotemporal heterogeneity in the processes influencing PM 2.5 remains unclear [9,30,31,33]. At larger spatial scales, few studies have managed to overcome the restrictions imposed by administrative boundaries, which makes it difficult to understand the potential correlation mechanisms underlying the pattern of PM 2.5 pollution and effectively control the transfer of pollutants. Additionally, although PM 2.5 has been recognized as influenced by the interaction between multiple fac-tors, these interactions have not been quantified, and their evolution has not been studied, while the response process of PM 2.5 to each factor has not been spatially resolved. The major aim of this study, therefore, is to describe and explain the multiscale interactions between natural, socioeconomic factors, and PM 2.5 pollution in China. Firstly, spatial visualization of PM 2.5 raster data are used to identify the spatial and temporal evolution of atmospheric PM 2.5 concentrations during the period of 2000-2015. Secondly, correlation analysis and geographical detector models are used to quantify the global one-way and interactive relationship between natural, socioeconomic factors on PM 2.5 concentration. Finally, the MGWR model is applied to quantify the multiscale effects of natural, socioeconomic, and land use factors on PM 2.5 . This study applies a novel method that is capable of accounting for the spatially explicit responses of natural and socioeconomic factors on PM 2.5 concentrations that can provide accurate theoretical support for the control of PM 2.5 pollution in China.

Annual Mean PM 2.5 Concentrations
The data on PM 2.5 used in this study were selected from the PM 2.5 grid estimation dataset of China (V4.CH.02) simulated by the Atmospheric Composition Analysis Group with a spatial resolution of 0.01 • × 0.01 • (URL: http://fizz.phys.dal.ca/~atmos/martin/ ?page_id=140#V4.CH.02 accessed 22 September 2020). Considering that the response of PM 2.5 concentrations to natural and socioeconomic factors requires a certain amount of time to develop spatial pattern changes on a large spatial scale, the data period 2000-2015 was chosen to start the study by operating in a 5-year time step (2000,2005,2010,2015). This dataset uses the geophysical relationship between aerosol optical depth (AOD) and PM 2.5 from NASA MODIS, MISR, SeaWIFS and other satellite instruments to estimate the concentration of PM 2.5 through a chemical transport model (GEOS-Chem), and uses a GWR for calibration, with an accuracy of up to 81% [35]. Detailed information on the dataset is given in van Donkelaar et al. [35,36]. These data have been widely used in PM 2.5 research [37,38].
To verify the reliability of this dataset in across China, this study uses PM 2.5 concentration data monitored by 1497 ground-based monitoring stations in 2015 released by the Chinese Ministry of Ecology and Environment to compare and validate the PM 2.5 raster dataset. The monitored annual average PM 2.5 concentrations at the stations were spatially matched with the estimated PM 2.5 concentration values through the "extract by points" tool in ArcGIS 10.2, and the correlation coefficient and root mean square error (RMSE) were calculated as 0.852 and 10.57, respectively. Han et al. [37] used this dataset to assess the accuracy of remotely sensed annual mean PM 2.5 concentrations at the provincial spatial scale and, in validating the results, showed that the maximum relative error was 4.9% in 2015. Therefore, minimal information is lost in using the modelled dataset compared with the actual values of regional PM 2.5 concentrations in China, and it can therefore be reliably used in the spatial analysis of regional PM 2.5 concentrations in China. Regional PM 2.5 pollution in China is classified by categorizing PM 2.5 concentrations into five levels that relate to the annual limit of PM 2.5 concentration in China's Ambient Air Quality Standard [39] and the air quality guideline (AQG) by the World Health Organization [40]; specific classification information is shown in Figure 1.   [10,35); Moderate pollution, PM 2.5 concentration ∈ [35,70); Severe pollution, PM 2.5 concentration ∈ [70,100); Extreme pollution, PM 2.5 concentration ≥ 100. The numbers above each column indicate the percentage of areas with per PM 2.5 concentration level relative to the total area of the entire country. The values on the graph are approximate due to rounding.

Influencing Factors
This study focuses on the spatial heterogeneity in response to natural and anthropogenic factors of PM 2.5 concentration; therefore, based on the principle of spatial raster data accessibility, factors are selected in terms of natural factors and socioeconomic factors. Different factors are quantified using different datasets. Table 1 presents the factor categories established in this research and their corresponding application datasets. Through statistical analysis of the direction and intensity of the relationship between various factors and PM 2.5 concentration, the difference in the effects of natural conditions and human activities on PM 2.5 can be identified [29][30][31].
Topographic, meteorological, vegetation, and natural source factors were selected to express the influence of natural conditions. Elevation (ELE, acquired by digital elevation model) was chosen to quantify the topographic factor, which often acts as an important spatial constraint on the transfer of airborne particulate pollution. The effects of complex topographic units such as plateaus, basins, and valleys in the Chinese region are known to be more important than that of Europe and the United States in that pollutants are less likely to be dissipated in such situations [13]. Meteorological factors are quantified using mean annual precipitation (PRE), mean annual temperature (TEM), and wind speed (WIND). Precipitation is known to affect PM 2.5 in the form of a flushing effect and removing atmospheric pollutants [18]; temperature affects atmospheric convection and influences evaporative losses with a resultant influence on PM 2.5 [20]; high wind speeds have the ability to both erode and disperse particles and favor PM 2.5 dispersion process [4]. Vegetation cover index (NDVI), proportion of forested land (PFOL), and proportion of grassland (PGRL) were selected to quantify vegetation factors. Generally, relying on the adsorption and dust reduction effect of vegetation [41,42], there is a negative correlation between vegetation and PM 2.5 concentration, while there will be some variability in effects of vegetation in general and vegetation type on PM 2.5 [25]. In addition, unused or bare land and sandy land are susceptible to wind erosion and generate surface dust, making unused land an important input to PM 2.5 [11]. Therefore, the proportion of unused land (PUNL) is used as a quantitative expression of natural source factors.
Three factors that directly represent the intensity of human activities (gross domestic product (GDP), population density (POP), and nighttime light intensity (NLI)) and two land use factors that indirectly reflect the intensity of human activities in rural versus urban areas (proportion of agricultural land (PFAL) and proportion of built-up land (PCOL)) are used here to represent the influence of socioeconomic activities. Population density indicates the number of people in a spatial unit, and several existing studies point to the significant contribution of population density to PM 2.5 concentrations [43]. Gross domestic product (GDP) describes the economic development within a spatial unit, and in general the energy consumption and particulate emissions from regions with high population density and rapid economic development become important anthropogenic sources of PM 2.5 concentrations [33,43]. NLI is widely used in the spatial quantification of urban characteristics, representing integrated urban characteristics such as population, economic dynamics, traffic levels, and city-size distribution [44,45]. Improvements in the level of urban development (demographics, economic structure, transportation layout, etc.) can alleviate the degree of PM 2.5 pollution, and the Environment Kuznets Curve (EKC) in China's cities can provide a relevant measure of this [46]. While the proportion of cultivated land can be used directly to assess agricultural activities, the crop cover of cultivated land and agricultural activities such as fertilization soil plowing and biomass burning make cultivated land a dynamic source landscape for aerosols [47][48][49]. The proportion of construction land is directly related to urbanization. Construction and industrial activities resulting from the expansion of urban construction land have become major sources of PM 2.5 [50], while the rapid expansion of urban areas has amplified the intensity of the heat-island effect, thereby affecting the transmission and diffusion of PM 2.5 [51].
In considering all of the variables used in this study, spatial resolution was standardized to 1 km × 1 km so that the data on various influencing factors match well with the PM 2.5 data, and spatial sampling errors are minimized. The spatial location of all datasets was mediated by grid cells. Specifically, we established 23,614 grids of 20 km × 20 km size at the regional scale across China and PM 2.5 data, DEM data, NDVI, meteorological data, GDP, POP, and NLI data were resampled to the spatial resolution of the grid cells to extract the corresponding resampled values of the grid cells and to calculate the proportional areas of arable land, forest land, grassland, construction land, and unused land within each grid cell. Pearson correlation coefficients between PM 2.5 and influencing factors were calculated using all data attributes for each of the 23,614 grid squares as the sample set, with PM 2.5 concentration values set as the dependent variable. The independent variables were input into the geographic detector and multiscale geographically weighted regression model to evaluate the spatial heterogeneity of natural and socioeconomic factors and PM 2.5 concentrations.

Geographical Detector Model
Geographical detector modelling, an important statistical method in spatial differentiation research, is a statistical method used to describe heterogeneity in spatial stratification and reveal its internal driving factors through a nonlinear hypothesis. The model assumes that if an independent variable has a strong causal relationship with a dependent variable, then the independent variable and the dependent variable will exhibit a high degree of consistency in their spatial attributes [59]. Geographical detector models have been widely used in studies of land use, regional economies, ecology, and environmental science among others (see Wang et al. [60]). In this study, factor detectors and interaction detectors are used to address two issues, viz., which variables have a significant impact on PM 2.5 concentration and how do these variables interact with each other.
In regard to the factor detector, q statistics are first used to quantify the degree of influence of a single factor and the interaction between two factors. Values of q are calculated using the following formula: where h (1, . . . , L) represents the sub-region of independent variable X, N h represents the number of samples in the subregion of h, N represents the number of spatial units in the whole region, σ 2 h represents the variance of the subregion, and σ 2 represents the total variance of the whole region. In this study, the sub-region represents grid cells with a spatial scale of 20 km × 20 km.
An interaction detector is used to detect the interaction between two factors. The q value is used to assess the nature of interaction between the two factors, such as whether the interaction between two factors enhances or weakens the explanatory power of the single factor on the dependent variable Y or is consistent with the effect of the two factors acting independently on Y. The interaction detector is based on the factor detector result, and is calculated as follows: 1.
q values of X1 and X2 obtained from equation 1 are expressed as q(X1) and q (X2); 2.
The two factor layers X1 and X2 are used for superposition. The new superposition factor layer after the superposition is X1∩X2, and the Equation (1) is used again to calculate the q value of X1∩X2 at the superposition factor layer, which is expressed as q(X1∩X2); 3.
According to the magnitude of q (X1), q (X2) and q (X1∩X2), it can be divided into five types of interaction; the specific comparison and corresponding interaction relationships are shown in Table A1.
The geographical detector R package was used in this study to calculate the factor detector and interaction detector [61].

Multiscale Geographically Weighted Regression Model
Classical GWR sets up a consistent bandwidth for all factors such that all the relationships operate at the same spatial scale. However, the spatial relationships between each factor and dependent variable play different roles at different scales [33], and the relationship between the factor and dependent variable is also scale dependent [62]. Therefore, Fotheringham et al. [34] developed the MGWR model based on a GWR by relaxing the hypothesis that all spatial change processes in the model operate at the same spatial scale and assigning a dedicated bandwidth for each variable to adapt the model to account for its multiscale effects. The MGWR can eliminate the limitation that all relationships must vary at the same spatial scale, minimize overfitting, and improve the accuracy and superiority of the model [63]. The MGWR is expressed as where (u i , v i ) represents the coordinates of sample point i; y i and x ik represent the dependent variable and the kth explanatory variable, respectively; β bwk (u i , v i ) represents the local parameter estimation of the kth explanatory variable of sample point i in optimal bandwidth; β bw0 (u i , v i ) represents the intercept in optimal bandwidth; and ε i represents the error term; bwk is the specific optimal bandwidth used in the calibration of the conditional relationship between the explanatory variable k and the independent variable, which means that the relationship between explanatory variables and dependent variables allows changes in space in the model and provides bandwidth parameters of optimal geographical range, thus allowing different processes to run at different spatial scales. MGWR is essentially a multiple linear regression, and the multivariate complexity and correlation of PM 2.5 concentration influencing factors, so the multiple covariance problem needs to be solved before running MGWR; the test of variance inflation factor (VIF) is the most concise variable screening method [64]. This study solves the multiple covariance problem by screening out the variables with VIF > 10. The original data were also normalized in order to make the coefficients of the modeling results statistically comparable at different locations and for different variables.
During the establishment of the MGWR model, the bandwidth was calibrated using a back-fitting algorithm [34]. In this paper, all calibration of the model was carried out using MGWR 2.2 software [63,65], which is suitable for modeling large datasets. The software is available at https://sgsup.asu.edu/sparc/multiscale-gwr accessed 22 September 2020. In order to verify the practical validity of the MGWR model, a total of 658 matched samples were obtained using the spatial connection between ground-based monitoring stations and the 20 km × 20 km grids. The results of the linear fit between the predicted values of the MGWR model and the monitored PM 2.5 concentration values at the stations showed that the regression coefficient R 2 of the fit reached 0.715, which means that the MGWR model implemented in this study has a reliable practical value.

Correlation Analysis
Determination of Pearson's correlation coefficient values indicate that the PM2.5 concentration is significantly correlated with the influencing factors (Table A2). For natural factors, there is a significant positive correlation between TEM, PUNL, and PM2.5 concentration, whereas the remaining factors showed a significant negative correlation. Moreover, the factor with the greatest influence on PM2.5 concentration shifted from the natural source factor PUNL to the meteorological factor WIND over the period 2000-2015; the intensity of vegetation factors and meteorological factors in terms of transport dispersion is stronger than that of natural sources in the middle and late stages. For socioeconomic factors, all factors have a very significant positive correlation with PM2.5, and the correlation has strengthened over time; the impact of human agricultural and urban activities on PM2.5 concentration is stronger than the overall socioeconomic activities, and its intensity approaches the intensity of some natural factors, indicating that human activities and urban development have an increasingly prominent influence on PM2.5 concentration. Overall, the relationships between PM2.5 concentration and natural factors are stronger than those relating to socioeconomic factors. However, over time, the influence of human activities on the PM2.5 concentration has gradually increased.

Factor Detector Analysis
The degree of influence (q value) of the spatial distribution of each influencing factor derived from the factor detector for the spatial differentiation of PM2.5 concentration indicates that all of the influencing factors have a significant impact on PM2.5 concentration (p < 0.01) and follow the sequence: natural factors > socioeconomic factors (Table 2)

Correlation Analysis
Determination of Pearson's correlation coefficient values indicate that the PM 2.5 concentration is significantly correlated with the influencing factors (Table A2). For natural factors, there is a significant positive correlation between TEM, PUNL, and PM 2.5 concentration, whereas the remaining factors showed a significant negative correlation. Moreover, the factor with the greatest influence on PM 2.5 concentration shifted from the natural source factor PUNL to the meteorological factor WIND over the period 2000-2015; the intensity of vegetation factors and meteorological factors in terms of transport dispersion is stronger than that of natural sources in the middle and late stages. For socioeconomic factors, all factors have a very significant positive correlation with PM 2.5 , and the correlation has strengthened over time; the impact of human agricultural and urban activities on PM 2.5 concentration is stronger than the overall socioeconomic activities, and its intensity approaches the intensity of some natural factors, indicating that human activities and urban development have an increasingly prominent influence on PM 2.5 concentration. Overall, the relationships between PM 2.5 concentration and natural factors are stronger than those relating to socioeconomic factors. However, over time, the influence of human activities on the PM 2.5 concentration has gradually increased.

Factor Detector Analysis
The degree of influence (q value) of the spatial distribution of each influencing factor derived from the factor detector for the spatial differentiation of PM 2.5 concentration indicates that all of the influencing factors have a significant impact on PM 2.5 concentration (p < 0.01) and follow the sequence: natural factors > socioeconomic factors ( Table 2). Among the 13 selected factors, TEM has the highest degree of influence in 2000, 2005, and 2010, with q values of 0.39, 0.35, and 0.35, respectively. The degree of influence of WIND increased sharply in 2015, with the q value reaching 0.37. It is also worth noting that the intensity of the influence of POP on PM 2.5 is stronger than that of human regional activities, and the degree of influence of agricultural activities is stronger than that of urban activities, and this evidence is different from what is displayed by the correlation coefficients. In addition, the absolute value of the correlation coefficients is generally larger than the q value, and the difference between the q value of each influencing factor is larger than that of the correlation coefficients, which indicates that the effect of each influencing factor on PM 2.5 is dominated by the linear effect and that the nonlinear effect of natural factors is stronger than the socioeconomic factors.

Interaction Detector Analysis
Interaction detectors are applied to identify the interaction between two influencing factors on the PM 2.5 concentration among the 13 selected influencing factors (Table A3). The results show that there are two types of interactions between each pair of influencing factors during the period 2000-2015, viz., bivariate enhancement and nonlinear enhancement. As shown in Table A3, it is clear that the explanatory ability of the interaction between most factors on the spatial distribution of the PM 2.5 concentration is enhanced nonlinearly. Over time, the number of dual-factor enhanced factor pairs gradually increased from 30 pairs in 2000 to 40 pairs in 2015, and the increasing factor pairs are mainly those between WIND and other factors, PGRL and socioeconomic factors. This indicates that the interaction between WIND, PGRL, and other factors has weakened, while the individual influence of each factor has been enhanced.
Factor pairs with interactive q statistic values greater than 0.5 occur between natural factors, as well as between socioeconomic factors and temperature and precipitation. Temperature and precipitation have the greatest number of high interaction factor pairs, and the interaction with temperature is the most frequent in terms of the number of occurrences. Among the socioeconomic factors, POP, GDP, and PFAL all exhibit strong interactions with precipitation. Among all factors, natural factors are still the dominant force driving the variation in PM 2.5 concentration. However, the factor pairs with interaction q statistic values greater than 0.5 among the influencing factors were significantly reduced, and the difference between q values gradually decreased in 2015. The weakening of the interaction between natural factors and the strengthening of the interaction between human activities and social factors represent a change in the dominant force of natural factors and the synergistic force of socioeconomic factors (i.e., the weakening of the dominant force of nature and the strengthening of the force of human activities and social factors), and the gradual decrease in the difference in the forces of interaction reflects the gradual balance in the interaction between the factors influencing PM 2.5 concentrations.

Multiscale Geographically Weighted Regression Model Analysis
The spatial bandwidth derived from the MGWR model represents the change in the spatial scale of the relationship between each influencing factor and PM 2.5 concentration, and the bandwidth can be considered as the number of samples included in the local calculation. The bandwidth size determines whether the relationship between each factor and the dependent variable is local, regional, or global, on the one hand, and expresses the degree of spatial stationarity and spatial heterogeneity of each relationship, on the other hand. To classify the spatial scales of bandwidth, the number of spatial units is obtained by calculating the ratio of the total number of samples to the number of samples in the local calculation and rounding it. Comparing the number of spatial units with the number of administrative divisions at all levels in China divides that the larger regional scale > provincial division, provincial division > smaller regional scale > municipal scale, and local scale < municipal scale.
The bandwidth derived from the MGWR for 2000-2015 shows that the spatial scale at which each influencing factor operates varies (Table 3). During the period 2000-2015, only WIND, GDP, and NLI showed a shift in spatial scale range, showing the instability of spatial heterogeneity, and the fluctuation of spatial scale range only appeared at the smaller regional scale with the smaller regional scale; then, perhaps the provincial level is the key control range for WIND, GDP, and NLI factor. All other factors fluctuate and change within the same spatial scale, and the spatial heterogeneity is more stable, while the ELE has the least variation in the range of influence, indicating that the spatial heterogeneity is both stable and high and is strongest in spatial non-stationarity. The spatial influence range of vegetation factor, meteorological factor, and natural source factor is the smaller regional scale, but PUNL fluctuates more strongly in the smaller regional scale and is close to the provincial scale. Among the socioeconomic factors, the spatial influence range of POP and PCOL is the larger regional scale, and the influence range of POP is larger under the influence of population migration, while the larger regional spatial influence range of urban activities (PCOL) indicates that construction land plays an important role in the crossregional transmission of PM 2.5 . It is noteworthy that agricultural activities represented by cropland have a pattern similar to the variation in NDVI, suggesting that crops and vegetation have similar effects. In general, the influence range of natural factors tends to be closer to the provincial scale at the smaller regional scale, while the influence range related to human activities is dominated by a large regional scale and forms a range fluctuation. In assessing the statistical significance of the regression coefficients, at least 66.47% of the regression coefficients passed the significance test (p < 0.05) from 2000 to 2015, indicating that the regression coefficients in general are credible. Regression coefficients for each influencing factor derived from the MGWR model are shown in Table 4. In terms of the coefficient maximum ratio in all of the data, in which PCOL has a single positive effect on PM 2.5 nationwide, WIND has a single negative effect on PM 2.5 except in 2015, while other factors act in opposing directions in different regions much of the time. The effects of each factor on the degree of spatial heterogeneity in PM 2.5 vary. The range of coefficients for all stages of ELE is the largest, illustrating the strongest degree of spatial heterogeneity. In contrast, NLI in 2000-2010 and PCOL in 2015 exhibit a low range of coefficient values, thereby suggesting minimal spatial heterogeneity degree, which is consistent with the bandwidth results. In terms of the average value of various indicators, the factor that has the greatest impact on PM 2.5 changes from PRE (2000) to WIND (2005 and 2010) to PFAL (2015). Considering the regression coefficients for socioeconomic factors, the effect of human activity at the regional scale, represented by cultivated land and construction land, is much stronger than for all the regional socioeconomic indicators; the sequence of the intensity of effect is PFAL > PCOL.

The Advantages of the MGWR Model in Quantifying the Relationship between Factors and PM 2.5 Concentration
Referring to Tu et al. [27], the coefficient of determination (R 2 ), adjusted R 2 , Akaike information criterion (AIC), corrected Akaike information criterion (AICc), and residual sum of squares (RSS) were selected to compare the fit of the MGWR model degree (Table 5). Through comparison, it is found that the R 2 and adjusted R 2 of the MGWR are much higher than the OLS model, while the AICc and RSS are much lower than the OLS model, which indicates that the MGWR model performs better in the goodness of fit for spatial elements. This has also been shown before, in the comparison of model advantages of the OLS and the MGWR in the evaluation of the impact of air pollution in China by Fotheringham et al. [67], although in this case the degree of change in the AICc and the RSS from the OLS to the MGWR model is, however, much greater than in that study. In addition, the fitting ability of the MGWR model applied in our study has emerged as superior to that of the GWR [27], PCA-GWR [64], and SEM [28] models.
A previous study also used the global Moran's I to perform analysis of the residuals of models to determine consistency of the residual results against the assumptions [68] and to determine whether the model has the ability to solve the problem of spatial autocorrelation. In this study, the Moran's I of the residuals (MIR) indicates very significant positive spatial autocorrelation in the OLS model. However, the values of the MIR in the MGWR model show a very low degree of spatial autocorrelation. This indicates that the MGWR model can solve for the spatial autocorrelation problem well in quantifying the spatial relationship between various factors and PM 2.5 concentration [27,67,68]. By comparison with Tu et al. [27] the MIR of the MGWR model in this study was closer to 0 and more random, which effectively eliminates the spatial autocorrelation of each variable and indicates that the results are more reliable.

Drivers of PM 2.5 Concentration in China
The MGWR model achieves the characteristic of spatial heterogeneity of the relationship between factors and PM 2.5 at different spatial locations, focusing on solving the spatial non-stationarity of the relationship between factors and PM 2.5 , as manifested by the spatial distribution of the regression coefficients of each factor from the model output. In this part, the spatial effects of each driver of PM 2.5 concentration in the Chinese region are explored by taking the MGWR model output coefficients in 2015 as an example.
There are significant differences in the way natural factors and socioeconomic factors affect PM 2.5 . The relationship between each factor and PM 2.5 and its absolute intensity, spatial act extent scale, spatial trend of coefficients, and major influence areas can be obtained from Table 6, and spatial visualization information can be seen in Figure 3 (only for sample points in which all significance tests of the factor coefficients passed). In terms of influence range, the scope of human socioeconomic and urban activities is obviously wider than that of natural factors. In terms of the direction of influence, the socioeconomic factors act mainly in one direction, while the natural factors have positive and negative spatial non-stationarity on PM 2.5 concentrations across China. In this way, the non-stationarity of the spatial relationship between PM 2.5 and its influencing factors is relatively strong.    Taken at the largest spatial scale (i.e., China as a whole), the degree of influence was determined by the average of the absolute values of each factor coefficient, and the order of intensity of each factor on PM 2.5 was agricultural activity factor > natural source factor > topography factor > vegetation factor > urban activity factor > meteorological factor > socioeconomic activity factor (Table 6). PUNL as a natural source factor is the most influential of the natural factors on PM 2.5 , which may be associated with the sand and dust activities in the large deserts in the northwest and southwest. Moreover, the overall vegetation factor (NDVI) was observed to have a stronger effect on PM 2.5 than the single vegetation factor (PFOL), which may be related to the strong controlling effect of vegetation on dust activity in northwest China [69]. Another finding is that the degree of influence of arable land versus construction land is widely different but similarly stronger than meteorological factors and socioeconomic activities. Both studies at both the national and urban regional scale have indicated that farmland and construction land both play important roles in PM 2.5 concentration [48,50], then different land management of urban and agricultural land may be more effective than economic management for PM 2.5 management more effectively and with more accessibility than meteorological control.
The factors with the top two absolute values of the coefficients at each location were considered as the determining influencing factors at that location. The map shown in Figure 4 indicates the spatial heterogeneity of determining factors in China, and different colors represent different factors. In terms of the spatial proportions controlled by each principal factor, the topographic, vegetation (NDVI), meteorological (TEM), natural source factor, and agricultural activity factor are the factors that need obvious attention in the Chinese region. However, it should be noted that other pollutants (especially secondaryformed sulfate particulate matter) account for a considerable proportion of fine particulate matter (PM 2.5 ), with a contribution of 36.1% from secondary sources in Beijing [70]. This study did not include secondary particulate matter in the model calculations and did not distinguish artificial sources separately. The spatial distribution of the main control factors only distinguishes the strength of the spatial relationship between different natural and socio-economic factors, and the PM 2.5 concentration can identify the key factors in the variation of PM 2.5 concentration from the perspective of spatial heterogeneity and cannot distinguish absolutely the spatial location of natural and anthropogenic sources.  Geomorphology is thought to influence PM2.5 accumulation and enhance pollution [17], and some studies indicate that topographic factors have the greatest effect on PM2.5 concentrations in southwest China [30], which overlaps spatially with the ELE spatial distribution results in Figure 4, but southwest China has low pollution levels except for the Sichuan basin, which is not a pollution management control area. In contrast, the pollution in Xinjiang region is serious, and the ELE on PM2.5 is clearly visible in the basin topography effect.
It is widely accepted that vegetation plays an important role in alleviating PM2.5 pollution for a range of reasons, and in this study, it can be shown that the intensity of vege- Geomorphology is thought to influence PM 2.5 accumulation and enhance pollution [17], and some studies indicate that topographic factors have the greatest effect on PM 2.5 concentrations in southwest China [30], which overlaps spatially with the ELE spatial distribution results in Figure 4, but southwest China has low pollution levels except for the Sichuan basin, which is not a pollution management control area. In contrast, the pollution in Xinjiang region is serious, and the ELE on PM 2.5 is clearly visible in the basin topography effect.
It is widely accepted that vegetation plays an important role in alleviating PM 2.5 pollution for a range of reasons, and in this study, it can be shown that the intensity of vegetation factor in the northwest is almost twice as strong as that in the eastern and southern regions (Figure 3b). While desertification is serious in the northwest, vegetation cover is low, and dust storms are the main sources of PM 2.5 concentration in the region [71]; therefore, the vegetation in the northwest exhibits a strong negative effect on dust storm activity by stabilizing the sandy surfaces [69] and increasing vegetation cover through afforestation is regularly practiced in the control of PM 2.5 in northwest China [72]. Meanwhile, for PM 2.5 natural sources, the proportion of natural sources (sand storms) of PM 2.5 in northwestern cities increases from east to west and is less directly affected by human activities [11], and from the distribution of the main first control factor (Figure 4a) and the second control factor (Figure 4b) in the northwest, the trend of the intensity of the impact of unused land is consistent with the trend of the proportion of natural sources, but in some regions, the impact of cropland and vegetation is stronger. This may depend on the prevalence of planting vegetation to provide root anchorage and wind break to stabilize unused land surface in northwest China or through the use of straw lattice panels, which can significantly reduce PM 2.5 pollution levels [69,71]. Li and Huang [48] also indicated that agricultural activities in small urban cities are more important sources of pollution compared to large cities.
The effect of PUN is shown to be the first control factor in the south, while it is the second control factor in the east (Figure 4). In the southern and eastern regions, the occurrence of even relatively small areas of unused land appears to have a strong positive effect on pollution, indicating that any increase in the area of unused land in this region may cause a substantial change in PM 2.5 concentrations. It is necessary, therefore, to consider carefully the transfer of unused land to prevent deterioration in air quality. It is important to note the impact of cultivated land in the central-eastern region. The scope of the contribution of cropland to PM 2.5 concentrations is limited by the space of agricultural activities and the effect of crop cover [73], and its bandwidth is shown to be narrower than that of other human activities. In addition, cultivated land is a dynamic source-sink landscape, and rapid changes in land use/cover and human activities bring about changes in temperature and humidity constraints on atmospheric dispersion in the central and eastern regions [74], reinforcing the cropland source effect.
In addition, meteorological factors tend to interact with topography [13], and the negative effect of TEM on PM 2.5 is more pronounced in southwest China (Figure 3e), which may be due to atmospheric convection caused by higher temperatures here that accelerates the diffusion of particulate matter [20]. The positive high intensity of TEM makes TEM the dominant control condition in northeast China (Figures 3e and 4a), according to Li et al. [75]. The Shenyang study attributed the positive TEM effect to the effect of inversion temperature on PM 2.5 diffusion triggering accumulation. While PRE and WIND among meteorological factors are weak in intensity compared to other factors, the role of PRE in northwest of Xinjiang is not negligible (Figure 3d). Wang et al. [13] pointed out that strong precipitation has a significant scavenging effect by wet deposition on PM 2.5 concentrations and weak precipitation and rain and fog processes may trigger air stagnation conditions. Although the annual precipitation under arid climate conditions in Northwest China is low, the intensity of precipitation is mainly high intensity and moderate intensity [76], while the intensity and frequency of precipitation show an increasing trend [77], and the important scavenging role of strong precipitation on PM 2.5 in Northwest China can be speculated.
Spatial trends in GDP and NLI (Figure 3h,i) correlate with the spatial pattern of urbanization development in China. The results suggest that controlling pollution to improve atmospheric quality can be achieved in parallel with economic development, and the improvement of the integrated level of cities (population structure, economic structure, traffic layout, etc.) can mitigate the degree of PM 2.5 pollution. Earlier studies applied a semiparametric spatial autoregressive model to show that the Environmental Kuznets Curve (EKC) for Chinese cities has crossed the inflection point and again suggests that higher urbanization levels can alleviate haze pollution [46]. The effect of population density on PM 2.5 , on the other hand, needs special treatment because the positive effect of population impact contrasts with the population density distribution [78], suggesting that a particular increase in population may trigger a greater increase in PM 2.5 concentration in the more sparsely populated western areas, while the role of anthropogenic pollution sources in the east is weakened.  [79], confirming that natural conditions act to regulate PM 2.5 on a large spatial scale. However, at large scales, natural factors are not easily manipulated or managed and may not therefore be suitable targets for longer-term governance measures. In terms of socioeconomic factors in general, population tends to make a dominant contribution while, on the other hand, agricultural activities are important contributing activities to PM 2.5 in China. Moreover, the interaction between socioeconomic and natural factors has been shown to play an important and significant role in the PM 2.5 concentration change system [80]. It should therefore be possible to consider the identified factor pairs which exhibit strong interactions between natural factors and anthropogenic factors in managing aerosol pollution. Anthropogenic drivers are shown here to be gradually increasing in dominance over all other factors as demonstrated, for example, by the interaction detector model (Table A3), which clearly shows that the interaction between GDP, POP, PRE, and TEM can explain as much as 55.1% of PM 2.5 . Interactions between PFAL, TEM, and WIND can also exceed 50%, demonstrating nonlinearity of response. It is therefore suggested that the structure and spatial distribution of industry, population and distribution of farmland, for example, should be considered if the goal of regulating PM 2.5 pollution is to be achieved.
This study has demonstrated that most PM 2.5 influencing factors are scale dependent (Table 3) and that, as PM 2.5 pollution increases, the influence of regional transmission is enhanced, while the influence of local and surrounding emission sources is reduced [10]. In addition, several econometric spatial analyses factors that influence PM 2.5 in urban agglomerations and economic zones have emphasized the spatial spillover effect of PM 2.5 pollution [28,29], indicating that the key to the control of PM 2.5 pollution in China lies in joint governance between regions. Accordingly, the advantages of the spatial nonstationarity of the MGWR and the division of spatial act scales in this study can provide a reference for categorizing joint governance strategies level by level and provide more detailed management decisions for joint regional governance rather than a one-size-fits-all approach for all factors and all regions. For example, NDVI, PFAL, PUNL, TEM, PRE, and PCOL are selected as the key regulatory conditions of the treatment measures according to the main factor types and spatial distribution, and the strong action area of each factor is used as the center; then, the scope of treatment is determined according to its spatial bandwidth whether it is a joint treatment of cities in the province or a joint treatment of regions between provinces. Specifically, management needs to emphasize the limiting role of natural conditions and continue to promote vegetation restoration projects in the western parts of the country, while not neglecting the role of arable land contribution in small and medium-sized cities. In eastern China, the spatial layout of arable and construction land needs to be considered in relation to the distribution of industry and the source-sink system in the landscape so as to account for the interaction with natural conditions; at the same time, arable land and construction land should be managed separately and in a focused manner according to the spatial location of their intensity centers and their scale of action. In general, the study indicates that within a region, a reasonable policy should be applied to plan its control center and joint scope according to specific regional natural conditions, in the context of dominant human activities in a region such as industry, agriculture, and population etc., and to plan its important control and regulation factors by factor and sub-region to better achieve regional prevention and control of atmospheric pollution objectives.

Limitations and Future Work Directions
Without doubt, this study has some limitations. More detailed socioeconomic data including information on road traffic, industrial structure, and anthropogenic emissions are not readily available in a rasterized format and were excluded in this study. The mechanisms influencing PM 2.5 impact mechanism are complex, and the selected impact factors appear to be multi-correlated. While this study uses geographic detectors to analyze the interaction among factors, how the multi-correlation between variables affects the model should be considered in the future. Multiple validation of the linear and nonlinear relationships of influencing factors on PM 2.5 would be a fruitful research direction. In the future, as the understanding of the geospatial relationship between each factor and PM 2.5 increases, remote sensing inversion for PM 2.5 may provide corresponding information to improve the inversion model. Revealing important spatial relationships, such as vegetation factors, can be done by extracting the corresponding spectral information from satellite remote sensing in multiple directions in more detail to obtain special data products (vegetation structure, vegetation growth, crop growth) to optimize model variables.

Conclusions
This study was conducted to explore the interaction, spatial heterogeneity and multiscale interaction between natural and socioeconomic factors and atmospheric PM 2.5 concentrations in China using geographical detector and MGWR modelling. The results indicate that the PM 2.5 concentration has followed a sequence of rapid deterioration-deterioration mitigation-slight improvement during 2000 to 2015, and obvious spatial agglomeration characteristics are evident over the period. A clear spatial pattern of PM 2.5 expansion across the Hu Huanyong line from the southeast coast to the northwest interior is apparent. Generally, the influence of natural factors on PM 2.5 is stronger than socioeconomic factors; the effect of each influencing factor on PM 2.5 is dominated by the linear effect, and the interaction type of each factor is dominated by bivariate enhancement and nonlinear enhancement, exhibiting strong interactions between natural factors and anthropogenic factors. Simultaneously, the prominent influence of natural factors is shown to weaken over time, while the effects of human activities and social factors are increasing. The contribution of socioeconomic activities in driving the system must be taken into account while agricultural activities also make important contribution to PM 2.5 . The analysis also demonstrates the scale dependency of many of these processes and interactions over the period 2000-2015; the influence range of natural factors tends to be closer to the provincial scale at the small regional scale, while the influence range related to human activities is dominated by a large regional scale and forms a range fluctuation. There are significant differences in the way natural factors and socioeconomic factors affect PM 2.5 , and the non-stationarity of spatial relationships is stronger. Topographic, vegetation (NDVI), meteorological (TEM), natural source factors, and agricultural activity factors are the factors that need significant attention in Chinese regions. In addition, the study verified that higher urbanization levels can mitigate haze pollution, while the population in western regions needs to be treated with caution, and differential land management between arable land and construction land is recommended. Management decisions for atmospheric PM 2.5 concentrations should take into account both spatial heterogeneity and scale and should use reasonable policies to plan their control centers and joint scopes according to specific regional natural conditions and in combination with dominant human activities. The results here provide important information for the prevention and control of regional PM 2.5 pollution in China with a categorized joint management strategy level by level.
Author Contributions: Conceptualization, X.X. and T.W.; methodology, L.Z.; resources, J.Z. and G.J.; writing-original draft, L.Z. and T.W.; supervision, M.E.M. and C.W.; project administration, X.X. and L.P.; funding acquisition, T.W. and J.Z. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments:
The authors would like to thank Atmospheric Composition Analysis Group [52] (http://fizz.phys.dal.ca/ accessed 22 September 2020), National Earth System Science Data Center, National Science & Technology Infrastructure of China [54] (http://www.geodata.cn/ accessed on 30 May 2021), the Resource and Environmental Science and Data Center of the Chinese Academy of Sciences [53] (http://www.resdc.cn/ accessed 22 September 2020) for providing data support. The national boundary data of China used in this study were provided by National Geomatics Center of China (http://www.ngcc.cn/ngcc accessed on 22 September 2020). The authors are grateful to the comments provided by the Editor and three anonymous reviewers, which were used to improve the quality of this work.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A Table A1. The category and determination of the interaction between two factors.