Factors Underlying Spatiotemporal Variations in Atmospheric PM 2.5 Concentrations in Zhejiang Province, China

: Fine particulate matter in the lower atmosphere (PM 2.5 ) continues to be a major public health problem globally. Identifying the key contributors to PM 2.5 pollution is important in monitoring and managing atmospheric quality, for example, in controlling haze. Previous research has been aimed at quantifying the relationship between PM 2.5 values and their underlying factors, but the spatial and temporal dynamics of these factors are not well understood. Based on random forest and Shapley additive explanation (SHAP) algorithms, this study analyses the spatiotemporal variations in selected key factors inﬂuencing PM 2.5 in Zhejiang Province, China, for the period 2000–2019. The results indicate that, while factors inﬂuencing PM 2.5 varied signiﬁcantly during the period studied, SHAP values suggest that there is consistency in their relative importance as follows: meteorological factors (e.g., atmospheric pressure) > socioeconomic factors (e.g., gross domestic product, GDP) > topography and land cover factors (e.g., elevation). The contribution of GDP and transportation factors initially increased but has declined in the recent past, indicating that economic and infrastructural development does not necessarily result in increased PM 2.5 concentrations. Vegetation productivity, as indicated by changes in NDVI, is demonstrated to have become more important in improving air quality, and the area of the province over which it constrains PM 2.5 concentrations has increased between 2000 and 2019. Mapping of SHAP values suggests that, although the relative importance of industrial emissions has declined during the period studied, the actual area positively impacted by such emissions has actually increased. Despite developments in government policy, greater efforts to conserve energy and reduce emissions are still needed. The study further demonstrates that the combination of random forest and SHAP methods provides a valuable means to identify regional differences in key factors affecting atmospheric PM 2.5 values and offers a reliable reference for pollution control strategies.


Introduction
Urbanization, industrialization, and other human activities, coupled with climate change, have greatly affected air quality [1][2][3][4][5]. Fine particulate matter with an aerodynamic diameter of less than 2.5 µm, known as PM 2.5 , may be suspended in the atmosphere for a considerable period and can be transported over long distances [6]. In addition to their direct effect on the respiratory tract, these aerosols may absorb toxic substances, which further intensify their negative impact on human health [7][8][9]. Persistent PM 2.5 pollution

Study Region
Zhejiang Province (27 • 12 N-31 • 31 N, 118 • 01 E-123 • 10 E), lies to the south of the Yangtze River Delta in southeastern China ( Figure 1) and forms an important part of the large urban agglomeration in China [36,37]. The province is characterized by a subtropical monsoon climate with mean annual temperature between 15 • C and 18 • C, and mean annual precipitation between 1000 and 2000 mm [38]. Straddling the Jinqu basin in the central part of the province are the more mountainous areas to the southwest and the coastal plains which characterize the north [39]. Although the province only occupies 105,500 km 2 and accounts for just 1.1% of the total area of the country, its permanent population reached 58.5 million in 2019 with an annual gross regional production value Remote Sens. 2021, 13, 3011 3 of 20 of more than 6 billion CNY (Zhejiang Statistical Yearbook 2020). As a result of rapid urbanization and industrialization, with economic growth based on energy derived largely from fossil fuel, the region is associated with significant PM 2.5 -related problems [40]. The entire province has experienced persistent air pollution since the 1970s, and there are frequent occurrences of severe smog, for example the extreme event of December 2013 that lasted several weeks [41].
Remote Sens. 2021, 13, x FOR PEER REVIEW 3 of 21 nual precipitation between 1000 and 2000 mm [38]. Straddling the Jinqu basin in the central part of the province are the more mountainous areas to the southwest and the coastal plains which characterize the north [39]. Although the province only occupies 105,500 km 2 and accounts for just 1.1% of the total area of the country, its permanent population reached 58.5 million in 2019 with an annual gross regional production value of more than 6 billion CNY (Zhejiang Statistical Yearbook 2020). As a result of rapid urbanization and industrialization, with economic growth based on energy derived largely from fossil fuel, the region is associated with significant PM2.5-related problems [40]. The entire province has experienced persistent air pollution since the 1970s, and there are frequent occurrences of severe smog, for example the extreme event of December 2013 that lasted several weeks [41].

Data Sources
Five types of data were used, including atmospheric PM2.5 data, meteorological data, topography and land cover data (including a vegetation index), socioeconomic data, and administrative boundaries (Table 1).
(1) PM2.5 data retrieved from remote sensing imageries. Considering the fact that the monitoring stations in China were established in 2013, satellite-based aerosol optical depth (AOD) data with abundant historic archives were adopted. Researchers have found a high correlation between AOD and observed PM2.5 measurements, in addition to its advantages of low cost, wide spatial coverage, and high simulation accuracy [42]. The satellite-derived PM2.5 concentrations (µ g/m 3 ) dataset for China is freely available from the Hong Kong University of Science and Technology at: http://envf.ust.hk/dataview/aod2pm/current/ (accessed on 20 April 2021) and has been regularly utilized in studies of air pollution [43]. The PM2.5 data at a spatial resolution of 0.03° × 0.03° were obtained in this study for the period 2000 to 2019 (Due to the lack of PM2.5 data in parts of eastern

Data Sources
Five types of data were used, including atmospheric PM 2.5 data, meteorological data, topography and land cover data (including a vegetation index), socioeconomic data, and administrative boundaries (Table 1).
(1) PM 2.5 data retrieved from remote sensing imageries. Considering the fact that the monitoring stations in China were established in 2013, satellite-based aerosol optical depth (AOD) data with abundant historic archives were adopted. Researchers have found a high correlation between AOD and observed PM 2.5 measurements, in addition to its advantages of low cost, wide spatial coverage, and high simulation accuracy [42]. The satellite-derived PM 2.5 concentrations (µg/m 3 ) dataset for China is freely available from the Hong Kong University of Science and Technology at: http://envf.ust.hk/dataview/aod2pm/current/ (accessed on 20 April 2021) and has been regularly utilized in studies of air pollution [43]. The PM 2.5 data at a spatial resolution of 0.03 • × 0.03 • were obtained in this study for the period 2000 to 2019 (Due to the lack of PM 2.5 data in parts of eastern Zhejiang province, Zhoushan city is omitted from the study), and the PM 2.5 concentration below refers to the annual average value. All data values were normalized to the same coordinates and spatial resolution of 3 km × 3 km using ArcGIS 10.

Random Forest
Random forest is a machine learning algorithm that uses a combination of several random decision trees, where each tree is generated in a specific way to induce diversity, and all predictions are formed by voting [45]. The bootstrap aggregation technique, known as bagging, is used to achieve higher accuracy and reduce overfitting [46]. In regression, the algorithm takes the average of the individual predictions as the predicted outcome. The random forest method has been shown to be efficient and to provide high levels of predictive accuracy in situations where potential variables have complex relations [47], such as is the case for factors influencing PM 2.5 concentrations. In this study, PM 2.5 concentration is treated as the response variable, and the explanatory variables are influencing factors.
Parameter selection and adjustment are necessary for random forest to prevent overfitting and minimize complexity. In this study, 10-fold cross-validation was employed to find the most suitable parameters, including the number of trees (n_estimators), the maximum depth of each tree (max_depth), the minimum number of samples to split a node (min_sample_split), and the maximum number of features for the best split (max_features). The grid search mode was used during the process to evaluate model performance [48]. The datasets were randomly divided into 10 subsets; 9 were used as training data, and the other as validation data. This process was repeated 10 times across the dataset, and error rates averaged to obtain the results. Three metrics, viz., mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R 2 ), were used in this study to evaluate model performance [49]. All models run were conducted in Python with the scikit-learn package.

Shapley Additive Explanation (SHAP)
Machine learning algorithms may yield powerful predictive models, but understanding why a model makes a particular prediction is as important as its accuracy, and the black-box nature of random forest needs to be resolved [44]. To understand the physical mechanisms, Lundberg and Lee [50] proposed Shapley additive explanations (SHAP) based on game theory. SHAP specifies the explanation as follows, in which the explanatory model g is a linear function of the feature attribution: where z' ∈ {0, 1} M , and M represents the number of simplified input features. φ i ∈ R indicates the feature attribution for feature i, the SHAP values. Intuitively, the model g can be used to interpret both a single prediction and the entire model based on the average feature attribution across all the observations, and therefore, the method resolves the relative importance of the various features under consideration. Before calculating SHAP values, it is necessary to fit one model f S∪{i} involving factor i, and another one, f s , without it. The difference in input x between these models is the marginal contribution of factor i. When more than one factor is being considered, contributions depend on interactions with the other factors such that the procedure is repeated for the complete set (i.e., all possible subsets S ⊆ F including the empty set and the set F of all factors). The final factor contribution φ i ∈ R is the weighted average of all marginal contributions [51]: The final model prediction is then obtained by summing up SHAP values for each input feature. SHAP values can be negative since every single SHAP value of each point is calculated relative to the average value. A positive SHAP value means that the prediction (PM 2.5 ) based on the corresponding influencing factor is above the mean value (the average PM 2.5 ). The relative importance of each variable is represented by their mean absolute SHAP values [52]. Advantages of the SHAP algorithm include: (1) global interpretabilitythe collective SHAP value can identify positive or negative relationships for each variable, and the global importance of different features can be calculated by computing their respective absolute SHAP values; (2) local interpretability-each feature acquires its own corresponding spatial SHAP value in different locations; this resolves the limitation of traditional methods of evaluating relative importance whereby results are obtained across the entire region or population but not for each pixel and individual [53]. Therefore, SHAP values can be used to represent the relative importance of potential influencing factors. Here, the mean absolute SHAP value (global SHAP value) is calculated to describe the importance of each factor for significance comparison, and local SHAP values in different positions are used to demonstrate their spatial contribution variation.
Traditional Shapley regression is time-consuming since a large number of possible feature combinations have to be included. However, faster computation with a high level of accuracy is possible, as in this study, using the SHAP framework with tree-based model. All SHAP values were computed using the "shap" package in Python 3.7. basin, and Taizhou City along the southeast coast. By 2005, the highly polluted area around Hangzhou city had extended outwards to its suburbs, while severe pollution in the western basin and southeast coast had also expanded significantly. However, the trend of increase in the areas exposed to high levels of pollution reversed after 2010 such that, by 2019, there were no parts of the province where mean annual PM2.5 values exceeded 50 μg/m 3 . According to the ambient air quality standards of China (GB3095-2012), mean annual PM2.5 levels greater than 35 μg/m 3 represent a health hazard, so it is important to consider the spatiotemporal dynamics of areas where this value is exceeded. Until 2015, areas with values above this threshold were located mainly in the northern parts of the province with the exception of the eastern coastal plain and Tiantai and Siming Mountains in the east, and Tianmu Mountain in the northwest. By 2019, other than in the more industrialized eastern coastal plain areas of Wenzhou and Taizhou, and the Hangjiahu Plain and Jinqu Basin, the province in general experienced mean annual PM2.5 concentrations below 35 μg/m 3 .   In 2000, the areas with very high pollution levels (i.e., mean annual PM 2.5 > 50 µg/m 3 ) were clustered around Hangzhou, situated in the low-lying plains to the north, Quzhou City and Jinhua City in the western basin, and Taizhou City along the southeast coast. By 2005, the highly polluted area around Hangzhou city had extended outwards to its suburbs, while severe pollution in the western basin and southeast coast had also expanded significantly. However, the trend of increase in the areas exposed to high levels of pollution reversed after 2010 such that, by 2019, there were no parts of the province where mean annual PM 2.5 values exceeded 50 µg/m 3 . According to the ambient air quality standards of China (GB3095-2012), mean annual PM 2.5 levels greater than 35 µg/m 3 represent a health hazard, so it is important to consider the spatiotemporal dynamics of areas where this value is exceeded. Until 2015, areas with values above this threshold were located mainly in the northern parts of the province with the exception of the eastern coastal plain and Tiantai and Siming Mountains in the east, and Tianmu Mountain in the northwest. By 2019, other than in the more industrialized eastern coastal plain areas of Wenzhou and Taizhou, and the Hangjiahu Plain and Jinqu Basin, the province in general experienced mean annual PM 2.5 concentrations below 35 µg/m 3 .
In order to avoid issues of bias and over-fitting in the application of the random forest method, 10-fold cross-validation of the optimal model was repeated 20 times to obtain the final result ( Table 2). The coefficient of determination (R 2 ) refers to the strength of the linear relationship between variables. The nearer that R 2 is to 1, the more the independent variable can account for variations in the dependent variable. The R 2 values for the five selected years exceed 0.96, indicating that the input factors are strongly correlated with Remote Sens. 2021, 13, 3011 8 of 20 the dependent variable (PM 2.5 ). Root mean square error (RMSE) is the square root of the mean square error between the fitted data and corresponding sampling points of the original data, and smaller values indicate more satisfactory predictions. Model RMSE values lie below 1.65 µg/m 3 , showing that the error between the predicted PM 2.5 and the original value is acceptable. Mean absolute error (MAE) is used to evaluate the accuracy of predictions when comparing predicted data with actual data sets, whereby smaller values signify better performance. In this case, MAE values fall within a range, indicating that the predicted PM 2.5 values conform with the established dataset. The three cross-validation parameters suggest that the random forest algorithm exhibits a high degree of accuracy and therefore lays a sound foundation for interpreting the relative importance of the model's independent variables. In order to avoid issues of bias and over-fitting in the application of the random forest method, 10-fold cross-validation of the optimal model was repeated 20 times to obtain the final result ( Table 2). The coefficient of determination (R 2 ) refers to the strength of the linear relationship between variables. The nearer that R 2 is to 1, the more the independent variable can account for variations in the dependent variable. The R 2 values for the five selected years exceed 0.96, indicating that the input factors are strongly correlated with the dependent variable (PM2.5). Root mean square error (RMSE) is the square root of the mean square error between the fitted data and corresponding sampling points of the original data, and smaller values indicate more satisfactory predictions. Model RMSE values lie below 1.65 μg/m 3 , showing that the error between the predicted PM2.5 and the original value is acceptable. Mean absolute error (MAE) is used to evaluate the accuracy of predictions when comparing predicted data with actual data sets, whereby smaller values signify better performance. In this case, MAE values fall within a range, indicating that the predicted PM2.5 values conform with the established dataset. The three cross-validation parameters suggest that the random forest algorithm exhibits a high degree of accuracy and therefore lays a sound foundation for interpreting the relative importance of the model's independent variables.    categories all showed an initial increase in importance, but this reduced in the last 10 years. The influence of population density, on the other hand, exhibited an opposing trend, at first declining and then increasing towards 2019. Overall, socioeconomic factors exerted lower levels of influence than meteorological parameters, although PD, GDP, PV, and IGE seem to be of somewhat greater importance. Topography and land cover factors, DEM, NDVI, and WP, all had some degree of impact on PM 2.5 concentrations, but other land cover classes seem to have exerted relatively little effect. SHAP values for DEM initially declined and then increased again towards 2019, while values for NDVI and WP continued to increase gradually. The SHAP values were normalized to compare the contribution of three categories. In summary, Table 3 indicates the relative importance ranking of selected factors as: meteorological factors (0.60) > socioeconomic factors (0.30) > topography and land cover factors (0.10). The contribution of socioeconomic factors initially increased but has declined in the recent past, while meteorological factors exhibit the opposite trend; topography and land cover factor appear to exhibit gradually increasing influence over the period studied. Figure 4 shows mean absolute SHAP values of all indicators for the five selected years, whereby higher SHAP values represent a greater level of influence on the dependent variable, and Figure 5 presents the heatmap of SHAP values, indicating their temporal trends. Meteorological factors, PRS and SSD, have exhibited the strongest level of influence on PM2.5 concentrations in general over the period. We classified the socioeconomic factors into five categories: viz., industrial production (GDP), emission index (IGE), transportation factors (PV), energy consumption (ELC), and population (PD). SHAP values of the first four categories all showed an initial increase in importance, but this reduced in the last 10 years. The influence of population density, on the other hand, exhibited an opposing trend, at first declining and then increasing towards 2019. Overall, socioeconomic factors exerted lower levels of influence than meteorological parameters, although PD, GDP, PV, and IGE seem to be of somewhat greater importance. Topography and land cover factors, DEM, NDVI, and WP, all had some degree of impact on PM2.5 concentrations, but other land cover classes seem to have exerted relatively little effect. SHAP values for DEM initially declined and then increased again towards 2019, while values for NDVI and WP continued to increase gradually. The SHAP values were normalized to compare the contribution of three categories. In summary, Table 3 indicates the relative importance ranking of selected factors as: meteorological factors (0.60) > socioeconomic factors (0.30) > topography and land cover factors (0.10). The contribution of socioeconomic factors initially increased but has declined in the recent past, while meteorological factors exhibit the opposite trend; topography and land cover factor appear to exhibit gradually increasing influence over the period studied.  The SHAP algorithm differs from the way in which importance is estimated in the random forest method, as it not only uses absolute values but also computes density scatter plots to further analyze the direction and degree of influence of potential factors on PM 2.5 concentrations. This is illustrated in Figure 6, whereby the x-axis designates SHAP values (positive or negative) for the various factors, while the y-axis indicates the level of contribution of input variables collected from selected locations, and the color denotes the feature value from low (blue) to high (red). Since SHAP values contain all the samples' contributions in different locations, all influencing factors have a range of both positive and negative effects. The method, therefore, also illustrates the strength of the linear relationship between the selected influencing factors and PM 2.5 . For example, for GDP, whose importance ranked fourth in 2005, nearly half of the locations with high true values have a positive relation with PM 2.5 , while those with low values showed a negative impact.
Partial dependence plots of the variables reveal their marginal effect on the predicted outcome of random forest outcomes. In this study, the plots reveal the direction and strength of influence of the various factors on PM 2.5 , as shown in Figure 7 and Supplementary Figures S1-S3. Across all five years, there is generally a negative correlation between PM 2.5 concentrations and topography and land cover factors, including DEM and NDVI ( Figure S2a,b), the significance of which decreases as the factor value increases, while the opposite is the case for the socioeconomic variables ( Figure S3). There are, however, more complex patterns of correlation evidence in the meteorological variables. For example, PM 2.5 is positively correlated with SSD when values are relatively small, but this shifts to a negative correlation at higher values ( Figure S1b).   The SHAP algorithm differs from the way in which importance is estimated in the random forest method, as it not only uses absolute values but also computes density scatter plots to further analyze the direction and degree of influence of potential factors on PM2.5 concentrations. This is illustrated in Figure 6, whereby the x-axis designates SHAP values (positive or negative) for the various factors, while the y-axis indicates the level of contribution of input variables collected from selected locations, and the color denotes the feature value from low (blue) to high (red). Since SHAP values contain all the samples' contributions in different locations, all influencing factors have a range of both positive and negative effects. The method, therefore, also illustrates the strength of the linear relationship between the selected influencing factors and PM2.5. For example, for GDP, whose importance ranked fourth in 2005, nearly half of the locations with high true values have  Note: to compare the relative importance of meteorological factors (A), topography factors (B), and socioeconomic factors (C), the mean absolute SHAP value of each category is calculated, and the importance ratio is defined as A/(A + B + C), B/(A + B + C), and C/(A + B + C), respectively. "Mean importance ratio" is the average value of importance ratio for each category over multiple periods.

Spatial Variation of Influencing Factors Based on SHAP Values
Spatial variations in the SHAP values of influencing factors can be mapped in order to understand the relative contribution of these variables on PM 2.5 in different regions of Zhejiang Province as described below for groups of factors associated with meteorological conditions, land surface conditions, and socioeconomic variables for the five sample years. Figure 8a illustrates the spatial distribution of SHAP values for PRS and suggests that aerosol pollution was more positively influenced by this factor in the plains and basins, especially around Hangzhou, Taizhou, and Wenzhou, while values in the mountainous areas were less strongly positive during the study period. Areas with higher atmospheric pressure, such as in the Hangjiahu and coastal plains, appear to be associated with elevated PM 2.5 concentrations. Under such conditions, the regional transmission of aerosols is likely to be blocked, resulting in locally elevated values of PM 2.5 [4,54]. Meanwhile, lower atmospheric pressures, as occur in upland areas of the province, appear to constrain PM 2.5 concentrations [17]. Mountainous areas in the northwest, southwest, and central parts of Zhejiang province, which are windier in general, are characterized by reduced PM 2.5 concentrations [18,55]. Partial dependence plots of the variables reveal their marginal effect on the predicted outcome of random forest outcomes. In this study, the plots reveal the direction and strength of influence of the various factors on PM2.5, as shown in Figure 7 and Supplementary Figures S1-S3. Across all five years, there is generally a negative correlation between PM2.5 concentrations and topography and land cover factors, including DEM and NDVI ( Figure S2a,b), the significance of which decreases as the factor value increases, while the opposite is the case for the socioeconomic variables ( Figure S3). There are, however, more complex patterns of correlation evidence in the meteorological variables. For example, PM2.5 is positively correlated with SSD when values are relatively small, but this shifts to a negative correlation at higher values ( Figure S1b). Partial dependence plots of the variables reveal their marginal effect on the predicted outcome of random forest outcomes. In this study, the plots reveal the direction and strength of influence of the various factors on PM2.5, as shown in Figure 7 and Supplementary Figures S1-S3. Across all five years, there is generally a negative correlation between PM2.5 concentrations and topography and land cover factors, including DEM and NDVI ( Figure S2a,b), the significance of which decreases as the factor value increases, while the opposite is the case for the socioeconomic variables ( Figure S3). There are, however, more complex patterns of correlation evidence in the meteorological variables. For example, PM2.5 is positively correlated with SSD when values are relatively small, but this shifts to a negative correlation at higher values ( Figure S1b).

Spatial Variation of Influencing Factors Based on SHAP Values
Spatial variations in the SHAP values of influencing factors can be mapped in order to understand the relative contribution of these variables on PM2.5 in different regions of Zhejiang Province as described below for groups of factors associated with meteorological conditions, land surface conditions, and socioeconomic variables for the five sample years. with high temperatures and long sunshine duration, promote photochemical reactions and produce more precursors of PM2.5 and other secondary pollutants, thereby resulting in elevated PM2.5 concentrations [17]. The northern plains have the longest sunshine duration but lower surface temperatures, resulting in reduced accumulation of PM2.5, and therefore exhibit a weaker positive impact on PM2.5 [15]. This explains the observation, referred to in the previous sections, whereby SSD is positively correlated with PM2.5 at lower values, but this reverses at higher SSD values ( Figure S1b).  The effect (positive/negative) of SSD on PM 2.5 , which emerges in this study as one of the most influential factors, also varies between regions. For southwest mountainous areas, lower temperatures weaken the photochemical reaction and impede the formation of PM 2.5 [56][57][58]. Central and western regions of the province, as well as the urban area around Hangzhou, exhibit the highest level of positive influence on aerosol concentrations, but the influence in the southwest mountainous areas is much lower (Figure 8b). More industrialized centers, including Hangzhou, Jinhua, and Quzhou, also associated with high temperatures and long sunshine duration, promote photochemical reactions and produce more precursors of PM 2.5 and other secondary pollutants, thereby resulting in elevated PM 2.5 concentrations [17]. The northern plains have the longest sunshine duration but lower surface temperatures, resulting in reduced accumulation of PM 2.5 , and therefore exhibit a weaker positive impact on PM 2.5 [15]. This explains the observation, referred to in the previous sections, whereby SSD is positively correlated with PM 2.5 at lower values, but this reverses at higher SSD values ( Figure S1b).  Figure S2a). Higher altitudes, especially in the southwestern part of the province, appear to suppress aerosol concentrations as higher wind speeds disperse PM 2.5 [19]. On the other hand, low-lying areas, such as the northern and coastal plains and central basins, have a highly positive influence on PM 2.5 ( Figure 9a).

Topography and Land Cover Factors
As shown in Figure S2b, NDVI has a significantly negative impact on PM 2.5 . Areas exhibiting the most strongly positive influence of NDVI are located in the major cities, while suburban and mountainous areas are associated with a negative effect (Figure 9b). In recent years, in regions with relatively high populations, including Hangzhou, Ningbo, and Wenzhou, the proportions of construction land and farmland have increased, thereby suppressing the typically negative effect of vegetation on PM 2.5 . In the hilly areas of Lishui and western Hangzhou, higher vegetation coverage has promoted the absorption and deposition of PM 2.5 , explaining the reduction in PM 2.5 [30,59]. From 2005 onwards, the Zhejiang Province Government introduced woodland conservation measures [60], and this is evident in the area of the negative impact of NDVI expanding over time, as shown in Figure 9b. Consequently, the impact intensity of NDVI on PM 2.5 was increasing as the negatively impacted area was significantly expanded.  Figure S2a). Higher altitudes, especially in the southwestern part of the province, appear to suppress aerosol concentrations as higher wind speeds disperse PM2.5 [19]. On the other hand, low-lying areas, such as the northern and coastal plains and central basins, have a highly positive influence on PM2.5 (Figure 9a). As shown in Figure S2b, NDVI has a significantly negative impact on PM2.5. Areas exhibiting the most strongly positive influence of NDVI are located in the major cities, while suburban and mountainous areas are associated with a negative effect (Figure 9b). In recent years, in regions with relatively high populations, including Hangzhou, Ningbo, and Wenzhou, the proportions of construction land and farmland have increased, thereby suppressing the typically negative effect of vegetation on PM2.5. In the hilly areas of Lishui and western Hangzhou, higher vegetation coverage has promoted the absorption and deposition of PM2.5, explaining the reduction in PM2.5 [30,59]. From 2005 onwards, the Zhejiang Province Government introduced woodland conservation measures [60], and this is evident in the area of the negative impact of NDVI expanding over time, as shown in Figure 9b. Consequently, the impact intensity of NDVI on PM2.5 was increasing as the negatively impacted area was significantly expanded.

Socioeconomic Factors
The concentration of PM2.5 in Zhejiang Province continued to decline from 2014 to 2018, which can be attributed to the introduction and implementation of a series of government policies, including the 2012 Zhejiang Province Air Pollution Prevention and Control Implementation Plan, the 2013 Zhejiang Province Air Pollution Prevention and Control Action Plan (2013-2017), and the 2014 Zhejiang Air Pollution Prevention and Control Action Plan Key Work Department Division Plan. Such policy measures appear to have had a marked impact on the relative influence of socioeconomic factors on air pollution, as suggested in this study and highlighted in the following analysis. Generally, GDP and PM 2.5 are positively correlated ( Figure S3a), along with regions with higher GDP, especially in the cities (Figure 10a). The past few decades have been characterized by urbanizations and rapid economic development accompanied by increased energy consumption and exhaust emissions, therefore elevating PM 2.5 concentrations [20,61]. In the less economically developed areas of the southwestern uplands, lower GDP values are associated with correspondingly low PM 2.5 values. In 2000, before the economic boom, there were areas of very high positive impact (hotspots) in Hangzhou and Wenzhou, which expanded to all coastal cities as investment and consumption increased and led to higher levels of atmospheric pollution. During this time, the influence of GDP was continuously strengthening. However, from 2010 to 2019, even as GDP continued to increase, the relative importance of this factor was reduced, coinciding with the introduction of the air pollution control policies referred to above, together with the active adjustment of industrial structure in the province [62].
Population (PD) is, as expected, positively correlated with PM 2.5 ( Figure S3b). As shown in Figure 10b, the highly industrialized cities of Hangzhou and Wenzhou, in particular, were subject to rapid industrial agglomeration and increased production activities, thereby exhibiting increased pollutant emissions and energy consumption [63,64]. Dense housing and increased traffic aggravate PM 2.5 pollution, further highlighting the positive effects of population on PM 2.5 in cities [65]. However, in the more sparsely populated areas of the southwest mountains, PM 2.5 values are correspondingly lower. The relatively stable SHAP values for PD suggest that it plays an important role in affecting the PM 2.5 concentrations in Zhejiang Province, but as the structure of urban areas has been rationalized by more environmentally sensitive planning [66], the area of positive impact area has gradually reduced in 2019 (Figure 10b).
Changes in the characteristics of transportation associated with economic development have been substantial, for example, higher passenger volumes (PV), freight volumes, car ownership, and highway mileage, all of which increase fossil fuel consumption and road traffic emissions [30]. A positive correlation is evident between PV and PM 2.5 ( Figure S3c). Motor vehicle exhaust is one of the principal sources of PM 2.5 , discharging primary and secondary fine particles [27], exhibiting a positive impact on PM 2.5 . The area of Zhejiang with the strongest degree of influence of PV on pollution was, in 2010, situated mainly in the urban areas of Hangzhou, Jinqu Basin, and around the coast (Figure 10c). However, the degree of influence declined somewhat in 2019, possibly in response to the introduction of environmental policies. The provincial government has actively promoted lower carbon transport technologies, giving priority to public transport investment, constructing urban light rail systems, and favoring renewable energy vehicles [67]. For example, by 2016, more than 90% of the public buses in Hangzhou were powered by renewable energy. Compensated by the focus on sustainable urban transport, the increase in road networks and vehicles has not further augmented PM 2.5 concentrations, and the SHAP values suggest that, in 2019, PV had little impact on PM 2.5 ( Figure 5).
The emission index, as anticipated, exhibits a positive effect on PM 2.5 concentrations ( Figure S3d), principally because fossil fuel combustion, construction dust, and secondary pollution contribute so much to PM 2.5 [22]. Major industrial waste gas emissions (IGE) in the province include sulfur dioxide, nitrogen oxides, carbon dioxide, fluoride, soot, and productive dust. All these substances produced nitrates, sulfates, and secondary organic aerosols through atmospheric photochemical reactions, generating PM 2.5 [68]. As Figure 10d shows, in 2005, IGE had the highest positive influence around the industrial cities of Quzhou, Hangzhou, and Taizhou. The introduction of air pollution control legislation from 2004, coupled with strengthened environmental supervision of air pollutant emitters, has promoted cleaner production and led to the elimination of outdated production systems and reduced the emission of volatile organic compounds from thermal power, steel, building materials, and other production processes. Meanwhile, the government has legislated for desulfurization and denitration in key industrial enterprises to prevent and control smoke and dust pollution [69]. In 2019, albeit with low SHAP values, IGE exhibits a positive impact on PM 2.5 in most areas, indicating that the impact of the emission index had reduced, but energy conservation and emission reduction were still necessary. This evidence suggests that there is much potential in Zhejiang Province for further emission reduction and technological innovation as major targets for improving energy efficiency.
In terms of energy consumption, higher ELC is associated with increased PM 2.5 concentrations ( Figure S3e), especially when sourced from fossil fuels. Coal, which is a major source of pollutant precursors when combusted, is still the principal source of generating electricity, where the coal input for thermal power generation accounts for more than 70% of total power generation input in Zhejiang Province (China Energy Statistical Yearbook 2020). Figure 10e shows that it is the main urban centers that are most positively influenced by ELC, but that this area has been reduced since 2010 as a result of the introduction of renewable energy technologies, including photovoltaics for solar power, hydropower, and both wind and tidal power generators. This trend has been particularly noticeable since 2016 as the provincial authorities vigorously promoted the application of solar power and encouraged the development of centralized composite photovoltaic energy systems [67]. Remote Sens. 2021, 13, x FOR PEER REVIEW 15 of 21 The emission index, as anticipated, exhibits a positive effect on PM2.5 concentrations ( Figure S3d), principally because fossil fuel combustion, construction dust, and secondary pollution contribute so much to PM2.5 [22]. Major industrial waste gas emissions (IGE) in the province include sulfur dioxide, nitrogen oxides, carbon dioxide, fluoride, soot, and productive dust. All these substances produced nitrates, sulfates, and secondary organic aerosols through atmospheric photochemical reactions, generating PM2.5 [68]. As Figure  10d shows, in 2005, IGE had the highest positive influence around the industrial cities of Quzhou, Hangzhou, and Taizhou. The introduction of air pollution control legislation

Discussion
Current popular methods of analyzing the relative importance of factors driving dependent variables, such as ordinary least squares [70] and geographic detectors [71,72], only acquire statistical values describing their relative importance. Although the output from the random forest algorithm reveals the relative significance of various drivers, the method is not suited to displaying the degree of influence spatially [32]. In this study, we employed a combination of random forest and Shapley additive explanations to reveal the spatiotemporal dynamics of factors' driving patterns of PM 2.5 pollution in Zhejiang Province, emphasizing the direction, strength, and spatial heterogeneity of underlying factors.
Given that Zhejiang Province, as with much of the rest of China, has experienced rapid economic growth over the last 2 or 3 decades, a detailed understanding of the role of economic factors in relation to atmospheric pollution is especially useful. The global interpretability feature of SHAP explanations reveals that growth of GDP initially led to sharp increases in atmospheric pollution but that the influence of increasing GDP declined more recently, a trend that parallels the inverted U-shaped environmental Kuznets curve [73]. These results are consistent with studies of PM 2.5 in other parts of China [74], and indeed the course of atmospheric pollution in general. Shapley explanations further reveal that, while the intensity of influence of GDP, PD, and IGE may have weakened in urban areas in the recent past, such socioeconomic factors still contribute to atmospheric pollution. The local interpretability feature of the SHAP method reveals the complexity of spatiotemporal trends and highlights that even when the total degree of importance of a factor declines, the spatial coverage of its influence may not change. For example, across the Zhejiang province as a whole, IGE has declined in importance, but it remains significant in and around the major urban centers. In short, the level of detail regarding factors that determine the magnitude and spatial distribution of aerosols is of great potential value in the development of policy and strategy for pollution control and regulation.
Meteorological factors are shown to have the greatest significance in respect to PM 2.5 among all the factors studied but are, of course, largely beyond any possibility of direct control. However, detailed monitoring of these factors may be helpful in developing early warnings of pollution events [75]. It is, on the other hand, possible to manage at least some of the land cover factors. For example, the results suggest that an emphasis on conserving (or even enlarging) the area of natural vegetation cover is a means of reducing PM 2.5 pollution [76]. Moreover, adjusting land-use configuration [77], especially in urban areas such as Hangzhou, Shaoxing, and Ningbo, and increasing the proportion of urban green space [78] are known to have moderating effects on atmospheric pollution. Specific measures such as vertical planting, urban wind tunnel design [61], and rational spatial planning of green belts [79] should be encouraged.
Among the factors affecting PM 2.5 , the importance of meteorological factors is remarkable. A previous study mentioned that the intensity of direct solar radiation reaching the earth's surface under clear-sky conditions is attenuated by gases and aerosols [80]. As SSD is defined as the daily sunshine duration during which the direct solar irradiance is greater than the threshold value of 120 Wm −2 , it can effectively reflect the change of AOD (PM 2.5 ). However, it is still difficult to clearly explain the internal mechanism for all the meteorological factors based on spatial SHAP values. In the future, we will consider combining the Weather Research and Forecast (WRF) model to quantitatively explain the specific reasons for their important change. Furthermore, meteorological conditions are constantly changing even within a day; the adopted PM 2.5 data and recalculated meteorological data here are annual, which may bring great uncertainty and complexity to the mechanism analysis. Therefore, higher resolution of the original dataset both in time and space could be collected for more accurate consequences. In addition, the interaction analysis of influencing factors should be put forward in future research.

Conclusions
Based on multi-source data, the influence of meteorology, topography, land cover, and socioeconomic factors on PM 2.5 was investigated; random forest and SHAP methods were combined to reveal the importance and spatiotemporal variation of key factors affecting PM 2.5 in Zhejiang Province from 2000 to 2019. The main conclusions are as follows: (1) Three categories of factors exhibit different variation characteristics: the contribution of meteorological factors initially increases, but it has declined in the recent past. Changes in the importance of anthropogenic impacts such as GDP and PV (Passen-ger Volume) are opposite to that of meteorological factors, while the importance of topography and land cover factors continues to rise. However, the selected factors are generally ranked in terms of importance as follows: meteorological factors > social-economic factors > topography and land cover factors. (2) The spatial visualization of the relative importance of influencing factors in five years reveals that the details of their spatial change should be appreciated. Although the SHAP value representing relative importance was declined, the impacted coverage was not diminished. For example, among the socioeconomic drivers, even though the importance of PV and IGE has declined, they are still positively correlated with PM 2.5 across the whole province and remain important sources of atmospheric pollution, prompting the need for further control measures. (3) The SHAP method is helpful to spatiotemporal visualization of influencing factors on PM 2.5 , especially for its outstanding local interpretability. For instance, the spatial distribution of the contribution of NDVI demonstrates that its negative impacted coverage is increased over the mountain areas, whereas the highly positive contribution in cities is also enlarged. This indicates that when ecological management concerned vegetation is implemented for PM 2.5 regulation, more attention should be paid to the corresponding urban areas. Therefore, maps of SHAP values could be considered for putting forward practical advice.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/rs13153011/s1, Figure S1: Dependence plot of SHAP values of meteorological factors in five periods. Figure S2: Dependence plot of SHAP values of topography and land cover factors in five periods. Figure  Funding: This research was funded by the Natural Science Foundation of Zhejiang Province (NO. LQ19D010007).

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.