Spatial Contribution of Environmental Factors to Soil Aggregate Stability in a Small Catchment of the Loess Plateau, China

: Soil aggregate stability and erodibility are the inﬂuential factors governing soil resistance to water erosion. The interactions among aggregate stability, erodibility, and their inﬂuencing factors have not been fully explored. We collected soil samples from 0–10 cm and 10–20 cm layers in the Zhifanggou watershed. Then, the major contributors to aggregate stability and erodibility and how soil properties, environmental factors and land use contributed to them were explored by using partial least-squares regression and path analysis, respectively. The results showed that the major contributors included the slope, soil organic carbon (SOC), elevation, the percentage of landscape area of farmland (PLAND_F) and grassland (PLAND_G), the land surface temperature difference between seasons ( ∆ LST), topographic wetness index (TWI), pH, amorphous iron (poorly ordered forms of iron, Fe o ), and calcium carbonate (CaCO 3 ). In which, the slope, SOC, and elevation were the most important contributors to the mean weight diameter (MWD) and the percentage of water-stable aggregates greater than 0.25 mm (WSA >0.25 ) and had a direct contribution to MWD, WSA >0.25 , and K factors. The PLAND_F and PLAND_G had a signiﬁcant and indirect contribution to those three indices by affecting slope. Meanwhile, the effects of pH, Fe o , and CaCO 3 on WSA >0.25 should also not be underestimated. For MWD and WSA >0.25 , there was a signiﬁcantly higher effect of the land use types and composition than hydrothermal conditions. For K factors, PLAND’s contribution was still higher than ∆ LST and TWI, but they were all signiﬁcant. The other soil properties, including pH, CaCO 3 , and Fe o , indirectly affected them by inﬂuencing SOC. However, the direct contributions of soil properties increased as the soil layer deepened.


Introduction
Soil aggregate is the basic unit of soil. It is formed by cementing inorganic and organic materials. The comprehensive effects of climate, biology, parent material, topography, and soil-forming factors lead to the spatio-temporal variability of soil aggregate [1]. Soil aggregate stability is the capacity of the aggregate-bonding to resist the stresses which cause soil aggregate disintegration, such as tillage, wetting and drying cycles, and the kinetic energy of raindrops [2]. Soil aggregate stability significantly influences soil sustainability and crop production [1]. Soil erodibility represents the soil resistance to soil erosion [3,4]. They are both closely related to soil properties, biological factors, land use, and climate [1,[5][6][7][8].
Improving the understanding of these processes is of great importance to ecosystem sustainability. There is a high collinearity between those influencing factors, including soil properties, climate factors, terrain factors, and land uses. This collinearity would lead to redundancies and potentially misleading results [9]. Partial least-squares regression (PLSR) can handle the collinearity of variables and then explore their relationships.
PLSR combines the characteristics of principal component analysis (PCA) and linear regression analysis. Wang et al. [10] explored how the main soil properties influence soil macro-aggregate and micro-aggregate stability based on PLSR. They demonstrated that free Al oxides and clay were the most important factors for binding macro-aggregates and micro-aggregates, respectively. Using PLSR, Liu et al. [11] explained the effects of climate factors, land use, lithologic factors, soil properties, and topographical factors on runoff coefficient. Then, the PLSR was proved to be an effective way in quantifying the control of influencing factors on runoff coefficient and was recommended to other study areas [11].
Previous studies have emphasized that the influencing factors have direct and indirect contributions to soil aggregate stability and soil erodibility [1,12]. Path analysis, or structural equation modelling, is a powerful way to distinguish the direct and indirect effects effectively [13][14][15]. Current studies have explored the direct and indirect contributions of binding agents [16,17] and vegetational properties [18] to soil aggregate stability and K factors by using path analysis. Zhang et al. [17] revealed that soil organic carbon (SOC) indirectly affected aggregate stability through the effects on soil microbial biomass C, and N. Xu et al. [16] showed that the aggregate stability mainly depended on the iron oxide contents after artificial interference. The amorphous and complex irons had a direct contribution to aggregate stability, and free iron had an indirectly positive effect on soil aggregating. Liu et al. [18] explored the direct and indirect effects of the FI (a ratio of flooded days to whole days), soil physicochemical properties, and vegetational properties on aggregate stability in the riparian zone. They proved that FI was the main contributor in facilitating soil aggregate stability through both direct and indirect effects. To sum up, few studies have been performed to identify the direct and indirect effects of binding agents and vegetational properties to aggregate stability or runoff coefficient by using PLSR and path analysis. How the climate factors, land use, soil properties, and topographical factors contribute to aggregate stability and soil erodibility have not been fully understood.
We previously explored the spatial correlation and distribution of aggregate stability and K factors at catchment scale by semivariograms, local indicators of spatial association, ordinary kriging, and inverse distance weighted [19]. Then, we built their spatial prediction models by using multiple linear regression and regression kriging [20]. In this paper, the PLSR and path analysis were used to fully explore how the climate factors, land use, soil properties, and topographical factors contribute to aggregate stability and soil erodibility. The Zhifanggou watershed, which is a typical catchment in the hilly gullied loess landscape, is located at the center of Loess Plateau. The water erosion leads to soil erodibility in this area [20]. Hence, the objectives of this study are as follows: (i) evaluate the roles of climate factors, land use, soil properties, and topographical factors in the aggregate stability and K factors, and (ii) distinguish their direct and indirect contributions to aggregate stability and K factors.

Study Area and Soil Sampling
The field sampling was conducted in the Zhifanggou watershed, located in the center of the Loess Plateau (Figure 1a). The catchment has a typically semiarid climate with a mean annual precipitation of 505 mm. The soil is mainly Calcaric Regosols, according to the WRB reference system derived from aeolian loess parent material [21]. The average thickness of soil is 50-80 m. Its soil is of a silt loam texture. The altitude ranges from 1034 m to 1423 m (Figure 1b). Its main land use types are orchard, forest, grassland, and dry farm (Figure 1c). In the processing of soil sampling, the terrain condition, land use type, and accessibility were considered. Ultimately, a total of 70 sampling sites were selected.
A GPS receiver was used to obtain the information of longitude ( • ), latitude ( • ), and elevation (m). The farmers provided the field management data. The undisturbed soil and disturbed soil were collected from 0-10 cm and 10-20 cm soil layers by aluminum containers, cutting ring, and ziplock bags, respectively. The details were described by Ye et al. [19,20].
The field sampling was conducted in the Zhifanggou watershed, located in the center of the Loess Plateau (Figure 1a). The catchment has a typically semiarid climate with a mean annual precipitation of 505 mm. The soil is mainly Calcaric Regosols, according to the WRB reference system derived from aeolian loess parent material [21]. The average thickness of soil is 50-80 m. Its soil is of a silt loam texture. The altitude ranges from 1034 m to 1423 m (Figure 1b). Its main land use types are orchard, forest, grassland, and dry farm (Figure 1c). In the processing of soil sampling, the terrain condition, land use type, and accessibility were considered. Ultimately, a total of 70 sampling sites were selected. A GPS receiver was used to obtain the information of longitude (•), latitude (•), and elevation (m). The farmers provided the field management data. The undisturbed soil and disturbed soil were collected from 0-10 cm and 10-20 cm soil layers by aluminum containers, cutting ring, and ziplock bags, respectively. The details were described by Ye et al. [19,20].

Data Sources
The soil samples were used to measure soil properties. Undisturbed soil samples were used to measure soil bulk density (BD, g/cm 3 ), total porosity (TP, %), and aggregate stability. The mean weight diameter (MWD), the percentage of water-stable aggregates greater than 0.25 mm (WSA>0.25), geometric mean diameter (GMD), and K are the indictors

Data Sources
The soil samples were used to measure soil properties. Undisturbed soil samples were used to measure soil bulk density (BD, g/cm 3 ), total porosity (TP, %), and aggregate stability. The mean weight diameter (MWD), the percentage of water-stable aggregates greater than 0.25 mm (WSA >0.25 ), geometric mean diameter (GMD), and K are the indictors of aggregate stability. Disturbed soil samples were used for physical and chemical analyses, including SOC, calcium carbonate (CaCO 3 ), free iron (Fe d ), amorphous iron (Fe o ), pH, and soil texture. They were determined by conventional methods [20].
The soil was derived from wind-accumulated loess parent material. Hence, the environmental variables relating to the soil forming factors, climate, vegetation, and relief were chosen. Based on the results of Ye et al. [20], we selected 10 environmental and land use variables. The microclimatic factors included temperature (the land surface temperature difference between seasons, ∆LST) and dryness (temperature vegetation dryness index, TVDI). The normalized difference vegetation index (NDVI) was used to reflect the vegetation cover. The images were processed in ENVI 5. (DEM). The land use classification was derived from an aerial photo with 40 cm resolution. The percentage of landscape area of farmland (PLAND_F), woodland (PLAND_W), and grassland (PLAND_G) were used to represent the landscape composition. Ye et al. [20] introduced the sources of remote sensing data and retrieval method in detail.

Partial Least-Squares Regression
We used the correlation analysis to reveal the significant correlations among the variables at the 0.01 or 0.05 probability level ( Figure 2).
The soil was derived from wind-accumulated loess parent material. Hence, the environmental variables relating to the soil forming factors, climate, vegetation, and relief were chosen. Based on the results of Ye et al. [20], we selected 10 environmental and land use variables. The microclimatic factors included temperature (the land surface temperature difference between seasons, ΔLST) and dryness (temperature vegetation dryness index, TVDI). The normalized difference vegetation index (NDVI) was used to reflect the vegetation cover. The images were processed in ENVI 5.1 software. The terrain factors included elevation (H, m), slope (°), aspect (°), and topographic wetness index (TWI). These factors were calculated in ArcGIS 10.2, based on a 5 m × 5 m digital elevation model (DEM). The land use classification was derived from an aerial photo with 40 cm resolution. The percentage of landscape area of farmland (PLAND_F), woodland (PLAND_W), and grassland (PLAND_G) were used to represent the landscape composition. Ye et al. [20] introduced the sources of remote sensing data and retrieval method in detail.

Partial Least-Squares Regression
We used the correlation analysis to reveal the significant correlations among the variables at the 0.01 or 0.05 probability level ( Figure 2).
Then, PLSR was used to identify the key factors and their magnitudes of influence in controlling the aggregate stability. It generalizes and combines features from PCA and multiple regression (MR). Further information on PLSR is described by Carrascal et al. [9]. SIMCA-P + 13.0 (Umetrics AB, Sweden) was used to perform the PLSR. The cross-validation was used to finalize the appropriate number of components to overcome overfitting. Then, PLSR was used to identify the key factors and their magnitudes of influence in controlling the aggregate stability. It generalizes and combines features from PCA and multiple regression (MR). Further information on PLSR is described by Carrascal et al. [9]. SIMCA-P + 13.0 (Umetrics AB, Umeå, Sweden) was used to perform the PLSR. The crossvalidation was used to finalize the appropriate number of components to overcome overfitting. Then, the optimal combination of the explained variation in the response (R 2 ) and the predictive ability of the model (Q 2 ) was achieved. The Q 2 and Q 2 cum (the cumulative Q 2 over all the selected PLSR components) were calculated by the following equations: where PRESS is the prediction error sum of squares, SS is the residual sum of squares, and n is the number of components. The model exhibits good predictive ability when Q 2 cum is greater than 0.5. The root mean square estimation error (RMSEE) also improves the development of the regression model. The contribution of a predictor can be inferred from the variable importance for the projection (VIP). The variables with higher VIP values are considered as most important for explaining the dependent variable [22]. Generally, the independent variable significantly explains the dependent variable when its VIP value is greater than 1. The regression coefficient (RC) indicates the direction of the impact of each variable in the aggregate stability.

Path Analysis
We used a path analysis to assess the direct and indirect contributions of soil physicochemical properties, environmental factors and land use pattern and structure in determining aggregate stability within each depth. The VIPs of those predictors in path analysis were greater than 1, which meant they had significant effects on aggregate stability. It was processed by SPSS 19.0. We can find the detailed information of path analysis from previous papers [15]. Table 1 presents the descriptive statistics for aggregate stability parameters and selected factors. The variation of the MWD at different soil layers varied substantially from 0.21 mm to 3.23 mm ( Table 1). The WSA >0.25 was from 14.96% to 83.92% and K factor was from 0.0156 to 0.0484. Their coefficient of variations (CV) ranged from 27.71% to 50.32%. It indicated a moderate or high variability, especially for MWD. MWD has a high variability both at 0-10 cm and 10-20 cm. The selected factors in the study varied widely. Those 10 environmental and land use variables we selected had moderate to high variability, indicating the variability of environmental and land use variables.  Figure 2 shows the heat map of the correlation matrix of the independent variables used in the PLSR analysis. It displayed that there were correlations among most independent variables. The Fe o was more independent than the other variables. A summary of the PLSR model for MWD, WSA >0. 25 , and K factors are presented in Table 2. The first component explained 41.84% and 52.68% of the MWD and WSA >0. 25 in the dataset at the 0-10 cm layer, respectively, and the first and second component cumulatively explained 48.02% and 58.95%, respectively. The first component explained 43.68% and 51.64% of the MWD and WSA >0. 25 in the dataset at the 10-20 cm layer, respectively, and the first and second component cumulatively explained 48.57% and 60.77%, respectively. For K factors in the areas, the first component explained 45.85% of the variance at the 0-10 cm layer and 58.01% of the variance at the 10-20 cm layer, respectively. The addition of the second component decreased the Q 2 cum except for the WSA >0.25 at 10-20 cm even though it increased the explained variance. An increase in the number of components resulted in a decrease in the prediction error; however, its further increase could lead to a higher prediction error. We are able to clearly understand the importance of the predictors by comparing their VIP and RC values. Figure 3 illustrates the VIP values and the regression coefficients for aggregate stability and K factors against the influencing factors. Those factors were grouped into three main types, including soil properties, environmental factors, and land use types and composition. In the case of aggregate stability, the highest VIP value was obtained for SOC from the MWD and WSA >0.25 . It was followed by slope, PLAND_F, PLAND_G, and H at 0-10 cm and by slope, pH, Fe o , PLAND_G, Fe d , and PLAND_F at 10-20 cm for the MWD. For the WSA >0.25 , the highest VIP value was followed by H, Fe o , Fe d , CaCO 3 , and slope at 0-10 cm and by Fe d , pH, Fe o , slope, H, and CaCO 3 at 10-20 cm. The VIP and RC values showed that K factors was increasing with increasing PLAND_F, ∆LST, and TWI. Slope, PLAND_G, PLAND_W, SOC, and TP contributed to lower K factors. We are able to clearly understand the importance of the predictors by comparing their VIP and RC values. Figure 3 illustrates the VIP values and the regression coefficients for aggregate stability and K factors against the influencing factors. Those factors were grouped into three main types, including soil properties, environmental factors, and land use types and composition. In the case of aggregate stability, the highest VIP value was obtained for SOC from the MWD and WSA>0.25. It was followed by slope, PLAND_F, PLAND_G, and H at 0-10 cm and by slope, pH, Feo, PLAND_G, Fed, and PLAND_F at 10-20 cm for the MWD. For the WSA>0.25, the highest VIP value was followed by H, Feo, Fed, CaCO3, and slope at 0-10 cm and by Fed, pH, Feo, slope, H, and CaCO3 at 10-20 cm. The VIP and RC values showed that K factors was increasing with increasing PLAND_F, ΔLST, and TWI. Slope, PLAND_G, PLAND_W, SOC, and TP contributed to lower K factors.

The Direct and Indirect Contributions of Related Factors to the Aggregate Stability and K Factors
Based on the PLSR results, we explored the direct and indirect contributions of soil properties, natural factors, and land use in determining aggregate stability and K factors within each depth by using path analysis (Figures 4-6). The results showed that SOC, H, and slope were superior and directly contributed to the MWD and WSA >0. 25 . At 0-10 cm, the direct contribution of soil properties to WSA >0.25 was only from SOC, and other soil properties, including CaCO 3 and Fe d , indirectly contributed to WSA >0.25 by affecting SOC and H. The direct contribution of soil properties was strengthened at 10-20 cm. The Fe d and pH also directly contributed to WSA >0. 25 . Then other soil properties, including CaCO 3 and Fe o , indirectly affected WSA >0.25 by influencing SOC, Fe d , and pH. For K factors at 0-10 cm, they were mainly directly contributed to by slope. At 10-20 cm, the direct effects of SOC, ∆LST, and TWI were also prominent. The indirect contributions to K factors were seen through the effects of slope and temperature conditions.

The Direct and Indirect Contributions of Related Factors to the Aggregate Stability and K Factors
Based on the PLSR results, we explored the direct and indirect contributions of soil properties, natural factors, and land use in determining aggregate stability and K factors within each depth by using path analysis (Figures 4-6). The results showed that SOC, H, and slope were superior and directly contributed to the MWD and WSA>0.25. At 0-10 cm, the direct contribution of soil properties to WSA>0.25 was only from SOC, and other soil properties, including CaCO3 and Fed, indirectly contributed to WSA>0.25 by affecting SOC and H. The direct contribution of soil properties was strengthened at 10-20 cm. The Fed and pH also directly contributed to WSA>0.25. Then other soil properties, including CaCO3 and Feo, indirectly affected WSA>0.25 by influencing SOC, Fed, and pH. For K factors at 0-10 cm, they were mainly directly contributed to by slope. At 10-20 cm, the direct effects of SOC, ΔLST, and TWI were also prominent. The indirect contributions to K factors were seen through the effects of slope and temperature conditions.    The land use types and composition, including PLAND_F, PLAND_G, and PLAND_W, mainly directly affected slope, and then significantly contributed to those three indices. Moreover, they also indirectly contributed to K factors by affecting ∆LST. In conclusion, land use mainly affected the aggregate stability and K factors indirectly through slope. This was because GGP has been implemented by converting slope farmland (>25 • ) and bare land into grassland, shrubland, and woodland. Human activity contributed to this result. The land use types and composition, including PLAND_F, PLAND_G, and PLAND_W, mainly directly affected slope, and then significantly contributed to those three indices. Moreover, they also indirectly contributed to K factors by affecting ΔLST.
In conclusion, land use mainly affected the aggregate stability and K factors indirectly through slope. This was because GGP has been implemented by converting slope farmland (>25°) and bare land into grassland, shrubland, and woodland. Human activity contributed to this result.

Discussion
The PLSR results showed that SOC, slope, pH, Feo, PLAND_G, Fed, PLAND_F, H, and CaCO3 were the most important contributors to MWD and WSA>0.25, and PLAND_F, slope, ΔLST, TWI, PLAND_G, PLAND_W, SOC, and TP were important to K factors at the Zhifanggou watershed. Then, we used a path analysis to assess the direct and indirect contributions of those factors on MWD and WSA>0.25 and K factors. As a primary binding agent of aggregate, SOC made a direct and important contribution to MWD and WSA>0.25. Black and grey paths indicate direct and indirect effects, respectively. Thick lines indicate significant relationships (p < 0.05), and the fine lines indicate non-significant relationships. TP, total porosity; SOC, soil organic carbon; ∆LST: the land surface temperature difference between seasons; TWI: topographic wetness index; PLAND_F, PLAND_W, and PLAND_G: the proportion of the landscape occupied by farmland, woodland, and grassland, respectively. K factors: soil erodibility factor.

Discussion
The PLSR results showed that SOC, slope, pH, Fe o , PLAND_G, Fe d , PLAND_F, H, and CaCO 3 were the most important contributors to MWD and WSA >0.25 , and PLAND_F, slope, ∆LST, TWI, PLAND_G, PLAND_W, SOC, and TP were important to K factors at the Zhifanggou watershed. Then, we used a path analysis to assess the direct and indirect contributions of those factors on MWD and WSA >0.25 and K factors. As a primary binding agent of aggregate, SOC made a direct and important contribution to MWD and WSA >0.25 . However, it was interesting to note that slope also had a direct contribution to aggregate stability.
The slope was reported to be indirectly and negatively contributed to aggregate stability [12]. Normally, the steeper the slope, the more susceptible to erosion. The primary bonding agents in aggregation, SOC, and clay were easily removable by erosion [23]. In addition, the erosion of SOC may also enhance mineralization [24]. However, the slope had a direct and positive contribution to aggregate stability after the GGP. It is inconsistent with the findings of Ayoubi et al. [23], who presented a good soil aggregate stability in the lower slope. However, it is in agreement with the findings of Ye et al. [20]. This is because we had been converting slope farmland (>25 • ) into grassland, shrubland, and woodland for implementing GGP in Loess Plateau. Hence, the positive effect of GGP was stronger than the effect of stabilizing agents.
There was a strengthening contribution of Fe d and pH to WSA >0.25 at 10-20 cm. Then CaCO 3 and Fe o indirectly affected WSA >0.25 by influencing SOC, Fe d , and pH. Soil pH influenced clay dispersion, metal ion solubility, microbial activity, and so on [25]. In a weakly alkalescent and oxidizing condition, Ca 2+ is mainly from the dissolution of precipitation. Then, CaCO 3 indirectly contributed to aggregate stability [20]. In addition, the high content of CaCO 3 resulted in a reduction of SOC mineralization. Similarly, in the study by Wang et al. [10], Fe o was the weakest contributor to soil aggregation.
The direct contribution to K factors was mainly from slope, ∆LST, and TWI. The seasonal variation of soil erodibility factors has been proven by previous studies and was mainly attributed to the changes in soil temperature and soil moisture [26]. The snowmelt resulted in a saturated surface soil with water and a frozen subsurface soil, which led to a slippage between these two soil layers. This process directly contributed to an increased erodibility [27]. In addition, seasonal temperature variation significantly affected soil bulk density, soil moisture, and soil permeability, which contributed to a high erodibility [28][29][30]. The topographic wetness index reflects the control of terrain in the soil moisture distribution [31]. This control could directly affect the intensity of water erosion [32]. There was also an enhanced and direct contribution to K factors from SOC at 10-20 cm. The indirect contributions to K factors were all through the effects of slope and temperature conditions.
Previous works showed that land use can directly and greatly influence near-surface soil characteristics, such as soil properties, root systems, and tillage practices; thus affecting aggregate stability and soil erodibility significantly [33,34]. On Loess Plateau, the land use types relied on slope dramatically because of the implementation of GGP. Hence, the land use types and composition, including PLAND_F, PLAND_G, and PLAND_W, were mainly dependent on slope and significantly contributed to MWD and WSA >0. 25 . Moreover, they also indirectly contributed to K factors by affecting ∆LST, which was because seasonal temperature also significantly determined vegetation type. In conclusion, land use mainly affected the aggregate stability and K factors indirectly through slope. This was because GGP has been implemented by converting slope farmland (>25 • ) into grassland, shrubland, and woodland.

Conclusions
In this study, we explored how aggregate stability and soil erodibility were related to the soil properties, environmental factors, land use types, and the composition of the Zhifanggou watershed, which is a hilly, gullied, loess landscape of the Loess Plateau, by using PLSR methodology and path analysis. Our results proved that the aggregate stability and soil erodibility were primarily controlled by the slope, SOC, elevation, PLAND_F, PLAND_G, ∆LST, and TWI, followed by the pH, Fe o , and CaCO 3 . In addition, PLAND_F and PLAND_G were found to significantly and indirectly contribute to MWD, WSA >0.25 , and K factors by affecting slope. The land use types and composition had significantly higher effects on MWD and WSA >0.25 than the hydrothermal condition had. For K factors, the land use's contribution was still higher than ∆LST and TWI, but they were both significant. The contribution of soil properties increased as the soil layer deepened. The results help to distinguish the contribution of internal and external factors to aggregate stability, which serve as a foundation for a comprehensive understanding of soil aggregate formation and stabilization mechanisms. It will be beneficial for improving soil fertility and soil quality and also provide a scientific basis for the establishment of accurate soil erosion models.