Evaluation and Development of Pedo-Transfer Functions for Predicting Soil Saturated Hydraulic Conductivity in the Alpine Frigid Hilly Region of Qinghai Province

: In recent years, Pedo-Transfer Functions (PTFs) have become a commonly used tool to predict the hydraulic properties of soil. As an important index to evaluate the function of forest water conservation, the prediction of saturated hydraulic conductivity ( K S ) on the regional scale is of great signiﬁcance to guide the vegetation construction of returning farmland to forest area. However, if the published PTFs are directly applied to areas where the soil conditions are different from those where the PTFs are established, their predictive performance will be greatly reduced. In this study, 10 basic soil properties were measured as input variables for PTFs to predict K S in the three watersheds of Taergou, Anmentan, and Yangjiazhai in the alpine frigid hilly region of Qinghai Province, China. The parameters of the eight published PTFs were modiﬁed by the least-squares method and new PTFs were also constructed, and their prediction performance was evaluated. The results showed that the K S of coniferous and broad-leaved mixed forests and broad-leaved pure forests in the study area were signiﬁcantly higher than those of pure coniferous forests, and grassland and farmland were the lowest ( p > 0.05). Soil Organic Matter plays an important role in predicting K S and should be used as an input variable when establishing PTFs. The Analysis-Back Propagation Artiﬁcial Neural Network (BP ANN) PTF that was established, with input variables that were, Si · SOM, BD · Si, ln 2 Cl, SOM 2 , and SOM · lnCl had a better predictive performance than published PTFs and MLR PTFs.


Introduction
Soil, as an important link in the Soil-Plant-Atmosphere Continuum (SPAC) system, exerts an important influence on global terrestrial surface water and energy balance by regulating local water balance through infiltration, runoff, and groundwater recharge [1]. Saturated hydraulic conductivity (K S ), an important physical characteristic of soil, refers to the water flux or seepage velocity per unit area when the water potential gradient is fixed and the soil reaches saturation state [2]. To some extent, K S can be used as an index to evaluate the ability of soil to conserve water and resist erosion. Although some studies have shown that the soil preferential flow plays a dominant role in groundwater recharge and chemical transport [3][4][5], the flowability of saturated flow in soil has a significant influence on the generation and development of surface runoff. Hence, K S plays an important role in solving the management problems of soil and water resources related to agricultural production and ecological environment [6]. Meanwhile, K S is usually a sensitive input variable in many hydrological models such as HYDRUS, TOPMODEL, and DHSVM, so whether K S is accurate or not will directly affect the simulation results of the models [7]. K S is not only an important variable in the Darcy equation to calculate the velocity in saturated soil, but is also crucial in unsaturated flow equations such as the Richards equation.

Description of the Study Area
The study area is located in Datong Hui and Tu Autonomous County (100 • 51 -101 • 56 E, 36 • 43 -37 • 23 N), Xining City, Qinghai Province. This county is located in the eastern part of Qinghai Province and the southern foot of the eastern section of the Qilian Mountains, and it belongs to the transition zone between the Loess Plateau and the Qinghai-Tibet Plateau, with an altitude of roughly 2280 to 4622 m. The region is deep inland, far from the ocean, in the middle latitude area, with high altitude and continental plateau climate. The total annual radiation is 547.03-615.93 kJ/cm 2 , and the average annual sunlight duration is 2605 h. The vertical distribution of temperature in this region is obvious, and the temperature difference between day and night is large. The annual average temperature is 2.8 • C, the highest temperature being in the three months of June, July, and August. Datong County has many precipitation days and low intensity, and the seasonal distribution of rainfall is very uneven. The average annual rainfall is about 508 mm, and the annual rainfall ranges from 450 to 820 mm. The main land types in Datong County are alpine stony soil, alpine meadow soil, mountain brown soil, chernozem soil, chestnut soil, and tidal sand soil, etc. In our study, three basins were selected as experimental points: Taergou, Anmentan, and Yangjiazhai. Since the 1970s, these basins have undergone large-scale projects to convert farmland to forests. Due to the insufficient cognitive condition and backward technology of vegetation reconstruction at that time, the water conservation function of the existing forest types was low and could not completely fulfil its ecological service function.

Soil Sample Collection
According to the vegetation coverage type and soil type distribution characteristics of the study area, a total of 126 plots from May to September 2019 were set up. Soil profiles were dug at each sampling site for soil sample collection. The average depth of soil in the study area is 60 cm; the soil samples were collected at soil layer depths of 10-20 cm, 20-40 cm, and 40-60 cm. Three undisturbed soil cores were taken from the depth of each soil layer to determine K S (mm min −1 ) by using ring knifes (Φ 61.80 mm × 56.70 mm), and the average value was taken as the K S value of the sampling point. Small ring knifes (Φ 50.50 mm × 50.00 mm) were used to collect another three undisturbed soil cores, which were used to measure soil saturated water content (SSWC, cm 3 cm −3 ), total porosity (P, %), bulk density (BD, g cm −3 ), non-capillary porosity (P NC , %), capillary porosity (P C , %), and field capacity (FC, %). In addition, three portions of disturbed soil (1 per kg) were taken from the same soil layer for determination of organic matter (SOM, %) and mechanical composition (clay (Cl, %), silt (Si, %), sand (Sa, %). In total, 2268 undisturbed soil cores and 1134 disturbed bulk soils were taken back to the laboratory for analysis and measurement. The average of three repeated experiments was used to characterize the soil properties of each soil layer. The physical and chemical properties of 378 soil layers were obtained from 126 plots (Table 1).

Laboratory Experiment
The saturated hydraulic conductivity (K S ) of undisturbed soil scores was measured by the constant head method [9].
Before measurement, all the soil samples were dried naturally. Soil Organic Carbon (SOC, %) was determined by the potassium dichromate dilution heating method. Soil Organic Matter (SOM, %) can be converted by the following formula: (1) Particle-size distribution was determined using a Malvin laser particle size analyzer (Mastersizer 2000). Soil texture was classified according to the International Classification of soil texture. Soil samples that were disturbed were dried in an oven at 105 • C for 24 h and weighed to determine their SSWC, P, BD, P NC , P C , and FC.

Data Analysis
Modeling data (n = 264) were used to modify the parameters of the PTFs using the nonlinear least-square method. Test data (n = 114) were used to validate the prediction performance of the modified PTFs. For consistency, modeling data (n = 264) was used to develop the MLR and ANN PTFs, and the test data (n = 114) were used to validate the prediction performance of the newly established PTFs.

Regression Analysis
Multiple Linear Regression (MLR) is a well-established method for the development of Pedo-Transfer Functions to predict the soil hydraulic parameters. Stepwise regression is a common method to avoid multicollinearity in Multiple Linear Regression analysis. Firstly, variables were introduced into the model one by one, and an F test was performed after each explanatory variable was introduced. Secondly, to ensure that only significant variables were included in the regression equation before each new variable was introduced, T-test was performed on the selected explanatory variables one by one when an explanatory variable was introduced and it was significant. However, when the second explanatory variable was introduced, the previously introduced explanatory variable was deleted because it was no longer significant. The explanatory variables finally retained in the model were not only important, but there was also no serious multicollinearity after stepwise regression. Before the multiple regression analysis, the variables (P NC , FC, Cl, K S ) that did not conform to the normal distribution were converted logarithmically to make them meet the normal distribution.
The basic form of the regression equation was: where Y is the dependent variable, K S , a is the intercept; b 1 , . . . , b i are unstandardized regression coefficients; and X 1 , . . . X i are independent variables. Although stepwise multiple regression can be used as a method to solve multicollinearity problems, important variables might be removed in practical applications. For example, variables X 1 , X 2 , and X 3 had been selected and continued to introduce variable X 4 in the execution process. Due to the multicollinearity problem between X 4 and X 2 and X 3 , the variance inflation of X 4 was not significant and was excluded. As a matter of fact, X 4 had a stronger response related to the dependent variable Y, but variable X 4 was perfectly eliminated after the execution of stepwise regression. To solve this problem, it was necessary to consider the possible interaction between variables and quadratic regression when carrying out stepwise multiple regression, so that important variables could be retained as much as possible and the prediction accuracy of the model improved. The formula was as follows: Stepwise multiple regression was accomplished in SPSS 23.0.

Construction of BP Artificial Neural Network
When two or more highly correlated independent variables are input into the Artificial Neural Network, it will harm the learning ability of the Artificial Neural Network (multicollinearity). Highly correlated inputs mean that the amount of characteristic information presented by each variable is small, so less important variables can be eliminated or new input variables can be sought to replace the old ones. Input variables ensured by stepwise regression were used as the input variables of the ANN to test whether it had better predictive performance than MLR.
The Artificial Neural Network model was established by using MATLAB R2018b. In this study, the new variables extracted from the principal component analysis were taken as the input layer, while K S was taken as the output layer and the Artificial Neural Network based on the Feed-forward backprop (BP) algorithm used for modeling. In the ANN, the Sigmoid function was selected as the activation function of the input layer, the activation function of the hidden layer was the Tansig function, the Purelin function was the activation function of the output layer, and Trainlm was the training function of the network. The training accuracy was 0.0001, the number of training times was 1000, and the learning rate was 0.01. Training was stopped when the training accuracy was below the set accuracy, or when the maximum number of training times was reached. At present, there is no effective way to determine the number of hidden layers in the Artificial Neural Network. It can only be tested one by one according to the empirical formula: where h is the number of hidden layers, m and l are the numbers of the input layer and output layer, respectively, and α is a constant between 1 and 10. The number of hidden layer nodes was determined by cross-validation. The modeling data were randomly divided into two parts, namely, training data (N = 264) and verification data (N = 114). The training data was used to create the Artificial Neural Network, and the validation data was used to test the error of the model. The number of nodes in the hidden layer with the minimum error was selected as the final model structure.

Evaluation Criteria of Performance
Two methods are usually used to verify the predicted performance of PTFs. One is the traditional method of statistical comparison of predicted values and measured values of K S . Another method is a process-based functional verification that uses soil hydraulic parameters to predict water flow through a software program [26]. In this study, the first approach was taken because it can judge the prediction performance of the model well without disturbing the modeling data. For this reason, 264 sample points were randomly selected as modeling data, and the remaining 114 sample points were used as test data to evaluate the prediction effect of the model. To evaluate the predictive performance of PTFs, the following five indicators were selected: (1) the Mean Relative Error (MRE); (2) the Mean Error (ME); (3) the Root Mean Square Error (RMSE); (4) the coefficient of determination (R 2 ); and (5) the Akaike Information Criterion (AIC).
where n is the number of the test data, p i and m i are the ith predicted value and measure value, respectively (i = 1, 2, 3, . . . , n), and p and m are the mean values of the predicted and measured values, respectively. The k is the number of parameters in the PTFs. To achieve a better predictive performance of PTFs, MRE, RMSE, and AIC should be as low as possible, and ME should approach 0 while R 2 should be close to 1.

Descriptive Statistics
The descriptive statistics and variable coefficient of K S and basic soil parameters for total data, modeling data, and test data are shown in Table 1. For all our samplings, K S varied between 0.17 and 2.17 mm/min, with an average value of 0.94 mm/min and a medium variation of 52.45%. The average bulk density was 1.22 g/cm 3 with a variable coefficient of 8.98%, in the range of 0.89-1.56 g/cm 3 . The mean of SSWC was 0.43 cm 3 cm −3 ; this represents the maximum water-holding capacity of the soil, that is, the amount of water in which all the porosities between soil particles are filled with water. The P, which included both P NC and P C , varied between 42.28 and 63.39%. The P C , also known as soil water-holding porosity, ranged between 38.11 and 58.51%, with a variable coefficient of 7.74%. The P NC , which varied between 0.31 and 18.46%, with a coefficient of variation of 46.92%, was filled by water only when there was a large amount of gravity water, so it was also called the soil air pore or soil aeration pore. Representing the soil mechanical Among all the measured soil basic parameters, the P had the least variable coefficient (7.73%), while the K S had the highest variable coefficient (52.64%). As the research results of Arshad et al. show, one of the reasons for the higher variable coefficient of K S is probably the large study area and obvious spatial heterogeneity [27]. The higher variation degree of parameters may be due to the difference between small-scale and large-scale data fluctuations [7,28].
The texture types of the soil layers 0-20 cm, 20-40 cm, and 40-60 cm at 128 plots in the study area and the International Classification of soil texture are shown in Figure 1. It can be seen that all soil samples collected covered four types of soil texture (sandy clay loam, sandy loam, loam, and clay loam). Sandy clay loam and loam were the main soil texture types in the study area. There was also no obvious heterogeneity of soil texture at different soil depths. Therefore, all soil samples were taken as a whole and a single soil transfer function was adopted instead of calculating soil layers separately when predicting soil saturated water conductivity by PTFs, which can increase the number of samples and improve the prediction accuracy.  As Figure 2 shows, Pearson correlation analysis of 11 soil basic properties demonstrates that, among the 55 variable pairs, 51 pairs were significantly correlated when p ≤ 0.05, and 4 pairs were not. The strongest positive correlations (R > 0.8) were obtained between P and SSWC, P C and SSWC, P C and P, FC and SSWC, FC and P, and FC and P C , while the highest negative correlations (R < −0.8) were observed between BD and SSWC, BD and P, BD and FC, and BD and P C . K S had a significant correlation with all variables, and had a moderate positive correlation with SWWC (0.68), P (0.57), P C (0.54), FC (0.66), Sa (0.71) and SOM (0.66), and a moderate negative correlation with BD (−0.74) and Cl (−0.54). Exploratory factor analysis was carried out for 10 basic physical and chemical properties of soil by principal component analysis, as shown in Table 2. There are generally two selection criteria for principal component factors: (1) select principal component factors with characteristic roots greater than 1.00; (2) select principal component factors with a cumulative contribution rate greater than 85%. The maximum variance method was used to carry out the orthogonal rotation and shows the factor load matrix after rotation. The eigenvalues of the first five principal component factors were greater than 1 and explained 94.402% of the variation of soil properties. The PC1 could explain 45.358% of the variation, and it had high positive loadings from SSWC (0.922), P (0.916), PC (0.941), and FC (0.948), while having a high negative loading from BD (−0.865). PC1 was named the "pore of soil" component, because it chiefly explained the variations of soil bulk density and pore characteristics. The PC2 explained 17.516% of variation with high negative loading from Si (−0.914) and positive loading from Sa (0.843); hence, PC2 was defined as the "silt and sand" component. The PC3 had a high loading from Cl (−0.95) that explained 12.517% of variation, and it was named the "Clay" component. The PC4 explained 11.508% of variation and had a high loading from PNC (0.984). PC4 was named the "nonwater retention" component because these non-capillary porosity samples have no capillary function and are filled with air. The last PC also had one high loading from SOM (0.917), and it explained 7.503% of variation; PC5 was named the "organic matter" component. Exploratory factor analysis was carried out for 10 basic physical and chemical properties of soil by principal component analysis, as shown in Table 2. There are generally two selection criteria for principal component factors: (1) select principal component factors with characteristic roots greater than 1.00; (2) select principal component factors with a cumulative contribution rate greater than 85%. The maximum variance method was used to carry out the orthogonal rotation and shows the factor load matrix after rotation. The eigenvalues of the first five principal component factors were greater than 1 and explained 94.402% of the variation of soil properties. The PC1 could explain 45.358% of the variation, and it had high positive loadings from SSWC (0.922), P (0.916), P C (0.941), and FC (0.948), while having a high negative loading from BD (−0.865). PC1 was named the "pore of soil" component, because it chiefly explained the variations of soil bulk density and pore characteristics. The PC2 explained 17.516% of variation with high negative loading from Si (−0.914) and positive loading from Sa (0.843); hence, PC2 was defined as the "silt and sand" component. The PC3 had a high loading from Cl (−0.95) that explained 12.517% of variation, and it was named the "Clay" component. The PC4 explained 11.508% of variation and had a high loading from P NC (0.984). PC4 was named the "non-water retention" component because these non-capillary porosity samples have no capillary function and are filled with air. The last PC also had one high loading from SOM (0.917), and it explained 7.503% of variation; PC5 was named the "organic matter" component.

Characteristics of K S in Different Stand Types of the Study Area
Comparisons of the average K S of 14 stand types in the study area were made, and significant difference analysis was conducted, as shown in Figure 3. The saturated water conductivity of the mixed forest of Betula-Picea was significantly higher than that of other stands, but there was no significant difference between it and the mixed forest of Populus-Betula. Farmland and grasslands were significantly lower than most forestland. Rachman et al. and Seobi et al. provided the K S in vegetation areas, and it can be seen that it is generally higher than that found in farmland [29,30]. The result was not surprising because related studies showed that the K S of the native natural forest was higher than that of other vegetation types [31]. As the top vegetation community of natural succession in this area, the mixed forest of Betula-Picea had the best water conservation benefits. It is worth mentioning that the K S of the coniferous pure forest was generally lower than that of coniferous and broad-leaved mixed forest. Lim et al. analyzed 1456 soil samples from South Korea and found that K S in the broad-leaved mixed forest was significantly higher than that in the coniferous pure forest, which coincides with the results of this study [32]. Firstly, the roots of broad-leaved tree species were more widely spread laterally and produced a large number of small root branches, resulting in more developed soil pores than in coniferous forests [33]. Secondly, the litter of coniferous pure forests did not easily decompose, leading to soil acidification, reducing Soil Organic Matter content and changing soil structure, which is not conducive to the growth of understory vegetation. To increase the K S , the litter layer could not only increase the organic matter content but also reduce the soil bulk density [34,35]. On the contrary, the soil structure of the coniferous and broad-leaved mixed forest was complex and the vegetation environment was diverse, which results in the difference of K S . Previous studies have shown that better soil structure and root and pore networks, stronger soil microbial activity, and higher organic matter content can all increase soil saturated water conductivity.

Modification of the Published PTFs
Currently, many PTFs have been developed with different input parameters and mathematical concepts. These PTFs, however, were established based on linear regression to simulate the relationship between soil hydraulic characteristics and soil basic proper-

Modification of the Published PTFs
Currently, many PTFs have been developed with different input parameters and mathematical concepts. These PTFs, however, were established based on linear regression to simulate the relationship between soil hydraulic characteristics and soil basic properties. Based on this, some well-known PTFs were selected that at least contained information regarding soil particle size distribution. Table 3 lists the equation forms of the eight PTFs selected and the required input variables. Table 4 shows the results of parameter estimation for each PTFs. To evaluate the applicability of these functions in the study area more reasonably, the nonlinear least-square method was adopted to modify the parameters.  Table 5 shows the estimated and measured values of AIC, R 2 , RMSE, ME, and MRE, calculated using the eight modified Pedo-Transfer Functions of soil saturated water conductivity. From the determination coefficient R 2 , the R 2 of the Vereecken (0.655) and Wosten 1999 (0.654) PTFs was the largest, which indicated that the predicted value of K S calculated by these PTF s was closest to the variation trend of the measured value in the study area. The predicted value of the Wosten 1997 PTF had a minimum of RMSE (0.226) and AIC (216.723). From this aspect, the estimation accuracy of the Wosten 1997 PTF was the highest, and the estimation accuracy of the Li and Vereecken functions was also similar to that of the Wosten 1997 PTF. The ME of the predicted value calculated by all PTFs and the measured value were all less than 0, which indicated that the predicted value as a whole was less than the measured value. To further more intuitively understand the prediction effect, Figure 4 shows a comparison of the predicted and measured values of K S , calculated using eight PTFs. It can be seen that the predicted values of the modified functions of Wosten 1997, Wosten 1999, Li, and Vereecken were basically distributed on both sides of the 1:1 scale line. Yao et al. used the Levenberg-Marquardt algorithm to non-linearly fit the Vereecken function and achieved good prediction results [6]. Tietje and Tapkenhinrichs compared several famous PTFs and believed that the Vereecken PTF had the best prediction performance, while the Saxton PTF had a poor prediction effect for soil with higher organic matter content, which was consistent with the results of this study [36]. In another study, Tietje pointed out that the K S predicted values of the Saxton, Cosby, and Vereecken PTFs were usually higher than the measured values, which was contrary to the results of this study and might be caused by the correction of model parameters [37]. In summary, functions of the modified Wosten 1997, Wosten 1999, Li, and Vereecken PTFs had a better prediction effect, and they all took SOM as the input variable, while the modified Cosby, Saxton, Pucket, and Campbell functions had a poor prediction effect and did not take SOM into account. Therefore, SOM might be an important variable affecting the prediction accuracy of K S. In addition to the selection of input variables, inherent variations of K S , such as measurement error, spatial variability, and sample error, were the reasons for prediction errors of K S using PTFs [37]. It was therefore necessary to establish a suitable Pedo-transfer function to better predict K S .

Multiple Linear Regression
The multiple regression model established by the stepwise regression method was as follows. MLR 1 only considered the primary correlation of variables, while MLR 2 considered the interaction and secondary correlation between variables.

Development of BP Artificial Neural Network
In order to avoid multicollinearity among input variables, the interaction between soil properties should also be considered. Five variables (x 1 = Si·SOM, x 2 = BD·Si, x 3 = ln 2 Cl, x 4 = SOM 2 , and x 5 = SOM· ln Cl) were adopted, which were generated by the MLR 2 model as the input layer and ln K S as the output layer of the ANN to verify whether it had better predictive performance than MLR 2 . The number of hidden layer nodes selected ranged from 4 to 12. Figure 5 shows that the R and MSE of the ANN can be used for predicting soil saturated water conductivity by setting the different numbers of hidden layer nodes. When the number of hidden layer nodes was 5, the ANN had the lowest MSE (0.0776) and the highest R (0.902). Therefore, when the topology of the neural network was 5-5-1, it had optimal prediction performance.

Multiple Linear Regression
The multiple regression model established by the stepwise regression method was as follows. MLR1 only considered the primary correlation of variables, while MLR2 considered the interaction and secondary correlation between variables. MLR1: . . . .

Development of BP Artificial Neural Network
In order to avoid multicollinearity among input variables, the interaction between soil properties should also be considered. Five variables (x1 = Si • SOM, x2 = BD • Si, x3 = ln Cl, x4 = SOM , and x5 = SOM • ln Cl) were adopted, which were generated by the MLR2 model as the input layer and ln K as the output layer of the ANN to verify whether it had better predictive performance than MLR2. The number of hidden layer nodes selected ranged from 4 to 12. Figure 5 shows that the R and MSE of the ANN can be used for predicting soil saturated water conductivity by setting the different numbers of hidden layer nodes. When the number of hidden layer nodes was 5, the ANN had the lowest MSE (0.0776) and the highest R (0.902). Therefore, when the topology of the neural network was 5-5-1, it had optimal prediction performance.

Evaluation of Prediction Performance of the Model
The 114 data that were not involved in the modeling were used as test data, and they were input into the proposed Multiple Linear Regression equation and the BP Artificial Neural Network to obtain the predicted values of the two models for soil saturated hy-

Evaluation of Prediction Performance of the Model
The 114 data that were not involved in the modeling were used as test data, and they were input into the proposed Multiple Linear Regression equation and the BP Artificial Neural Network to obtain the predicted values of the two models for soil saturated hydraulic conductivity.
As can be seen from Figure 6, the predicted values of K S of the two models were basically distributed around the 1:1 proportion line, and some points greatly deviated from the 1:1 proportion line, but the overall predicted values were not significantly different from the measured values. Table 6 shows the accuracy information of the two models in predicting soil saturated water conductivity. Compared with MLR 1 and MLR 2 , the ANN had a higher R 2 and a lower RMSE and AIC. However, from the perspective of ME, MLR was lower than the ANN, which was not surprising, because MLR was unbiased. The prediction results of MLR and the ANN were superior to those of the published PTFs in Table 5.
Agronomy 2021, 11, x FOR PEER REVIEW 15 of 19 As can be seen from Figure 6, the predicted values of KS of the two models were basically distributed around the 1:1 proportion line, and some points greatly deviated from the 1:1 proportion line, but the overall predicted values were not significantly different from the measured values. Table 6 shows the accuracy information of the two models in predicting soil saturated water conductivity. Compared with MLR1 and MLR2, the ANN had a higher R 2 and a lower RMSE and AIC. However, from the perspective of ME, MLR was lower than the ANN, which was not surprising, because MLR was unbiased. The prediction results of MLR and the ANN were superior to those of the published PTFs in Table 5.  As expected, the prediction performance of MLR2 was better than that of MLR1 because MLR2 retained more useful information, such as Si. Another reason for this was that MLR2 took into account possible interactions and quadratic correlations between soil properties. The ANN had the highest R 2 and the lowest RMSE, MRE, and AIC compared to MLR1 and MLR2, while ME was only just over 0.01 higher. Therefore, it can be considered that the BP-ANN had higher prediction accuracy than the model established by Multiple Linear Regression. Some scholars believe that Artificial Neural Networks can extract nonlinear information between soil properties, which was not available in Multiple Linear Regression [6]. As stated by Wosten et al., another advantage of Artificial Neural Networks is their ability to mimic the behavior of complex systems by altering the strength of interaction and range of choices of structures of interconnections between every component in a network [7].Compared with the eight previously modified the established PTFs, MLR2 and ANN had higher predictive accuracy. From the perspective of the prediction results of the model, the difference between the prediction results of MLR and ANN was not only caused by the nature of the model itself (MLR2 and ANN) but also by the different input variables (MLR1 and ANN). After considering the interaction between every soil property, the prediction performance of MLR2 and ANN was significantly higher than that of MLR1. Although there was no strong multicollinearity between the input variables by stepwise regression, there was more or less of a correlation (even if the correlation was weak, it cannot be ignored) between the soil properties. Therefore, we suggest that the interaction between independent variables should be considered in future  As expected, the prediction performance of MLR 2 was better than that of MLR 1 because MLR 2 retained more useful information, such as Si. Another reason for this was that MLR 2 took into account possible interactions and quadratic correlations between soil properties. The ANN had the highest R 2 and the lowest RMSE, MRE, and AIC compared to MLR 1 and MLR 2 , while ME was only just over 0.01 higher. Therefore, it can be considered that the BP-ANN had higher prediction accuracy than the model established by Multiple Linear Regression. Some scholars believe that Artificial Neural Networks can extract nonlinear information between soil properties, which was not available in Multiple Linear Regression [6]. As stated by Wosten et al., another advantage of Artificial Neural Networks is their ability to mimic the behavior of complex systems by altering the strength of interaction and range of choices of structures of interconnections between every component in a network [7].Compared with the eight previously modified the established PTFs, MLR 2 and ANN had higher predictive accuracy. From the perspective of the prediction results of the model, the difference between the prediction results of MLR and ANN was not only caused by the nature of the model itself (MLR 2 and ANN) but also by the different input variables (MLR 1 and ANN). After considering the interaction between every soil property, the prediction performance of MLR 2 and ANN was significantly higher than that of MLR 1 . Although there was no strong multicollinearity between the input variables by stepwise regression, there was more or less of a correlation (even if the correlation was weak, it cannot be ignored) between the soil properties. Therefore, we suggest that the interaction between independent variables should be considered in future modeling work, rather than eliminating all of them in order to avoid collinearity problems.
To some extent, the MLR and ANN established in this study were comparable because they considered all the soil properties measured, and subjectivity was avoided in the selection of input variables. Of course, PTFs based on Artificial Neural Networks are not always better than PTFs built by MLR. Zhao et al. found that the randomness of grouping test data and modeling data had a great impact on the prediction accuracy of PTFs, so that the stability of the ANN was less than MLR when the input variables were the same, although their prediction performance was generally similar in their study on the Loess Plateau of China [18]. Hasan et al. analyzed 195 soil samples from the Erzincan Plains and showed that the accuracy of prediction of K S by regression was slightly better than that of ANNs, but the experimental results of this study alone did not indicate that regression was always better than ANNs. Better results may be obtained if a better algorithm is used in the ANN [16]. For example, Rezaei Arshad et al. used two different neural network algorithms, namely a Radial Basis Function (RBF) neural network and a Multilayer Perceptual (MLP) model, to predict the soil saturated hydraulic conductivity in southwest Iran, and their prediction performance was both better than that of Multiple Linear Regression [27]. The research results of S. Tamri et al. and I. Yilmaz et al. also showed that an ANN PTF can obtain more reliable prediction results than MLR [38,39]. In contrast to traditional regression models, ANNs do not require a prior regression model that links the input data to the output data, which was often difficult because these models are unknown [40].
In addition to the commonly used linear regression and Artificial Neural Networks, many statistical techniques have been used in the development of PTFs. Lamorski et al. used soil data from Poland to compare PTFs developed by ANN and Support Vector Machine (SVM). They found that the prediction accuracy of two PTFs was roughly similar, and the one developed from the SVM was even higher [41]. Names et al. developed PTFs for 2125 soil samples from the US NRCS soil database using K-Nearest Neighbor methods (KNN), which is another machine algorithm, but the results showed that the predictive performance of KNN and ANN was almost identical [42]. McKenzie et al. and Souza et al. developed PTFs by methods of regression tree and random forest to predict saturated hydraulic conductivity and bulk density, respectively, and provided their predictive performance [43,44]. In addition to the methods already mentioned, there are some statistical techniques, such as Gaussian processes and Naive Bayes Classifiers, that have not been explored in developing PTFs. Some popular deep learning algorithms, such as convolutional neural networks and deep neural networks, are also expected to be widely used in the development of PTFs.
It should be noted that most PTFs are developed using data-driven methods, so their predictive performance is largely dependent on the reliability of soil data. No matter what kind of PTF, they are an indirect method to predict the saturated water conductivity and are based on the direct measurement method. As noted by Wosten et al. and Minasny and Hartemink, the reliability of the data is far more important than the complexity of the model [7,45]. Therefore, while exploring PTFs with more advanced algorithms, we should also devote ourselves to the innovation of direct measurement technology. The modeling data and test data in this study were randomly divided, and the choice of modeling data and test data will have an impact on the prediction effect of PTFs. Therefore, when comparing and selecting PTFs in the future, it is necessary to verify repeatedly to ensure the stability of the PTFs.

Conclusions
(1) Sandy clay loam and loam are the main soil types in the three watersheds. Among different vegetation types, the Ks of coniferous and broadleaf mixed forest and broadleaf pure forest were significantly higher than that of coniferous pure forest and shrub forest, while the Ks of wasteland and farmland were the lowest.
(2) In this study, the parameters of eight published PTFs were modified by the leastsquare method and their prediction accuracy was evaluated. Among them, the Wosten 1997, Wosten 1999, Li, and Vereecken models with SOM as input variables had good prediction accuracy, while the Cosby, Saxton, Pucket, and Campbell models without SOM as input variables had a poor prediction effect. When PTFs are used in this study area, Soil Organic Matter plays an important role in the estimation of Ks.
(3) In this paper, two types of PTFs were established based on Multiple Linear Regression and Artificial Neural Networks, respectively. Our newly created PTFs showed better predictive performance than the published PTFs evaluated earlier. In the regression analysis, we considered the possible interactions between variables and used the stepwise regression method to eliminate redundant variables, so as to avoid the multicollinearity problem while retaining the important variables. The constructed BP Artificial Neural Network model had better prediction performance than Multiple Linear Regression.
It should be pointed out that, in addition to soil properties, stand structure is also an indirect factor affecting K S in the conversion of farmland to forest areas. In recent years, the northwest alpine region has gradually started to regulate the stand structure of artificial forests (such as supplementary planting of Qinghai Spruce in pure birch forest to promote natural regeneration; selective cutting and supplementary planting were also started for stands with unreasonable stand density. The PTFs we have established may only apply to the current vegetation structure. Whether the PTFs established in the present stage are suitable for the next few years is still to be further explored, regardless of whether the artificial stand is transformed or there is natural succession of the vegetation community. It is necessary to collect more soil information in the future and take into account site conditions and stand structure factors to develop a PTFs that is suitable in both time and scale.