Evaluation and Development of Pedotransfer Functions for Predicting Saturated Hydraulic Conductivity for Mexican Soils

In the present work, we evaluate the prediction capability of six pedotransfer functions (PTFs), reported in the literature, for the saturated hydraulic conductivity estimations (KS). We used a database with 900 measured samples obtained from the Irrigation District 023, in San Juan del Rio, Queretaro, Mexico. Additionally, six new PTFs were constructed for KS from clay percentage, bulk density, and saturation water content data. The results show, for the evaluated models, that one model presents an overestimation for KS > 0.5 cm h−1 values, three models have an underestimation for KS > 1.0 cm h−1, and two models have a good correlation (R2 > 0.98) but more than three parameters are necessary. Nevertheless, the last two models require 3–4 parameters in order to obtain optimization. On the other hand, the models proposed in this work have a similar correlation with fewer parameters. The fit is seen to be much better than using the existing ones, achieving a correlation of R2 = 0.9822 with only one variable and R2 = 0.9901 when we use two.


Introduction
Soil water movement is important in fields such mechanics, irrigation, drainage, hydrology, and agriculture [1] and one of the most important hydraulic parameter is the saturated hydraulic conductivity (K S ), defined as the ability of soils to transmit water throughout the saturated zone [2,3]. The calculation of saturated hydraulic conductivity is a very crucial factor to optimize the flow rate applied to the border or furrow in the gravity irrigation [2][3][4][5][6]. Although this property is easily measurable in a laboratory, or in the field, for its application at the small scale, most of the time it is required to be used on a large scale [5,6]. This inconvenience brings us numerous tests for soil measurements, which is time consuming, costly, and impractical [7,8].
In recently years, there are a lot of studies about pedotransfer functions (PTFs) [7][8][9][10]. These mathematical models allow to estimate the saturated hydraulic conductivity (K S ) from some soil characteristics such as texture, field capacity, the permanent wilting point, bulk density, porosity, and organic matter, among others [7][8][9][10]. The robustness of the model is link to the number of physical parameters used to calculate the saturated hydraulic conductivity; the more parameters, the more accurate the prediction. However, as it was mentioned before, the number of measurements makes the PTFs difficult to get, due to economic resources and time, which presents a limitation in this kind of function.
Diverse pedotransfer models can be found in the existing literature [7][8][9][10][11][12], and several of them validated with already known databases like UNSODA (Unsaturated Soil Hydraulic Database), ROSETTA (a computer program for estimating soil hydraulic parameters with hierarchical pedotransfer functions), among others. However, the predictive capacity they have has been questioned, because the soils in which they want to apply are different from the soils used for their development [13]. Those results indicate that, for its use at the local level, it is necessary to take in-situ samples to validate its application [14], or where necessary, make the appropriate corrections or adjustments [8]. Thus, the objectives of the present work are: 1) Use some PTFs from the literature to obtain the hydraulic conductivity of the sampled soils with a great diversity of textural classes; 2) Develop new PTFs based on the principal component analysis technique to reproduce the observed data and validate them.

Pedotransfer Functions in Literature
For the estimation of K S , some of the models shown in [10] were tested, however, due to in the samples analyzed in the laboratory in this work the content of organic matter was not quantified, some models were used using the measured variables (Table 1). Statistical analysis of predictive capacity was performed with the R stats package [15] using non-linear least squares estimation [16].  [19] Abbreviations are as follows. Sa: Sand (%); Cl: Clay (%); θ s : saturation water content (cm 3 cm −3 ); ρ a : bulk density (g cm −3 ) and the coefficients from a to m are obtained by fitting the model to the experimental data.

Soil Data Base
The database used in this study was developed from samplings in 900 plots in the Irrigation District 023 San Juan del Río Querétaro ( Figure 1). These samples were send to the laboratory to obtain the following parameters: soil texture by the Bouyucos hydrometer, bulk density by the cylinder method of known volume, moisture content at saturation, field capacity and permanent wilting point by the method of the pressure membrane pot, and the saturated hydraulic conductivity by the variable head permeameter method. The measurement of the variables and the hydrodynamic characterization of soils are widely discussed in [20].

Statistical Analysis
The accuracy of PTFs in predicting hydraulic conductivity was evaluated by calculating three statistical measures between the predictions and observations, namely root mean square error (RMSE) between the measured and predicted hydraulic conductivity, the modeling efficiency (EF), and the mean absolute error (MAE): where Ksm is the measurement hydraulic conductivity (cm h −1 ), Ksp the predicted hydraulic conductivity (cm h −1 ), K the mean of the measured values, and N the total number of observations. For best performing models, the value of MAE should be close to zero [8]. Also, for the best matching between predictions and observations, the value of the RMSE should be as small as possible and the value of EF should be as high as possible. The modeling efficiency, EF values greater than or equal 0.2 and less than 0.50 were considered acceptable. EF greater or equal to 0.5 indicated high

Statistical Analysis
The accuracy of PTFs in predicting hydraulic conductivity was evaluated by calculating three statistical measures between the predictions and observations, namely root mean square error (RMSE) between the measured and predicted hydraulic conductivity, the modeling efficiency (EF), and the mean absolute error (MAE): where K sm is the measurement hydraulic conductivity (cm h −1 ), K sp the predicted hydraulic conductivity (cm h −1 ), K sm the mean of the measured values, and N the total number of observations. For best performing models, the value of MAE should be close to zero [8]. Also, for the best matching between predictions and observations, the value of the RMSE should be as small as possible and the value of EF should be as high as possible. The modeling efficiency, EF values greater than or equal 0.2 and less than 0.50 were considered acceptable. EF greater or equal to 0.5 indicated high performance for the evaluated PTFs. The PTF that has the highest EF value was the best performing function [8,21]. Table 2 shows a summary of the texture results obtained in the laboratory. It can be seen that the samples collected cover 11 of the 12 textural classes, and for this set, the predominant soils in the study area, are those of loamy silty clay and silty clay texture. Furthermore, Table 3 presents a summary of the statistical properties of each of the variables used in this study.

Saturated Hydraulic Conductivity Obtained with Existing Models
A comparison of laboratory measured data on saturated hydraulic conductivity (K S ) with those estimated (K SE ) with models PTF-1-PTF-6 are shown in Figure 2. The dotted diagonal line corresponds to the 1:1 ratio, while the horizontal indicates a value of zero in the estimation of the residuals. Pearson's correlation coefficient and standard deviations of the models used are shown in Table 4.  The performance of PTFs were affected when use less to two measurement (PTF-5 and PTF 6) and the results are better when use more than three. The PTF-1 needs three parameters (sand, clay, and saturation water content) and shows less variance and a higher Pearson correlation coefficient (R 2 = 0.9953). However, more physical parameters imply a high cost and more time in the laboratory.
According to the residual analysis [22,23], any value greater than 2 corresponds to anormal values in the distribution of errors, commonly called "outlier", and this characteristic appear in all  The performance of PTFs were affected when use less to two measurement (PTF-5 and PTF 6) and the results are better when use more than three. The PTF-1 needs three parameters (sand, clay, and saturation water content) and shows less variance and a higher Pearson correlation coefficient (R 2 = 0.9953). However, more physical parameters imply a high cost and more time in the laboratory.
According to the residual analysis [22,23], any value greater than 2 corresponds to anormal values in the distribution of errors, commonly called "outlier", and this characteristic appear in all evaluated models (Figure 2). The first four models begin to show this characteristic in the range of 3.6 to 4.25 cm h −1 (PTF-1 -PTF-4). However, in the PTF-5 model it is presented from 2.00 cm h −1 . In addition, this model has more variation in the estimation of K S . This indicates that the variance of the errors is not constant with respect to the value of the saturated hydraulic conductivity, which is reflected in the range of application in the estimation of this parameter [24]. In addition, the models where they present a better prediction, there is a greater presence of outliers, that is, the existence of anormal values in the distribution of errors. However, this can be attributed to the low predictive capacity of the models used, land use, characteristics of the plot, and formation factors, among others [24][25][26].

Principal Component Analysis
Notwithstanding the predictive capacity of some models, the number of variables necessary to be able to apply them is high. Also, when taking too many variables on a set of objects, we will have to consider an excessive number of possible correlations. In order to explore the principal variables which describe the sample and make an improvement in the PTF, we perform a principal component analysis (PCA). Despite the fact that this method has been proven to induce a reduction in the number of variables [27], it is useful for finding some relations between variables too. The calculations were made with the "Factoextra" package [28] and "FactoMineR" [29] from R using the data described in Table 3. The results are shown in Figure 3.
Agronomy 2020, 10, x FOR PEER REVIEW 6 of 10 evaluated models (Figure 2). The first four models begin to show this characteristic in the range of 3.6 to 4.25 cm h −1 (PTF-1 -PTF-4). However, in the PTF-5 model it is presented from 2.00 cm h −1 . In addition, this model has more variation in the estimation of KS. This indicates that the variance of the errors is not constant with respect to the value of the saturated hydraulic conductivity, which is reflected in the range of application in the estimation of this parameter [24]. In addition, the models where they present a better prediction, there is a greater presence of outliers, that is, the existence of anormal values in the distribution of errors. However, this can be attributed to the low predictive capacity of the models used, land use, characteristics of the plot, and formation factors, among others [24][25][26].

Principal Component Analysis
Notwithstanding the predictive capacity of some models, the number of variables necessary to be able to apply them is high. Also, when taking too many variables on a set of objects, we will have to consider an excessive number of possible correlations. In order to explore the principal variables which describe the sample and make an improvement in the PTF, we perform a principal component analysis (PCA). Despite the fact that this method has been proven to induce a reduction in the number of variables [27], it is useful for finding some relations between variables too. The calculations were made with the "Factoextra" package [28] and "FactoMineR" [29] from R using the data described in Table 3. The results are shown in Figure 3.  Figure 3a shows that, mainly, clay, volumetric water content, and bulk density contribute the most to the first principal component and silt and sand for the second principal component. PMP is the component with less contribution to the sample. Figure 3b shows no clear correlation inside the sample (except for the points arround 0.2), but the two principal components describe 93.9% of the total sample. These results lead us to focus our efforts for new PTF in the clay, volumetric water content, and bulk density as our main variables, followed by silt and sand information. Also, as expected, bulk density, volumetric water content, and clay are highly related, taking into account that the first variable is inversely proportional to the other two.   Figure 3b shows no clear correlation inside the sample (except for the points arround 0.2), but the two principal components describe 93.9% of the total sample. These results lead us to focus our efforts for new PTF in the clay, volumetric water content, and bulk density as our main variables, followed by silt and sand information. Also, as expected, bulk density, volumetric water content, and clay are highly related, taking into account that the first variable is inversely proportional to the other two.

Development of New Models
Given the low predictive capacity of saturated hydraulic conductivity of the models used in the literature to predict the saturated hydraulic conductivity of our database, we decided to develop new models, linear, exponential, and potential, using the parameters that are most correlated with K S : Cl, θ s and ρ a . For optimization, it was decided to use the NLS method, which uses a relative displacement convergence criterion comparing the numerical imprecision in the estimates of the current parameters with the residual sum of squares, executed on the data, using the constraint to avoid failures in the method: y = f(x,θ) + eps.
The optimization of the constants in the models was performed with 50% of the data obtained randomly in each textural class. After trying to fit several models, it was decided to choose six that are shown in Table 5, where a, b, c, and d are constant. Table 5. New pedotransfer models (NPTF) and statistical measurements of the entire database.   The results obtained whit the proposed models show that the NF-4 model better predicts the data obtained experimentally by showing less variance and a higher Pearson correlation coefficient (R 2 = 0.9901), and has the best performance (EF = 0.9788, RMSE = 0.2738) followed by the NF-6 (EF = 0.9783, RMSE = 0.2110). On the contrary, with the linear model (NF-1), when using three variables, it has more bias in the residuals than the exponential model (R 2 = 0.8709). However, the NF-2 model only uses the clay content as the input variable, which makes it a simple model with good predictability to obtain saturated hydraulic conductivity (R 2 = 0.9822). Furthermore, in the six proposed models, there are no outliers. These results are particularly relevant for understanding regional soil-water dynamics for irrigation and drainage studies in plots or irrigations districts with similar soil properties.
This study and previous studies showed the importance of clay and the volumetric water content as a predictor in PTFs of saturated hydraulic conductivity. Five of the six new pedotransfer functions have an exponential form and the NF-4 showed better performance among the evaluated PTFs. The NF-2 proposed function showed better performance than other functions with more parameters. This function could be used to predict the saturated hydraulic conductivity in model simulations which use an incomplete database, because it only requires clay content and the results are better (R 2 = 0.9822).

Conclusions
The purpose of this study was to evaluate the performance of six published pedotransfer functions for predicting the saturated hydraulic conductivity and to develop six new functions. Soil samples from one irrigation district were collected (n = 900) and the measurement of eight physical parameters was conducted in the field and laboratory. As a result, when applying these pedotransfer functions to the database we have, the prediction of saturated hydraulic conductivity is deficient for values greater than 2.00 cm h −1 . However, the results obtained with the new proposed models are better. The new function NF-4 showed the best performance with the training dataset, the validation data set, and the entire data base (RMSE = 0.1983, EF = 0.9788, R 2 = 0.9901).
Testing the pedotranfer functions in soils with different conditions showed the variation of functions performances according to the characteristics of the dataset (samples number, soil use, location, factors of soil formation, and other). The study recommends that the use of pedotransfer functions for predicting saturated hydraulic conductivity should be done with care. Also, the study recommends the preferable use of functions that depend on clay and the volumetric water content as inputs due to their high performances. The importance of clay content and moisture content at saturation as predictors of PTFs for saturated hydraulic conductivity has been analyzed in previous studies. The above characteristics, and the results obtained with existing models and our own give us tools to conclude that there are no universal pedotransfer models, so the use of any of them should The results obtained whit the proposed models show that the NF-4 model better predicts the data obtained experimentally by showing less variance and a higher Pearson correlation coefficient (R 2 = 0.9901), and has the best performance (EF = 0.9788, RMSE = 0.2738) followed by the NF-6 (EF = 0.9783, RMSE = 0.2110). On the contrary, with the linear model (NF-1), when using three variables, it has more bias in the residuals than the exponential model (R 2 = 0.8709). However, the NF-2 model only uses the clay content as the input variable, which makes it a simple model with good predictability to obtain saturated hydraulic conductivity (R 2 = 0.9822). Furthermore, in the six proposed models, there are no outliers. These results are particularly relevant for understanding regional soil-water dynamics for irrigation and drainage studies in plots or irrigations districts with similar soil properties.
This study and previous studies showed the importance of clay and the volumetric water content as a predictor in PTFs of saturated hydraulic conductivity. Five of the six new pedotransfer functions have an exponential form and the NF-4 showed better performance among the evaluated PTFs. The NF-2 proposed function showed better performance than other functions with more parameters. This function could be used to predict the saturated hydraulic conductivity in model simulations which use an incomplete database, because it only requires clay content and the results are better (R 2 = 0.9822).

Conclusions
The purpose of this study was to evaluate the performance of six published pedotransfer functions for predicting the saturated hydraulic conductivity and to develop six new functions. Soil samples from one irrigation district were collected (n = 900) and the measurement of eight physical parameters was conducted in the field and laboratory. As a result, when applying these pedotransfer functions to the database we have, the prediction of saturated hydraulic conductivity is deficient for values greater than 2.00 cm h −1 . However, the results obtained with the new proposed models are better. The new function NF-4 showed the best performance with the training dataset, the validation data set, and the entire data base (RMSE = 0.1983, EF = 0.9788, R 2 = 0.9901).
Testing the pedotranfer functions in soils with different conditions showed the variation of functions performances according to the characteristics of the dataset (samples number, soil use, location, factors of soil formation, and other). The study recommends that the use of pedotransfer functions for predicting saturated hydraulic conductivity should be done with care. Also, the study recommends the preferable use of functions that depend on clay and the volumetric water content as inputs due to their high performances. The importance of clay content and moisture content at saturation as predictors of PTFs for saturated hydraulic conductivity has been analyzed in previous studies. The above characteristics, and the results obtained with existing models and our own give us tools to conclude that there are no universal pedotransfer models, so the use of any of them should be undertaken with caution. Failing that, regional equations should be developed that are able to obtain saturated hydraulic conductivity, given an error criterion, so that the results obtained are in accordance with the reality of the phenomenon and the place being studied.
Funding: This research was supported as part of a collaboration between the National Water Commission (CONAGUA, according to its Spanish acronym); the Irrigation District 023, San Juan del Río, Querétaro; and the Autonomous University of Querétaro, under the program RIGRAT 2015-2019.