Developing Pseudo Continuous Pedotransfer Functions for International Soils Measured with the Evaporation Method and the HYPROP System: II. The Soil Hydraulic Conductivity Curve

Abstract: Direct measurement of unsaturated hydraulic parameters is costly and time-consuming. Pedotransfer functions (PTFs) are typically developed to estimate soil hydraulic properties from readily available soil attributes. In this study, we developed, for the first time, PTFs that estimate the soil hydraulic conductivity (log(K)) directly from measured data. We adopted the pseudo continuous neural network PTF (PCNN-PTF) approach and assessed its accuracy and reliability using two independent data sets with hydraulic conductivity measured via the evaporation method. The primary data set contained 150 international soils (6963 measured data pairs), and the second data set consisted of 79 repacked Turkish soil samples (1340 measured data pairs). Four models with different combinations of the input attributes, including soil texture (sand, silt, clay), bulk density (BD), and organic matter content (SOM), were developed. The best performing international (root mean square error, RMSE = 0.520) and local (RMSE = 0.317) PTFs had only soil texture information as inputs when developed and tested using the same data set to estimate log(K). However, adding BD and SOM as input parameters increased the reliability of the international PCNN-PTFs when the Turkish data set was used as the test data set. We observed an overall improvement in the performance of PTFs with the increasing number of data points per soil textural class. The PCNN-PTFs performed consistently well across tension ranges when developed and tested using the international data set. Incorporating the Turkish data set into PTF development substantially improved the accuracy of the PTFs (on average close to 60% reduction in RMSE). Consequently, we recommend integrating local HYPROP™ (Hydraulic Property Analyzer, Meter Group Inc., USA) data sets into the international data set used in this study and retraining the PCNN-PTFs to enhance their performance for that specific region.


Introduction
Direct measurements of soil hydraulic properties in the field and laboratory can be tedious, laborious, and often expensive due to their significant inherent spatial variability. Therefore, pedotransfer functions (PTFs) are often developed and used to indirectly estimate these properties by establishing empirical relationships based on the readily available soil properties such as soil texture, bulk density (BD), and soil organic matter content (SOM) [1]. The primary soil hydraulic properties include the soil water retention and hydraulic conductivity curves (SWRC and SHCC) that define the volumetric water content's and (III) assess the performance of the developed models across soil textures and different ranges of soil tension.

Soil Data Sets
In this study, two soil data sets were used to develop hydraulic conductivity PCNN-PTFs and evaluate their accuracy and reliability. The measured hydraulic conductivity data and the soil textural classification of the samples for both data sets are shown in Figure 2. The primary data set, hereafter referred to as the international data set, was published by Schindler & Müller [20] and consisted of 173 soils collected from 71 sites all over the world. This data set contains measurements of water retention, unsaturated hydraulic conductivity, K(h), and several basic soil properties, including textural data, organic matter content (SOM), and dry bulk density (BD). The soil hydraulic properties were measured using evaporation experiments or the extended evaporation method via the HYPROP system. A majority of the soil samples in the data set were collected from arable lands, yet a few samples from other land use types such as urban land, grassland, forests, fallow lands, and riverbanks were also present. After screening the international data set, a subset of samples (i.e., 150 soils with 6963 total K(h) data pairs) was selected to develop PCNN-PTFs. The characteristics of the selected soils are shown in Table 1. The most dominant texture was silt loam, comprising 78 soil samples (52% of the data set), followed by loam, with 18 soil samples (12% of the data set). Values of K(h) were log-transformed because hydraulic conductivity data are generally log-normally distributed [11]. The measured log(K(h)) values ranged from −6.64 to 0.98 (0 to 9.65 cm d⁻¹), with an average of −2.26 (0.073 cm d⁻¹). The pF (logarithmic transformation of soil tension in cm of water) values ranged from 0.22 to 4.21, with an average of 2.47.

Figure 2. The soil hydraulic conductivity and tension pairs (a), and soil textural distribution for the data sets (b). Dark orange circles depict the international data set [20] and blue circles represent the Turkish data set [21,28–30]. pF is the logarithmic transformation of soil tension in cm of water and K is the unsaturated hydraulic conductivity.

The second data set (referred to as the Turkish data set herein) was mainly collected from areas surrounding Ankara, Turkey, and consisted of 79 repacked soil samples with 1340 K(h) data pairs that were measured via the HYPROP system [26]. In this data set, the Ks data (pF 0) were measured using the falling head method with the KSAT instrument (Meter Group Inc., Pullman, WA, USA). The K(h) points were measured for each sample with pF ranging from 1.80 to 3.91, with an average of 2.51, and log(K) ranging from −4.75 to 3.27 (0 to 1862 cm d⁻¹), with an average of −1.6 (0.03 cm d⁻¹). Clay was the dominant texture (38 soil samples or 48.1% of the data set), followed by sandy loam (13 samples or 16.5% of the data set). Further details about the laboratory procedures used to develop this data set are available in Haghverdi et al. [21,28–30]. More information about HYPROP's measurement principles is available in Schindler et al. [28].
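The two logarithmic transformations used for the data sets can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for this example; the sketch assumes base-10 logarithms, tension in cm of water, and conductivity in cm d⁻¹, as stated in the text.

```python
import math


def to_pf(tension_cm: float) -> float:
    """Return pF, the base-10 logarithm of soil tension in cm of water."""
    if tension_cm <= 0:
        raise ValueError("tension must be positive to take a logarithm")
    return math.log10(tension_cm)


def to_log_k(k_cm_per_day: float) -> float:
    """Return log10 of hydraulic conductivity given in cm per day."""
    if k_cm_per_day <= 0:
        raise ValueError("conductivity must be positive to take a logarithm")
    return math.log10(k_cm_per_day)


def from_log_k(log_k: float) -> float:
    """Back-transform log10(K) to conductivity in cm per day."""
    return 10.0 ** log_k


if __name__ == "__main__":
    # A tension of 100 cm of water corresponds to pF 2.
    print(to_pf(100.0))
    # The upper end of the Turkish log(K) range, 3.27, back-transforms
    # to roughly 1862 cm per day.
    print(round(from_log_k(3.27)))
```

Working in log space keeps the wide range of conductivities (several orders of magnitude) on a scale where errors near saturation and in the dry range carry comparable weight.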
A statistical analysis conducted in the companion manuscript [27] using Mahalanobis distance [29] revealed that these two data sets were independent and most Turkish samples fall outside the domain of applicability of the international dataset.
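As an illustration of the kind of check referred to here, the following is a minimal sketch of a Mahalanobis distance computed from candidate samples to the centroid of a reference data set. It is not the procedure of the companion manuscript: the function name, the choice of predictors, and the use of a pseudo-inverse are assumptions made for this example, and no threshold for the domain of applicability is implied.

```python
import numpy as np


def mahalanobis_to_reference(reference, candidates):
    """Distance of each candidate row to the centroid of the reference set.

    reference, candidates: 2-D arrays with one row per soil sample and one
    column per predictor (for example sand, silt, clay, BD, SOM).
    """
    reference = np.asarray(reference, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    centroid = reference.mean(axis=0)
    covariance = np.cov(reference, rowvar=False)
    # Sand, silt, and clay sum to 100%, which can make the covariance
    # matrix singular; the pseudo-inverse is used here as a safeguard.
    inverse = np.linalg.pinv(covariance)
    diff = candidates - centroid
    squared = np.einsum("ij,jk,ik->i", diff, inverse, diff)
    return np.sqrt(np.maximum(squared, 0.0))
```

Samples whose distance is large relative to the distances within the reference set would be considered outside its statistical domain; the cut-off itself is a modeling decision not specified in this section.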

Unsaturated Hydraulic Conductivity Calculations
During HYPROP measurements, saturated soil samples (sealed at the base) were placed on a balance. Two tensiometers were positioned such that their tips were at depths of 0.25L and 0.75L, where L is the soil column height in cm (typically 5 cm in laboratory evaporation experiments). The soil surface was open to the ambient atmosphere so that the soil water could evaporate. The mean pF value of the sample was calculated from the average of the tensions measured by the two tensiometers, and the corresponding water content was calculated from the mass change of the soil sample.
The hydraulic conductivity was calculated from the water flow velocity ($q_i$ [cm/d]) between time points $t_{i-1}$ and $t_i$ through a horizontal plane lying exactly midway between the two tensiometer tips:

$$q_i = \frac{1}{2}\,\frac{\Delta V_i}{\Delta t_i\,A}$$

where $\Delta V_i$ is the change in water volume in the whole sample (cm³), $\Delta t_i$ is the time interval between two consecutive measurement points (d), and $A$ is the cross-sectional area (cm²) of the column. The data points for the hydraulic conductivity function were calculated by inverting Darcy's equation as:

$$K(h_i) = \frac{q_i}{\dfrac{\Delta h_i}{\Delta z} - 1}$$

where $h_i$ (cm) is the time- and space-averaged tension, $\Delta h_i$ (cm) is the difference of tensions between the two tensiometer tips (upper minus lower, with tension taken as positive), and $\Delta z$ (cm) is the distance between the tensiometer tips. The calculations assume that moisture tension and water content are distributed linearly through the column; therefore, the arithmetic mean of the tensions at the two points was used. This simplifying assumption was shown to provide accurate results because linearity errors in fluxes and tensions cancel each other out [23]. The effect of hysteresis on water flow and transport is well understood [30]. However, since HYPROP measurements are taken during natural evaporation-based drying of soil samples, only the drying path was considered in this study.
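The flux calculation and the Darcy inversion described above can be sketched for a single time step as follows. This is a minimal sketch under stated assumptions: tension is taken as a positive quantity, upward flux is positive, and the function names and numerical values are hypothetical and used only for illustration.

```python
def evaporation_flux(delta_volume_cm3, delta_t_days, area_cm2):
    """Flux (cm/d) through the plane midway between the tensiometer tips.

    Under the linearity assumption, half of the water lost from the whole
    sample passes through the mid-plane.
    """
    return 0.5 * delta_volume_cm3 / (delta_t_days * area_cm2)


def conductivity_from_darcy(flux_cm_per_day, tension_upper_cm,
                            tension_lower_cm, delta_z_cm):
    """Invert Darcy's equation for K (cm/d) between the two tips.

    Assumed convention: tensions are positive, so the upward driving
    gradient is the tension gradient minus 1 (gravity).
    """
    gradient = (tension_upper_cm - tension_lower_cm) / delta_z_cm - 1.0
    if gradient <= 0:
        raise ValueError("hydraulic gradient too small to resolve K")
    return flux_cm_per_day / gradient


if __name__ == "__main__":
    # Hypothetical step: 2 cm3 lost in 1 day from a 50 cm2 column, with
    # tensions of 300 cm (upper tip) and 250 cm (lower tip), 2.5 cm apart.
    q = evaporation_flux(2.0, 1.0, 50.0)
    k = conductivity_from_darcy(q, 300.0, 250.0, 2.5)
    print(q, k)
```

The guard on the gradient reflects a practical limit of the method: near saturation the tension difference between the tips becomes too small to yield a reliable conductivity value.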

PC NN -PTFs Development
A three-layer feed-forward perceptron model was developed using MATLAB R2019a (MathWorks, 2019). The transfer functions were the "hyperbolic tangent sigmoid" and "linear" for the hidden and the output layers, respectively. The Levenberg-Marquardt algorithm [31] was used for training the models. The maximum number of epochs (an epoch being one complete pass of the training data set through the learning process) was set to 1000, and the best weights were loaded automatically for testing.
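The network structure described above (one hidden layer with a hyperbolic tangent transfer function and a linear output layer) can be sketched as a forward pass in NumPy. This is only an illustration of the architecture, not the MATLAB implementation used in the study; the function and argument names are hypothetical, and Levenberg-Marquardt training is not shown.

```python
import numpy as np


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer perceptron.

    inputs:   (n_rows, n_inputs) array, e.g. soil attributes plus pF
    w_hidden: (n_inputs, n_hidden) weights, b_hidden: (n_hidden,) biases
    w_out:    (n_hidden, 1) weights,        b_out:    (1,) bias
    Returns an (n_rows, 1) array of estimated log(K).
    """
    hidden = np.tanh(inputs @ w_hidden + b_hidden)  # tanh hidden layer
    return hidden @ w_out + b_out                   # linear output layer
```

Because pF enters as an ordinary input, the same trained network can be evaluated at any tension, which is what makes the estimated conductivity curve pseudo continuous.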
Soil samples were randomly partitioned into 5 folds such that 80% of the data were used for the development of the PCNN-PTF models and 20% for testing the models. The bootstrap technique was used on the development set to generate 100 replica data sets, each containing approximately 67% of the data. The rest of the development data (~33%) were used for cross-validation of the NN models. The training process was terminated when the root mean square error (RMSE) of the cross-validation subset began to increase or remained unchanged. To find the optimal topology of the neural network, the number of neurons in the hidden layer was iteratively changed from 1 to 14. This process was repeated five times, leaving aside a different fold as the test set each time, such that all samples in the data set were used for testing the models. The outputs of the 100 PCNN-PTFs with optimum topology were averaged to obtain the hydraulic conductivity estimations.
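The data partitioning described above can be sketched as follows. This is a minimal sketch, assuming that both the 5-fold split and the bootstrap resampling operate on whole soil samples; the function names are hypothetical. With plain resampling with replacement, about 63% of the development samples are drawn at least once on average, which is close to the approximately 67% stated in the text.

```python
import numpy as np


def five_fold_sample_splits(sample_ids, n_folds=5, seed=0):
    """Yield (development_ids, test_ids) so every sample is tested once."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(sample_ids))
    folds = np.array_split(shuffled, n_folds)
    for k in range(n_folds):
        dev = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield dev, folds[k]


def bootstrap_split(dev_ids, rng):
    """Draw a bootstrap replica; samples never drawn form the validation set."""
    drawn = rng.choice(dev_ids, size=len(dev_ids), replace=True)
    validation = np.setdiff1d(dev_ids, drawn)
    return drawn, validation


def ensemble_mean(predictions):
    """Average the outputs of all bootstrap networks, row by row."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)
```

In use, each of the 100 replicas would train one network on `drawn`, stop training based on the error on `validation`, and the 100 test-set outputs would be passed to `ensemble_mean`.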

Modeling Scenarios
We evaluated the accuracy and reliability of the PCNN-PTFs (developed using the international and the Turkish data sets) with four models that combined the input attributes differently: textural constituents (sand, silt, and clay; SSC), BD, and SOM (Table 2). Model 1 included all the input attributes and the logarithmic transformation of soil suction (pF). Model 2 included SSC and pF. Model 3 included SSC, BD, and pF. Model 4 included SSC, SOM, and pF. The log(K (cm/d)) was the output parameter corresponding to the input pF value.

Four data partitioning scenarios, as shown in Table 3, were considered: the international data set was used for training and testing (scenario 1), the Turkish data set was used for training and testing (scenario 2), the international data were used for training and the Turkish data for validation (scenario 3), and a combination of the two data sets was used for training and the Turkish data set for testing (scenario 4).

The accuracy of PTFs was assessed using a randomly selected subset of the development data set that was not used to derive the PTF. The reliability was evaluated based on the performance of PTFs on an independent data set beyond the statistical training limits and the geographical training area of the development data set. For example, we estimated the log(K) of the Turkish soil samples to assess the reliability of the PCNN-PTFs derived using the international data set. The results of the modeling scenarios were assessed to (i) quantify the improvements in international PCNN-PTFs for a specific region after incorporating local samples into the training data set and (ii) determine whether the international PCNN-PTFs trained using the integrated data work as accurately as the local PCNN-PTFs.
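The four input combinations, and the way each measured (pF, log(K)) pair becomes one training row, can be sketched as below. The dictionary keys, field names, and the example soil are hypothetical; only the attribute combinations follow the model definitions given above.

```python
# Input attributes per model, following the definitions in the text.
MODEL_INPUTS = {
    "Model 1": ["sand", "silt", "clay", "bd", "som", "pf"],
    "Model 2": ["sand", "silt", "clay", "pf"],
    "Model 3": ["sand", "silt", "clay", "bd", "pf"],
    "Model 4": ["sand", "silt", "clay", "som", "pf"],
}


def build_rows(soil, measurements, model):
    """Return one (inputs, log_k) row per measured point of a soil sample.

    soil:         dict of basic attributes for one sample
    measurements: iterable of (pF, log10 K) pairs for that sample
    The soil attributes repeat on every row; only pF and log(K) change.
    """
    names = MODEL_INPUTS[model]
    rows = []
    for pf, log_k in measurements:
        record = dict(soil, pf=pf)
        rows.append(([record[name] for name in names], log_k))
    return rows


if __name__ == "__main__":
    # Hypothetical silt loam sample with two measured points.
    soil = {"sand": 20.0, "silt": 60.0, "clay": 20.0, "bd": 1.4, "som": 2.0}
    for row in build_rows(soil, [(2.0, -1.0), (3.0, -3.0)], "Model 3"):
        print(row)
```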

Model Evaluation
The root mean square error (RMSE, Equation (1)), mean absolute error (MAE, Equation (2)), mean bias error (MBE, Equation (3)), and correlation coefficient (R, Equation (4)) were calculated for the test data to evaluate the performance of PCNN-PTFs:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(E_i - M_i\right)^2} \qquad (1)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|E_i - M_i\right| \qquad (2)$$

$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(E_i - M_i\right) \qquad (3)$$

$$R = \frac{\sum_{i=1}^{n}\left(E_i - \bar{E}\right)\left(M_i - \bar{M}\right)}{\sqrt{\sum_{i=1}^{n}\left(E_i - \bar{E}\right)^2\,\sum_{i=1}^{n}\left(M_i - \bar{M}\right)^2}} \qquad (4)$$

where $E$ and $M$ are the estimated and measured log(K), respectively; $\bar{E}$ and $\bar{M}$ are the mean estimated and measured log(K), respectively; and $n$ is the total number of measured hydraulic conductivity data points for each model. In addition, the error statistics were calculated separately for dominant soil textures at the wet (pF ≤ 2), intermediate (2 < pF ≤ 3), and dry (pF > 3) ranges of the SHCC.

Figure 3 shows the scatterplots of measured versus estimated log(K) values for the PCNN-PTFs developed in this study using different combinations of input predictors. All models showed acceptable performance, demonstrated by the well-scattered data around the 1:1 reference line, except for the Ks estimations in scenario 3 (training: the international data set, test: the Turkish data set).

Figure 3 as well, where data points are well scattered but located mainly below the 1:1 line.
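The four error statistics and the three tension ranges defined in this section can be sketched in NumPy as follows. The function names are hypothetical; the sign convention for MBE (estimated minus measured) is chosen so that negative values indicate underestimation, consistent with how the results are described.

```python
import numpy as np


def error_statistics(estimated, measured):
    """Return RMSE, MAE, MBE, and R for paired log(K) values."""
    e = np.asarray(estimated, dtype=float)
    m = np.asarray(measured, dtype=float)
    diff = e - m  # negative values mean the model underestimates
    return {
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "MAE": float(np.mean(np.abs(diff))),
        "MBE": float(np.mean(diff)),
        "R": float(np.corrcoef(e, m)[0, 1]),
    }


def tension_range(pf):
    """Classify a pF value into the wet, intermediate, or dry range."""
    if pf <= 2:
        return "wet"
    if pf <= 3:
        return "intermediate"
    return "dry"
```

Grouping the test rows by `tension_range` (or by textural class) before calling `error_statistics` gives the per-range and per-texture breakdowns reported in the tables.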

Importance of the Input Predictors
For scenario 4 (training: combined international and Turkish data sets, test: the Turkish data set), the best performance was observed for Model 3, with an RMSE of 0.453 and an MAE of 0.308. Model 1 had a similar performance. Slight underestimation of log(K) was observed, with MBE ranging from −0.335 for Model 2 to −0.139 for Model 1. Correlation between observed and estimated log(K) was high and similar among all models, with R values ranging from 0.906 to 0.947.
No distinct relationship was observed between RMSE and either BD or SOM, except for the Turkish clay soils, where RMSE declined as BD increased (Figure 4).

Performance across Soil Textures
The following analysis was conducted only with Model 1, the best performing PTF in the test phase. Table 5 shows the performance of the PCNN-PTF models for the dominant soil textures, representing about 89% and 92% of the international and Turkish data sets, respectively. When the international data set was used as the training set (scenario 1), clay loam had higher RMSE and MAE values than other soil textures. RMSE values ranged from 0.517 to 1.124, MAE values ranged from 0.342 to 0.748, and MBE values ranged from 0.026 to 0.288 for all textures. Furthermore, the model showed a tendency to overestimate log(K) for all soil textures, except loam, where underestimation of log(K) was observed. The correlation coefficient (R) values varied from 0.603 for clay loam to 0.881 for silt loam. When only Turkish data were used for training (scenario 2), RMSE and MAE values varied from 0.206 to 0.395 and 0.146 to 0.312, respectively. MBE values ranged from −0.096 for sandy loam to 0.018 for clay loam, and no substantial underestimation or overestimation of log(K) was observed. The agreement between the measured and estimated log(K) values was very high, indicated by high and similar R values (between 0.926 and 0.982) for all the models within each soil texture.
When the international data set was used for training and the Turkish data set for validation (scenario 3), RMSE and MAE values varied from 0.964 to 1.444 and 0.863 to 1.377, respectively. Underestimation of log(K) was observed for all the soil textures with MBE values ranging from −1.370 to −0.860. Loam had the highest error relative to other soil textures, while clay had the lowest. High and similar correlation coefficient values (between 0.917 and 0.978) were observed for all the models and soil textures.
When the Turkish data set was used as the test set and a combination of the international and Turkish data sets was used for training (scenario 4), the RMSE and MAE values varied from 0.230 to 0.745 and 0.173 to 0.683, respectively. Loam and sandy loam had higher RMSE and MAE values, while the errors for clay loam and clay were similar to those obtained when only the Turkish data were used for training (scenario 2). The MBE values ranged from −0.614 to −0.013, showing slight underestimation of log(K) for most soil textures; for clay and clay loam, the underestimation was not substantial. The agreement between the estimated and observed log(K) was high, as depicted by the high R values (ranging from 0.915 to 0.985) for all the models.

Table 6 shows the performance of different PCNN-PTFs over three moisture ranges of the SHCC for the four data partitioning scenarios evaluated in this study. When the international data set was used for the training and testing of the models (scenario 1), the RMSE of Model 1 varied from 0.548 in the wet range to 0.603 in the dry range. The MAE values varied from 0.420 in the wet range to 0.440 in the intermediate range of the SHCC. The MBE values varied from −0.060 in the wet range to 0.140 in the dry range. The R values ranged from 0.509 in the wet range to 0.640 in the intermediate range. When only Turkish data were used for the training and testing of the PCNN-PTF models (scenario 2), the lowest error was observed in the intermediate range (

Accuracy and Reliability of the Developed PTFs
In our study, the best international hydraulic conductivity PCNN-PTF showed an accuracy (same data set for development and test) of RMSE = 0.520 and a reliability (independent data sets for development and test) of RMSE = 1.097. The local PCNN-PTF developed and tested using the Turkish data set performed even better, as expected, with an RMSE of 0.317. Parasuraman et al. (2006) stated that better performance in estimating Ks is observed when an NN model is trained on even a small set of relevant data rather than on a larger general data set. Our study emphasizes that a local data set, when available, should be included in the training of PCNN-PTFs for a more accurate estimation of the SHCC.
Schaap and Leij [6] reported RMSE values ranging from 1.12 to 1.76 for the calibration subset and from 1.18 to 1.77 for an independent validation data set for their hydraulic conductivity PTFs, indicating lower accuracy and reliability than the PCNN-PTF developed in this study. Børgesen et al. [7] reported a reasonable accuracy for their hydraulic conductivity PTFs, with RMSE ranging from 0.598 to 1.196, yet most of the models showed underestimation. The above-mentioned studies used the typical procedure to estimate the SHCC, which relies on estimated or measured Ks values and parametric SWRC PTFs using an NN approach combined with bootstrap. Therefore, the PCNN-PTF approach developed and tested for the first time in this study could be used as an alternative high-performance approach to estimate the SHCC.
Large data sets used to develop international PTFs typically consist of smaller data sets that employ different techniques to measure soil hydraulic properties. The commonly used devices have discrepancies in Ks measurements because of factors such as sample size, soil conditions, flow geometry, and installation procedures [32]. The same is true for the various devices used for unsaturated hydraulic conductivity measurements, such as the steady-state pressure membrane method, tension disc infiltrometer, hot-air methods, and the widely used multistep outflow method [33–35]. We recommend using HYPROP data sets for developing hydraulic conductivity PCNN-PTFs. The PCNN-PTF takes advantage of the high-resolution measured data provided by the HYPROP system to learn the shape of the SHCC directly from the actual measured data points, unlike the parametric PTFs, where the relationships between the parameters and their predictors have to be known a priori. Furthermore, using only one method (evaporation experiment) for obtaining hydraulic conductivity data in the laboratory is expected to improve the performance of the PCNN-PTFs by eliminating the variance related to employing multiple measurement techniques.

Importance of Input Variables
Studies have shown that the systematic variation in Ks is explained by properties like soil texture, porosity, SOM, and BD [36,37]. According to Zhang and Schaap [38], adding BD and SOM as input predictors improved the performance of PTFs in most studies estimating Ks. Moosavi and Sepaskhah [12] observed that the combination of inputs SSC, BD, SOM (Model 1 in this study) and SSC (Model 2 in this study) produced the best accuracy in estimating unsaturated hydraulic conductivity. In our study, considering SOM and BD as extra input attributes in addition to soil texture did not improve the accuracy of the international and local PTFs (scenarios 1 and 2). However, adding BD and SOM as input attributes noticeably enhanced the performance of the PCNN-PTF in scenarios 3 and 4. This result differs from our observation in the companion paper (Singh et al. [27]), where adding SOM as an extra input did not improve the performance of the water retention PCNN-PTFs.
Except for scenario 1, BD was a more effective additional attribute than SOM in all modeling scenarios. Improvements in the estimation of log(K) were also observed in other studies when BD was included as an additional PTF input predictor [6,39]. BD, however, provides only limited information about soil structure, as different preferential flow pathways may result in substantially different soil hydraulic conductivities [40]. Hao et al. [41] reported that Ks is influenced primarily by porosity and water-stable macroaggregates, which are not among typical PTF inputs. Further studies are needed to determine the impact of considering additional soil structural input parameters on the performance of hydraulic conductivity PCNN-PTFs.

Performance across Textural Classes and Tension Ranges
The PCNN-PTFs showed better performance for fine-textured (clay and clay loam) than for more coarse-textured (loam and sandy loam) Turkish soils (scenarios 2, 3, and 4), which is attributed to a relatively higher number of fine-textured Turkish soil samples. We observed an overall improvement in the performance of PTFs with the increasing number of data points per soil textural class (Figure 5). These results concur with the results we reported in the companion paper (Singh et al. [27]), where high performance was observed for the dominant soil textures for the SWRC estimations. Since PCNN-PTFs are machine learning-based models, their performance is expected to improve as more data become available for training.

The performance of PCNN-PTFs was consistent across tension ranges when developed and tested using the international data set, which concurs with the results observed in the companion paper for soil water retention estimations [27]. Moosavi and Sepaskhah [12] reported a relatively lower accuracy of the NN-based PTFs in estimating hydraulic conductivity at saturated and/or near-saturated tensions. We observed a somewhat higher error in the wet tension region (pF ≤ 2) for the Turkish data set, primarily when PCNN-PTFs were developed using the international data set. In our companion paper, however, the performance of water retention PCNN-PTFs was similar in the three tension regions. The relatively higher error in the wet range in this study arose because the Turkish data set contained only Ks data in the wet part (measured via the KSAT instrument), while Ks measurements were not available for the international data set.

Conclusions
We developed and evaluated PCNN-PTFs to estimate the SHCC measured using the evaporation experiments, mainly via the HYPROP system. The PCNN-PTF approach showed promising performance for continuous hydraulic conductivity estimation over a wide range of soil tensions. The HYPROP system offers the advantage of producing high-resolution soil hydraulic conductivity data over a wide range of soil tensions (pF = 1.5 to 3.5), which is critical for developing robust PCNN-PTF models since this approach learns the shape of the SHCC directly from measured data. The KSAT instrument can be employed to measure the saturated hydraulic conductivity (Ks), which can be used along with HYPROP data. The water retention PCNN-PTFs developed and validated in the first part of this study (Singh et al. [27]) also performed very well. Consequently, we recommend the PCNN-PTF approach to derive the next generation of water retention and hydraulic conductivity models using high-resolution data measured via the HYPROP system.