Saturated Hydraulic Conductivity Estimation Using Artificial Neural Networks

In the present work, we construct several artificial neural networks (varying the input data) to calculate the saturated hydraulic conductivity (K_S) using a database with 900 measured samples obtained from Irrigation District 023, in San Juan del Rio, Queretaro, Mexico. All of them were constructed using two hidden layers, a back-propagation algorithm for the learning process, and a logistic function as the nonlinear transfer function. In order to explore different neuron arrangements within the hidden layers, we applied the bootstrap technique to each neural network and selected the one with the lowest Root Mean Square Error (RMSE). We also compared these results with pedotransfer functions and other neural networks from the literature. Our artificial neural networks achieved RMSE values from 0.0459 to 0.0413 and R² values from 0.9725 to 0.9780, which are in good agreement with other works. We also found that reducing the number of input variables yielded better results.


Introduction
Soil water movement is important in several fields, such as irrigation, drainage, hydrology, and agriculture [1]. Among all the measurable quantities in soils, one of the most important is the saturated hydraulic conductivity (K_S), defined as the soil's ability to transmit water throughout the saturated zone [2,3]; it is highly correlated with the optimization of the flow rate applied to the border or furrow in gravity irrigation [2][3][4][5][6]. Although this property is easily measured in the laboratory or in the field, such measurements apply only at a small scale, while most of the time the property is required at a large scale [5,6]. This is inconvenient because all these tests and measurements are time-consuming, impractical, and not cost-effective [7,8].
In order to solve the inconveniences mentioned above, a great number of studies about pedotransfer functions (PTFs) have been published [7][8][9][10]. These mathematical models allow us to estimate K_S from soil characteristics such as texture, field capacity, permanent wilting point, bulk density, porosity, and organic matter, among others [7][8][9][10].
The robustness of a model is linked to the number of physical parameters used to calculate the saturated hydraulic conductivity; the more parameters, the more accurate the prediction. However, as mentioned before, depending on the measurements required, PTFs can be difficult to apply because of the economic resources and the time it takes to measure all the variables, which limits this kind of function and its predictive capacity. Besides, some works have been questioned because the soils to which they are applied differ from the soils used for their development, such as [11].
In recent years, another alternative has been explored: Artificial Neural Networks (ANNs), which have become a common tool used as a special class of PTFs, for example, [12,13]. ANNs are artificial-intelligence models that simulate the behavior of the human brain; their structure consists of a number of interconnected elements called neurons, which are logically arranged in layers known as input, output, and hidden (see, e.g., [12] and references therein). Each neuron connects to all the neurons in the next layer via weighted connections; a typical structure is shown in Figure 1. Until now, there has been no analytical way to obtain the ideal network structure (number of hidden layers and neurons inside them) as a function of the complexity of the problem; the structure must be selected by a trial-and-error process. ANNs with one or two hidden layers and an adequate number of hidden neurons are found to be sufficient for most problems (e.g., [12,14]). There are also several works studying the ideal number of neurons in the hidden layers [15,16], but these methods provide only general guidelines for selecting the number of neurons.
The ANNs' name comes not only from their structure, but also because they "learn". The most common learning algorithm used in ANNs is back-propagation (e.g., [17,18]). Each neuron in a layer receives weighted inputs from the neurons in the previous layer, processes them, and transmits its output to the neurons in the next layer through links. Each link is assigned a weight, which is simply a numerical estimate of the strength of the connection. The weighted sum of inputs in a neuron is converted into the neuron's output according to a nonlinear transfer function (the most commonly used is the sigmoid function). The ANN then modifies the weights of the neurons in response to the errors between the actual and target output values using gradient descent [19,20], applied to the sum of the squared errors over all training patterns, until the mean squared error over all training patterns is minimal or within the tolerance specified for the problem.
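The forward pass and the gradient-descent weight update described above can be sketched for a single sigmoid neuron. This is a minimal illustrative Python sketch (our implementation uses R's neuralnet package, and the function names here are only for illustration):

```python
import math

def sigmoid(z):
    # Logistic (sigmoid) transfer function used to squash the weighted sum.
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    # Weighted sum of inputs followed by the nonlinear transfer function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def update_weights(inputs, weights, bias, target, lr=0.5):
    # One gradient-descent step on the squared error E = (y - target)^2 / 2.
    y = neuron_output(inputs, weights, bias)
    # dE/dz = (y - target) * y * (1 - y) for the logistic function.
    delta = (y - target) * y * (1.0 - y)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias
```

In a full network, the same update is propagated backwards layer by layer, which is where the name back-propagation comes from.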
In this work, we use ANNs to obtain K_S in Irrigation District 023, located in Queretaro, Mexico, using a sample of 900 plots. The sample, ANN configurations, and validation tests are described in Section 2. Finally, in Section 3, we show the results, a comparison between the several configurations obtained in this work, and comparisons of our results with PTFs and other ANNs in the literature.

Study Area
The Irrigation District 023 is located in Queretaro, Mexico, at 20°18′ to 20°34′ N and 99°56′ to 100°12′ W, at an altitude of 1892 m a.s.l., and it has an area of 11,048 ha. It includes the municipalities of San Juan del Rio and Pedro Escobedo (Figure 2). Its predominant climate is semiarid with summer rains, with an average annual precipitation of 599 mm and an average annual temperature of 20 °C [21]. The water is conducted through open channels. The main channels are lined with concrete, but all the lateral channels that carry water to the plots are unlined. The plots are separated, in some cases, by trees, unlined channels, drains, or roads [22].

Soil Database
An extensive and detailed description of the database can be found in [23]. In summary, the database used in this study was developed from samplings in 900 plots at Irrigation District 023, San Juan del Rio, Queretaro. These samples were sent to the laboratory to obtain the following parameters: soil texture by the Bouyoucos hydrometer; bulk density (ρ_a) by the known-volume cylinder method; moisture content at saturation (θ_S), field capacity (FC), and permanent wilting point (PWP) by the pressure membrane pot method; and the saturated hydraulic conductivity by the variable-head permeameter method. From these measurements, we took seven and considered them throughout the paper as the input data: percentage of clay, sand, and silt; bulk density; permanent wilting point; moisture content at saturation; and field capacity (Table 1).

The ANNs' Setup
As mentioned in Section 1, there is no analytical way to define the ANN structure, but based on several similar works (e.g., [12,14,16]), we noticed that such ANNs have only one or two hidden layers. With this in mind, we tested several structures for our ANNs with two hidden layers, plus an additional one with three hidden layers as a test. Details about the ANN settings are described below. We began with all seven input variables and tested several configurations as follows: we started with two neurons in the first hidden layer and two in the second; we then varied the number of neurons in the first layer from two to ten, leaving the second layer constant. Next, the number of neurons in the second layer was increased to three, and we again varied the number of neurons in the first layer from two to ten, and so on, until the second layer had also varied from two to ten, just like the first one. The input layer contains the seven input variables mentioned before, and the output layer contains the predicted K_S value.
Then, we decreased the number of variables in the input layer by one and repeated the process explained above. The choice of which parameter to remove is based on the importance plot (details in Section 2.4), except for the last configuration, where we removed θ_S instead of the percentage of silt, because θ_S is closely related to the clay percentage. We kept removing input variables until only three measurements remained. For each configuration, we varied the number of neurons in the hidden layers from two to ten.
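The structure search above enumerates every pairing of hidden-layer sizes. A minimal sketch of that enumeration (illustrative Python; the actual search was run through R):

```python
def hidden_layer_grid(lo=2, hi=10):
    # All two-hidden-layer configurations: the number of neurons in each
    # hidden layer varies independently from lo to hi, inclusive.
    # The second layer is held fixed while the first sweeps lo..hi,
    # then the second layer is incremented, matching the search order above.
    return [(n1, n2) for n2 in range(lo, hi + 1) for n1 in range(lo, hi + 1)]

grid = hidden_layer_grid()
# 9 x 9 = 81 candidate structures per set of input variables
```

With five input-variable sets (7 down to 3 variables), this amounts to 405 trained configurations before cross-validation.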
The ANN was programmed using the neuralnet package [24] and the caret package [25], both of them provided by the R software [26].
The neuralnet package trains neural networks using backpropagation, resilient backpropagation (RPROP) with [27] or without weight backtracking [28], or the modified globally convergent version (GRPROP) by [29]. The function allows flexible settings through a custom choice of error and activation functions. The caret package (short for Classification And REgression Training) is a set of functions that attempts to streamline the process of creating predictive models. There are many different modeling functions in R, some with different syntax for model training and/or prediction. The package started off as a way to provide a uniform interface to these functions, as well as a way to standardize common tasks (such as parameter tuning and variable importance). Specifically, we used the train function of this package to perform the cross-validation process.
Furthermore, the calculation of generalized weights [30] is implemented. In this work, we use RPROP as an algorithm type to calculate the neural network, the sum of squared errors as a differentiable function for the calculation of the error and a logistic differentiable function for smoothing the result of the cross-product of the covariate or neurons and the weights.
Using the RMSE measurement, we selected the optimal ANN structure in each case and kept it as the ideal one. The Mean Absolute Error (MAE) was calculated as:

MAE = (1/n) ∑_{i=1}^{n} |y_i − x_i|,

where y_i are the measured values, x_i are the predicted values, and n is the total number of measurements. We also calculated the RMSE as:

RMSE = √[ (1/n) ∑_{i=1}^{n} (y_i − x_i)² ].
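The two error measurements can be written directly from their definitions. A short self-contained Python sketch (the paper computes these through caret in R):

```python
import math

def mae(measured, predicted):
    # Mean Absolute Error: average magnitude of the residuals y_i - x_i.
    n = len(measured)
    return sum(abs(y - x) for y, x in zip(measured, predicted)) / n

def rmse(measured, predicted):
    # Root Mean Square Error: penalizes large residuals more than MAE.
    n = len(measured)
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(measured, predicted)) / n)
```

Because RMSE squares the residuals before averaging, a configuration with a few large errors scores worse on RMSE than on MAE, which is why RMSE was used as the selection criterion.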

Cross-Validation
In order to validate our ANN results, we performed a cross-validation analysis using the train function from the caret package. This function sets up a grid of tuning parameters for a number of classification and regression routines, fits each model, and calculates a resampling-based performance measure. In this case, we used the parameters optimized for neuralnet, and we generated 25 bootstrap replications for each ANN configuration. Finally, for the prediction of new samples, we used the predict function.
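The bootstrap performance measure amounts to resampling the (measured, predicted) pairs with replacement and recomputing the error on each replicate. A minimal Python sketch of that idea, assuming 25 replicates as in caret's default bootstrap (the function name is illustrative):

```python
import math
import random

def bootstrap_rmse(measured, predicted, n_reps=25, seed=0):
    # Resample (measured, predicted) pairs with replacement and compute
    # the RMSE on each replicate; the spread of the returned scores
    # indicates how stable the error estimate is.
    rng = random.Random(seed)
    pairs = list(zip(measured, predicted))
    scores = []
    for _ in range(n_reps):
        sample = [rng.choice(pairs) for _ in pairs]
        mse = sum((y - x) ** 2 for y, x in sample) / len(sample)
        scores.append(math.sqrt(mse))
    return scores
```

Narrow score distributions across replicates (as in Figure 5) indicate that the error estimate does not depend strongly on which samples happen to be drawn.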
The train function returned several results: the ideal ANN configuration; the best RMSE, MAE, and R² values for each tested configuration; an RMSE matrix; and an importance plot. This last plot indicates which parameter contributed the most to the final K_S approximation. Based on it, we decided which input variable would be removed in the next run, with the exception already mentioned above.

Results and Discussion
As mentioned in Section 2, we generated several ANNs covering all combinations of two to ten neurons in each hidden layer. Figure 3 shows the results of these tests: the x axis gives the number of neurons in the first hidden layer, the y axis gives the RMSE value obtained from the 25 bootstrap replications, and each color represents the number of neurons in the second hidden layer. The selection of the best configuration is based on the smallest RMSE value in these plots. Additionally, from top to bottom and from left to right, we show the variation in the input data. Another result is shown in Figure 4. This plot is a 10 × 10 matrix in which the color represents the RMSE value for each configuration, the rows are the number of neurons in the second hidden layer, and the columns are the number of neurons in the first hidden layer; each plot differs from the others in the number of input variables, in the same way as in Figure 3. In general, we can see that a small number of neurons in the first layer yields higher RMSE values. Another important result is presented in Figure 5, where we show density plots (varying the input data from top to bottom) for the RMSE, MAE, and R² measurements derived from the bootstrap analysis. These plots show small variations for each measurement, which indicates that the results are not highly dependent on the ANN configuration. Finally, in Figure 6, we show the importance plots described in the cross-validation section; these plots helped us decide which parameter to keep or eliminate in each run with a different number of input variables.
In order to explore another possibility and obtain more reliable results, we ran a test increasing the number of hidden layers to three. We applied the same process explained previously to this new configuration and obtained an improvement of ∼9% in the RMSE values, but at 15 times the computation time. This was the reason for dismissing these configurations.
In Table 2 we present the three best configurations of each run, listing the number of input variables, the ANN structure, and the RMSE, MAE, and R² measurements.
Following the results, we present Figure 7, where we compare the measured K_S with the K_S predicted by the ANNs for all tested configurations. The dotted line is the 1:1 relation, and the R² is obtained from the train function. The plotted values were obtained by applying the best ANN configuration for each run (Figure 7). The histograms in the top-left corner show the residuals (ΔK_S), i.e., the difference between the predicted and measured values.
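The quantities plotted in Figure 7 follow standard definitions. A short illustrative Python sketch of both (the paper obtains R² from caret's train function):

```python
def r_squared(measured, predicted):
    # Coefficient of determination: 1 minus the ratio of residual variance
    # to the total variance of the measured values.
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - x) ** 2 for y, x in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot

def residuals(measured, predicted):
    # Residuals (predicted minus measured), as histogrammed in Figure 7.
    return [x - y for x, y in zip(predicted, measured)]
```

A perfect predictor lies exactly on the 1:1 line, giving R² = 1 and residuals of zero.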
Recall that to obtain the ideal neuron configuration, we applied the train function, which allowed us to obtain the RMSE, MAE, and R² values for each arrangement of neurons in the hidden layers. The only variation was the number of input variables (ranging from 7 to 3). Therefore, it was possible to obtain five different boxplots for the RMSE, whose only difference is the aforementioned number of input variables. This is shown in Figure 8, where the RMSE tends to fall as the number of input variables decreases. In Table 3 we show results obtained from the literature with PTFs or ANNs and compare them with this work.

Table 3. Comparison between several works for obtaining K_S.

Work                      RMSE    R²      Type
This work                 0.0413  0.9780  ANN
Tamari et al. [31]        0.0707  NA      ANN
Brakensiek et al. [9]     0.1370  0.9953  PTF
Erzin et al. [12]         0.1700  0.9970  ANN
Saxton et al. [32]        0.1895  0.9915  PTF
Parasuraman et al. [33]   0.1900  NA      ANN
Trejo-Alonso et al. [23]  0.1983  0.9901  PTF
Cosby et al. [34]         0.4325  0.9546  PTF
Ahuja et al. [35]         0.6498  0.8910  PTF
Schaap & Leij [36]        0.7130  NA      ANN
Vereecken et al. [37]     0.7143  0.9307  PTF
Minasny et al. [38]       0.7330  NA      ANN
Ferrer-Julià et al. [39]  1.3018  0.4083  PTF
Merdun et al. [40]        3.5110  0.5240  ANN

The RMSE obtained in this work is at least 35% better than the value reported by [31], and we obtained the fifth-best value for R². Another advantage of an ANN is that neither an initial shape for the function relating the variables nor a principal component analysis is needed. Additionally, as seen in Figure 3 and Table 2, the results are largely independent of the ANN structure: the RMSE values for different configurations are very similar (the largest difference is ∼10%), the differences in R² are ∼5%, and those in MAE are ∼11%. Besides, Figure 5 presents an almost Gaussian distribution for these three statistical measurements, consistent with an unbiased result. In the same figure, we also see that the density distributions become narrower as the number of input variables decreases. This tells us that, in contrast with the PTF models, we found a tendency for the prediction of K_S to become more accurate with fewer input variables, a tendency also shown in Figure 8. This result can be explained as follows. Based on the Principal Component Analysis of [23], we noted that the variables contributing the most to the sample were K_S, the percentage of clay, θ_S, PWP, and FC, which is supported by the importance plots (Figure 6).
Additionally, the 900 analyzed samples covered 10 of the 12 existing soil types (according to the USDA Textural Soil Classification) and showed that the infiltration rate depends directly on the percentage of clay and on ρ_a.

Conclusions
In this work, we developed five Artificial Neural Networks to calculate the saturated hydraulic conductivity based on the sample used by [23]. All networks consist of one input layer, two hidden layers, and one output layer. We also tested a network with three hidden layers, but it produced only slightly better results. We took 75% of the sample for training and 25% for validation. We tested all possible combinations of the number of neurons in each hidden layer, which varied from 2 to 10 neurons per layer. Finally, we selected the best number of neurons in each layer based on the RMSE measurements obtained from a cross-validation analysis.
The results show that, compared with other works, we obtain better or similar RMSE and R² measurements and similar configurations for our ANNs. We can also say that if the necessary resources are available to obtain a large number of field data, it is worth developing both a PTF study and an ANN study in order to compare the results of each process and choose the best option between them. The choice should be based not only on the RMSE or R² measurements, but also on the desired application (a statistical study of soil properties or prediction for irrigation purposes). A more detailed study to define an exact range of the amount of data needed for a reliable artificial neural network should be carried out, but this is beyond the objective of this work. Besides, we have to be careful about the characteristics of the sample from which the models are derived. In our case, we analyzed 10 of the 12 soil types, in which bulk density and the percentage of clay became more important parameters than the others. This makes our models more reliable for almost any type of soil.
The coupling of the Saint Venant and Richards equations is the complete mathematical model for modeling gravity irrigation [41]. However, its use requires detailed information on the physical properties of the soil, as well as a series of field and laboratory experiments that can be expensive [42,43]. In this way, the results used in this article, combined with some rapid field and laboratory tests, can be an excellent alternative to reduce costs and the time used to obtain that information.
Finally, the application of artificial neural networks has been demonstrated to successfully solve classification and prediction problems, probably because of the nonlinear relations between the variables. The calculation of the saturated hydraulic conductivity in this work shows that we need only three variables to predict new values; however, the soil properties are crucial for the correct application of these models, in contrast with the ANN configuration, which has been shown to play a minor role in the final results.