Parameter Estimation for Soil Water Retention Curve Using the Salp Swarm Algorithm

: This paper employs an optimization algorithm called the salp swarm algorithm (SSA) for the parameter estimation of the soil water retention curve model. The SSA simulates the behavior of searching for food of the salp swarm and manages to ﬁnd the optimal solutions for optimization problems. In this paper, parameter estimation of the van Genuchten model based on nine soil samples, covering eight soil textures, is conducted. The optimization problem that minimizes the difference between the measured and the estimated water content is formulated, and the SSA is applied to solve this problem. To validate the competitive advantage of the SSA, the experimental results are compared with Particle Swarm Optimization algorithm, the Differential Evolution algorithm and the RETC program, which indicates that SSA performs better than the three methods.


Introduction
Soil water retention is a key soil property, playing a significant role in many applications in the fields of irrigation, hydrology, geotechnical engineering, and soil science in general [1].The soil water retention curve (SWRC), which gives the relation between the amount of water contained in soil and the soil water potential, is widely used to estimate unsaturated soil properties [2].The method of measuring the SWRC directly can be time-consuming, and improper measurement may lead to greater errors.Thus, mathematical equations to approximate the general form of the SWRC have been suggested, and several different analytical models, including Farrell-Larson [3], Brooks-Corey [4], Fredlund-Xing [3], Gardner [5], van Genuchten [6], Campbell [7], and Russo [8,9] models, have been proposed.These models usually contain several parameters that need to be identified based on collected experimental data.An optimization problem that minimizes the difference between the measured and the estimated water content are formulated to determine the parameters.In order to address the optimization problem, nonlinear least-square (NLS) [6] methods and heuristic search methods [10] are usually considered.
NLS methods, a form of least squares analysis, are used to fit a set of m observations with a model that is non-linear parameterized by n unknown parameters (m > n) [11,12].The basis of the NLS method is to approximate the model by a linear function and to refine the parameters by successive iterations.The RETC algorithm is a representative NLS method [13] and is widely used in parameter estimation of SWRC.However, the estimation result of RETC cannot be guaranteed to be a globally optimal solution, which requires parameter initialization with prior knowledge of specified soil types.On the contrary, the heuristic search methods do not need prior knowledge to initialize the model parameters, which contributes to a performance that is better than that of the NLS methods in the parameter estimation of SWRC.However, these heuristic search methods require some specified parameters that could have effects on the final estimation solution, and the adjustment of these parameters may be also time-consuming [14], such as the inertia weight, and learning factors in particle swarm optimization (PSO) [15].
There are many related studies on the parameter estimation of SWRC.Hosseini et al. [2] used the van Genuchten and Fredlund-Xing equations for the estimation of the SWRC, which takes the plasticity index and fine content into consideration.Xing et al. [16] proposed a vertical infiltration method to estimate parameters α and n and further develop the SWRC and establish the relationships describing the cumulative infiltration and infiltration rate with the depth of the wetting front.Oh et al. [17] modified van Genuchten's function for SWRC within the range of low matric suction to improve the Mualem hydraulic conductivity (HC), and van Genuchten-Mualem hydraulic conductivity was then modified to integrate the proposed SWRC for each interval decomposed by a tangential curve.The modified model was tested on Korean weathered soils and accurately predicted the unsaturated behavior of the HC functions.Wang et al. [18] proposed a method based on the Jaya algorithm to estimate the parameters of the van Genuchten model via four different soil samples, and the results showed the superiority of the proposed method.Matula et al. [19] applied Woesten's continuous pedotransfer functions to 140 SWRCs measured in a selected location, and the results showed that the general equation of Woesten's pedotransfer functions were not very suitable to estimate the SWRC for the location studied.Maggi [20] used a method based on the differential evolution algorithm for the determination of the characteristic parameters of several water retention models from the experimental data, and the method could find the optimal model parameters without any prior information.Chen et al. [21] used the Monte-Carlo method to analyze the sensitivity of the parameters and uncertainty of the van Genuchten model to obtain the key parameters and a posteriori parameter distribution to guide the parameter identification, and a new type of intelligent algorithm-difference search algorithm was then employed to identify these parameters.Nascimento et al. [22] estimated the parameters of the van Genuchten model using the inverse modeling function of Hydrus-1D, and the results revealed that the Hydrus-1D can simulate well the behavior of matric potential and moisture over time if the field data is fed.
In order to avoid the additional, unnecessary work caused by adjusting these specified parameter in the SWRC parameter estimation, the SSA was employed to determine these parameters.Taking advantage of the lack of algorithm-specified parameters in SSA [23], the parameter estimation of the SWRC based on nine different soil samples, covering eight soil textures, is performed and the van Genuchten Model is selected as the SWRC model.To verify the efficient performance of this algorithm, the estimation solution is compared with the PSO algorithm, the differential evolution (DE) algorithm, and the RETC program.

The Van Genuchten Model
The van Genuchten Model was proposed by van Genuchten in 1980 [6]. Due to the superiorities of simple form and high fitting ability, the van Genuchten Model has gained popularity and is widely used in research on soil water retention.The van Genuchten Model depicts the relation of the water contained ϕ(h)(cm 3 cm −3 ) and soil water potential h(cm −1 ).The formula of the van Genuchten Model is shown in (1): where ϕ r is the residual water content, ϕ s represents the saturated water content, α is a parameter given by experience, n is the shape parameter of the curve, and m = 1 − 1/n.
Referring to the optimization problem formulated, the sum of squared errors (SSE) is utilized as the metric in order to evaluate the difference between the measured and the estimated water content.The SSE is defined as follows: where the ϕ(h k ) r is the measured data, while the ϕ(h k ) e is the estimated value corresponding to the h k , and N is the total number of measurements in each soil sample.

Salp Swarm Algorithm
The salp swarm algorithm was proposed by Seyedali Mirjalili in 2017 [23] and is used in many fields.For example, Sayed et al. [24] used an SSA-based algorithm for feature selection, and the results indicate that the algorithm has the capability of finding an optimal feature subset, which maximizes the classification accuracy and minimizes the number of selected features [24,25].It can also used to adjust the hyper-parameters of neural networks and other machine learning models [26][27][28].
The main idea of SSA is inspired by the predatory behavior of salp, a kind of creature in the ocean.The salps often form a swarm called a salp chain and move forward to the food resource, which has not been able to be explained clearly until now.In SSA, the population is divided into two groups, leaders and followers, according to the position in the chain.The leader is in the front of the chain and the followers follow it.For a specified model, assume that n variables need to be estimated, x denotes the position of a salp, and y defines the food source, which indicates the target of the swarm in the search space.The leader salp updates its position by (3) in the search process: where the x 1 i is the position of the first salp in the ith dimension, and y i is the food position in the ith dimension.lb i and ub i represent the lower bound and the upper bound of the ith dimension, and r 1 , r 2 , r 3 are three random numbers.
Among the three random numbers, r 1 occupies the dominant position because it balances exploration and exploitation during the entire search process, and it is defined as where L is the maximum iterations, l is the current iteration, and r 2 , r 3 are random numbers between [0,1].For the followers, the following equation is used to update their positions according to Newton's law of motion: where j ≥ 2, x j i indicates the position of the jth salp in the ith dimension, t is the time, δ 0 is an initial speed, and λ = δ f inal δ 0 , where δ = x−x 0 t .Considering the assumption that δ 0 = 0, and t equals the iteration in an optimization problem, the aforementioned equation can be transformed into the following form: where j ≥ 2. This equation shows that the following salps update their position based on the position of their own and the prior salp.
If some salps move outside the limited search space, they will be brought back to the boundaries though the limitation formula: All of the above updates are iteratively executed until satisfaction of the end condition.It should be pointed that the food source can be updated sometimes because the salp chain may find a better solution by exploring and exploiting the space around the current solution.In other words, the salp chain has the ability to move toward the global optimum solution during the optimization.The procedure of the SSA is illustrated in Algorithm 1, which shows that the SSA starts approximating the global optimum by initiating multiple salps with random positions, updates these positions iteratively, and finally finds a global optimization solution.A flowchart of the SSA is shown in Figure 1.

Algorithm 1
The Procedure of the Salp Swarm Algorithm (SSA) Algorithm.

Benchmarking Algorithms
To assess the performance of the SSA in parameter estimation, its estimation results are compared with the PSO and DE algorithms, as well as the RETC program.
PSO is an evolutionary algorithm-algorithm, which starts from a random solution and finds the global optimum by following the optimal value found in the current research.The advantage of PSO is that it is easy to implement and there are not many parameters to be adjusted.The updating formula of PSO is where ω, c 1 , and c 2 are three specified parameters, δ i and δ i represent the new updated and the current velocities, and pbest i is the best solution found by the ith particle, while the gbest i indicates the global best solution.
The RETC program, a computer program which includes an NLS algorithm for optimization, developed by the U.S Salinity Laboratory, can be used to predict the hydraulic conductivity from observed soil water retention data.The executable program can be obtained from the Laboratory's website freely, and the program can work normally on a Windows 10 system.The end condition is satisfied?

Estimation Algorithms and Dataset
In this paper, the parameter estimation of the van Genuchten Model using the SSA is performed.In order to show the universality of SSA, nine different soil samples that consist of eight soil textures are employed to conduct the experiment.Furthermore, the PSO and DE algorithms, as well as the RETC program, are utilized as benchmarks for comparison.
The DE algorithm is a type of evolutionary algorithms for solving the global optimal solution in multidimensional space [29].In the DE algorithm, M parent vectors, each of which consists of j-dimensional vectors, are initialized as where r j is randomly generated in (0, 1).In the gth generation, three different individual vectors, x i1 , x i2 , and x i3 , are selected randomly to compute the mutant vector p i : where F is a constant factor in [0, 2].Then the trial vector is formed: where R j is a parameter specified by user which is a constant in [0, 1), R m is generated randomly in set {0, 1, ...., M − 1, M}.In the process of vector updating, the vector in x i,j (g) and u i,j (g), which has a smaller fitness value for the objective function would be selected as the parent vector for the next generation.The iteration process ends until the set condition is satisfied.

Data Description
The data for this experiment are collected from the UNSODA database and can be obtained freely.The nine soil samples data recorded contain the soil water potential and the water content.More details of physical properties are represented in Table 1.From Table 1, we can see that the data are different in type, bulk density, and location, and the number of data points covers from 5 to 21, which indicates that the effectiveness of the proposed method is validated with different datasets to examine the universality of this algorithm.The lower bound and upper bounds of parameters, ϕ r , ϕ s , α, and n of the van Genuchten Model are set and presented in Table 2.

Estimation Results and Discussion
To estimate the parameters of the van Genuchten Model efficiently using the SSA, the population size was set to 50 and the maximum number of iteration was 30,000.In this experiment, the end condition was that the current iteration reaches the maximum iteration.Furthermore, in order to search the set of parameters corresponding to each soil sample, data with different sizes were used, and the estimation procedure was repeated.
For the three benchmarking methods, the algorithm-specific parameters in DE and PSO were set according to [20,30], and a precompiled version RETC was applied.The source code of the SSA was written in Matlab, and a Windows 10 system with an Intel Core i5@3.20GHzCPU and 4GB memory was used to conduct the experiment.The computational results of the experiment are shown in Table 3.
Table 3.Estimated parameters and the sum of squared errors (SSE) of the van Genuchten model using SSA, differential evolution (DE), particle swarm optimization (PSO), and RETC.

Soil Sample ID
Algorithm ϕ r (cm 3 cm −3 ) ϕ s (cm 3 cm −3 ) α (cm −1 ) n SSE (10 As shown in Table 3, we can see that the SSA performs better than the three benchmarking methods for Soil Samples 3020, 1120, 3154, 1102, 1361, 1173, and 2400, which indicates that SSA can find more accurate parameter sets in the van Genuchten model.However, it is observable that the SSA has the same SSE value with DE for Soil Samples 1330 and 1162.Meanwhile, it can be seen that the RETC program shows a good performance for Soil Samples 3020, 1120, 3154, and 1330, but it has poor performance for the remaining soil samples, especially Soil Sample 1361, where the SSE of RETC is 1.092369 × 10 −2 .For PSO, its performance is similar to that of DE.The results of the nine soil samples show the superiority of heuristic algorithms compared with the NLS method for the SWRC parameter estimation problem. In order to illustrate the difference between the measured data and the estimated data, Figure 2 shows the details of SWRCs of all eight soil samples corresponding to the eight soil textures.Figure 2 visualizes the estimation results, and it can be observed that the estimated curves with the modeled data can fit the measured data accurately for Samples 3020, 1120, 2400, and 1173 (see in Figures 2a,b,f,h), which have fewer data points than the rest of the soil samples.Correspondingly, it is observable that, for Soil Samples 1330 and 1162, which have more data points, the SWRC cannot perform as well as it can for Soil Samples 3020 and 1120.From the values of SSEs recorded in Table 2 and the eight subgraphs, it can be observed that the SSA can estimate parameters of the van Genuchten model accurately even with a smaller dataset.Apart from the lower and upper bounds of the set of parameters need to be estimated, the SSA does not have any other model-specified parameters.Due to this advantage, the SSA is applicable to estimate parameters of the van Genuchten model simulating the SWRC.
It is not practical to perform the parameter estimation with a normal computer in the field.Thus, a portable computing device is a better option.To verify the performance of the SSA on embedded devices, we implemented the parameter estimation algorithm on Raspberry Pi 3 using the Octave language.The hardware specification of the Raspberry Pi 3 platform was a 64-bit quad core processor, with on-board WIFI, Bluetooth, and USB boot capabilities, and 1 GB of memory.The lower and upper bounds, population size, and maximum iteration were set to the same value as those on the Windows 10 system in the verification experiment.Soil Samples 2400, 3020, 1361, 1330, and 1102 were selected, and the estimation SSEs are given in Table 4. From Table 4, the estimated SSEs are the same as those on the Windows system.The results show that the SSA can perform well on the Raspberry Pi 3, a embedded computing device, which makes the algorithm applicable on portable devices in the field.

Conclusions
A new heuristic algorithms SSA was utilized to estimate the parameters of the van Genuchten model, and nine soil samples, covering eight soil textures, were collected to validate the universality of the SSA.
In the nine estimation experiments, the SSA performed better than the three benchmarking methods, including DE and PSO algorithms, and the RETC program.The SWRC using parameters estimated by SSA based on the van Genuchten model could fit the measured data well and provided minimum values of SSE among all the algorithms.Furthermore, by comparing the estimated SSEs on both Raspberry Pi 3 and Windows 10 platforms, the SSA can obtain the same results, so it is suitable for the embedded implementation.

Table 1 .
The physical properties of nine soil samples.

Table 2 .
The lower and upper bounds of the four model parameters.

Table 4 .
SSEs of SSA on Raspberry Pi 3 and a Windows 10 system.