Novel Nature-Inspired Hybrids of Neural Computing for Estimating Soil Shear Strength

This paper focuses on the prediction of soil shear strength (SSS), which is one of the most fundamental parameters in geotechnical engineering. The dataset, provided through a field survey in Vietnam, consists of 12 influential factors as input variables, namely the depth of sample, percentage of sand, percentage of loam, percentage of clay, moisture content, wet density, dry density, void ratio, liquid limit, plastic limit, plasticity index, and liquidity index, together with the shear strength as the desired output. The main focus of the current study is on evaluating the efficiency of three novel optimization techniques in optimizing an artificial neural network (ANN) for predicting the SSS. To this end, the dragonfly algorithm (DA), whale optimization algorithm (WOA), and invasive weed optimization (IWO) are synthesized with the ANN to overcome its computational drawbacks. The complexity of the models is optimized by sensitivity analysis. The results confirm the effectiveness of all three applied algorithms, as the learning error was reduced by nearly 17%, 27%, and 32% by the DA, WOA, and IWO, respectively. In the testing phase, the IWO and DA achieved a similarly high prediction accuracy. Overall, owing to the superiority of the IWO-ANN ensemble, this model could be a promising alternative to traditional methods of shear strength determination.


Introduction
Soil shear strength (SSS) is defined as the resistance of soil against shear stresses [1]. It is one of the most decisive parameters in the design process of geotechnical engineering projects [2]. For example, in the design of tall and massive structures, proper analysis of the SSS is very important, as the load is applied directly to the soil underneath. More clearly, this parameter enables engineers to decide on the foundation type and on whether ground improvement measures are required [3]. However, apart from their demanding prerequisites (such as sampling and specimen maintenance), SSS laboratory tests require considerable time as well as high costs [4]. High technical skill is also necessary due to the use of complicated equipment such as a triaxial test apparatus [5,6]. This is why achieving inexpensive and non-destructive SSS evaluation methods is a crucial task in related projects. Recently, powerful artificial intelligence techniques such as the adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network (ANN) have largely superseded traditional models for analyzing geotechnical phenomena. Moayedi and Hayati [7] investigated the feasibility of five well-known predictive models, namely the ANFIS, genetic programming (GP), the classical support vector machine (SVM), and two of its variants, the regularized generalized proximal SVM and the twin SVM, in modeling the friction capacity of driven piles set in clay. In addition to presenting a GP-based predictive formula, they demonstrated the superiority of the ANFIS. For the SSS, Besalatpour et al. [8] employed the ANN and ANFIS to simulate the SSS from the measured particle size distribution, normalized difference vegetation index (NDVI), soil organic matter (SOM), and calcium carbonate equivalent (CCE).
Based on the obtained error and correlation values of 0.05 and 0.86 for the ANN, and 0.08 and 0.60 for the ANFIS, they concluded that the ANN performs more efficiently. Kiran et al. [9] predicted the SSS parameters using a probabilistic neural network. Jokar and Mirasi [10] showed the efficiency of two clustering methods of the ANFIS, namely fuzzy c-means clustering and subtractive clustering, for estimating the shear strength of unsaturated soils.
The advent of metaheuristic algorithms has enabled scholars to optimize the conditions of various engineering problems. Moreover, overcoming the computational drawbacks of intelligent models is another notable application of these techniques. Many researchers have successfully used such algorithms for optimizing the ANN [11,12] and ANFIS [11,13] in geotechnical engineering issues. In the case of shear strength, Bui et al. [14] developed a combination of cuckoo search optimization and the least squares support vector machine (LSSVM) for predicting the shear strength of soil in a study of a national expressway project. The findings revealed that the proposed ensemble outperforms the ANN, regression tree, and typical LSSVM. Likewise, Nhu et al. [3] synthesized support vector regression (SVR) with particle swarm optimization (PSO) to create a hybrid model for the approximation of soil shear strength. Referring to the correlation of 0.888, they introduced the proposed SVR-PSO model as a promising alternative for the mentioned aim. Pham et al. [15] tested the capability of two ANFIS ensembles, based on PSO and the genetic algorithm (GA), for the shear strength prediction of plastic clay soil. Both proposed ensembles achieved higher accuracy than the ANN and SVR; however, the PSO (correlation = 0.601 and error = 0.038) surpassed the GA (correlation = 0.569 and error = 0.040) in optimizing the ANFIS. As stated, although the potential of well-known hybrid algorithms such as the PSO and GA has been accepted in the field of geotechnical engineering, the lack of exploration of other optimization algorithms can be considered a knowledge gap in this field. Hence, this study aims to introduce and compare three novel hybrid methods, based on the dragonfly algorithm (DA), whale optimization algorithm (WOA), and invasive weed optimization (IWO), used to overcome the computational drawbacks of the ANN in predicting the shear strength of soil.
Accordingly, the main contribution of the mentioned algorithms to the stated problem lies in the appropriate selection of the connecting weights and biases of the ANN, which link the input, hidden, and output layers of this model.

Methodology
As Figure 1 illustrates, the implementation of this study comprises three major steps: (a) Since providing a proper dataset is an essential task in the utilization of computational intelligence tools, data provision and preprocessing form the first stage; this process is explained in detail in the following section. (b) After determining the appropriate structure of the basic model (i.e., the multi-layer perceptron (MLP) neural network), the DA, WOA, and IWO optimization algorithms are synthesized with it to design the DA-ANN, WOA-ANN, and IWO-ANN hybrid ensembles. Next, an extensive sensitivity analysis is applied to the ensembles in order to find their best-fitted structures.
(c) Lastly, the results are evaluated using three well-known accuracy criteria, namely the root mean square error (RMSE), coefficient of determination (R2), and mean absolute error (MAE), formulated as follows:

RMSE = sqrt( (1/N) × Σ (Yi_observed − Yi_predicted)^2 )  (1)

R2 = 1 − [ Σ (Yi_observed − Yi_predicted)^2 ] / [ Σ (Yi_observed − Y̅_observed)^2 ]  (2)

MAE = (1/N) × Σ |Yi_observed − Yi_predicted|  (3)

where Yi_predicted, Yi_observed, and Y̅_observed symbolize the predicted SSS, the observed SSS, and the average of the observed SSSs, respectively, and N denotes the number of data points. The considered intelligent model, as well as the hybrid optimization techniques, are described in the following.
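The three accuracy criteria (RMSE, R2, and MAE) can be sketched in Python as follows; this is a minimal illustration, and the function names are ours:

```python
import math

def rmse(observed, predicted):
    # Root mean square error: square root of the mean squared residual.
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mae(observed, predicted):
    # Mean absolute error: mean of the absolute residuals.
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def r2(observed, predicted):
    # Coefficient of determination: 1 - SSE / SST, where SST is taken
    # about the mean of the observed values.
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst
```

A perfect prediction yields RMSE = MAE = 0 and R2 = 1.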

Multi-Layer Perceptron Neural Network
The multi-layer perceptron (MLP) is a widely used class of ANNs that has shown high robustness in various engineering simulations. The name ANN denotes a powerful processor suggested by [16] that mimics the biological neural network, which makes ANNs capable of discerning non-linear relationships within a set of data. Figure 2 portrays the structure of an MLP. Generally, the MLP uses the Levenberg-Marquardt (LM) training algorithm [17] together with the back-propagation (BP) learning method [18] to establish a mathematical relationship between a number of independent and dependent variables. Based on Equation (4), each input variable (M being the total number of input variables) is multiplied by a connecting weight, a bias is added to the resulting value, and an activation function is then applied to produce the final outcome of the neuron (O). This process is repeated in the subsequent layers until the neurons in the output layer release the final response:

O = F( Σ (j = 1 to M) Wj × Tj + bj )  (4)

where Wj and bj are the weight and bias terms of the jth node, respectively, F is the activation function, which is considered to be the tangent sigmoid (i.e., Tansig) in this work, and T is the input vector.
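A minimal Python sketch of such a neuron follows; the function names are ours, and `tansig` implements the tangent-sigmoid activation (mathematically equivalent to tanh):

```python
import math

def tansig(x):
    # Tangent sigmoid activation: 2 / (1 + e^(-2x)) - 1, equivalent to tanh(x).
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def neuron_output(T, W, b):
    # One MLP neuron: the inputs T are weighted by W, a bias b is added,
    # and the activation function produces the neuron's outcome O.
    return tansig(sum(w * t for w, t in zip(W, T)) + b)
```

Stacking such neurons layer by layer, with the outputs of one layer fed as inputs to the next, yields the full MLP forward pass.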

Dragonfly Algorithm
Proposed by Mirjalili [19], the dragonfly algorithm (DA) mimics the dynamic and static swarming behaviors of dragonflies for optimization purposes. Many scholars have successfully used the DA for non-linear engineering problems [20,21]. The life cycle of a dragonfly comprises two major stages, namely the nymph and the transformation into the adult, and is mostly spent in the first stage. Exploration corresponds to the dynamic behavior, in which dragonflies join groups and seek food sources [22]. The Reynolds swarm model is the basis of this algorithm, which follows three distinct principles, namely separation, alignment, and cohesion, in order to discover the solution weights (Figure 3). Notably, the position of each swarm member is updated through two tendencies: (i) attraction toward the food sources, and (ii) distraction away from invaders [23,24].
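The update principles above can be sketched as a simplified, one-dimensional swarm step; this is an illustrative sketch only, and the coefficient values (`s`, `a`, `c`, `w`, `f`) are our assumptions rather than the DA parameters tuned in this study:

```python
def dragonfly_step(positions, velocities, food, s=0.1, a=0.1, c=0.7, w=0.9, f=1.0):
    # One illustrative swarm update (1-D for brevity) combining the three
    # Reynolds principles plus attraction to the food source:
    #   separation - move away from crowded neighbours
    #   alignment  - match the neighbours' mean velocity
    #   cohesion   - move toward the neighbours' mean position
    n = len(positions)
    mean_pos = sum(positions) / n
    mean_vel = sum(velocities) / n
    new_pos, new_vel = [], []
    for x, v in zip(positions, velocities):
        S = -sum(x - other for other in positions)  # separation
        A = mean_vel                                # alignment
        C = mean_pos - x                            # cohesion
        F = food - x                                # attraction to food
        v_new = w * v + s * S + a * A + c * C + f * F
        new_vel.append(v_new)
        new_pos.append(x + v_new)
    return new_pos, new_vel
```

A swarm already sitting at the food source with zero velocity stays put, while a distant member is pulled toward it.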

Whale Optimization Algorithm
As the name implies, the whale optimization algorithm (WOA), first proposed by Mirjalili and Lewis [25], is inspired by the behavior of whale herds, and more specifically by the bubble-net hunting behavior of humpback whales. Figure 4 displays the humpback whale's bubble-net feeding behavior. The WOA comprises three operational steps: the shrinking encircling of the prey, exploitation (i.e., the bubble-net attack), and exploration (i.e., searching for the prey) [25,26]. In this algorithm, since there is no prior information about the optimal hunting place, the current best candidate solution is treated as the target prey. In the exploitation phase, a spiral mathematical model is applied to describe the movement between the whale and prey positions, and the whales involved update their positions toward the most successful member. The algorithm keeps improving the solution until a stopping criterion is met.
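A simplified sketch of one WOA iteration is given below (1-D, with the exploration branch toward a random whale omitted for brevity); this is an illustrative sketch under those simplifications, not the exact implementation used in this study:

```python
import math
import random

def woa_step(positions, best, t, t_max, b=1.0):
    # One illustrative WOA iteration: each whale either encircles the
    # current best solution (the "prey") or follows a logarithmic spiral
    # toward it (the bubble-net manoeuvre). The coefficient a shrinks
    # linearly from 2 to 0 over the iterations, tightening the encircling.
    a = 2.0 * (1.0 - t / t_max)
    new_positions = []
    for x in positions:
        if random.random() < 0.5:
            # shrinking encircling mechanism
            r = random.random()
            A = 2.0 * a * r - a
            C = 2.0 * random.random()
            D = abs(C * best - x)
            new_positions.append(best - A * D)
        else:
            # spiral position update around the prey
            l = random.uniform(-1.0, 1.0)
            D = abs(best - x)
            new_positions.append(D * math.exp(b * l) * math.cos(2.0 * math.pi * l) + best)
    return new_positions
```

In the full algorithm, whales with |A| >= 1 instead search around a randomly chosen whale, which provides the exploration step.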

Invasive Weed Optimization
Invasive weed optimization (IWO) is a nature-inspired algorithm that was first suggested by Mehrabian and Lucas [27]. Basically, this algorithm models how invasive weeds find suitable locations to grow and reproduce. Its high capability as well as its low complexity have made the IWO a popular technique for various non-linear optimizations [28-30]. The five major steps of the IWO are (i) initialization, (ii) reproduction, (iii) spatial dispersal, (iv) competitive exclusion, and (v) the termination condition. As in other metaheuristic algorithms, the candidate solutions (i.e., the weeds) are first randomly distributed in the search space. Next, depending on their fitness, they reproduce within the growing frame (Figure 5), and the new members (i.e., the produced seeds) are distributed close to their parents. The next generation is then formed by combining the seeds and weeds, and the most promising weeds are carried forward through the two stages of reproduction and competition.
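The reproduction and spatial-dispersal steps can be sketched as follows. The standard-deviation schedule uses the values tuned later in this study (variance reduction exponent 2, sigma shrinking from 0.5 to 0.001), while the seed-count bounds `s_min`/`s_max` are illustrative assumptions:

```python
import random

def iwo_sigma(it, it_max, sigma_init=0.5, sigma_final=0.001, n=2):
    # Spatial-dispersal standard deviation, shrinking over the iterations
    # with variance reduction exponent n, so early generations explore
    # widely and late generations refine locally.
    return ((it_max - it) / it_max) ** n * (sigma_init - sigma_final) + sigma_final

def iwo_reproduce(fitness, s_min=0, s_max=5):
    # Seed count per weed, linear in its fitness between the colony's
    # worst and best members (everyone gets s_max if all are equal).
    f_best, f_worst = max(fitness), min(fitness)
    if f_best == f_worst:
        return [s_max] * len(fitness)
    return [round(s_min + (f - f_worst) / (f_best - f_worst) * (s_max - s_min))
            for f in fitness]

def iwo_disperse(parent, sigma):
    # A seed lands near its parent, normally distributed with the current sigma.
    return parent + random.gauss(0.0, sigma)
```

Competitive exclusion then truncates the combined weed-and-seed colony back to a fixed maximum size by fitness.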

Data Collection and Statistical Analysis
The data used in the present work are provided from a vast geotechnical assessment of the Royal City project located in Hanoi, Vietnam [31]. The construction area is nearly 120,950 m2. The main aim was to explore the condition of the sub-surface soil using the boring sampling method. A total of 28 boreholes were drilled using a so-called "slurry", a mixture of water and bentonite, with minimum and maximum depths of 55 and 75 m, respectively. Notably, piston samplers were used to obtain soil samples of 91-mm diameter. Finally, 154 soil specimens were tested to measure the shear strength, taking into consideration 12 influential factors, including the depth of sample (DOP), percentage of sand, percentage of loam, percentage of clay, moisture content (MC), wet density (WD), dry density (DD), void ratio (VR), liquid limit (LL), plastic limit (PL), plasticity index (PI), and liquidity index (LI). Figure 6 addresses the graphical relationship between the SSS and the soil variables, and Table 1 presents the descriptive statistics of the dataset.
The aforementioned factors are used as input data for estimating the SSS as the desired variable (i.e., the output). In other words, the intelligent models of this study are applied to analyze the relationship between the SSS and these factors. To this end, two sets of data are required: (i) the first group, called training data, is used to train the models, and (ii) the second group, called testing data, represents unseen soil conditions used to evaluate the integrity of the developed networks. Similar to many previous studies [12,32], 80% of the dataset (i.e., 123 samples) was randomly selected as training data, and the remaining 20% (i.e., 31 samples) was used as testing data.
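The 80/20 partition can be sketched as follows; this is a generic random split (the seed and function name are ours), not the exact sampling procedure of the original study:

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    # Randomly partition the samples into training and testing subsets,
    # mirroring the 80/20 split used here (123 / 31 out of 154 samples).
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = round(train_ratio * len(samples))
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test
```

Fixing the seed keeps the split reproducible across model runs, so all four models see the same training and testing samples.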

Results and Discussion
As mentioned previously, the present research investigates the applicability of three metaheuristic algorithms, namely DA, WOA, and IWO in optimizing the performance of an artificial neural network for estimating the shear strength of the soil. This section comprises two parts. Firstly, the optimization of the neural network with the proposed evolutionary algorithms is explained, and in the second part, the results are obtained and discussed to evaluate the efficiency of the models.

Optimizing the ANN Using DA, WOA, and IWO
An MLP neural network is selected to represent the basic network of this study. Although the literature review shows the high capability of the ANN for estimating various scientific phenomena, utilizing these models has been associated with some computational drawbacks such as getting trapped in local minima. Hence, the aforementioned optimization techniques are employed to overcome these shortcomings.
To this end, a trial-and-error process was first carried out in the MATLAB 2014 environment to determine the most efficient architecture of the ANN. Although an MLP could contain several hidden layers, many previous studies have shown that a single hidden layer is adequate for predicting complex problems [33,34]. The results showed that the MLP with five hidden nodes presents the lowest performance error. Note that, based on the input and output parameters, the proposed ANN had 12 and 1 nodes in the first and last layers, respectively. Next, the ANN was mathematically introduced to each of the DA, WOA, and IWO to achieve the most appropriate computational weights and biases. Each model was run for 1000 iterations to optimize the ANN, with the RMSE criterion considered as the objective function measuring the error at each iteration. Based on the population size, nine different complexities (i.e., population sizes of 10, 25, 50, 75, 100, 200, 300, 400, and 500) were tested for each ensemble. Note that the population size is a common variable of metaheuristic algorithms that denotes the number of individuals involved (e.g., the whale population in the WOA technique). Figure 7a-c depicts the convergence curves for each structure of the used models. Besides the population size, there are other influential parameters for some of the optimization algorithms [35-39]. In this work, after determining the best complexity, a trial-and-error process was executed to optimize these parameters. Based on the results, the WOA with an intensification factor of 1, and the IWO with a variance reduction exponent of 2, an initial standard deviation of 0.5, and a final standard deviation of 0.001, build the most powerful networks.
As is seen, each ensemble shows a different convergence behavior. All of the DA-ANN networks, for example, needed at most 500 iterations to minimize the error, while for the WOA-ANN the majority of the objective-function reduction occurred during the first 200 iterations, after which the curve remained more or less steady. The IWO-ANN, in contrast, kept minimizing the error until the last iteration, as its curves retain a downward trend. Eventually, all three models presented their best performance with a population size of 400. The obtained RMSEs for the DA-ANN, WOA-ANN, and IWO-ANN were 0.02725, 0.02382, and 0.02225, respectively, denoting that the IWO optimized the ANN more efficiently than the other two algorithms.
Moreover, the time required for implementing the models is examined. Since the computation time of the typical ANN is considerably shorter than that of its improved versions, this parameter is discussed only for the ensemble models, owing to their higher accuracy. Figure 8 illustrates the obtained RMSE versus the elapsed computation time. According to this diagram, on a system with a 2.5-GHz processor and 6 GB of RAM, the implementation of the DA-ANN, WOA-ANN, and IWO-ANN required 13,859.2, 5884.9, and 5043.6 s, respectively.
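The coupling described above, in which each metaheuristic searches over the ANN's weights and biases with RMSE as the fitness, can be sketched as follows. The 12-5-1 architecture and the RMSE objective come from this study; the flat layout of the 71-element parameter vector is our own assumption for illustration:

```python
import math

N_IN, N_HID = 12, 5                     # 12 inputs, 5 hidden nodes, 1 output
DIM = N_IN * N_HID + N_HID + N_HID + 1  # 71 weights and biases to optimize

def tansig(x):
    # Tangent sigmoid activation of the hidden layer.
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def decode_and_predict(vector, sample):
    # Unpack one candidate solution into the MLP's weights/biases and run
    # a forward pass (tansig hidden layer, linear output neuron).
    k = 0
    hidden = []
    for _ in range(N_HID):
        s = sum(vector[k + i] * sample[i] for i in range(N_IN))
        k += N_IN
        hidden.append(tansig(s + vector[k]))
        k += 1
    return sum(vector[k + j] * hidden[j] for j in range(N_HID)) + vector[k + N_HID]

def objective(vector, X, y):
    # RMSE of the decoded network over the training data: the fitness that
    # each metaheuristic minimizes at every iteration.
    residuals = [decode_and_predict(vector, x) - t for x, t in zip(X, y)]
    return math.sqrt(sum(r * r for r in residuals) / len(y))
```

Each DA/WOA/IWO individual is then simply a point in this 71-dimensional space, and the best individual found after 1000 iterations defines the trained hybrid network.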

Accuracy Assessment of Predictive Models
The results are evaluated in this part by comparing the predicted SSS values with the actual values. The performance error is calculated by the RMSE and MAE criteria, while the R2 index is used to measure the correlation between the actual and predicted SSSs. Needless to say, the results of the training phase denote the capability of the models for pattern recognition, while the testing results indicate their prediction power.
In the training phase, all four models achieved a good understanding of the relationship between the SSS and the independent parameters. The calculated RMSE values of 0.0328, 0.0272, 0.0238, and 0.0222 for the ANN, DA-ANN, WOA-ANN, and IWO-ANN, respectively, demonstrate that the learning error of the unreinforced ANN decreased considerably as a result of being coupled with the mentioned algorithms. The MAE values support this claim, decreasing from 0.0258 to 0.0200, 0.0186, and 0.0171. Furthermore, the calculated R2 values show that the correlation of the ANN products with the actual SSSs increased from 0.9010 to 0.9300, 0.9465, and 0.9534.
As for the testing data, the trained networks were applied to unseen soil conditions to estimate the SSS. Figure 9 shows the results based on the computed error (i.e., the difference between the output and target values) of each sample, alongside the histogram of errors. These histogram charts depict the frequency of each error magnitude: the higher the frequency of errors close to zero, the greater the accuracy of the prediction.

Accuracy Assessment of Predictive Models
The results are evaluated in this part by comparing the predicted values of SSS with actual values. In this sense, the error of the performance is calculated by RMSE and MAE error criteria. Besides, the R 2 index is used to measure the correlation between the actual and predicted SSSs. Needless to say, the results of the training phase denote the capability of the models for pattern recognition, while the testing results indicate the prediction power.
In the training phase, all four models achieved a good understanding of the relationship between the SSS and independent parameters. In this phase, the calculated RMSE values of 0.0328, 0.0272, 0.0238, and 0.0222, respectively for the ANN, DA-ANN, WOA-ANN, and IWO-ANN demonstrate that the learning error of the unreinforced ANN experienced a considerable decrease as a result of being coupled with the mentioned algorithms. The values of the MAE also support the mentioned claim as it decreased from 0.0258 to 0.0200, 0.0186, and 0.0171. Furthermore, the calculated values of R 2 proved that the correlation of the ANN products with actual SSSs increased from 0.9010, 0.9300, 0.9465, and 0.9534.
As for testing data, the trained networks were applied to some unseen soil condition to estimate the SSS for the testing phase. Figure 9 shows the results based on the computed error (i.e., the difference between the output and target values) of each sample, alongside the histogram of errors. Note that the current histogram charts depict the frequency of each error extents. The higher the frequency of errors that are close to zero, the greater the accuracy of the prediction. According to these figures, the prediction reliability of the ANN (RMSE Note that the current histogram charts depict the frequency of each error extents. The higher the frequency of errors that are close to zero, the greater the accuracy of the prediction. According to these figures, the prediction reliability of the ANN (RMSE = 0.0515 and MAE = 0.0408) increased by applying DA (RMSE = 0.0425 and MAE = 0.0329), WOA (RMSE = 0.0436 and MAE = 0.0328), and IWO (RMSE = 0.0432 and MAE = 0.0322) metaheuristic techniques. Moreover, the histogram chart of the ANN results yields the highest standard error (0.0522).   The findings of this study revealed that applying the named evolutionary algorithms can effectively help ANN to adjust the weights and biases more properly, and consequently present a more accurate estimation of the SSS. Considering the obtained results, a score-based ranking system is developed to have a relative evaluation of the performance of the used models. To do so, scores are attributed to each model based on the calculated values of the accuracy criteria. In this way, the higher reliability of the model, the larger the score assigned to it. From a comparison point of view, it can be said that the IWO-ANN (training score = 12) performed more efficiently in terms of all three RMSE, MAE, and R 2 in comparison with DA-and WOA-based neural ensembles. After that, all three criteria demonstrated more reliability of the WOA than DA for training the ANN. 
As for the testing phase, The findings of this study revealed that applying the named evolutionary algorithms can effectively help ANN to adjust the weights and biases more properly, and consequently present a more accurate estimation of the SSS. Considering the obtained results, a score-based ranking system is developed to have a relative evaluation of the performance of the used models. To do so, scores are attributed to each model based on the calculated values of the accuracy criteria. In this way, the higher reliability of the model, the larger the score assigned to it. From a comparison point of view, it can be said that the IWO-ANN (training score = 12) performed more efficiently in terms of all three RMSE, MAE, and R 2 in comparison with DA-and WOA-based neural ensembles. After that, all three criteria demonstrated more reliability of the WOA than DA for training the ANN. As for the testing phase, it was shown that the DA-ANN surpassed the WOA-and IWO-based neural ensembles in terms of both RMSE and R 2 . This is while the MAE of this model was larger than WOA-ANN and IWO-ANN. Taking into consideration all three criteria, the IWO-ANN and DA-ANN presented the same generalization capability (testing scores = 10). It means that the computational parameters suggested by these two algorithms construct more promising ANNs compared to the WOA (testing score = 7).
All in all, the final position of each model is determined by a total raking score (TRS), as the summation of all partial scores. As Table 2 denotes, the IWO-ANN (TRS = 22) emerged as the most accurate model of this study, followed by WOA-ANN and DA-ANN (TRS = 16), and typical ANN (TRS = 6). Also, the observed discrepancy between the results of the training and testing phases of the DA-ANN cannot be ignored. As explained, in spite of the lowest learning capability among the proposed neural ensembles, it achieved the prediction as accurate as that of the IWO-ANN. Since the IWO presented the best-fitted parameters of the ANN, the neural equation of this combination is presented in this part (Equation (5)). Note that this is a predictive formula that can properly estimate the SSS of soil using the mentioned independent factors, due to the satisfying performance of the IWO-ANN model. In fact, the values presented in this equation indicate the weights and biases of the unique output neuron of the optimized ANN. As the formula expresses, the activation function of the output layer is "purelin", in which the input is directly released as the response (i.e., f(x) = x). (5) where Z1, Z2, . . . Z5 represent the ANN middle parameters (i.e., the outputs of the hidden neurons) that are given by Equation (6). In addition, the term "Tansig" is the activation function of the corresponding neurons, which is expressed by Equation (7):  it was shown that the DA-ANN surpassed the WOA-and IWO-based neural ensembles in terms of both RMSE and R 2 . This is while the MAE of this model was larger than WOA-ANN and IWO-ANN.
Table 2. Score-based ranking of the models (per-criterion scores: 4 = best, 1 = worst; (tr) = training, (te) = testing).

Model      RMSE(tr)  MAE(tr)  R2(tr)  RMSE(te)  MAE(te)  R2(te)  Training score  Testing score  TRS  Rank
ANN        1         1        1       1         1        1       3               3              6    3
DA-ANN     2         2        2       4         2        4       6               10             16   2
WOA-ANN    3         3        3       2         3        2       9               7              16   2
IWO-ANN    4         4        4       3         4        3       12              10             22   1

Since the IWO presented the best-fitted parameters of the ANN, the neural equation of this combination is presented here (Equation (5)). Owing to the satisfying performance of the IWO-ANN model, this predictive formula can properly estimate the SSS of soil using the mentioned independent factors. The values presented in the equation indicate the weights and bias of the single output neuron of the optimized ANN. As the formula expresses, the activation function of the output layer is "purelin", which releases its input directly as the response (i.e., f(x) = x):

SSS_IWO-ANN = 1.1762 × Z1 − 1.0995 × Z2 − 0.5416 × Z3 + 0.1767 × Z4 + 0.2055 × Z5 − 0.6625    (5)

where Z1, Z2, ..., Z5 represent the ANN middle parameters (i.e., the outputs of the hidden neurons), which are given by Equation (6).
In addition, the term "Tansig" is the activation function of the corresponding neurons, which is expressed by Equation (7):

Tansig(x) = 2 / (1 + e^(−2x)) − 1    (7)

Lastly, it should be noted that the extent of the used data is a limitation of the current study. More clearly, since the data were collected from a real-world construction project, the soil in the studied area naturally had special characteristics. This led to limited ranges of the key factors and, consequently, of the SSS. Therefore, the extent of the data should be kept in mind when using the developed models (as well as the SSS predictive formula) of this research. In addition, from an artificial intelligence point of view, implementing ideas such as optimizing the input factors and executing k-fold cross-validation can be potential suggestions for enhancing the quality of prediction. Furthermore, we believe that utilizing well-known optimization algorithms such as the GA and PSO alongside more recently developed algorithms would result in helpful comparative studies. However, given the pivotal aim of this work, such suggestions, which need to be investigated meticulously, are noted as favorable subjects for future studies.
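The predictive formula of Equations (5) and (7) can be sketched directly. Since the hidden-layer weights of Equation (6) are not reproduced in this excerpt, the sketch below takes the hidden-neuron outputs Z1–Z5 as given inputs rather than computing them from the twelve soil factors:

```python
import math

def tansig(x):
    # Equation (7): Tansig(x) = 2 / (1 + e^(-2x)) - 1
    # (mathematically identical to the hyperbolic tangent)
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def sss_iwo_ann(z):
    # Equation (5): "purelin" output neuron combining the five hidden outputs.
    # z = [Z1, ..., Z5], the tansig outputs of the hidden neurons; their own
    # weights (Equation (6)) are not reproduced here.
    w = [1.1762, -1.0995, -0.5416, 0.1767, 0.2055]
    bias = -0.6625
    return sum(wi * zi for wi, zi in zip(w, z)) + bias
```

Because "purelin" is the identity, the output neuron is a plain weighted sum plus bias; all the nonlinearity of the optimized network resides in the tansig hidden layer.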

Conclusions
The crucial role of the shear strength of soil in geotechnical engineering has driven scientists to present more reliable and cost-effective techniques for determining this parameter. In this paper, three nature-inspired optimization techniques, which mimic the behavior of dragonflies, whales, and invasive weeds, were applied to an artificial neural network to optimize the connecting weights as well as the biases. After providing a proper dataset containing several shear strength key factors, the DA-ANN, WOA-ANN, and IWO-ANN ensembles were created and optimized in terms of population size. The results were evaluated, and it was found that all three mentioned algorithms help the ANN to gain a better understanding of the relationship between the shear strength and its influential factors. From a comparison point of view, the IWO surpassed its two counterparts in the training phase; however, it gained the same accuracy score as the DA in the testing phase. Overall, the findings proved the viability of shear strength modeling using improved intelligent models. Also, a neural-based predictive formula was presented as an alternative to traditional methods.