Evaluating Slope Deformation of Earth Dams due to Earthquake Shaking using MARS and GMDH Techniques

: Assessing the behavior of earth dams under dynamic loads is one of the most significant problems with the design of such large structures. The purpose of this study is to provide new models for predicting dam dispersion in real earthquake conditions. In the first phase, 103 real cases of deformation in earth dams were collected and analyzed due to earthquakes that occurred over recent years. Using nonlinear and machine learning techniques, i.e., group method of data handling (GMDH) and multivariate adaptive regression splines (MARS), two models for prediction of the slope deformation in earth dams under the various types of earthquakes were applied and developed. The main parameters used in these simulation techniques were earthquake magnitude (Mw), fundamental period ratio (Td/Tp), yield acceleration ratio (ay/amax) as inputs and value of slope deformation (Dave) as output. Finally, in order to check the accuracy of the results of the new models, a comparison was made with the previous relations and models in seismic conditions for the slope deformation in earth dams. The results showed that the MARS model, which is able to provide a mathematical equation, has a better result than the GMDH model. These new models are recommended to be used for future analyses based on their flexible capabilities.


Introduction
In seismic loading, the study of dams' behavior is a necessary and important issue. A wrong study of dams' behavior in these conditions may result in irreversible damages. The first method to obtain the permanent slope deformation in dams was introduced by Newmark [1] as the sliding block. This method simply calculates changes of slope in the earthquake conditions. Among the primary and basic assumptions of the sliding block method is considering slide mass as a rigid body as well as the greater magnitude of acceleration arising from the earthquake than the yield acceleration that can result in movement of the mass [2,3]. This method was later modified and developed by scholars [4,5]. For example, considering the acceleration of the block response as the input acceleration of the system, Makdisi and Seed [6] evaluated the value of displacement with modification of the block model. Uncertainty of results between real cases and the suggested method has been discussed as well [7]. Moreover, using the numerical methods, such as finite element and finite difference methods, deformations in the earth slopes were obtained [5,8]. In different works, the main parameters of ground motion were used through which the changes of the slope deformation in different conditions in earth dams were studied [9,10]. Hynes-Griffin and Franklin [11] have deformations of dams based on the two features of failure acceleration and maximum acceleration irrespective of the vertical section of acceleration during an earthquake. In general, earthquake issues are very complicated in different subjects [3,12]. Furthermore, other important factors, such as the fundamental period of the dam, non-homogeneity, and settlement characteristics, have different impacts on huge structures, such as an earth dam [13]. Many scholars have referred to the high complication of the precise relations of these factors in the study of deformations in the earth dam during an earthquake and have considered it as an important challenge in the geotechnical area [14][15][16].
In recent years, new methods based on soft computing have been used in different engineering analyses considering the ability to solve complicated problems. These methods, in fields of civil engineering and the geotechnical area, have been applied to different issues, such as the dynamic behavior of earth, deformations of earth, tunneling, and laboratory samples of mines . There are a few researches concerning the use of soft computing methods in the earth dams' deformations arising from earthquakes. It shows that a new methodology, based on these methods, for earth dams' deformations can be developed and implemented. Therefore, in this study, given the above discussions, two new methods of soft computing, i.e., group method of data handling (GMDH) and multivariate adaptive regression splines (MARS), were implemented to evaluate the slope deformations of earth dams. These models were selected to propose a black box model (GMDH) and mathematical formula (MARS) in predicting slope deformation. In addition, these types of intelligent models have unique features for comparison purposes. To this end, the data of different earthquakes were gathered from around the world. Then, the initial analyses were performed on them. Using the initial analyses, the intelligence models predicted the slope deformations. In the end, new models were compared with the old models. The rest of this paper will continue by giving some information about the case study and data collection. After that, the principles of MARS and GMDH models and their implementation in predicting slope deformation will be given in detail. Finally, the most appreciated model will be selected through the use of some well-known performance indices.

Case Study of Earth Dam
In this study, a dataset of the slope changes of earth dams and embankments that are created during earthquakes was gathered from around the world. This dataset includes a various range of dams, including homogeneous and non-homogeneous earth dams, embankment dams with concrete facing, rockfill dams, and a few natural slopes. Various recorded data of these samples have been presented in studies and reports [68][69][70][71][72][73][74][75]. The gathered parameters include earthquake magnitude (Mw), yield acceleration ratio (ay/amax), fundamental period ratio (Td/Tp), and value of slope deformation (Dave) for the collected results that have been obtained from 103 real cases. Figure 1 shows the different distributions of these data. The Dave parameter indicates the movement toward the lower section of the earth along the sliding surface that is created by an earthquake. In fact, this value is a point of the vertical and horizontal components of deformations and is calculated in the form of a single vector in the direction of the base of the sliding surface. Using the critical failure arising from the quasi-static analyses, the base inclination angle is derived. Other parameters, such as Mw, amax, and Tp, refer to the earthquake magnitude, the horizontal component of the maximum acceleration of earthquake and predominant earthquake period, respectively, which are introduced as seismic loading features. The two parameters of ay and Td are also called geotechnical features of earth dams and imply the yield acceleration and fundamental period of the earth dam, respectively. The threshold acceleration occurs when a mass is sliding downward, and using the quasi-static analyses, this value is obtained as yield acceleration as in [76]. The ay parameter, in fact, is the same acceleration value that is obtained equal to 1 in the safety coefficient of static analyses. To obtain the parameter of the fundamental period of the earth dam, generally, the reported previous resources are used. Otherwise, it is computed using the relation of Td = 4H/Vs, which has been suggested by several scholars [4]. In this regard, the parameters of H and Vs show the height of the dam and the shear wave velocity in the body of the dam, respectively. In this study, five parameters of Mw, ay/amax, and Td/Tp, which were the key parameters obtained by the review of the previous studies, are used as the input of the intelligent models [3,77].
In the modeling stage, according to the previous studies, 80% of data are used for training systems and 20% for the test stage [78][79][80][81]. The data are selected by random and this process is repeated several times until the final outputs are obtained, ideally containing less error. The parameters are selected the same in both sections. Finally, the models are obtained for the next analyses and compared with the existing models. The statistical information of the used data (5 parameters as input) are presented in Table 1.

Group Method of Data Handling (GMDH)
The GMDH method is a self-organized algorithm and can provide a good solution for problems that have unknown nature. In fact, this method is a branch of artificial neural network that models the behavior of a phenomenon based on the input and output dataset. Dataset is a pair of data that is composed of several input parameters and one corresponding output parameter (Xi,yi) (=1,2, …,M). Quadratic node is a computational function that is defined based on a transfer function in a feedforward network. In the GMDH method, in order to define coefficients of this function, a regression analysis is used [82]. In a model that is built by the GMDH method, there are some neuron layers, and several data pairs in every layer are linked with each other by a quadratic polynomial. In this way, new neurons are made. Input and output data are linked and correspond with each other based on the same process [17,83].
The structure of this process is shown in Figure 2. According to this figure, four parameters are first entered in the first layer, and some functions are defined in this layer. Then, based on this criterion, these functions are selected and transferred to the next layer. The output of the model is also a function that is built in the last layer. The general function of parameters is given in Equation Obtaining the parameters in Equation (2) is one of the main steps in this modeling.
In fact, the above-mentioned parameters are obtained from the solution of a mathematical relation in which the most important parameter among them is the "a" coefficient. In each layer, several functions are defined to obtain the "a" parameter, and this process is repeated until Equation (3) is minimized. If needed, for more information about research in this area, see [39].
In each layer, with the help of neurons, the computational functions are defined. The selection pressure criterion is defined in several layers in order to select parameters. Based on the mentioned criterion, if the built function has the desired conditions, it is transferred to the next layer or otherwise deleted. Equation (4) shows the mentioned criterion in the form of a mathematical relation. The range of change of this parameter is [0, 1]. In order to select a function that has the minimum error, the value of α is considered to be 1. On the other hand, if the value of α is considered close to zero, more data are transferred to the next layer. Figure 3 shows the modeling process in which all cases that are applied to run the GMDH model are considered. In order to evaluate the model results, two indices of the correlation coefficient and root mean square error (R 2 , RSME) were used.

Multivariate Adaptive Regression Splines (MARS)
MARS is a non-parametric regression method, developed by Friedman [84]. The model takes the form of development in the production of basis spline functions, where a number of basic functions (BFs) and the parameters related to each function (product degree and knot location) are automatically determined by the data. The main difference between MARS and non-adaptive spline is that in the former, knots are not pre-determined; they are placed at the best possible locations by a search method. In the case of having only one knot, any possible location in the definition span of an explanatory variable, = ∈ (where a and b are MARS nodes, which should be designed during model development) is considered, and after fitting, a model with a minimum value of sum square error (SSE) is selected. In the case of having two knots, finding the best location for this pair of points requires more calculations and, a fortiori, to have n knots, finding the best n number set of points is very difficult and complicated. The required number of knots is also not known. In the MARS method, the number and location of the required knots are determined through a step-by-step forward-backward process.
BFs are a set of functions that can demonstrate information existing in one or more variables (Figure 4). MARS uses the expansion of piecewise linear basis functions in the forms of (x − t)+ and (t − x)+, also called hinge functions or hockey sticks, and "+" means the positive section, thus [85]: where x is the variable, and t is the knot constant. The MARS algorithm makes a regression model function according to Figure 4.
where x = [x1,x2,..,xp]T is a vector of explanatory variables and p is the dimension of input variables, βihi(x) represents terms of the model that are automatically selected by the algorithm, hi(x) signifies the basis functions in MARS, and M is the number of terms in the model. Furthermore, assume that (xi, yi), i = 1,..,n forms the observations set. Finally, there are two modes for Equation (7). If the defined functions are considered as the basis functions hi(x), the predictive model obtained by Equation (7) is called non-adaptive; on the other hand, if these functions are determined by modeling data set (xi, yi), it is called adaptive, like the MARS model.

Forward Phase
Forward selection phase is the same as the case in piecewise linear regression; it just differs in the point that it uses the A set functions (including all of the functions of h1i, h2i, i = 1,…,n and product of two or more of these functions) instead of explanatory variables. This phase is started by considering y-intercept term a0 (average of response values), then the basis functions are repeatedly added as pairs, and corresponding β coefficients are estimated in Equation (7) by minimizing the SSE. In each step of this phase, MARS finds a pair of basis functions that have the highest reduction in SSE. At the end of the phase, a large model is formed with a large number of terms having unsuitable performance in response to the prediction for new data. In other words, this model causes data overfitting; therefore, it will need a backward deletion phase to achieve the best number of terms [86].

Backward Phase
In order to make a model with higher generalization capacity, the backward phase prunes the model. The backward deletion phase starts its work with the largest model (based on number of MARS equations) obtained at the end of the forward model and, in each step, deletes the least important basis function (according to general cross validation, GCV criterion) that has a minimum share in residual sum of squares. By deletion of each basis function, a new model is estimated that may be a nominee for the final model. The forward phase adds terms in pairs. However, in the backward phase, distinctively, one side of the pair is deleted; therefore, there will be no pair in the final model. The performance of the new model is assessed considering the GCV criterion. The backward phase deletes terms one by one; indeed, it deletes the least effective term in each step to reach the best sub-model. MARS selects the optimal model with the minimum GCV value. The raw residual sum of squares (RSS) of the training set is unsuitable for model comparison because when the MARS terms are deleted, RSS normally increases. In other words, if RSS is used for model comparison, the backward phase will always choose the largest model (based on the number of MARS equations). Assume that GCVk belongs to the k-th model in the deletion phase and is defined by the following relation [84]: where fk is the estimated model in the k-th step of the backward deletion phase, and C(k) is the number of effective parameters in the model that is equal to the number of the model's terms plus the number of parameters used for selection of the best knot locations, in other words: C(k) = the number of the model's terms (BF) at the k-th step + λ (the number of the model's terms (BF) at the k-th step -1)/2, where λ is a smoothing parameter (penalty per knot (GCV)) and is practically selected between 2 and 4. Additionally, BF is basic function. It should be noted that the clause "(the number of the model's terms (BF) at the k-th step -1)/2" indicates the number of hinged functions' knots h existing in the model [84].

GMDH Simulation
In this section, the GMDH model was implemented for evaluating the slope deformation of earth dams. Of the data sets collected in this study, about 80% of the data were used to design and train models randomly. Different models were designed to obtain the least error model. Finally, the remaining data (20% of the total data) were used to evaluate the power of the models [87][88][89]. The code for the GMDH model was implemented in MATLAB software. According to the description given in the previous section, the GMDH model is designed by various components that control system performance. In order to design the best model, the impacts of each were evaluated under different conditions to obtain the best model to predict the slope deformation of earth dams.
One of the most important parts of intelligent models is the number of neurons used in the system and their layers. In the GMDH model, this parameter is also effective. According to the recommendation of previous research and initial analysis for these data, the number of neurons in different layers of GMDH ranged from 1 to 10. The number of neurons defined in this model is, in fact, the maximum number of neurons that each layer used according to the solution conditions. In other words, it is possible for each layer to use less than the maximum number of neurons to solve the problem. For this reason, the analysis of this study is presented in Figure 5. As can be seen, the results are presented for both training and testing sections to obtain the performance of each model. For the data of this study, the best condition is obtained with four neurons. As can be seen, the proximity of the training and testing results shows that the simulation is presented with acceptable accuracy. In addition, models with higher neurons show less accuracy than this model; thus, it can be indicated that the model with four neurons is the optimal one. Based on the analysis of the previous step, in this step, the effect of the number of layers applied in the GMDH model is investigated. The difference between the number of layers of this model compared to other intelligent models is that the number of layers can be more than one depending on the problem. However, most artificial intelligence works have recommended a layer for solving most linear and nonlinear problems. The GMDH model, with more layers, can also provide convergence conditions to specific conditions. Hence, five layers (2,4,6,8,10) were evaluated. The results for the different sections are presented in Figure 6. As can be seen, the effect of the number of layers on the results is different. The best performance is the six-layer model, of which its results can be introduced as an optimized model. Finally, the third important factor that influences the GMDH method is investigated by parametric analysis. Selection pressure is a criterion that allows the data of each layer to know what number of the previous section data can be used for continuing analysis. This criterion actually selects the number of data that best fits each step and sets it as the next layer input, repeating it until the final steps, so that the system achieves the lowest error and highest accuracy. In this case, one should choose the model that offers the best runtime and accuracy. The results of this study are shown in Figure 7. As can be shown, the best value for selection pressure is 40%. This also indicates the high accuracy of the model that was able to predict the data well. Finally, the final model for predicting the slope deformation of earth dams using the GMDH method is introduced at this stage. The best model was able to provide a system with R 2 = 0.8489 and 0.8458 accuracy for the training and testing sections, respectively. In the next section, the new MARS model is implemented for this problem.

MARS Modeling
Likewise, the data were divided into two sets for training and testing with percentages of 80% and 20%, respectively. As mentioned before, MARS is sensitive to its initial parameters. Thus, there is a need for determining the most effective parameters on the MARS model with their suitable ranges. MARS effective parameters with their suitable range/value are shown in Table 2. It should be noted that the MARS model was constructed in a MATLAB environment to estimate the slope deformation of earth dams. Examining different conditions for designing models can give various results. Using these results, the main objective of this research can be achieved by presenting a new model for assessing slope deformation of earth dams due to earthquake shaking. Several MARS models based on ranges mentioned in Table 3 were constructed to predict slope deformation of earth dams. Then, the best MARS model was selected according to the obtained prediction performance. As mentioned earlier, MARS is able to propose a relationship between model inputs and model output. In the following, two different methods of MARS techniques are used to evaluate the slope deformation of earth dams.

Piecewise-Linear Method
In the first technique, linear nodes are used to create data prediction conditions. Each variable (X1 to X6) is considered under different conditions and creates states to evaluate the slope deformation of earth dams. All the states between these variables for the prediction are given in Figure 8. As can be seen, these scenarios are designed for a variety of linear conditions and ultimately provide a MARS model to predict the slope deformation of earth dams. The results of this model are presented in Table 4. Finally, the structure of the model is presented using Table 4 and Equation (8).
This relationship can be a good solution for issues that need to be more realistic. This formula is one of the advantages of the MARS method that can be used in various fields of geotechnical and civil engineering. Using this relationship, the slope deformation of earth dams can be determined with an accuracy of R 2 = 0.8905 and 0.8956 for two sections of training and testing the MARS model based on the piecewise-linear technique.

Piecewise-Nonlinear Method
In the second method, instead of using linear nodes, nonlinear conditions are considered. As in the first method, different states of influence of the input parameters on the output are obtained, and finally, using them, the MARS model is created. Figure 9 illustrates the results of nonlinear states. The accuracy of this new model is given in Table 5. As can be seen from this table, the results of the first method were better in predicting the slope deformation of earth dams. The results of the piecewise-cubic method are presented in Table 6 below. These functions eventually provide a relationship to predict the slope deformation of earth dams in Equation (10).

Basis Function Values/Expression
This relationship can be a good solution for issues that need to be more realistic. This formula is one of the advantages of the MARS method that can be used in various fields of geotechnical and civil engineering.
Finally, the best model obtained for the slope deformation of earth dams distance prediction is presented in Table 6. As can be seen, its performance is better than the GMDH and second method of MARS. As a result, the first method of the MARS model (piecewise-linear) can be successfully applied to making such predictions in a reasonably accurate way.

Evaluation of the Performance of MARS Technique and the Old Models
In this section, a comparison has been made between the models developed in this study and the old models developed by previous scholars [6,11,77,90,91]. Table 7 shows the previous models with their unique characteristics that have been extracted and presented for the earth slopes in the effects of an earthquake.

Hynes-Griffin and Franklin
Ambraseys and Menu

Saygili and Rathje
These relations are based on the Newmark block model. For comparison, the relative error (RE) of all the models was obtained using the following relation: where Dave-p and Dave-m are the mean deformations predicted by different models (piecewise-linear method of MARS model and the existing relationships) and the real deformations arising from earthquakes. Results of this comparison have been presented in Figure 10. As it is observed in this figure, the newly developed model provides a more appropriate performance than the other models. Compared to the conventional old models, using the new methods can increase the precision of calculations and designs. One of the major reasons for the high accuracy of the artificial intelligence methods is considering the various complications of the earth structures affected by earthquakes. These models are quicker in calculations and can be a better substitution for the previous models. Finally, the new models can be a positive step in order to reduce uncertainty in the geotechnical problems and particularly the slope deformations of the earth dams.

Conclusions
In designing the security of the earth dams, it is necessary to evaluate the slopes in the effect of earthquakes accurately. Therefore, measurement and prediction of this issue in the designs are considered as an important prerequisite. In this study, an attempt was made to predict the value of the slope deformations in the earth dams. To the end, through gathering a set of data from the recorded changes in the real samples, including the parameters of earthquake magnitude (Mw), maximum horizontal earthquake acceleration (amax), yield acceleration of earth dam slope (ay), predominant earthquake period (TP), and fundamental period of earth dam (Td), the results and models were evaluated and analyzed. These parameters were selected as important and effective parameters and introduced as the input to the intelligent models. In this study, the two models of GMDH and MARS were developed and implemented to evaluate the slope deformations in the earth dams created in the effect of an earthquake. These models were evaluated by different examinations, and the effective parameters of each one were identified and evaluated. In the end, a comparison was made between the best models of GMDH and MARS to select the chosen model. The new MARS model could predict the values of Dave with higher performance. The accuracy of the suggested model was obtained R 2 = 0.8905 and R 2 = 0.8956 for the test and training sections, respectively. The performance of this model shows the acceptable precision for predictions. After this section, the impact of different parameters on the output was obtained using the sensitivity analysis. Mw and ay were determined as the most effective and the least effective parameters in the prediction models. In the end, a comparison was made between the models developed in this study and the old models. These results show that the present model can predict the value of Dave in the earth dams arising from earthquakes with higher precision.