Energy Loss in Skimming Flow over Cascade Spillways: Comparison of Artificial Intelligence-Based and Regression Methods

In this study, the energy dissipation of cascade spillways was investigated through a series of laboratory experiments. Five spillway slope angles (α) (10°, 20°, 30°, 40°, and 50°), step numbers (N) ranging from 4 to 75, and a wide range of discharges (Q) were considered. Several data-based models were developed to describe the relationships between the hydraulic parameters. Multiple linear and nonlinear regression equations were derived from dimensional analysis to compute energy dissipation over cascade spillways. To test the robustness of these data-based models, M5P, stochastic M5P, and random forest (RF) were applied as new artificial intelligence (AI)-based techniques relating the input variables to energy dissipation. The stochastic M5P approach was found to predict energy dissipation over cascade spillways more successfully than the other regression and AI-based methods. Sensitivity analysis suggests that the spillway slope in degrees (α) is the most influential input variable in predicting the relative energy dissipation (%) of the spillway.


Introduction
Hydraulic structures are of great importance in hydropower generation. The Ertan Dam in China, with a height of 240 m, a design discharge of 20,000 m³/s, and a flood power of 39,000 MW, is a good example of a large hydropower project [1]. In such structures, the high reservoir elevation gives overtopping flows more kinetic energy than in other structures, and these flows must be passed safely [2][3][4]. If this kinetic energy is not controlled, the flood power can easily cause downstream erosion, structural collapse, and similar damage [5].
Stepped and cascade spillways are commonly used as energy dissipators in hydraulic and hydropower projects all over the world. The use of stepped channels and cascade/stepped spillways dates back at least three centuries [6,7]. They are also used in other applications, such as aeration and waterscape projects [8][9][10][11]. Three classical flow regimes have been reported on stepped spillways: nappe flow, transition flow, and skimming flow. Nappe flow occurs over stepped structures at small discharges, whereas transition and skimming flow occur at intermediate and large discharges, respectively [12][13][14][15][16][17]. The hydraulics of stepped spillways have also been studied by Ren et al. [8], Bai et al. [18], Bung [19], Chen et al. [20], Chinnarasri and Wongwises [21], and Sorensen [22].
In stepped channels and cascade spillways, the steps dissipate a considerable part of the kinetic energy and flow power. Because of this energy dissipation along the structure, a shorter stilling basin is required [23]. The substantial reduction in kinetic energy and velocity, together with high air entrainment, also reduces the risk of cavitation in these structures [24,25]. Energy dissipation over cascade spillways has been studied both experimentally and numerically. Takahashi et al. [26] studied the effect of steps with end sills at 30° on energy dissipation and reported a larger energy loss than with horizontal steps. Zare and Doering [27] used baffles and sills on stepped spillways and reported that baffle-edged spillways dissipate more energy than sill-edged spillways. To determine the optimum energy dissipation, Chatila and Jurdi [28] studied stepped spillways with different step sizes and suggested that stepped spillways with the maximum possible number of steps could perform better than traditional smooth spillways. Chanson [29] conducted an experimental study of energy dissipation in stepped chutes and reported that skimming flow enables higher energy dissipation in long chutes, whereas nappe flow does so in short chutes. To analyze the effect of step pool porosity on energy dissipation, a pooled stepped spillway was investigated in [30]; flat steps were reported to dissipate more energy than porous pooled steps. Relvas and Pinheiro [31] used stepped chutes lined with wedge-shaped blocks and analyzed the velocity distribution and energy dissipation in this structure. Their results showed that increasing the step slope reduces energy dissipation in comparison to common stepped chutes.
Multivariate data analysis techniques, such as artificial intelligence (AI)-based models and soft computing techniques, have recently been employed as decision-support tools in water science [32][33][34][35][36][37]. Predictive models integrated into decision-support systems reduce the operational costs and time of experimental measurements [38,39]. Instead of solving the governing equations, these techniques estimate the target quantity from the effective parameters and their historical records [40]. In other words, because of the high uncertainty and complexity of the solutions, the large number of effective parameters, and their interactions, these techniques can be used as a direct method to solve such problems [32]. Jiang et al. [41] used a genetic algorithm (GA) and support vector regression (SVR) to predict energy dissipation on stepped spillways, and their results showed that the GA-SVR model can be used successfully for this purpose. Multiple regression equations, artificial neural networks (ANN), and gene expression programming (GEP) have also been used to model energy dissipation over stepped gabion weirs, and the ANN was recommended for reliable estimation [42].
Previous studies show that energy dissipation has not yet been evaluated using newer methods such as M5P, stochastic M5P, and random forest. Evaluating such models can identify the optimum model for predicting energy dissipation. The primary focus of this research is to investigate the accuracy of three AI-based methods (M5P, stochastic M5P, and random forest) in estimating energy dissipation in the skimming flow regime along cascade spillways. In addition, several regression-based models are tested for energy dissipation estimation. Experimental data are used to evaluate the models. The experimental variables are the number of spillway steps (N), spillway slope (α), spillway height (Hdam), and discharge over the spillway (Q).

Experimental Setup
In the present study, experiments on stepped spillways were conducted by varying the effective parameters. The experiments were carried out at two hydraulic laboratories: (i) University of Tabriz, Iran, and (ii) Shahid Chamran University (SCU), Iran. The physical models were installed in flumes 10 m long, 0.5 and 0.8 m wide, and 1.0 and 2.0 m high. Table 1 shows the geometrical characteristics of the physical models. The variable parameters were the number of spillway steps (N), spillway slope (α), spillway height (Hdam), and discharge over the spillway (Q), and several runs were performed on the physical models for combinations of these parameters. A gate at the downstream end of the flume controlled the water level so that a hydraulic jump formed, and a sharp-crested triangular weir with a 53° notch angle was used for flow measurement. A point gauge with an accuracy of ±0.1 mm was used to measure the upstream water level. The upstream water depth, y0, was measured 0.60 m upstream of the spillway, and y2 is the sequent (conjugate) depth of the hydraulic jump. In cascade spillways and similar structures, air entrainment and the thin flow make it difficult to measure the flow depth y1 directly. Pegram et al. [43] used the conjugate depth of the hydraulic jump (y2) to calculate energy dissipation.
Using different values of the effective parameters, namely spillway slope (α), step number (N), discharge (Q), and structure height (Hdam), 3772 experimental data points were collected and used to evaluate M5P, stochastic M5P, random forest, multiple linear regression (MLR), multiple nonlinear regression (MNLR), and logarithmic regression. In total, 70% of the data were used for training and the remaining 30% for testing the models.

Dimensional Analysis
The upstream energy head (Emax) was calculated using Equation (1), and Equations (2) and (3) were used to calculate the downstream energy head (E1) and the relative energy dissipation (∆E/Emax), respectively. In these equations, g is the acceleration due to gravity, Hdam is the spillway height, y0 is the upstream water depth, V0 is the approach velocity, and q is the discharge per unit width. The flow depth y1 was calculated from the conjugate depth y2 using Equation (4).
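A sketch of Equations (1)-(4), assuming the standard specific-energy definitions, a rectangular channel, the Bélanger sequent-depth relation for the hydraulic jump, and $y_0$ taken as the upstream head above the spillway crest:

```latex
\begin{aligned}
E_{max} &= H_{dam} + y_0 + \frac{V_0^2}{2g} && (1)\\
E_1 &= y_1 + \frac{q^2}{2g\,y_1^2} && (2)\\
\frac{\Delta E}{E_{max}} &= \frac{E_{max} - E_1}{E_{max}} && (3)\\
y_1 &= \frac{y_2}{2}\left(\sqrt{1 + 8\,Fr_2^2} - 1\right) && (4)
\end{aligned}
```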
where $Fr_2 = V_2/\sqrt{g y_2}$ is the Froude number after the hydraulic jump, and y2 and V2 are the water depth and velocity after the jump, respectively. In general, the parameters affecting energy dissipation can be written as Equation (5), in which h, l, and N are the step height, step length, and number of steps, respectively. The depths y1 and y2 correspond to the locations shown in Figure 1; however, in Equations (1)-(4) only the measured y2 is used, and y1 is computed from it. In summary, the parameters that most affect the hydraulics of stepped spillways fall into two groups: geometrical parameters and hydraulic parameters.
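The following minimal Python sketch shows how measured quantities could be turned into a relative energy dissipation value. It assumes $E_{max} = H_{dam} + y_0 + V_0^2/2g$ with $y_0$ as the head above the crest, $E_1 = y_1 + q^2/(2 g y_1^2)$, $V_2 = q/y_2$, the Bélanger relation for $y_1$, and SI units. The function names are hypothetical, and this is not the authors' processing code.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2); SI units assumed throughout


def upstream_energy_head(h_dam, y0, v0, g=G):
    """E_max = H_dam + y0 + V0^2 / (2g), with y0 taken as the head above the crest (assumption)."""
    return h_dam + y0 + v0 ** 2 / (2 * g)


def depth_from_conjugate(y2, q, g=G):
    """Belanger relation: y1 = (y2 / 2) * (sqrt(1 + 8 * Fr2^2) - 1), rectangular channel."""
    v2 = q / y2                   # mean velocity after the jump
    fr2 = v2 / math.sqrt(g * y2)  # Froude number after the jump
    return 0.5 * y2 * (math.sqrt(1 + 8 * fr2 ** 2) - 1)


def downstream_energy_head(y1, q, g=G):
    """E1 = y1 + q^2 / (2 g y1^2)."""
    return y1 + q ** 2 / (2 * g * y1 ** 2)


def relative_energy_dissipation(h_dam, y0, v0, y2, q, g=G):
    """Delta E / E_max = (E_max - E1) / E_max, returned in percent."""
    e_max = upstream_energy_head(h_dam, y0, v0, g)
    y1 = depth_from_conjugate(y2, q, g)
    e1 = downstream_energy_head(y1, q, g)
    return 100 * (e_max - e1) / e_max


# Hypothetical run: 0.5 m spillway, 5 cm upstream head, q = 0.02 m^2/s, y2 = 0.1 m
print(round(relative_energy_dissipation(h_dam=0.5, y0=0.05, v0=0.04, y2=0.1, q=0.02), 1))
```

Computing y1 from y2 instead of measuring it reflects the difficulty, noted in the experimental setup, of measuring the thin, aerated flow depth directly.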
The step length (l), spillway height (Hdam), spillway slope (α), step height (h), and number of steps (N) are the geometrical parameters, while the upstream energy head (Emax), downstream energy head (E1), and discharge per unit width (q) are the hydraulic parameters. The Buckingham π theorem was used to express the relative energy dissipation as Equation (6), which can be rewritten as $\Delta E/E_{max} = f(N, H_{dam}/d_c, \alpha)$, where $d_c = (q^2/g)^{1/3}$ is the critical depth and $\alpha = \tan^{-1}(h/l)$ is the spillway slope in degrees.
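The dimensionless inputs used by the models (N, Hdam/dc, and α) can be computed from raw geometry and discharge as in the sketch below. It assumes $d_c = (q^2/g)^{1/3}$ and $\alpha = \tan^{-1}(h/l)$ in degrees. `model_inputs` is a hypothetical helper, and its argument order is an illustrative choice.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def critical_depth(q, g=G):
    """Critical depth d_c = (q^2 / g)^(1/3) for discharge per unit width q (m^2/s)."""
    return (q ** 2 / g) ** (1 / 3)


def slope_degrees(h, l):
    """Spillway slope alpha = arctan(h / l) in degrees, from step height h and step length l."""
    return math.degrees(math.atan(h / l))


def model_inputs(n_steps, h_dam, q, h, l, g=G):
    """Hypothetical helper returning the model inputs (N, H_dam / d_c, alpha)."""
    return n_steps, h_dam / critical_depth(q, g), slope_degrees(h, l)


# Example: 10 steps of 5 cm (H_dam = 0.5 m), q = 0.03 m^2/s, step length giving a 30 degree slope
print(model_inputs(10, 0.5, 0.03, 0.05, 0.05 * math.sqrt(3)))
```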

Multiple Linear Regression (MLR)
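MLR is a linear method for relating a dependent variable to several independent variables. Its general form is $E = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n$, where E is the dependent variable, x1, x2, …, xn are the independent variables, and a0, a1, …, an are the regression coefficients.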
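The study used XLSTAT to fit its regression equations. The NumPy sketch below illustrates only the equivalent ordinary least-squares fit of $E = a_0 + a_1 x_1 + \dots + a_n x_n$. The synthetic data and coefficients are made up for the check, and the function names are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares coefficients [a0, a1, ..., an] of E = a0 + a1*x1 + ... + an*xn."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # column of ones carries the intercept a0
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coeffs


def predict_mlr(coeffs, X):
    """Evaluate the fitted linear equation for each row of X."""
    X = np.asarray(X, dtype=float)
    return coeffs[0] + X @ coeffs[1:]


# Synthetic check with inputs (N, H_dam/d_c, alpha); the coefficients are made up
rng = np.random.default_rng(0)
X = rng.uniform([4, 5, 10], [75, 60, 50], size=(100, 3))
y = 20 + 0.1 * X[:, 0] + 0.5 * X[:, 1] - 0.4 * X[:, 2]
print(np.round(fit_mlr(X, y), 3))
```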

Multiple Nonlinear Regression (MNLR)
Multiple nonlinear regression (MNLR) relates a dependent variable to several predictors through a nonlinear function. The general equation of the MNLR model is given by Equation (9), where E is the dependent variable and x1, x2, x3, …, xn are the independent variables.
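The exact form of Equation (9) is not reproduced here. The sketch below assumes the power-law form $E = a_0 x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$, which is common in this literature. The form becomes linear after taking logarithms, so ordinary least squares can be used. The function names and synthetic coefficients are hypothetical.

```python
import numpy as np


def fit_power_mnlr(X, y):
    """Fit E = a0 * x1**a1 * ... * xn**an by linear least squares on log-transformed data.

    All inputs and outputs must be strictly positive.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), np.log(X)])
    coeffs, *_ = np.linalg.lstsq(design, np.log(y), rcond=None)
    return np.exp(coeffs[0]), coeffs[1:]  # (a0, exponents a1..an)


def predict_power_mnlr(a0, exponents, X):
    """Evaluate a0 * prod_j x_j**a_j for each row of X."""
    X = np.asarray(X, dtype=float)
    return a0 * np.prod(X ** exponents, axis=1)


# Synthetic check with made-up coefficients
rng = np.random.default_rng(1)
X = rng.uniform([4, 5, 10], [75, 60, 50], size=(80, 3))
y = 3.0 * X[:, 0] ** 0.05 * X[:, 1] ** 0.3 * X[:, 2] ** -0.2
a0, exponents = fit_power_mnlr(X, y)
print(round(a0, 4), np.round(exponents, 4))
```

Fitting in log space minimizes relative rather than absolute errors, so on noisy data it can give slightly different coefficients from a direct nonlinear least-squares fit.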

Random Forest
The random forest algorithm was originally proposed by Breiman [44]. This highly flexible algorithm has been used successfully to solve several complex engineering problems. It grows a large number of trees, each from a different bootstrap (bagging) sample of the original data set, and at every node the split is made using a randomly chosen subset of the input variables. The random forest algorithm is simple, less sensitive during training, and highly precise in prediction [45]. Only two user-defined parameters are required: the number of trees grown (k) and the number of input variables considered at each split (m). A trial-and-error process was used to select them during model development. WEKA 3.9 software was used for the analysis in this study.
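The study used WEKA 3.9. The from-scratch NumPy sketch below only illustrates the ingredients described above: a bootstrap sample per tree, a random subset of m inputs at each split, and the average over k trees. The variance-reduction split rule, the stopping rules (`min_leaf`, `max_depth`), and all names are assumptions, and the sketch is not WEKA's implementation.

```python
import numpy as np


def build_tree(X, y, m, rng, min_leaf=5, max_depth=12, depth=0):
    """Grow a regression tree; each split looks at only m randomly chosen input variables."""
    if len(y) < 2 * min_leaf or depth >= max_depth or np.all(y == y[0]):
        return float(y.mean())  # leaf: mean of the targets reaching it
    best = None  # (sum of squared errors, feature index, threshold)
    for j in rng.choice(X.shape[1], size=m, replace=False):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            n_left = int(left.sum())
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue
            sse = ((y[left] - y[left].mean()) ** 2).sum() + ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:
        return float(y.mean())
    _, j, t = best
    left = X[:, j] <= t
    return (j, t,
            build_tree(X[left], y[left], m, rng, min_leaf, max_depth, depth + 1),
            build_tree(X[~left], y[~left], m, rng, min_leaf, max_depth, depth + 1))


def predict_tree(node, x):
    """Follow the splits down to a leaf value for a single input row x."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, k=100, m=2, seed=0):
    """Grow k trees, each on its own bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    trees = []
    for _ in range(k):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap (bagging) sample
        trees.append(build_tree(X[idx], y[idx], m, rng))
    return trees


def predict_forest(trees, X):
    """Forest prediction = average of the individual tree predictions."""
    X = np.asarray(X, dtype=float)
    return np.array([np.mean([predict_tree(tree, x) for tree in trees]) for x in X])
```

In this sketch, k and m map directly onto the two user-defined parameters named in the text. Tuning them by trial and error amounts to refitting with different values and comparing the testing statistics.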

M5P Model
The M5P tree is a binary regression tree model that predicts continuous numerical values using linear regression functions [46]. To construct the tree, M5P treats the standard deviation of the target values reaching a node as a measure of the error at that node and chooses the split that maximizes the expected reduction of this error. M5P combines a conventional decision tree with linear regression functions at the nodes and generates compact, relatively comprehensible models.
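The sketch below shows the two ideas behind M5-style model trees: choosing a split by standard deviation reduction, $SDR = sd(T) - \sum_i \frac{|T_i|}{|T|} sd(T_i)$, and fitting a linear regression model in each leaf. It is limited to a single split. M5P's tree growth, pruning, and smoothing steps are omitted, and all function names are hypothetical.

```python
import numpy as np


def sdr(y, left_mask):
    """Standard deviation reduction: sd(T) - sum_i |T_i| / |T| * sd(T_i) for a two-way split."""
    y = np.asarray(y, dtype=float)
    parts = (y[left_mask], y[~left_mask])
    return y.std() - sum(len(p) / len(y) * p.std() for p in parts)


def best_split(X, y):
    """(feature, threshold, sdr) of the split with the largest standard deviation reduction."""
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # skip the maximum so both sides are non-empty
            score = sdr(y, X[:, j] <= t)
            if score > best[2]:
                best = (j, t, score)
    return best


def fit_linear(X, y):
    """Least-squares linear model [b0, b1, ..., bn] used at each leaf."""
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def fit_one_split_model_tree(X, y):
    """One SDR-chosen split followed by a linear regression model in each leaf."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    j, t, _ = best_split(X, y)
    left = X[:, j] <= t
    return j, t, fit_linear(X[left], y[left]), fit_linear(X[~left], y[~left])


def predict_model_tree(model, X):
    """Route each row to its leaf and evaluate that leaf's linear model."""
    j, t, c_left, c_right = model
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    return np.where(X[:, j] <= t, design @ c_left, design @ c_right)
```

On a piecewise-linear target, one SDR split plus leaf regressions already reproduces the data exactly. A plain regression tree with constant leaves would need many more splits to do the same.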

Stochastic M5P Model
The stochastic gradient boosting ensemble method was introduced by Friedman [47] for nonlinear problems. It builds an additive regression model by sequentially fitting a base learner, by least squares, to the current pseudo-residuals at each iteration. More generally, the method can optimize an arbitrary differentiable loss function with respect to the base model. Both the prediction accuracy and the speed of model development of gradient boosting were found to improve significantly when randomization is incorporated into the method. Specifically, at each iteration, a subsample of the calibration data is drawn at random (without replacement) from the full calibration data set. This randomly chosen subsample is then used instead of the full data set to fit the base model update for that iteration. The size of the subsample is a key parameter and is typically a fraction of the total calibration data set. This random subsampling also improves robustness against overcapacity (overfitting) of the base model. In this study, M5P is used as the base model.
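Below is a minimal NumPy sketch of Friedman-style stochastic gradient boosting with squared-error loss. Each iteration fits a base model to the residuals of a random subsample drawn without replacement, then adds a shrunken copy of it to the ensemble. To keep the sketch short, a least-squares regression stump stands in for the M5P base model used in the study. The learning rate, subsample fraction, and function names are illustrative assumptions, not the study's settings.

```python
import numpy as np


def fit_stump(X, r):
    """Least-squares regression stump (stand-in for M5P): one split, a constant on each side."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # no usable split: predict the mean residual everywhere
        return 0, np.inf, r.mean(), r.mean()
    return best[1:]


def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


def stochastic_gradient_boosting(X, y, n_iter=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Squared-error gradient boosting with a random subsample (without replacement) per iteration."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    f0 = y.mean()                     # initial constant model
    pred = np.full(len(y), f0)
    n_sub = max(2, int(subsample * len(y)))
    learners = []
    for _ in range(n_iter):
        residuals = y - pred          # negative gradient of the squared-error loss
        idx = rng.choice(len(y), size=n_sub, replace=False)
        stump = fit_stump(X[idx], residuals[idx])   # base model fitted to the subsample only
        pred += learning_rate * predict_stump(stump, X)
        learners.append(stump)
    return f0, learning_rate, learners


def predict_boosted(model, X):
    f0, learning_rate, learners = model
    X = np.asarray(X, dtype=float)
    return f0 + learning_rate * sum(predict_stump(s, X) for s in learners)
```

Replacing `fit_stump` with a model-tree learner would give the structure the text describes. The subsample fraction plays the role of the "primary parameter" mentioned above.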

Performance Assessment Parameters
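For model performance assessment, four statistical indexes were used: the coefficient of correlation (CC), root mean square error (RMSE), mean absolute error (MAE), and scatter index (SI). They are calculated as follows:
$CC = \dfrac{n\sum OP - \sum O \sum P}{\sqrt{\left(n\sum O^2 - (\sum O)^2\right)\left(n\sum P^2 - (\sum P)^2\right)}}$
$RMSE = \sqrt{\dfrac{1}{n}\sum (O - P)^2}$
$MAE = \dfrac{1}{n}\sum |O - P|$
$SI = \dfrac{RMSE}{\bar{O}}$
where O = observed values, P = predicted values, $\bar{O}$ = mean of observed values, and n = number of observations.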
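The four indexes can be computed in a few lines of NumPy. This sketch assumes CC is the Pearson correlation between observed and predicted values and SI is RMSE divided by the mean observed value. The function name is hypothetical.

```python
import numpy as np


def performance(observed, predicted):
    """Return CC, RMSE, MAE and SI (= RMSE / mean of observed values)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    cc = np.corrcoef(o, p)[0, 1]               # Pearson correlation coefficient
    rmse = np.sqrt(np.mean((o - p) ** 2))
    mae = np.mean(np.abs(o - p))
    si = rmse / o.mean()
    return {"CC": cc, "RMSE": rmse, "MAE": mae, "SI": si}


print(performance([1, 2, 3, 4], [1, 2, 3, 5]))
```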

Dataset
A total of 3772 observations of relative energy dissipation were used for model development and validation. The data set was randomly split into two groups: the larger group (70% of the observations) was used as the training data set for model development, and the remaining group was used as the testing data set for model validation. Three independent variables were used as inputs: the number of steps (N), the ratio of spillway height to critical depth (Hdam/dc), and the spillway slope in degrees (α). The relative energy dissipation (%) was the target. A correlation matrix plot of the total data set is shown in Figure 2, and the characteristics of the data used for model development and validation are given in Table 2.
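A random 70/30 split of the 3772 observations can be sketched as follows. The seed and the helper name are hypothetical, since the text states only that the split was random.

```python
import numpy as np


def train_test_split_indices(n, train_fraction=0.7, seed=0):
    """Randomly shuffle the indices 0..n-1 and split them into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return order[:n_train], order[n_train:]


train_idx, test_idx = train_test_split_indices(3772)
print(len(train_idx), len(test_idx))
```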

Results of Linear and Nonlinear Regression Based Models
Several linear and nonlinear regression-based models were also applied in this study. The equations, whose coefficients were derived by the least squares technique from the training data set, are listed in Table 3; XLSTAT software was used to develop them. Four performance statistics, CC, RMSE, MAE, and SI, were used to compare the MLR- and MNLR-based models, and their values for all developed equations are also listed in Table 3. According to Table 3, the model based on Equation (14) performs best among the MLR- and MNLR-based models, with the highest CC (0.9492) and the lowest SI, MAE, and RMSE (0.1218, 5.4797%, and 6.9755%, respectively). Figure 3 shows agreement plots between the observed relative energy dissipation (%) and that predicted by the Equation (14) (logarithmic regression) based model for the training and testing stages. Further, single-factor ANOVA gave an F-value (0.005627) below the F-critical value (3.8454) and a p-value (0.9402) greater than 0.05, indicating that the difference between the values predicted by the Equation (14) based model and the actual values is insignificant.
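The single-factor ANOVA F statistic compares the between-group mean square with the within-group mean square. The sketch below treats the observed and predicted values as two groups. The function name is hypothetical. SciPy's `scipy.stats.f_oneway` returns the same F statistic together with its p-value, and the F-critical value can be obtained from `scipy.stats.f.ppf`.

```python
import numpy as np


def one_way_anova_f(*groups):
    """Single-factor ANOVA F statistic: between-group mean square / within-group mean square."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Observed vs. predicted values as two groups (toy numbers)
print(one_way_anova_f([1, 2, 3], [4, 5, 6]))
```

An F-value far below F-critical, as reported for Equation (14), means the mean of the predictions is statistically indistinguishable from the mean of the observations.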

Results of Soft Computing Based Models
Figures 4-6 show agreement plots between the observed relative energy dissipation (%) of the spillway and that predicted by the RF, M5P, and stochastic M5P based models for the training and testing stages. The values predicted by all three models closely follow the observed values. The RF predictions are clearly less scattered than those of M5P, and the stochastic M5P predictions are better still. The testing results in Table 4 indicate that the stochastic M5P approach performs best among the implemented approaches. The stochastic M5P based model (correlation coefficient 0.9993, RMSE 0.8246%) is slightly more accurate than the RF-based model (correlation coefficient 0.9992, RMSE 0.9093%), while the M5P based model (correlation coefficient 0.9958, RMSE 2.054%) performs worse than both. Comparing the M5P and stochastic M5P based models shows that the latter considerably improves on the traditional M5P, increasing CC by 0.35% and decreasing RMSE by 60% (Table 4).
Figure 4. Agreement plot of observed and predicted relative energy dissipation by the random forest (RF) model using training and testing data sets, respectively.
Brief statistical properties of the observed values and of the values predicted by the regression and soft computing-based models are summarized in Table 5. The statistics of the RF and stochastic M5P predictions are very close to those of the observations. The single-factor ANOVA results in Table 6 show that the differences between the results of the regression and soft computing-based models and the actual values are insignificant. To compare the regression and soft computing-based models, agreement, performance, and relative error plots for the testing data set are shown in Figure 7.
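The 0.35% and 60% figures follow from the testing statistics in Table 4 as relative changes with respect to the M5P values:

```latex
\frac{0.9993 - 0.9958}{0.9958} \times 100 \approx 0.35\%, \qquad
\frac{2.054 - 0.8246}{2.054} \times 100 \approx 59.9\% \approx 60\%
```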
This figure shows that the values predicted by the stochastic M5P based model lie closest to the line of perfect agreement and follow the same pattern as the actual relative energy dissipation (%) of the spillway, with minimum relative error. To assess the outcomes further, box plots of the overall error distribution were drawn (Figure 8). In these plots, negative and positive errors correspond to overestimation and underestimation by the models, respectively. The error-distribution statistics (minimum error, first quartile, median, third quartile, and maximum error) for all applied models are listed in Table 7 and shown in Figure 8. The lower quartile of the stochastic M5P errors (−0.2508) is the smallest in magnitude among the models, and its upper quartile (Q75 = 0.2928) is also better than those of the other applied models. As Figure 8 and Table 7 show, the minimum and maximum errors of the stochastic M5P model are −11.53 and 2.675, respectively, which supports the capability of this model to predict the relative energy dissipation (%) of the spillway. The narrower box of the stochastic M5P model confirms that most of its errors are concentrated around zero.
Taylor diagrams for all applied models are plotted in Figure 9. A Taylor diagram summarizes model performance graphically [48] through three statistics: standard deviation, correlation, and root mean square error. Figure 9 indicates that the stochastic M5P model achieves a higher correlation with the minimum RMSE value, confirming that it performs better than the other applied models.
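The small NumPy sketch below covers the two summaries used above. The first is the five-number error summary behind the box plots. It assumes errors are taken as e = O − P, so that negative values mean overestimation, which matches the sign convention stated in the text. The second is the set of statistics placed on a Taylor diagram, where the centred RMS difference satisfies $E'^2 = \sigma_o^2 + \sigma_p^2 - 2\sigma_o\sigma_p R$. The function names are hypothetical.

```python
import numpy as np


def error_summary(observed, predicted):
    """Five-number summary of errors e = O - P (negative = overestimation, positive = underestimation)."""
    e = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    q1, median, q3 = np.percentile(e, [25, 50, 75])
    return {"min": e.min(), "Q25": q1, "median": median, "Q75": q3, "max": e.max()}


def taylor_statistics(observed, predicted):
    """Standard deviations, correlation and centred RMS difference plotted on a Taylor diagram."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    sd_o, sd_p = o.std(), p.std()
    r = np.corrcoef(o, p)[0, 1]
    crmsd = np.sqrt(np.mean(((p - p.mean()) - (o - o.mean())) ** 2))
    return sd_o, sd_p, r, crmsd


print(error_summary([1, 2, 3, 4, 5], [1, 1, 1, 1, 1]))
```

Because the three Taylor statistics are linked by the law-of-cosines identity, a single point on the diagram shows a model's standard deviation, correlation, and centred RMS error together.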

Sensitivity Investigation
A sensitivity investigation was performed to find the most important input variable for predicting the relative energy dissipation (%) of the spillway (Table 8). The stochastic M5P based model was used for this investigation because it produced the best results. As Table 8 shows, four input combinations were considered, three of which omitted one input variable. The results show that the spillway slope in degrees (α) is the most influential input variable in predicting the relative energy dissipation (%) of the spillway, followed by Hdam/dc, with respect to the CC and RMSE statistics. To further validate the best model, the stochastic M5P model was also applied to a data set generated from the original equation of Chanson [29]. This empirical formula for the energy dissipation of stepped spillways in skimming flow is given as Equation (21), where f is the friction factor and α is the spillway slope. Figure 11 shows scatter plots of the stochastic M5P predictions against the data from Equation (21). The agreement between them indicates that the stochastic M5P model generalizes well to the prediction of energy dissipation on stepped spillways in skimming flow.
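Equation (21) is not reproduced here. The sketch below assumes the commonly cited form of Chanson's relation for skimming flow that has reached uniform conditions on the chute, with $H_{max} = H_{dam} + 1.5\,d_c$: $\frac{\Delta H}{H_{max}} = 1 - \frac{(f/8\sin\alpha)^{1/3}\cos\alpha + \frac{1}{2}(f/8\sin\alpha)^{-2/3}}{3/2 + H_{dam}/d_c}$. The friction factor used in the example is hypothetical. The test re-derives the residual head from the uniform-flow depth as a consistency check.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def chanson_relative_loss(f, alpha_deg, h_dam_over_dc):
    """Delta H / H_max for uniform skimming flow, with H_max = H_dam + 1.5 d_c (assumed form)."""
    a = math.radians(alpha_deg)
    k = f / (8 * math.sin(a))               # (d / d_c)^3 for the uniform flow depth d
    residual = k ** (1 / 3) * math.cos(a) + 0.5 * k ** (-2 / 3)  # residual head / d_c
    return 1 - residual / (1.5 + h_dam_over_dc)


# Illustrative values only: f = 0.2 is a hypothetical friction factor, H_dam/d_c = 20
for alpha in (10, 20, 30, 40, 50):
    print(alpha, round(chanson_relative_loss(0.2, alpha, 20.0), 3))
```

Generating a synthetic data set from such a formula and scoring a trained model on it is one way to check whether the model has learned physically consistent trends rather than artifacts of the laboratory data.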

Conclusions
This paper investigates the potential of various regression-based equations and RF, M5P, and stochastic M5P based models for predicting the relative energy dissipation (%) of cascade spillways. The three input parameters were the spillway slope (α), the number of steps (N), and the ratio of spillway height to critical depth (Hdam/dc), and the output parameter was the relative energy dissipation of the stepped spillway (ΔE/Emax). Experimental data were used to evaluate the accuracy of the models. The comparison of performance evaluation parameters showed that the stochastic M5P based model (CC = 0.9993, RMSE = 0.8246%) works better than the other regression and soft computing-based models. Another major conclusion was that the RF-based model works better than the M5P and the equation-based models. The stochastic approach significantly improved the performance of the traditional M5P algorithm, reducing RMSE by 60%. The sensitivity investigation shows that the spillway slope in degrees (α) is the most influential input variable for predicting the relative energy dissipation (%) of the spillway.
The study shows that a more reliable energy dissipation predictor can be built from the stochastic M5P model and experimental data. Such models can improve design efficiency by saving time and reducing operational costs. The main limitations of the proposed model are its black box structure and its strong dependence on the data used. Unless 3D modeling methods are available that perform as well as or better than the black box models, the M5P-based approach can be recommended. In the present study, the data-driven methods were compared only with linear and nonlinear regression methods. In future work, they may be compared with 3D modeling methods, and their accuracy assessed in light of their respective advantages and disadvantages.
Author Contributions: M.N. analyzed the experimental results and wrote the paper; P.S. applied the predictor models and analyzed their results; F.S. carried out the experiments and contributed to the analysis of the results; O.K. reviewed and revised the paper and contributed to the analysis of the models' results. All authors have read and agreed to the published version of the manuscript.