Flow Discharge Prediction Study Using a CFD-Based Numerical Model and Gene Expression Programming

Spillways allow floods to be discharged safely downstream of a dam, and poor spillway design is strongly correlated with dam failure. To address this concern, the present study investigates the flow over the Nazloo ogee spillway using a three-dimensional CFD numerical model and an artificial intelligence method called Gene Expression Programming (GEP). In a physical model, discharge and flow depths were obtained for 21 different total heads. Among different turbulence models, the RNG turbulence model achieved the best agreement in the computational fluid dynamics simulation. In addition, GEP was used to estimate the discharge Q, with 70% of the collected data dedicated to training and 30% to testing. R², RMSE, and MAE were used as performance criteria, and a new mathematical equation for predicting discharge was obtained with this model. Finally, the numerical model and GEP outputs were compared with the experimental data. According to the results, both the numerical model and GEP corresponded closely to the experimental data in simulating flow over an ogee-crested spillway.


Introduction
Flood control, reliable water supply, navigation, recreation, and hydroelectric power generation are the most important reasons for dam development around the world [1][2][3][4]. Spillways are hydraulic structures built on dams to convey excess flood flow beyond the dam's capacity. To avoid serious damage, spillways must be strong, reliable, and highly efficient structures. Accordingly, their design and construction play an important role. Flood risks can be prevented if a dam spillway is properly designed and constructed [5][6][7]. Given their excellent hydraulic features, ogee-crested spillways are among the most investigated hydraulic structures. The U.S. Bureau of Reclamation (USBR, 1987) performed detailed laboratory experiments to investigate the behavior of water flow over a spillway, which resulted in the development and publication of spillway design manuals [8]. Physical studies have a number of drawbacks, including high construction costs, the considerable time required for development and testing, and errors in the results due to scaling effects [9][10][11]. Growth in available computing capacity and improvements in algorithms have helped in finding solutions for a variety of problems, such as flow over a spillway. Numerical models provide a tool for the fundamental design of spillways and for identifying operational concerns at a lower cost and in a shorter time [12][13][14]. Savage and Johnson investigated both pressure and discharge over an ogee spillway in physical and numerical models, and the numerical model performed satisfactorily [15]. Kim and Park investigated the pressure and velocity distribution over an ogee spillway using the RNG k-ε turbulence model in a CFD model, taking into account the effects of surface roughness [16]. Peltier et al. modeled an ogee spillway to estimate velocity and pressure for heads much higher than the design head and validated the results against experimental data [17].
Nowadays, numerous researchers have utilized the k-ε turbulence model in their studies, including Jahad et al. [18], Aydin et al. [19], and Wan et al. [20]. Moreover, the RNG k-ε model has been applied by Ghanbari and Heidarnejad [21], Bayon et al. [22], and Valero et al. [23], with remarkable accuracy in simulating flow over overflow spillways. Artificial intelligence (AI) is a field of computer science that focuses on developing machines that can engage in intelligent behaviors. In recent years, machine learning techniques, e.g., gene expression programming (GEP), the adaptive neuro-fuzzy inference system (ANFIS), and artificial neural networks (ANNs), have been used in a wide range of publications. These methods are effective forecasting tools that have been used in a variety of civil engineering, hydraulic, and hydrological research in the last decade. Ferreira proposed gene expression programming as a new adaptive approach for solving problems [24]. Yildiz et al. investigated discharge and flow depth over an ogee spillway using ANFIS and a CFD model in Flow-3D software. According to their results, there was reasonable agreement between the physical, numerical, and ANFIS models [25]. The GEP technique has been used to represent a variety of water resource system components. Ebtehaj et al. utilized this method to predict the discharge coefficient in rectangular side weirs [26].
Using soft computing techniques, Kisi et al. examined the prediction of lateral outflow over triangular labyrinth side weirs under subcritical conditions [27]. Salmasi derived a new equation for predicting discharge coefficients of an ogee weir using gene expression programming and multiple regression techniques. The results demonstrated that the GEP technique was more successful than the regression equations [28]. Khan et al. employed a GEP method to establish a functional relationship for bridge pier scour. The performance of GEP was compared to that of other AI-based techniques, such as artificial neural networks (ANNs), and to conventional regression-based techniques [29]. Roushangar et al. used GEP and ANN techniques to estimate energy dissipation over stepped spillways, and the findings showed that both GEP and ANN were beneficial and encouraging in these circumstances [30]. Using GEP, Bagatur and Onen created flood routing prediction models [31]. Bertonse et al. used GEP to simulate the concentration of dissolved oxygen (DO) in lakes [32]. New algorithms and models, particularly those based on soft computing, enable researchers to address the most complex systems in a number of ways [33,34]. In different engineering fields, forecasting methods that do not depend on physical equations, including remote sensing methods and AI-based approaches like the GEP method, are becoming more common [35,36]. The GEP has the advantage of being explicit in its formulation, which provides some insight into the nature of the phenomenon under investigation, and it is simple to apply in practice [37]. The GEP is an artificial intelligence technique that utilizes key principles of genetic algorithms (GA) and genetic programming (GP) to create a calculation algorithm for forecasting a certain phenomenon. It mimics biological evolution [38]. Genetic programming is based on evolutionary principles developed for mathematical modeling.
For solving regression and classification problems, GP generates several candidate computer programs. GA obtains the optimal values of some predefined parameters, whereas GP searches for both the best models and the best parameters for a set of variables, following Darwinian evolution theory [39]. Several other applications of such techniques have also been investigated in the literature in recent years [40,41].
A review of the literature indicates a lack of comprehensive research dedicated to the use of CFD and GEP models to estimate discharge over an ogee-crested spillway with a pier. In the present study, the statistical performance of the model of an ogee-crested spillway with a pier was evaluated using error criteria, namely the root-mean-square error (RMSE), which is a significant criterion, the determination coefficient (R²), and the mean absolute error (MAE). A new mathematical equation was also developed for predicting the spillway's discharge. The results of the GEP were then compared to those obtained from the physical model. In addition to this analysis, knowledge of discharge and flow depth can be valuable for hydraulic engineering research as well as for determining the GEP approach's capability of forecasting a new output variable. The CFD model was validated with the experimental and Flow-3D data. The results showed that the outputs of all the models agree closely and are accurate for this class of fluid flow problems.
This paper consists of four sections. Section 2 briefly describes the methods and components used to simulate the ogee-crested spillway, including gene expression programming and the Flow-3D numerical model. Section 3 discusses the findings of the investigation, covering the comparison of numerical and experimental data and the resulting mathematical equations. Finally, Section 4 summarizes the paper's main findings.

Materials and Methods
The shape of the nappe profile over an ogee crest is determined by the head, the inclination of the upstream face of the overflow section, and the height of the overflow section above the entrance channel's floor (the last of which influences the velocity of approach to the crest). The general discharge equation for an ogee-crested spillway is as follows:

$$Q = C_o L H_o^{3/2}$$

where $Q$ is the design discharge (m³/s), $C_o$ is the variable coefficient of discharge for the free-flow condition (m^0.5/s), $L$ is the effective length of the crest (m), and $H_o$ is the design head on the crest (m), including the velocity head of approach, $h_a$ (m). Figure 1 shows the ogee profile. In this study, a physical model of a standard ogee spillway was developed at the Water Research Institute of the Ministry of Energy, Iran. The Nazloo dam spillway is under construction on the Nazloo river, on the west side of Lake Urmia, with a width of 42 m, a crest height of 6 m, and a crest level of 1492 m above sea level. The hydraulic performance of the Nazloo spillway was studied using a physical model with a scale of 1:40 made of Plexiglas. Crest piers cause the flow to contract, reduce the effective length of the crest, and decrease the discharge compared to an uncontrolled crest. The shape and location of the pier nose (shown in Figure 3), the thickness of the pier, the head relative to the design head, and the approach velocity all influence the pier contraction coefficient, $K_p$ [42].
The effective length of the crest is given by:

$$L_e = L - 2\,(N K_p + K_a)\,H_o$$

where $L_e$ is the effective length of the crest for calculating the discharge, $L$ is the net clear length of the spillway crest, $N$ is the number of piers, $K_p$ is the pier contraction coefficient, $K_a$ is the abutment contraction coefficient, and $H_o$ is the total design head on the crest including the velocity head. For round-nosed piers, $K_p$ and $K_a$ are set to 0.01 and 0.05, respectively. Dimensional analysis is a fundamental tool used in experimental research to determine dimensionless quantities. For this purpose, the quantities that affect the ogee spillway discharge should be identified first. Then, Buckingham's π theorem should be applied to obtain the dimensionless parameters, which can be used to analyze their impact on the spillway's discharge and to identify the logical correlation between them [21]. The geometric parameters and flow characteristics can be used to calculate the discharge capacity of an ogee spillway. The quantities influencing ogee crest discharge include the total upstream head $H_e$ and the design head $H_o$; the upstream height of the ogee crest, $P$; the dynamic viscosity $\mu$ and density $\rho$; and the surface tension $\sigma$. Because H > 30 mm in all experiments, the possible effects of surface tension on discharge were small, and the Weber number was excluded from the analysis. Furthermore, since the flow was turbulent, the effect of viscosity was minor compared to that of gravity. As a result, the Reynolds number was also removed from the analysis. In this study, the ultimate relationship expresses the ogee spillway discharge in terms of the dimensionless parameters $H_e/H_o$, $L_e/H_e$ and $H_e/P$. The experiments were performed with 21 total heads over the spillway, ranging from
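The two relations discussed in this section can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard USBR forms $L_e = L - 2(N K_p + K_a) H_o$ and $Q = C_o L_e H_o^{3/2}$; the function names and all numerical inputs are hypothetical and are not taken from the Nazloo model.

```python
def effective_crest_length(net_length, n_piers, k_p, k_a, head):
    """Effective crest length, L_e = L - 2 (N K_p + K_a) H_o (assumed USBR form)."""
    return net_length - 2.0 * (n_piers * k_p + k_a) * head


def ogee_discharge(c_o, crest_length, head):
    """Free-flow discharge over an ogee crest, Q = C_o L H_o^(3/2)."""
    return c_o * crest_length * head ** 1.5


# Illustrative inputs only (hypothetical, not the study's values).
L_e = effective_crest_length(net_length=1.0, n_piers=1, k_p=0.01, k_a=0.05, head=0.2)
Q = ogee_discharge(c_o=2.0, crest_length=L_e, head=0.2)
print(f"L_e = {L_e:.3f} m, Q = {Q:.4f} m^3/s")
```

The contraction term grows with head, so the effective length shrinks as the head over the crest rises.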

Flow-3D Numerical Modeling
FLOW-3D software is a useful tool for analyzing complex fluid problems such as transient, three-dimensional, free-surface flows with complex geometries. The software uses the finite volume method on regular rectangular grids. Two-equation models are commonly utilized to simulate turbulence in hydraulic problems. In this study, the Renormalization Group (RNG) turbulence model was used to close the time-averaged Reynolds equations, and the transient governing equations were solved numerically with the finite volume method. The geometry was represented within the finite volume grid using the Fractional Area-Volume Obstacle Representation (FAVOR) algorithm, as shown in Figure 4. Obstacles inside the computational cells were treated as a fractional value between 0 and 1. Therefore, if the entire cell was filled with obstacles, the fractional area-volume value was equal to 1. The volume-of-fluid (VOF) algorithm was used to determine the flow's free surface [43].
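The idea of storing an obstacle as a per-cell fraction between 0 and 1 can be illustrated with a deliberately simplified one-dimensional sketch. It is not FLOW-3D's FAVOR implementation; the function name, the single column of cells, and the flat solid surface are assumptions made purely for illustration.

```python
def obstacle_fraction(cell_bottom, cell_top, solid_elevation):
    """Fraction of a cell's height lying below a solid surface.

    Returns 0.0 for a fully open cell and 1.0 for a fully blocked cell,
    following the convention described in the text.
    """
    blocked = (solid_elevation - cell_bottom) / (cell_top - cell_bottom)
    return min(1.0, max(0.0, blocked))


# One vertical column of three 0.1 m cells cut by a solid surface at z = 0.15 m.
column = [(0.0, 0.1), (0.1, 0.2), (0.2, 0.3)]
fractions = [obstacle_fraction(z_bot, z_top, 0.15) for z_bot, z_top in column]
print(fractions)  # lowest cell blocked, middle cell partly blocked, top cell open
```

A real implementation works with areas and volumes in three dimensions, but the clamped ratio captures why partially cut cells carry intermediate values.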
For the considered field, the FLOW-3D numerical model provided a three-dimensional structured grid made up of cuboid cells, as shown in Figure 5. Accordingly, a three-dimensional model was created in AutoCAD software, based on the laboratory models' specifications, to build the geometry of the models as a stereolithography (STL) file. The data was then loaded into FLOW-3D, where the grid was generated using VOF and FAVOR, and the boundaries and computational network were determined. The most critical aspect of the boundary conditions is to create a flow situation similar to the physical one. Each type of boundary condition in the Flow-3D software can be applied to the particular conditions of a model. In this study, a specified pressure was applied at the X-minimum boundary. An outflow condition was chosen for the X-maximum boundary since there was nothing to calculate at the flume's end. Both Y-min and Y-max were set as symmetry, with the bottom (Z-min) as a wall boundary condition and the top (Z-max) as a symmetry boundary condition. The mesh sensitivity evaluation follows. To identify and ensure the representation of all the phenomena involved, a computational mesh sensitivity analysis must be carried out. The mesh sizes were examined from the largest to the smallest in a trial-and-error approach, and the model's precision was improved by fine-tuning the mesh size. A mesh size of 0.007 was used for these simulations. Determining the correct mesh plays an indispensable role in any numerical simulation. It is important to minimize the number of cells while retaining sufficient resolution, since the mesh and cell size affect simulation time.
The mesh and cell size were tested in a sensitivity analysis, and the 0.007 cell size performed best in terms of results and time spent solving the equations. A further decrease in cell size does not affect the accuracy of the results for the Q tests. Several cell sizes were examined in this study (e.g., 0.06, 0.02, 0.01 and 0.006), and 0.007 was retained for the flow tests, as shown in Figure 6. After simulating several different models, the simulation time was set to 40 s, which is sufficient to obtain a stable result. SI units were used for length, and degrees Celsius was chosen as the temperature unit. The fluid database was used to select water with a temperature of 20 °C and a viscosity of 10⁻³ Pa·s in the model. A review of previous research indicated that a renormalization group (RNG) turbulence model was appropriate for the numerical model. A no-slip condition was applied over the ogee's surface. No specific material characteristics were defined for the ogee spillway, and the roughness was 0.002.

Estimation of Uncertainty in a CFD Application
Computational Fluid Dynamics (CFD) analysis provided a more detailed evaluation of flow characteristics than an experiment. The complexity of the phenomenon makes computer modeling difficult, and its reliability must be evaluated [44]. In this study, error estimates such as the approximate relative error ($e_a^{21}$), the extrapolated relative error ($e_{ext}^{21}$) and the fine-grid convergence index ($GCI_{fine}^{21}$) were used to evaluate the CFD model.
The calculation procedure for the three selected grids is shown in Table 1, where $Q$ represents the discharge and $r$ indicates the grid refinement factor, which is greater than 1.3 [45].
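The three error estimates named above can be computed from solutions on three grids. The sketch below follows a widely used grid-convergence procedure and assumes a constant refinement factor between grids; whether the study used exactly this form is an assumption, and the function name and input values are hypothetical (synthetic data with a known second-order trend), not values from Table 1.

```python
import math


def grid_convergence(q_fine, q_medium, q_coarse, r):
    """Error estimates from three grid solutions with a constant refinement factor r."""
    eps21 = q_medium - q_fine
    eps32 = q_coarse - q_medium
    # Apparent order of accuracy for constant r.
    p = abs(math.log(abs(eps32 / eps21))) / math.log(r)
    # Richardson-extrapolated value.
    q_ext = (r ** p * q_fine - q_medium) / (r ** p - 1.0)
    e_a = abs((q_fine - q_medium) / q_fine)      # approximate relative error
    e_ext = abs((q_ext - q_fine) / q_ext)        # extrapolated relative error
    gci_fine = 1.25 * e_a / (r ** p - 1.0)       # fine-grid convergence index
    return {"p": p, "q_ext": q_ext, "e_a": e_a, "e_ext": e_ext, "gci_fine": gci_fine}


# Synthetic example: q = 10 + 0.1 * h**2 sampled at h = 1, 2, 4 (so r = 2).
result = grid_convergence(q_fine=10.1, q_medium=10.4, q_coarse=11.6, r=2.0)
for name, value in result.items():
    print(f"{name} = {value:.6f}")
```

With the synthetic data the procedure recovers the known order of 2 and the exact value of 10, which is a quick sanity check before applying it to simulation output.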

Gene Expression Programming Approaches
The gene expression programming method, which is based on genetic programming and genetic algorithms, was created by Ferreira. GEP is a heuristic search and optimization technique that mimics biological evolution to create computer programs that can forecast a certain event [24].
One of the GEP's advantages is that it is multigenic, allowing for the creation of increasingly complex programs with many subprograms. It inherited fixed-length linear chromosomes from genetic algorithms and expressive parse trees of various sizes and shapes from genetic programming. The GA is a probabilistic search approach that is based on evolution in nature. The GA's general procedure is as follows [46]:
Step 1: a population is created by randomly selecting n chromosomes (potential problem solutions).
Step 4: in the new generation, the parents of the created population are replaced.
Step 5: the algorithm terminates and the existing population shows the desired response if favorable circumstances, such as the desired accuracy or the number of iterations stated in the problem, are reached.
Step 6: if the algorithm does not end in Step 5, Steps 2-6 are repeated until the desired results are reached.
Chromosomes and Expression Trees (ETs) are the two primary components of GEP. A chromosome may contain one or more genes, each representing a mathematical expression. A gene's mathematical code can be expressed in two languages: the language of genes and the language of ETs. The GEP genes are divided into two sections: the head and the tail. The head contains the mathematical operators, variables, and constants (+, −, *, /, ln, exp, 1, a, b, c) used to represent a mathematical expression. The tail contains only terminal symbols, that is, variables and constants (1, a, b, c). The tail supplies additional symbols when the terminal symbols in the head are insufficient to complete a mathematical expression. The expression tree is translated by starting at the top line of the tree and reading left to right, top to bottom. This method's gene sequences are similar to biological gene sequences [31].
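The head and tail structure and the left-to-right, top-to-bottom reading order can be made concrete with a small sketch. It decodes one gene into an expression tree breadth-first and evaluates it. The gene, the symbol set, and the helper names are illustrative assumptions, not the study's GeneXproTools configuration; the tail length follows Ferreira's rule t = h(n − 1) + 1, where h is the head length and n the largest operator arity.

```python
import math
from collections import deque

# Number of arguments taken by each function symbol; terminals take none.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "ln": 1, "exp": 1}


def decode(gene):
    """Build an expression tree (nested lists) by filling it level by level."""
    symbols = iter(gene)
    root = [next(symbols)]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [next(symbols)]
            node.append(child)
            queue.append(child)
    return root  # symbols left unread form the non-coding part of the gene


def evaluate(node, variables):
    """Evaluate a decoded tree for the given variable values."""
    symbol = node[0]
    args = [evaluate(child, variables) for child in node[1:]]
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "/":
        return args[0] / args[1]
    if symbol == "ln":
        return math.log(args[0])
    if symbol == "exp":
        return math.exp(args[0])
    if symbol in variables:
        return variables[symbol]
    return float(symbol)  # numeric constant such as "1"


# Head of length 3 ("+", "*", "a") followed by a tail of length 3*(2-1)+1 = 4.
gene = ["+", "*", "a", "b", "c", "1", "a"]
tree = decode(gene)
value = evaluate(tree, {"a": 2, "b": 3, "c": 4})
print(tree, value)  # the tree encodes (b * c) + a
```

Because the tail is long enough to supply arguments for any head, every gene decodes to a valid expression, which is the property that lets genetic operators act freely on the linear string.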
Three input parameters were used in the GEP model (flow head ratio, crest height, crest length). Table 2 shows the parameters of the GEP models. The fitness function used in this study was the root mean square error (RMSE) of the training set. The soft computing software package GeneXproTools (Version 5.0.3926) was applied to generate GEP-based discharge prediction models. The program was run for a number of generations and stopped once the fitness function value no longer changed. In the GEP model, five options were considered using the various operators listed in Table 3, together with the output results.

Performance Criteria
A number of evaluation indicators can be used to assess the created models' forecasting performance using statistical metrics of the goodness of fit [47]. In this study, performance measures such as the root mean square error (RMSE), the determination coefficient (R²) and the mean absolute error (MAE) were used to evaluate the GEP model:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Q_{o,i} - Q_{p,i}\right)^2}$$

$$R^2 = \frac{\left[\sum_{i=1}^{N}\left(Q_{o,i} - \bar{Q}_o\right)\left(Q_{p,i} - \bar{Q}_p\right)\right]^2}{\sum_{i=1}^{N}\left(Q_{o,i} - \bar{Q}_o\right)^2 \sum_{i=1}^{N}\left(Q_{p,i} - \bar{Q}_p\right)^2}$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|Q_{o,i} - Q_{p,i}\right|$$

where $N$ is the number of observations, $Q_o$ indicates the observed data, $Q_p$ indicates the predicted data, $\bar{Q}_o$ refers to the mean of the observed data, and $\bar{Q}_p$ refers to the mean of the predicted data. The RMSE represents the magnitude of the errors and varies from 0 to ∞, with lower values indicating better model performance. The simulation accuracy of the model is described by R², which ranges from 0 to 1. The MAE is the average vertical distance between each point and the y = x line, commonly known as the one-to-one line.
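The three criteria are straightforward to compute. The following sketch assumes the usual definitions, with R² taken as the squared correlation between observed and predicted values (consistent with the means of both series appearing in the definitions); the function names and sample data are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Squared correlation coefficient between observed and predicted values."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical data: every prediction is offset from the observation by 0.5.
q_obs = [1.0, 2.0, 3.0, 4.0]
q_pred = [1.5, 2.5, 3.5, 4.5]
print(rmse(q_obs, q_pred), mae(q_obs, q_pred), r_squared(q_obs, q_pred))
```

The example shows why the criteria are reported together: a constant bias leaves R² at 1 while RMSE and MAE still register the error.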

Discussion
The numerical model had to be calibrated with experimental data in the first stage. The experimental investigation conducted at the Ministry of Energy's Water Research Institute provided discharge and flow depth values, which were compared with those obtained from the Flow-3D numerical model and the GEP models.
To extract exact values from the data of a physical or numerical model, it is crucial to reach steady-state conditions. In this research, after running numerous different models with the existing numerical model, the acceptable time to extract the results was determined to be 40 s. No separation occurs between the flow and the crest bottom, which indicates that the solid boundary condition is simulated correctly.
Baffles are two-dimensional surfaces that come in a variety of shapes, including planes, cylinders, cones, and spheres. They come in both porous and non-porous varieties, and they may be used to quantify mass flows, heat streams, and applied forces. The transit flow is acquired directly at the section where the baffle is positioned and depends on the number of elements in the computational mesh [48]. The flow rate was measured using a baffle, and the X-minimum boundary condition was used with a specified pressure. The design head was set to 0.19 m, and the design discharge coefficient and discharge were calculated at 2.15 and 0.0175 m³/s, respectively, based on the scale of 1:40. For all data, the physical and numerical models are in good agreement. To make a more objective comparison between models, the results are non-dimensionalized. Grid size and boundary conditions are critical when verifying numerical simulation results, and it should be remembered that a wrong boundary condition produces a completely different result.
Darwin's theory of evolution is the foundation for gene expression programming [49]. This technique can automatically select the input variables that have the greatest impact on the model. The GEP model formulations were used to extract the most effective indicators. Ferreira [49] provided more information on the technical formulation of the GEP approach.
The GEP is one of the more recent evolutionary algorithm approaches, and it is becoming more prominent due to its high accuracy [50]. The main advantage of gene expression programming is that it can express the link between variables explicitly. It should be noted that each run of the GEP method produces a different formula. Accordingly, each model was run multiple times with various GEP parameter settings, and the best models (those with the lowest error) were chosen. 70% and 30% of the data were used to train and test the discharge estimation models, respectively. The GEP approach demonstrated that the predicted discharge and experimental discharge were in good agreement. The head over the spillway crest was $H_o = h + V_a^2/(2g)$, in which $V_a$ was the approach velocity. The crest height and crest length were utilized as inputs, while discharge was used as the output.
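The total head relation $H_o = h + V_a^2/(2g)$ can be evaluated directly. The sketch below is a minimal illustration; the function name and input values are hypothetical, and g = 9.81 m/s² is an assumed constant.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def total_head(depth_over_crest, approach_velocity, g=G):
    """Total head over the crest: static head plus approach velocity head."""
    return depth_over_crest + approach_velocity ** 2 / (2.0 * g)


# Hypothetical model-scale values: h = 0.18 m, V_a = 0.5 m/s.
h_total = total_head(depth_over_crest=0.18, approach_velocity=0.5)
print(f"H_o = {h_total:.4f} m")
```

At low approach velocities the velocity head is a small correction to the measured depth, but it grows with the square of the velocity.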
In the GEP model, five options were explored utilizing the different operators listed in Table 3, together with the final results. The four basic arithmetic operators (+, −, *, /) were used, as well as several basic mathematical functions (x², exp, ln, cube root, atan, tanh, min, max). The GEP's mathematical equation for option 5 is given in Equation (9). The parameter d in this equation represents the input parameters, while the parameter c denotes constant values determined by GEP. With R² = 0.972, RMSE = 0.85, and MAE = 0.64, option 5 was the best alternative. Before option 5 was chosen as the best, a variety of runs were carried out, some of which are listed in Table 4. The different equations obtained from these runs, based on changing the test and train stages, are also shown. Moreover, the scatter plot in Figure 9 shows that the training and testing data points are close to the y = x line, indicating that GEP can properly predict relative discharge values. The GEP model's output is typically displayed as mathematical equations and expression trees (Figure 10). The following is the optimal model obtained through GEP modeling for discharge prediction:
2 )))))
Y = Y + atan(((((1.0/d(0))) + (1.0/d(0))))/2.0) + ((1.0/(d(0))) − d(0))))
Y = Y + ((atan((atan(G3C6) − d(2))) + tanh( d(0) d (2) )))/2.0)
where d(0), d(1) and d(2) correspond to the model input variables and G1C1 and G3C6 are constant values. The numerical constants are shown in Table 5, and the equation derived from the expression trees is shown in Figure 10. The GEP prediction approach followed here is described in the Materials and Methods section.

Conclusions
An ogee spillway is an essential hydraulic structure that can be developed on a variety of sites where concurrent flooding is a concern. Numerous studies based on numerical and analytical relationships have been undertaken to evaluate the discharge values in ogee spillways. The analysis of the behavior and hydraulic parameters of flow over a spillway dam is a difficult and time-consuming task. Flow-3D was utilized for modeling 21 different water levels. In the computational fluid dynamics modeling, the RNG turbulence model showed the highest level of compatibility. Artificial intelligence approaches were also applied to estimate the flow over the spillway. The input (independent) variables in the proposed GEP technique were $H_e/H_o$, $L_e/H_e$ and $H_e/P$, whereas the output (dependent) variable was the discharge of the ogee spillway. R², RMSE and MAE were employed as evaluation indicators. These tools help a designer establish the best conditions for a hydraulic structure in which the flow pattern is critical. The modeling findings suggest that CFD methods and GEP models are useful tools for simulating the flow pattern over ogee spillways. They can be utilized to estimate Q without the need for complicated and time-consuming laboratory techniques. The proposed method is simple to use and accurate enough for practical usage.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

GEP: Gene Expression Programming
RMSE: Root Mean Square Error
R²: Coefficient of Determination
MAE: Mean Absolute Error

Appendix A
The mathematical equations from the GEP model for estimating Q are shown in Table A1, and the performance results of the GEP model for option 5 are described in Table 4. In these equations, d 1 , d 2 and d 3 correspond to the model input variables, and the numbers related to the coefficients are also included in the equations.
Table A1. Each case's formulation for option 5, described in Table 4.