Determination of Semi-Empirical Models for Mean Wave Overtopping Using an Evolutionary Polynomial Paradigm

The present work employs the so-called Evolutionary Polynomial Regression (EPR) algorithm to build up a formula for the assessment of mean wave overtopping discharge for smooth sea dikes and vertical walls. EPR is a data-mining tool that combines and integrates numerical regression and genetic programming. This technique is here employed to dig into the relationship between the mean discharge and main hydraulic and structural parameters that characterize the problem under study. The parameters are chosen based on the existing and most used semi-empirical formulas for wave overtopping assessment. Besides the structural freeboard or local wave height, the unified models highlight the importance of local water depth and wave period in combination with foreshore slope and dike slope on the overtopping phenomena, which are combined in a unique parameter that is defined either as equivalent or imaginary slope. The obtained models aim to represent a trade-off between accuracy and parsimony. The final formula is simple but can be employed for a preliminary assessment of overtopping rates, covering the full range of dike slopes, from mild to vertical walls, and of water depths from the shoreline to deep water, including structures with emergent toes.


Introduction
Wave overtopping occurs when sea waves run up coastal defenses, reach their crest, and flow over it. Coastal structures are usually built high enough to prevent significant flows that might damage infrastructures, properties, and assets and, even worse, injure people or cause casualties. Overtopping can be characterized in terms of individual discharges and volumes or flow properties, such as velocities and depth, of each wave that will eventually overtop the coastal structures. Volumes associated to extreme individual overtopping events are critical both for structural stability and the people safety, when standing on top or on the rear side of any coastal defense [1,2]. Generally most overtopping waves are rather small, but a few lead to significant volumes and high flow velocities. Individual overtopping volumes are usually analyzed in terms of probability distributions, e.g., [3][4][5]. Characterization of overtopping flow velocities and depths have been carried out in several works, such as [2,[6][7][8]. Wave overtopping is a complex phenomenon, varying both in space and time, and This paper is organized as follows: the state-of-art in mean wave overtopping assessment is summarized in Section 2; the scaling laws in wave overtopping are described in Section 3 together with a dimensional analysis to identify the main dimensionless groups or explanatory variable variables to be used in the analysis; Section 4 describes the experimental dataset that has been used for the analysis; in Section 5, a set of unified formulas for wave overtopping is presented; in Section 6, the new model expressions for mean overtopping assessment are discussed and a final formula is proposed; finally, the main conclusions are reported in Section 7.

Average Wave Overtopping Assessment in Existing Literature
Semi-empirical models are often used to assess average overtopping discharges. These models or formulas are fit on experimental databases. Hence, their range of application is restricted to the range of tested conditions and geometries of the database from which each formula is derived. This section offers a brief review of those semi-empirical formulas that might be applicable to the case of wave overtopping on smooth sea dikes and vertical walls.

Goda (2009)
Goda [23] proposed a set of unified formulas for the prediction of the mean rate of wave overtopping of smooth impermeable sea dikes. Both sloping and vertical structures were analyzed. The author employed 1254 data extracted from the database of the EU-funded CLASH project (Crest Level Assessment of coastal Structures by full scale monitoring, neural network prediction and Hazard analysis on permissible wave overtopping, contract EVK3-2001-00058).. Vertical walls with re-curved wave walls or broad crests, and the data with oblique wave incidence (angle of wave Mean wave overtopping is usually assessed by means of semi-empirical models (i.e., formulas) for a preliminary design of any coastal defence. There are a large variety of formulas depending on the kind of structure and hydraulic boundary conditions. Following EurOtop [19], coastal structures can be categorized as dikes and embankment seawalls, armored rubble slopes and mounds, or vertical and steep walls. The present work will focus on smooth and impermeable sea dikes, as well as vertical and steep walls. Only simple geometries are studied aiming at characterizing the influence on the main structural characteristics, such as slope, freeboard, and toe depth on wave overtopping discharge. Therefore, complex layouts including storm walls, berms, blocks, and revetments are excluded.
For smooth sloping dikes and vertical walls, there are several formulas that differ for the structural layout (whether sea dike or vertical wall) and for different hydraulic boundary conditions (whether deep or intermediate waters, shallow waters, or impulsive conditions, see [19]). For sloping dikes, a set of formulas can be described for dikes with a slope between 1:7 and 1:2. For steeper slopes, up to vertical walls, the influence of the slope angle becomes more sensitive and is considered to fit new set of formulas [12,20,21]. In case of very or extremely shallow water conditions with mild foreshores before a sloping dike, [13,18,22] propose different models, in which overtopping discharge is proved to be function of the dimensionless freeboard, R c /H m0,t , and the surf similarity parameter, which is defined as the ratio between the structural slope and the square root of the wave steepness, H m0,t /L m−1,0,t , in which L m−1,0,t is the wavelength at the toe of the structure calculated using the deep-water formula gT m−1,0,t 2 /2π.
There is actually a gap in overtopping formulas between breaking/non-breaking waves and very shallow water conditions; see [19]. Besides, the distinction between breaking/non-breaking or impulsive/pulsating raises some concerns if we look just at an average value of the overtopping discharge, as already observed by [23], who derived a set of unified formulas for prediction of the mean rate of wave overtopping at coastal structures with smooth, impermeable surfaces. Nevertheless, the work of [23] lacked enough data in very and extremely shallow waters.
The present work aims at identifying possible models for the mean wave overtopping discharge of smooth impermeable coastal structures. The models will be derived by gathering several databases together and fitting the test results by means of a hybrid data-driven technique. The main goals are to (1) identify the main variables that determine the discharge and their correlations, going beyond the most used dimensionless freeboard and surf-similarity parameter, and (2) derive a final and unified model considered as a compromise between model parsimony and accuracy that could be employed for a preliminary assessment of mean overtopping discharge for any kind of smooth coastal defenses.
This paper is organized as follows: the state-of-art in mean wave overtopping assessment is summarized in Section 2; the scaling laws in wave overtopping are described in Section 3 together with a dimensional analysis to identify the main dimensionless groups or explanatory variable variables to be used in the analysis; Section 4 describes the experimental dataset that has been used for the analysis; in Section 5, a set of unified formulas for wave overtopping is presented; in Section 6, the new model expressions for mean overtopping assessment are discussed and a final formula is proposed; finally, the main conclusions are reported in Section 7.

Average Wave Overtopping Assessment in Existing Literature
Semi-empirical models are often used to assess average overtopping discharges. These models or formulas are fit on experimental databases. Hence, their range of application is restricted to the range of tested conditions and geometries of the database from which each formula is derived. This section offers a brief review of those semi-empirical formulas that might be applicable to the case of wave overtopping on smooth sea dikes and vertical walls.

Goda (2009)
Goda [23] proposed a set of unified formulas for the prediction of the mean rate of wave overtopping of smooth impermeable sea dikes. Both sloping and vertical structures were analyzed. The author employed 1254 data extracted from the database of the EU-funded CLASH project (Crest Level Assessment of coastal Structures by full scale monitoring, neural network prediction and Hazard analysis on permissible wave overtopping, contract EVK3-2001-00058).. Vertical walls with re-curved wave walls or broad crests, and the data with oblique wave incidence (angle of wave attack not being 0 • ), were excluded. Goda used also 198 tests carried out at Kansai University, Japan, by [24] that presented dike slopes of 1:3, 1:5, and 1:7 and foreshore slopes equal to 1:10 and 1:30. Since [24] did not provide wave conditions at the toe of the structure, the method proposed in [25] was used to calculate the wave height at the toe. In [23], the author showed that existing exponential formulas have a tendency to overestimate large overtopping rates and underestimate low overtopping rates when calibrated with the extracted dataset. He used the simple well-known exponential formula for wave overtopping: q gH 3 where q is the mean overtopping discharge, R c is the freeboard, H m0,t is the incident spectral wave height at the dike toe, and g is the gravity acceleration. The coefficients A and B are a function of the dike slope, the foreshore slope, and the dimensionless toe depth: with: where θ is the foreshore slope angle with the horizontal, h t is the still water depth at the dike toe, and α is the dike slope angle. The Equations (2)-(5) are valid in the range 0 ≤ cot α ≤ 7. Both A and B coefficients increase up to a constant value if the relative toe depth increase. The datasets employed to fit Equations (2)-(5) are characterized by a dimensionless toe depth h t /H m0,t bigger than 1.0, lacking data for very and extremely shallow water conditions.

Mase et al. (2013)
Mase et al. [22] proposed a set of formulae for wave run-up and overtopping discharge of sea walls in the presence of very shallow foreshores. The work includes cases of structures with an emergent toe (h t < 0). An imaginary slope is defined between the sea bed at the breaking location and the point reached by the maximum run-up height; see Figure 2. The imaginary slope is employed to overcome the difficulties in schematizing complex dike and foreshore layouts. The formulae proposed by [22] are based on deep-water wave characteristics. Overtopping discharge is a function of the freeboard, the run-up, and the deep-water wave height. with where H m0,o and L m−1,0,o are the deep-water wave height and wavelength, respectively, and θ im refers to the imaginary slope as schematized in [22]. The use of deep-water conditions and calculation of wave run-up to assess the average discharge are key aspects of [22] that will be considered later on in this work.  (Mase et al., 2013). For the latter, the area A shadowed in red is employed.

EurOtop (2018)
EurOtop [19] proposes a set of formulas for the average overtopping discharge on sea dikes and embankment walls. In the manual, a distinction is made between gentle and steep dikes (cotα < 2), shallow foreshores, and breaking and non-breaking waves.

Breaking and Non-breaking Wave Conditions
The EurOtop formula for breaking and non-breaking waves and sea dikes with slope cotα≥2  (Mase et al., 2013). For the latter, the area A shadowed in red is employed.

EurOtop (2018)
EurOtop [19] proposes a set of formulas for the average overtopping discharge on sea dikes and embankment walls. In the manual, a distinction is made between gentle and steep dikes (cot α < 2), shallow foreshores, and breaking and non-breaking waves.

Breaking and Non-breaking Wave Conditions
The EurOtop formula for breaking and non-breaking waves and sea dikes with slope cot α≥2 can be expressed as follows: with a maximum of: q The breaker parameter can be expressed as: being s m−1,0 = H m0,t /L m−1,0,t , where L m−1,0,t is the deep-water wavelength calculated based on the spectral period T m−1,0,t . The influence factors presented in [19] are here omitted to take into account the effects of wave obliqueness, berm presence, and roughness, among others.

Very (shallow) Foreshores
The presence of very or extremely shallow foreshores induces heavy wave breaking far from the toe of the dike or seawall. The influence of the foreshore and local water depth at the dike toe are taken into account in [13], who introduced the concept of an equivalent slope in shallow water conditions. A similar formula as in [18] is used, but the surf-similarity parameter is calculated replacing the dike slope with the equivalent slope. The authors suggest using the equivalent slope when the ratio h t /H m0,t is smaller than 1.5; otherwise, the dike slope only should be considered for further calculations. The new equivalent slope concept can be applied also to cases of dry toe (= negative water depth at the toe). The relative toe depth ranges between −0.25 m and 3.65 m. Mean wave overtopping can be assessed using the following equation: The c exponent in Equation (11) is considered normally distributed: the mean value of c is equal to −0.791, and the standard deviation σ is 0.294. The equivalent slope between the point on the foreshore with a depth of 1.5 H m0,t and the run-up level R u2% can be expressed as follows: For the calculation, both structure and foreshore are assumed to have a straight slope without a berm, which are defined by cot α and cot θ, respectively. Therefore, ξ m−1,0 is assessed replacing the dike slope with the value of cot θ eq .

Gallach (2018)
The transition from steep slopes and non-breaking waves to vertical walls was initially studied by [21] for relatively low freeboards. Gallach [12] extended the work of [21] and proposed the following formula based on three coefficients: The range of application of the formula is for slope angles 0 < cot α < 4, the relative crest 0 < R c /H m0 < 2.5, and 0 < H m0,t /h t < 0.55. Influence factors are omitted here.

Volume Flux and Deficit in Freeboard
Ibrahim and Baldock [26] have revisited overtopping semi-empirical formulas by scaling them on the volume flux, V 0 = H m0,o L p /2π, of the incident waves. H m0,o is the offshore wave height and L p is the deep-water wavelength calculated starting from the peak period T p . The authors start from the assumption that scaling parameters for overtopping typically include the local wave height and period (or wavelength), the beach slope, and the crest freeboard. Usually, overtopping discharge is scaled on , which was employed based on the weir discharge equation and assuming critical flow conditions at the crest. However, the authors observed that except for cases of submerged structures, overtopping is usually characterized by subcritical flows for non-breaking wave conditions and supercritical flows in case of broken waves. Besides, the authors demonstrate that to use the deficit in freeboard, 1-R c /Ru 2% , instead of the R c /H m0 reduces substantially the scatter of data around the best fit. The deficit in freeboard has been already used in [22]; see Equation (6). Ibrahim and Baldock [26] conclude that breaker type is not a major factor in the scaling of the overtopping for non-breaking waves; however, this parameter is employed to assess wave run-up for breaking waves or shallow foreshores, having a secondary effect on the overtopping scaling.
Therefore, to apply the method proposed by [26] to scale average discharges is required to know the deep-water wave conditions to obtain the volume flux and to choose an appropriate run-up model to assess R u2% .
Starting from the scaling of the overtopping volume, we can derive the scaling of the mean discharge. The overtopping volume per wave can be defined as qT m−1,0 , in which T m−1,0 is the mean spectral wave period. By replacing L p with L m−1,0 (=gT m−1,0 2 /2π), the scaling for the mean discharge can be finally expressed as follows: In the following analysis, we will employ both deep-water and local wave characteristics at the dike toe, the latter ones being normally employed for overtopping characterization, except in [22].

Dimensional Analysis
Correlation between mean overtopping discharge and hydraulic and structural parameters is investigated in terms of dimensionless variables. Basing on the Buckingham's π theorem, if there are n variables in a problem containing k primary dimensions, the equation relating all the variables will have n-m dimensionless groups. In mathematical terms, it is possible to write: Hence, n = 10 and k = 2, leading to 8 dimensionless parameters. The choice of dimensionless groups is based on recent literature on overtopping: (1) Q* = 4π 2 q/gH m0,o T m−1,0,o , based on [26], see Equation (14): deep-water wave characteristics are employed, after having verified that there is almost no difference (R 2 = 99%) between the volume flux calculated employing the deep-water wave height and period with the one calculated based on local wave conditions at the toe.
(3) In [22] and [26], the overtopping is proved to be a function of the deficit of the freeboard, namely 1-R c /Ru max , rather than the dimensionless freeboard. Notice that we use here the maximum run-up according to [22]. Considering the maximum run-up, the effects of wave periods, foreshore, and dike slope on overtopping discharge are included, as also in [13]. (4) The slope parameter proposed in [27] , is employed to consider the influence of long waves generated by the presence of mild slopes on wave overtopping. (5) Relative toe depth employed as shallowness criteria, h t /H m0,o , is employed as proposed by [27]. (6) The ratio between the local water depth, the dike toe, and the local wave height, h t /H m0,t , is considered as it was proven to have an influence on overtopping discharge [1], see Equations (2)-(3). (7) The combined effect of the foreshore slope and dike slope is taken into account by considering the equivalent slope defined in [13] and expressed as cotθ eq . As an alternative, the imaginary slope described in [22] and expressed as cotθ im will be considered. By means of Evolutionary Polynomial Regression (EPR) application, the influence of the two conceptual slopes on the assessment of mean overtopping discharge will be evaluated and discussed. (8) The ratio h t 2 /H m0,t L m−1,0,t , which is defined as an impulsiveness ratio in [1].
It must be noticed that to define the wave run-up, different formulations can be employed. From one side, Equation (7) based on deep-water wave characteristics can be employed. On the other side, if local wave characteristics are used, we will refer to wave run-up formulas as described in [1], which is defined as follows: with a maximum of For steep and vertical walls (cot α < 2), the run-up will be expressed as: with a maximum of 3.0 and minimum of 1.8. Equations (16)- (18) are also employed to calculate the equivalent slope cotθ eq ; meanwhile, Equation (7) is employed for the imaginary slope calculation, cotθ im . The differences between the two methods are presented in Section 6. The most probable maximum run-up Ru max = (Ru max ) 99% , assuming a Rayleigh distribution, is finally equal to 1.54Ru 2% .

Methodology
The search for a unified formula for wave overtopping assessment for smooth sea dikes and vertical walls, including cases with zero freeboard and emergent toe, is facilitated by employing a data-driven approach, namely Evolutionary Polynomial Regression (EPR), which can be freely downloaded on http://www.idea-rt.com/products-our-job/epr-moga-xl.
EPR belongs to the big family of general-purpose data-driven techniques, whereas artificial neural networks (ANN) and genetic programming (GP) are probably the most well known. Based on our present understanding of the brain and nervous systems, ANN use highly simplified models composed of many processing elements ('neurons') connected by links of variable weights (parameters) to form a black-box representation of the system [28]. These models can deal with a great amount of information, while learning complex model functions from examples, i.e., by 'training' using sets of input and output data.
Genetic programming (GP) is another modeling approach that has recently increased in popularity. It is an evolutionary computing method that generates a 'transparent' and structured representation of the system being studied. The most frequently used GP method is so-called symbolic regression, which was proposed by [29]. This technique builds mathematical expressions to fit a set of data points using the evolutionary process of genetic programming. Davidson et al. ([30,31]) introduced a new regression method for creating polynomial models based on both numerical and symbolic regression. They used GP to find the form of polynomial expressions and least squares optimization to find the values for the constants in the expressions. The incorporation of least squares optimization within symbolic regression was made possible by a rule-based component that algebraically transforms expressions into equivalent forms that are suitable for least squares optimization.
The EPR is based on the combination of numerical and symbolic regression, while eliminating the bulky, and often slow, rule-based component. The method also borrows from the idea of stepwise regression [32], using an evolutionary process based on genetic algorithms [33] rather than following a hill-climbing method of stepwise regression. More details are given in the following section.

The Evolutionary Polynomial Regression Paradigm
Evolutionary Polynomial Regression is a hybrid data-driven technique [34]. EPR is an optimization and modeling tool that combines and integrates numerical regression and genetic programming. The search for mathematical structure of models is performed through a population-based strategy that mimics the evolution of the individuals in nature.
Although the details about the main EPR paradigm are explained in the reference works ( [34,35]), a concise description is reported here. EPR is built following the principle of Occam's razor, searching for the best models as a trade-off of model accuracy and parsimony. One of the general model structures in EPR can be expressed as follows: where m is the number of additive terms, a j are the numerical coefficients to be estimated, X i are the candidate explanatory variables, ES(j,z) is the exponent of the z th input within the j th term in Equation (19), and f is a function selected by the user among a set of possible alternatives (including no function selection). The exponents ES(j,z) are selected from a user-defined set of candidate values (which should include 0).
It is worth noting that when an exponent is equal to zero, the relevant input X i is basically deselected from the resulting equation (i.e., the exponent ES(j,k) = 0 and the relevant input X kj = 1). This is crucial when investigating a physical phenomenon and the most important inputs to characterize it. In fact, during its search in the space of solutions, EPR can deselect a less important input by assigning a null exponent.
The EPR technique exploits a multi-objective search paradigm (EPR-MOGA) [35] in order to develop multiple Pareto optimal models. The multi-objective search in EPR-MOGA is achieved by means of a multi-objective genetic algorithm based on the Pareto dominance criterion. First, the EPR-MOGA search options are defined by the user, such as, for example, candidate model attributes, candidate exponents for attributes, and the maximum number of parameters; based on these choices, EPR returns a set of Pareto-optimal models considering three conflicting objective functions: (1) maximization of model fitness to training data (i.e., model accuracy); (2) minimization of the number of model coefficients; and (3) minimization of the number of model inputs (i.e., both representative of model parsimony).
When the search options are defined by the user, at each generation, EPR-MOGA creates a population of models by simultaneously looking at the definition of the model structure, defining the exponents on the basis, for example, of Equation (19), and successively estimating the coefficients of polynomial terms by solving a linear inverse problem on the input data in order to guarantee a two-way relationship between each model structure and its parameters.
Then, the population of models is evaluated in the above-described Paretian dominance scheme, and only the best solutions pass to the next generation. In this way, the EPR-MOGA evolves within a few hundred generations a set of polynomial models that are optimal from the point of view of accuracy and parsimony. EPR-MOGA uses a MOGA strategy named OPTImized Multi-Objective Genetic Algorithm (OPTIMOGA), whose details are provided by [36]. The returned EPR mathematical structures are pseudo-polynomials, as they are obtained by adding a number of monomial terms, i.e., the argument of sum in Equation (19): they are linear with respect to their coefficients although not necessarily linear in their attributes, due to both exponents differing from one and the possible selection of function f in Equation (19).
Once the Pareto set of optimal models is obtained, the symbolic nature of EPR models allows a proper judgment of the optimal models looking at key aspects other than those encoded as objective functions, as for example: (1) the model structure with respect to the physical insight related to the problem; (2) similarities of mathematical structures of obtained models; (3) recurrent groups of variables in different models; (4) generalization performance of models as assessed in terms of both statistical analysis and mathematical parsimony [36][37][38][39].

Model Performance
The model expressions resulting from EPR analysis are evaluated on the basis of the coefficient of determination (CoD) and the average error (AVG) computed between the predicted and the measured data. CoD is defined as: where N is the number of data, and q predicted,i and q estimated,i are respectively the i-th predicted and measured mean overtopping discharge. The higher the value of the CoD (with a maximum of 1), the better the model performance. The average error is defined as: Besides, following [13,23], the geometric mean (GEO) is introduced and defined as: The scatter of the data is assessed through the geometric standard deviation (GSD) that is calculated as the exponential value of the standard deviation of the logarithm: Considering a quantity normally distributed, 90% of the data will be contained in the range between the mean divided by 1.64 times the GSD and the mean multiplied by 1.64 times the GSD.

Experimental Datasets for Fitting Overtopping Formula
Data from five different datasets have been employed for fitting the formula for mean wave overtopping. All data correspond to experimental results from wave flume facilities; thus, only long-crested waves are considered and the effects of wave obliqueness or directional spreading are neglected. Only simple structural layouts are considered, namely sloping dikes or vertical walls. The seaward face of the structure is smooth and impermeable. Defenses such as armor breakwaters with a permeable core are not considered. No storm walls or berms are considered. The overtopping is measured right after the seaward edge of the dike crest, so that the crest width is assumed to be equal to 0. All experimental tests with overtopping equal to 0 have been also excluded. The resulting data have been extracted from the following datasets: • EurOtop [19] database: the new extended database comprises about 13,500 tests on wave overtopping and extends the original CLASH database [40]. The tests include different dike geometries and kinds of structures, 14% of which represent smooth dikes. By excluding cases with zero overtopping, oblique waves, berms, or storm walls and considering only core data, finally 1128 data have been extracted. • Data described in [13] and corresponding to tests on sea dikes with shallow foreshores. The tests correspond to five different experimental campaigns carried out at Flanders Hydraulics Research, in Antwerp, and at the Department of Civil Engineering of Ghent University, in Ghent (both in Belgium). These experimental campaigns were conducted to study the influence of mild foreshore slopes and very or extremely shallow water conditions on wave overtopping and loading on smooth sea dikes. The geometry of the dikes resembled typical layouts from the Belgian coast. • Data from [41] corresponding to coastal defenses in Japan built in extremely shallow foreshores or on land (emergent toe), namely seawalls with slopes of 1:3, 1:5, and 1:7 with uniform foreshore slope of either 1:10 or 1:30. A Bretschneider-Mitsuyasu spectrum was used to generate random wave trains, differently from other datasets here employed, where a JONSWAP spectrum was employed. • Data from [42] corresponding to the experimental campaign carried out in the multi-directional wave basin, at Flanders Hydraulics Research. A 1:2 sloping dike after a 1:35 foreshore slope was employed. All data correspond to cases with mild foreshore slope (β < 0.62) in very and extremely shallow water conditions (h t /H m0,o < 1). The experimental campaign was conceived to characterize the overtopping discharge of long-and short-crested waves proceeding very obliquely with respect to the dike crest. For the present study, only perpendicular long-crested wave cases are considered. In total, 1679 data have been gathered. All the data are meant to cover a wide range of water depths, structural slopes, and freeboards. Datasets from [13,41] include cases with an emergent toe (h t ≤ 0), for which it is important to provide some details on the methodology followed to measure local wave characteristics. For emergent toes and more in general for extremely shallow water cases, the waves break heavily before reaching the dike. These broken waves present an unusual shape that is similar to a bore rather than to an oscillatory phenomenon around the mean water level. Their oscillatory nature can be no longer distinguished, yet the spectral momentum and, therefore, the wave height and period can be defined. The accuracy of this choice is demonstrated after showing the overtopping results in [13]. For some of the data employed for the present work, the information at the dike toe was lacking. In some cases, specifically in those from [13], the SWASH model (http://swash.sourceforge.net/) was employed to get the incident wave height and period at the toe. For other cases from [41], the method developed by [25] was applied. This method allows calculating the wave height at the toe even when the water depth is 0. Citing [25]: "A finite wave height is considered at the shoreline that originates from the presence of stationary wave set-up and time-varying surf-beat there." In the case of a dry or emergent toe, the same method from [25] has been applied, replacing the dry toe (h t < 0) with h t = 0. In other words, the solution is forced at h t = 0. A similar procedure has been found in [44].
All data are plotted in Figure 3, where a scatter plot ( [45]) among the main variables is shown. Each data has been scaled to R c = 1 m using Froude's similarity law in order to compare different datasets. The selected variables are mean overtopping discharge (expressed in logarithmic scale), deep-water wave height and period, local wave height and period, foreshore and dike slope, local water depth, and maximum wave run-up. The last one has been calculated using Equation (7) and is plotted in logarithmic scale. It can be noticed that despite the large variety of data and range of each dataset, a strong relationship can be still identified between the discharge values and the local wave height. Even better correlation is noticeable in the wave run-up, where the calculation of wave run-up depends not only on the wave height but also on the wave period, foreshore, and dike slope. These dependency is shown with more detail in terms of the dimensionless variables in Figure  4, where the left plot represents the variation of dimensionless discharge Q* = 4π 2 q/gHm0,oTm−1,0,o against the dimensionless freeboard; meanwhile, in the right plot, the dependence is shown in terms of deficit in freeboard. For the former one, all data but the ones from dataset #3 are quite clustered. Dataset #3 corresponds to the data from [22], mainly with sea dikes and walls with extremely shallow waters and even an emergent toe. In the right plot of Figure 4, the maximum run-up has been calculated with Equation (7) (Rumax,im). In this case, the data belonging to each dataset seem to follow a different trend but converge for value of the deficit in freeboard equal to 1 (i.e., freeboard equal to 0). The relationship between discharge, dimensionless freeboard, and deficit in the freeboard will be investigated further and discussed in Sections 5 and 6. It is worthwhile to notice that employing the dimensionless discharge Q* scaled by the volume flux leads to less data scatter than employing the most used Q = q/√(gHm0,t 3 ), especially for low freeboards and large overtopping rates (where measurement uncertainties are less and the test repeatability is usually high): see the comparison between Figure 4 (left plot) and Figure 5. These dependency is shown with more detail in terms of the dimensionless variables in Figure 4, where the left plot represents the variation of dimensionless discharge Q* = 4π 2 q/gH m0,o T m−1,0,o against the dimensionless freeboard; meanwhile, in the right plot, the dependence is shown in terms of deficit in freeboard. For the former one, all data but the ones from dataset #3 are quite clustered. Dataset #3 corresponds to the data from [22], mainly with sea dikes and walls with extremely shallow waters and even an emergent toe. In the right plot of Figure 4, the maximum run-up has been calculated with Equation (7) (Ru max,im ). In this case, the data belonging to each dataset seem to follow a different trend but converge for value of the deficit in freeboard equal to 1 (i.e., freeboard equal to 0). The relationship between discharge, dimensionless freeboard, and deficit in the freeboard will be investigated further and discussed in Sections 5 and 6. It is worthwhile to notice that employing the dimensionless discharge Q* scaled by the volume flux leads to less data scatter than employing the most used Q = q/   The measured discharge values are finally compared with ones calculated using Equation (1), which is proposed as a unified formula for wave overtopping by [23]. Results are depicted in Figure  6. Three lines are added to the graphs: a solid line that corresponds to a prediction equal to the measurement and two dashed lines corresponding to overtopping predictions that are 10 times and 0.1 times the measured data. Employing Equation (1), most of the predicted data are within the range 0.1 < qpredicted/qmeasured < 10, except for data from datasets #1 and #3, which correspond to the two datasets having extremely shallow water conditions with emergent toes. The inaccuracy to predict the mean overtopping discharge of these two datasets is due to the range of applicability of Equation (1) and the range of hydraulic and geometrical parameters used to derive the coefficients expressed in Equations (2)-(5), where a clear lack of data in very or extremely shallow water conditions has been noticed in the dataset employed by [23].   The measured discharge values are finally compared with ones calculated using Equation (1), which is proposed as a unified formula for wave overtopping by [23]. Results are depicted in Figure  6. Three lines are added to the graphs: a solid line that corresponds to a prediction equal to the measurement and two dashed lines corresponding to overtopping predictions that are 10 times and 0.1 times the measured data. Employing Equation (1), most of the predicted data are within the range 0.1 < qpredicted/qmeasured < 10, except for data from datasets #1 and #3, which correspond to the two datasets having extremely shallow water conditions with emergent toes. The inaccuracy to predict the mean overtopping discharge of these two datasets is due to the range of applicability of Equation (1) and the range of hydraulic and geometrical parameters used to derive the coefficients expressed in Equations (2)-(5), where a clear lack of data in very or extremely shallow water conditions has been noticed in the dataset employed by [23]. The measured discharge values are finally compared with ones calculated using Equation (1), which is proposed as a unified formula for wave overtopping by [23]. Results are depicted in Figure 6. Three lines are added to the graphs: a solid line that corresponds to a prediction equal to the measurement and two dashed lines corresponding to overtopping predictions that are 10 times and 0.1 times the measured data. Employing Equation (1), most of the predicted data are within the range 0.1 < q predicted /q measured < 10, except for data from datasets #1 and #3, which correspond to the two datasets having extremely shallow water conditions with emergent toes. The inaccuracy to predict the mean overtopping discharge of these two datasets is due to the range of applicability of Equation (1) and the range of hydraulic and geometrical parameters used to derive the coefficients expressed in Equations (2)-(5), where a clear lack of data in very or extremely shallow water conditions has been noticed in the dataset employed by [23].

A Set of Unified Formulas for the Preliminary Assessment of Wave Overtopping on Smooth Sea Dikes and Vertical Walls
EPR has been already successfully applied in coastal engineering and other engineering fields ( [36][37][38][39]). Here, EPR is applied to the collected data described in Section 4. The genetic structure of the formulas is assumed a priori based on the exiting literature. The set of variables employed in EPR modeling (see Section 3.2) are based on certain physical assumptions and derived by [12,13,23,26,27]. EPR modeling returns a set or family of formulas, which allows identifying recurrent and meaningful groups or explanatory variables and comparing all expressions in terms of accuracy and parsimony. Then, the so-derived formulas are validated with the experimental data.

EPR Setting and Results
The 1679 data related to hydraulic and geometrical parameters have been divided into two sets: 1343 data have been used as a training set for EPR modeling, while 336 data have been employed as a test set, used for the evaluation of the returned models and not for model construction. The test data have been randomly picked up from the whole database.
EPR has been applied to look among the explanatory variables defined in Section 3.2 and identify those influent for mean wave overtopping discharge. Since most of the studies in the literature on wave overtopping rely on the assumption of an exponential function, this hypothesis has been kept for EPR modeling. Both equivalent slope and imaginary slope have been considered as defined in [13] and [22], respectively. Accordingly, maximum run-up values have been calculated either with Equation (7), Rumax,im, or Equations (16)- (18), Rumax,eq. A set of model expressions returned by EPR is reported in Table 2. Looking for model parsimony, here, only expressions limited to a maximum of 3 terms (excluding the constant) and 4 inputs are reported. The values of CoD and AVG for both training and test data are reported, and the results overall are very similar to each other.

A Set of Unified Formulas for the Preliminary Assessment of Wave Overtopping on Smooth Sea Dikes and Vertical Walls
EPR has been already successfully applied in coastal engineering and other engineering fields ( [36][37][38][39]). Here, EPR is applied to the collected data described in Section 4. The genetic structure of the formulas is assumed a priori based on the exiting literature. The set of variables employed in EPR modeling (see Section 3.2) are based on certain physical assumptions and derived by [12,13,23,26,27]. EPR modeling returns a set or family of formulas, which allows identifying recurrent and meaningful groups or explanatory variables and comparing all expressions in terms of accuracy and parsimony. Then, the so-derived formulas are validated with the experimental data.

EPR Setting and Results
The 1679 data related to hydraulic and geometrical parameters have been divided into two sets: 1343 data have been used as a training set for EPR modeling, while 336 data have been employed as a test set, used for the evaluation of the returned models and not for model construction. The test data have been randomly picked up from the whole database.
EPR has been applied to look among the explanatory variables defined in Section 3.2 and identify those influent for mean wave overtopping discharge. Since most of the studies in the literature on wave overtopping rely on the assumption of an exponential function, this hypothesis has been kept for EPR modeling. Both equivalent slope and imaginary slope have been considered as defined in [13] and [22], respectively. Accordingly, maximum run-up values have been calculated either with Equation (7), Ru max,im , or Equations (16)- (18), Ru max,eq . A set of model expressions returned by EPR is reported in Table 2. Looking for model parsimony, here, only expressions limited to a maximum of 3 terms (excluding the constant) and 4 inputs are reported. The values of CoD and AVG for both training and test data are reported, and the results overall are very similar to each other. The dimensionless freeboard and the deficit in freeboard results are the most frequent variables in each model expression. In most of the cases, the dimensionless discharge results are functions of the square root of the dimensionless freeboard; meanwhile, the deficit in freeboard is expressed in form of power functions with exponents varying between 1 and 1.5 (the latter is the most frequent). A direct dependence of Q* on the foreshore and dike slope has been investigated, as for example expressed in Equations (2)-(4). However, the results were poor in terms of accuracy. Instead, employing the equivalent slope or the imaginary slope improved the model accuracy (see models III and VII). Besides, it must be considered that the same run-up is calculated as a function of the equivalent or imaginary slope, hence corroborating the importance of such parameters on the expression of mean discharge. The slope parameter defined by [27] has been selected in two cases (model IV and V). In the last three models, the quantity H m0,o /H m0,t has been selected, as derived from the ratio between the h t /H m0,t and h t /H m0,o . Notwithstanding, models VIII, IX, and X do not add further accuracy, and the dependence of the dimensionless discharge on H m0,o /H m0,t seems rather the result of model overfitting than something physically based.
To support the choice of a proper model expression, the model accuracy has been estimated also in terms of geometric mean (GEO) and standard deviation (GSD), which are calculated on the entire amount of data. Results are reported in Table 3. The 90% interval of the ratio q predicted /q estimated calculated based on GEO and GSD for each model expression is reported. The model expression VII results are the most accurate, having a GEO almost equal to 1 (=0.99) and the lowest standard deviation. For model VII, assuming the overtopping rate as normally distributed, 90% of the predicted overtopping rate is to be located in the range between 0.20 and 4.84 times the measured overtopping discharge. Excluding models VIII, IX, and X, the second most accurate model expression is model III, which is a function of the deficit in freeboard and the equivalent slope. The accuracy of Equation (1) and Equation (6) has been also assessed, showing a larger confidence interval for both equations, with a tendency to overestimate overtopping discharge when Equation (1) is employed. The differences between equivalent and imaginary slopes will be discussed later. Yet, it is evident that a slope-a combination of foreshore slope and dike slope, and the definition of which depends also on the local water depth-is a key variable to express the mean overtopping discharge. The dependence of discharge on θ, α, and h t was already remarked in [13,23]. The results of predicted mean discharge rates employing model III and VII are plotted against the measured ones in Figures 7 and 8, respectively. Table 3. Geometric mean and standard deviation for each EPR model expression. The 90% interval of q predicted /q estimated is reported. GEO: geometric mean, GSD: geometric standard deviation.

Discussion
The set of model expressions fitted with EPR has been presented in the previous section. The exponential structure for each model expression is confirmed to be the most suited for expressing the mean overtopping discharge as widely used in the literature. The dimensionless freeboard and

Discussion
The set of model expressions fitted with EPR has been presented in the previous section. The exponential structure for each model expression is confirmed to be the most suited for expressing the mean overtopping discharge as widely used in the literature. The dimensionless freeboard and deficit in freeboard resulted in the main explanatory variables for mean overtopping discharge assessment. The mean discharge has been expressed in dimensionless form by scaling it with the volume flux derived and expressed in Equation (14) in agreement with [26]. A preliminary analysis of the volume flux has shown that there is no difference employing either deep-water wave characteristics or local wave characteristics ( Figure 9). deficit in freeboard resulted in the main explanatory variables for mean overtopping discharge assessment. The mean discharge has been expressed in dimensionless form by scaling it with the volume flux derived and expressed in Equation (14) in agreement with [26]. A preliminary analysis of the volume flux has shown that there is no difference employing either deep-water wave characteristics or local wave characteristics ( Figure 9). All calculations presented in this work employ the deep-water wave characteristics for the definition of Q*. Differently from the most used formulas from the literature (e.g., [13,19,22,23,46,47]), where the average discharge is scaled by √(gHm0,t 3 ), here, the scaling factor All calculations presented in this work employ the deep-water wave characteristics for the definition of Q*. Differently from the most used formulas from the literature (e.g., [13,19,22,23,46,47]), where the average discharge is scaled by √ (gH m0,t 3 ), here, the scaling factor includes the wave period as a key variable to explain overtopping discharge. Assuming the extreme case of zero freeboard for different wave periods but keeping all other hydraulic and geometrical characteristics fixed, the mean discharge will be different; namely, it will be larger for longer wave periods. This result is actually corroborated by the evidence that if we assume sea storms of the same duration, which are expressed as the number of waves in one wave train, longer waves will lead to larger volumes than shorter ones. However, only breaking wave formulas, such as Equation (8), express similar relationships. For non-breaking waves, the wave period does not play a role at all. For very and shallow water conditions, the wave period is considered only within the exponent; hence, there is no influence in case of zero freeboard.
The EPR results show that the deficit in freeboard is a recurrent variable; hence, it is fundamental for mean overtopping discharge (EPR models I, III, VI, VII, VIII, IX, X), as demonstrated by [26]. The larger the deficit, the larger the overtopping. Mase et al. [22] expressed the discharge rate as a function of the deficit in freeboard. Besides, better model performance is achieved combining it with the dimensionless freeboard (models VI, VII, and X). Comparing models with only dimensionless freeboard or deficit in freeboard, better model accuracy is attained when the latter explanatory variable is employed. The main difference is that the dimensionless freeboard considers only two variables: crest freeboard R c and local wave height H m0,t . Employing instead the deficit in freeboard includes the wave run-up calculation and hence the influence of wave period, local water depth, foreshore slope, and dike slope. This means proper consideration of the wave transformation and eventual wave breaking to which the waves are subjected before reaching the dike toe and dike crest ( [13]).
At this point, a discussion on the equivalent slope and imaginary slope concepts is mandatory. Both take into account local processes of wave transformation and breaking on the foreshore first and then on the dike slope. Both concepts are applicable to cases of structures with an emergent toe. However, for the imaginary slope, the calculation of the breaking depth is required. In this work, the formula proposed by [48] is employed to calculate the breaking depth, h b . For the equivalent slope, if the local water depth is bigger than 1.5H m0,t , only the dike slope is assumed. In case of vertical walls, this will correspond therefore to a value equal to 0. Besides, the calculation of the equivalent slope and the imaginary slope requires an iterative process, since the wave run-up, which is necessary for the slope definition, is not known a priori and depends on the same slope. Uncertainties related to applicability range of the same run-up formulas raise questions about the accuracy of one method over another. A comparison between the calculated imaginary slope and equivalent slope is reported in Figure 10. It can be noticed that for most of the data, the two calculations lead to similar results; however, for some data, the imaginary slope is far larger than the equivalent one. This is due to the calculation of the breaking depth and its distance from the dike toe, especially for very mild foreshores: the area underneath the foreshore will actually increase significantly, giving more weight to the foreshore slope than the dike slope (see Figure 2). In general, however, EPR models show similar dependence of the overtopping rates on the imaginary and equivalent slopes: namely, the larger the slope is, the smaller the overtopping. Larger equivalent or imaginary slopes correspond to cases where the foreshore slope is rather influential; hence, heavier wave breaking and bigger wave energy dissipation are expected. This leads to reduced overtopping discharge. As an example, in countries such as Belgium or The Netherlands, beach nourishment is employed in such cases as a soft countermeasure to reduce overtopping and flooding. In fact, the resulting very long and mild beaches before sea dikes induce heavy wave breaking that decrease overtopping rates. and bigger wave energy dissipation are expected. This leads to reduced overtopping discharge. As an example, in countries such as Belgium or The Netherlands, beach nourishment is employed in such cases as a soft countermeasure to reduce overtopping and flooding. In fact, the resulting very long and mild beaches before sea dikes induce heavy wave breaking that decrease overtopping rates. Differences can be noticed also in the calculated run-up values, see Figure 11, where the maximum run-up is scaled by the crest freeboard. Looking at the EPR model expressions in Table 2, it can be observed that when the deficit in freeboard is calculated based on Equation (7), it appears in the exponential function with an exponent equal to 1. Instead, when Equations (16)- (18) and the equivalent slope are employed, the deficit in freeboard presents an exponent equal to 1.5 in all models but one.
The uncertainties in the accuracy of the wave run-up formula and the different slope concept employed in each case are responsible for such discrepancy. The variation of Q* on the deficit in freeboard is depicted in Figure 4 (right plot) and Figure 12, for Rumax,im and Rumax,eq, respectively. Differences can be noticed also in the calculated run-up values, see Figure 11, where the maximum run-up is scaled by the crest freeboard. Looking at the EPR model expressions in Table 2, it can be observed that when the deficit in freeboard is calculated based on Equation (7), it appears in the exponential function with an exponent equal to 1. Instead, when Equations (16)- (18) and the equivalent slope are employed, the deficit in freeboard presents an exponent equal to 1.5 in all models but one.  (7), and therefore, the deficit in freeboard; the latter ones define the dimensionless freeboard. Instead, the terms of the right-hand side in model expression III are expressed as a function of local wave conditions only, and due to the almost identity of volume flex expressed either with deep-water or local wave conditions (Figure 9), the latter ones can be used to scale the mean discharge (left-hand side in model III).  The uncertainties in the accuracy of the wave run-up formula and the different slope concept employed in each case are responsible for such discrepancy. The variation of Q* on the deficit in freeboard is depicted in Figure 4 (right plot) and Figure 12, for Ru max,im and Ru max,eq , respectively. Further research aiming to derive a unified formula for wave run-up is advised, covering the whole range of sloping dikes and vertical walls, for all water conditions and even an emergent toe. For it, ad hoc experimental tests and new measurements will be required. Despite the clear difference and independently of the way it is calculated, the deficit in freeboard remains the most important explanatory variable for the process at stake. From a practical point of view, model expression III might be preferred to model expression VII for two main reasons: (1) uncertainty in the calculation of the breaking depth required to define the imaginary slope in model expression VII; and (2) model expression VII requires using both deep-water and local wave conditions. The former ones are needed to calculate the run-up based on Equation (7), and therefore, the deficit in freeboard; the latter ones define the dimensionless freeboard. Instead, the terms of the right-hand side in model expression III are expressed as a function of local wave conditions only, and due to the almost identity of volume flex expressed either with deep-water or local wave conditions (Figure 9), the latter ones can be used to scale the mean discharge (left-hand side in model III).

Conclusions
The Evolutionary Polynomial Regression technique is employed in the present work to mine information from five different datasets gathered together to identify the most important explanatory variables that describe the mean overtopping discharge of smooth sea dikes and vertical walls. First, scaling laws for wave overtopping are discussed. A set of unified formulas is derived to cover a broad range of hydrodynamic conditions and geometrical characteristics and to fill the existing gaps between the most used semi-empirical formulas from the literature. Each EPR model expression is analyzed in terms of the physics of the phenomenon at stake, and expressions with eventual model overfitting are excluded. The main finding can been summarized as follows: • Mean overtopping discharge is scaled by the volume flux, including both wave height and period ( [26]): dimensionless discharge can be defined as Q* = 4π 2 q/gH m0,o T m−1,0,o . In this way, the wave period is always considered as an explanatory variable for wave overtopping assessment also for those cases where it is usually not taken into account; see Equation (9) for non-breaking waves.

•
Employing deep-water wave characteristics rather than local wave characteristics at the dike toe to define the volume flux and, consequently, the scaling law for overtopping, leads to differences that are negligible. The reader can employ either the former ones or the latter ones, depending on the availability and accuracy of the employed information. • EPR analysis confirms that an exponential structure, the most employed in literature (see [19]), leads to the best model representation.
• Both the dimensionless freeboard, R c /H m0,t , and the deficit in freeboard, 1-R c /Ru max , are key explanatory variables for wave overtopping assessment. The main difference is that the definition of the deficit in freeboard requires the assessment of wave run-up, which is not only depending on the wave height but also on the wave period, local water depth, foreshore slope, and dike slope. • Therefore, the study confirms previous findings from [13,22,23], and partly from [12], where other variables such as dike and foreshore slope and local water depth play an important role for the calculation of mean discharge rates. By combination these variables in one equivalent or imaginary slope concept, the model accuracy increases. • EPR results confirm that the imaginary or equivalent slopes provided improve the estimate of mean discharge values, confirming somehow the physical meaning of such concepts. Overtopping reduces when the slope (defined as a cotangent of the angle with the horizontal) increases (heavier wave braking and dissipation over the foreshore slope).
Although the set of formulas are derived only for smooth structures, their application can be extended to rough and permeable structure (e.g., breakwaters) for a preliminary and conservative assessment of the mean discharge value.
Further research is required to define a unified formula for wave run-up and a unique concept for equivalent or imaginary slope. Nevertheless, the results of the present research provide already a clear indication of the determining variables and groups of variables for overtopping assessment.
Finally, the proposed set of formulas does not replace the most and widely used semi-empirical models for mean wave overtopping ( [19]), but it aims to offer a simple method for a preliminary assessment of wave overtopping for any kind of structural layout and hydrodynamic conditions of smooth sea dikes and vertical walls, overcoming the existing model gaps.