Failure of the Downstream Shoulder of Rockﬁll Dams Due to Overtopping or Throughﬂow

: This paper presents the results of an extensive laboratory set of tests aimed to study the failure of the downstream shoulder of highly permeable rockﬁll subjected to overﬂow. The experimental research comprised testing 114 physical models by varying the following elements: (i) the median size of the uniform gravels (7 to 45 mm); (ii) the conﬁguration of the dam, i.e., upstream and downstream shoulders and crest or just the downstream shoulder; (iii) the dam height (from 0.2 to 1 m), (iv) the crest length (from 0.4 to 2.5 m), (v) the downstream slope (from 1 to 3.5 H:V), (vi) the type of impervious element (i.e., central core, upstream face, and no impervious element). The tests allowed us to identify two failure mechanisms, slumping and particle dragging. In addition, the downstream slope was observed to be one of the most important variables in this parametric study, as it inﬂuenced the pore water pressures inside the dam, the failure discharge, and the occurrence of one or the other mechanism of failure.


Introduction
Rockfills may be formed by natural processes or as a direct result of human action in civil engineering structures. Examples of natural rockfills are moraine dams [1] and landslide or avalanche dams [2][3][4]. On the other hand, constructed rockfill structures include levees, dikes, and dams built to fulfill different human needs, and embankmentlike deposits of homogeneous coarse rockfill, usually produced by mining activities, also referred to as rock drains [5].
In natural rockfill dams, formed without any impervious element, the prediction of the final breach geometry and dimensions is crucial for the estimation of the peak outflow. Diverse studies have been performed in this area for both cohesive and non-cohesive materials [31][32][33][34][35][36][37][38], as well as literature reviews [39][40][41]. In dams constructed with an Table 1. Summary of the 114 physical models tested to study the failure of the rockfill downstream shoulder (values in parentheses represent the number of models tested for that particular configuration). NA means 'not available'.  All of these variables combined can be grouped into four types of physical models as shown in Figure 1. These groups are (i) complete configuration with upstream face (CPM/UF), (ii) complete configuration without impervious element (CPM/NIE), (iii) partial or complete configuration with a central core (PPM/CC), and (iv) partial configuration without impervious element (PPM/NIE). characteristics and material gradings [3]. Based on an extensive laboratory test campaign, this paper provides a parametric analysis to understand how the failure discharge is affected by some characteristics of the dam body. Based on the results of the parametric analysis, an empirical formulation is calibrated to estimate the 'failure discharge'.

Test Overview
A total of 114 physical models (PM), all tested in horizontal flumes with rectangular sections, are summarized in Table 1. They were tested by varying the following elements: (i) the size of the uniform gravel, characterized by its D50; (ii) the configuration of the cross-section, using partial (PPM) or complete physical models (CPM) (complete configurations are trapezoidal and include both upstream (USS) and downstream shoulders (DSS) and crest, while partial configurations are triangular and include only the downstream shoulder); (iii) geometrical parameters as the height ( ) of the physical models, the width of the flume ( ), the width of the crest ( c ), the downstream and upstream slopes ( dss and uss , respectively); (iv) the type of impervious element (IE) (central core (CC), upstream face (UF), and no impervious element (NIE)).
All of these variables combined can be grouped into four types of physical models as shown in Figure 1. These groups are (i) complete configuration with upstream face (CPM/UF), (ii) complete configuration without impervious element (CPM/NIE), (iii) partial or complete configuration with a central core (PPM/CC), and (iv) partial configuration without impervious element (PPM/NIE).

Specific Tests
Within the main campaign, we performed specific tests to evaluate the variability of the results and the scale effect. Regarding variability, five groups of tests were performed (from A to E), all CPM/NIE, consisting of repeating the same physical model a given number of times to assess if the test procedure could substantially affect the 'unit failure discharge' (q f ) or any other factors, such as the hydraulic pressures inside the downstream shoulder or the 'failure path', i.e., the evolution of the failure progress with the throughflow discharge [25,48]. It must be noted, though, that the discharge steps were not (in general) the same throughout the tests within the same group. Tests in the same group all had the same geometry and dimensions as well as the same granular material: Regarding the scale effect, the aim was to analyze if the Froude similitude could be applied to scale q f . So, for a scale factor s L = 1 : 3.5, we tested 0.23 m high 'small-scale' physical models (tests n o 109, 111, and 130) and 0.8 m high 'prototypes' (tests n o 108, 110, and 112). This scale factor was applied to all lengths (except for the flume width) including the gravels' D50. So, gravels M3 and M7 were respectively used in the 'small-scale' model and 'prototype'. In total, we tested three Z dss = 1.5, 2.5, and 3.5, and for each of these slopes, we tested one 'small-scale' physical model and a 'prototype'.

Test Setup and Procedure
The physical models were constructed by pouring and extending the granular material without compaction. Nevertheless, some unintentional compaction resulted from walking over the models during construction, mainly those constructed in the larger flumes involving the placement of tons of material. To obtain the final geometry, the physical model surfaces were evened with an aluminum straight guide.
Tests were based on a stepwise flow increment methodology until total failure of the downstream shoulder occurred. By total failure, we mean that the damages inflicted to the downstream shoulder reached the crest of the dam. Although the location of the upstream impervious face or the internal clay core was slightly different, this criterion permitted a homogeneous analysis of the results arising from tests with different types of impervious elements. The failure forefront ( Figure 2) is the border that separates intact areas of the slope from those damaged by failure. The maximum advance of failure (B f ), i.e., the most upward point of the failure forefront, was used to define the complete failure of the shoulder. So, the physical models were defined as completely failed when B f reached the downstream edge of the crest. For practical purposes, the discharge that produced complete failure (Q f ) was defined as being the average value between the highest discharge in which failure did not reach the crest, and the lowest in which failure surpassed it.

Test Setup and Procedure
The physical models were constructed by pouring and extending the granular material without compaction. Nevertheless, some unintentional compaction resulted from walking over the models during construction, mainly those constructed in the larger flumes involving the placement of tons of material. To obtain the final geometry, the physical model surfaces were evened with an aluminum straight guide.
Tests were based on a stepwise flow increment methodology until total failure of the downstream shoulder occurred. By total failure, we mean that the damages inflicted to the downstream shoulder reached the crest of the dam. Although the location of the upstream impervious face or the internal clay core was slightly different, this criterion permitted a homogeneous analysis of the results arising from tests with different types of impervious elements. The failure forefront (Figure 2) is the border that separates intact areas of the slope from those damaged by failure. The maximum advance of failure ( f ), i.e., the most upward point of the failure forefront, was used to define the complete failure of the shoulder. So, the physical models were defined as completely failed when f reached the downstream edge of the crest. For practical purposes, the discharge that produced complete failure ( f ) was defined as being the average value between the highest discharge in which failure did not reach the crest, and the lowest in which failure surpassed it.
Each discharge was kept constant until steady-state conditions were reached, i.e., until no additional damage was observed to the shoulder or any change in the water elevation and pressures. Several long preliminary tests (more than 1 h per step) showed that a step duration of 30 min was long enough for reaching the stationary state. The number of steps varied from test to test and, following the initiation of failure (first damage observed on the shoulder), five steps were performed, on average. The minimum and the maximum number of flow steps were three and ten, respectively. Once the stationary state condition of every discharge step was reached, all measurements were performed.

Materials
Tests were performed with eight uniform limestone gravels (M1 to M8) of different sizes, ranging D50 from 0.00736 to 0.04509 m, and with a coefficient of uniformity ( u = 60/ 10) ranging from 1.46 to 2.28. Their main characteristics are summarized in Each discharge was kept constant until steady-state conditions were reached, i.e., until no additional damage was observed to the shoulder or any change in the water elevation and pressures. Several long preliminary tests (more than 1 h per step) showed that a step duration of 30 min was long enough for reaching the stationary state. The number of steps varied from test to test and, following the initiation of failure (first damage observed on the shoulder), five steps were performed, on average. The minimum and the maximum number of flow steps were three and ten, respectively. Once the stationary state condition of every discharge step was reached, all measurements were performed.

Materials
Tests were performed with eight uniform limestone gravels (M1 to M8) of different sizes, ranging D50 from 0.00736 to 0.04509 m, and with a coefficient of uniformity (C u = D60/D10) ranging from 1.46 to 2.28. Their main characteristics are summarized in Table 2 and Figure 3. These eight gravels were obtained by sieving four raw gravels with size ranges 4-12, 12-20, 20-40, and 40-80 mm. Materials M1 to M3 resulted from gravel 4-12 mm, M4 to M6 from gravel 12-20 mm, M7 from gravel 20-40 mm, and M8 from gravel 40-80 mm.    Some characterization tests were consigned to an external laboratory, the Geotechnics Laboratory of CEDEX, to obtain the particle size distribution (UNE EN 933-1), the specific gravity ( ) of the soil solids (ASTM D5550-06), and the soil density (UNE Some characterization tests were consigned to an external laboratory, the Geotechnics Laboratory of CEDEX, to obtain the particle size distribution (UNE EN 933-1), the specific gravity (G) of the soil solids (ASTM D5550-06), and the soil density (UNE 103301:1994) of gravels M3, M7, and M8. With these two last standards, we obtained the void ratio (e) with Equation (1), porosity (n) with Equation (2), and the saturated specific weight (γ sat ) with Equation (3). The particle size distribution was obtained for the rest of the materials following the same standards. The porosity of gravels M4 and M6 was obtained by filling a bucket full of gravel with water. The quadratic resistance law of flow through coarse granular materials expressed by Equation (4) relating the hydraulic gradient (i) to the flow velocity (v) was obtained for gravels M4, M7, and M8 following a methodology described in the state of the art [10,49], and for gravels M3 and M6 with a horizontal permeameter of large dimensions [50].
The angle of repose was obtained for gravels M4, M7, and M8 by scanning the surface of the mounds of these gravels using a 2D laser (LMS200-30106 by SICK TM , scanning range for objects up to 10 m; angular range up to 180 • with a maximum angular resolution of 0.25 • ; a systematic error of ±0.015 m; a statistical error of ±0.005 m). The repose angles were obtained by fitting a linear regression to the external surface of the mounds [45]. The Geotechnics Laboratory of CEDEX was also consigned to perform the characterization of the first three raw gravels to obtain the particle distribution (UNE 103101:1995), soil density (UNE 103301:1994), permeability (UNE 103403:1999), friction angles (UNE 103401:1998), and specific gravity of soil solids using a gas pycnometer (ASTM D5550-06).

Facilities and Instrumentation
Tests were conducted in four U-shaped flumes (rectangular section) located in two different laboratories: one flume at the Hydraulics Laboratory of the E.T.S.I. de Caminos, Canales y Puertos of the Universidad Politécnica de Madrid (UPM), and three at the Hydraulics Laboratory of the Centro de Estudios Hidrográficos of the Centro de Estudios y Experimentación de Obras Públicas (CEDEX), both laboratories located in Madrid (Spain).
The UPM flume, straight with horizontal bottom, is 13.7 m long, 2.5 m wide, and 1.3 m high (inner dimensions) with an inspection window 4.6 m long and 1.1 m high placed in the left-side wall (Figure 4a). In this flume, we tested physical models with different widths ranging from 0.6 to 2.5 m; hence, when a smaller width had to be tested, it was necessary to build a longitudinal central wall. placed in the left-side wall (Figure 4a). In this flume, we tested physical models with different widths ranging from 0.6 to 2.5 m; hence, when a smaller width had to be tested, it was necessary to build a longitudinal central wall. Figure 4b,c show images of a test performed at UPM. At the CEDEX laboratory, three flumes were used ( Figure 5). The smaller one was a tilting metallic flume 12 m long, 0.4 m wide, and 0.6 m high. Although the slope of the flume could be controlled, it was kept horizontal throughout the tests. This flume was supplied by a constant water level tank where discharges were measured upstream of the physical model using a thin-plate rectangular weir 0.487 m long with no lateral contraction. This constant level tank was supplied with water by pumping from the main tank located below the laboratory floor (3000 m 3 ) with a Worthington hydraulic pump (threephase motor GEAL 220/380 V, 4.4 kW, 6 hp) capable of pumping up to 0.06 m 3 s −1 with 5 mwc. The medium-size concrete flume was 12 m long, 1.0 m wide, and 1.1 m high. Supplied by a constant water level tank, discharges were also measured upstream of the physical model with a 90 thin-plate triangular weir. The water level upstream of the weirs was measured with P8000 ultrasonic level sensors with digital displays (Dr. D. Wehrhahn, Hannover, Germany). This constant water level tank was also supplied with water from the main tank with a Worthington hydraulic pump (three-phase motor At the CEDEX laboratory, three flumes were used ( Figure 5). The smaller one was a tilting metallic flume 12 m long, 0.4 m wide, and 0.6 m high. Although the slope of the flume could be controlled, it was kept horizontal throughout the tests. This flume was supplied by a constant water level tank where discharges were measured upstream of the physical model using a thin-plate rectangular weir 0.487 m long with no lateral contraction. This constant level tank was supplied with water by pumping from the main tank located below the laboratory floor (3000 m 3 ) with a Worthington hydraulic pump (three-phase motor GEAL 220/380 V, 4.4 kW, 6 hp) capable of pumping up to 0.06 m 3 s −1 with 5 mwc. The medium-size concrete flume was 12 m long, 1.0 m wide, and 1.1 m high. Supplied by a constant water level tank, discharges were also measured upstream of the physical model with a 90 • thin-plate triangular weir. The water level upstream of the weirs was measured with P8000 ultrasonic level sensors with digital displays (Dr. D. Wehrhahn, Hannover, Germany). This constant water level tank was also supplied with water from the main tank with a Worthington hydraulic pump (three-phase motor Alcanza 220/380 V, 45 kW, 6 hp) capable of pumping up to 0.2 m 3 s −1 with 14 mwc. In this flume, the hydraulic pressures were measured using the intelligent pressure instrumental system Scanivalve, placed at the base of the flume. This system was composed of 44 measuring points distributed in ten transversal rows along a distance of 2.9 m. Finally, the bigger metal-glazed flume, 100 m long, 1.5 m wide, and 1.5 m high, was supplied directly from the main tank using two Jeumont-Schneider hydraulic pumps, one capable of pumping up to 1. mental system Scanivalve, placed at the base of the flume. This system was composed of 44 measuring points distributed in ten transversal rows along a distance of 2.9 m. Finally, the bigger metal-glazed flume, 100 m long, 1.5 m wide, and 1.5 m high, was supplied directly from the main tank using two Jeumont-Schneider hydraulic pumps, one capable of pumping up to 1.7 m 3 s −1 with 4.4 mwc (DC motor 107 kW and 440 V), and the other capable of pumping up to 0.8 m 3 s −1 with 4.25 mwc (DC motor 55 kW and 440 V).

Dimensionless Variables
Using the height of the physical models ( ) and the acceleration of gravity ( ) as the basic variables, and by applying the Buckingham  theorem we can obtain the dimensionless unit discharge ( * ) expressed by Equation (5) and the dimensionless equivalent Darcy's coefficient of permeability ( eq * ) expressed by Equation (6).
Even though Darcy's law is not applicable in coarse materials such as those used in this study, the equipotential lines at the toe of a rockfill shoulder with linear and nonlinear models are nearly vertical [51]. Assuming the maximum hydraulic gradient at the toe of the rockfill dam as being max = 1/ dss , then parameters and of the nonlinear resistance law (Equation (4)) can be converted into a single equivalent Darcy's coefficient of permeability ( eq ) using Equation (7) [10]. The velocity max is that occurring for the maximum gradient max at the toe of the dam.
To compare physical models with different geometries and dimensions, the horizontal lengths were also converted to non-dimensional ( * ) using Equation (8). This dimensionless variable ranges from zero to one, from the downstream edge of the crest to the toe of the dam. * = dss  (8)

Dimensionless Variables
Using the height of the physical models (H) and the acceleration of gravity (g) as the basic variables, and by applying the Buckingham Π theorem we can obtain the dimensionless unit discharge (q * ) expressed by Equation (5) and the dimensionless equivalent Darcy's coefficient of permeability (k * eq ) expressed by Equation (6).
Even though Darcy's law is not applicable in coarse materials such as those used in this study, the equipotential lines at the toe of a rockfill shoulder with linear and nonlinear models are nearly vertical [51]. Assuming the maximum hydraulic gradient at the toe of the rockfill dam as being i max = 1/Z dss , then parameters a and b of the nonlinear resistance law (Equation (4)) can be converted into a single equivalent Darcy's coefficient of permeability (k eq ) using Equation (7) [10]. The velocity v max is that occurring for the maximum gradient i max at the toe of the dam.
To compare physical models with different geometries and dimensions, the horizontal lengths were also converted to non-dimensional (x * ) using Equation (8). This dimensionless variable ranges from zero to one, from the downstream edge of the crest to the toe of the dam. x

Statistical Analyses
The statistical analyses presented in this paper were all performed with the statistical software R (version 4.0.3). The regression models were obtained with the 'lm' function (R 'stats' Package), the hypotheses contrasts were performed with the 't.test' function (R 'stats' Package), and the power analyses with the 'cohen.d' function (Package 'effsize') to calculate the effect size statistics and the 'pwr.2p2n.test' function (Package 'pwr) to compute the power of the test. The contrast hypotheses between the two groups of samples were performed assuming the two variances as equal.

Failure Initiation and Progress
The failure progress was observed to be the same for all tests. These observations were in line with those obtained by other authors [3,19,23,28] for highly permeable materials.
Given the test procedure and regardless of the type of flow (overtopping or throughflow), material gradings, and dam geometry, failure always initiated at the toe of the dam for a unit discharge that must overcome a given threshold (q fi ). Once this threshold was overcome, failure progressed upwards until a new equilibrium state was achieved for a given constant inflow discharge. The failure progress or 'failure paths' obtained during the specific campaign for analyzing the variability of the results are shown in Figure 6.

Failure Initiation and Progress
The failure progress was observed to be the same for all tests. These observations were in line with those obtained by other authors [3,19,23,28] for highly permeable materials. Given the test procedure and regardless of the type of flow (overtopping or throughflow), material gradings, and dam geometry, failure always initiated at the toe of the dam for a unit discharge that must overcome a given threshold ( fi ). Once this threshold was overcome, failure progressed upwards until a new equilibrium state was achieved for a given constant inflow discharge. The failure progress or 'failure paths' obtained during the specific campaign for analyzing the variability of the results are shown in Figure 6.

Failure Discharge
The failure discharge was assumed to be that by which damages inflicted to the downstream shoulder reached the crests of the physical models. For practical reasons, in general, it was defined as the average value between the last discharge step in which damages did not reach the crest ( f,pre ) and the first in which damages surpassed it ( f,pos ). Table A1

Failure Discharge
The failure discharge was assumed to be that by which damages inflicted to the downstream shoulder reached the crests of the physical models. For practical reasons, in general, it was defined as the average value between the last discharge step in which damages did not reach the crest (Q f,pre ) and the first in which damages surpassed it (Q f,pos ). Table A1 in Appendix A summarizes the average failure discharges (Q f = Q f,ave ) obtained for every physical model. Although difficult to compare with other studies from the state of the art (different materials, criteria, etc.), these results are roughly in line with the results of other authors. Test n o 84 (H = 0.5 m, W = 0.4 m, Z dss = 1.5, D50 = 17.3 mm) resulted in a q f = 0.0168 m 2 s −1 , in the same order of magnitude as similar tests performed by Franca and Almeida [28] (H = 0.5 m, W = 2 m, Z dss = 1.5, D50 = 18.9 mm), which obtained a value of q f = 0.0138 ± 0.009 m 2 s −1 .

Hydraulic Pressures
The hydraulic pressures at the base of the dam were measured for the majority of the tests and all discharge steps. Nonetheless, for simplicity, here, only those pressures relevant for the analysis of the results are presented. Table A2 in Appendix A summarizes the average hydraulic pressures at the bases of the physical models n o 85, 87, 90, 94, and 95 for a given discharge step in the early stages of each test, and Table A3 the average hydraulic pressures from tests n o 87, 94, and 95 for a transversal section roughly located at x * = 0.3 (i.e., 30% of the base length from the crest). The average values were obtained using the records of each piezometer in the same transversal row.

Failure Mechanisms
The laboratory tests allowed the identification of two dominant failure mechanisms: particle dragging (PD) and mass sliding or slumping (MS). Slumping occurs predominantly in embankments with steep slopes. In these cases, failure of the downstream slope affected the entire width of the physical model (Figure 7a). Particle dragging is the predominant failure mechanism in embankments with gentle slopes. In these cases, we observed the formation of one or more erosion channels whose final width was smaller than the total width of the physical model (Figure 7b). Figure 7 shows digital elevation models (DEM) resulting from the difference between the original undeformed and the failed embankment for different discharge steps (from top to bottom, the discharge is increasing). Light colors represent eroded areas, while dark colors represent areas of deposition.

The Mechanics of Failure
Particle dragging is all about the individual stability of each particle when subjected to throughflow forces and gradients as well as skimming flow over the shoulder surface. Once a given discharge threshold is overcome, the motion of a single particle (not a group of particles) is observed, changing the stability conditions of the adjacent particles. If this is a 'key' particle, a type of chain reaction will be triggered, leading the adjacent particles to also move downstream, forming an erosion channel. If the particle is not categorized as key, the adjacent particles will remain stable and in place. In the early stages of failure, this mechanism could lead to the formation of several incipient erosion channels along the entire toe of the dam, but in a given moment, only a few will prevail and grow upstream. Here, the seepage conditions could change significantly, leading to the concentration of flow in the prevailing channels and forcing failure to progress through them. Eventually, only one will prevail, completing the failure process of the downstream shoulder. These erosion channels are hourglass-shaped (Figure 7b) with steeper walls than the original slope and widths smaller than the total flume width. If key particles are displaced, these and the corresponding adjacent particles will fall radially into the erosion channel, making it progress upwards as well as laterally. Nevertheless, if the displaced particles

The Mechanics of Failure
Particle dragging is all about the individual stability of each particle when subjected to throughflow forces and gradients as well as skimming flow over the shoulder surface. Once a given discharge threshold is overcome, the motion of a single particle (not a group of particles) is observed, changing the stability conditions of the adjacent particles. If this is a 'key' particle, a type of chain reaction will be triggered, leading the adjacent particles to also move downstream, forming an erosion channel. If the particle is not categorized as key, the adjacent particles will remain stable and in place. In the early stages of failure, this mechanism could lead to the formation of several incipient erosion channels along the entire toe of the dam, but in a given moment, only a few will prevail and grow upstream. Here, the seepage conditions could change significantly, leading to the concentration of flow in the prevailing channels and forcing failure to progress through them. Eventually, only one will prevail, completing the failure process of the downstream shoulder. These erosion channels are hourglass-shaped (Figure 7b) with steeper walls than the original slope and widths smaller than the total flume width. If key particles are displaced, these and the corresponding adjacent particles will fall radially into the erosion channel, making it progress upwards as well as laterally. Nevertheless, if the displaced particles are not key, this will deepen the erosion channel. The deepening of the channel also occurs naturally as the erosion channel progresses upstream. This phenomenon could eventually lead to slumping. In this case, slumping is not the main failure mechanism but a consequence of particle dragging.
Mass sliding or slumping is related to a problem of global instability affecting a certain mass of material and associated with pore water pressures inside the dam. The sliding mechanism is difficult to detect because the sliding surfaces are usually shallow and quasi-parallel to the slope. A way of detecting sliding is by observing the simultaneous movement (not consecutive) of a group of particles. The physical models where this failure mechanism is dominant are not immune to particle dragging. In these cases, if key particles are displaced as a consequence of sliding, this would also trigger a chain reaction like that described previously, leading the failure forefront, which in these cases usually covers the entire width of the model (Figure 7a), to progress upwards.

The Scale Effect
To compare both 'small-scale' physical models and 'prototypes', the Froude similitude was applied to q f . From this similitude theory, it follows that the unit flow discharge scale factor is s q = s 3/2 L . So, comparing both scaled and prototype results of q f (Table 3), we obtain errors of 2.47%, 2.43%, and 2.75% for models with Z dss = 1.5, 2.5, and 3.5, respectively. These errors represent the difference between both results relative to the 'prototype' value. The mean value for the errors is then 2.55% (mean) ±0.14% (standard deviation). Even though it is a simplistic approach, the Froude similitude to scale between 'physical model' and 'prototype' by scaling D50 seems to give good results for this scale factor. Given that it is the phreatic surface elevation flowing through the downstream shoulder and the first emergence point (intersection between the phreatic surface and the downstream slope) that govern failure and define how far it will progress, then, an alternative approach could be scaling the 'unit discharge-permeability' ratio, as this would more accurately scale the water table elevation. By scaling D50, we are indirectly scaling the permeability, albeit by a factor that we do not know and that should be, in theory, s 1/2 L , the scale factor for a velocity. Given the small errors between the prototype and scaled q f , we can conclude that for these uniform gravels, the 'unit discharge-permeability' ratio is being somehow properly scaled by scaling D50 with the limits of size existing in this study.
Another problem associated with scaling D50 is that we are managing gravels with different repose angles in the 'prototypes' and 'small-scale' physical models. A failed shoulder profile presents three different slopes, (i) the original slope not yet affected by the failure, (ii) a zone over the phreatic surface with the dry repose angle, and finally (iii) a zone below the phreatic surface and flow with a submerged repose angle. Because it is the slope over the phreatic surface with the dry repose angle that is defines whether failure reaches the crest of the dam or not, then complete failure of the downstream shoulder depends greatly on the gravel that is being tested. Because the repose angle flattens by reducing the size of the granular materials, that could imply that physical models tested with smaller materials could reach complete failure earlier than if they were tested with coarser materials.

Repeatability
The set of tests dedicated to analyzing repeatability was used to quantify the variability of the results. First, the hydraulic pressures were compared between tests of the same group. Here, we preferred to compare pressures for an early stage of failure, preferably before any major damage was observed to the downstream shoulder because, as failure progresses, the flow net also changes, especially for those cases where particle dragging is the dominant failure mechanism. In these cases, characterized by the formation of one or more erosion channels acting as boreholes or wells, the flow net could change significantly along the width of the physical models suffering a dropdown where the erosion channels are located. As can be observed in Figure 8

Repeatability
The set of tests dedicated to analyzing repeatability was used to quantify the variability of the results. First, the hydraulic pressures were compared between tests of the same group. Here, we preferred to compare pressures for an early stage of failure, preferably before any major damage was observed to the downstream shoulder because, as failure progresses, the flow net also changes, especially for those cases where particle dragging is the dominant failure mechanism. In these cases, characterized by the formation of one or more erosion channels acting as boreholes or wells, the flow net could change significantly along the width of the physical models suffering a dropdown where the erosion channels are located. As can be observed in Figure 8  By observing the failure paths plotted in Figure 6, it can also be concluded that, in general, tests performed under the same group of tests presented the same trajectories. When failure was complete, groups A, B, C, D, and E resulted, respectively, in unit failure discharges around 0.0187 m 2 s −1 (mean) ± 0.0020 m 2 s −1 (one standard deviation), 0.0357 ± 0.0005, 0.0234 ± 0.0016, 0.0216 ± 0.0005, and 0.02 ± 0.0007. The ratios of the standard deviation to the mean, i.e., the coefficients of variation (CV), were, also respectively, 10.9%, 1.4%, 6.9%, 2.1%, and 3.7%, being the mean value of CV = 5.0% ± 3.9%. It can be stated that the results varied within reasonable ranges given the great amount of uncertainty associated with this kind of test.

The Effect of the Downstream Slope
The downstream slope was observed to greatly affect the failure mechanism. The chart presented in Figure 9 was plotted using only the results for the physical models wider than 1 m to avoid having the flume walls affecting the correct development of the erosion channels if that was the case. Also, we only used those models where one of the two mechanisms was dominant at the final stages of failure. In summary, a total of forty- By observing the failure paths plotted in Figure 6, it can also be concluded that, in general, tests performed under the same group of tests presented the same trajectories. When failure was complete, groups A, B, C, D, and E resulted, respectively, in unit failure discharges around 0.0187 m 2 s −1 (mean) ± 0.0020 m 2 s −1 (one standard deviation), 0.0357 ± 0.0005, 0.0234 ± 0.0016, 0.0216 ± 0.0005, and 0.02 ± 0.0007. The ratios of the standard deviation to the mean, i.e., the coefficients of variation (CV), were, also respectively, 10.9%, 1.4%, 6.9%, 2.1%, and 3.7%, being the mean value of CV = 5.0% ± 3.9%. It can be stated that the results varied within reasonable ranges given the great amount of uncertainty associated with this kind of test.

The Effect of the Downstream Slope
The downstream slope was observed to greatly affect the failure mechanism. The chart presented in Figure 9 was plotted using only the results for the physical models wider than 1 m to avoid having the flume walls affecting the correct development of the erosion channels if that was the case. Also, we only used those models where one of the two mechanisms was dominant at the final stages of failure. In summary, a total of forty-one physical models were used. The transition between mechanisms occurs for a range of Z dss varying roughly between 2.0 and 2.5. If we focus on the tests with Z dss = 2.2, it can be observed that eight out of the nine models failed by particle dragging and resulted in the formation of erosion channels. So, for practical purposes, steep slopes were defined to be those steeper than 2.2 and gentle slopes as those smoother than this value. This could be a simplistic way of categorizing between slopes given that the repose angle may most certainly affect it. Nonetheless, the small variation in these angles within the rockfill materials must be taken into account. be those steeper than 2.2 and gentle slopes as those smoother than this value. This could be a simplistic way of categorizing between slopes given that the repose angle may most certainly affect it. Nonetheless, the small variation in these angles within the rockfill materials must be taken into account.
The effect of the slope was also noticed in the hydraulic pressures, which increased as the slope was gentler for the same unit flow. This fact is shown in Figure 10, which presents the hydraulic pressures measured for the first discharge step in tests nº 87, 90, 94, and 95. All of these tests were of the type CPM/NIE, i.e., complete configurations without impervious element and, thus, throughflow. For relative distances of * = 0.2, 0.5, and 0.7, an embankment with a slope dss = 3.0 resulted, respectively, in hydraulic pressures 47.3%, 59.6%, and 78.9% higher than those resulting from an embankment with a slope dss = 1.5. Figure 9. Relationship between the failure mechanisms and the downstream slope ( dss ). The acronyms PD and MS refer to 'particle dragging' and 'mass sliding', respectively. Figure 9. Relationship between the failure mechanisms and the downstream slope (Z dss ). The acronyms PD and MS refer to 'particle dragging' and 'mass sliding', respectively.
The effect of the slope was also noticed in the hydraulic pressures, which increased as the slope was gentler for the same unit flow. This fact is shown in Figure 10, which presents the hydraulic pressures measured for the first discharge step in tests n o 87, 90, 94, and 95. All of these tests were of the type CPM/NIE, i.e., complete configurations without impervious element and, thus, throughflow. For relative distances of x * = 0.2, 0.5, and 0.7, an embankment with a slope Z dss = 3.0 resulted, respectively, in hydraulic pressures 47.3%, 59.6%, and 78.9% higher than those resulting from an embankment with a slope Z dss = 1.5.
Water 2022, 14, x FOR PEER REVIEW 14 of 29 one physical models were used. The transition between mechanisms occurs for a range of dss varying roughly between 2.0 and 2.5. If we focus on the tests with dss = 2.2, it can be observed that eight out of the nine models failed by particle dragging and resulted in the formation of erosion channels. So, for practical purposes, steep slopes were defined to be those steeper than 2.2 and gentle slopes as those smoother than this value. This could be a simplistic way of categorizing between slopes given that the repose angle may most certainly affect it. Nonetheless, the small variation in these angles within the rockfill materials must be taken into account.
The effect of the slope was also noticed in the hydraulic pressures, which increased as the slope was gentler for the same unit flow. This fact is shown in Figure 10, which presents the hydraulic pressures measured for the first discharge step in tests nº 87, 90, 94, and 95. All of these tests were of the type CPM/NIE, i.e., complete configurations without impervious element and, thus, throughflow. For relative distances of * = 0.2, 0.5, and 0.7, an embankment with a slope dss = 3.0 resulted, respectively, in hydraulic pressures 47.3%, 59.6%, and 78.9% higher than those resulting from an embankment with a slope dss = 1.5. Figure 9. Relationship between the failure mechanisms and the downstream slope ( dss ). The acronyms PD and MS refer to 'particle dragging' and 'mass sliding', respectively.    (Figure 8). Although pressures measured in test nº 85 [ dss = 2.2] were similar to that measured in test nº 87 (Figure 8), in this figure it was decided not to average them because in test nº 85 they correspond to a discharge of 0.005 m 3 s −1 . Figure 11 presents the evolution of the hydraulic pressure measured during tests nº 87, 94, and 95 in a single section of the physical models, roughly located at * = 0.3 (30% of the base length from the crest). Here, it can also be seen that, for a given unit flow, the gentler the slope, the higher the hydraulic pressures, at least in the early stages of the tests. In a given moment, i.e., for a given discharge step, pressures measured in the physical models with dss = 3.0 suffered a sudden decrease that matched the formation of an erosion channel, as can be observed in Figure 12. It must be noted that this pressure dropdown was not observed in test nº 87 ( dss = 2.2) even though the failure mechanism of this test was the same as that of tests nº 94 and 95 ( dss = 3.0), as can be observed in Figure 13. This observation denotes that the failure mechanism is not enough to explain the pressure dropdown and that other variable/s should also be considered-for example, the geometry and dimensions of the erosion channel.
The repeatability campaign groups C ( dss = 2.2), D ( dss = 1.5), and E ( dss = 3.0) were compared with each other to assess the possible effect of this variable on f . From a physical point of view, and considering only the range of slopes for which the dominant failure mechanism is mass sliding or slumping, it could be expected that the steeper slopes Here, it can also be seen that, for a given unit flow, the gentler the slope, the higher the hydraulic pressures, at least in the early stages of the tests. In a given moment, i.e., for a given discharge step, pressures measured in the physical models with Z dss = 3.0 suffered a sudden decrease that matched the formation of an erosion channel, as can be observed in Figure 12. It must be noted that this pressure dropdown was not observed in test n o 87 (Z dss = 2.2) even though the failure mechanism of this test was the same as that of tests n o 94 and 95 (Z dss = 3.0), as can be observed in Figure 13. This observation denotes that the failure mechanism is not enough to explain the pressure dropdown and that other variable/s should also be considered-for example, the geometry and dimensions of the erosion channel.
The repeatability campaign groups C (Z dss = 2.2), D (Z dss = 1.5), and E (Z dss = 3.0) were compared with each other to assess the possible effect of this variable on q f . From a physical point of view, and considering only the range of slopes for which the dominant failure mechanism is mass sliding or slumping, it could be expected that the steeper slopes within this range resist higher flow discharges than gentler slopes as a result of the higher hydraulic gradients that lead to lower phreatic surfaces and lower pressures. This hypothesis was confirmed in Figure 10. If we now expand the range of slopes and compare those equal to 1.5 and 3.0 through a one-sided test, we obtain that steep slopes completely fail for higher flow discharges than gentle slopes for a p-value = 0.086 and a power of 64.9%. We could accept the alternative hypothesis (steep slopes resist higher flow discharges) for a 0.1 significance level (α), thus having a 10% and 35.1% chance of committing Type I and II errors, respectively. When contrasting the slopes 2.2 and 3.0 also through a one-sided test, we obtain a p-value = 0.035 and a power of 74.0%. We could accept the alternative hypothesis even for a lower significance level, α = 0.05, thus having a 5% and 26.0% chance of committing Type I and II errors, respectively. Therefore, steeper slopes seem to resist higher unit discharges than gentler slopes, a trend that is also observed in Figure 14. Nonetheless, it must be noted that this tendency was not observed, for example, during the scale effect campaign, summarized in Table 3, nor when comparing slopes 1.5 and 2.2 also through a one-sided test, where we obtained a p-value = 0.955 and a power of 0.06% (for α = 0.1). These are contradictory results that should be examined more deeply in future investigations. They could be related more to testing variability (tests performed throughout different R&D projects, different laboratories, measurement techniques, discharge steps, etc.) rather than the variability of the results that we already saw to be small when repeating the same test in the same conditions. within this range resist higher flow discharges than gentler slopes as a result of the higher hydraulic gradients that lead to lower phreatic surfaces and lower pressures. This hypothesis was confirmed in Figure 10. If we now expand the range of slopes and compare those equal to 1.5 and 3.0 through a one-sided test, we obtain that steep slopes completely fail for higher flow discharges than gentle slopes for a − value = 0.086 and a power of 64.9%. We could accept the alternative hypothesis (steep slopes resist higher flow discharges) for a 0.1 significance level ( ), thus having a 10% and 35.1% chance of committing Type I and II errors, respectively. When contrasting the slopes 2.2 and 3.0 also through a one-sided test, we obtain a − value = 0.035 and a power of 74.0%. We could accept the alternative hypothesis even for a lower significance level, = 0.05, thus having a 5% and 26.0% chance of committing Type I and II errors, respectively. Therefore, steeper slopes seem to resist higher unit discharges than gentler slopes, a trend that is also observed in Figure 14. Nonetheless, it must be noted that this tendency was not observed, for example, during the scale effect campaign, summarized in Table 3, nor when comparing slopes 1.5 and 2.2 also through a one-sided test, where we obtained a − value = 0.955 and a power of 0.06% (for = 0.1). These are contradictory results that should be examined more deeply in future investigations. They could be related more to testing variability (tests performed throughout different R&D projects, different laboratories, measurement techniques, discharge steps, etc.) rather than the variability of the results that we already saw to be small when repeating the same test in the same conditions.     One last observation must be taken into consideration. Here phasize that we were dealing with slopes that were unstable in th i.e., we were dealing with physical models that failed. So, if a gentle ing, is not subjected to flow discharges capable of dragging its part ment will remain stable. On the other hand, the same dam construc unstable to slumping would fail, so in this case, gentler slopes wo

Other Effects
There are clear differences between the physical processes rela throughflow. Nonetheless, characteristics of throughflow such as atic surface elevation (including the first emergence point) are no between both types of flow [10]. So, because it is the water level One last observation must be taken into consideration. Here, we would like to emphasize that we were dealing with slopes that were unstable in throughflow conditions, i.e., we were dealing with physical models that failed. So, if a gentle slope, stable to slumping, is not subjected to flow discharges capable of dragging its particles, then this embankment will remain stable. On the other hand, the same dam constructed with a steeper slope unstable to slumping would fail, so in this case, gentler slopes would be more resistant.

Other Effects
There are clear differences between the physical processes related to overtopping and throughflow. Nonetheless, characteristics of throughflow such as pressures or the phreatic surface elevation (including the first emergence point) are not significantly different between both types of flow [10]. So, because it is the water level inside the rockfill dam and the position of the first emergence point that governs the failure of the downstream shoulder by determining how far it will progress, and since these do not change significantly when changing the type of flow, we would expect to obtain a similar failure discharge q f in each case, for example, CPM/UF vs. CPM/NIE or PPM/CC vs. PPM/NIE (Figure 1). Results of the statistical analysis performed for tests n o 140, 141, 142 (CPM/NIE) and n o 143 and 146 (CPM/UF) support this idea. The p-value = 0.027, resulting from a two-tailed test, allows us to reject the alternative hypothesis for α = 0.01 and claim that the type of flow does not affect q f . The statistical power of this analysis is about 93%, so we have a small probability of also committing the Type II statistical error. In other words, we have a 7% chance of rejecting the alternative hypothesis when it should be accepted. Besides this comparison, Figure 15 shows the relation between k * eq and q * f differentiating between tests performed with overtopping and throughflow. Differences between both scatter plots are negligible, especially in the lower part of the chart. other words, we have a 7% chance of rejecting the alternative hypothesis when it s be accepted. Besides this comparison, Figure 15 shows the relation between eq * a differentiating between tests performed with overtopping and throughflow. Diffe between both scatter plots are negligible, especially in the lower part of the chart. Those physical models where the differentiating variable was the type of impe element (central core or upstream face) were also compared. In this case, we could how expect to observe some differences in the value of f . The process through water infiltrates was the same, overtopping, but the flow paths inside the body of th were longer for dams with an upstream face, which did not necessarily imply a diff in the position of the first emergence point. Results from the statistical analysis perf on tests nº 22 and 123 (PPM/CC) and nº 124 and 128 (CPM/UF) indicate that th of impervious element does not have a significant impact on the value of f . The p-v 0.124 allows us to reject the alternative hypothesis for an = 0.05. The statistical is about 73%, so here we have a 27% chance of rejecting the alternative hypothesis it should be accepted. Crossing these results with Figure 16 makes us confident to for now, that the type of impervious element does not affect f . Figure 17 presen images of both tests and discharge steps used in the construction of the Figure 1 Besides this comparison, Figure 18 shows the relation between eq * an Figure 15. Variation of q * f with k * eq differentiating by the type of flow.
Those physical models where the differentiating variable was the type of impervious element (central core or upstream face) were also compared. In this case, we could somehow expect to observe some differences in the value of q f . The process through which water infiltrates was the same, overtopping, but the flow paths inside the body of the dam were longer for dams with an upstream face, which did not necessarily imply a difference in the position of the first emergence point. Results from the statistical analysis performed on tests n o 22 and 123 (PPM/CC) and n o 124 and 128 (CPM/UF) indicate that the type of impervious element does not have a significant impact on the value of q f . The p-value = 0.124 allows us to reject the alternative hypothesis for an α = 0.05. The statistical power is about 73%, so here we have a 27% chance of rejecting the alternative hypothesis when it should be accepted. Crossing these results with Figure 16 makes us confident to state, for now, that the type of impervious element does not affect q f . Figure 17 presents the images of both tests and discharge steps used in the construction of the Figure 16 plot. Besides this comparison, Figure 18 shows the relation between k * eq and q * f differentiating between tests performed with upstream face (UF) and central core (CC). Here, we also cannot see any clear separation between both scatter plots. differentiating between tests performed with upstream face (UF) and central core (CC).
Here, we also cannot see any clear separation between both scatter plots.   differentiating between tests performed with upstream face (UF) and central core (CC).
Here, we also cannot see any clear separation between both scatter plots.

Research Scope and Limitations
The inferential analysis seeks patterns from a population through samples of it. In this kind of analysis, the aphorism 'more is better' referring to the sample size is true because samples with a higher number of observations imply smaller confidence intervals and more reliable conclusions. But the truth is that researchers are most of the time far from this ideal goal working even with extremely small samples. In these cases, statistical power analysis assumes an important role by including the probability of committing Type II errors besides the traditional assessment of the Type I error [52]. Alongside size sampling, this study contains a series of other limitations, including, for example, the size of the physical models that could lead to scale effects, the use of uniform limestone materials, preventing a thorough analysis of the effect of other material parameters, and the small number of tests repeated in the same conditions.

Research Scope and Limitations
The inferential analysis seeks patterns from a population through samples of it. In this kind of analysis, the aphorism 'more is better' referring to the sample size is true because samples with a higher number of observations imply smaller confidence intervals and more reliable conclusions. But the truth is that researchers are most of the time far from this ideal goal working even with extremely small samples. In these cases, statistical power analysis assumes an important role by including the probability of committing Type II errors besides the traditional assessment of the Type I error [52]. Alongside size sampling, this study contains a series of other limitations, including, for example, the size of the physical models that could lead to scale effects, the use of uniform limestone materials, preventing a thorough analysis of the effect of other material parameters, and the small number of tests repeated in the same conditions.

A Regression Model for the Failure Discharge
Taking into account the discussion of the results presented in the previous sections and the scope and limitations of the research, we propose a regression model that can be used to estimate the failure discharge of a rockfill downstream shoulder (q f ), i.e., that overtopping or throughflow discharge with a failure degree that reaches the crest of the dam. This model depends on the equivalent Darcy's coefficient of permeability (k eq ), expressed by Equation (7), on the downstream slope (Z dss ) and height (H) of the dam, and on the acceleration of gravity (g). To be precise, two regression models were calibrated, one for 'steep' and one for 'gentle' slopes, with the critical slope Z dss = 2.2 for the granular materials used in this research and an angle of repose around 41 • . Both models pass through the origin since this is a non-tested data point, i.e., in the limit, a dam with k * eq = 0 should need no overtopping to fail. Equations (9) and (10) should be used for gentle and steep slopes, respectively. If no distinction is desired between these two categories, then Equation (11) should be used. The first two equations both have a coefficient of determination R 2 = 0.97, while the third uses 0.95. It should be noted that tests 114 and 129 were excluded from Equation (9) because these were outliers with Cook's distances of roughly 1 and 0.5, respectively, and that these calibrations included only those tests where the permeability of the materials was obtained. Equations (9) and (10) can be observed fitted to data in Figure 19.  Figure 19. Regression models proposed to estimate the failure discharge slopes ( dss < 2.2) and gentle slopes ( dss > 2.2). Figure 19. Regression models proposed to estimate the failure discharge (q f ) for dams with steep slopes (Z dss < 2.2) and gentle slopes (Z dss > 2.2).

Conclusions
This study allowed the identification of two different failure mechanisms for highly permeable rockfills subjected to overflowing: slumping and particle dragging. The occurrence of one or the other is heavily dependent on the slope of the embankment and the size of the particles. Slumping is related to a problem of global instability of a certain mass of material and is predominant in embankments with steep slopes (Z dss < 2.2). In these cases, failure of the downstream slope affected the entire width of the physical models. On the other hand, when particle dragging is the predominant failure mechanism, it means that we are dealing with slopes that are stable to slumping. In these cases, which are all about the individual stability of each particle when subject to seepage forces and hydraulic gradients, we observed the formation of erosion channels whose width was smaller than the total width of the physical models. The critical slope, i.e., the slope that defines the limit for the occurrence of one mechanism or the other, was identified to be Z dss = 2.2 for the characteristics of the materials used in this study.
The downstream slope was also observed to affect both the hydraulic pressures at the base of the physical models and the value of the failure discharge, i.e., the inflow that forces failure to reach the crest of the dam. For a discharge of 0.007 m 3 s −1 , the hydraulic pressures measured in a dam with Z dss = 3.0 were, on average, 61.9% higher than in a dam with Z dss = 1.5. Regarding failure, steep slopes (Z dss < 2.2) were prone to fail for higher discharges than gentle slopes (Z dss > 2.2) or, in other words, tended to be, on average, more resistant. Given this observation, two regression models were proposed to estimate the unit failure discharge (q f ) of dams with gentle and steep slopes, Equations (9) and (10), respectively, both functions of the equivalent Darcy's coefficient of permeability (k eq ), expressed by Equation (7), the downstream slope (Z dss ), the height of the embankment (H), and the acceleration of gravity (g). If no distinction is desired to be made between slopes, then q f should be estimated with Equation (11).
Other factors such as the type of flow (overtopping or throughflow) or the type of impervious element (central core or upstream face) were not observed to affect the hydraulic pressures or the failure discharge. Funding: This research was funded by the Spanish Ministry of Science and Innovation, grant number BIA2010-21350-C03-03 (Project EDAMS-Rotura del elemento impermeable de presas de materiales sueltos en situación de sobrevertido y análisis de protecciones combinando modelación física e inteligencia artificial), and by the Spanish Ministry of Economy and Competitiveness, grant number BIA2007-68120-C03-02 (Project XPRES-Caracterización de la rotura de las presas de escollera por sobrevertido y desarrollo de criterios para evaluar la seguridad del conjunto presa-área afectada durante una avenida).
Acknowledgments: Within the scope of the funding research projects, we would like to thank Ángel Lara, Rafael Cobo, Cristina Lechuga, Isabel Berga, and Hibber Campos for their participation in these projects.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Nomenclature
The following symbols and acronyms are used in this paper: