1. Introduction
Flood frequency analysis (FFA) is commonly used in hydrology to relate extreme river flow to its frequency of occurrence, i.e., to estimate flood quantiles [1]. The application of statistical methods in this analysis requires long-term, reliable gauged flow data [2], which are not always available. For instance, a known issue with gauged high flows is the uncertainty of the rating curve, which relates the water flow to the corresponding water stage, in the high-flow range [
3,
4]. The application of statistical methods also presumes homogeneous and stationary datasets, an assumption at odds with the increasing evidence for climate change [
5,
6], although some flood-related studies suggest the opposite [
7,
8,
9,
10]. Assuming that the datasets are homogeneous and reliable, the next question is whether the population distribution function and its parameters can be reliably estimated from the available data record length.
Some countries recommend specific procedures, distribution functions, and parameter estimation methods for FFA. In the USA, the Log Pearson 3 (LP3) function is used, with the parameters estimated by the expected moments algorithm (EMA) [11]; in the Russian Federation, either the Kritsky–Menkel, Pearson 3, or Log Normal distribution is used, depending on the ratio of the coefficient of skewness (Cs) to the coefficient of variation (Cv), with the parameters estimated by the approximate method of maximum likelihood (MLE) [
12], while in Germany, the choice of distribution function depends on the data record length [
13]. The use of three-parameter distributions in Germany is allowed for data records that are at least 50 years long. In addition, MLE, the method of ordinary moments, and L-moments are used to estimate the distribution parameters, without preference. It is assumed that all methods lead to approximately equal results; if this is not the case, other information must be included. In general, this is done through the temporal, spatial, and causal expansion of information to improve flood quantile estimation [
14,
15].
Spatial data transfer, the focus of this research, includes regionalization methods for estimating parameters or flood quantiles. It is most often used when the data record is short or completely absent (ungauged catchment/site). Spatial information expansion comprises spatial regionalization and statistical regionalization. Spatial regionalization involves the creation of envelope curves, specific runoff diagrams [
13], maps of quantiles or maps of a certain statistical parameter. Envelope curves and specific runoff diagrams show high flows in relation to the catchment area or a section of the river network and serve to verify the estimated peak flows. The extrapolation of envelope curves for areas outside the observation range is problematic [
13].
Statistical regionalization procedures involve determining the relationship between the parameters/quantiles of high flows and the morphological and/or meteorological characteristics of the catchment. When the flow data record is shorter than half of the required return period, regional statistics can help to improve the dataset statistics [
13]. In statistical regionalization procedures, two steps are crucial: the formation of homogeneous regions and the transfer of information, that is, the flood flow estimation. Many regionalization methods have been recommended in the literature, and they differ in how these two steps are carried out.
Assuming that similar processes take place in areas that are physically close to each other, geographically continuous regions are often considered homogeneous regions for flood flow estimation. This approach is subjective because neighboring areas do not always behave in a hydrologically similar way [
16]; hence, cluster methods [
17,
18,
19] or the Region of Influence (ROI) method [
20,
21] are used.
In the ROI and cluster analyses, it is necessary to select similarity attributes for grouping (regionalizing) catchments. Because they are easily available, geomorphological properties are most often used [22,23], followed by hydro-meteorological properties [24,25], the seasonality of peak flows [26,27], which has the advantage that its underlying data (the dates of occurrence of peak flows) are accurate and robust, the form of the distribution [27,28], etc.
Due to its practicality, the use of cluster analysis in flood regionalization is frequent [
17,
18]. Most often, clustering is performed using Ward’s algorithm (hierarchical approach) [
26,
29] or the k-means algorithm (partition approach) [
30,
31]. The advantage of the hierarchical approach lies in the way the results are presented (dendrogram), from which regions can be determined for an arbitrary distance or number of cluster centers (CCs). In the partition approach, this is not possible, and the number of CCs has to be specified in advance. As the result may depend on the initially assumed CCs, the algorithm needs to be run several times. In the hierarchical approach, an object (catchment) assigned to a CC at an early stage may prove to be poorly placed once the CC/region has been formed [17]. The fact that objects cannot migrate from one CC to another is the shortcoming of the hierarchical approach. To take advantage of both clustering methods, a hybrid cluster analysis [18] was proposed, in which a hierarchical approach is applied in the first step and its result provides the starting CCs for the partition approach in the second step.
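The two-step idea of the hybrid approach can be illustrated with a minimal, standard-library sketch. It is not the implementation used in the cited studies: the function names, the use of Euclidean distance on tuples of (already standardized) attributes, and the brute-force Ward merge are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Arithmetic mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def ward_clusters(points, k):
    """Step 1: merge clusters by Ward's criterion until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares caused by merging a and b
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * dist(centroid(a), centroid(b)) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Step 2: k-means started from the given centers, so objects can migrate."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster happens to lose all members
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def hybrid_clustering(points, k):
    """Ward's result supplies the starting centers for k-means."""
    seeds = [centroid(c) for c in ward_clusters(points, k)]
    return kmeans(points, seeds)
```

Because the starting centers come from the deterministic hierarchical step, repeated runs on the same input give the same partition, which removes the dependence on randomly assumed initial CCs.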
In the study area for this research, i.e., Bosnia and Herzegovina, specific runoff diagrams or regression that relates flood quantiles and catchment area are most often used for flood quantile estimation in ungauged catchments [
32]. It was shown that this approach can give significant under-/overestimation of flood quantiles and that geographically continuous regions of water basins are considered homogeneous regions without testing [
33,
34]. Other methods for flood data transfer used to improve flood quantile estimation included an extension of the gauged peak flow data based on gauged water stage, assumed rating curves [
35], and historic floods [
36]. A comparative presentation of statistical results, empirical expressions, and a geomorphological unit hydrograph using the EBA4SUB model is also considered [
37]. Regional analysis using the cluster method was performed for stations from Bosnia and Herzegovina with the addition of stations from Serbia, where the advantage of hierarchical (Ward’s algorithm) over partition (k-means) clustering was demonstrated in flood quantile estimation [
33].
In this study, another approach to forming regions for flood quantile estimation is investigated, which is fundamentally hierarchical. The adequacy of assigning objects to CCs is examined through the (mean) silhouette width value [
38], which is intended for determining the optimal number of CCs [
39] and as a measure of the region compactness [
40]. The advantage of silhouette width is that it does not require a training set to evaluate clustering results [
41]. Like the k-means algorithm, the proposed procedure is iterative. The important difference is that the k-means algorithm depends on the assumed initial CCs, while the proposed method does not. Because it relies on quantitative measures such as silhouette width, the same input data always give the same result.
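For readers unfamiliar with the measure, the following sketch computes the silhouette width of each object and the mean silhouette width of a partition from its standard definition, s(i) = (b − a) / max(a, b), where a is the mean distance to the other members of the object's own cluster and b is the smallest mean distance to any other cluster. The function names and the choice of Euclidean distance are assumptions; the sketch does not reproduce the clustering procedure proposed in this study.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_widths(points, labels):
    """Silhouette width of every object; requires at least two clusters."""
    widths = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            # Singleton cluster: s(i) is set to 0 by convention
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        # Smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def mean_silhouette_width(points, labels):
    """Mean silhouette width (MSW) of the whole partition."""
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)
```

A negative width flags an object that is, on average, closer to another cluster than to its own, which is why the measure can point to objects assigned to the wrong CC without any training set.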
The research question is as follows: is silhouette-width-induced clustering of catchments appropriate for the formation of flood estimation regions in a data-scarce environment? This research was conducted for the study region of Bosnia and Herzegovina and Serbia and is based on the findings from the initial study [
34]. The appropriateness of the silhouette-width-induced clustering was rated using the performance-based assessment that also included region homogeneity examination and flood information transfer by the index-flood method. The flood quantile estimation procedure for pseudo-ungauged catchments was conducted in the common way used in Bosnia and Herzegovina.
4. Discussion
There are two approaches in the comparison of regionalization methods: the first approach ends with the definition of homogeneous regions, and the second approach continues through information transfer down to the results of regionally assessed values [
55]. The second approach is a performance-based assessment, conducted here with a focus on the silhouette-width-induced clustering outcome in flood quantile estimates. This approach has its known drawbacks due to uncertainty accumulation as the assessment advances (e.g., [
17,
55]).
Clustering based on silhouette width has so far been performed in other disciplines on large data samples, resulting in fewer CCs/regions compared to the number used here for flood quantile estimation [
40].
Marková et al. [
56] identified three regions of flood seasonality among 556 Slovak and Austrian catchments, with the optimal number of clusters selected according to the mean silhouette width (MSW). It was not shown whether these regions are homogeneous.
The optimal number of regions here is likewise chosen by maximizing the MSW, over candidate values between 3 and 7 (Figure 5), while trying to keep the number of elements, i.e., HSs, in each region within the range recommended for hydrological applications [17,23].
Table 6 shows realized MSWs in six regions and their size.
According to Rousseeuw’s MSW ranges [
38], region/CC2_adj has a strong clustering structure (0.51 < MSW < 0.70), while the rest of the regions have weak clustering structures (0.26 < MSW < 0.50). The sizes of the individual regions are between the recommended 5 and 20.
The region size plays an important role in homogeneity determination [
17]. Small regions may lead to false homogeneity conclusions, large regions are rarely homogeneous, and comparison among regions of different sizes is complicated by the ‘regional homogeneity’ concept, which differs from the heterogeneity concept considered in other fields [55]. Therefore, Requena et al. [55] proposed using the Gini index (GI) as a regional homogeneity (heterogeneity) measure that outperforms the HW and AD tests in many respects. GI will be used in further research because the results obtained here for pseudo-ungauged HSs have shown that good Q100 estimates originate from both homogeneous and heterogeneous regions. In the _adj CCs, the conclusions of the HW and AD bootstrap tests are more consistent than in the _org CCs (Table 3), and the efficiency measures of MAF (Table 5), q100, and Q100 (Table 6) are better overall, although some values are significantly underestimated or overestimated.
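As a rough illustration of the Gini index mentioned above, the sketch below uses its basic mean-absolute-difference form. Which at-site statistic it is applied to (for example, at-site L-CV values) and which estimator the cited authors use are not stated in this section, so both are assumptions here, as is the function name.

```python
def gini_index(values):
    """Gini index as the mean absolute difference divided by twice the mean.

    Assumes non-negative values with a positive mean (e.g., a hypothetical
    list of at-site dispersion statistics for the stations of one region).
    """
    n = len(values)
    mean = sum(values) / n
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2.0 * n * n * mean)
```

The index is 0 when all sites share the same value and grows as the at-site values become more unequal, so it gives a continuous heterogeneity score rather than a pass/fail test outcome.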
Considering only the results from the homogeneous region CC2_adj, which also has the best MSW among all CCs and a balanced size, the percent bias in Q100 ranges from −50% to 40%, with one high outlier of 100%. The resulting bias in q100 ranges from −24% to 30%. Such results point to the poor performance of the catchment area as the attribute used in the flood transfer model.
A simple regression was used as the transfer model, with the catchment area as the independent variable. Despite significant values of R², large percent biases in MAF were observed in the subsequent jackknife procedure. It is assumed that this simple regression model is not adequate and that new variables should be introduced. The selection of new variables is not straightforward [
57,
58]. In a recent performance-based regionalization study for Ethiopia [
59], 27 catchment attributes were considered: 9 morphologic and location, 5 soil, 12 land use, and mean annual rainfall. In the three homogeneous regions out of the four defined, MAF regression equations were developed using a total of six attributes: three morphologic, two land use, and one soil. For these three regions, the MAF relative errors were 51%, 4%, and 17%. The flood quantile comparison was not shown. De Michele and Rosso [60] presented a multi-level approach to flood frequency regionalization based on physical and statistical criteria/attributes. Flood, streamflow, and rainfall seasonality were studied, and two indices were used to cluster catchments with the same flood production mechanism. In Northwestern Italy, an area similar to the one studied here, 80 HSs were used, and 4 regions were defined according to seasonality analysis. It appears that precipitation and flood characteristics are unavoidable candidates for further attribute inclusion in regionalization. Additionally, in FFA, it is assumed that the Cs of flood flow is bounded above by the Cs of precipitation [61]. This means that precipitation data can not only be used as an attribute but could also explain suspicious values of regionally assessed flood quantiles. Sometimes, in such cases, the values of the distribution parameters are limited to acceptable values [
62].
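The jackknife assessment of the simple MAF regression described in this paragraph can be sketched as follows. The functional form is an assumption: a power law MAF = a·A^b fitted by least squares in log space is used here only because it is a common choice, and the section does not state the form actually fitted. The sign convention of the percent bias (positive for overestimation) and all function names are likewise assumptions.

```python
from math import exp, log


def fit_power_law(areas, mafs):
    """Least-squares fit of ln(MAF) = ln(a) + b * ln(A); returns (a, b)."""
    x = [log(v) for v in areas]
    y = [log(v) for v in mafs]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return exp(my - b * mx), b


def percent_bias(estimate, observed):
    """Relative error in percent; positive values mean overestimation."""
    return 100.0 * (estimate - observed) / observed


def jackknife_percent_bias(areas, mafs):
    """Treat each station in turn as pseudo-ungauged and re-fit without it."""
    biases = []
    for i in range(len(areas)):
        train_areas = areas[:i] + areas[i + 1:]
        train_mafs = mafs[:i] + mafs[i + 1:]
        a, b = fit_power_law(train_areas, train_mafs)
        biases.append(percent_bias(a * areas[i] ** b, mafs[i]))
    return biases
```

A regression can show a high R² on the full sample and still produce large leave-one-out biases, because R² is computed with every station included in the fit while the jackknife bias is evaluated at a station the fit has not seen.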
The geological structure of the 53 catchments considered here is characterized by a wide range of karst content [
34]. It is known that karst potentially reduces or amplifies flood peaks and influences the catchment boundary due to the complex conditions for surface and groundwater flow [
63,
64]. The influence of the karst share on MAF prediction is implicitly shown in Figure 9, where MAF in catchments with a high karst share (>50%) is generally underestimated.
Apart from catchment attribute selection in MAF regression, the region size influences the reliability of regression coefficient estimation. It is shown here that small regions with five HSs may lead to ill-posed regression lines, such as in CC5 (Figure 8). McCuen [65] suggested using at least four times as many data points as parameters for reliable correlation estimation. In this regard, a simple regression model with two parameters requires eight data points (HSs), plus one for the (pseudo-)ungauged HS, meaning that nine HSs is the minimum size of a region. From this perspective, CC_adj 1, 2, and 6 and CC_org 2 have reliable MAF regression equations.
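The arithmetic behind this minimum region size can be written as a small helper. The function and parameter names are hypothetical; the factor of four and the extra pseudo-ungauged station are taken from the text above.

```python
def min_region_size(n_params, data_per_param=4, n_ungauged=1):
    """Stations needed to fit the regression plus those left out in the jackknife."""
    return data_per_param * n_params + n_ungauged
```

For the two-parameter simple regression this gives 4 × 2 + 1 = 9, and each additional regression attribute would raise the minimum by four stations under the same rule.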
In the regional flood growth curve estimations, the GEV distribution function is used for all regions. According to the L-moment ratio plots (Figure 11), other functions fit better for three CCs (_org 5, _adj 5, and _adj 3), but the relative biases in their q100 estimates are similar to those of other CCs (Figure 13). Other researchers have adopted different distribution functions for different regions (e.g., [59]), and this may contribute to improved results in future research. Due to input data issues (e.g., data gaps), the use of LP3 with the expected moments algorithm (EMA) should be reconsidered [11]. Better overall Q100 estimates were achieved in previous research [33] conducted using the Hydrologic Engineering Center’s Statistical Software Package (HEC-SSP), in which EMA is implemented. In that study, more HSs were included in addition to the 53 HSs used here, giving a study area of 74 HSs delineated into three regions. The study period was 1961–1990.
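To make the link between the regional growth curve and the at-site estimate concrete, the sketch below evaluates a GEV growth factor and scales it by the index flood. It assumes the parametrization commonly used with L-moments, x(F) = ξ + α(1 − (−ln F)^k)/k, with F = 1 − 1/T, and it assumes that MAF serves as the index flood; the function names are hypothetical, and the sign convention of the shape parameter used in this study is not stated in this section.

```python
from math import log


def gev_growth_factor(T, xi, alpha, k):
    """Dimensionless GEV quantile q_T for return period T (years).

    xi, alpha, and k are the location, scale, and shape parameters of the
    regional growth curve; k = 0 reduces to the Gumbel distribution.
    """
    F = 1.0 - 1.0 / T      # non-exceedance probability
    y = -log(F)
    if abs(k) < 1e-12:
        return xi - alpha * log(y)
    return xi + alpha * (1.0 - y ** k) / k


def index_flood_quantile(maf, growth_factor):
    """Index-flood transfer: Q_T = index flood (here MAF) times q_T."""
    return maf * growth_factor
```

Under these assumptions, an error in Q100 combines the error in the MAF regression with the error in the regional q100, which is why the two are reported separately in the figures.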
The silhouette width of an individual HS has been shown to be a good indicator of assignment to the wrong region, but not of q100 or Q100 estimate quality in the circumstances presented here. The silhouette widths of HS 16 and HS 52 are among those indicating a high clustering structure (>0.50). When these HSs are treated as pseudo-ungauged in the jackknife procedure, the region is homogeneous (Figure 7), but their q100 estimates are among the most overestimated (~50%) (Figure 13). The percent biases in q100 or Q100 estimates in the region CC_adj 2, which holds most of the highest silhouette widths among HSs, are slightly less spread than in other CCs (Figure 13 and Figure 14). Still, the HSs with the highest silhouette widths are not those with the best q100 or Q100 estimates.