Article

Statistical and Physical Significance of Homogeneous Regions in Regional Flood Frequency Analysis

1 School of Engineering, Design and Built Environment, Building Penrith Campus, Western Sydney University, Penrith, NSW 2747, Australia
2 Department of Electrical and Computer Engineering, North South University, Dhaka 1229, Bangladesh
3 CSIRO Environment, Black Mountain, Canberra, ACT 2601, Australia
4 Translational Health Research Institute, School of Medicine, Western Sydney University, Campbelltown, NSW 2560, Australia
* Author to whom correspondence should be addressed.
Water 2025, 17(12), 1799; https://doi.org/10.3390/w17121799
Submission received: 19 April 2025 / Revised: 3 June 2025 / Accepted: 9 June 2025 / Published: 16 June 2025

Abstract

This study investigates the formation of homogeneous regions in regional flood frequency analysis (RFFA) and compares two RFFA methods: the quantile regression technique (QRT) and the index flood method (IFM). A total of 201 gauged stations in southeast Australia were adopted. Multivariate statistical techniques were applied to form candidate regions, and regions were also formed in the L-moments space, i.e., the space of the L coefficient of variation (LCV) and L coefficient of skewness (LSK) of the annual maximum flood data. Hosking and Wallis test statistics were used to identify discordant sites and to test the homogeneity of the assumed regions. No homogeneous regions could be formed in southeast Australia based on catchment characteristics data; however, homogeneous regions can be formed in the L-moments space. Regions formed in the L-moments space were found to have little link with the catchment characteristics data space. The QRT provides more accurate flood quantile estimates than the IFM.

1. Introduction

Floods are natural disasters that cause significant losses to economies and livelihoods [1,2,3]. In the USA, floods account for more than 75 percent of federal disaster declarations [4]. Flood risk assessment is essential for minimizing flood damage [5], and design flood estimation underpins risk-based planning and design [6,7,8,9], including the design of hydraulic structures [10], floodplain management, and water resource planning [11].
Design flood estimation at a given location requires recorded flood data of sufficient length and good quality. However, at many locations of interest, recorded flood data are non-existent, too short, or of poor quality. In these cases, regional flood frequency analysis (RFFA) is adopted, which aims to transfer flood characteristics from a homogeneous region to an ungauged location [12,13,14].
The index-flood method (IFM) is widely used in RFFA [15]. The IFM assumes that flood distributions within a homogeneous region differ only by a scaling factor. The L-moments-based IFM has become popular in RFFA because L-moments provide more accurate estimates of distributional parameters than ordinary product moments [16]. The parameter regression technique (PRT) and the quantile regression technique (QRT), which do not strictly depend on the homogeneity assumption, are also widely used in RFFA [17,18].
In RFFA, one of the most challenging steps is identifying homogeneous regions [19]. Regions in RFFA are generally formed in one of two ways: (a) fixed regions, where regions are formed based on geographic boundaries; and (b) the region of influence (ROI) approach, where a local region is formed around the site of interest either in the catchment data space or in geographical space [20].
Statistical tests exist to assess the degree of homogeneity of a hypothesized region [16,21], and these tests have been widely adopted in RFFA. However, in some cases homogeneous regions cannot be established; in Australia, for example, attempts to form homogeneous regions have not been successful [22,23,24,25]. Few studies have examined the physical significance (e.g., geographical contiguity) of homogeneous regions. To fill this knowledge gap, this study addresses the following questions: (i) Do regions formed based on catchment area and drainage division yield homogeneous regions in southeast Australia? (ii) Do regions formed in the space of flood characteristics have a meaningful link with catchment characteristics? (iii) In a heterogeneous region such as southeast Australia, are regression-based RFFA techniques (such as the QRT) preferable to the IFM?
This study forms regions in four ways: (i) by catchment size; (ii) by drainage division; (iii) by basin proximity; and (iv) in the L-moments space. The regions formed in the L-moments space are then compared with the catchment characteristics data space and with geographical space. Furthermore, this study compares two RFFA methods: the QRT (in which the homogeneity assumption is relaxed) and the IFM (in which regional homogeneity is a prerequisite).

2. Data and Methodology

The adopted methodology is illustrated in Figure 1.

2.1. Study Area

The States of New South Wales (NSW) and Victoria (VIC) in southeast Australia were selected as the study area. A total of 201 gauged stations were selected, as shown in Figure 2. This part of Australia was chosen because it has better-quality streamflow data than other parts of the country [23,24]. The selected catchments are predominantly natural and are not subject to major urbanization or artificial storage [22,25]. Geographically, the catchments lie between latitudes 38.76° S and 28.36° S and longitudes 141.47° E and 153.50° E. The area has highly variable hydrology and a range of climates, including cool, hot, dry, humid, arid, and semi-arid. Bates et al. [22] suggested that the weather of the study region is controlled by an extratropical high-pressure system moving from west to east, which generates heavy rainfall and thunderstorms. The study area is divided into two main drainage divisions, the Murray–Darling and the South-East Coast, by a vast mountain range known as the Great Dividing Range (GDR), which separates the coastal areas from the inland plains. The Snowy Mountains and the Victorian Alps, the highest mountains on the Australian mainland, are part of the GDR and fall within the study area.

2.2. Exploratory Data Analysis

Based on the recommended criteria for selecting catchment characteristics [26,27], this study used eight catchment and climatic characteristics as predictor variables: (i) catchment area (AREA, km²); (ii) design rainfall intensity for a 6 h duration and 1 in 2 AEP (I62, mm/h); (iii) mean annual rainfall (MAR, mm); (iv) shape factor (SF); (v) mean annual evapotranspiration (MAE, mm); (vi) stream density (SDEN, km⁻¹); (vii) slope of the central 75% of the mainstream (S1085, m/km); and (viii) fraction of catchment area forested (FOREST, fraction). The predictor data were obtained from the Australian Bureau of Meteorology (BoM) website and from Australian Rainfall and Runoff Revision Project 5: Regional Flood Methods [28]. Most of these predictors have also been adopted in many Australian RFFA studies [22,25,27,29,30,31].

Catchment and Climate Characteristics

The catchment area (AREA) ranges from 3.00 to 1010.00 km² with a mean of 333.99 km² and a standard deviation (SD) of 262.40 km². Only one station has an AREA of 1010 km², and the majority of stations (n = 138, ~69%) have an AREA below 400 km². The shape factor (SF) lies between 0.26 and 1.63 (mean ± SD: 0.78 ± 0.21) with a median of 0.78. The mean and median stream density (SDEN) of the selected catchments are 2.10 km⁻¹ and 1.69 km⁻¹, respectively (SD = 1.06 km⁻¹). The mean, median, and SD of the mainstream slope (S1085) are 13.19 m/km, 9.50 m/km, and 11.67 m/km, respectively. The forest fraction (FOREST) varies from 0 to 1, with a mean of 0.55 and a median of 0.59 (SD = 0.34).
Record lengths of the streamflow data range from 25 to 89 years (mean ± SD: 45 ± 9.63 years) and lie between 40 and 50 years for the majority of stations (n = 137, ~68%). Additional summary statistics for the assumed regions, including the L-moments of the annual maximum (AM) flood data (L-coefficient of variation, LCV, and L-coefficient of skewness, LSK), are given in Figure 3 and in the Supplementary Section (see Table S1).
Data for the 6 h, 1 in 2 AEP rainfall intensity (I62) and the mean annual evapotranspiration (MAE) were extracted from the BoM website. I62 ranges from 24.60 to 87.30 mm/h with a median of 37.30 mm/h (mean ± SD: 39.16 ± 10.07 mm/h). The maximum and minimum values of MAR are 1953.23 mm and 484.39 mm, respectively (mean ± SD: 962.26 ± 314.47 mm); for MAE, they are 1543.30 mm and 925.90 mm (mean ± SD: 1117.96 ± 129.31 mm). Region-specific summary statistics of the predictor variables are presented in Section 3.5.

2.3. Formation of Regions and Testing for Homogeneity

2.3.1. Homogeneous Region Identification

In identifying homogeneous regions, the fixed region approach was applied in several ways. First, the stations were divided into five groups by AREA: G-A1 (0 to 50 km²), G-A2 (51 to 108 km²), G-A3 (112 to 200 km²), G-A4 (>200 to 500 km²), and G-A5 (>500 km²). Figure 3 shows the spatial distribution of the 201 stations by AREA category. Second, two groups were formed by drainage division (DD): group 1 (G-D1) consists of the stations whose DD number starts with "2", and group 2 (G-D2) of those whose DD number starts with "4". Each of these two groups was further split by State: under DD "2", G-D3 (within NSW) and G-D4 (within VIC); under DD "4", G-D5 (within NSW) and G-D6 (within VIC). In addition, the northern and southern parts of NSW within DD "2" formed two further regions (G-D7 and G-D8), and the eastern and western parts of VIC within DD "2" formed G-D9 and G-D10. The same approach applied to DD "4" gave four regions: G-D11, G-D12, G-D13, and G-D14. Third, eight groups were formed from neighboring basins: G-B1, G-B2, G-B3, G-B4, G-B5, G-B6, G-B7, and G-B8. Finally, the LCV-LSK space was split to form (a) the single largest homogeneous region (G-LMA); (b) the two largest homogeneous regions (G-LMB1 and G-LMB2); (c) the three largest homogeneous regions (G-LMC1, G-LMC2, and G-LMC3); and (d) the four largest homogeneous regions (G-LMD1, G-LMD2, G-LMD3, and G-LMD4).
Multivariate statistical techniques, namely principal component analysis (PCA) and cluster analysis (Ward's method and K-means clustering), were applied to investigate the geographical coherence of the four largest L-moments-based groups (n = 174) and their agreement with the catchment characteristics data. The groups on the plane of principal components 1 and 2 are named QR1, QR2, QR3, and QR4. Ward's method gives four groups (WMR1, WMR2, WMR3, and WMR4), as does K-means clustering (KMR1, KMR2, KMR3, and KMR4). The spatial distribution of the regions formed based on L-moments, DD, and basin neighborhood can be seen in the Supplementary Section (see Figures S1–S4). A detailed description of group formation, including some basic statistics, is given in Table 1.

2.3.2. Testing Homogeneity

Among the various heterogeneity tests used in RFFA, the Hosking and Wallis test [21,32] is the most widely used and was adopted in this study. The test statistics proposed by Hosking and Wallis [16] include (a) the discordancy measure (Di) to detect unusual sites; (b) the heterogeneity measure (H) to test homogeneity; and (c) the goodness-of-fit measure (|Z|Dist) to identify the best-fit distribution(s) for a proposed region. The H-statistics (H1, H2, and H3) are estimated from dimensionless L-moment ratios [16]: H1 is based on the LCV, H2 on the LCV and LSK, and H3 on the LSK and L-kurtosis.
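Each site's L-moment ratios can be estimated from its AM flood series through the unbiased probability-weighted moments b0 to b3: l1 = b0, l2 = 2b1 − b0, l3 = 6b2 − 6b1 + b0, and l4 = 20b3 − 30b2 + 12b1 − b0, giving LCV = l2/l1, LSK = l3/l2, and L-kurtosis = l4/l2. The Python sketch below is illustrative only; the function name `sample_lmoment_ratios` is hypothetical, and this is not the software used in this study.

```python
def sample_lmoment_ratios(data):
    """Return (l1, LCV, LSK, LKUR) for one site's annual maximum series.

    Uses unbiased probability-weighted moments b0..b3 of the ascending-sorted sample.
    """
    x = sorted(float(v) for v in data)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are needed for L-kurtosis")

    def pwm(r):
        # b_r = (1/n) * sum_{j=r+1}^{n} [(j-1)...(j-r)] / [(n-1)...(n-r)] * x_(j)
        total = 0.0
        for j in range(r + 1, n + 1):
            weight = 1.0
            for k in range(1, r + 1):
                weight *= (j - k) / (n - k)
            total += weight * x[j - 1]
        return total / n

    b0, b1, b2, b3 = (pwm(r) for r in range(4))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2 / l1, l3 / l2, l4 / l2
```

A symmetric sample such as [1, 2, 3, 4, 5] gives l1 = 3, LCV = 1/3, and LSK = L-kurtosis = 0, whereas a right-skewed flood series gives a positive LSK.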
Hosking and Wallis [16] flag a site as discordant if its discordancy measure Di ≥ 3.00. Di identifies sites whose sample L-moment ratios differ markedly from those of the bulk of the sites in a region. According to Hosking and Wallis [16], a region is acceptably homogeneous if H < 1, possibly heterogeneous if 1 ≤ H < 2, and definitely heterogeneous if H ≥ 2. Based on the goodness-of-fit measure (Z), a distribution is considered an acceptable fit for an assumed region if |Z|Dist ≤ 1.64.
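The discordancy measure and the decision rules above can be sketched in NumPy as follows. Following Hosking and Wallis, each site is represented by its vector u_i of L-moment ratios (LCV, LSK, and L-kurtosis), and Di = (N/3)(u_i − ū)ᵀA⁻¹(u_i − ū), where A is the sum of the outer products of the deviations. This makes Di a Mahalanobis-type distance whose average over the region is exactly 1. Computing H itself requires simulating many regions (typically 500) from a fitted four-parameter kappa distribution, which is beyond this sketch, so only its classification rule is shown. All function names are hypothetical.

```python
import numpy as np

def discordancy(ratios):
    """Hosking-Wallis discordancy D_i for each site.

    ratios: (N, 3) array whose rows are a site's [LCV, LSK, L-kurtosis].
    """
    u = np.asarray(ratios, dtype=float)
    n_sites = u.shape[0]
    dev = u - u.mean(axis=0)            # u_i - u_bar
    A = dev.T @ dev                     # sum of outer products (3 x 3)
    A_inv = np.linalg.inv(A)
    # D_i = (N/3) (u_i - u_bar)^T A^{-1} (u_i - u_bar), one value per site
    return n_sites / 3.0 * np.einsum("ij,jk,ik->i", dev, A_inv, dev)

def flag_discordant(D, threshold=3.0):
    """Boolean mask of sites with D_i >= threshold."""
    return np.asarray(D) >= threshold

def classify_heterogeneity(H):
    """Hosking-Wallis decision rule for a heterogeneity measure H."""
    if H < 1:
        return "acceptably homogeneous"
    if H < 2:
        return "possibly heterogeneous"
    return "definitely heterogeneous"

def acceptable_distribution(Z, critical=1.64):
    """Goodness-of-fit rule: a candidate distribution is acceptable if |Z| <= 1.64."""
    return abs(Z) <= critical
```

The 3.0 threshold applies to regions of 15 or more sites; Hosking and Wallis tabulate lower critical values for smaller regions.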

2.3.3. Multivariate Statistical Analysis

To compare the groups formed in the LCV-LSK space (the four largest homogeneous regions, n = 174; see Section 2.3.1) with the catchment data, principal component analysis (PCA) and cluster analysis were applied. This analysis examined whether there is any association between the LCV-LSK space and the catchment and climate characteristics data space.
Principal Component Analysis (PCA)
PCA is a data-reduction technique that transforms an original set of variables into a new set of mutually uncorrelated variables, called principal components (PCs), arranged in decreasing order of importance. The first PC (PC1) is the linear combination of the original variables that captures as much of the variation in the original data set as possible. The second PC (PC2) captures the maximum remaining variability subject to being uncorrelated with PC1, and so on. The number of PCs equals the number of original input variables in the data set.
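A minimal PCA sketch is shown below, assuming the predictors are standardized to z-scores before decomposition (the paper does not state whether the covariance or the correlation matrix was used). The singular value decomposition of the standardized data gives PC scores that are mutually uncorrelated, with explained-variance ratios in decreasing order and one PC per input variable. The function name `pca` is hypothetical.

```python
import numpy as np

def pca(X):
    """PCA of standardized variables via SVD.

    Returns (scores, explained_variance_ratio, loadings); rows of `loadings`
    are the PC directions, ordered by decreasing variance.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-scores: correlation-matrix PCA
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)   # s is sorted in decreasing order
    variances = s ** 2 / (len(X) - 1)                  # variance captured by each PC
    scores = Z @ Vt.T                                  # one column of scores per PC
    return scores, variances / variances.sum(), Vt
```

Plotting `scores[:, 0]` against `scores[:, 1]` and labelling each site by its L-moments group produces the kind of PC1-PC2 scatter plot discussed in Section 3.4.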
Cluster Analysis
The main goal of cluster analysis is to form mutually exclusive and exhaustive groups of observations such that observations within a group are similar (homogeneous) and observations in different groups are dissimilar (heterogeneous). Cluster analysis covers a wide range of techniques, almost all of which require a measure of similarity or dissimilarity between observations. In this study, we adopted (i) Ward's method [33], a hierarchical agglomerative method, and (ii) K-means clustering, a centroid-based method. Ward's method is a minimum-variance method that provides reasonable groupings in most cases [34,35]: at each stage, it merges the pair of clusters from the previous stage whose merger gives the smallest increase in the total within-cluster sum of squares. K-means clustering, the most widely used and simplest centroid-based method, is an unsupervised machine learning algorithm [36,37]. It partitions a dataset of n observations into k groups (clusters) by minimizing the within-cluster variance, i.e., the sum of squared Euclidean distances between observations and their cluster mean, rather than plain Euclidean distances. The objects are thus classified so that similarity within clusters is as high as possible (high intra-class similarity) and similarity between clusters is as low as possible (low inter-class similarity). Each cluster is represented by the mean (centroid) of its points. In hydrology, Ward's method and K-means clustering are commonly used because of their consistent performance [38,39,40,41].
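The two alternating K-means steps, assigning each observation to its nearest centroid and then moving each centroid to the mean of its members, can be sketched in NumPy as below. The k-means++ seeding, the number of restarts, the iteration cap, and the function names are illustrative choices, not details reported in the study.

```python
import numpy as np

def _lloyd(X, k, rng, n_iter):
    """One K-means run: k-means++ seeding followed by Lloyd iterations."""
    # k-means++ seeding: each new centre is drawn with probability proportional
    # to its squared distance from the nearest centre already chosen
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        C = np.array(centroids)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centroids = np.array(centroids)

    labels = None
    for _ in range(n_iter):
        # assignment step: nearest centroid in squared Euclidean distance
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no reassignment: converged
        labels = new_labels
        # update step: each centroid moves to the mean of its members
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    wcss = float(((X - centroids[labels]) ** 2).sum())  # within-cluster sum of squares
    return labels, centroids, wcss

def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Best of n_init K-means runs, judged by the within-cluster sum of squares."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    runs = [_lloyd(X, k, rng, n_iter) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])
```

Restarts guard against poor local optima, since K-means only converges to a local minimum of the within-cluster sum of squares. For Ward's method, SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` followed by `fcluster(Z, t=4, criterion="maxclust")` cuts the dendrogram into four groups.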

2.3.4. Prediction Model Development

In RFFA, at-site flood frequency analysis (FFA) is the first step, in which an appropriate probability distribution is fitted to the AM flood data. In this study, the log-Pearson Type 3 (LP3) distribution was fitted with a Bayesian procedure using FLIKE software version 1.0 [42], which has been incorporated in Australian Rainfall and Runoff (ARR) [42,43,44]. Previous Australian studies found the LP3 distribution to be the most appropriate for at-site FFA for the majority of Australian catchments [28,45], so it was adopted here. Flood quantiles were estimated for six return periods (T = 2, 5, 10, 20, 50, and 100 years, i.e., annual exceedance probabilities (AEPs) of 1 in T), which were used as dependent variables in prediction model development.
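The mechanics of an LP3 quantile can be illustrated with a simpler method-of-moments fit. The log10 flows are summarized by their mean, standard deviation, and bias-adjusted skewness, and the Pearson Type 3 frequency factor K_T is approximated with the Wilson–Hilferty transformation, so that log10 Q_T = mean + K_T × SD. This is a sketch only: it is not the Bayesian FLIKE procedure used in this study, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def lp3_frequency_factor(g, T):
    """Wilson-Hilferty approximation to the Pearson Type 3 frequency factor K_T."""
    z = NormalDist().inv_cdf(1 - 1 / T)  # standard normal quantile for AEP = 1/T
    if abs(g) < 1e-6:
        return z  # zero skew: Pearson III reduces to the normal distribution
    k = g / 6
    return (2 / g) * ((1 + k * z - k * k) ** 3 - 1)

def lp3_quantiles(am_flows, return_periods=(2, 5, 10, 20, 50, 100)):
    """Method-of-moments LP3 quantiles from an annual maximum flow series.

    Fits the mean, standard deviation, and bias-adjusted skew of log10 flows;
    this is NOT the Bayesian FLIKE fit used in the study.
    """
    x = [math.log10(q) for q in am_flows]
    n = len(x)
    if n < 3:
        raise ValueError("at least three floods are needed to estimate skew")
    m, s = mean(x), stdev(x)
    g = n * sum((xi - m) ** 3 for xi in x) / ((n - 1) * (n - 2) * s ** 3)
    return {T: 10 ** (m + lp3_frequency_factor(g, T) * s) for T in return_periods}
```

The returned dictionary holds one quantile per return period, mirroring the six at-site quantiles that serve as dependent variables in the regional models.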
Among the various regional flood estimation models, the QRT is widely used because of its simplicity and good performance [46,47,48]. The QRT was proposed by the United States Geological Survey (USGS) to develop prediction equations [49]. In this study, the ordinary least squares (OLS) method was used to estimate the regression coefficients. The PRT is an alternative to the QRT in which the parameters of a selected probability distribution (such as LP3) are regionalized [28]. The QRT was adopted here because previous studies showed that the QRT and PRT perform similarly in southeast Australia [28].
Mathematically, the QRT can be expressed as:
$$Q_T = a\,B^{b}\,C^{c}\,D^{d}\cdots \tag{1}$$
where QT is the flood quantile for T years return period; B, C, D, … refer to the catchment characteristics (predictors), and a, b, c, … are regression coefficients estimated by the OLS method.
Substituting the selected predictors and taking base-10 logarithms, Equation (1) becomes:
$$\log_{10}(Q_T) = b_0 + b_1\log_{10}(\mathrm{AREA}) + b_2\log_{10}(\mathrm{I62}) + b_3\log_{10}(\mathrm{MAR}) + b_4\log_{10}(\mathrm{SF}) + b_5\log_{10}(\mathrm{MAE}) + b_6\log_{10}(\mathrm{SDEN}) + b_7\log_{10}(\mathrm{S1085}) + b_8\log_{10}(\mathrm{FOREST}) \tag{2}$$
where QT is the flood quantile for an AEP of 1 in T.
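Because Equation (2) is linear in the log-transformed predictors, its OLS coefficients can be obtained with an ordinary least-squares solve. The sketch below assumes a NumPy array `X` whose columns are the predictors in the order of Equation (2), together with a vector of at-site quantiles `q_T` for one return period. The function names are hypothetical. FOREST ranges from 0 to 1 in this dataset, so log10(FOREST) is undefined where FOREST = 0; the paper does not state how such values were handled, and the sketch therefore rejects non-positive inputs rather than guess.

```python
import numpy as np

def design_matrix(X):
    """Intercept column followed by log10 of each predictor column."""
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("log10 needs strictly positive predictors (e.g. FOREST = 0 fails)")
    return np.column_stack([np.ones(len(X)), np.log10(X)])

def fit_qrt(X, q_T):
    """OLS estimates of b0..bk in log10(Q_T) = b0 + sum_k b_k * log10(x_k)."""
    A = design_matrix(X)
    y = np.log10(np.asarray(q_T, dtype=float))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_qrt(coef, X):
    """Back-transform the fitted log-linear model to flood quantiles."""
    return 10 ** (design_matrix(X) @ coef)
```

With all eight predictors, `X` has eight columns and `fit_qrt` returns b0 to b8; one such model is fitted for each return period. Note that b0 = log10(a) in Equation (1).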
The IFM was first proposed by Dalrymple [15] and is explained in detail by Hosking and Wallis (1997) [50]. It is considered one of the most efficient [51] RFFA methods [46,52,53] and is based on the hypothesis that floods at the various sites of a region are identically distributed except for a site-specific scaling factor, the index flood, which is generally a function of catchment physiographic characteristics. The mean or median of the annual maximum flood (AMF) data is generally used as the scaling factor [54]. The IFM has been applied by many researchers [34,47,55,56,57,58,59,60]. Bobee et al. [46] suggested that, despite some critiques of the IFM, the general procedure proposed by Dalrymple [15] for delineating homogeneous regions remains one of the most popular approaches to RFFA.
Following Mosaffaie [61], the IFM can be described as follows. Suppose data are available at N sites, with site i having sample size $n_i$ and observed AMF data $Q_{ij}$ (j = 1, …, $n_i$). Let $Q_i(F)$, 0 < F < 1, be the quantile function of the frequency distribution at site i. The key assumption of the IFM is that the sites form a homogeneous region, that is, the frequency distributions of the N sites are identical apart from a site-specific scaling factor, the index flood:
$$Q_i(F) = \mu_i\, q(F), \quad i = 1, \ldots, N, \qquad \text{or} \qquad Q_i^{T} = \mu_i\, q_T$$
where $Q_i^T$ is the flood quantile corresponding to a T-year return period at site i. The index flood is naturally estimated by $\hat{\mu}_i = \bar{Q}_i$, the sample mean of the AMF data at site i; other location estimators, such as the median or a trimmed mean, could be used instead. Here, $\mu_i$ is the mean of the at-site frequency distribution, $q(F)$ is the regional quantile (regional growth curve) for non-exceedance probability F, and $q_T$ is the regional quantile for return period T [61].
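A simplified index-flood calculation is sketched below. It assumes the index flood is the at-site sample mean and forms the regional growth factor q_T as a record-length-weighted average of the dimensionless at-site quantiles Q_i^T/μ_i. This weighting is an illustrative assumption: Hosking and Wallis instead average the at-site L-moment ratios and fit a regional distribution. At an ungauged site, the index flood must itself be predicted, for example by regression on catchment characteristics. All function names are hypothetical.

```python
from statistics import mean

def index_flood(am_series):
    """Index flood mu_i: the sample mean of a site's annual maximum flood series."""
    return mean(am_series)

def regional_growth_factor(at_site_qt, index_floods, record_lengths):
    """Record-length-weighted average of the dimensionless quantiles Q_i^T / mu_i."""
    weighted = sum(n * q / mu for q, mu, n in zip(at_site_qt, index_floods, record_lengths))
    return weighted / sum(record_lengths)

def ifm_quantile(mu_target, q_t):
    """Index-flood estimate Q^T = mu * q_T at the target site."""
    return mu_target * q_t
```

For example, two sites with Q^T/μ of 2 and 3 and record lengths of 10 and 30 years give q_T = 2.75, so a target site with an index flood of 100 m³/s would receive Q^T = 275 m³/s.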

2.3.5. Validation Approach and Evaluation Criteria

In RFFA, the leave-one-out (LOO) validation approach is commonly used to validate the developed prediction equations [62]. In this procedure, one catchment is left out, a prediction equation is developed from the remaining catchments, and the equation is then tested on the left-out catchment; the procedure is repeated for every catchment in a given region. Using the LOO technique, the following seven evaluation criteria were used to evaluate the predictions for the assumed regions:
$$\text{Relative error (RE)} = \frac{Q_{pred} - Q_{obs}}{Q_{obs}} \times 100$$
$$\text{Absolute median relative error (AMRE)} = \mathrm{median}\left[\left|\frac{Q_{pred} - Q_{obs}}{Q_{obs}}\right|\right] \times 100$$
$$\text{Mean square error (MSE)} = \mathrm{mean}\left[(Q_{pred} - Q_{obs})^2\right]$$
$$\text{Bias (BIAS)} = \mathrm{mean}\left[Q_{pred} - Q_{obs}\right]$$
$$\text{Relative bias (RBIAS)} = \mathrm{mean}\left[\frac{Q_{pred} - Q_{obs}}{Q_{obs}}\right] \times 100$$
$$\text{Relative root mean square error (RRMSE)} = \frac{\sqrt{\mathrm{mean}\left[(Q_{pred} - Q_{obs})^2\right]}}{\mathrm{mean}\left[Q_{obs}\right]}$$
$$\text{Root mean square normalized error (RMSNE)} = \sqrt{\mathrm{mean}\left[\left(\frac{Q_{pred} - Q_{obs}}{Q_{obs}}\right)^2\right]}$$
The at-site flood quantiles (Qobs) were obtained by fitting the LP3 distribution to the AMF data (see Section 2.3.4), and the predicted flood quantiles (Qpred) were estimated by applying the developed prediction equations (Equation (2)). R was used to perform the regression analysis, and Python 3.11.4 was used to prepare the spatial maps.
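The seven criteria and the LOO loop can be sketched in a few lines of NumPy. In the sketch, `fit` and `predict` are hypothetical callables standing in for any regional model, for example an OLS fit of Equation (2). This is an illustration, not the R code used in the study.

```python
import numpy as np

def evaluation_metrics(q_obs, q_pred):
    """The seven criteria of Section 2.3.5 for paired observed/predicted quantiles."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    err = q_pred - q_obs
    rel = err / q_obs
    return {
        "RE": rel * 100,                               # per-site values (%)
        "AMRE": np.median(np.abs(rel)) * 100,          # %
        "MSE": np.mean(err ** 2),
        "BIAS": np.mean(err),
        "RBIAS": np.mean(rel) * 100,                   # %
        "RRMSE": np.sqrt(np.mean(err ** 2)) / np.mean(q_obs),
        "RMSNE": np.sqrt(np.mean(rel ** 2)),
    }

def leave_one_out(X, q_obs, fit, predict):
    """Predict each site from a model fitted to all other sites in the region."""
    X = np.asarray(X, dtype=float)
    q_obs = np.asarray(q_obs, dtype=float)
    n = len(q_obs)
    q_pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                    # drop site i
        model = fit(X[keep], q_obs[keep])
        q_pred[i] = predict(model, X[i:i + 1])[0]   # predict the left-out site
    return q_pred
```

In practice, `evaluation_metrics(q_obs, leave_one_out(X, q_obs, fit, predict))` is evaluated separately for each region and return period. RE is kept per site because it is the quantity summarized in the boxplots, whereas the other six criteria are single summary values.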

3. Results

3.1. Discordancy and Homogeneity Assessment of the Formed Regions

Table 1 presents summary results for all the assumed regions formed by the different approaches (see Section 2.3.1): the region notation, the number of sites in each region, the lowest and highest Di values (for discordant sites only), and the H- and Z-statistics. The Hosking and Wallis [16] criteria for discordancy and homogeneity were used. When all 201 sites were placed in a single region (G-ALL), 9 sites were detected as discordant, with Di values ranging from 3.26 to 6.43. In regions G-D2, G-D6, and G-D14, the Di values of the discordant sites range from 3.05 to 7.18, 3.16 to 5.80, and 3.11 to 6.45, respectively. Removing the discordant sites from these regions did not reduce the Di values significantly; hence, all the selected sites were retained for further analysis. No discordant site was found in regions G-D7, G-D12, G-D13, G-B1, G-B4, G-B5, G-B7, and G-LMD2.
Except for the regions formed in the LCV-LSK space, all regions are highly heterogeneous (H1 >> 1). The highest H1 value (30.13) was found for G-ALL, followed by G-D1 (H1 = 22.41), G-D2 (H1 = 20.78), and G-D6 (H1 = 20.34). Among the heterogeneous regions, the lowest H1 value was found for G-D12 (H1 = 2.86), followed by G-D11 (H1 = 4.57). For G-D12, all candidate distributions were acceptable (|Z| < 1.64); in contrast, no distribution was acceptable for regions G-LMB2, G-LMC3, and G-LMD2 (|Z| > 1.64). With some exceptions, the Pearson Type 3 (PE3) and Generalized Pareto (GPA) distributions were the best-fit distributions for the majority of the proposed regions.
Figure 4 compares the H-statistics among all the assumed regions. It shows that, except for the regions formed in the LCV-LSK space, all regions are definitely heterogeneous, with H values far above the threshold of 1.00 (red line). Among the AREA-based regions, G-A4 (n = 74) has the highest H1 value (19.62); among the contiguous basin-based regions, the highest H1 value is 15.34. The figure also shows that the H-statistics are highest for the drainage division-based regions, followed by the AREA- and basin-based regions. As a general observation from this figure, the more sites a region contains, the higher its H values, and H1 > H2 > H3.

3.2. Prediction Model Evaluation

3.2.1. Degree of Heterogeneity vs. Absolute Median Relative Error

Figure 5 compares the Hi values and absolute median relative error (AMRE) estimated from the developed QRT for each of the assumed regions. It also examines the association between the degree of heterogeneity and AMRE (%).
With the highest H1 value (30.13), the single region (G-ALL, n = 201) shows relatively low AMRE values (below 46%), ranging from 36.98% (Q10) to 45.34% (Q100). In contrast, region G-A1 (n = 24) exhibits the highest AMRE value (77.27% for Q100) with a moderate H1 value (11.17). Regions G-D2 (n = 95), G-D4 (n = 56), G-B4 (n = 22), and G-B6 (n = 20) also show high AMRE values: for Q100, these are 74.70%, 73.99%, 65.66%, and 70.62%, with H1 values of 20.78, 15.96, 9.32, and 13.29, respectively. The lowest (20.01%, Q5) and second lowest (25.48%, Q10) AMRE values are found in regions G-B7 (n = 21, H1 = 5.53) and G-B5 (n = 21, H1 = 4.98), respectively. Moreover, a considerable number of AMRE values are below 30.00% in regions with H1 values ranging from 0.10 to 15.34: 26.98% (Q5) and 27.29% (Q10) in G-D7; 29.50% (Q2) in G-D10; 28.93% (Q10) in G-D14; 27.35% (Q2) and 27.75% (Q5) in G-B3; 26.76% (Q5), 26.51% (Q20), and 27.26% (Q100) in G-B5; 27.49% (Q2) and 26.93% (Q10) in G-B7; 27.98% (Q2) in G-LMB2; 28.21% (Q2) in G-LMD2; and 28.55% (Q10) in G-LMD3.
The AMRE values of the flood quantile estimates are roughly below 40.00% for regions G-D6 (except Q2), G-D7, G-D10 (except Q100), G-D13 (except Q2), G-B7, G-B8, G-LMB2, and G-LMD4 (except Q100). Notably, they are below 35.00% for regions G-D14, G-B5 (except Q2), and G-LMD3 (except Q2). Across the quantiles, the median AMRE value for the fixed single region is 40.37% (SD = 3.19%). For the regions formed by AREA, drainage division, basin, and L-moments space, the median values are 52.17% (SD = 10.68%), 39.61% (SD = 11.61%), 41.27% (SD = 12.77%), and 37.26% (SD = 6.12%), respectively. The overall median AMRE value across all the assumed regions is 40.09% (SD = 11.35%).
In RFFA, it is generally believed that greater regional homogeneity yields more accurate flood quantile estimates. The regions based on the single fixed region, AREA, drainage division, and basin are highly heterogeneous (H1 values from 2.86 to 30.13), whereas the regions based on the LCV-LSK space are acceptably homogeneous (H1 values from −0.48 to 0.96). The AMRE values in these homogeneous regions, which range from roughly 28% to 55%, are lower than those in the other regions. However, across the quantiles, AMRE values do not consistently decrease with homogeneity or increase with heterogeneity, which does not support the usual notion in RFFA. Hence, little association was found between the degree of heterogeneity and the accuracy of the model estimates (AMRE) in southeast Australia.

3.2.2. Model Evaluation Adopting Evaluation Statistics

Table 2 shows the evaluation criteria for the prediction equations developed using the QRT. The lowest and highest AMRE values among all the assumed regions are 20.01% (Q5 in G-B7) and 77.27% (Q100 in G-A1), respectively. The MSE values range from 206.09 (Q5 in G-B7) to 50,031,260.38 (Q100 in G-B2), and the corresponding RMSE values range from 14.36 (Q5 in G-B7) to 7073.28 (Q100 in G-B2). The maximum BIAS and RBIAS values are 1375.82 (Q100 in G-B2) and 396.41 (Q2 in G-A1), respectively, and the minimum values are −248.34 (Q100 in G-A5) and 8.72 (Q2 in G-B7), respectively. The lowest RRMSE value (0.00) is found in G-LMD4 (Q10) and G-D11 (Q5 and Q50), and the highest (2.52) in G-A1 for Q100. The RMSNE ranges from 0.48 (Q5 in G-LMD2) to 17.97 (Q2 in G-A1). The coefficient of determination (R²), which quantifies the explanatory strength of the regression model, ranges from 0.45 (Q100 in G-B2) to 0.94 (Q10 in G-D11).
Figure 6 compares the RE (%) values for flood quantiles Q5, Q20, and Q100 estimated from the QRT prediction equations (Figure S5 shows the RE values for all six quantiles). The first row of the figure shows ten homogeneous regions formed in the LCV-LSK space, and the second row shows ten heterogeneous regions; regions in the same position in the two rows contain a similar number of sites. The "0" line represents unbiased estimates, i.e., perfect agreement between observed and predicted values. For the pair G-LMD4 vs. G-B4, the RE values for Q5 deviate markedly from the zero line, with a larger deviation in the heterogeneous region. Although the RE values for Q20 fluctuate in both homogeneous and heterogeneous regions (G-LMB2, G-LMD4, G-D2, G-A5, G-B1, and G-B4), the maximum REs of the homogeneous regions reach the zero line. For Q100, the RE values are skewed about the median, e.g., in regions G-A4, G-D3, and G-B4.

3.2.3. Comparison of Standardized Flood Frequency Curves Between the Homogeneous and Heterogeneous Regions

Figure 7 compares the standardized flood frequency curves (SFFCs) of homogeneous and heterogeneous regions. The four regions in the first column are homogeneous regions formed in the LCV-LSK space; alongside them, in the second column, are heterogeneous regions with similar numbers of sites (formed based on drainage division and basin contiguity). The thick red line in each graph shows the mean standardized flood quantile. The SFFCs of the homogeneous and heterogeneous regions in the pair G-LMD1 vs. G-D6 are similar for the lower flood quantiles (Q2 to Q20), whereas the variation is more prominent for the upper quantiles (Q50 to Q100). The pairs G-LMD2 vs. G-D3, G-LMD3 vs. G-D5, and G-LMD4 vs. G-B4 show a similar pattern, although the flood quantile estimates are more consistent in regions G-LMD3 and G-LMD4. Overall, the SFFCs do not vary significantly between homogeneous and heterogeneous conditions. In Figure 7, the SFFCs for higher return periods (e.g., 100 years) vary more widely among the stations within a region, in both homogeneous and heterogeneous regions. This indicates that the flood quantile estimates for higher return periods carry greater uncertainty. This is also evident in Figure 6, where the RE boxes are wider for the 100-year return period than for smaller return periods.

3.3. Comparison of Quantile Regression Technique (QRT) and Index Flood Method (IFM)

In this study, the IFM is used for 10 homogeneous regions (based on the LCV-LSK space) to compare the quantile estimates in terms of AMRE and the selected evaluation statistics.
Table 3 presents the evaluation statistics of the IFM models for these 10 homogeneous regions. The AMRE values vary from 32.45% (Q50) to 62.98% (Q20), compared with 27.98% (Q2) to 54.34% (Q100) for the QRT (see Table 2). For the IFM, the MSE and RMSE values range from 1326.22 (Q2) to 1,811,741.70 (Q100) and from 36.42 (Q2) to 1346.01 (Q100), respectively. The BIAS values range from −767.95 (Q100) to −7.95 (Q2), and the RBIAS values from −47.28 (Q5) to 12.20 (Q100). The RRMSE ranges from 0.14 (Q2) to 0.61 (Q2, Q20, Q50, Q100), and the RMSNE from 0.43 (Q5, Q10, Q20) to 1.33 (Q100). The R² values are the same for the QRT and IFM because both use the same predictors. Tables 2 and 3 show that, in terms of these evaluation statistics, model accuracy does not differ markedly between the QRT and IFM.
Figure 8 compares the AMRE (%) values from the QRT and IFM models for the same homogeneous regions. Dotted/broken lines denote the IFM and solid lines the QRT. In G-LMA, the IFM gives higher AMRE values than the QRT for all quantiles except Q50 and Q100. This pattern is even more pronounced in regions G-LMC1 and G-LMD1: for Q20, the AMRE is 62.98% (IFM) vs. 42.94% (QRT) in G-LMC1 and 59.35% (IFM) vs. 38.36% (QRT) in G-LMD1. In regions G-LMB2, G-LMD2, G-LMD3, and G-LMD4, the AMRE values of almost all quantiles are roughly below 40% for both methods. Figure 8 thus suggests that the IFM provides more consistent estimates than the QRT, but the QRT yields somewhat lower AMRE values for the flood quantile estimates.
Figure 9 shows boxplots of the RE (%) values of the quantile estimates from the QRT and IFM for the four largest homogeneous regions (formed in the LCV-LSK space). For each region, the left column shows the QRT results and the right column shows the IFM results. For the QRT, the median RE of almost all quantiles lies close to the unbiased ("0") line. For the IFM, the medians lie far from it. In region G-LMD1, for instance, the "0" line lies above the third quartile (Q3) for all IFM quantiles. These results support the findings from Figure 8.
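The boxplot reading above, where the "0" line sits above Q3, can be checked numerically from the RE values of a region. The sketch below assumes RE = 100 × (predicted − observed)/observed. It uses Python's `statistics.quantiles` with its default "exclusive" method, which may place the quartiles slightly differently from the plotting software used for Figure 9. The data are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def relative_errors(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """RE (%) per site; sign convention assumed: positive = overestimation."""
    return [100.0 * (p - o) / o for o, p in zip(observed, predicted)]


def boxplot_summary(re_values: Sequence[float]) -> Dict[str, object]:
    """Quartiles of RE and where the unbiased '0' line falls relative to the box."""
    q1, median, q3 = statistics.quantiles(re_values, n=4)  # default method: 'exclusive'
    return {
        "Q1": q1,
        "median": median,
        "Q3": q3,
        # '0' line above the box: roughly three-quarters or more of RE values are negative.
        "zero_above_Q3": q3 < 0.0,
        # '0' line below the box: roughly three-quarters or more of RE values are positive.
        "zero_below_Q1": q1 > 0.0,
    }


# Hypothetical region in which every estimate falls short of the observed quantile.
re_values = relative_errors([100.0] * 5, [50.0, 60.0, 70.0, 80.0, 90.0])
print(boxplot_summary(re_values))
```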
A paired Wilcoxon test comparing the AMRE values of the homogeneous and heterogeneous groupings showed that the difference between the two groupings is not statistically significant at the 5% level.
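For readers who want to reproduce this kind of comparison, the sketch below implements a two-sided exact Wilcoxon signed-rank test for paired samples using only the standard library. It drops zero differences, gives tied absolute differences their average rank, and enumerates all sign patterns. This enumeration is feasible for a handful of pairs, such as one pair per quantile. How the pairs were formed in this study (for example, by quantile or by region) is an assumption, and the AMRE values below are hypothetical. `scipy.stats.wilcoxon` is the usual library alternative.

```python
from itertools import product
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided exact Wilcoxon signed-rank test; returns (W, p-value).

    The null distribution is enumerated over all 2**n sign patterns of the ranks,
    so this is only practical for small n (say n <= 20).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(w, total - w) <= w_obs + 1e-9:  # at least as extreme as observed
            extreme += 1
    return w_obs, extreme / 2 ** n


# Hypothetical AMRE (%) pairs, one per quantile Q2 ... Q100.
homogeneous = [30.0, 35.0, 40.0, 45.0, 50.0, 55.0]
heterogeneous = [32.0, 33.0, 44.0, 41.0, 56.0, 54.0]
w, p = wilcoxon_signed_rank_exact(homogeneous, heterogeneous)
print(w, round(p, 4), "significant at 5%" if p < 0.05 else "not significant at 5%")
```

With only six pairs, the smallest attainable two-sided p-value is 2/64 ≈ 0.031. A non-significant result from so few pairs therefore carries limited power.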

3.4. Coherence of Group Formation Between Flood Data and Catchment Data Space

Principal component analysis (PCA) and cluster analysis were applied to the 174 sites that make up the four largest homogeneous regions formed in the L-moments (of AMF data) space. This subsection examines the physical and geographical coherence between the groupings in the L-moments space and those in the catchment data space. Figure 10 shows the groups formed by PCA, Ward's method, and K-means clustering. The PC1-PC2 scatter plot, Ward's cluster dendrogram, and the K-means clusters appear in Figure 10a, 10b, and 10c, respectively. In Figure 10a, the numerals 1, 2, 3, and 4 denote the L-moments groups G-LMD1, G-LMD2, G-LMD3, and G-LMD4, respectively. The member sites of G-LMD1 are spread throughout the PC1-PC2 space, and the same holds for G-LMD2, G-LMD3, and G-LMD4. Likewise, in Ward's dendrogram and the K-means clusters (Figure 10b,c), the member sites of each L-moments group are distributed across several clusters. The agreement of group formation was further examined with scatter plots of the L-moments of the member sites of each group obtained from PCA and cluster analysis. Figure 11 shows how the sites are allocated based on their L-moment coefficients. The member sites of G-LMD1 (green) are scattered among the PCA-based groups, although a few of them form a pocket in the cluster analysis. Similarly, the member sites of G-LMD2, G-LMD3, and G-LMD4 show some agreement with the cluster-analysis groups but not with the PCA groups. Hence, no significant agreement of group formation was found among the flood data (L-moments) space, PCA, and cluster analysis (Ward's method and K-means clustering).
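The sketch below illustrates the K-means step in catchment-characteristics space. Each characteristic is z-scored so that, for example, AREA does not dominate rainfall intensity. Lloyd's algorithm then alternates between assigning sites to the nearest centroid and moving each centroid to the mean of its members. Several choices here are assumptions rather than the authors' settings: the deterministic initial centroids, the Euclidean distance, and the two hypothetical predictors. Ward's method is not shown. In practice, `sklearn.cluster.KMeans` or `scipy.cluster.hierarchy.linkage(..., method="ward")` would normally be used.

```python
import math
import statistics
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each catchment characteristic (column) with its sample SD."""
    params = [(statistics.fmean(c), statistics.stdev(c)) for c in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in rows]


def kmeans(points: Sequence[Sequence[float]], init_idx: Sequence[int], max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm with user-chosen initial centroids (deterministic)."""
    k = len(init_idx)
    centroids = [list(points[i]) for i in init_idx]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each site joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no site changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its member sites.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


# Hypothetical sites described by two catchment characteristics.
rows = [[1.0, 10.0], [1.1, 10.2], [0.9, 9.9], [10.0, 1.0], [10.2, 1.1], [9.8, 0.9]]
print(kmeans(standardize(rows), init_idx=[0, 3]))
```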
Table 4 summarizes the degree of agreement between the groups formed in the L-moments space and those formed by the multivariate techniques (PCA and cluster analysis) described above. Only 16 (25%) of the 64 member sites of group 1 are common to both the L-moments space (G-LMD1) and PCA (QR1). For G-LMD2 vs. QR2, the figure is 11 (21.57%), and no member sites are common to G-LMD4 and QR4. When the same group number is compared across the L-moments space, PCA, and cluster analysis, the highest agreement is found for G-LMD3 vs. WMR3 (47.22%, n = 17 out of 36), followed by G-LMD3 vs. QR3 (38.89%, n = 14 out of 36). Across all group pairings, the agreement between the flood data (L-moments) space and either cluster analysis method is highest for G-LMD4 vs. KMR3 (82.61%, n = 19 out of 23), followed by G-LMD4 vs. WMR2 (78.26%, n = 18 out of 23). These findings are consistent with the graphical results in Figures 10 and 11.
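The percentages in Table 4 are consistent with dividing the number of common sites by the size of the L-moments group (for example, 16/64 = 25%, 17/36 = 47.22%, and 19/23 = 82.61%). This denominator is inferred from the reported numbers rather than stated in the text. A minimal sketch of that calculation follows, with hypothetical site identifiers.

```python
from typing import Iterable, Tuple


def group_agreement(reference_group: Iterable[str], other_group: Iterable[str]) -> Tuple[int, float]:
    """Count and percentage (2 d.p.) of reference-group sites also found in another group."""
    ref = set(reference_group)
    if not ref:
        raise ValueError("reference group is empty")
    common = ref & set(other_group)
    return len(common), round(100.0 * len(common) / len(ref), 2)


# Hypothetical site IDs: a 36-site L-moments group vs a cluster sharing 17 of its sites.
l_moments_group = {f"S{i:03d}" for i in range(36)}
cluster_group = {f"S{i:03d}" for i in range(19, 50)}
print(group_agreement(l_moments_group, cluster_group))  # (17, 47.22)
```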
Figure 12 compares the principal component scores (PC1 and PC2) of each group, computed from the standardized predictors. The upper row shows the homogeneous regions and the lower row shows the heterogeneous regions. For PC1, the medians (second quartiles) of most homogeneous regions deviate far from the unbiased ("0") line; the exceptions are G-LMC1, G-LMC2, and G-LMD4. For PC2, the medians of all homogeneous regions except G-LMA, G-LMB1, G-LMC1, and G-LMC2 also deviate far from the "0" line. In the heterogeneous regions, the deviation is nearly the same for PC2 and slightly smaller for PC1. Thus, the degree of homogeneity does not appear to affect the principal component scores.
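The PC scores in Figure 12 come from PCA on standardized predictors. The sketch below performs PCA on the correlation matrix of z-scored predictors with a small cyclic Jacobi eigen-solver, so it needs only the standard library; `numpy.linalg.eigh` would normally be used instead. It returns each site's PC scores and the fraction of total variance each PC explains, which is the quantity behind statements such as "the first five PCs explain almost 89% of the variation." Whether the authors used a correlation- or covariance-based PCA is an assumption here, and the input values are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def standardize(rows: Sequence[Sequence[float]]) -> Matrix:
    """Z-score each predictor (column); assumes no column is constant."""
    params = [(statistics.fmean(c), statistics.stdev(c)) for c in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in rows]


def jacobi_eigen(a: Matrix, tol: float = 1e-12, max_sweeps: int = 50) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break  # off-diagonal mass is negligible
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the updated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J and V <- V J (columns p, q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- J^T A (rows p, q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def pca(rows: Sequence[Sequence[float]]) -> Tuple[Matrix, List[float]]:
    """PC scores per site and explained-variance fractions, largest PC first."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]
    eigvals, eigvecs = jacobi_eigen(corr)
    order = sorted(range(p), key=lambda j: eigvals[j], reverse=True)
    explained = [eigvals[j] / sum(eigvals) for j in order]
    scores = [[sum(r[k] * eigvecs[k][j] for k in range(p)) for j in order] for r in z]
    return scores, explained


# Hypothetical predictors for five sites (three catchment characteristics each).
scores, explained = pca([[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 1], [5, 10, 2]])
print([round(e, 3) for e in explained])
```

Because the predictors are standardized, every PC score has zero mean across the sites used to fit the PCA. Group medians far from "0" in Figure 12 therefore indicate that a group occupies one side of the pooled predictor space.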

3.5. Physical and Geographical Interpretation in Terms of Degree of Homogeneity and Heterogeneity

Boxplots of the catchment characteristics (including LCV and LSK) were created to examine the physical and geographical coherence of the assumed regions (homogeneous vs. heterogeneous). In addition, spatial distribution maps are presented for the sites comprising the four largest homogeneous regions (L-moments space).

3.5.1. Physical Interpretation of Catchment Characteristics

Figure 13 shows the distribution of catchment characteristics and L-moments for the assumed regions (homogeneous and heterogeneous). For each predictor, the first and third columns (from the left) show boxplots for the homogeneous regions, and the second and fourth columns show boxplots for the heterogeneous regions. AREA varies moderately among the homogeneous regions but strongly among the heterogeneous regions. For instance, the median AREA of G-A5 differs from the medians of the homogeneous regions. A similar pattern of variation between homogeneous and heterogeneous regions exists for I62, MAR, MAE, and SDEN. In contrast, SF and S1085 show roughly similar variation in the homogeneous and heterogeneous regions, and no consistent pattern is observed for FOREST in either type of region. Within each homogeneous region, the LCV and LSK values lie close to the group median, but the medians differ between groups. This is expected, because the regions were formed in the LCV-LSK space. Among the heterogeneous regions, by contrast, the median LCV and LSK values are similar, but the values vary considerably within each region.
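LCV (L-CV, λ2/λ1) and LSK (L-skewness, λ3/λ2) are the coordinates of the space in which the homogeneous regions were formed. The sketch below computes their sample estimates from a site's annual maximum flood series using the unbiased probability-weighted-moment estimators b0, b1, and b2 described by Hosking and Wallis [50]. It is an illustration rather than the authors' code, and the flow values are hypothetical.

```python
from typing import Sequence, Tuple


def sample_l_moment_ratios(flows: Sequence[float]) -> Tuple[float, float, float]:
    """Return (l1, LCV, LSK) from unbiased probability-weighted moments b0, b1, b2."""
    x = sorted(flows)  # ascending order statistics
    n = len(x)
    if n < 3:
        raise ValueError("need at least three annual maxima")
    # j is 0-based here, so (j - 1) and (j - 2) of the 1-based formulas become j and j - 1.
    b0 = sum(x) / n
    b1 = sum((j / (n - 1)) * x[j] for j in range(n)) / n
    b2 = sum((j * (j - 1)) / ((n - 1) * (n - 2)) * x[j] for j in range(n)) / n
    l1 = b0                        # mean (first L-moment)
    l2 = 2 * b1 - b0               # L-scale
    l3 = 6 * b2 - 6 * b1 + b0      # third L-moment
    return l1, l2 / l1, l3 / l2


# Hypothetical annual maximum floods (m^3/s) for one gauged site.
l1, lcv, lsk = sample_l_moment_ratios([120.0, 85.0, 300.0, 95.0, 150.0, 60.0, 210.0])
print(round(l1, 2), round(lcv, 3), round(lsk, 3))
```

A symmetric sample gives LSK ≈ 0, while a few very large floods in a short record push LSK up sharply. This sensitivity is why short records add sampling noise to positions in the LCV-LSK space.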

3.5.2. Geographical Coherence of Assumed Homogeneous Regions

Figure 14 shows the spatial distribution (left) and the LCV-LSK scatter plots (right) for the homogeneous regions formed in the L-moments space. From top to bottom, the four panels show the single (Figure 14a,b), two (Figure 14c,d), three (Figure 14e,f), and four (Figure 14g,h) largest homogeneous regions in the LCV-LSK space. The first panel shows that the sites in the single homogeneous region (green) are spread over the study area, with a few minor pockets containing a small number of sites. The second, third, and fourth panels similarly show sites scattered throughout the study area, with only a few small pockets. The spatial distribution of sites in each region indicates that the homogeneous regions formed in the L-moments space are not geographically contiguous, which is consistent with the findings of Bates et al. [22].

4. Discussion

In searching for homogeneous regions in southeast Australia, regions were formed based on AREA, drainage division, basin neighborhood, and the L-moments (LCV-LSK) space. Homogeneous regions were found only in the L-moments space. The spatial distribution of the member sites of the four largest homogeneous regions (n = 174; see Figure 14) indicates that the sites of each region are not geographically contiguous. Bates et al. [22] suggested that groupings formed in the L-moments space may show no notable pattern or trend and may simply reflect noise in the data rather than any physical significance [16]. For these 174 sites, multivariate statistical techniques (PCA, Ward's method, and K-means clustering) were applied to delineate homogeneous regions. Although the first five PCs explain almost 89% of the data variation (compared with 85% reported by Rahman and Rahman [31]), no significant agreement of group formation was found (see Figure 11 and Table 4).
Taylor et al. [29] noted that, along with other RFFA techniques, the QRT and IFM are commonly used for regional flood estimation. The QRT relaxes the homogeneity assumption, whereas the IFM requires the region to be "acceptably homogeneous" [16], a condition that is hard to satisfy in Australia [22]. Using the QRT, Taylor et al. [29] reported AMRE values ranging from 28% to 79%. In the Pilbara region of Western Australia, Haque et al. [63] applied the IFM with ordinary least squares and obtained AMRE values ranging from 23% to 46% (Q2 to Q100) for streamflow data. The present study found reasonably comparable results with the IFM and QRT using the member sites of the four largest L-moments-based regions. Over the quantiles Q2 to Q100, AMRE ranges from 32.45% to 62.98% for the IFM (see Table 3) and from 27.98% to 54.34% for the QRT (see Table 2). This indicates that, overall, the QRT outperforms the IFM in southeast Australia. Using 202 Australian sites, Han et al. [19] applied network theory with commonly used models to estimate flood quantiles and reported RMSE values ranging from 23% (Q2) to 29% (Q100). In another Australian study, Rahman et al. [27] combined independent component analysis with the QRT to estimate flood quantiles. They obtained AMRE values of 33.28% to 43.92% and RMSNE values of 0.95 to 3.60 over the different return periods [27].
In a data-sparse region of India, Mangukiya and Sharma [6] applied a classical RFFA method and machine-learning (ML) approaches (Random Forest, RF, and eXtreme Gradient Boosting, XGBoost) to annual maximum streamflow data. They found that AMRE and RMSE could be reduced by roughly 50% by using the ML approach rather than the classical RFFA method, with a higher R2 (0.85 to 0.96) [6]. In a study in Iran using 32 gauged sites, Ward's method was applied to delineate homogeneous regions and produced two regions. The Z-statistic identified GLO and GPA as the best-fit distributions for the first and second regions, respectively. Applying the IFM and multiple regression to these two regions, the estimated RRMSE values ranged from 0.995 to 3.674 and from 1.002 to 5.692, respectively [61]. A similar study in Iran reported RRMSE values of 0.23 to 0.36 for the IFM and 0.29 to 0.53 for multiple linear regression [34].
In the West Mediterranean River basin of Turkey, a study used 47 gauged sites to evaluate the IFM with L-moment-based parameters. The estimated BIAS ranged from about −0.2 to 0.6 and RRMSE from about 0.15 to 0.60 [64]. Similarly, another study in Turkey evaluated quantile estimates using RMSE, BIAS, and RBIAS, which ranged over the quantiles from 0.037 to 0.247, 0.002 to 0.022, and 0.124 to 0.205, respectively. Based on multiple linear regression analysis using data from around the world (excluding Australia), the reported RMSNE values ranged from 0.50 to 0.65 [65]. Tramblay et al. [66] developed an RFFA for North Africa and noted that Lasso regression gave mean absolute relative errors close to 50%. Singh and Chavan [67] applied a region-of-influence approach to form regions in the USA and India and found that nearly 20% of the formed regions had an H1 measure less than 2. The RRMSE values of their proposed RFFA were 33.33 and 54.56 for the USA and India, respectively. Desai and Ouarda [68] conducted RFFA in southern Quebec, Canada, using random forest regression and reported a BIAS of −0.019 for the 100-year return period.
Australia, and southeast Australia in particular, has highly variable hydrology. Despite the highly heterogeneous regions, the findings of this study are comparable with those of various domestic and international RFFA studies. For instance, the AMRE values reported by Taylor et al. [29], Haque et al. [63], and Rahman et al. [27] are comparable to those from this study. The estimated AMRE values are also comparable with those of Australian Rainfall and Runoff (ARR), which ranged from 57.25% to 64.06% [28,69,70]. The RRMSE values (0.14 to 0.61 for the IFM and 0.00 to 0.35 for the QRT) and RMSNE values (0.43 to 1.33 for the IFM and 0.48 to 2.37 for the QRT) are similar to those of other studies, but the MSE and RMSE values are higher in this study. This may be due to the high variation in hydro-climatic parameters across catchments. Heterogeneity within a region has little influence on model accuracy in regional flood quantile estimation. Alexandre et al. [71] reported that a Bayesian hierarchical model based on the Generalized Extreme Value (GEV) distribution could provide accurate flood quantile estimates (median relative error of 19%) despite marked regional heterogeneity across North America. This shows that some RFFA methods do not depend on homogeneous regions.
This study used streamflow data from 201 stations with record lengths ranging from 25 to 89 years. Shorter records increase the sampling variability of the L-moment estimates, which introduces uncertainty into the results. In addition, the AMF data of nearby stations are generally correlated, which has affected the accuracy of the leave-one-out (LOO) validation.
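The QRT regresses each flood quantile on catchment characteristics, and LOO validation refits the model with each site held out in turn. The sketch below is a deliberately simplified, hypothetical version of that loop. It uses a single predictor (catchment AREA), ordinary least squares in log10 space, and AMRE computed from the held-out predictions. The paper's QRT predictor set and fitting procedure are not reproduced here. Because a held-out site's correlated neighbors remain in the training set, LOO errors can look smaller than they would for a truly independent ungauged catchment.

```python
import math
from typing import List, Sequence, Tuple


def fit_log_ols(areas: Sequence[float], quantiles: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of log10(Q_T) = b0 + b1 * log10(AREA); returns (b0, b1)."""
    x = [math.log10(a) for a in areas]
    y = [math.log10(q) for q in quantiles]
    xm, ym = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b1 = sxy / sxx  # requires at least two distinct areas in the training set
    return ym - b1 * xm, b1


def loo_amre(areas: Sequence[float], quantiles: Sequence[float]) -> float:
    """Leave-one-out AMRE (%): refit without each site, then predict that site."""
    errors: List[float] = []
    for i in range(len(areas)):
        train_a = [a for j, a in enumerate(areas) if j != i]
        train_q = [q for j, q in enumerate(quantiles) if j != i]
        b0, b1 = fit_log_ols(train_a, train_q)
        predicted = 10 ** (b0 + b1 * math.log10(areas[i]))  # back-transform to flow units
        errors.append(abs(predicted - quantiles[i]) / quantiles[i])
    return 100.0 * sum(errors) / len(errors)


# Hypothetical catchment areas (km^2) and at-site Q100 estimates (m^3/s).
print(round(loo_amre([12.0, 45.0, 150.0, 420.0, 900.0], [40.0, 95.0, 210.0, 380.0, 700.0]), 2))
```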

5. Conclusions

This study investigates the identification of homogeneous regions for RFFA in southeast Australia. Regions were formed based on catchment size, drainage division, basin contiguity, and the L-moments (of AMF data) space. In addition, the QRT and IFM were compared for estimating flood quantiles in the four homogeneous regions formed in the L-moments space. The Hosking and Wallis [16] test criteria were used to test the discordancy of sites and the heterogeneity of the assumed regions. Across all the assumed regions, with LOO validation, the estimated AMRE values ranged from 20.01% to 77.27%. For the four L-moments-based homogeneous regions, AMRE ranged from 27.98% to 54.34% for the QRT and from 32.45% to 62.98% for the IFM. The major findings of this study are as follows:
  • The Pearson Type III (PE3) and Generalized Pareto (GPA) distributions are the best-fit regional distributions in southeast Australia.
  • For the homogeneous regions (formed in the L-moments space), the variation in estimated model accuracy is smaller for the IFM than for the QRT, but the QRT generally outperforms the IFM, with lower AMRE values.
  • There is a weak association between the flood characteristics data space (L-moments of AMF data) and catchment characteristics data space in southeast Australia.
A limitation of this study is the relatively small data set of only 201 stations covering a large area of southeast Australia. In addition, the AMF records at some of the selected stations are short (25 years), which increases the sampling variability in the estimated L-moments of the AMF data and other flood statistics. Future studies should focus on applying a parameter regression technique instead of the QRT and on the impacts of climate change on regional homogeneity.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w17121799/s1, Figure S1. Spatial distribution showing stations by assumed regions based on Drainage Division ("2" and "4"), Figure S2. Spatial distribution showing stations by assumed regions based on drainage division, Figure S3. Spatial distribution showing stations by assumed regions based on drainage division, Figure S4. Spatial distribution showing stations by assumed regions based on basins, Figure S5. Comparison boxplots of RE (%) values adopting QRT for the assumed homogeneous (in LCV-LSK space) and heterogeneous regions, Table S1. Region-specific summary of descriptive statistics of selected catchment characteristics.

Author Contributions

A.A.: Literature review, conceptualization, data analysis, writing the original draft; A.R.: conceptualization, review and editing, and supervision; R.S.M.H.R.: data compilation, review and editing of article; Z.K.: data analysis and editing; and H.M.: review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

The authors declare that no funds, grants, or other financial support were received during the preparation of this manuscript.

Data Availability Statement

Data used in this study can be obtained from Australian government authorities.

Acknowledgments

The authors acknowledge the Australian Rainfall and Runoff Revision Project 5 Team, the Australian Bureau of Meteorology, WaterNSW, and the Government of Victoria for providing data used in this study.

Conflicts of Interest

The authors declare no conflicts of interest. The authors have no relevant financial or non-financial interests to disclose.

References

  1. Endendijk, T.; Botzen, W.; de Moel, H.; Aerts, J.; Slager, K.; Kok, M. Flood Vulnerability curves and household flood damage mitigation measures: An econometric analysis of survey data. In Proceedings of the EGU23, the 25th EGU General Assembly, Vienna, Austria, 23–28 April 2023. Copernicus Meetings.
  2. FitzGerald, G.; Du, W.; Jamal, A.; Clark, M.; Hou, X.Y. Flood fatalities in contemporary Australia (1997–2008). Emerg. Med. Australas. 2010, 22, 180–186.
  3. Li, Z.; Gao, S.; Chen, M.; Gourley, J.J.; Hong, Y. Spatiotemporal characteristics of US floods: Current status and forecast under a future warmer climate. Earth’s Future 2022, 10, e2022EF002700.
  4. Rhodes, C. Flood Damage Costs Beyond Buildings—A Lake Champlain Case Study; US Geological Survey: Reston, VA, USA, 2023.
  5. Quesada-Román, A.; Ballesteros-Cánovas, J.A.; Granados-Bolaños, S.; Birkel, C.; Stoffel, M. Improving regional flood risk assessment using flood frequency and dendrogeomorphic analyses in mountain catchments impacted by tropical cyclones. Geomorphology 2022, 396, 108000.
  6. Mangukiya, N.K.; Sharma, A. Alternate pathway for regional flood frequency analysis in data-sparse region. J. Hydrol. 2024, 629, 130635.
  7. Sharifi Garmdareh, E.; Vafakhah, M.; Eslamian, S.S. Regional flood frequency analysis using support vector regression in arid and semi-arid regions of Iran. Hydrol. Sci. J. 2018, 63, 426–440.
  8. Srinivas, V.V.; Tripathi, S.; Rao, A.R.; Govindaraju, R.S. Regional flood frequency analysis by combining self-organizing feature map and fuzzy clustering. J. Hydrol. 2008, 348, 148–166.
  9. Stedinger, J.R.; Griffis, V.W. Flood frequency analysis in the United States: Time to update. J. Hydrol. Eng. 2008, 13, 199–204.
  10. Kumar, R.; Goel, N.K.; Chatterjee, C.; Nayak, P.C. Regional flood frequency analysis using soft computing techniques. Water Resour. Manag. 2015, 29, 1965–1978.
  11. Mengistu, T.D.; Feyissa, T.A.; Chung, I.M.; Chang, S.W.; Yesuf, M.B.; Alemayehu, E. Regional Flood Frequency Analysis for Sustainable Water Resources Management of Genale–Dawa River Basin, Ethiopia. Water 2022, 14, 637.
  12. Ouarda, T.B.; Girard, C.; Cavadias, G.S.; Bobée, B. Regional flood frequency estimation with canonical correlation analysis. J. Hydrol. 2001, 254, 157–173.
  13. Ouarda, T.B.; Bâ, K.M.; Diaz-Delgado, C.; Cârsteanu, A.; Chokmani, K.; Gingras, H.; Quentin, E.; Trujillo, E.; Bobée, B. Intercomparison of regional flood frequency estimation methods at ungauged sites for a Mexican case study. J. Hydrol. 2008, 348, 40–58.
  14. Guru, N. Implication of partial duration series on regional flood frequency analysis. Int. J. River Basin Manag. 2024, 22, 167–186.
  15. Dalrymple, T. Flood-Frequency Analyses, Manual of Hydrology: Part 3; USGPO: Washington, DC, USA, 1960.
  16. Hosking, J.R.M.; Wallis, J.R. Some statistics useful in regional frequency analysis. Water Resour. Res. 1993, 29, 271–281.
  17. Griffis, V.; Stedinger, J. The use of GLS regression in regional hydrologic analyses. J. Hydrol. 2007, 344, 82–95.
  18. Mostofi Zadeh, S.; Burn, D.H. A Super Region Approach to Improve Pooled Flood Frequency Analysis. Can. Water Resour. J./Rev. Can. Des Ressour. Hydr. 2019, 44, 146–159.
  19. Han, X.; Ouarda, T.B.M.J.; Rahman, A.; Haddad, K.; Mehrotra, R.; Sharma, A. A Network Approach for Delineating Homogeneous Regions in Regional Flood Frequency Analysis. Water Resour. Res. 2020, 56, e2019WR025910.
  20. Burn, D.H. Delineation of groups for regional flood frequency analysis. J. Hydrol. 1988, 104, 345–361.
  21. Viglione, A.; Laio, F.; Claps, P. A comparison of homogeneity tests for regional frequency analysis. Water Resour. Res. 2007, 43.
  22. Bates, B.C.; Rahman, A.; Mein, R.G.; Weinmann, P.E. Climatic and physical factors that influence the homogeneity of regional floods in southeastern Australia. Water Resour. Res. 1998, 34, 3369–3381.
  23. Sabrina, A.; Rahman, A. Development of a kriging-based regional flood frequency analysis technique for South-East Australia. Nat. Hazards 2022, 114, 2739–2765.
  24. Zalnezhad, A.; Rahman, A.; Vafakhah, M.; Samali, B.; Ahamed, F. Regional Flood Frequency Analysis Using the FCM-ANFIS Algorithm: A Case Study in South-Eastern Australia. Water 2022, 14, 1608.
  25. Ahmed, A.; Khan, Z.; Rahman, A. Searching for homogeneous regions in regional flood frequency analysis for Southeast Australia. J. Hydrol. Reg. Stud. 2024, 53, 101782.
  26. Rahman, A. Flood Estimation for Ungauged Catchments: A Regional Approach using Flood and Catchment Characteristics. Ph.D. Thesis, Department of Civil Engineering, Monash University, Victoria, Australia, 1997.
  27. Rahman, A.S.; Khan, Z.; Rahman, A. Application of independent component analysis in regional flood frequency analysis: Comparison between quantile regression and parameter regression techniques. J. Hydrol. 2020, 581, 124372.
  28. Rahman, A.; Haddad, K.; Kuczera, G.; Weinmann, E. Regional Flood Methods. In Australian Rainfall and Runoff: A Guide to Flood Estimation; Chapter 3, Book 3; Ball J, B.M., Nathan, R., Weeks, W., Weinmann, E., Retallick, M., Testoni, I., Eds.; Geoscience Australia, Commonwealth of Australia, Engineers Australia: Symonston, Australia, 2019.
  29. Taylor, M.; Haddad, K.; Zaman, M.; Rahman, A. Regional flood modelling in Western Australia: Application of regression based methods using ordinary least squares. In Proceedings of the 19th International Congress on Modelling and Simulation—Sustaining Our Future: Understanding and Living with Uncertainty, MODSIM2011, Perth, WA, Australia, 12–16 December 2011; pp. 3803–3810.
  30. Zalnezhad, A.; Rahman, A.; Nasiri, N.; Haddad, K.; Rahman, M.M.; Vafakhah, M.; Samali, B.; Ahamed, F. Artificial Intelligence-Based Regional Flood Frequency Analysis Methods: A Scoping Review. Water 2022, 14, 2677. [Google Scholar] [CrossRef]
  31. Rahman, A.S.; Rahman, A. Application of Principal Component Analysis and Cluster Analysis in Regional Flood Frequency Analysis: A Case Study in New South Wales, Australia. Water 2020, 12, 781. [Google Scholar] [CrossRef]
  32. Gado, T.A.; Nguyen, V.T.V. Comparison of homogenous region delineation approaches for regional flood frequency analysis at ungauged sites. J. Hydrol. Eng. 2016, 21. [Google Scholar] [CrossRef]
  33. Ward Jr, J.H. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 1963, 58, 236–244. [Google Scholar] [CrossRef]
  34. Malekinezhad, H.; Nachtnebel, H.P.; Klik, A. Comparing the index-flood and multiple-regression methods using L-moments. Phys. Chem. Earth Parts A/B/C 2011, 36, 54–60. [Google Scholar] [CrossRef]
  35. Murtagh, F.; Legendre, P. Ward’s hierarchical agglomerative clustering method: Which algorithms implement Ward’s criterion? J. Classif. 2014, 31, 274–295. [Google Scholar] [CrossRef]
  36. Farsadnia, F.; Rostami Kamrood, M.; Moghaddam Nia, A.; Modarres, R.; Bray, M.T.; Han, D.; Sadatinejad, J. Identification of homogeneous regions for regionalization of watersheds by two-level self-organizing feature maps. J. Hydrol. 2014, 509, 387–397. [Google Scholar] [CrossRef]
  37. Sharghi, E.; Nourani, V.; Soleimani, S.; Sadikoglu, F. Application of different clustering approaches to hydroclimatological catchment regionalization in mountainous regions, a case study in Utah State. J. Mt. Sci. 2018, 15, 461–484. [Google Scholar] [CrossRef]
  38. Acreman, M.C.; Sinclair, C.D. Classification of drainage basins according to their physical characteristics; an application for flood frequency analysis in Scotland. J. Hydrol. 1986, 84, 365–380. [Google Scholar] [CrossRef]
  39. Ahani, A.; Mousavi Nadoushani, S.S.; Moridi, A. A ranking method for regionalization of watersheds. J. Hydrol. 2022, 609, 127740. [Google Scholar] [CrossRef]
  40. Zhang, R.; Chen, Y.; Zhang, X.; Ma, Q.; Ren, L. Mapping homogeneous regions for flash floods using machine learning: A case study in Jiangxi province, China. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102717. [Google Scholar] [CrossRef]
  41. Baidya, S.; Singh, A.; Panda, S.N. Flood frequency analysis. Nat. Hazards 2020, 100, 1137–1158. [Google Scholar] [CrossRef]
  42. Kuczera, G. Comprehensive at-site flood frequency analysis using Monte Carlo Bayesian inference. Water Resour. Res. 1999, 35, 1551–1557. [Google Scholar] [CrossRef]
  43. Kuczera, G.; Franks, S.W. At-Site Flood Frequency Analysis. In Australian Rainfall and Runoff: A Guide to Flood Estimation; Chapter 3, Book 3; Ball J, B.M., Nathan, R., Weeks, W., Weinmann, E., Retallick, M., Testoni, I., Eds.; Geoscience Australia: Symonston, Australia, 2019. [Google Scholar]
  44. Franks, S.W.; Kuczera, G. Flood frequency analysis: Evidence and implications of secular climate variability, New South Wales. Water Resour. Res. 2002, 38, 20-21–20-27. [Google Scholar] [CrossRef]
  45. Rahman, A.S.; Rahman, A.; Zaman, M.A.; Haddad, K.; Ahsan, A.; Imteaz, M. A study on selection of probability distributions for at-site flood frequency analysis in Australia. Nat. Hazards 2013, 69, 1803–1813. [Google Scholar] [CrossRef]
  46. Bobée, B.; Mathier, L.; Perron, H.; Trudel, P.; Rasmussen, P.F.; Cavadias, G.; Bernier, J.; Nguyen, V.T.V.; Pandey, G.; Ashkar, F.; et al. Presentation and review of some methods for regional flood frequency analysis. J. Hydrol. 1996, 186, 63–84. [Google Scholar]
  47. Rahman, A. A quantile regression technique to estimate design floods for ungauged catchments in south-east Australia. Australas. J. Water Resour. 2005, 9, 81–89. [Google Scholar] [CrossRef]
  48. Rahman, A.; Haddad, K.; Zaman, M.; Kuczera, G.; Weinmann, P.E. Design Flood Estimation in Ungauged Catchments: A Comparison Between the Probabilistic Rational Method and Quantile Regression Technique for NSW. Australas. J. Water Resour. 2011, 14, 127–139. [Google Scholar] [CrossRef]
  49. Thomas, D.; Benson, M.A. Generalization of Streamflow Characteristics from Drainage-Basin Characteristics; US Government Printing Office: Washington, DC, USA, 1970. [Google Scholar]
  50. Hosking, J.R.M.; Wallis, J.R. Regional Frequency Analysis: An Approach Based on L-Moments; Cambridge University Press: New York, NY, USA, 1997. [Google Scholar]
  51. Ouarda, T.B.M.J. Handbook of Applied Hydrology, Second Edition, Chapter 77, Regional Flood Frequency Modeling; Regional Flood Frequency Modeling, ed. V.P.S. (Editor-in-Chief); McGraw-Hill Education: New York, NY, USA, 2017. [Google Scholar]
  52. Cunnane, C. Methods and merits of regional flood frequency analysis. J. Hydrol. 1988, 100, 269–290. [Google Scholar] [CrossRef]
  53. Potter, K.W.; Lettenmaier, D.P. A comparison of regional flood frequency estimation methods using a resampling method. Water Resour. Res. 1990, 26, 415–424. [Google Scholar] [CrossRef]
  54. GREH, G.D.R.E.H.S. Inter-comparison of regional flood frequency procedures for Canadian rivers. J. Hydrol. 1996, 186, 85–103. [Google Scholar]
  55. Aissia, B.; Chebana, F.; Ouarda, T.; Bruneau, P.; Barbet, M.; Équipement, H.-Q. Bivariate index flood model for a northern case study. Hydrol. Sci. J. 2015, 60, 247–268. [Google Scholar] [CrossRef]
  56. Chebana, F.; Ouarda, T.B. Index flood-based multivariate regional frequency analysis. Water Resour. Res. 2009, 45. [Google Scholar] [CrossRef]
  57. Formetta, G.; Over, T.; Stewart, E. Assessment of Peak Flow Scaling and Its Effect on Flood Quantile Estimation in the United Kingdom. Water Resour. Res. 2021, 57, e2020WR028076. [Google Scholar] [CrossRef]
  58. Kader, F.; Derbas, A.; Haddad, K.; Rahman, A. Regional flood estimation for NSW: Comparison of quantile regression and parameter regression techniques. In Proceedings of the 21st International Congress on Modelling and Simulation: Partnering with Industry and the Community for Innovation and Impact through Modelling, MODSIM 2015—Held jointly with the 23rd National Conference of the Australian Society for Operations Research and the DSTO Led Defence Operations Research Symposium, DORS 2015, Queensland, Australia, 29 November–4 December 2015; Modelling and Simulation Society of Australia and New Zealand Inc. (MSSANZ): Perth, WA, Australia, 2015; pp. 2179–2185. [Google Scholar]
  59. Requena, A.I.; Chebana, F.; Mediero, L. A complete procedure for multivariate index-flood model application. J. Hydrol. 2016, 535, 559–580. [Google Scholar] [CrossRef]
  60. Šimková, T. L-moment homogeneity test in trivariate regional frequency analysis of extreme precipitation events. Meteorol. Appl. 2018, 25, 11–22. [Google Scholar] [CrossRef]
  61. Mosaffaie, J. Comparison of two methods of regional flood frequency analysis by using L-moments. Water Resour. 2015, 42, 313–321. [Google Scholar] [CrossRef]
  62. Haddad, K.; Rahman, A.; A Zaman, M.; Shrestha, S. Applicability of Monte Carlo cross validation technique for model development and validation using generalised least squares regression. J. Hydrol. 2013, 482, 119–128. [Google Scholar] [CrossRef]
  63. Haque, M.M.; Rahman, A.; Haddad, K.; Kuczera, G.; Weeks, W. Development of a regional flood frequency estimation model for Pilbara, Australia. In Proceedings of the 21st International Congress on Modelling and Simulation: Partnering with Industry and the Community for Innovation and Impact through Modelling, MODSIM 2015—Held jointly with the 23rd National Conference of the Australian Society for Operations Research and the DSTO led Defence Operations Research Symposium, DORS 2015, Queensland, Australia, 29 November–4 December 2015; Modelling and Simulation Society of Australia and New Zealand Inc. (MSSANZ): Perth, WA, Australia, 2015; pp. 2172–2178. [Google Scholar]
  64. Saf, B. Regional Flood Frequency Analysis Using L-Moments for the West Mediterranean Region of Turkey. Water Resour. Manag. 2008, 23, 531–551. [Google Scholar] [CrossRef]
  65. Rosbjerg, D.; Bloschl, G.; Burn, D.; Castellarin, A.; Croke, B.; Di Baldassarre, G.; Iacobellis, V.; Kjeldsen, T.R.; Kuczera, G.; Merz, R. Prediction of floods in ungauged basins. In Runoff Prediction in Ungauged Basins: Synthesis Across Processes, Places and Scales; Cambridge University Press: Cambridge, UK, 2013; pp. 189–225. [Google Scholar]
  66. Tramblay, Y.; Khedimallah, A.; Sadaoui, M.; Benaabidate, L.; Boulmaiz, T.; Boutaghane, H.; Dakhlaoui, H.; Hanich, L.; Ludwig, W.; Meddi, M. Regional flood frequency analysis in North Africa. J. Hydrol. 2024, 630, 130678.
  67. Singh, A.K.; Chavan, S.R. An approach to regional flood frequency analysis for general peak discharge distribution datasets. J. Hydrol. 2025, 650, 132493.
  68. Desai, S.; Ouarda, T.B. Regional hydrological frequency analysis at ungauged sites with random forest regression. J. Hydrol. 2021, 594, 125861.
  69. Rahman, A.; Haddad, K.; Rahman, A.S.; Haque, M. Australian Rainfall and Runoff Revision Project 5: Regional Flood Methods: Database Used to Develop ARR RFFE Technique: Stage 3 Report; Engineers Australia: Barton, Australia, 2015.
  70. Haddad, K.; Rahman, A. Regional flood frequency analysis in eastern Australia: Bayesian GLS regression-based methods within fixed region and ROI framework—Quantile Regression vs. Parameter Regression Technique. J. Hydrol. 2012, 430–431, 142–161.
  71. Alexandre, D.A.; Chaudhuri, C.; Gill-Fortin, J. Continental Scale Regional Flood Frequency Analysis: Combining Enhanced Datasets and a Bayesian Framework. Hydrology 2024, 11, 119.
Figure 1. Flow chart illustrating the adopted methodology.
Figure 2. Location of selected stream gauging stations in southeast Australia (n = 201). Each red dot refers to a stream gauging station.
Figure 3. Spatial distribution of selected stations by category based on AREA.
Figure 4. Comparison of Hi-statistics (median) for the assumed regions in southeast Australia.
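Figure 4 and Table 1 summarize the Hosking and Wallis heterogeneity measures (H1, H2, H3) and the goodness-of-fit Z statistics. The classification can be illustrated with a minimal sketch, assuming the thresholds commonly used with these tests (H < 1 acceptably homogeneous, 1 ≤ H < 2 possibly heterogeneous, H ≥ 2 definitely heterogeneous; a candidate distribution is acceptable when |Z| ≤ 1.64). The function names are hypothetical; the input values are copied from Table 1.

```python
from __future__ import annotations


def classify_h(h: float) -> str:
    """Map a Hosking-Wallis heterogeneity measure H to its usual class."""
    if h < 1.0:
        return "acceptably homogeneous"
    if h < 2.0:
        return "possibly heterogeneous"
    return "definitely heterogeneous"


def acceptable_distributions(z_stats: dict[str, float], critical: float = 1.64) -> list[str]:
    """Return the candidate distributions whose |Z| does not exceed the critical value."""
    return [name for name, z in z_stats.items() if abs(z) <= critical]


# H1 values for selected regions (copied from Table 1).
h1_values = {"G-ALL": 30.13, "G-A1": 11.17, "G-D12": 2.86, "G-LMA": 0.96, "G-LMD2": 0.10}

# Z statistics for region G-LMC2 (copied from Table 1).
z_lmc2 = {"GLO": 3.66, "GEV": 1.99, "GNO": 0.13, "PE3": -3.09, "GPA": -2.98}

for region, h1 in h1_values.items():
    print(f"{region}: H1 = {h1:.2f} ({classify_h(h1)})")
print("Acceptable distributions for G-LMC2:", acceptable_distributions(z_lmc2))
```

Under these thresholds, every region in Table 1 formed from catchment characteristics has H1 ≥ 2, whereas all regions formed in the LCV-LSK space have H1 < 1, which is consistent with the conclusion stated in the abstract.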
Figure 5. Comparison of the absolute median relative error with the H1 statistic across the assumed regions in southeast Australia.
Figure 6. Comparison of boxplots for RE (%) values between the assumed homogeneous and heterogeneous regions (QRT) by quantiles.
Figure 7. Standardized flood frequency curves for six quantiles in the four largest homogeneous regions formed in the LCV-LSK space (left column) compared with heterogeneous regions (right column). Each line represents the standardized flood frequency curve of one station.
Figure 8. Comparison of the absolute median RE (%) values estimated from the models developed using the QRT and IFM.
Figure 9. Comparison of RE (%) values estimated from the models developed using the QRT and IFM (homogeneous regions).
Figure 10. Comparison of group formation using (a) PCA (scatter plot in the PC1-PC2 space), (b) Ward’s method, and (c) K-means clustering.
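Figure 10 compares groups formed from catchment characteristics by PCA, Ward’s method, and K-means clustering. The K-means step can be illustrated with Lloyd’s algorithm applied to standardized characteristics. This is a plain-Python sketch, not the authors’ implementation: the two characteristics, the toy values, and the deterministic initialization (the first k sites as starting centres) are assumptions, and library implementations such as scikit-learn’s `KMeans` normally use several randomized starts instead.

```python
from __future__ import annotations

import math
import statistics


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Rescale each characteristic (column) to zero mean and unit sample standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points: list[list[float]], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's K-means with a deterministic start: the first k points are the initial centres."""
    centres = [list(p) for p in points[:k]]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each site joins the nearest centre (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # converged: no site changed group
        labels = new_labels
        # Update step: move each centre to the mean of its member sites.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [statistics.mean(col) for col in zip(*members)]
    return labels


# Hypothetical catchments described by (area in km2, mean annual rainfall in mm).
catchments = [[10, 800], [500, 1500], [12, 820], [480, 1450], [15, 790], [520, 1550]]
groups = kmeans(standardize(catchments), k=2)
print(groups)  # [0, 1, 0, 1, 0, 1]
```

Standardizing first matters because catchment characteristics are measured on very different scales (km² versus mm); without it, the Euclidean distance would be dominated by whichever variable has the largest numerical range.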
Figure 11. Comparison of region formation (scatter plots of the L-moment ratios of the member sites of each group) using (a) the LCV-LSK space, (b) PCA, (c) Ward’s method, and (d) K-means clustering. Each colour represents a group of catchments.
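Figures 7, 11, and 14 place sites in the L-moment ratio space (LCV and LSK). A minimal sketch of how these sample ratios are usually computed from an annual maximum series follows, assuming the unbiased probability-weighted moment estimators described by Hosking and Wallis. The function name and the five-value series are hypothetical; a real record would be much longer.

```python
from __future__ import annotations

from math import comb


def l_moment_ratios(data: list[float]) -> tuple[float, float, float]:
    """Sample L-CV, L-skewness and L-kurtosis from unbiased probability-weighted moments.

    b_r = (1/n) * sum_j [C(j-1, r) / C(n-1, r)] * x_(j), with x_(1) <= ... <= x_(n).
    """
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are required")
    # enumerate() yields j - 1 for 1-based rank j, so comb(j, r) here is C(j-1, r).
    b = [sum(comb(j, r) * xj for j, xj in enumerate(x)) / (n * comb(n - 1, r)) for r in range(4)]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l2 / l1, l3 / l2, l4 / l2


# Hypothetical annual maximum flood series (m3/s) for one gauged site.
ams = [10.0, 12.0, 15.0, 20.0, 60.0]
lcv, lsk, lkt = l_moment_ratios(ams)
print(f"LCV = {lcv:.3f}, LSK = {lsk:.3f}, LKT = {lkt:.3f}")
```

The single large value in the toy series produces a clearly positive LSK, which is the typical situation for annual maximum floods.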
Figure 12. Comparison of PC1 and PC2 between (a) L-moments-based homogeneous regions and (b) heterogeneous regions with nearly the same number of sites in each group.
Figure 13. Boxplots comparing catchment characteristic values between the assumed homogeneous and heterogeneous regions.
Figure 14. Comparison of region formation in geographical space (left) and in the LCV-LSK space (right): (a,b) single largest homogeneous region, (c,d) two largest homogeneous regions, (e,f) three largest homogeneous regions, (g,h) four largest homogeneous regions. The larger yellow circles mark the mean LCV and LSK values of each region.
Table 1. Summary of region formation using the fixed region approach in southeast Australia, with region notation, * Di values, Hi statistics, and Z statistics.
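| Regionalization approach | Description | Region notation | Sites (n) | Sites covered (%) | Di* lowest | Di* highest | H1 | H2 | H3 | Z GLO | Z GEV | Z GNO | Z PE3 | Z GPA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All stations | Placing 201 stations in a region | G-ALL | 201 | 100 | 3.26 | 6.43 | 30.13 | 20.58 | 11.51 | 17.18 | 12.98 | 7.59 | −1.77 | 0.02 |
| Based on AREA | ≤50 km² | G-A1 | 24 | 12 | | 3.72 | 11.17 | 6.51 | 2.53 | 6.72 | 5.43 | 3.51 | 0.19 | 1.30 |
| | 51–108 km² | G-A2 | 20 | 10 | 3.04 | 3.49 | 11.48 | 8.12 | 4.13 | 5.10 | 3.88 | 2.22 | −0.66 | 0.06 |
| | 112–200 km² | G-A3 | 34 | 17 | | 3.34 | 11.56 | 6.47 | 3.72 | 7.88 | 5.95 | 3.80 | 0.07 | 0.23 |
| | 201–500 km² | G-A4 | 74 | 37 | 3.30 | 4.63 | 19.62 | 14.36 | 8.42 | 9.15 | 6.60 | 3.44 | −2.05 | −1.19 |
| | >500 km² | G-A5 | 49 | 24 | | 4.97 | 13.00 | 8.76 | 5.78 | 8.89 | 6.84 | 4.07 | −0.74 | 0.44 |
| Based on drainage division | Drainage division “2” | G-D1 | 106 | 53 | 3.34 | 5.36 | 22.41 | 16.03 | 9.25 | 14.50 | 11.19 | 7.16 | 0.17 | 1.15 |
| | Drainage division “4” | G-D2 | 95 | 47 | 3.05 | 7.18 | 20.78 | 13.23 | 7.13 | 9.36 | 6.82 | 3.37 | −2.61 | −1.13 |
| | Drainage division “2” within NSW | G-D3 | 50 | 25 | | 4.64 | 13.55 | 10.56 | 6.76 | 10.03 | 8.05 | 5.14 | 0.12 | 1.69 |
| | Drainage division “2” within VIC | G-D4 | 56 | 28 | 3.41 | 4.13 | 15.96 | 11.74 | 6.71 | 10.03 | 7.43 | 4.77 | 0.16 | −0.15 |
| | Drainage division “4” within NSW | G-D5 | 38 | 19 | 3.06 | 4.10 | 5.92 | 4.14 | 2.07 | 3.83 | 2.85 | 0.57 | −3.36 | −0.83 |
| | Drainage division “4” within VIC | G-D6 | 57 | 28 | 3.16 | 5.80 | 20.34 | 11.41 | 5.22 | 9.83 | 7.14 | 4.54 | 0.01 | −0.58 |
| | Drainage division “2” within northern NSW | G-D7 | 26 | 13 | - | - | 8.85 | 7.46 | 5.11 | 6.91 | 5.23 | 3.21 | −0.30 | 0.13 |
| | Drainage division “2” within southern NSW | G-D8 | 24 | 12 | | 4.47 | 8.68 | 5.77 | 3.81 | 8.22 | 6.94 | 4.62 | 0.62 | 2.56 |
| | Drainage division “2” within eastern VIC | G-D9 | 32 | 16 | | 3.60 | 13.25 | 9.57 | 5.10 | 7.43 | 5.57 | 3.49 | −0.14 | 0.05 |
| | Drainage division “2” within western VIC | G-D10 | 24 | 12 | | 3.00 | 11.67 | 7.49 | 4.11 | 7.00 | 5.11 | 3.47 | 0.61 | −0.16 |
| | Drainage division “4” within northern NSW | G-D11 | 18 | 9 | | 3.33 | 4.57 | 3.38 | 2.36 | 3.98 | 3.33 | 1.75 | −0.97 | 0.86 |
| | Drainage division “4” within southern NSW | G-D12 | 20 | 10 | - | - | 2.86 | 1.88 | 0.50 | 1.47 | 0.80 | −0.69 | −3.25 | −1.66 |
| | Drainage division “4” within eastern VIC | G-D13 | 25 | 12 | - | - | 7.09 | 3.66 | 1.59 | 4.44 | 2.35 | 0.95 | −1.51 | −3.21 |
| | Drainage division “4” within western VIC | G-D14 | 32 | 16 | 3.11 | 6.45 | 12.99 | 7.76 | 4.98 | 9.63 | 7.84 | 5.56 | 1.60 | 2.34 |
| Based on basin | Basin (‘201’, ‘203’, ‘204’, ‘206’, ‘207’, ‘208’, ‘209’, ‘210’) | G-B1 | 29 | 14 | - | - | 10.52 | 8.79 | 5.79 | 7.13 | 5.39 | 3.32 | −0.27 | 0.15 |
| | Basin (‘211’, ‘212’, ‘215’, ‘218’, ‘219’, ‘220’, ‘221’) | G-B2 | 21 | 10 | | 3.20 | 4.72 | 4.27 | 3.40 | 9.30 | 7.90 | 5.82 | 2.23 | 3.40 |
| | Basin (‘222’, ‘223’, ‘224’, ‘225’, ‘226’, ‘227’) | G-B3 | 34 | 17 | | 3.14 | 15.34 | 10.94 | 5.35 | 5.73 | 4.04 | 2.01 | −1.53 | −1.07 |
| | Basin (‘229’, ‘230’, ‘231’, ‘232’, ‘233’, ‘234’, ‘235’, ‘236’, ‘237’, ‘238’) | G-B4 | 22 | 11 | - | - | 9.32 | 5.51 | 3.05 | 6.97 | 5.34 | 3.66 | 0.75 | 0.59 |
| | Basin (‘401’, ‘402’, ‘403’, ‘404’) | G-B5 | 21 | 10 | - | - | 4.98 | 3.54 | 2.72 | 4.86 | 3.04 | 1.62 | −0.85 | −1.95 |
| | Basin (‘405’) | G-B6 | 20 | 10 | | 3.79 | 13.29 | 6.58 | 1.84 | 4.78 | 2.87 | 1.52 | −0.84 | −2.26 |
| | Basin (‘406’, ‘407’, ‘410’) | G-B7 | 21 | 10 | - | - | 5.53 | 3.64 | 2.74 | 6.07 | 4.91 | 3.01 | −0.28 | 1.05 |
| | Basin (‘411’, ‘412’, ‘415’, ‘416’, ‘418’, ‘419’, ‘421’) | G-B8 | 33 | 16 | | 3.39 | 5.48 | 4.39 | 2.70 | 4.62 | 3.84 | 1.71 | −1.95 | 0.74 |
| Based on LCV and LSK space | Single largest homogeneous region | G-LMA | 88 | 44 | | 5.35 | 0.96 | 0.50 | 0.86 | 13.95 | 11.90 | 7.92 | 1.06 | 4.72 |
| | Two largest homogeneous regions | G-LMB1 | 71 | 65 | 3.35 | 4.79 | 0.83 | −2.02 | −3.41 | 8.08 | 7.02 | 3.64 | −2.14 | 2.52 |
| | | G-LMB2 | 60 | | | 3.02 | 0.78 | −1.56 | −1.33 | 16.81 | 13.30 | 10.35 | 5.20 | 3.55 |
| | Three largest homogeneous regions | G-LMC1 | 50 | 73 | | 4.91 | 0.36 | −2.41 | −3.76 | 5.16 | 4.59 | 1.70 | −3.16 | 1.59 |
| | | G-LMC2 | 67 | | 3.25 | 3.36 | 0.91 | −1.20 | −1.81 | 3.66 | 1.99 | 0.13 | −3.09 | −2.98 |
| | | G-LMC3 | 30 | | | 3.23 | −0.48 | −0.84 | −0.11 | 16.76 | 13.99 | 10.42 | 4.23 | 5.45 |
| | Four largest homogeneous regions | G-LMD1 | 64 | 87 | 3.07 | 4.80 | 0.96 | −1.52 | −2.86 | 6.98 | 6.19 | 2.83 | −2.85 | 2.36 |
| | | G-LMD2 | 51 | | - | - | 0.10 | −1.37 | −2.02 | 19.62 | 16.30 | 13.58 | 8.82 | 7.11 |
| | | G-LMD3 | 36 | | | 3.88 | 0.79 | −2.16 | −3.01 | 4.41 | 2.83 | 0.64 | −3.15 | −2.16 |
| | | G-LMD4 | 23 | | | 3.08 | 0.27 | −1.66 | −2.43 | 8.84 | 5.28 | 4.50 | 2.85 | −2.93 |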
Notes: * Di values are reported only for the discordant stations (Di ≥ 3.00). GLO = Generalized Logistic, GEV = Generalized Extreme Value, GNO = Generalized Normal, PE3 = Pearson Type 3, GPA = Generalized Pareto.
Table 2. Summary of evaluation statistics for the assumed regions (Q2 to Q100, QRT).
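Column groups: all stations (G-ALL); based on AREA (G-A1 to G-A5); based on basin (G-B1 to G-B8); based on L-moments, LCV vs. LSK (G-LMA to G-LMD4).

| Evaluation criterion | QT | G-ALL | G-A1 | G-A2 | G-A3 | G-A4 | G-A5 | G-B1 | G-B2 | G-B3 | G-B4 | G-B5 | G-B6 | G-B7 | G-B8 | G-LMA | G-LMB1 | G-LMB2 | G-LMC1 | G-LMC2 | G-LMC3 | G-LMD1 | G-LMD2 | G-LMD3 | G-LMD4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMRE | Q2 | 39.49 | 63.57 | 51.29 | 46.00 | 36.27 | 39.61 | 30.46 | 58.84 | 31.96 | 55.55 | 48.02 | 43.09 | 30.27 | 51.84 | 39.49 | 38.55 | 27.98 | 33.92 | 34.23 | 30.41 | 35.31 | 28.21 | 38.87 | 34.32 |
| | Q5 | 37.80 | 52.96 | 51.29 | 41.25 | 36.93 | 49.70 | 32.84 | 52.59 | 31.52 | 56.28 | 54.72 | 37.11 | 26.98 | 35.82 | 35.22 | 32.90 | 32.99 | 36.76 | 41.22 | 34.23 | 34.30 | 31.68 | 32.94 | 30.09 |
| | Q10 | 36.98 | 57.09 | 49.51 | 44.29 | 39.22 | 52.24 | 48.69 | 57.69 | 36.73 | 66.61 | 57.30 | 35.26 | 27.29 | 31.34 | 34.25 | 33.50 | 35.95 | 38.10 | 42.40 | 37.78 | 36.59 | 34.41 | 28.55 | 32.65 |
| | Q20 | 41.24 | 66.45 | 50.50 | 54.16 | 43.08 | 54.71 | 51.19 | 61.94 | 46.11 | 66.38 | 51.06 | 38.52 | 35.29 | 37.63 | 39.89 | 40.81 | 37.23 | 42.94 | 44.82 | 37.28 | 38.36 | 38.62 | 33.91 | 36.18 |
| | Q50 | 43.04 | 75.25 | 64.06 | 59.89 | 49.92 | 54.17 | 53.97 | 66.60 | 50.81 | 71.31 | 43.69 | 36.54 | 38.38 | 39.29 | 46.77 | 47.85 | 36.11 | 46.86 | 43.53 | 41.11 | 44.49 | 39.99 | 34.34 | 36.78 |
| | Q100 | 45.34 | 77.27 | 67.21 | 66.10 | 52.09 | 60.87 | 59.84 | 74.70 | 54.12 | 73.99 | 37.37 | 34.19 | 37.02 | 41.81 | 51.93 | 54.34 | 38.81 | 51.77 | 42.01 | 46.73 | 51.53 | 43.60 | 30.99 | 42.59 |
| MSE | Q2 | 2514.73 | 401.48 | 1465.38 | 1823.47 | 2877 | 6770 | 5380 | 11,839 | 1493 | 314 | 7621 | 1648 | 206 | 1717 | 1712 | 820 | 2326 | 736 | 2902 | 1736 | 696 | 2365 | 3095 | 3592 |
| | Q5 | 13,166.85 | 4380.66 | 5373.60 | 19,734.92 | 14,192 | 36,263 | 25,387 | 121,274 | 7202 | 3125 | 56,852 | 12,797 | 1223 | 8141 | 15,615 | 8780 | 13,864 | 7781 | 18,165 | 11,708 | 6840 | 13,760 | 17,213 | 10,520 |
| | Q10 | 36,474.56 | 39,183.65 | 14,464.86 | 87,771.38 | 35,444 | 103,510 | 73,838 | 485,875 | 18,771 | 10,417 | 78,074 | 31,879 | 2836 | 27,302 | 51,647 | 33,843 | 38,813 | 29,589 | 47,928 | 33,542 | 25,244 | 46,210 | 40,808 | 18,951 |
| | Q20 | 94,459.31 | 222,126.95 | 38,891.46 | 306,740.34 | 80,188 | 278,231 | 204,313 | 1,856,370 | 44,676 | 26,355 | 80,394 | 58,223 | 6080 | 89,060 | 147,390 | 108,650 | 98,994 | 96,021 | 108,083 | 88,894 | 80,540 | 143,457 | 81,729 | 31,368 |
| | Q50 | 309,516.46 | 1,429,884.04 | 132,519.97 | 1,247,388.99 | 216,832 | 952,076 | 726,152 | 11,728,100 | 128,854 | 69,605 | 86,694 | 99,656 | 16,226 | 377,765 | 511,950 | 426,355 | 308,162 | 392,278 | 271,005 | 297,021 | 327,252 | 546,552 | 175,799 | 56,678 |
| | Q100 | 722,328.06 | 4,760,233.15 | 309,416.08 | 3,155,874.16 | 440,854 | 2,273,787 | 1,787,504 | 50,031,260 | 274,128 | 128,601 | 106,835 | 134,021 | 33,252 | 1,026,171 | 1,210,574 | 1,091,092 | 676,944 | 1,044,854 | 501,192 | 691,407 | 875,336 | 1,340,578 | 291,266 | 85,604 |
| RMSE | Q2 | 50.15 | 20.04 | 38.28 | 42.70 | 53.64 | 82.28 | 73.35 | 108.81 | 38.64 | 17.73 | 87.30 | 40.60 | 14.36 | 41.44 | 41.38 | 28.63 | 48.23 | 27.12 | 53.87 | 41.67 | 26.38 | 48.63 | 55.64 | 59.94 |
| | Q5 | 114.75 | 66.19 | 73.30 | 140.48 | 119.13 | 190.43 | 159.33 | 348.24 | 84.86 | 55.90 | 238.44 | 113.12 | 34.98 | 90.23 | 124.96 | 93.70 | 117.74 | 88.21 | 134.78 | 108.20 | 82.71 | 117.30 | 131.20 | 102.57 |
| | Q10 | 190.98 | 197.95 | 120.27 | 296.26 | 188.27 | 321.73 | 271.73 | 697.05 | 137.01 | 102.06 | 279.42 | 178.55 | 53.26 | 165.23 | 227.26 | 183.97 | 197.01 | 172.01 | 218.93 | 183.14 | 158.88 | 214.97 | 202.01 | 137.66 |
| | Q20 | 307.34 | 471.30 | 197.21 | 553.84 | 283.17 | 527.48 | 452.01 | 1362.49 | 211.37 | 162.34 | 283.54 | 241.29 | 77.98 | 298.43 | 383.91 | 329.62 | 314.63 | 309.87 | 328.76 | 298.15 | 283.80 | 378.76 | 285.88 | 177.11 |
| | Q50 | 556.34 | 1195.78 | 364.03 | 1116.87 | 465.65 | 975.74 | 852.15 | 3424.63 | 358.96 | 263.83 | 294.44 | 315.68 | 127.38 | 614.63 | 715.51 | 652.96 | 555.12 | 626.32 | 520.58 | 545.00 | 572.06 | 739.29 | 419.28 | 238.07 |
| | Q100 | 849.90 | 2181.80 | 556.25 | 1776.48 | 663.97 | 1507.91 | 1336.98 | 7073.28 | 523.57 | 358.61 | 326.86 | 366.09 | 182.35 | 1013.00 | 1100.26 | 1044.55 | 822.77 | 1022.18 | 707.95 | 831.51 | 935.59 | 1157.83 | 539.69 | 292.58 |
| BIAS | Q2 | −9.53 | 1.95 | −4.49 | −3.11 | −10.69 | −14.87 | −1.19 | −4.77 | −4.00 | 0.69 | 13.60 | 6.91 | −0.51 | −3.90 | −6.60 | −3.33 | −9.12 | −3.37 | 4.67 | −5.71 | −3.17 | −7.32 | −3.60 | 2.58 |
| | Q5 | −20.05 | 11.93 | −6.62 | 1.92 | −22.53 | −33.48 | −3.93 | 6.31 | −9.05 | 5.14 | 38.47 | 17.62 | −1.60 | −4.06 | −20.98 | −10.89 | −20.82 | −9.72 | 15.63 | −16.75 | −10.10 | −17.69 | −6.26 | 1.24 |
| | Q10 | −33.25 | 35.29 | −9.07 | 12.21 | −35.30 | −54.46 | −8.99 | 46.25 | −16.06 | 10.98 | 37.54 | 27.87 | −2.72 | −5.98 | −39.96 | −22.48 | −32.62 | −18.65 | 27.79 | −31.10 | −20.13 | −30.58 | −8.42 | −0.71 |
| | Q20 | −54.95 | 84.52 | −14.31 | 29.21 | −53.97 | −86.77 | −18.68 | 157.56 | −27.56 | 18.27 | 23.64 | 38.88 | −4.17 | −12.96 | −70.42 | −42.92 | −48.03 | −34.68 | 44.17 | −53.78 | −37.49 | −50.94 | −10.85 | −3.56 |
| | Q50 | −105.44 | 217.69 | −29.53 | 63.48 | −93.06 | −158.96 | −43.39 | 579.62 | −53.76 | 28.89 | −3.76 | 53.31 | −6.65 | −37.93 | −138.35 | −93.15 | −76.03 | −75.99 | 73.34 | −103.19 | −79.87 | −95.21 | −14.91 | −8.77 |
| | Q100 | −169.16 | 401.23 | −51.45 | 98.56 | −138.97 | −248.34 | −76.41 | 1375.82 | −86.32 | 36.63 | −26.68 | 63.62 | −8.85 | −77.51 | −220.83 | −159.53 | −104.73 | −133.30 | 102.11 | −161.54 | −135.94 | −147.16 | −19.10 | −13.95 |
| RBIAS | Q2 | 22.24 | 396.41 | 40.60 | 26.60 | 21.75 | 20.11 | 15.81 | 61.28 | 11.83 | 36.06 | 83.04 | 48.34 | 8.72 | 36.78 | 18.83 | 19.18 | 13.13 | 28.15 | 41.89 | 16.32 | 25.32 | 10.09 | 19.12 | 26.48 |
| | Q5 | 21.41 | 143.52 | 25.59 | 22.07 | 21.57 | 21.49 | 14.02 | 88.51 | 16.88 | 36.50 | 126.99 | 58.18 | 14.34 | 21.88 | 20.34 | 20.19 | 12.38 | 29.65 | 47.21 | 14.25 | 28.67 | 8.89 | 18.86 | 23.20 |
| | Q10 | 23.55 | 103.03 | 26.32 | 26.38 | 24.60 | 24.51 | 18.68 | 114.99 | 21.74 | 43.45 | 110.31 | 64.74 | 16.90 | 20.94 | 21.81 | 22.78 | 12.67 | 30.26 | 50.66 | 13.85 | 30.34 | 9.72 | 18.05 | 22.70 |
| | Q20 | 26.91 | 96.27 | 30.90 | 33.26 | 28.92 | 28.45 | 25.54 | 150.41 | 27.05 | 51.65 | 82.72 | 69.92 | 19.31 | 23.15 | 23.80 | 26.07 | 13.31 | 31.41 | 54.27 | 14.10 | 31.98 | 11.22 | 17.00 | 23.03 |
| | Q50 | 32.76 | 110.88 | 41.81 | 45.26 | 36.11 | 34.94 | 37.05 | 215.27 | 34.74 | 63.48 | 52.10 | 75.67 | 22.76 | 28.87 | 27.31 | 31.45 | 14.77 | 34.27 | 59.14 | 15.44 | 34.69 | 14.15 | 15.62 | 24.44 |
| | Q100 | 38.13 | 133.48 | 53.75 | 56.29 | 42.51 | 40.80 | 47.40 | 282.46 | 41.11 | 73.15 | 37.18 | 79.77 | 25.73 | 34.66 | 30.66 | 36.34 | 16.35 | 37.60 | 63.04 | 17.22 | 37.44 | 17.03 | 14.73 | 26.11 |
| RRMSE | Q2 | 0.15 | 0.15 | 0.10 | 0.07 | 0.16 | 0.14 | 0.01 | 0.05 | 0.08 | 0.03 | 0.28 | 0.16 | 0.02 | 0.06 | 0.11 | 0.07 | 0.11 | 0.08 | 0.10 | 0.08 | 0.08 | 0.08 | 0.07 | 0.02 |
| | Q5 | 0.13 | 0.39 | 0.06 | 0.02 | 0.14 | 0.13 | 0.01 | 0.02 | 0.08 | 0.10 | 0.37 | 0.20 | 0.02 | 0.02 | 0.12 | 0.08 | 0.11 | 0.07 | 0.17 | 0.09 | 0.07 | 0.08 | 0.06 | 0.01 |
| | Q10 | 0.13 | 0.71 | 0.05 | 0.07 | 0.14 | 0.13 | 0.02 | 0.09 | 0.09 | 0.14 | 0.25 | 0.23 | 0.02 | 0.02 | 0.13 | 0.09 | 0.11 | 0.07 | 0.22 | 0.10 | 0.08 | 0.09 | 0.05 | 0.00 |
| | Q20 | 0.15 | 1.15 | 0.06 | 0.11 | 0.15 | 0.13 | 0.03 | 0.21 | 0.10 | 0.16 | 0.12 | 0.25 | 0.02 | 0.03 | 0.15 | 0.10 | 0.12 | 0.08 | 0.26 | 0.12 | 0.09 | 0.10 | 0.05 | 0.01 |
| | Q50 | 0.18 | 1.87 | 0.07 | 0.16 | 0.17 | 0.15 | 0.04 | 0.45 | 0.13 | 0.18 | 0.01 | 0.27 | 0.03 | 0.04 | 0.18 | 0.13 | 0.13 | 0.09 | 0.31 | 0.15 | 0.10 | 0.12 | 0.05 | 0.02 |
| | Q100 | 0.21 | 2.52 | 0.09 | 0.18 | 0.19 | 0.17 | 0.05 | 0.76 | 0.16 | 0.18 | 0.08 | 0.27 | 0.03 | 0.06 | 0.20 | 0.15 | 0.14 | 0.11 | 0.35 | 0.17 | 0.12 | 0.14 | 0.05 | 0.02 |
| RMSNE | Q2 | 0.93 | 17.97 | 1.52 | 1.00 | 0.86 | 0.80 | 0.75 | 1.99 | 0.57 | 0.97 | 3.11 | 1.53 | 0.52 | 1.08 | 0.83 | 0.76 | 0.59 | 1.04 | 1.55 | 0.75 | 0.95 | 0.55 | 0.84 | 1.05 |
| | Q5 | 0.95 | 5.47 | 0.99 | 0.81 | 0.88 | 0.93 | 0.70 | 2.74 | 0.72 | 1.09 | 5.15 | 1.93 | 0.76 | 0.82 | 0.86 | 0.81 | 0.54 | 1.15 | 1.73 | 0.65 | 1.14 | 0.48 | 0.82 | 0.94 |
| | Q10 | 1.03 | 3.19 | 1.00 | 0.93 | 1.01 | 1.03 | 0.80 | 3.50 | 0.83 | 1.27 | 4.44 | 2.10 | 0.88 | 0.84 | 0.90 | 0.90 | 0.55 | 1.11 | 1.86 | 0.61 | 1.17 | 0.50 | 0.80 | 0.91 |
| | Q20 | 1.13 | 2.60 | 1.13 | 1.13 | 1.18 | 1.14 | 0.95 | 4.49 | 0.93 | 1.46 | 3.23 | 2.22 | 0.97 | 0.95 | 0.95 | 1.00 | 0.56 | 1.08 | 2.01 | 0.60 | 1.18 | 0.53 | 0.77 | 0.91 |
| | Q50 | 1.29 | 3.05 | 1.44 | 1.48 | 1.44 | 1.30 | 1.22 | 6.35 | 1.07 | 1.72 | 1.89 | 2.35 | 1.07 | 1.15 | 1.05 | 1.18 | 0.59 | 1.10 | 2.21 | 0.61 | 1.18 | 0.60 | 0.74 | 0.94 |
| | Q100 | 1.42 | 3.90 | 1.78 | 1.80 | 1.66 | 1.43 | 1.49 | 8.38 | 1.17 | 1.93 | 1.29 | 2.44 | 1.14 | 1.34 | 1.14 | 1.33 | 0.63 | 1.16 | 2.37 | 0.64 | 1.20 | 0.67 | 0.72 | 0.98 |
| R² | Q2 | 0.72 | 0.73 | 0.81 | 0.71 | 0.51 | 0.59 | 0.93 | 0.55 | 0.86 | 0.79 | 0.82 | 0.75 | 0.93 | 0.73 | 0.70 | 0.70 | 0.82 | 0.75 | 0.80 | 0.81 | 0.74 | 0.89 | 0.85 | 0.92 |
| | Q5 | 0.73 | 0.72 | 0.86 | 0.69 | 0.53 | 0.57 | 0.93 | 0.52 | 0.83 | 0.80 | 0.85 | 0.67 | 0.88 | 0.84 | 0.69 | 0.69 | 0.83 | 0.72 | 0.78 | 0.82 | 0.71 | 0.88 | 0.85 | 0.93 |
| | Q10 | 0.72 | 0.69 | 0.87 | 0.67 | 0.54 | 0.57 | 0.90 | 0.50 | 0.79 | 0.78 | 0.87 | 0.63 | 0.85 | 0.87 | 0.69 | 0.67 | 0.83 | 0.71 | 0.78 | 0.82 | 0.69 | 0.88 | 0.86 | 0.93 |
| | Q20 | 0.70 | 0.66 | 0.87 | 0.66 | 0.54 | 0.58 | 0.87 | 0.48 | 0.75 | 0.75 | 0.88 | 0.60 | 0.81 | 0.87 | 0.69 | 0.66 | 0.83 | 0.69 | 0.77 | 0.82 | 0.68 | 0.87 | 0.86 | 0.93 |
| | Q50 | 0.68 | 0.62 | 0.86 | 0.64 | 0.54 | 0.57 | 0.81 | 0.46 | 0.70 | 0.72 | 0.89 | 0.57 | 0.78 | 0.85 | 0.69 | 0.64 | 0.82 | 0.67 | 0.76 | 0.81 | 0.66 | 0.86 | 0.87 | 0.93 |
| | Q100 | 0.66 | 0.59 | 0.84 | 0.63 | 0.53 | 0.57 | 0.76 | 0.45 | 0.67 | 0.70 | 0.89 | 0.54 | 0.76 | 0.84 | 0.68 | 0.63 | 0.81 | 0.65 | 0.76 | 0.80 | 0.65 | 0.84 | 0.88 | 0.93 |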
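Tables 2 and 3 report several evaluation criteria, but their formulas are not restated in this section. The sketch below computes them in common textbook forms; these forms are assumptions and may differ in detail from the definitions used in the paper (for example, R² is computed here as the coefficient of determination rather than a squared correlation, AMRE is taken as the median absolute relative error, and RRMSE is omitted because its form is not given here). The function name and the data are hypothetical.

```python
from __future__ import annotations

import math
import statistics


def evaluation_stats(obs: list[float], pred: list[float]) -> dict[str, float]:
    """Assumed (common) forms of several criteria reported in Tables 2 and 3."""
    n = len(obs)
    err = [p - o for o, p in zip(obs, pred)]        # prediction error
    re = [100.0 * e / o for e, o in zip(err, obs)]   # relative error, %
    mse = sum(e * e for e in err) / n
    mean_obs = statistics.mean(obs)
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return {
        "AMRE": statistics.median([abs(r) for r in re]),  # median |RE|, %
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "BIAS": sum(err) / n,
        "RBIAS": statistics.mean(re),                     # mean RE, %
        "RMSNE": math.sqrt(sum((e / o) ** 2 for e, o in zip(err, obs)) / n),
        "R2": 1.0 - (mse * n) / ss_tot,                   # coefficient of determination
    }


# Hypothetical observed (at-site) and predicted (regional) Q10 values, m3/s.
q_obs = [100.0, 200.0, 400.0]
q_pred = [110.0, 180.0, 400.0]
for name, value in evaluation_stats(q_obs, q_pred).items():
    print(f"{name}: {value:.4f}")
```

One relation can be checked directly against the tables: RMSE is the square root of MSE, so for G-ALL at Q2 in Table 2, √2514.73 ≈ 50.15, matching the reported RMSE.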
Table 3. Summary of evaluation statistics for the assumed regions (Q2 to Q100, IFM).
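All columns are regions formed in the L-moments (LCV vs. LSK) space.

| Evaluation criterion | QT | G-LMA | G-LMB1 | G-LMB2 | G-LMC1 | G-LMC2 | G-LMC3 | G-LMD1 | G-LMD2 | G-LMD3 | G-LMD4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AMRE | Q2 | 44.34 | 51.55 | 40.09 | 55.11 | 39.70 | 43.04 | 53.87 | 39.12 | 38.07 | 37.81 |
| | Q5 | 45.16 | 52.16 | 38.08 | 58.02 | 47.36 | 43.98 | 57.91 | 36.42 | 35.88 | 34.14 |
| | Q10 | 47.61 | 54.07 | 34.08 | 58.57 | 47.01 | 42.62 | 57.50 | 38.07 | 36.74 | 34.21 |
| | Q20 | 46.94 | 54.50 | 35.30 | 62.98 | 45.05 | 38.53 | 59.35 | 38.27 | 38.38 | 35.07 |
| | Q50 | 46.15 | 49.34 | 32.45 | 59.78 | 45.14 | 40.76 | 57.52 | 41.00 | 38.44 | 38.42 |
| | Q100 | 49.08 | 52.68 | 36.22 | 58.53 | 45.60 | 46.40 | 56.26 | 42.90 | 36.17 | 41.25 |
| MSE | Q2 | 2856 | 1352 | 4017 | 1457 | 1923 | 3055 | 1326 | 4408 | 2766 | 3866 |
| | Q5 | 25,535 | 14,548 | 22,134 | 15,881 | 6937 | 22,212 | 14,187 | 25,240 | 13,691 | 14,161 |
| | Q10 | 84,987 | 53,586 | 58,871 | 58,805 | 13,523 | 68,967 | 52,290 | 78,829 | 31,159 | 29,867 |
| | Q20 | 238,889 | 162,588 | 138,274 | 180,426 | 23,432 | 186,568 | 159,991 | 219,196 | 61,011 | 57,210 |
| | Q50 | 796,207 | 589,607 | 378,756 | 670,029 | 43,527 | 599,810 | 591,596 | 725,082 | 129,700 | 122,138 |
| | Q100 | 1,811,742 | 1,424,691 | 757,186 | 1,656,379 | 65,925 | 1,327,169 | 1,455,826 | 1,623,676 | 215,182 | 204,411 |
| RMSE | Q2 | 53.44 | 36.77 | 63.38 | 38.17 | 43.85 | 55.27 | 36.42 | 66.39 | 52.60 | 62.18 |
| | Q5 | 159.80 | 120.61 | 148.77 | 126.02 | 83.29 | 149.04 | 119.11 | 158.87 | 117.01 | 119.00 |
| | Q10 | 291.52 | 231.49 | 242.63 | 242.50 | 116.29 | 262.62 | 228.67 | 280.77 | 176.52 | 172.82 |
| | Q20 | 488.76 | 403.22 | 371.85 | 424.77 | 153.07 | 431.93 | 399.99 | 468.18 | 247.00 | 239.19 |
| | Q50 | 892.30 | 767.86 | 615.43 | 818.55 | 208.63 | 774.47 | 769.15 | 851.52 | 360.14 | 349.48 |
| | Q100 | 1346.01 | 1193.60 | 870.16 | 1287.00 | 256.76 | 1152.03 | 1206.58 | 1274.24 | 463.88 | 452.12 |
| BIAS | Q2 | −29.58 | −24.15 | −31.14 | −25.54 | −7.95 | −29.66 | −24.32 | −34.24 | −17.32 | −16.46 |
| | Q5 | −89.83 | −77.62 | −74.30 | −86.00 | −14.76 | −80.45 | −80.18 | −85.58 | −39.54 | −34.90 |
| | Q10 | −160.31 | −143.76 | −118.40 | −163.06 | −20.41 | −136.24 | −150.26 | −144.48 | −60.40 | −54.61 |
| | Q20 | −259.15 | −240.23 | −174.50 | −277.86 | −26.71 | −211.12 | −253.56 | −225.31 | −85.37 | −79.81 |
| | Q50 | −447.24 | −430.99 | −270.92 | −509.77 | −36.15 | −347.52 | −460.10 | −374.67 | −125.64 | −122.18 |
| | Q100 | −646.25 | −639.47 | −364.04 | −767.95 | −44.17 | −486.44 | −688.11 | −527.84 | −162.38 | −161.59 |
| RBIAS | Q2 | −33.05 | −40.22 | −21.91 | −45.23 | 6.97 | −28.03 | −43.58 | −26.47 | −16.55 | 6.58 |
| | Q5 | −33.50 | −40.80 | −22.60 | −47.28 | 7.09 | −29.74 | −45.07 | −27.91 | −17.38 | 4.35 |
| | Q10 | −32.83 | −39.31 | −22.39 | −46.49 | 7.80 | −29.90 | −44.11 | −27.44 | −17.67 | 3.98 |
| | Q20 | −30.99 | −37.05 | −21.59 | −44.77 | 8.87 | −28.90 | −42.21 | −25.81 | −17.89 | 4.37 |
| | Q50 | −26.55 | −32.78 | −19.32 | −41.16 | 10.60 | −25.59 | −38.22 | −21.65 | −17.97 | 5.93 |
| | Q100 | −21.53 | −28.41 | −16.58 | −37.33 | 12.20 | −21.46 | −33.91 | −16.85 | −17.80 | 7.87 |
| RRMSE | Q2 | 0.50 | 0.54 | 0.39 | 0.61 | 0.17 | 0.43 | 0.58 | 0.39 | 0.35 | 0.14 |
| | Q5 | 0.52 | 0.55 | 0.39 | 0.60 | 0.16 | 0.44 | 0.58 | 0.38 | 0.36 | 0.16 |
| | Q10 | 0.54 | 0.56 | 0.40 | 0.60 | 0.16 | 0.46 | 0.59 | 0.41 | 0.37 | 0.18 |
| | Q20 | 0.55 | 0.57 | 0.42 | 0.61 | 0.16 | 0.48 | 0.59 | 0.44 | 0.38 | 0.20 |
| | Q50 | 0.58 | 0.59 | 0.45 | 0.61 | 0.15 | 0.50 | 0.60 | 0.48 | 0.38 | 0.23 |
| | Q100 | 0.59 | 0.60 | 0.47 | 0.61 | 0.15 | 0.52 | 0.60 | 0.51 | 0.39 | 0.25 |
| RMSNE | Q2 | 0.57 | 0.55 | 0.45 | 0.62 | 1.13 | 0.53 | 0.60 | 0.45 | 0.60 | 0.86 |
| | Q5 | 0.55 | 0.56 | 0.43 | 0.60 | 1.13 | 0.49 | 0.59 | 0.43 | 0.55 | 0.77 |
| | Q10 | 0.55 | 0.57 | 0.43 | 0.60 | 1.15 | 0.48 | 0.59 | 0.43 | 0.53 | 0.73 |
| | Q20 | 0.55 | 0.59 | 0.43 | 0.60 | 1.20 | 0.47 | 0.60 | 0.44 | 0.52 | 0.72 |
| | Q50 | 0.56 | 0.61 | 0.44 | 0.61 | 1.27 | 0.48 | 0.61 | 0.46 | 0.50 | 0.73 |
| | Q100 | 0.59 | 0.63 | 0.47 | 0.62 | 1.33 | 0.50 | 0.64 | 0.49 | 0.50 | 0.76 |
| R² | Q2 | 0.70 | 0.70 | 0.82 | 0.75 | 0.80 | 0.81 | 0.74 | 0.89 | 0.85 | 0.92 |
| | Q5 | 0.69 | 0.69 | 0.83 | 0.72 | 0.78 | 0.82 | 0.71 | 0.88 | 0.85 | 0.93 |
| | Q10 | 0.69 | 0.67 | 0.83 | 0.71 | 0.78 | 0.82 | 0.69 | 0.88 | 0.86 | 0.93 |
| | Q20 | 0.69 | 0.66 | 0.83 | 0.69 | 0.77 | 0.82 | 0.68 | 0.87 | 0.86 | 0.93 |
| | Q50 | 0.69 | 0.64 | 0.82 | 0.67 | 0.76 | 0.81 | 0.66 | 0.86 | 0.87 | 0.93 |
| | Q100 | 0.68 | 0.63 | 0.81 | 0.65 | 0.76 | 0.80 | 0.65 | 0.84 | 0.88 | 0.93 |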
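Table 3 evaluates the index flood method (IFM), in which a flood quantile at a site is the product of an index flood (commonly the mean annual maximum flood) and a regional growth factor. The sketch below assumes a GEV growth curve fitted to record-length-weighted regional L-moment ratios, using Hosking’s closed-form approximation for the shape parameter. The distribution choice, function names, and inputs are illustrative assumptions; this section does not state which distribution the study adopted for the IFM.

```python
from __future__ import annotations

import math


def regional_average(ratios: list[float], record_lengths: list[int]) -> float:
    """Record-length-weighted regional mean of an at-site L-moment ratio."""
    return sum(r * n for r, n in zip(ratios, record_lengths)) / sum(record_lengths)


def gev_growth_factor(lcv: float, lsk: float, return_period: float) -> float:
    """Growth factor q_T of a unit-mean GEV growth curve fitted by L-moments.

    The shape k uses Hosking's approximation (Hosking sign convention). The
    expressions below are undefined at k = 0 (the Gumbel case, LSK near 0.17).
    """
    c = 2.0 / (3.0 + lsk) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    g = math.gamma(1.0 + k)
    alpha = lcv * k / ((1.0 - 2.0 ** (-k)) * g)  # lambda2 = LCV because lambda1 = 1
    xi = 1.0 - alpha * (1.0 - g) / k             # location giving a unit mean
    y = -math.log(1.0 - 1.0 / return_period)     # -ln F for annual maxima
    return xi + alpha * (1.0 - y ** k) / k


def ifm_quantile(index_flood: float, lcv: float, lsk: float, return_period: float) -> float:
    """Index flood estimate: Q_T = index flood x regional growth factor q_T."""
    return index_flood * gev_growth_factor(lcv, lsk, return_period)


# Hypothetical region: at-site ratios and record lengths (years) for three sites.
t_r = regional_average([0.28, 0.33, 0.30], [40, 25, 35])
t3_r = regional_average([0.22, 0.28, 0.25], [40, 25, 35])
for T in (2, 5, 10, 20, 50, 100):
    # Hypothetical index flood (mean annual maximum flood) of 150 m3/s.
    print(T, round(ifm_quantile(150.0, t_r, t3_r, T), 1))
```

Because the growth curve is scaled to unit mean, quantiles at different sites in a region differ only through their index floods. This is the property that the standardized flood frequency curves in Figure 7 help to assess: within a homogeneous region they should be broadly similar.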
Table 4. Agreement of group formation between the L-moments space and the multivariate statistical techniques (PCA, Ward’s method, and K-means clustering). Bold values indicate the highest agreement among the assumed regions.
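Entries are % (n) of the sites of each LCV-LSK space region (columns).

| Comparison | Group | G-LMD1 | G-LMD2 | G-LMD3 | G-LMD4 | Total (n) |
|---|---|---|---|---|---|---|
| PC1 vs. PC2 space | QR1 | 25 (16) | 19.61 (10) | 19.44 (7) | 0 (0) | 33 |
| | QR2 | 26.56 (17) | 21.57 (11) | 8.33 (3) | 65.22 (15) | 46 |
| | QR3 | 25 (16) | 29.41 (15) | 38.89 (14) | 34.78 (8) | 53 |
| | QR4 | 23.44 (15) | 29.41 (15) | 33.33 (12) | 0 (0) | 42 |
| Ward’s method | WMR1 | 9.38 (6) | 17.65 (9) | 0 (0) | 0 (0) | 15 |
| | WMR2 | 18.75 (12) | 9.8 (5) | 11.11 (4) | 78.26 (18) | 39 |
| | WMR3 | 21.88 (14) | 29.41 (15) | 47.22 (17) | 17.39 (4) | 50 |
| | WMR4 | 50 (32) | 43.14 (22) | 41.67 (15) | 4.35 (1) | 70 |
| K-means clustering | KMR1 | 9.38 (6) | 17.65 (9) | 0 (0) | 0 (0) | 15 |
| | KMR2 | 40.63 (26) | 37.25 (19) | 38.89 (14) | 0 (0) | 59 |
| | KMR3 | 18.75 (12) | 11.76 (6) | 11.11 (4) | 82.61 (19) | 41 |
| | KMR4 | 31.25 (20) | 33.33 (17) | 50 (18) | 17.39 (4) | 59 |
| Total | | 100 (64) | 100 (51) | 100 (36) | 100 (23) | 174 |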
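In Table 4, each entry gives the share and number of the sites of one LCV-LSK region (column) that fall in a given PCA, Ward’s, or K-means group (row), so the percentages in each column sum to 100. That cross-tabulation can be sketched as follows, assuming two parallel lists holding each site’s group labels; the function name and labels are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def agreement_table(
    lmom_groups: list[str], mv_groups: list[str]
) -> dict[str, dict[str, tuple[float, int]]]:
    """For each L-moment region, the % and count of its sites in each multivariate group.

    Percentages are taken within each L-moment region, so they sum to 100
    down each column of a table laid out like Table 4.
    """
    pair_counts = Counter(zip(lmom_groups, mv_groups))
    region_sizes = Counter(lmom_groups)
    mv_labels = sorted(set(mv_groups))
    return {
        region: {
            mv: (round(100.0 * pair_counts[(region, mv)] / size, 2), pair_counts[(region, mv)])
            for mv in mv_labels
        }
        for region, size in sorted(region_sizes.items())
    }


# Hypothetical labels for six sites: L-moment region and K-means group of each site.
lm = ["G-LMD1", "G-LMD1", "G-LMD1", "G-LMD1", "G-LMD4", "G-LMD4"]
km = ["KMR1", "KMR1", "KMR2", "KMR1", "KMR2", "KMR2"]
for region, row in agreement_table(lm, km).items():
    print(region, row)
```

For instance, 19 of the 23 sites in G-LMD4 fall in KMR3, which gives the 82.61% reported in Table 4.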
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ahmed, A.; Rahman, A.; Rafi, R.S.M.H.; Khan, Z.; Mannan, H. Statistical and Physical Significance of Homogeneous Regions in Regional Flood Frequency Analysis. Water 2025, 17, 1799. https://doi.org/10.3390/w17121799

AMA Style

Ahmed A, Rahman A, Rafi RSMH, Khan Z, Mannan H. Statistical and Physical Significance of Homogeneous Regions in Regional Flood Frequency Analysis. Water. 2025; 17(12):1799. https://doi.org/10.3390/w17121799

Chicago/Turabian Style

Ahmed, Ali, Ataur Rahman, Ridwan S. M. H. Rafi, Zaved Khan, and Haider Mannan. 2025. "Statistical and Physical Significance of Homogeneous Regions in Regional Flood Frequency Analysis" Water 17, no. 12: 1799. https://doi.org/10.3390/w17121799

APA Style

Ahmed, A., Rahman, A., Rafi, R. S. M. H., Khan, Z., & Mannan, H. (2025). Statistical and Physical Significance of Homogeneous Regions in Regional Flood Frequency Analysis. Water, 17(12), 1799. https://doi.org/10.3390/w17121799
