Impacts of the Degree of Heterogeneity on Design Flood Estimates: Region of Influence vs. Fixed Region Approaches

Ahmed, Ali; Morshed, Mohammad A.; Mim, Sadia T.; Rafi, Ridwan S. M. H.; Khan, Zaved; Maity, Rajib; Rahman, Ataur

doi:10.3390/w17182765


by Ali Ahmed ¹, Mohammad A. Morshed ¹, Sadia T. Mim ², Ridwan S. M. H. Rafi ³, Zaved Khan ⁴, Rajib Maity ⁵ and Ataur Rahman ¹,*

¹ School of Engineering, Design and Built Environment, Building XB, Western Sydney University, Kingswood, Penrith 2747, Australia

² Department of Civil Engineering, Ahsanullah University of Science and Technology, Dhaka 1208, Bangladesh

³ Department of Electrical and Computer Engineering, North South University, Dhaka 1229, Bangladesh

⁴ CSIRO Environment, Canberra 2601, Australia

⁵ Department of Civil Engineering, Indian Institute of Technology, Kharagpur 721302, West Bengal, India

* Author to whom correspondence should be addressed.

Water 2025, 17(18), 2765; https://doi.org/10.3390/w17182765

Submission received: 5 July 2025 / Revised: 1 September 2025 / Accepted: 11 September 2025 / Published: 18 September 2025

(This article belongs to the Special Issue Urban Flood Mitigation and Sustainable Stormwater Management—2nd Edition)


Abstract

In regional flood frequency analysis (RFFA), the formation of homogeneous regions is commonly regarded as a necessary condition for reliable regional flood estimation. However, achieving true homogeneity is often challenging in practice. This study investigates the formation of homogeneous regions by applying two region delineation approaches—fixed regions and the region-of-influence (ROI) method—accompanied by the widely used heterogeneity measure (H₁) proposed by Hosking and Wallis. The analysis utilizes data from 201 stream gauging stations across southeast Australia, evaluating a total of 1211 candidate regions. The computed H₁-statistics range from 13 to 30 for fixed regions and from 6 to 30 for ROI-based regions, indicating a consistently high level of heterogeneity across the study area. This suggests that the assumption of homogeneity may not be realistic for many parts of southeast Australia. Moreover, regression equations developed for regional flood estimation yield absolute median relative errors between 29% and 56%, with a median of 39% across return periods from 2 to 100 years. These findings underscore the limitations of relying solely on homogeneity in regional flood modelling and highlight the need for more flexible and robust approaches in RFFA. The outcomes of this research have significant implications for improving flood estimation practices and are expected to contribute to future enhancements of the Australian Rainfall and Runoff (ARR) national guidelines.

Keywords:

floods; regional flood frequency; homogeneity; regression; Australian Rainfall and Runoff

1. Introduction

Floods are among the worst natural disasters globally. A probabilistic approach is used when designing flood-safe hydraulic structures, where a streamflow is linked with an annual exceedance probability (AEP) or return period (T) to find a design flow, such as the 100-year flood. At a location of interest where good quality streamflow data are available, at-site flood frequency analysis (FFA) is widely adopted; however, for sites with little or no streamflow data, regional flood frequency analysis (RFFA) is adopted to estimate design floods [1,2]. In RFFA, data from similar sites are pooled to develop prediction models [3].

A group of catchments exhibiting similar hydrological responses may form a homogeneous region [4]. Among the RFFA techniques, the index flood method is widely used and relies heavily on the concept of homogeneous regions [5,6]. As per the index flood method, a group of stations forms a homogeneous region if their standardized flood frequency curves are similar to each other within a certain margin of sampling variability [5]. In contrast, the quantile regression technique (QRT), another type of RFFA method, relaxes the homogeneity assumption, i.e., it does not need a perfectly homogeneous region [7]. There is, however, a notion in RFFA that even in QRT the prediction error will be smaller if the regions are homogeneous [4]; this has never been tested thoroughly, e.g., in Australia.

Traditionally, a fixed-region approach is applied to identify regions in RFFA for simplicity [2]. A statistical criterion, such as the regional homogeneity test by Hosking and Wallis [5], is generally applied to evaluate the level of homogeneity of an assumed region. Numerous studies have explored the formation of homogeneous regions in RFFA [8,9,10].

In contrast to regions formed on geographical space, regions can also be formed in the hydrological and basin attributes space. Cluster analysis and principal component analysis (PCA) are generally adopted to form these regions in a catchment attributes space [11,12].

The fixed-region approach often lacks homogeneity. To overcome this limitation, the region-of-influence (ROI) approach was proposed, where a local region is formed around each of the selected stations in the geographic or catchment attributes space [3,13]. The ROI approach has been applied successfully in many RFFA studies [14] and generally outperforms the fixed-region approach [15,16,17,18].

Spatial proximity-based ROI generally outperforms the purely fixed-region approach where all the stations in a country/state are placed in a fixed region [19,20]. In the Gulf-Atlantic Rolling Plains of the United States, a 50-year flood quantile was regionalised based on the principle of geographical proximity. Here, the ROI outperformed the fixed-region approach for both geographic and catchment attributes space [19]. Similarly, using 204 stations in Arkansas, the ROI approach was found to outperform five other RFFA techniques [21]. In Québec and Ontario, the Canadian Statistical Hydrology Research Group compared four different RFFA frameworks and five distinct RFFA methods. Using the regional hydrological attributes as inputs to identify homogeneous regions and to enhance the efficiency of analysis, a novel regionalisation process called the Automatic Region Revision Algorithm (ARRA) was formulated, which was complementary to both the L-moments-based and the ROI approaches [5].

A few studies have noted that successful identification of homogeneous regions is dependent on the hydrological features of the sites [22].

Although there have been many studies in RFFA, no single method is available that can be confidently applied across all countries to identify homogeneous regions. There has been no success in identifying acceptable homogeneous regions in Australia using traditional homogeneity tests. Moreover, only a limited number of studies have examined how the degree of heterogeneity affects model accuracy in RFFA in a QRT framework. Our research intends to fill this knowledge gap by assessing the regional heterogeneity of catchments in southeast Australia using both the fixed-region and ROI approaches and quantifying the impacts of the level of regional homogeneity on the design flood estimates in a QRT framework.

2. Data and Methodology

2.1. Selection of Study Area

Southeast Australia (Victoria (VIC) and New South Wales (NSW)) is selected as the study area, as the streamflow gauging network is dense in this part of Australia. A total of 201 stations are selected from these two Australian states (Figure 1). The selected catchments are predominantly natural, have no significant storage or land use change, and are small to medium in size. This part of Australia is significant, as most of the people in Australia live here; hence, significant capital investment is made. For most of these investments, flood risk assessment is needed, as the investments are situated in flood plains. The mean annual rainfall in the study area ranges from 470 mm to 1950 mm, while the mean annual potential evapotranspiration varies from 900 mm to 1550 mm. The Great Dividing Range (GDR) separates coastal areas from the inland plain areas. The rivers located to the east or southeast of the GDR flow to the adjacent sea, while the rivers on the inland side flow to the Murray-Darling Basin.

Table S1 (in Supplementary Section) provides important physical features and statistical indices of the selected 201 catchments, such as the discordancy measure (D_i) (any D_i > 3 indicates a discordant site as per Hosking and Wallis [5] criteria). Figure S1 presents cluster analysis (by Ward’s method) results where three catchment groups (Group 1, Group 2 and Group 3) are formed (sites are labelled by site index, column 1 of Table S1). Figure S2 exhibits the hydrological perspectives of these three clustering groups, where it is found that Group 1 is dominated by smaller LCV values of the annual maximum flood (AMF) data, Group 3 is dominated by higher LCV values, and Group 2 is characterized by moderate LCV values.

The selection of stations and catchment characteristics in RFFA generally relies on some basic recommendations. The selected catchment characteristics should have (i) a direct impact on flood response (in flood generation mechanism), (ii) easy accessibility, (iii) proper definition, (iv) simple physical explanation, (v) no major correlation among them (i.e., avoiding highly correlated characteristics), and (vi) better prediction performance.

Based on the relevance to previous studies in Australia, eight climatic and catchment characteristics are selected for this study (shown in Table 1). The data for these characteristics were obtained from the Australian Bureau of Meteorology (BoM) website and Australian Rainfall and Runoff Revision Project 5 Regional flood methods. Table 1 summarises the basic statistics of the catchment characteristics data of the selected 201 stations. The AREA values range from 3.00 to 1010.00 km², with a mean of 333.99 km² (standard deviation (SD) = 262.40 km²). Most stations (n = 138, ~69%) have an AREA between 3 and 400 km², and only one station has an AREA of 1010 km². The MAR values range from 484.39 to 1953.23 mm, with a mean of 962.26 mm and a median of 891.64 mm (SD = 314.47 mm). I₆₂ and MAE are extracted from the BoM website. The highest and the lowest values for the predictor I₆₂ are 87.30 and 24.60 mm/h (SD = 10.07 mm/h), respectively. In the case of MAE, these figures are 1543.30 and 925.90 mm (SD = 129.31 mm), respectively. The record length of the AMF data ranges from 25 to 89 years (mean = 45 years and SD = 9.63 years). Out of 201 stations, roughly 68% (n = 137) have a record length of 40 to 50 years.

2.2. Formation of Regions

A single fixed region (A) is formed by pooling all 201 selected stations. Thereafter, considering the administrative boundaries, i.e., the states (NSW and VIC), the selected stations are divided into two fixed regions (B1 and B2). Alternatively, based on the drainage division (DD), the stations are grouped into two regions: stations with station identification numbers (SINs) starting with ‘2’ (stations falling in DD 2) are placed in one region (C1), and stations with SINs starting with ‘4’ are placed in another region (C2). Lastly, the ROI approach is used to form regions based on the K-nearest neighbour (KNN) principle, with 10, 20, 50, 100, 150, and 200 nearby stations being selected around each of the 201 stations. A summary of the assumed regions with their notation and number of stations in each region can be found in Table 2.
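The KNN-based region formation can be illustrated with a minimal sketch. It assumes Euclidean distance between station coordinates in a projected grid and excludes the target station from its own neighbour list; the study does not state either detail, and the station names and coordinates below are hypothetical.

```python
import math

def knn_region(stations, target_id, k):
    """Return the ids of the k stations nearest to target_id (target excluded)."""
    tx, ty = stations[target_id]
    # Pair each candidate with its distance to the target, then sort by distance.
    candidates = [
        (math.hypot(x - tx, y - ty), sid)
        for sid, (x, y) in stations.items()
        if sid != target_id
    ]
    candidates.sort()
    return [sid for _, sid in candidates[:k]]

# Hypothetical station coordinates (km), not real gauging stations.
stations = {
    "S1": (0.0, 0.0),
    "S2": (3.0, 4.0),
    "S3": (1.0, 1.0),
    "S4": (10.0, 0.0),
    "S5": (0.0, 2.0),
}
region = knn_region(stations, "S1", 3)
print(region)
```

Repeating the call for every station and every k in (10, 20, 50, 100, 150, 200) would give one candidate region per station and neighbourhood size.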

2.3. Evaluation of Homogeneity of the Selected Regions

In RFFA, various types of heterogeneity tests have been introduced; however, the most widely used one is the L-moments-based test by Hosking and Wallis [5], which is adopted in this study. For a homogeneous region, population L-moments will be the same for all the sites in a proposed region; however, their sample L-moments will be different [5]. To assess the degree of sampling variability, Hosking and Wallis [5] used simulation based on a four-parameter Kappa distribution. Based on the results of this simulation, three statistics were proposed to evaluate regional homogeneity. These are (i) the discordancy measure (D_i) for detecting discordant stations, (ii) the heterogeneity measure (H_i) for finding the degree of homogeneity in a region, and (iii) the goodness-of-fit measure (|Z|-statistics) for selecting the best fit distribution(s) for the assumed region. Based on the dimensionless L-moment coefficients (LCV, LSK, and LKUR), the H_i -statistics (i = 1 to 3) are generated [5].

The discordancy measure (D_i) can be expressed as

D_{i} = \frac{1}{3} (u_{i} - \bar{u})^{T} S^{-1} (u_{i} - \bar{u})

(1)

where u_i is the vector of LCV, LSK, and LKUR for station i, S is the covariance matrix of the u_i, and ū is the mean of the vectors u_i.

If D_i ≥ 3.00, a station is said to be discordant.
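Equation (1) can be sketched in a few lines of plain Python. The sketch assumes that S is the covariance matrix computed with divisor N (the number of stations), which makes the D_i values average to one; the L-moment ratios below are hypothetical, not values from Table S1.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def discordancy(u):
    """Return D_i for each row of u, where a row is (LCV, LSK, LKUR) for one station."""
    n, dim = len(u), len(u[0])
    ubar = [sum(row[j] for row in u) / n for j in range(dim)]
    dev = [[row[j] - ubar[j] for j in range(dim)] for row in u]
    # Assumption: covariance matrix S uses divisor n.
    s = [[sum(d[j] * d[k] for d in dev) / n for k in range(dim)] for j in range(dim)]
    # (u_i - ubar)^T S^{-1} (u_i - ubar) / 3, using a linear solve instead of an inverse.
    return [sum(di * xi for di, xi in zip(d, solve(s, d))) / 3.0 for d in dev]

# Hypothetical L-moment ratios for six stations.
u = [
    (0.30, 0.25, 0.18),
    (0.35, 0.30, 0.20),
    (0.28, 0.20, 0.15),
    (0.40, 0.35, 0.28),
    (0.33, 0.22, 0.21),
    (0.60, 0.10, 0.05),
]
d = discordancy(u)
discordant = [i for i, di in enumerate(d) if di >= 3.0]
```

With this definition no D_i can exceed (N − 1)/3, so the threshold of 3.00 is only meaningful for regions with more than ten stations.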

Assuming that station i has record length n_i and sample L-moment ratios t^{(i)}, t_3^{(i)}, and t_4^{(i)}, the regional average LCV, LSK, and LKUR are weighted by the stations’ record lengths and are denoted by t^R, t_3^R, and t_4^R, respectively. Mathematically, for a region of N stations,

t^{R} = \frac{\sum_{i=1}^{N} n_{i} t^{(i)}}{\sum_{i=1}^{N} n_{i}}

(2)

and the weighted standard deviation can be estimated by

V = \sqrt{\frac{\sum_{i=1}^{N} n_{i} (t^{(i)} - t^{R})^{2}}{\sum_{i=1}^{N} n_{i}}}

(3)

Finally, a heterogeneity measure, the H-statistic, can be estimated as

H = \frac{(V - µ_{v})}{σ_{v}}

(4)

Here, µ_v and σ_v represent the mean and standard deviation of the simulated values of V, respectively, and V is the weighted standard deviation computed from the observed data using Equation (3). The following rules are used to evaluate the homogeneity:

H < 1: a proposed region is regarded as acceptably homogeneous;
1 ≤ H < 2: a proposed region is possibly heterogeneous;
H ≥ 2: a proposed region is regarded as definitely heterogeneous.
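Equations (2) to (4) and the three rules above can be sketched as follows. The Kappa-distribution simulation is not implemented here: the simulated V values are passed in as a list, and the numbers used are hypothetical. Using the sample standard deviation for σ_v is also an assumption.

```python
import math
import statistics

def weighted_dispersion(lcv, n_years):
    """Return (t_R, V): record-length-weighted mean and standard deviation of at-site LCV."""
    total = sum(n_years)
    t_r = sum(n * t for n, t in zip(n_years, lcv)) / total
    v = math.sqrt(sum(n * (t - t_r) ** 2 for n, t in zip(n_years, lcv)) / total)
    return t_r, v

def h_statistic(v_obs, v_sim):
    """Standardise the observed V by the mean and SD of simulated V values."""
    mu_v = statistics.mean(v_sim)
    sigma_v = statistics.stdev(v_sim)  # assumption: sample SD
    return (v_obs - mu_v) / sigma_v

def classify(h):
    """Apply the three homogeneity rules to an H value."""
    if h < 1:
        return "acceptably homogeneous"
    if h < 2:
        return "possibly heterogeneous"
    return "definitely heterogeneous"

# Hypothetical two-station region and hypothetical simulated V values.
t_r, v_obs = weighted_dispersion([0.2, 0.4], [30, 30])
h = h_statistic(v_obs, [0.02, 0.04, 0.06])
label = classify(h)
```

In practice the list of simulated V values would come from a large number of simulated homogeneous regions with the same record lengths as the observed one.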

The goodness-of-fit measure for a candidate distribution, Z^{Dist}, is defined as

Z^{Dist} = \frac{1}{σ_{4}} [τ_{4}^{Dist} - \bar{t}_{4} + β_{4}]

(5)

where σ_4 = standard deviation of L-kurtosis values obtained from simulation, τ_4^{Dist} = average L-kurtosis value computed from simulation for the fitted Kappa distribution, \bar{t}_4 = average L-kurtosis value computed from the data of a given region, and β_4 = bias of τ_4.

A distribution is regarded as the best fit for the assumed region if the |Z|-statistic value is ≤1.64 [5].
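Equation (5) and the 1.64 threshold amount to a one-line calculation, sketched below. All input values are hypothetical, and the simulation that would produce σ_4 and β_4 is outside the scope of the sketch.

```python
def z_statistic(tau4_dist, t4_bar, beta4, sigma4):
    """Goodness-of-fit measure of Equation (5)."""
    return (tau4_dist - t4_bar + beta4) / sigma4

def acceptable_fit(z, threshold=1.64):
    """True when |Z| does not exceed the threshold used in the text."""
    return abs(z) <= threshold

# Hypothetical values for one candidate distribution.
z = z_statistic(tau4_dist=0.15, t4_bar=0.17, beta4=0.005, sigma4=0.01)
ok = acceptable_fit(z)
```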

2.4. Regional Estimation Model Development

The quantile regression technique (QRT) is applied for developing a prediction equation for a given region and for a given return period [23]. The QRT is defined by

Q_T = aB^bC^cD^d …

(6)

where Q_T is a flood discharge with 1 in T AEP, B, C, D, … are predictors, and a, b, c, … are the regression coefficients, which are estimated by the method of ordinary least squares (OLS).

Using the logarithmic (base 10) transformation and applying the selected predictors (see Table 1), the QRT can be expressed as

\log_{10}(Q_{T}) = b_{0} + b_{1} \times \log_{10}(AREA) + b_{2} \times \log_{10}(I_{62}) + b_{3} \times \log_{10}(MAR) + b_{4} \times \log_{10}(SF) + b_{5} \times \log_{10}(MAE) + b_{6} \times \log_{10}(SDEN) + b_{7} \times \log_{10}(S1085) + b_{8} \times \log_{10}(FOREST)

(7)

where Q_T is the flood quantile for the T-year return period. These Q_T values were calculated by fitting a log-Pearson Type 3 (LP3) distribution to the AMF data at each of the 201 selected stations. FLIKE software (TUFLOW Flike 5.0.220.0) was used to fit the LP3 distribution to the AMF data [24]. The LP3 distribution is used because previous Australian studies found it to be a suitable distribution to describe AMF data for most Australian streamflow gauging stations.
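The log-linear form of Equation (7) can be fitted by OLS through the normal equations, as in the minimal sketch below. It uses only two of the eight predictors (AREA and I₆₂) and synthetic flows generated from a known power law, so the data and the helper names are illustrative and are not the study's R workflow.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_log_ols(q, predictors):
    """OLS fit of log10(q) on log10 of each predictor column; returns [b0, b1, ...]."""
    y = [math.log10(v) for v in q]
    x = [[1.0] + [math.log10(v) for v in row] for row in predictors]
    p = len(x[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[j] * r[k] for r in x) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(x, y)) for j in range(p)]
    return solve(xtx, xty)

# Hypothetical catchments: (AREA in km^2, I62 in mm/h); flows follow an exact power law.
predictors = [(10, 30), (50, 45), (120, 25), (400, 60), (800, 40)]
q = [0.5 * area ** 0.7 * i62 ** 1.2 for area, i62 in predictors]
b0, b1, b2 = fit_log_ols(q, predictors)
```

Back-transforming gives the multiplicative form of Equation (6), with a = 10^{b_0} and the fitted slopes as exponents.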

2.5. Evaluation Criteria

To validate the developed prediction equation, six statistical indices were adopted: (i) relative error (RE), (ii) mean square error (MSE), (iii) bias (BIAS), (iv) relative bias (RBIAS), (v) relative root mean square error (RRMSE), and (vi) root mean square normalized error (RMSNE).

RE = \frac{Q_{pred} - Q_{obs}}{Q_{obs}} \times 100

(8)

MSE = mean[(Q_{pred} - Q_{obs})^{2}]

(9)

BIAS = mean(Q_{pred} - Q_{obs})

(10)

RBIAS = [mean(\frac{Q_{pred} - Q_{obs}}{Q_{obs}})] \times 100

(11)

RRMSE = \frac{\sqrt{mean[(Q_{pred} - Q_{obs})^{2}]}}{mean[Q_{obs}]}

(12)

RMSNE = \sqrt{mean[(\frac{Q_{pred} - Q_{obs}}{Q_{obs}})^{2}]}

(13)

Here, Q_pred is obtained by the RFFA model derived in this study and Q_obs is estimated by FFA, as noted before. A leave-one-out (LOO) validation method is adopted, where each of the selected stations is tested by applying a prediction equation derived without the test station. The absolute median relative error (AMRE) is calculated as the median of the absolute RE values of all the selected stations to provide an overall model error estimate. R was used for carrying out the regression analysis.
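The LOO procedure and the indices in Equations (8) to (13) can be sketched as below. For brevity the sketch regresses log₁₀(Q) on log₁₀(AREA) only, which is a simplification of Equation (7), and the station data are hypothetical.

```python
import math
import statistics

def fit_power_law(area, q):
    """Least-squares line log10(q) = b0 + b1 * log10(area); returns (b0, b1)."""
    x = [math.log10(a) for a in area]
    y = [math.log10(v) for v in q]
    xm, ym = statistics.mean(x), statistics.mean(y)
    b1 = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sum((xi - xm) ** 2 for xi in x)
    return ym - b1 * xm, b1

def loo_predictions(area, q):
    """Predict each station from a model fitted without that station."""
    preds = []
    for i in range(len(q)):
        b0, b1 = fit_power_law(area[:i] + area[i + 1:], q[:i] + q[i + 1:])
        preds.append(10 ** (b0 + b1 * math.log10(area[i])))
    return preds

def metrics(q_pred, q_obs):
    """Evaluation indices of Equations (8)-(13), plus AMRE (median of |RE|)."""
    err = [p - o for p, o in zip(q_pred, q_obs)]
    rel = [e / o for e, o in zip(err, q_obs)]
    mse = statistics.mean([e * e for e in err])
    return {
        "AMRE": statistics.median([abs(r) * 100 for r in rel]),
        "MSE": mse,
        "BIAS": statistics.mean(err),
        "RBIAS": statistics.mean(rel) * 100,
        "RRMSE": math.sqrt(mse) / statistics.mean(q_obs),
        "RMSNE": math.sqrt(statistics.mean([r * r for r in rel])),
    }

# Hypothetical stations: catchment area (km^2) and at-site quantile (m^3/s).
area = [10.0, 50.0, 120.0, 400.0, 800.0]
q_obs = [12.0, 40.0, 70.0, 210.0, 300.0]
q_pred = loo_predictions(area, q_obs)
scores = metrics(q_pred, q_obs)
```

Each station is predicted from a model that never saw it, so the resulting indices describe performance at an ungauged site rather than goodness of fit.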

3. Results

3.1. Exploratory Analysis

To understand hydrological perspectives of the catchment groups, three clustering groups (Figure S1) are formed based on catchment characteristics data; these groups are then linked to the LCV values of the AMF data of the selected stations in Figure S2. This figure clearly exhibits the differences in the LCV values of the three clustering groups. For example, Group 1 has much smaller LCV values than Group 3.

The predictor variables and L-moments for the fixed regions are represented by boxplots in Figure 2. It can be seen that fixed-region A is characterized by the second-highest median AREA, I₆₂, MAR, MAE, and FOREST, and smaller SDEN values, which implies a region with comparatively larger, wetter, more forested, and steeper stations. In contrast, fixed-region B1 has the highest median I₆₂, MAR, MAE, and SDEN and almost similar S1085. It also has the second-lowest FOREST values. These characteristics suggest a group of wet, steeper, and relatively less-forested stations. The median L-moment (LCV, LSK, and LKUR) values are higher in this region. Region B2 has the smallest median I₆₂, MAE, and SDEN and the second-lowest FOREST values, which points to a group of drier and steeper stations with larger forest cover; thus, they have the lowest median L-moment values. Region C1 has the smallest median AREA and the highest median MAR and FOREST values, which implies a group of smaller, wetter, and more forested stations. In contrast, region C2 is dominated by the highest median AREA and the lowest median MAR and FOREST values (i.e., low forestation, dry and bigger catchments).

The correlations among the eight predictor variables used in developing regression equations need to be examined. The Variance Inflation Factor (VIF) is used to measure the impact of the degree of multicollinearity on the regression results. Typically, VIF values above five indicate the presence of significant multicollinearity. In the present study, all the VIF values are below four, suggesting a low level of multicollinearity, which is unlikely to affect the stability of our regression coefficients.
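For reference, the VIF thresholds quoted above can be related to the usual definition of the statistic, which the text does not state explicitly. Here R_j² is assumed to be the coefficient of determination obtained when predictor j is regressed on the remaining seven predictors.

```latex
\mathrm{VIF}_{j} = \frac{1}{1 - R_{j}^{2}}, \qquad
\mathrm{VIF}_{j} = 5 \iff R_{j}^{2} = 0.80, \qquad
\mathrm{VIF}_{j} = 4 \iff R_{j}^{2} = 0.75
```

Under this definition, VIF values below four mean that no predictor has more than 75% of its variance explained by the other predictors.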

3.2. Discordancy, Heterogeneity, and Z-Statistics of the Assumed Regions

Table 2 shows the summary statistics of the assumed regions with their D_i and Z-values. Here, the reported D_i values are 3.00 or more. The lowest D_i values fall in the range 3.05 (B1 and C2) to 3.34 (C1) for the assumed fixed regions. Adopting the ROI approach, these values range from 3.07 (KNN50) to 3.62 (KNN150). In contrast, for the assumed fixed regions, the highest D_i values range from 5.04 (B2) to 7.18 (C2), and for the ROI approach, this range is from 5.40 (KNN100) to 7.70 (KNN150). Interestingly, there are no discordant stations for either KNN10 or KNN20. As per the Hosking and Wallis [5] criteria, the Z-values clearly indicate that Pearson Type 3 (PE3) and Generalized Pareto (GPA) are the most suitable regional distributions in the study area.

3.3. Discordancy and Comparison of Fixed Region and ROI Approaches in Identification of Homogeneous Regions

Figure 3 exhibits the distribution of calculated H_i (median values) for the considered fixed regions and regions based on the ROI approach. The H₁-statistics range from 13.8 to 30.7 for the fixed regions and from 3.8 to 30.6 for the ROI-based regions. This clearly shows that the assumed groups are highly heterogeneous.

Adopting the fixed-region approach, the calculated H_i-values (i = 1, 2, and 3) are the highest in region A (n = 201), varying from 11.5 to 30.7, whereas the lowest H_i-values are observed in region B1 (n = 88), ranging from 6.6 to 13.8. With the exception of the H₃-values in region C1, the second-, third-, and fourth-highest H_i-values are found in regions B2 (n = 113), C1 (n = 106), and C2 (n = 95), respectively; these range from 8.4 to 26.2, 9.3 to 22.9, and 7.1 to 20.2. This indicates that the H_i-statistics gradually decrease with the decreasing number of stations. The H_i-statistics from the ROI approach show the same pattern. The lowest H₁-value (3.8) is noted for KNN10, and it steeply increases for the higher-order KNNs, with the highest H₁-statistic (30.6) observed for KNN200. In this approach, the median H₂-value ranges from 2.8 (for KNN10) to 20.7 (for KNN200), and the H₃-value varies from 2.1 (for KNN10) to 11.7 (for KNN200). The relationship H₁ > H₂ > H₃ is observed among the assumed regions. It is evident from the graph that the H_i-statistics sharply increase with the number of stations in a region for both the fixed-region and ROI approaches.

Despite these high H_i-statistics, the regions formed around 29 stations are identified as homogeneous (H₁ < 1.00), and only in the KNN10 approach (Figure 4, red colour). No other approach identifies any homogeneous region. Table S2 (Supplementary Section) shows a detailed summary of these homogeneous regions, including the target and member stations’ ID numbers and respective H₁-values. Figure 5 shows an in-depth summary of the H₁-values obtained using the ROI approach. More specifically, the spatial distribution in Figure 6 shows the magnitude of the H₁-statistic for KNN20 for Q₂₀. The spatial distribution of H₁-statistics to estimate Q₂₀, adopting the ROI approach, can be found in the Supplementary Section (Figure S3 for KNN50 and Figure S4 for KNN100).

A comparison between the H₁-statistics and absolute RE values for the KNN50, KNN100, and KNN150 approaches to estimate Q₂, Q₅, Q₁₀, Q₅₀, and Q₁₀₀ can be found in Figures S5–S9 (see Supplementary Section), respectively.

To understand the temporal changes in homogeneity for different parts of the study area, each station’s AMF data are divided into three parts: (i) full length, (ii) earlier part (till 1990), and (iii) most recent part (1991–2018). Figure 6 and Figures S10 and S11 present the spatial distributions of H₁-statistics for KNN20 for these three subsets of AMF data. For earlier AMF data (till 1990), southwest Victoria is dominated by a higher spatial homogeneity (marked by light blue colour) (Figure S10). In contrast, for the most recent AMF data (1991–2018), southwest Victoria is dominated by a higher level of heterogeneity (indicated by light-red to red colour) (Figure S11). However, for the full AMF data length, this part of Victoria is dominated by a moderate level of heterogeneity (indicated by the light-yellow colour) (Figure 6). This indicates that the more recent AMF data have higher variability (i.e., higher LCV), resulting in a higher degree of heterogeneity. Another distinct pattern due to temporal variation in AMF data is visible in northeastern NSW, where earlier data (Figure S10) indicate a moderate spatial heterogeneity (light-yellow colour), while the most recent data (Figure S11) and the full AMF data (Figure 6) show a moderate level of spatial homogeneity. This indicates a shift towards a greater level of spatial homogeneity in northeast NSW over time. It is interesting to note that mid-south Victoria (adjacent to Melbourne) exhibits the highest level of spatial heterogeneity (represented by deep red colour) for all three AMF data subsets.

3.4. Comparison of Fixed-Region and ROI Approaches

3.4.1. Prediction Equation Development and Performance

Table 3 summarises the regression coefficients (based on Equation (7)) for Q₅₀ for the assumed regions. This table also shows the adjusted coefficient of determination (Adj-R²) values. Detailed descriptive statistics of the regression coefficients for all the quantiles are presented in the Supplementary Section (Table S3). Region KNN10 is not considered for further analysis as the number of stations in this region is < 20 (n = 15).

3.4.2. Assessment of the Impact of Homogeneity on Prediction Accuracy

Figure 7 assesses the effects of the heterogeneity level on the prediction accuracy of the developed regional prediction equations (H₁- vs. AMRE values). For the assumed fixed region A (n = 201), which has the highest H₁-value (30.66), the AMRE values range from 36.98% (Q₁₀) to 45.34% (Q₁₀₀). Interestingly, using the ROI approach, the region KNN200 (n = 200) has the same AMRE values, with an H₁-statistic of 30.57. Region KNN20, which has the lowest H₁-value (6.71), shows moderately high AMRE values, with a minimum of 39.79% (Q₁₀) and a maximum of 52.73% (Q₁₀₀). Region C1 (n = 106), which has a higher H₁-statistic (22.94), also exhibits higher AMRE values, ranging from 38.58% (Q₂) to 56.66% (Q₁₀₀). Thus, the regions with the lowest (KNN20) and fifth-highest (C1) H₁-values both have AMRE values above 50.00% for Q₅₀ and Q₁₀₀. Similar to KNN200, the regions B1 (n = 88), B2 (n = 113), C2 (n = 95), KNN50, KNN100, and KNN150 have AMRE values below 49.00%, with H₁-statistics of 13.83, 26.16, 20.15, 13.21, 22.01, and 27.11, respectively. The lowest AMRE values are observed in region B2 (VIC state); these values range from 32.74% (Q₂) to 40.51% (Q₁₀₀), despite a relatively high H₁-value (26.16). In addition, compared to these values, regions C2 (H₁-value 20.15), KNN50 (H₁-value 13.21), and KNN100 (H₁-value 22.01) show lower AMRE values up to Q₂₀. Figure 7 clearly indicates a poor relationship between the heterogeneity level and the prediction accuracy of the quantile estimates from either the fixed-region or ROI approaches.

Table 4 shows the summary statistics of the evaluation criteria of the developed prediction equations for the assumed fixed regions and the regions based on the ROI approach. The R² values of the formed fixed regions fluctuate from 0.53 (region B2 for Q₁₀₀) to 0.80 (region C1 for Q₂). Applying the ROI approach, these R² values range from 0.58 (region KNN150 for Q₁₀₀) to 0.84 (region KNN20 for Q₂ and Q₅). It is obvious from this table that with the increase in the return period the R² values decreased (with the exception of the C2 region). Thus, there is an inverse relationship between the R² values and the return period (T), as expected.

In the fixed regions, the lowest AMRE value is 29.50% (Q₂ of region C2) and the highest is 56.66% (Q₁₀₀ of region C1), with the overall median 39.72% (SD 6.06%). Likewise, adopting the ROI approach, the AMRE values are 32.53% (Q₂ of region KNN50) and 52.73% (Q₁₀₀ of region KNN20), respectively, with an overall median of 40.00% (SD 4.63%). Here, the SD values indicate that the ROI approach provides relatively more consistent and precise flood quantile estimates than the fixed-region approach.

In the assumed fixed regions, the minimum MSE value (613) is observed in region B2 for Q₂ and the maximum (1,576,140) is in region B1 for Q₁₀₀. The overall median and SD values of the MSE for these regions are 40,902 and 367,830, respectively. The MSE values are higher for the regions formed by the ROI method: the highest and the lowest MSE values are 3,201,044 (Q₁₀₀ of region KNN20) and 2476 (Q₂ of region KNN150), respectively, and the overall median and SD values are 95,007 and 641,000, respectively. Among the assumed fixed regions, the BIAS ranges from −233.05 (Q₁₀₀ of region B1) to −3.79 (Q₂ of region B2), and in the ROI approach, it ranges from −169.16 (Q₁₀₀ of region KNN200) to −6.33 (Q₂ of region KNN50). Interestingly, the BIAS is positive only for region KNN20, where it ranges from 1.68 (Q₂) to 93.24 (Q₁₀₀). Within the assumed fixed regions, the overall median and SD values of BIAS are −34.28 and 58.94, respectively; for the ROI approach, these figures are −20.25 and 50.09, respectively. In terms of BIAS, this shows that the ROI approach outperforms the fixed-region approach in estimating flood quantiles in southeastern Australia. The estimated highest RBIAS values for the fixed-region and ROI approaches are 56.20 (Q₁₀₀) and 68.05 (Q₁₀₀) in the assumed regions B1 and KNN20, respectively. The lowest RBIAS values are 14.63 (Q₅) and 21.41 (Q₅) in the regions C2 and KNN200, respectively. In the assumed fixed regions, the overall median RBIAS is 27.33 (SD = 10.24), and for ROIs it is 31.22 (SD = 10.18). Adopting the fixed-region approach, the RRMSE values range from 0.10 to 0.21 (Q₁₀₀) in region A; for the ROI approach, they range from 0.03 (Q₂ and Q₅) in region KNN20 to 0.21 (Q₁₀₀) in region KNN200.

In the fixed-region approach, the overall median RRMSE value is 0.13 (SD = 0.03), and for the ROI approach the value is 0.09 (SD = 0.04). In the assumed fixed regions, the RMSNE values of the quantile estimates fall between 0.70 (region C2 for Q₅) and 2.25 (region B1 for Q₁₀₀). For the ROI approach, the RMSNE values vary from 0.93 (Q₂) in KNN200 to 2.85 (Q₁₀₀) in KNN20. With relatively higher H₁-statistics (26.16), region B2 (which has the highest variation in Adj-R², SD = 0.07) exhibits the least variation for the evaluation criteria AMRE, MSE, and BIAS. Similarly, KNN150, which has the second-highest H₁-statistic (27.11) and the highest variation in Adj-R² (SD = 0.05), exhibits the lowest variation for the model evaluation criteria AMRE, MSE, and RBIAS. This clearly indicates that the regions B2 and KNN150 provide more accurate quantile estimates compared to the other assumed regions.

The scatter plot in Figure 8 illustrates the relationship between the heterogeneity level (measured by the H₁-statistic) and AMRE values for the quantile estimates, with each dot representing an assumed region. The R² values of the fitted regression lines quantify the degree of association between the H₁-statistic and the AMRE values of quantile estimation. The R² values clearly indicate a weak relationship between the H₁- and AMRE values associated with the regression equations. The maximum and minimum R² values are 0.16 (p-value = 0.25) for Q₁₀₀ and 0.005 (p-value = 0.85) for Q₂₀, respectively. The trend/regression lines in Figure 8 (for all six return periods) are not statistically significant (p-values range from 0.25 to 0.84), which indicates that the impact of the degree of heterogeneity on absolute median relative error is statistically insignificant. The minor inverse relationship or likely plateau conditions indicate no relationship between the degree of heterogeneity and the prediction accuracy of the developed regression equations. Figures S12–S15 (see Supplementary Section) also show the categorical spatial distribution of absolute RE values from Q₂₀ quantile estimates for KNN50, KNN100, KNN150, and KNN200.
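As a rough consistency check, the p-value of a simple linear regression slope can be recovered from R² and the number of points alone. The sketch below assumes ten points per panel (the five fixed regions plus KNN20 to KNN200, with KNN10 excluded); this count is inferred from the text rather than stated, and the numerical integration of the Student-t density is an illustrative choice, not the authors' procedure.

```python
import math

def t_pdf(x, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3  # probability mass between 0 and |t|
    return 1.0 - 2.0 * area

def slope_p_value(r2, n):
    """p-value of the slope in a one-predictor regression with n points."""
    df = n - 2
    t = math.sqrt(r2 * df / (1.0 - r2))
    return two_sided_p(t, df)

# Assumption: n = 10 regions per return period.
p_q100 = slope_p_value(0.16, 10)
p_q20 = slope_p_value(0.005, 10)
```

Under the ten-region assumption these come out at roughly 0.25 and 0.85, in line with the p-values quoted for Q₁₀₀ and Q₂₀.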

Figure 9 presents the distribution of RE values by quantile estimates for all the assumed fixed regions. If the RE value of the prediction model is close to zero, then the model is called an unbiased model. For clearer visualization, the boundary limit is set between −300 and 300. Figure 9 indicates that the median RE value for most of the quantiles is generally close to the zero line for region B2. To some extent, the RE values for the regions A and C1 also touch the zero line. Despite some under- and over-estimation, the regions A, B2, and C1 exhibit relatively precise RE estimates. Similar findings are also noticed in Figure 7 for the regions B2 (AMRE value < 41% for Q₁₀₀) and A (AMRE value < 46% for Q₁₀₀); however, surprisingly, in region C1 the AMRE value is high (ranging from 38.58% for Q₂ to 56.66% for Q₁₀₀). The fixed-region approach gives better quantile estimates for Q₁₀, followed by Q₅. In Figure 10, the ROI (KNN 20, 50, 100, 150, and 200) approach results in some under- and over-estimation of design flood estimates for the quantiles Q₂, Q₅, Q₁₀, Q₂₀, Q₅₀, and Q₁₀₀. However, the unbiased line (red line) passes closer to the median RE values of the Q₂₀ quantile for all the KNNs than it does for the other quantiles. The Q₂₀ quantile also exhibits less variation in terms of RE values, which indicates that the ROI approach generates its best quantile estimates for Q₂₀. In the case of Q₁₀₀, higher variation is observed (RE values deviate far away from the median RE values) compared to other quantiles (Figure 10).

Based on Figure 9 and Figure 10, it can be said that the ROI approach outperforms the fixed-region approach by generating more precise flood quantile estimates for quantiles Q₂₀ and Q₅. Figure 11 shows a 3D visualization of the absolute RE and H₁-values for the KNN regions. It explores the associations between the heterogeneity level and accuracy of quantile prediction. Figure 12 shows the categorical spatial distribution of AMRE values for Q₂₀ for KNN20. Figure 13 (histogram) shows the comparison between H₁-values and absolute RE values associated with Q₂₀ for the KNN50, KNN100, and KNN150 regions. Similar plots can be found in Figures S5–S9 (Supplementary Section).

Figure 14 and Figure 15 explain the accuracy of quantile estimates through the scatter plots for the assumed regions (both the fixed-region and ROI approaches). The diagonal red line indicates the unbiased estimates. Log transformation (log₁₀) is used to normalize the quantile values for better visualization. In both the figures, the 1st, 2nd, and 3rd row represent the scatter plots of quantile estimates Q₂₀, Q₅₀, and Q₁₀₀, respectively, for different assumed regions. Figure 14 clearly indicates that the estimated quantile values are closer to the observed values in the C2 region. On the other hand, the observed and predicted values for the quantile are relatively closer to the unbiased line in regions KNN150 and KNN200 (Figure 15).

4. Discussion

The results of this study are compared with similar RFFA studies. Based on 305 stations in the USA, Basu and Srinivas [14] adopted a leave-one-out (LOO) validation technique to evaluate regional regression models under an ROI framework. The absolute RBIAS values estimated in that study ranged from 7.8 to 23.0 for Q₅₀ and 12.8 to 33.4 for Q₁₀₀. They also found RRMSE values in the range of 10.0% to 30.8% for Q₅₀ and 16.2% to 44.3% for Q₁₀₀ [14].

Using an ROI framework based on 204 stream gauging stations in Arkansas, a unique subregion was defined for each of the selected stations, giving an RMSE value of 38% [21]. Eng et al. [20] applied a hybrid ROI approach based on 1091 gauged stations in the southeastern USA for Q₅₀. This hybrid ROI approach outperformed both the predictor-variable and geographic-space-based ROI approaches. The RMSE values from split-sample validation were in the range of 47.7–55.6%, 58.4–60.9%, and 50.3–56.2% for the hybrid, predictor-variable, and geographic-space-based ROI approaches, respectively [20]. Likewise, a multiple regression leverage-guided ROI approach was applied to 996 stations in the southeastern USA to estimate Q₅₀. In the case of geographic-space-based ROI, the estimated RMSE values reduced from 226% to 59%. On the other hand, for predictor-based ROI, the estimated RMSE values reduced from 48% to 22% and the variance of the leverage values decreased from 0.12 to 0.08 [20].

Zrinji and Burn [13] conducted an RFFA study in mid-west Canada applying a hierarchical ROI approach. They found that the MSE values were in the range of 0.0654 to 0.0696, and the BIAS was in the range of −0.0109 to 0.0006 for different modelling options. Burn [17] conducted a study using 45 gauging stations in southern Manitoba, Canada to evaluate the ROI approach in terms of network average RMSE. The RMSE values ranged from 0.142 (Q₂₅) to 0.298 (Q₂₀₀). The estimated BIAS values ranged from 0.000 (Q₂₅) to 0.023 (Q₂₀₀). Durocher et al. [25] applied spatial copula on 151 gauging stations in southern Quebec, Canada. The estimated mean and median values of RMSE were 38 and 35, respectively, for Q₁₀. These values were 45 and 41 for Q₁₀₀, respectively [25]. Using the same dataset, several RFFA studies were conducted [26,27]. For these studies, the estimated RMSE values for Q₁₀ ranged from 37 to 66 and from 45 to 86 for Q₁₀₀. The estimated BIAS values ranged from −3% to −20% for Q₁₀ and −2% to −27% for Q₁₀₀.

In Australia, using a multiple linear regression-based RFFA approach with data from 88 stations in NSW, Rahman and Rahman [28] reported overall AMRE values ranging from 26.16% to 387.51%. Their H₁-statistics ranged from 1.93 to 7.59, indicating the regions were heterogeneous. They also noticed a weak association between the degree of regional heterogeneity and the prediction accuracy of the developed regression equations [28]. In contrast, adopting independent component analysis and using the same dataset, Rahman et al. [29] found few discordant stations (D_i values ranged from 3.78 to 6.05) and higher H_i-statistics (the H₁, H₂, and H₃ statistics being 13.44, 10.06, and 5.96, respectively). Interestingly, they found less variation in AMRE values, which ranged from 33.28% (for Q₂₀) to 43.92% (for Q₂) for the quantile regression technique (QRT). Using the parameter regression technique (PRT), AMRE values were found to be in the range of 47.84% (for Q₅) to 58.76% (for Q₂). The reported MSE values ranged from 110,166 to 1,800,000. They found that the RBIAS and RRMSE values ranged from 22.04% to 68.93% and 0.00 to 0.27, respectively. The RMSNE values ranged from 0.95 to 3.60 [29]. However, using a global database (excluding Australia), Rosbjerg et al. [30] showed that the RMSNE values in a regression analysis are in the range of 0.50–0.65.

In our study, the D_i values of the discordant stations (lowest to highest) ranged from 3.05 to 7.18 for the fixed regions and from 3.07 to 7.70 for the ROI-based regions. The H₁-statistics range from 13.83 to 30.66 for the fixed regions and from 6.71 to 30.57 for the ROIs. The overall median AMRE for quantile estimates is 40.00%. Though the H₁-statistics are higher (potentially due to the highly variable Australian hydrology), in terms of D_i and AMRE values, our study findings are similar to those of other Australian studies. The estimated overall median RBIAS among the assumed fixed regions in this study is 27.33; for ROIs it is 31.22. These figures for the RRMSE are 13% and 9%, respectively. These findings are comparable with the studies conducted in the USA, Canada, and Australia, as noted above.
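To put these H₁ values in context, the short sketch below applies the cut-offs commonly quoted from Hosking and Wallis (H < 1: acceptably homogeneous; 1 ≤ H < 2: possibly heterogeneous; H ≥ 2: definitely heterogeneous). Only the H₁ < 1.00 threshold is stated explicitly in this paper; the other two cut-offs and the function name `classify_h` are assumptions brought in for illustration.

```python
def classify_h(h_value):
    """Label a heterogeneity statistic using the commonly quoted Hosking-Wallis cut-offs."""
    if h_value < 1.0:
        return "acceptably homogeneous"
    if h_value < 2.0:
        return "possibly heterogeneous"
    return "definitely heterogeneous"

# H1 ranges reported in the text: fixed regions, ROIs, and Rahman and Rahman [28]
reported_h1 = {
    "fixed regions (min)": 13.83,
    "fixed regions (max)": 30.66,
    "ROI (min)": 6.71,
    "ROI (max)": 30.57,
    "Rahman and Rahman [28] (min)": 1.93,
    "Rahman and Rahman [28] (max)": 7.59,
}

for label, h1 in reported_h1.items():
    print(f"{label}: H1 = {h1:.2f} -> {classify_h(h1)}")
```

Under these cut-offs, every H₁ value reported for this study falls well inside the "definitely heterogeneous" class.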

To some extent, our AMRE, RBIAS, RMSE, and RRMSE values are more accurate than those of the few previous Australian studies, as noted above. Considering the estimated RMSNE values, the results of this study (0.70 to 2.25 for the fixed regions and 0.93 to 2.85 for the ROIs) appear to be more accurate compared to those from Rahman et al. [29] and less accurate than those of Rosbjerg et al. [30].

Our results show that southern and western Victoria (Figure 6) exhibit high levels of spatial heterogeneity, represented by shades ranging from light to deep red. In contrast, mid-northern Victoria demonstrates the highest degree of homogeneity, indicated by deep blue, with a similar though slightly reduced pattern observed in the southeastern corner of New South Wales. The underlying causes of these spatial variations remain poorly understood, and further investigation is required to clarify the hydrological processes driving this pattern.

It should be noted that the LCV and LCS of the AMF data are affected by data length; the length of data recording ranged from 25 to 89 years (a mean of 45 years). A shorter AMF data recording length, like 25 to 40 years (only around 14% of our stations have smaller recording lengths than 40 years), would have affected our results, e.g., H₁-values. However, the impacts of a short recording length on our results are not explicitly addressed, except in the sub-division of the AMF data into three subsets: all data, AMF till 1990, and AMF from 1991 to 2018. The impacts of this data splitting are discussed in Section 3.3.
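Because LCV and LCS are sample statistics, they can be recomputed for any sub-period of a record to see how sensitive they are to record length. The sketch below uses the standard unbiased probability-weighted-moment estimators of the first three sample L-moments; the function name and the short series are hypothetical and are not data from this study.

```python
def sample_l_moment_ratios(series):
    """Return (LCV, LCS) of a sample using unbiased probability weighted moments."""
    x = sorted(series)
    n = len(x)
    if n < 3:
        raise ValueError("at least three values are needed for L-skewness")
    # Probability weighted moments b0, b1, b2 (i is the 0-based rank)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    # First three sample L-moments
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l2 / l1, l3 / l2

# Hypothetical annual maximum flood series, for illustration only
amf_demo = [120.0, 85.0, 310.0, 64.0, 150.0, 98.0, 540.0, 73.0, 205.0, 110.0]

lcv_full, lcs_full = sample_l_moment_ratios(amf_demo)
lcv_short, lcs_short = sample_l_moment_ratios(amf_demo[:5])
print(f"all 10 values:  LCV={lcv_full:.3f}, LCS={lcs_full:.3f}")
print(f"first 5 values: LCV={lcv_short:.3f}, LCS={lcs_short:.3f}")
```

Comparing the ratios from the full series with those from a subset gives a quick, informal impression of the sampling variability that a short record introduces.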

5. Conclusions

This study investigated the delineation of homogeneous regions for regional flood frequency analysis (RFFA) in southeast Australia using both fixed-region and region-of-influence (ROI) approaches. The relationship between regional heterogeneity and the predictive performance of regression-based RFFA models was also evaluated. Regions were delineated based on three fixed-region configurations: a single region encompassing all 201 stations and divisions based on either administrative state boundaries or drainage divisions. Additionally, a K-nearest neighbour (KNN) ROI method was employed to form flexible, site-specific regions. The degree of regional homogeneity was assessed using the statistical measures proposed by Hosking and Wallis [5].
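The idea of a KNN-based region of influence can be sketched as below: for each target station, the K closest stations form its region. The distance measure (plain Euclidean distance on two coordinates), the inclusion of the target station in its own region, and the station identifiers are assumptions made for illustration; the study's own distance definition is given in its methodology section. The last lines show that one ROI per station for each of the six K values in Table 2, plus the five fixed regions, would give the 1211 candidate regions quoted in the abstract.

```python
import math

def knn_region(target, sites, k):
    """Return the IDs of the k stations nearest to `target` (the target itself included)."""
    tx, ty = sites[target]
    ranked = sorted(
        sites,
        key=lambda sid: math.hypot(sites[sid][0] - tx, sites[sid][1] - ty),
    )
    return ranked[:k]

# Hypothetical station coordinates, for illustration only
sites_demo = {"S1": (0.0, 0.0), "S2": (1.0, 0.0), "S3": (0.0, 2.0), "S4": (5.0, 5.0)}
region_s1 = knn_region("S1", sites_demo, 3)
print(region_s1)

# Candidate-region count under the stated assumption: one ROI per station and K value
k_values = [10, 20, 50, 100, 150, 200]  # KNN settings listed in Table 2
n_stations = 201
n_fixed_regions = 5                     # A, B1, B2, C1, C2
n_candidates = len(k_values) * n_stations + n_fixed_regions
print(n_candidates)
```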

Key findings from the study are as follows:

(i): The Pearson Type III (PE3) and Generalized Pareto (GPA) distributions are identified as the most suitable probability distributions for RFFA in southeast Australia.
(ii): The majority of the proposed regions exhibit high levels of heterogeneity. Although the KNN10 approach identifies 29 regions with acceptable homogeneity (H₁ < 1.00), these regions include fewer than 20 sites each and are therefore considered too small to be statistically robust.
(iii): The absolute median relative error (AMRE) values associated with the developed regression equations range from 29.50% for the 2-year return period (Q₂) to 56.66% for the 100-year return period (Q₁₀₀), with an overall median AMRE of 39.79%.
(iv): The level of regional heterogeneity appears to have minimal influence on the accuracy of flood quantile predictions using regression-based methods in this region.
(v): The ROI approach consistently outperforms the fixed-region approach in terms of predictive accuracy, particularly for the 20-year return period (Q₂₀).

These findings highlight the limitations of conventional homogeneity-based regionalisation in highly diverse catchments and underscore the potential of ROI methods to provide more reliable flood estimates in such settings. Future research should explore the influence of climate change on both regional heterogeneity and the performance of regression-based RFFA. Additionally, there is a need to develop alternative homogeneity testing frameworks tailored for heterogeneous regions, particularly those applicable to regression-based methodologies.

Based on the findings of this study, it can be concluded with confidence that perfectly homogeneous regions for RFFA cannot be defined in southeast Australia. Therefore, engineers should avoid applying RFFA techniques that require strict regional homogeneity, such as the index-flood method. Instead, regression-based approaches are recommended for RFFA in southeast Australia, as they do not heavily rely on the homogeneity assumptions.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w17182765/s1, Figure S1: Groups formed by cluster analysis (Ward’s method) (Stations are labelled by site index as per Table S1); Figure S2: Boxplots of LCV of AMF data for three assumed regions by cluster analysis (Ward’s Method, refer Figure S1); Figure S3: Spatial distribution of H₁-statistics for KNN50 (ROI); Figure S4: Spatial distribution of H₁-statistics for KNN100 (ROI); Figure S5: Comparison of H₁-statistics and ARE values for Q₂ for KNN50, KNN100, and KNN150; Figure S6: Comparison of H₁-statistics and ARE values for Q₅ for KNN50, KNN100, and KNN150; Figure S7: Comparison of H₁-statistics and ARE values for Q₁₀ for KNN50, KNN100, and KNN150; Figure S8: Comparison of H₁-statistics and ARE values for Q₅₀ for KNN50, KNN100, and KNN150; Figure S9: Comparison of H₁-statistics and ARE values for Q₁₀₀ for KNN50, KNN100, and KNN150; Figure S10: Spatial distribution of H₁-statistics for KNN20 (ROI) for the sites having AMF data up to 1990; Figure S11: Spatial distribution of H₁-statistics for KNN20 (ROI) for the sites having AMF data covering 1991 to 2018; Figure S12: Spatial distribution of ARE for Q₂₀ of KNN50; Figure S13: Spatial distribution of ARE for Q₂₀ of KNN100; Figure S14: Spatial distribution of ARE for Q₂₀ of KNN150; Figure S15: Spatial distribution of ARE for Q₂₀ of KNN200; Table S1: Selected catchments and their important characteristics; Table S2: Summary of identified homogeneous regions by ROI method (H₁ < 1.00) with H₁ values and their station IDs; Table S3: Regression coefficients for the developed models to estimate design floods for all assumed regions adopting fixed region and region-of-influence (ROI) approaches.

Author Contributions

Data analysis, investigation, and manuscript drafting: A.A.; investigation and editing: M.A.M.; methodology, investigation, and editing: S.T.M.; conceptualisation, investigation, and editing: R.S.M.H.R.; data analysis and editing: Z.K.; investigation and editing: R.M.; conceptualisation, editing, and supervision: A.R. All authors have read and agreed to the published version of the manuscript.

Funding

No funding was received by this research.

Data Availability Statement

The data used in this study can be obtained from Australian Government authorities by paying a prescribed fee.

Acknowledgments

The authors would like to acknowledge the Australian Rainfall and Runoff Revision Project 5 team for providing some of the data used in this study. TUFLOW FLIKE was provided freely by the FLIKE sales team. Streamflow data were obtained from WaterNSW, the entity of NSW state Government, Australia.

Conflicts of Interest

The authors state no conflicts of interest.

References

Dalrymple, T. Flood-Frequency Analyses (No. 1543); Manual of Hydrology: Part 3; United States Government Printing Office: Washington, DC, USA, 1960.
Shu, C.; Ouarda, T.B. Flood frequency analysis at ungauged sites using artificial neural networks in canonical correlation analysis physiographic space. Water Resour. Res. 2007, 43, 7.
Burn, D.H. An appraisal of the “region of influence” approach to flood frequency analysis. Hydrol. Sci. J. 1990, 35, 149–165.
Cunnane, C. Review of statistical models for flood frequency estimation. In Hydrologic Frequency Modelling; Springer: Dordrecht, The Netherlands, 1987; pp. 49–95.
Hosking, J.R.M.; Wallis, J.R. Some statistics useful in regional frequency analysis. Water Resour. Res. 1993, 29, 271–281.
Stedinger, J.R.; Tasker, G.D. Regional hydrologic analysis: 1. Ordinary, weighted, and generalized least squares compared. Water Resour. Res. 1985, 21, 1421–1432.
Haddad, K.; Rahman, A. Regional flood frequency analysis in eastern Australia: Bayesian GLS regression-based methods within fixed region and ROI framework—Quantile Regression vs. Parameter Regression Technique. J. Hydrol. 2012, 430–431, 142–161.
Mediero, L.; Kjeldsen, T.; Macdonald, N.; Kohnova, S.; Merz, B.; Vorogushyn, S.; Wilson, D.; Alburquerque, T.; Blöschl, G.; Bogdanowicz, E. Identification of coherent flood regions across Europe by using the longest streamflow records. J. Hydrol. 2015, 528, 341–360.
Pinos, J.; Quesada-Román, A. Flood Risk-Related Research Trends in Latin America and the Caribbean. Water 2021, 14, 10.
Singh, A.K.; Chavan, S.R. An approach to regional flood frequency analysis for general peak discharge distribution datasets. J. Hydrol. 2025, 650, 132493.
Wiltshire, S.E. Grouping basins for regional flood frequency analysis. Hydrol. Sci. J. 1985, 30, 151–159.
Li, Z.; Gao, S.; Chen, M.; Gourley, J.J.; Hong, Y. Spatiotemporal characteristics of US floods: Current status and forecast under a future warmer climate. Earth’s Future 2022, 10, e2022EF002700.
Zrinji, Z.; Burn, D.H. Regional flood frequency with hierarchical region of influence. J. Water Resour. Plan. Manag. 1996, 122, 245–252.
Basu, B.; Srinivas, V.V. Regional flood frequency analysis using kernel-based fuzzy clustering approach. Water Resour. Res. 2014, 50, 3295–3316.
Ouarda, T.B.; Bâ, K.M.; Diaz-Delgado, C.; Cârsteanu, A.; Chokmani, K.; Gingras, H.; Quentin, E.; Trujillo, E.; Bobée, B. Intercomparison of regional flood frequency estimation methods at ungauged sites for a Mexican case study. J. Hydrol. 2008, 348, 40–58.
Durocher, M.; Burn, D.H.; Mostofi Zadeh, S. A nationwide regional flood frequency analysis at ungauged sites using ROI/GLS with copulas and super regions. J. Hydrol. 2018, 567, 191–202.
Burn, D.H. Evaluation of regional flood frequency analysis with a region of influence approach. Water Resour. Res. 1990, 26, 2257–2265.
Oudin, L.; Andréassian, V.; Perrin, C.; Michel, C.; Le Moine, N. Spatial proximity, physical similarity, regression and ungaged catchments: A comparison of regionalization approaches based on 913 French catchments. Water Resour. Res. 2008, 44, 3.
Merz, R.; Blöschl, G. Flood frequency regionalisation—Spatial proximity vs. catchment attributes. J. Hydrol. 2005, 302, 283–306.
Eng, K.; Milly, P.; Tasker, G.D. Flood regionalization: A hybrid geographic and predictor-variable region-of-influence regression method. J. Hydrol. Eng. 2007, 12, 585–591.
Tasker, G.D.; Hodge, S.A.; Barks, C.S. Region of Influence Regression for Estimating the 50-Year Flood At Ungaged Sites 1. JAWRA J. Am. Water Resour. Assoc. 1996, 32, 163–170.
Zhang, Z.; Stadnyk, T.A. Investigation of Attributes for Identifying Homogeneous Flood Regions for Regional Flood Frequency Analysis in Canada. Water 2020, 12, 2503.
Thomas, D.; Benson, M.A. Generalization of Streamflow Characteristics from Drainage-Basin Characteristics; US Government Printing Office: Washington, DC, USA, 1970.
Kuczera, G.; Franks, S. At-site flood frequency analysis. In Australian Rainfall & Runoff; Ball, J., Babister, M., Nathan, R., Weeks, B., Weinmann, E., Retallick, M., Testoni, I., Coombers, P., Roso, S., Ward, M., et al., Eds.; Commonwealth of Australia: Symonston, Australia, 2019; Chapter 2, Book 3.
Durocher, M.; Chebana, F.; Ouarda, T.B. On the prediction of extreme flood quantiles at ungauged locations with spatial copula. J. Hydrol. 2016, 533, 523–532.
Chebana, F.; Charron, C.; Ouarda, T.B.; Martel, B. Regional frequency analysis at ungauged sites with the generalized additive model. J. Hydrometeorol. 2014, 15, 2418–2428.
Wazneh, H.; Chebana, F.; Ouarda, T.B. Optimal depth-based regional frequency analysis. Hydrol. Earth Syst. Sci. 2013, 17, 2281–2296.
Rahman, A.S.; Rahman, A. Application of Principal Component Analysis and Cluster Analysis in Regional Flood Frequency Analysis: A Case Study in New South Wales, Australia. Water 2020, 12, 781.
Rahman, A.S.; Khan, Z.; Rahman, A. Application of independent component analysis in regional flood frequency analysis: Comparison between quantile regression and parameter regression techniques. J. Hydrol. 2020, 581, 124372.
Rosbjerg, D.; Blöschl, G.; Burn, D.; Castellarin, A.; Croke, B.; Di Baldassarre, G.; Iacobellis, V.; Kjeldsen, T.R.; Kuczera, G.; Merz, R. Prediction of floods in ungauged basins. In Runoff Prediction in Ungauged Basins: Synthesis Across Processes, Places and Scales; Cambridge University Press: Cambridge, UK, 2013; pp. 189–225.

Figure 1. Spatial distribution of 201 selected stations in southeast Australia.

Figure 2. Boxplots of predictors and L-moments for the assumed regions.

Figure 3. Comparison of H_i-statistics (median) among different candidate groups adopting fixed-region and ROI approaches.

Figure 4. Pockets of identified homogeneous regions (red colour) for KNN10.

Figure 5. Boxplots of H₁-statistics for different regions (ROI approach). The red line marks the threshold value H₁ = 1.00.

Figure 6. Spatial distribution of H₁-statistics for KNN20 (ROI).

Figure 7. Comparison of absolute median relative error (AMRE) among the assumed regions, with H₁-statistics.

Figure 8. Scatter plots showing the association between H₁- and AMRE values for the assumed fixed regions. Each dot marks the coordinates of an H₁- and AMRE-value pair. The p-values are shown in red, and the fitted regression lines with their R²-values are shown in blue.

Figure 9. Boxplots of relative error of the quantile estimates for the assumed fixed regions. The red lines mark RE = 0, where the predicted and observed quantile estimates are equal. Each dot refers to the RE-values for respective regions.

Figure 10. Boxplots of relative error of the quantile estimates for the assumed regions adopting the ROI approach via KNNs. The red lines mark RE = 0, where the predicted and observed quantile estimates are equal. Each dot refers to the RE-values for respective regions (KNN).

Figure 11. Three-dimensional plots of absolute relative error and H₁-statistics among the assumed regions for randomly selected stations.

Figure 12. Spatial distribution of AMRE for Q₂₀ (KNN20).

Figure 13. Comparison of H₁-statistics and ARE values for Q₂₀ (KNN50, KNN100, and KNN150).

Figure 14. Scatter plots of observed and predicted quantiles in the assumed regions via the fixed-region approach. Each blue dot marks the coordinates of the observed and predicted values of a quantile estimate.

Figure 15. Scatter plots of observed and predicted quantiles in the assumed regions via the ROI approach. Each blue dot marks the coordinates of the observed and predicted values of a quantile estimate.

Table 1. Summary of descriptive statistics of the catchment characteristics (n = 201).

Station Characteristics	Minimum	Maximum	Median	Mean	Standard Deviation
Catchment area (AREA, km²)	3.00	1010.00	261.00	333.99	262.40
Rainfall intensity (I₆₂) (mm/h)	24.60	87.30	37.30	39.16	10.07
Mean annual rainfall (MAR) (mm)	484.39	1953.23	891.64	962.26	314.47
Shape factor (SF)	0.26	1.63	0.78	0.78	0.21
Mean annual evapotranspiration (MAE) (mm)	925.90	1543.30	1076.70	1117.96	129.31
Stream density (SDEN) (/km)	0.52	5.47	1.69	2.10	1.06
Mainstream slope (S1085) (m/km)	0.80	69.90	9.50	13.19	11.67
Forest (FOREST) (fraction)	0.00	1.00	0.59	0.55	0.34

Table 2. Summary of region formation (with notation), D_i, H_i-, and Z-statistics for the assumed regions adopting fixed-region and ROI approaches.

Regionalization Approach	Region Notation	Number of Stations (n)	Lowest D_i (≥3.00)	Highest D_i (≥3.00)	Z (GLO)	Z (GEV)	Z (GNO)	Z (PE3)	Z (GPA)
Fixed-region approach	A	201	3.26	6.43	16.55	12.49	7.27	−1.77	−0.04
	B1	88	3.05	5.43	9.82	7.73	4.06	−2.28	0.65
	B2	113	3.09	5.04	14.15	10.38	6.65	0.16	−0.49
	C1	106	3.34	5.36	14.41	11.12	7.11	0.15	1.12
	C2	95	3.05	7.18	9.69	7.08	3.53	−2.61	−1.09
Region-of-influence (ROI) approach	KNN10	10	-	-	3.93	3.00	1.79	−0.34	0.07
	KNN20	20	-	-	5.27	4.19	2.33	−0.75	0.18
	KNN50	50	3.07	5.53	8.29	6.38	3.64	−1.03	0.35
	KNN100	100	3.14	5.40	11.55	8.73	4.90	−1.52	−0.76
	KNN150	150	3.62	7.70	14.37	10.72	6.27	−1.46	−0.47
	KNN200	200	3.26	6.43	16.82	12.70	7.42	−1.74	0.01

Table 3. Regression coefficients associated with the predictor variables for Q₅₀.

Region (No. of Sites)	Regression Coefficients									Adj-R²
	β₀	β₁	β₂	β₃	β₄	β₅	β₆	β₇	β₈
	Cons	AREA (km²)	I₆₂ (mm/h)	MAR (mm)	SF	MAE (mm)	SDEN (/km)	S1085 (m/km)	FOREST (Fraction)
A (201)	−4.05	0.70	2.90	−0.91	−0.28	0.94	0.36	−0.01	−0.01	0.68
B1 (88)	−1.24	0.73	3.10	−0.91	−0.55	−0.13	0.40	0.07	−0.03	0.62
B2 (113)	−1.67	0.63	2.32	−0.69	0.08	0.35	0.08	−0.09	0.06	0.57
C1 (106)	−5.00	0.73	3.30	−0.87	−0.40	1.00	0.08	0.07	0.00	0.70
C2 (95)	−6.39	0.78	1.07	−0.51	−0.16	2.15	0.57	0.02	−0.07	0.71
KNN20 (20)	−4.19	0.81	2.07	−0.88	−0.43	1.65	0.19	0.10	−0.12	0.81
KNN50 (50)	−5.99	0.77	1.89	−0.83	−0.10	1.76	0.25	0.01	−0.05	0.66
KNN100 (100)	−2.36	0.62	3.33	−1.18	0.16	0.39	0.50	−0.10	0.01	0.60
KNN150 (150)	−4.94	0.66	3.33	−0.94	−0.18	1.12	0.40	−0.10	0.01	0.60
KNN200 (200)	−4.05	0.70	2.90	−0.91	−0.28	0.94	0.36	−0.01	−0.01	0.68
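As a hedged illustration of how the coefficients in Table 3 might be applied, the sketch below evaluates the region A equation for a hypothetical catchment whose predictors are set to the median values in Table 1. It assumes a log-log model of the form log₁₀(Q₅₀) = β₀ + Σ βⱼ log₁₀(xⱼ); this form is not stated in the table itself and is adopted here only because the text mentions log₁₀-transformed quantiles, so the result should be read as indicative. The names `COEFFS_A`, `MEDIAN_CATCHMENT`, and `predict_q50` are illustrative.

```python
import math

PREDICTORS = ["AREA", "I62", "MAR", "SF", "MAE", "SDEN", "S1085", "FOREST"]

# Region A coefficients for Q50, copied from Table 3
COEFFS_A = {
    "const": -4.05, "AREA": 0.70, "I62": 2.90, "MAR": -0.91, "SF": -0.28,
    "MAE": 0.94, "SDEN": 0.36, "S1085": -0.01, "FOREST": -0.01,
}

# Hypothetical catchment built from the median values in Table 1
MEDIAN_CATCHMENT = {
    "AREA": 261.00, "I62": 37.30, "MAR": 891.64, "SF": 0.78,
    "MAE": 1076.70, "SDEN": 1.69, "S1085": 9.50, "FOREST": 0.59,
}

def predict_q50(coeffs, catchment):
    """Evaluate the assumed log10-log10 regression and back-transform to a flood quantile."""
    log_q = coeffs["const"] + sum(
        coeffs[name] * math.log10(catchment[name]) for name in PREDICTORS
    )
    return 10.0 ** log_q

q50_demo = predict_q50(COEFFS_A, MEDIAN_CATCHMENT)
print(f"Q50 under the assumed model form: {q50_demo:.1f}")
```

Note that a catchment with FOREST = 0 (the minimum in Table 1) cannot be log-transformed directly, so the study presumably handled such cases in a way this sketch does not reproduce.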

Table 4. Summary of evaluation criteria for the assumed regions adopting the fixed-region and ROI approaches.

Evaluation Criteria	Quantiles	A	B1	B2	C1	C2	KNN20	KNN50	KNN100	KNN150	KNN200	Overall Statistic	FR	ROI
R²	Q₂	0.72	0.73	0.71	0.80	0.69	0.84	0.74	0.69	0.70	0.72	MIN	0.53	0.58
	Q₅	0.73	0.73	0.68	0.79	0.73	0.84	0.71	0.66	0.68	0.73	MAX	0.80	0.84
	Q₁₀	0.72	0.70	0.64	0.77	0.73	0.83	0.69	0.63	0.66	0.72	Mean	0.69	0.70
	Q₂₀	0.70	0.67	0.61	0.74	0.72	0.82	0.68	0.62	0.63	0.70	Median	0.70	0.69
	Q₅₀	0.68	0.62	0.57	0.70	0.71	0.81	0.66	0.60	0.60	0.68	SD	0.06	0.08
	Q₁₀₀	0.66	0.59	0.53	0.67	0.70	0.79	0.64	0.59	0.58	0.66	Range	0.27	0.26
AMRE	Q₂	39.49	47.62	32.74	38.58	29.50	43.17	32.53	36.18	37.47	39.49	MIN	29.50	32.53
	Q₅	37.80	39.90	37.38	38.84	31.93	40.21	34.77	34.58	38.50	37.80	MAX	56.66	52.73
	Q₁₀	36.98	41.39	37.94	39.78	33.34	39.79	37.69	36.07	38.23	36.98	Mean	40.72	40.85
	Q₂₀	41.25	40.93	38.80	47.00	33.81	42.30	38.72	38.79	42.12	41.25	Median	39.72	40.00
	Q₅₀	43.04	46.93	39.66	53.13	38.48	50.60	43.38	42.10	45.41	43.04	SD	6.06	4.63
	Q₁₀₀	45.34	48.12	40.51	56.66	44.69	52.73	45.54	44.51	46.08	45.34	Range	27.16	20.20
MSE	Q₂	2515	4833	613	3189	1012	4926	2633	2679	2476	2515	MIN	613	2476
	Q₅	13,167	27,029	3836	18,573	4695	34,690	17,671	16,733	13,240	13,167	MAX	1,576,140	3,201,044
	Q₁₀	36,475	78,942	9546	55,357	12,713	113,209	55,254	49,666	37,089	36,475	Mean	208,866	353,981
	Q₂₀	94,459	209,806	19,797	147,944	35,210	333,229	150,509	130,180	95,555	94,459	Median	40,902	95,007
	Q₅₀	309,517	686,275	45,329	480,016	136,189	1,247,541	490,891	410,848	306,193	309,517	SD	367,830	641,000
	Q₁₀₀	722,328	1,576,140	80,856	1,087,259	362,360	3,201,044	1,109,517	915,078	700,116	722,328	Range	1,575,528	3,198,568
BIAS	Q₂	−9.53	−12.34	−3.79	−9.45	−5.21	1.68	−6.33	−7.99	−8.34	−9.53	MIN	−233.05	−169.16
	Q₅	−20.05	−24.10	−10.20	−18.50	−11.36	4.82	−10.69	−13.63	−14.22	−20.05	MAX	−3.79	93.24
	Q₁₀	−33.25	−40.63	−17.61	−31.20	−20.14	9.93	−15.48	−20.44	−21.66	−33.25	Mean	−58.19	−29.25
	Q₂₀	−54.95	−69.81	−27.61	−53.63	−35.31	20.36	−23.68	−31.83	−34.90	−54.95	Median	−34.28	−20.25
	Q₅₀	−105.43	−140.90	−46.05	−108.35	−71.63	49.83	−44.67	−59.46	−68.51	−105.43	SD	58.94	50.09
	Q₁₀₀	−169.16	−233.05	−65.26	−179.05	−118.10	93.24	−73.51	−95.85	−113.74	−169.16	Range	229.26	262.40
RBIAS	Q₂	22.24	29.41	16.83	21.77	16.49	31.10	24.71	24.54	27.98	22.24	MIN	14.63	21.41
	Q₅	21.41	27.60	19.67	23.44	14.63	30.88	26.42	25.33	26.55	21.41	MAX	56.20	68.05
	Q₁₀	23.55	31.18	23.21	27.71	16.78	35.11	30.00	28.19	28.37	23.55	Mean	29.08	33.75
	Q₂₀	26.91	36.90	27.06	33.28	20.07	41.96	34.41	31.87	31.34	26.91	Median	27.33	31.22
	Q₅₀	32.76	46.93	32.59	42.18	25.65	54.96	41.15	37.58	36.48	32.76	SD	10.24	10.18
	Q₁₀₀	38.13	56.20	37.15	50.01	30.66	68.05	46.87	42.45	41.16	38.13	Range	41.57	46.64
RRMSE	Q₂	0.15	0.13	0.10	0.12	0.11	0.03	0.10	0.13	0.13	0.15	MIN	0.10	0.03
	Q₅	0.13	0.10	0.11	0.10	0.10	0.03	0.07	0.09	0.09	0.13	MAX	0.21	0.21
	Q₁₀	0.13	0.10	0.13	0.10	0.10	0.04	0.06	0.08	0.09	0.13	Mean	0.14	0.10
	Q₂₀	0.15	0.12	0.15	0.12	0.12	0.05	0.06	0.09	0.09	0.15	Median	0.13	0.09
	Q₅₀	0.18	0.14	0.17	0.15	0.15	0.08	0.08	0.10	0.12	0.18	SD	0.03	0.04
	Q₁₀₀	0.21	0.16	0.20	0.19	0.18	0.12	0.09	0.12	0.14	0.21	Range	0.11	0.18
RMSNE	Q₂	0.93	1.10	0.84	0.89	0.72	1.34	1.17	1.05	1.11	0.93	MIN	0.70	0.93
	Q₅	0.95	1.15	0.98	1.00	0.70	1.40	1.31	1.16	1.07	0.95	MAX	2.25	2.85
	Q₁₀	1.03	1.30	1.08	1.13	0.81	1.52	1.46	1.29	1.13	1.03	Mean	1.20	1.43
	Q₂₀	1.13	1.51	1.18	1.29	0.94	1.75	1.63	1.42	1.22	1.13	Median	1.14	1.35
	Q₅₀	1.29	1.89	1.32	1.52	1.14	2.26	1.87	1.61	1.36	1.29	SD	0.34	0.42
	Q₁₀₀	1.42	2.25	1.43	1.72	1.31	2.85	2.07	1.76	1.47	1.42	Range	1.55	1.92

Standard deviation is used in the table to understand the variation of estimated evaluation criteria.
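The overall statistics columns of Table 4 appear to pool the values across regions and quantiles. As a check under that assumption, the sketch below pools the 30 AMRE values of the five fixed regions (A, B1, B2, C1, C2) over the six quantiles and recomputes the minimum, maximum, mean, median, and range.

```python
from statistics import mean, median

# AMRE (%) for fixed regions A, B1, B2, C1, C2, copied row by row from Table 4
amre_fixed = {
    "Q2":   [39.49, 47.62, 32.74, 38.58, 29.50],
    "Q5":   [37.80, 39.90, 37.38, 38.84, 31.93],
    "Q10":  [36.98, 41.39, 37.94, 39.78, 33.34],
    "Q20":  [41.25, 40.93, 38.80, 47.00, 33.81],
    "Q50":  [43.04, 46.93, 39.66, 53.13, 38.48],
    "Q100": [45.34, 48.12, 40.51, 56.66, 44.69],
}

# Pool all region-by-quantile values into one list
pooled = [value for row in amre_fixed.values() for value in row]

summary = {
    "min": min(pooled),
    "max": max(pooled),
    "mean": round(mean(pooled), 2),
    "median": round(median(pooled), 2),
    "range": round(max(pooled) - min(pooled), 2),
}
print(summary)
```

The recomputed values (minimum 29.50, maximum 56.66, mean 40.72, median 39.72, range 27.16) agree with the FR column of the AMRE block, which supports the pooled reading of the table.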

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

