Article

Regional Flood Frequency Analysis in Northeastern Bangladesh Using L-Moments for Peak Discharge Estimation at Various Return Periods in Ungauged Catchments

1
Department of Water Resources Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh
2
Department of Industrial and Production Engineering, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh
3
Institute of Water and Flood Management (IWFM), Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh
*
Author to whom correspondence should be addressed.
Water 2025, 17(12), 1771; https://doi.org/10.3390/w17121771
Submission received: 30 April 2025 / Revised: 6 June 2025 / Accepted: 10 June 2025 / Published: 12 June 2025

Abstract

The Sylhet Division of Bangladesh, highly susceptible to monsoon flooding, requires effective flood risk management to reduce socio-economic losses. Flood frequency analysis is an essential aspect of flood risk management and plays a crucial role in designing hydraulic structures. This study applies regional flood frequency analysis (RFFA) using L-moments to identify homogeneous hydrological regions and estimate extreme flood quantiles. Records from 26 streamflow gauging stations were used, including streamflow data along with corresponding physiographic and climatic characteristic data, obtained from GIS analysis and ERA5, respectively. Most stations showed no significant monotonic trends, temporal correlations, or spatial dependence, supporting the assumptions of stationarity and independence necessary for reliable frequency analysis, which allowed the use of cluster analysis, discordancy measures, heterogeneity tests for regionalization, and goodness-of-fit tests to evaluate candidate distributions. The Generalized Logistic (GLO) distribution performed best, offering robust quantile estimates with narrow confidence intervals. Multiple Non-Linear Regression models, based on catchment area, elevation, and other parameters, reasonably predicted ungauged basin peak discharges (R² = 0.61–0.87; RMSE = 438–2726 m³/s; MAPE = 41–74%) at different return periods, although uncertainty was higher for extreme events. Four homogeneous regions were identified, showing significant differences in hydrological behavior, with two regions yielding stable estimates and two exhibiting greater extreme variability.

1. Introduction

Floods are natural disasters characterized by their type, volume, and duration, with their frequency rising in recent decades, sparking growing concern among scientists and decision-makers [1]. These catastrophic events cause widespread environmental, agricultural, transportation, and infrastructure damage, resulting in fatalities and substantial economic losses [2,3,4,5]. Human activities, such as land use and climate change, have exacerbated flood risks by altering runoff responses in river catchments, intensifying flood vulnerabilities [6]. Factors such as basin size, slope, soil structure, and urbanization further contribute to the severity of floods, as impermeable surfaces and infrastructure disrupt natural flood mitigation processes [7,8]. With the increasing frequency of floods, especially under the impacts of climate change, effective flood frequency analysis has become crucial for managing these disasters [4].
The estimated flood quantiles (e.g., 50-year, 100-year discharge values) derived from frequency analysis are critical for practical engineering and flood risk management applications. These quantiles play a key role in the design of hydraulic structures like dams (typically using a 100-year return period), culverts (often designed for a 50-year flood), and bridges (commonly based on a 50- to 100-year flood) [9]. Specifically, they determine spillway capacity and dam height, ensuring that the infrastructure can safely convey extreme flows without overtopping or structural failure. In floodplain management, flood quantiles inform zoning regulations and the delineation of high-risk areas, thereby restricting development in flood-prone zones and guiding insurance and land-use planning. Furthermore, urban drainage systems—including stormwater networks, culverts, and detention basins—rely on estimates of more frequent events (e.g., 10-year or 25-year floods) to reduce the risk of urban flooding and maintain public safety during intense rainfall events. Accurate flood frequency estimation ensures a reliable understanding of flood behavior, helping to reduce risk and protect communities. However, accurately estimating return periods for rare geophysical events such as extreme floods remains a challenge [10].
To address this, regional flood frequency analysis (RFFA) has emerged as a widely used method, especially when data periods are unequal or records are limited or incomplete [9]. RFFA involves pooling data from multiple sites with similar characteristics, which improves flood rate estimates and compensates for sparse data at specific sites. The RFFA process includes identifying homogeneous regions, selecting appropriate regional frequency distributions, and transferring flood characteristics from gauged to ungauged catchments. In particular, the Probability-Weighted Moment (PWM)/Generalized Extreme Value (GEV) scheme provides more reliable estimates in homogeneous regions, enhancing flood predictions and enabling better risk management strategies [10]. Despite its advantages, RFFA must be applied cautiously, considering data uncertainties such as stationarity, correlation, and sampling variability [11].
The L-moments method is a prominent technique for regional flood frequency analysis (RFFA) that has been successfully applied globally in regions such as Canada [12], Norway [13], Iran [14], India [15], Korea [16], China [17], and Turkey [18]. Building on the Probability-Weighted Moment (PWM) method [19], L-moments allow a more direct interpretation of flood distributions in scale and shape. This method is particularly effective for estimating distributions with more than three parameters, offering more accurate results than single-location data. The two main techniques in RFA, Annual Maximum Series (AMS) and Peak-Over-Threshold (POT) are utilized to estimate flood magnitudes based on different flood event frequencies, with AMS being beneficial for more extended return periods [6]. L-moments significantly improve the reliability of flood discharge predictions, particularly in regions with limited data, making them essential for robust flood frequency analysis and risk management [5].
Although the L-moments method has become popular worldwide, its application in Bangladesh remains limited. One study compared various flood frequency distributions using the L-moments method, but without applying discordancy and heterogeneity tests, cluster analysis, or Monte Carlo simulation to estimate quantiles [20]. The Log Pearson Type-3 (LP3) distribution was identified as the best suited to flood frequency analysis of rivers in Bangladesh [21], outperforming other distributions such as Log-Normal and Extreme Value Type-1 (EV1). Based on the characteristics of hydrographs, a flood index was established for the Haor area [22]. Both Powell and Gumbel distributions were used for the Dudhkumar River, and both models provided a good fit, with predicted flood magnitudes matching the observed magnitudes well [23]. Several approaches for the Padma River were compared, and it was concluded that Gumbel and Stochastic methods yielded the best flooding estimates, especially for extended return periods [24]. The Hamiltonian Monte Carlo (HMC) method was used on the Generalized Extreme Value (GEV) distribution, illustrating the greater precision of HMC for extreme return periods compared to the Metropolis–Hastings algorithm [25]. Flash floods in the northeast Haor region were analyzed, and the GEV distribution was found to be best for predicting flash flood levels and establishing new flood risk thresholds to protect Boro crops [26]. While different approaches were adopted in these investigations, the L-moments approach was rarely used, despite its reliability for flood frequency analysis. Our study focuses on the northeastern region of Bangladesh, which is flood-prone because of its hilly terrain and its Haors, large tectonic depressions that remain flooded during the monsoon but dry up post-monsoon.
Flash floods occur when heavy rainfall in India’s Assam and Meghalaya regions causes floodwaters to quickly move into Bangladesh, affecting the Haor areas within hours [27]. This rapid inundation leaves farmers with little time to harvest their Boro crops, which cover nearly 80% of the Haor areas [28]. Flash floods have caused significant damage to crops, destroying up to 70% of them and leading to substantial economic losses [29]. The devastating floods of 2010 and 2016 affected large areas, with the 2010 flood alone damaging 152,000 ha and causing losses worth BDT 13.18 billion [26]. Despite existing flood control structures, such as embankments and rubber dams, the region faces serious flood risks [30]. The 2022 flash floods, which affected over 7.2 million people and caused extensive damage, further highlight the need for better regional flood prediction and management [31].
This underscores the importance of our study. This research fills a vital gap by creating and applying a physiographic–statistical framework that effectively advances classical flood frequency analysis (FFA) by combining L-moments-based regional frequency analysis and hierarchical clustering. This method is very effective for our study area since there are only 30 river discharge monitoring stations in the Sylhet Division, and data could be collected from 26 of them. Moreover, many of the discharge observation stations have short and unevenly spaced discharge records, making traditional moment-based methods less reliable. In contrast, L-moments are well-suited for such conditions, as they provide more robust and unbiased estimates even with limited or irregularly spaced data. Behind this methodological enhancement is a multi-step process starting with comprehensive data screening, including the Mann–Kendall test, Standard Normal Homogeneity Test (SNHT), autocorrelation, partial autocorrelation, and Moran’s I test, for ensuring stationarity, independence, and spatial autocorrelation of the annual peak discharge time series, followed by Ward’s hierarchical cluster algorithm based on both hydrological and physiographic watershed properties (drainage area, elevation, mean precipitation, and geographic coordinates) for delineating homogeneous regions, with cluster validity assessed using silhouette width analysis, gap statistics, and the elbow method. The proposed methodology enhances regional homogeneity testing via discordancy measures (Di statistic) for detecting anomalous stations and heterogeneity measures (H-statistics) derived from Monte Carlo simulations of synthetic regions from Kappa distributions. 
The selection of the probability distribution is based on goodness-of-fit testing with the $Z^{DIST}$ statistic, which compares observed and simulated L-kurtosis values for six competing distributions (Generalized Logistic (GLO), Generalized Extreme Value (GEV), Generalized Pareto (GPA), Generalized Normal (GNO), Pearson Type III (PE3), Wakeby (WAK)), and the chosen distribution is also substantiated by Root Mean Square Error (RMSE) analysis and 95% confidence limits based on Monte Carlo simulations. Besides the traditional index flood methods, the study presents several predictive Multiple Non-Linear Regression (MNLR) models for different return periods that incorporate watershed-scale geomorphological predictors (stream order, basin length, elevation, etc.) and climatic predictors in an attempt to enable accurate flood quantile estimation in ungauged basins, where the performance of the model is compared using R² (Coefficient of Determination), RMSE (Root Mean Square Error), and MAPE (Mean Absolute Percentage Error) metrics. The framework’s demonstration application in the Sylhet Division has yielded precise design flood estimates with quantified uncertainty ranges, producing vital input to climate-resilient infrastructure planning. At the same time, its techniques, particularly in cluster validation, hybrid homogeneity testing, and non-linear predictor inclusion, provide a replicable template for flood hazard estimation in similar monsoonal contexts worldwide.

2. Materials and Methods

2.1. Study Area and Data Availability

The present study focuses on the Sylhet Division, located in the northeastern region of Bangladesh, comprising the four administrative districts of Sylhet, Sunamganj, Habiganj, and Moulvibazar. Covering a total area of 12,635.22 km², Sylhet Division accounts for approximately 8.5% of the total area of Bangladesh and, with a population of about 11,034,863 people, is demographically significant. Sylhet Division lies between 23°58′ N and 25°12′ N latitude and 90°56′ E and 92°30′ E longitude. It is bounded by Meghalaya to the north, Tripura and Brahmanbaria district to the south, Assam and Tripura to the east, and Netrakona and Kishoreganj districts to the west. The elevation in the Sylhet Division ranges from 0 m to a maximum of 1620 m, with an average elevation of 88 m [32]. Sylhet Division experiences a tropical monsoon climate, bordering on a humid subtropical climate in higher elevations. The pre-monsoon (March to May), monsoon (June to September), post-monsoon (October to November), and dry winter (December to February) are the primary seasons in this region [33,34]. The region is known for its intense rainfall; the recorded maximum annual rainfall is 4996.03 mm, and the minimum stands at 2041.92 mm, while the average annual temperature is 21.56 °C. From a climatic perspective, the area is classified as having subtropical humid conditions [35]. Sylhet Division is traversed by several major rivers, including the Surma, Meghna, Kushiyara, Jadukata, and Manu. Humid southwesterly flows from the Bay of Bengal are blocked by the mountainous areas of Assam, Meghalaya, and Tripura, and heavy rainfall over the Meghna basin is caused by orographic lifting and other processes [36,37,38,39]. The Sylhet region has historically been highly flood-prone, with significant events occurring in 1988, 1998, 2004, 2007, 2017, 2020, and 2022. These floods have varied in intensity, persistence, and socio-economic effects.
For instance, the 2004 and 2007 floods caused agricultural losses of BDT 8285.80 million and BDT 3768.64 million, respectively [40]. The floods have also caused huge income and job loss, primarily impacting the rural agrarian community. For the current study, data on the daily discharge of 26 gauging stations and their geographic coordinates (latitude and longitude) were gathered from the Bangladesh Water Development Board (BWDB). Some key physiographic and climatic characteristics of the gauging stations, including Station ID, Station Name, geographic coordinates (latitude and longitude), data availability period, catchment area (km2), Mean Annual Precipitation (MAP in mm), and Mean Annual Temperature (MAT in °C) are summarized in Table 1. The physiographic characteristics of the gauging stations were extracted using Geographic Information System (GIS) software (ArcGIS Desktop 10.8). Mean Annual Temperature and Mean Annual Precipitation were obtained from monthly ERA5 precipitation and temperature data (ERA5 stands for “ECMWF Reanalysis 5th Generation,” produced by the European Centre for Medium-Range Weather Forecasts), which were extracted using Google Earth Engine (GEE). The dataset covers the period from 1940 to 2024. For each year, the monthly temperature and precipitation values were first averaged to obtain annual means. Then, the Mean Annual Temperature and Mean Annual Precipitation were calculated by averaging these yearly means over the entire 85-year period (1940–2024). The geographical positions of gauging stations utilized for the current study are shown in Figure 1. Figure 2 illustrates the data availability and missing years for each station, highlighting temporal gaps and periods of continuous record.

2.2. Statistical Methods

L-moments are statistical moments defined as linear combinations of order statistics [41]. Conventional product moments tend to be unreliable for extreme data [11], and the shape information conveyed by their higher-order terms (third and beyond) is difficult to interpret; this prompted the development of alternatives such as L-moments, which are more precise and robust [41]. The first four L-moments measure location (mean), scale, skewness, and kurtosis, and are linear combinations of the Probability-Weighted Moments (PWMs):
$$\lambda_1 = \beta_0$$
$$\lambda_2 = 2\beta_1 - \beta_0$$
$$\lambda_3 = 6\beta_2 - 6\beta_1 + \beta_0$$
$$\lambda_4 = 20\beta_3 - 30\beta_2 + 12\beta_1 - \beta_0$$
and L-moment ratios are defined by Hosking [42], expressed as the L-coefficient of variation (L-CV), $\tau_2 = \lambda_2/\lambda_1$; L-skewness (L-CS), $\tau_3 = \lambda_3/\lambda_2$; and L-kurtosis (L-CK), $\tau_4 = \lambda_4/\lambda_2$. Probability-Weighted Moments (PWMs) are defined by the expression $\beta_r = E\left[x\,\{F(x)\}^r\right]$, where $F(x)$ is the cumulative distribution function (CDF) of the random variable $X$, and $\beta_r$ is the $r$th-order PWM, where $X$ represents data in ascending order [19]. For a sample of size $n$ with ordered data $X_1 \le X_2 \le \cdots \le X_n$, the sample estimators of the first few PWMs are given by [43]
$$\beta_0 = \frac{1}{n}\sum_{j=1}^{n} X_j$$
$$\beta_1 = \frac{1}{n}\sum_{j=2}^{n} \frac{j-1}{n-1}\, X_j$$
$$\beta_2 = \frac{1}{n}\sum_{j=3}^{n} \frac{(j-1)(j-2)}{(n-1)(n-2)}\, X_j$$
The statistic
$$b_r = \frac{1}{n}\sum_{i=1}^{n} \frac{(i-1)(i-2)\cdots(i-r)}{(n-1)(n-2)\cdots(n-r)}\, x_i$$
is an unbiased estimator of $\beta_r$ [44]. The first four sample L-moments $l_1$, $l_2$, $l_3$, and $l_4$ can be derived from the unbiased estimators of the sample Probability-Weighted Moments $\beta_r$. Using these expressions, $l_1 = b_0$, $l_2 = 2b_1 - b_0$, $l_3 = 6b_2 - 6b_1 + b_0$, and $l_4 = 20b_3 - 30b_2 + 12b_1 - b_0$, and similarly, $t = l_2/l_1$, $t_3 = l_3/l_2$, and $t_4 = l_4/l_2$ are the sample L-moment ratios [45]. Regional frequency analysis (RFA) by L-moments consists of four steps: (1) initial data screening, which is the examination of data quality and stationarity; (2) identification of homogeneous regions by classifying sites having similar hydrological properties; (3) selection of a suitable frequency distribution on the basis of goodness-of-fit tests and reliability measures; and (4) estimation of regional flood quantiles using the L-moments approach [42]. Figure 3 depicts the flowchart representing the proposed work.
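As an illustration of the estimators above, the following minimal sketch computes $b_0$ to $b_3$, the first sample L-moments, and the ratios $t$, $t_3$, and $t_4$ in plain Python. The function name `sample_lmoments` and the discharge values are hypothetical and are not taken from the study.

```python
def sample_lmoments(data):
    """Return (l1, l2, t, t3, t4) using the unbiased PWM estimators b0..b3."""
    x = sorted(data)                      # ascending order statistics
    n = len(x)
    if n < 4:
        raise ValueError("at least 4 observations are needed for l4")
    b = [0.0, 0.0, 0.0, 0.0]
    for i, xi in enumerate(x, start=1):   # i is the 1-based rank
        w = 1.0                           # weight for b0
        for r in range(4):
            b[r] += w * xi
            if r < 3:
                # extend the weight to (i-1)...(i-r-1) / ((n-1)...(n-r-1))
                w *= (i - 1 - r) / (n - 1 - r)
    b = [v / n for v in b]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l2 / l1, l3 / l2, l4 / l2


# Hypothetical annual peak discharges (m^3/s), for illustration only
peaks = [820.0, 1340.0, 990.0, 2210.0, 1515.0, 1105.0, 3050.0, 1270.0]
l1, l2, t, t3, t4 = sample_lmoments(peaks)
print(f"l1={l1:.1f}  L-CV={t:.3f}  L-skew={t3:.3f}  L-kurt={t4:.3f}")
```

For an evenly spaced sample such as 1, 2, 3, 4, 5 the sketch returns zero L-skewness, as expected for symmetric data.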

2.3. Initial Data Screening

An initial screening was performed to ensure the suitability of annual peak discharge data from northeast Bangladesh (Sylhet Division) for regional frequency analysis. Stationarity was checked using the nonparametric Mann–Kendall trend test [46,47], while homogeneity was assessed using the Standard Normal Homogeneity Test (SNHT). Serial independence was tested using lag-1 to lag-10 autocorrelation and partial autocorrelation tests, and spatial independence between stations was tested using Moran’s I [48]. For regional analysis, the Annual Maximum Series (AMS) from different stations should be spatially independent, because stations with high spatial cross-correlation contribute less additional regional information to the site under study than uncorrelated stations [49].

2.3.1. Mann–Kendall Test

The Mann–Kendall (MK) test is a popular nonparametric statistical method for detecting monotonic trends in time series data [50,51,52]. It assesses whether there is a statistically significant trend (either upward or downward) over time without assuming any specific distribution of the data; it tests the null hypothesis of no trend against the alternative hypothesis of a monotonic trend [53]. For a time series $X_1, X_2, \ldots, X_n$, the test statistic $S$ is calculated by comparing each pair of values:
$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{sgn}(X_j - X_i)$$
where $X_i$ and $X_j$ are data values at times $i$ and $j$ (with $j > i$) and $n$ is the total number of observations. The sign function, $\operatorname{sgn}(X_j - X_i)$, is defined as
$$\operatorname{sgn}(X_j - X_i) = \begin{cases} +1, & \text{if } X_j - X_i > 0 \\ 0, & \text{if } X_j - X_i = 0 \\ -1, & \text{if } X_j - X_i < 0 \end{cases}$$
To determine the statistical significance of the trend, the variance of $S$, denoted $\operatorname{Var}(S)$, is computed as
$$\operatorname{Var}(S) = \frac{n(n-1)(2n+5) - \sum_{j=1}^{m} t_j (t_j - 1)(2 t_j + 5)}{18}$$
where $m$ is the number of tied groups and $t_j$ is the number of tied data points in the $j$th group. Using the computed $S$ and its variance, the standardized test statistic $Z$ is determined as follows:
$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{\operatorname{Var}(S)}}, & \text{if } S > 0 \\ 0, & \text{if } S = 0 \\ \dfrac{S + 1}{\sqrt{\operatorname{Var}(S)}}, & \text{if } S < 0 \end{cases}$$
A positive Z-value indicates an upward trend, and a negative Z-value indicates a downward trend. The statistical significance of the trend is assessed through the associated p-value: a p-value less than or equal to 0.05 is typically considered statistically significant, so the null hypothesis of no trend is rejected in favor of the alternative hypothesis that a trend is present. In addition, Kendall’s Tau ($\tau$) measures the strength of the rank correlation in the dataset [54]. It ranges between −1 (perfect negative correlation) and +1 (perfect positive correlation), and near-zero values reflect no significant association. To test the homogeneity of the time series prior to the detection of the trend, the Standard Normal Homogeneity Test (SNHT) was also employed. The SNHT is a statistical test for detecting changes in the mean value of a time series and is traditionally utilized in climatological and hydrological data quality assessments [55,56,57,58].
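The Mann–Kendall computation described above can be sketched in a few lines of standard-library Python. This is only an illustration: the function name `mann_kendall` and the sample series are hypothetical, the p-value is the two-sided normal approximation, and Kendall's Tau is computed in its simple form $S / [n(n-1)/2]$ without a tie adjustment.

```python
import math


def mann_kendall(x):
    """Return (S, Var(S), Z, two-sided p-value, Kendall's tau) for a series x."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = x[j] - x[i]
            s += (d > 0) - (d < 0)        # sgn(x_j - x_i)
    # Tie correction: t_j = number of points in each tied group
    counts = {}
    for v in x:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    tau = s / (0.5 * n * (n - 1))            # simple form, no tie adjustment
    return s, var_s, z, p, tau


# Hypothetical annual peak series (m^3/s), for illustration only
series = [1200, 1350, 1100, 1500, 1420, 1610, 1580, 1700, 1650, 1900]
s, var_s, z, p, tau = mann_kendall(series)
print(f"S={s}  Z={z:.2f}  p={p:.4f}  tau={tau:.2f}")
```

A series would be flagged as having a significant trend at the 5% level when the returned p-value is at most 0.05, matching the criterion stated above.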

2.3.2. Autocorrelation and Partial Autocorrelation Tests

Autocorrelation is a measure of how much a given time series is similar to a lagged copy of itself from one time period to the next. Autocorrelation measures the correlation between observations with a specific time lag and is a fundamental method for identifying patterns, trends, or cycles in time series data [59,60]. Mathematically, the autocorrelation function (ACF) at lag k can be stated as
$$\rho_k = \frac{\operatorname{Cov}(X_t, X_{t-k})}{\sqrt{\operatorname{Var}(X_t)\,\operatorname{Var}(X_{t-k})}}$$
where $\operatorname{Cov}(X_t, X_{t-k})$ is the covariance between observations of the time series values at time $t$ and $t-k$, and $\operatorname{Var}(X_t)$ and $\operatorname{Var}(X_{t-k})$ are the variances at time $t$ and $t-k$, respectively (where $k$ is the lag). The value of $\rho_k$ is normalized to lie in the range [−1, 1]. A positive autocorrelation indicates that large (or small) values are followed by values of the same kind, indicating trend or persistence. A negative autocorrelation indicates a reversal, wherein high values will be succeeded by low values and vice versa. A near-zero autocorrelation indicates weak or no serial dependence of observations at the specified lag.
For a stationary time series $\{X_t\}$, the Partial Autocorrelation Function (PACF) [61,62] at lag $k$, denoted $\phi_{kk}$, is the last coefficient in the Yule–Walker equations for an autoregressive process of order $k$ (AR($k$)):
$$X_t = \phi_{k1} X_{t-1} + \phi_{k2} X_{t-2} + \cdots + \phi_{kk} X_{t-k} + \epsilon_t$$
where $\epsilon_t$ is white noise.
The coefficients $\phi_{kj}$ (for $j = 1, 2, \ldots, k$) are obtained by solving the Yule–Walker system:
$$\begin{bmatrix} \rho_0 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & \rho_0 & \cdots & \rho_{k-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & \rho_0 \end{bmatrix} \begin{bmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{bmatrix}$$
where $\rho_j$ is the autocorrelation at lag $j$ and the solution yields $\phi_{kk}$, which is the partial autocorrelation at lag $k$.
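The sketch below estimates the sample ACF and obtains the PACF by solving the Yule–Walker system recursively (the Durbin–Levinson recursion, which gives the same $\phi_{kk}$ as solving the matrix system at each lag). The function names and the example series are hypothetical, and the $\pm 1.96/\sqrt{n}$ band is a commonly used approximate significance bound rather than a criterion stated in the study.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations phi_11..phi_kk via Durbin-Levinson."""
    rho = acf(x, max_lag)
    phi_prev = []                 # phi_{k-1,1..k-1}, stored 0-based
    out = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
        phi_prev = phi
    return out


# Hypothetical annual peak series, for illustration only
series = [1200, 1350, 1100, 1500, 1420, 1610, 1580, 1700, 1650, 1900, 1480, 1330]
bound = 1.96 / math.sqrt(len(series))     # approximate 95% band (assumption)
for lag, (r, p) in enumerate(zip(acf(series, 3)[1:], pacf(series, 3)), start=1):
    print(f"lag {lag}: ACF={r:+.3f}  PACF={p:+.3f}  outside band: {abs(r) > bound}")
```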

2.4. Identification of Homogeneous Regions by Clustering

To define homogeneous regions for regional frequency analysis, five site characteristics describing location, basin properties, and climate were used: longitude, latitude, drainage area, elevation, and Mean Annual Precipitation. These variables were selected because they are relevant to hydrological processes and are commonly used in comparable studies [42]. To ensure comparability and prevent the analysis from being dominated by variables with larger numerical ranges, all variables were standardized to the range 0–1. The sites were then grouped by the similarity of these characteristics using Ward’s hierarchical clustering method [63]. The most appropriate number of clusters was determined using silhouette widths [64], gap statistics [65], and the elbow method [66]. The utility of these clustering techniques and validation procedures has been shown in several hydrological applications; e.g., silhouette widths have been successfully applied in hierarchical clustering for the delineation of flood estimation zones [67], and cluster analysis in conjunction with distance-based validation procedures has helped evaluate the hydrological similarity of small watersheds [68].
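The two steps just described, 0–1 standardization followed by Ward's agglomerative clustering, can be sketched as below. This is a naive standard-library illustration, not the study's implementation: the function names and site attributes are hypothetical, the number of clusters is supplied by hand, and the silhouette, gap, and elbow diagnostics are not shown.

```python
def min_max_scale(rows):
    """Rescale every column of rows to the range 0-1."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]


def ward_clusters(points, k):
    """Merge clusters until k remain, minimizing the increase in within-cluster SSE."""
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[m][d] for m in members) / len(members) for d in range(dim)]

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = centroid(clusters[a]), centroid(clusters[b])
                na, nb = len(clusters[a]), len(clusters[b])
                # Ward merging cost: na*nb/(na+nb) * squared centroid distance
                cost = na * nb / (na + nb) * sum((p - q) ** 2 for p, q in zip(ca, cb))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical sites: (longitude, latitude, area km^2, elevation m, MAP mm)
sites = [
    (91.1, 24.9, 350.0, 12.0, 4200.0),
    (91.3, 25.0, 410.0, 15.0, 4400.0),
    (92.0, 24.3, 2900.0, 40.0, 2600.0),
    (92.1, 24.4, 3100.0, 45.0, 2500.0),
    (91.6, 24.6, 1200.0, 25.0, 3300.0),
]
print(ward_clusters(min_max_scale(sites), k=2))
```

For real work, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally replace the quadratic search above.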

2.5. Discordancy Measure

Discordancy is a statistical measure used in regional frequency analysis (RFA) to identify individual sites that diverge markedly from the group as a whole. It is used to detect abnormalities or potential inaccuracies within the dataset. A site is considered discordant if its L-moment ratios, specifically L-CV, L-skewness, and L-kurtosis, diverge widely from the mean values of the respective ratios for the rest of the sites within the defined region. After homogeneous areas have been determined, a discordancy measure is computed for every location in the specified region. Where a site is discordant with the region as a whole, the reassignment of such a site to a different region needs to be considered. The measure of discordancy for site $i$ is
$$D_i = \frac{N}{3}\,(u_i - \bar{u})^{T} A^{-1} (u_i - \bar{u})$$
where $D_i$ is the discordancy value for site $i$, $N$ is the number of sites in the region, $u_i = [t^{(i)}, t_3^{(i)}, t_4^{(i)}]^T$ is the vector of L-moment ratios for site $i$, $\bar{u}$ is the average vector of L-moment ratios for the group, and $A = \sum_{i=1}^{N} (u_i - \bar{u})(u_i - \bar{u})^{T}$ is the matrix of sums of squares and cross-products. A site is considered discordant if its $D_i$ value exceeds the critical threshold, $D_i > 3$ [42].
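A small sketch of the discordancy calculation follows, with $A$ taken as the matrix of sums of squares and cross-products of the centered L-moment ratio vectors (the Hosking and Wallis form). The function names and the six sets of ratios are hypothetical. By construction the $D_i$ values of a region sum to $N$, which gives a convenient sanity check.

```python
def solve_linear(a, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def discordancy(ratios):
    """ratios: list of (t, t3, t4) per site. Returns the list of D_i values."""
    n_sites = len(ratios)
    mean = [sum(u[d] for u in ratios) / n_sites for d in range(3)]
    dev = [[u[d] - mean[d] for d in range(3)] for u in ratios]
    # A = sum over sites of (u_i - u_bar)(u_i - u_bar)^T
    a = [[sum(v[r] * v[c] for v in dev) for c in range(3)] for r in range(3)]
    out = []
    for v in dev:
        y = solve_linear(a, v)            # y = A^{-1} (u_i - u_bar)
        out.append(n_sites / 3.0 * sum(vi * yi for vi, yi in zip(v, y)))
    return out


# Hypothetical (L-CV, L-skewness, L-kurtosis) for six sites
region = [(0.20, 0.10, 0.15), (0.25, 0.18, 0.12), (0.22, 0.05, 0.20),
          (0.30, 0.25, 0.22), (0.18, 0.12, 0.10), (0.27, 0.30, 0.25)]
for i, d in enumerate(discordancy(region), start=1):
    print(f"site {i}: D = {d:.2f}")
```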

2.6. Heterogeneity Measure

The H-statistic is the second measure for assessing regional homogeneity in regional frequency analysis (RFA). It quantifies site heterogeneity by comparing the variability of the observed sample L-moment ratios (L-CV, L-skewness, and L-kurtosis) with the variability expected in a theoretically homogeneous region. This comparison, made through Monte Carlo simulations, provides a statistically sound basis for determining whether the observed differences among sites are within acceptable limits for regional analysis. There are three versions of the heterogeneity measure: $H_1$, based on L-CV ($\tau_2$); $H_2$, based on L-CV and L-skewness ($\tau_3$); and $H_3$, based on L-skewness and L-kurtosis ($\tau_4$). Different H-statistic values are interpreted based on the threshold criteria summarized in Table 2. Each version uses a corresponding observed dispersion statistic, denoted $V_1$, $V_2$, and $V_3$, computed as a weighted measure of the dispersion of the respective L-moment ratios across sites, with weights proportional to the site record lengths:
$$V_1 = \left\{ \frac{\sum_{i=1}^{N} n_i \left(t^{(i)} - t^{R}\right)^2}{\sum_{i=1}^{N} n_i} \right\}^{1/2}$$
$$V_2 = \frac{\sum_{i=1}^{N} n_i \left\{ \left(t^{(i)} - t^{R}\right)^2 + \left(t_3^{(i)} - t_3^{R}\right)^2 \right\}^{1/2}}{\sum_{i=1}^{N} n_i}$$
$$V_3 = \frac{\sum_{i=1}^{N} n_i \left\{ \left(t_3^{(i)} - t_3^{R}\right)^2 + \left(t_4^{(i)} - t_4^{R}\right)^2 \right\}^{1/2}}{\sum_{i=1}^{N} n_i}$$
where $N$ is the number of sites, $n_i$ is the record length at site $i$, and the sample L-moment ratios are $t^{(i)}$ (L-CV), $t_3^{(i)}$ (L-skewness), and $t_4^{(i)}$ (L-kurtosis). The regional average L-moment ratios are $t^{R}$, $t_3^{R}$, and $t_4^{R}$. For each $j \in \{1, 2, 3\}$, the corresponding heterogeneity statistic $H_j$ is defined as
$$H_j = \frac{V_j - \mu_{V_j}}{\sigma_{V_j}}$$
where $\mu_{V_j}$ is the mean of the $V_j$ values obtained from simulated homogeneous regions and $\sigma_{V_j}$ is the standard deviation of those values. The simulated regions are generated from a Kappa distribution fitted to the regional average L-moment ratios, which gives a flexible representation of hydrological behavior without committing to a specific parametric form [42].
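The observed dispersion statistics and the standardization into $H_j$ can be sketched as follows, using the Hosking and Wallis definitions of $V_1$, $V_2$, and $V_3$ with record-length-weighted regional averages. The function names and all numbers are hypothetical, and the Kappa-distribution Monte Carlo step is not implemented here: the simulated $V$ values are simply passed in as a list.

```python
import math
import statistics


def dispersion_stats(n, t, t3, t4):
    """Return (V1, V2, V3) from record lengths n and per-site L-moment ratios."""
    total = sum(n)
    t_r = sum(ni * v for ni, v in zip(n, t)) / total      # weighted regional L-CV
    t3_r = sum(ni * v for ni, v in zip(n, t3)) / total    # weighted regional L-skewness
    t4_r = sum(ni * v for ni, v in zip(n, t4)) / total    # weighted regional L-kurtosis
    v1 = math.sqrt(sum(ni * (a - t_r) ** 2 for ni, a in zip(n, t)) / total)
    v2 = sum(ni * math.hypot(a - t_r, b - t3_r)
             for ni, a, b in zip(n, t, t3)) / total
    v3 = sum(ni * math.hypot(b - t3_r, c - t4_r)
             for ni, b, c in zip(n, t3, t4)) / total
    return v1, v2, v3


def h_statistic(v_observed, v_simulated):
    """Standardize an observed V against V values from simulated homogeneous regions."""
    mu = statistics.mean(v_simulated)
    sigma = statistics.stdev(v_simulated)
    return (v_observed - mu) / sigma


# Hypothetical region of four sites
n = [25, 32, 18, 40]
t = [0.21, 0.26, 0.19, 0.24]
t3 = [0.12, 0.20, 0.08, 0.17]
t4 = [0.14, 0.19, 0.11, 0.16]
v1, v2, v3 = dispersion_stats(n, t, t3, t4)

# Placeholder for V1 values from Monte Carlo regions (would come from Kappa simulations)
v1_sim = [0.018, 0.022, 0.025, 0.020, 0.027, 0.023]
print(f"V1={v1:.4f}  V2={v2:.4f}  V3={v3:.4f}  H1={h_statistic(v1, v1_sim):.2f}")
```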

2.7. Selection of Suitable Frequency Distribution Based on Goodness-of-Fit Tests and Reliability Measures

After the homogeneity analysis of the study area, a suitable probability distribution must be selected for regional frequency analysis (RFA) so that the quantile estimates for each location and the regional growth curve are robust. The candidate probability distributions for RFA include Generalized Logistic (GLO), Generalized Extreme Value (GEV), Generalized Pareto (GPA), Generalized Normal (GNO), Pearson Type III (PE3), and Wakeby (WAK) [45]. The goodness-of-fit test is employed to identify the parent probability distribution that closely aligns with the weighted average regional values of L-skewness and L-kurtosis for a proposed group of sites [68]. This test helps validate the appropriateness of a fitted distribution by comparing it with observed regional L-moment statistics. Visualization is often performed by plotting the observed regional values of L-skewness and L-kurtosis and assessing how closely the L-kurtosis of a candidate distribution matches the regional L-kurtosis. The test is based on the $Z^{DIST}$ statistic, which follows an approximately standard normal distribution [43]. The goodness-of-fit statistic is calculated using the following equation:
$$Z^{DIST} = \frac{\tau_4^{DIST} - \bar{\tau}_4 + \beta_4}{\sigma_4}$$
where “DIST” indicates the candidate distribution, $\tau_4^{DIST}$ is the L-kurtosis of the fitted candidate distribution, $\bar{\tau}_4$ is the regional average L-kurtosis of the observed data, and $\beta_4$ and $\sigma_4$ are the bias and standard deviation of the regional average sample L-kurtosis, both estimated by simulation [43].
A candidate distribution is considered a good fit if $|Z^{DIST}| \le 1.64$, which corresponds to a confidence level of 90%. If multiple distributions satisfied the goodness-of-fit criterion, the one with the lowest RMSE (Root Mean Square Error) and the narrowest 95% error bound was selected for further analysis [69].
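As a minimal illustration of the acceptance rule, the sketch below evaluates $Z^{DIST}$ for the GLO distribution, whose L-kurtosis is related to L-skewness by $\tau_4 = (1 + 5\tau_3^2)/6$. The function names and all numerical inputs (regional averages, bias, and standard deviation) are hypothetical; in practice the bias and standard deviation would come from the Monte Carlo simulations.

```python
def glo_tau4(tau3):
    """L-kurtosis of the Generalized Logistic distribution for a given L-skewness."""
    return (1.0 + 5.0 * tau3 ** 2) / 6.0


def z_dist(tau4_dist, t4_regional, bias4, sigma4):
    """Goodness-of-fit statistic for one candidate distribution."""
    return (tau4_dist - t4_regional + bias4) / sigma4


def is_acceptable(z, critical=1.64):
    """Acceptance rule |Z| <= 1.64 (90% confidence level)."""
    return abs(z) <= critical


# Hypothetical regional values, for illustration only
t3_regional, t4_regional = 0.20, 0.19
bias4, sigma4 = 0.004, 0.025          # would be estimated from simulated regions
z = z_dist(glo_tau4(t3_regional), t4_regional, bias4, sigma4)
print(f"Z_GLO = {z:.2f}, acceptable: {is_acceptable(z)}")
```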

2.8. Quantile Estimation

Quantile estimation is accomplished using the index flood approach, in which it is supposed that all sites in a homogeneous region have the same-shaped frequency distributions but differ by a site-specific scale factor—typically the mean annual flood (index flood) [70]. The at-site flood quantiles are estimated using the regional quantiles derived from the quantile function of the best-fitted regional distribution, expressed by the equation
$$Q_i(F) = l_1^{(i)}\, q(F)$$
where $Q_i(F)$ is the flood quantile at site $i$ for non-exceedance probability $F$ (corresponding to a return period of $T = 1/(1-F)$ years), $l_1^{(i)}$ is the mean of the annual peak flow (APF) at site $i$, and $q(F)$ is the regional dimensionless quantile obtained from the fitted regional growth curve [9]. The accuracy of estimated flood quantiles is assessed using Monte Carlo simulations for each homogeneous region [42].
For each homogeneous region, a synthetic region with the same number of stations, record length at each station, heterogeneity, and regional average L-moment ratios as the observed data was simulated. This procedure was repeated 1000 times to obtain 1000 simulated regions. For each simulation, errors in the simulated growth curve and quantiles were calculated, and then the bias, RMSE (Root Mean Square Error), and 95% error bounds were estimated [71].
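The index flood calculation can be illustrated with a GLO growth curve fitted by L-moments, using the standard Hosking parameterization ($k = -\tau_3$, $\alpha = \lambda_2 \sin(k\pi)/(k\pi)$, $\xi = \lambda_1 - \alpha\,[1/k - \pi/\sin(k\pi)]$) and a regional mean scaled to 1. This is a sketch under those assumptions; the function names, regional ratios, and site mean are hypothetical, and the Monte Carlo error bounds are not reproduced.

```python
import math


def glo_params(l1, t, t3):
    """GLO location, scale, and shape (xi, alpha, k) from l1, L-CV, and L-skewness."""
    k = -t3
    l2 = t * l1
    if abs(k) < 1e-9:                      # logistic limit when L-skewness is zero
        return l1, l2, 0.0
    alpha = l2 * math.sin(k * math.pi) / (k * math.pi)
    xi = l1 - alpha * (1.0 / k - math.pi / math.sin(k * math.pi))
    return xi, alpha, k


def glo_quantile(f, xi, alpha, k):
    """Quantile function x(F) of the GLO distribution."""
    y = math.log((1.0 - f) / f)
    if abs(k) < 1e-9:
        return xi - alpha * y
    return xi + alpha * (1.0 - math.exp(k * y)) / k   # exp(k*y) = ((1-F)/F)^k


def site_quantile(mean_peak, return_period, t_regional, t3_regional):
    """Q_i(F) = l1_i * q(F), with the growth curve fitted to a regional mean of 1."""
    f = 1.0 - 1.0 / return_period
    xi, alpha, k = glo_params(1.0, t_regional, t3_regional)
    return mean_peak * glo_quantile(f, xi, alpha, k)


# Hypothetical regional L-moment ratios and at-site mean annual peak (m^3/s)
for T in (10, 50, 100):
    q = site_quantile(mean_peak=1500.0, return_period=T, t_regional=0.30, t3_regional=0.20)
    print(f"T = {T:>3} years: Q = {q:.0f} m^3/s")
```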

2.9. Development of Multiple Non-Linear Regression Models for Predicting Peak Discharge for Different Return Periods Based on Climatic and Watershed Characteristics

To establish a predictive relationship between watershed characteristics and hydrological response parameters (e.g., peak discharge), Multiple Non-Linear Regression (MNLR) analysis was used in this research. Unlike ordinary linear regression, which is based on the assumption of linear relationships among variables, MNLR has the advantage of more flexible functional forms, e.g., power-law or exponential ones that are more consistent with the complexity of hydrologic systems [72].
The general form of the MNLR model used is
$$Y = a \, X_1^{b_1} X_2^{b_2} \cdots X_n^{b_n}$$
where $Y$ is the dependent variable (e.g., 100-year flood discharge, $Q_{100}$), $X_1, X_2, \ldots, X_n$ are the independent variables (e.g., catchment area, Mean Annual Precipitation, stream order, etc.), $a$ is the regression constant, and $b_1, b_2, \ldots, b_n$ are the regression exponents for each predictor. To estimate the coefficients, the model was linearized using a natural logarithmic transformation:
$$\ln Y = \ln a + b_1 \ln X_1 + b_2 \ln X_2 + \cdots + b_n \ln X_n$$
Standard Ordinary Least Squares (OLS) regression was then applied to the transformed equation [73]. The model’s predictive performance and goodness-of-fit were evaluated using statistical indicators, including the Coefficient of Determination ($R^2$), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) [74]. This approach follows established regional flood frequency analysis and hydrologic modeling methods, where physiographic, climatic, and morphometric watershed variables are commonly used to estimate peak discharge [75,76].
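The log-linearized fit and the three performance indicators can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the example catchments and discharges are invented, and the metrics are computed in the original (untransformed) units, which the text does not specify.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_power_law(X, y):
    """Fit Y = a * prod(X_j ** b_j) by OLS on the ln-transformed data."""
    rows = [[1.0] + [math.log(v) for v in xs] for xs in X]
    ly = [math.log(v) for v in y]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, ly)) for i in range(p)]
    beta = solve_linear(XtX, Xty)  # normal equations
    return math.exp(beta[0]), beta[1:]


def predict(a, b, xs):
    out = a
    for v, e in zip(xs, b):
        out *= v ** e
    return out


def metrics(obs, pred):
    """Return R^2, RMSE, and MAPE (%) in the original units."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    mape = 100.0 / n * sum(abs((o - p) / o) for o, p in zip(obs, pred))
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n), mape


# Hypothetical catchments: (area in km^2, mean annual precipitation in mm).
X = [(350.0, 2300.0), (730.0, 2500.0), (1140.0, 2200.0),
     (2200.0, 4400.0), (530.0, 5000.0), (790.0, 2100.0)]
q100 = [420.0, 800.0, 950.0, 3900.0, 1500.0, 600.0]  # hypothetical Q100 (m^3/s)
a, b = fit_power_law(X, q100)
print(a, b, metrics(q100, [predict(a, b, xs) for xs in X]))
```

All predictors and the response must be strictly positive for the logarithmic transformation to be defined.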

3. Results

3.1. Statistical Tests for Assessing Stationarity and Independence

The Mann–Kendall test and Standard Normal Homogeneity Test (SNHT) were performed on annual peak discharge values of 26 streamflow gauging stations to detect monotonic trends, assess homogeneity, and ascertain data stationarity. If there is a trend, the data are non-stationary and unsuitable for frequency analysis without correction. From the test, it was found that only 5 out of 26 gauge stations exhibited a statistically significant trend at the 95% confidence level (for the Mann–Kendall test, a trend is considered significant if the test statistic Z lies outside the range [−1.96, 1.96] and p-value < 0.05), suggesting non-stationarity for annual peak discharge over time. The remaining 21 stations showed no significant trend (p > 0.05), confirming the stationarity of the dataset and suggesting that frequency analysis can be applied directly. Six stations showed non-homogeneous behavior according to the SNHT (for the SNHT, inhomogeneity is detected if the test statistic exceeds 6.95 and p < 0.05). As most stations exhibit no significant trend, it can be assumed that there is no consistent monotonic trend in the streamflow data at the regional level and that the data can be treated as a stationary series. The lack of substantial trends suggests that annual peak discharge variations at most stations are likely due to short-term variability rather than long-term climatic or anthropogenic changes. The test results are presented in Table 3; marked values indicate statistically significant trends.
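For reference, the Mann–Kendall statistic and its two-sided p-value can be computed as in the following sketch. It uses the usual normal approximation with a tie correction; the function name and the example series are hypothetical, and the software actually used in the study is not stated in the text.

```python
import math


def mann_kendall(series):
    """Return (S, Z, p) for the two-sided Mann-Kendall trend test."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    # Variance of S, corrected for groups of tied values.
    counts = {}
    for v in series:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return s, z, p


# Hypothetical annual peak discharges (m^3/s).
peaks = [1210.0, 980.0, 1430.0, 1105.0, 1320.0, 1250.0, 990.0, 1510.0, 1180.0, 1275.0]
s, z, p = mann_kendall(peaks)
print(s, round(z, 3), round(p, 3), "trend" if abs(z) > 1.96 else "no trend")
```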
Autocorrelation function (ACF) and partial autocorrelation function (PACF) analyses were performed to determine how each year’s annual maximum discharge relates to those of preceding years. Lags of 1 to 10 were examined; in Figures 4 and 5, the x-axis represents the lag and the y-axis the autocorrelation or partial autocorrelation coefficient, and the dashed horizontal lines mark the 95% confidence limits beyond which a coefficient is statistically significant. In the ACF analysis (Figure 4), the coefficients for most stations do not cross the critical bounds, indicating no significant autocorrelation in the discharge data at those lags. However, stations SW158.1, SW192, and SW135 showed prominent spikes from lag 1 to approximately lag 5 that cross the critical limits. This suggests significant short-term autocorrelation at those stations, implying that past discharge values have a measurable influence on immediately following values; the insignificance of the later lags indicates that this autocorrelation diminishes over time. In the PACF analysis (Figure 5), the majority of stations likewise remained within the critical bounds across most lags. Certain stations, namely SW135, SW157, SW192, and SW158.1, exhibited notable spikes at lags 1 and 2 that cross the critical limits, suggesting a significant direct correlation between consecutive years’ discharge values. Most stations showed minimal partial autocorrelation beyond lag 2, with coefficients generally remaining well within the confidence bounds for higher-order lags.
This indicates that while some stations may exhibit direct year-to-year dependence, there is no significant direct relationship between discharge values separated by more than one year when intermediate effects are controlled for. Taken together, the ACF and PACF results show that any existing autocorrelation diminishes at later lags. Most autocorrelation and partial autocorrelation values were within the critical boundaries, suggesting that annual flood values are largely independent from year to year at the majority of stations. This pattern supports the independence assumption underlying flood frequency analysis and justifies applying standard statistical methods for extreme value analysis to the annual maximum discharge data. In addition, Moran’s I analysis indicated that cross-correlation between stations was not statistically significant at the 5% level, implying that the data series are spatially independent.
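The lag-wise check against the 95% limits can be illustrated with a short sketch of the sample ACF. It assumes the common large-sample limits of ±1.96/√n, which may differ from the exact bounds drawn in Figures 4 and 5; the function names and the example series are hypothetical.

```python
import math


def acf(series, max_lag=10):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def significant_lags(series, max_lag=10):
    """Lags whose coefficient falls outside the approximate 95% limits."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag), start=1) if abs(r) > bound]


# Hypothetical annual peak discharges (m^3/s); max_lag must be smaller than the record length.
peaks = [1210.0, 980.0, 1430.0, 1105.0, 1320.0, 1250.0, 990.0, 1510.0,
         1180.0, 1275.0, 1390.0, 1040.0, 1225.0, 1460.0, 1120.0]
print(significant_lags(peaks, max_lag=10))
```

An empty list means that no lag exceeds the limits, which is the behavior reported for most stations.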

3.2. Initial Grouping by Cluster Analysis

Ward’s hierarchical clustering method was applied to identify homogeneous zones for regional flood frequency analysis (RFFA) using physiographic and climatic parameters, such as latitude, longitude, elevation, catchment area, and Mean Annual Precipitation. A significant drawback of using flood-based statistics to define regions is that such regions may appear statistically homogeneous but may not be hydrologically meaningful or practical for RFFA [77]. Also, regions should not be determined solely based on physiographic characteristics, as these may overlook critical variations in hydrologic response [78]. It is reasonable to include attributes estimated from site measurements, and they should not be strongly correlated with flood values to avoid circular reasoning [42]. Instead, regionalization should use a combination of physical and climatic attributes that are measurable, stable, and not strongly correlated with flood magnitudes—such as catchment area, elevation, Mean Annual Precipitation, and flood seasonality indicators. This approach ensures that the regions are hydrologically meaningful and statistically sound, providing a robust foundation for applying the index flood method and estimating flood quantiles, particularly in data-scarce or ungauged basins.
Several validation techniques were used to identify the optimal number of clusters. The gap statistic and silhouette width suggested five clusters, while the elbow method suggested four as the most suitable number. Overall, the analysis supports the delineation of four to five homogeneous regions. While initial groupings based on these features are informative, the final classification of any region should be confirmed through statistical tests of discordancy and heterogeneity. The results of Ward’s clustering method, showing the optimal number of clusters, are illustrated in the dendrogram shown in Figure 6.
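Ward's criterion merges, at each step, the two clusters whose union gives the smallest increase in the within-cluster sum of squares. The sketch below is a naive standard-library illustration of that rule applied to standardized site attributes; it is not the software used in the study, and the function names and attribute values are hypothetical.

```python
def standardize(rows):
    """Scale each attribute column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / (len(c) - 1)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance merge cost."""
    clusters = [([i], list(p)) for i, p in enumerate(points)]  # (members, centroid)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a] = (ma + mb, centroid)
        del clusters[b]
    return [sorted(m) for m, _ in clusters]


# Hypothetical site attributes: (latitude, longitude, elevation m, area km^2, MAP mm).
sites = [
    (24.10, 91.30, 10.0, 400.0, 2400.0),
    (24.20, 91.40, 12.0, 450.0, 2500.0),
    (25.10, 92.10, 50.0, 2100.0, 4300.0),
    (25.00, 92.00, 45.0, 1900.0, 4100.0),
    (24.60, 91.70, 28.0, 900.0, 3000.0),
]
print(ward_clusters(standardize(sites), n_clusters=3))
```

Standardizing first prevents attributes with large numerical ranges, such as catchment area, from dominating the distance calculation.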
For the first group, the critical discordancy value is 1.92 (Dcritical = 1.92), and the discordancy values of all included stations fall below it. SW337 has the lowest discordancy at 0.17, whereas SW269 is closest to the critical value with a discordancy of 1.86, so no station in this group is discordant. Likewise, in the second group (Dcritical = 2.14), all stations have discordancy values below the threshold, with SW251 having the lowest discordancy at 0.34 and SW233A approaching the critical value at 1.81. For group 5, with a Dcritical value of 1.65, the discordancy values range from a minimum of 0.67 (SW280) to a maximum of 1.36 (SW264A). For the smaller groups 3 and 4, the discordancy measure is 1.00 for every station, well below the critical value of 3.00, indicating high agreement within each group. The discordancy and heterogeneity measures are listed in Table 4.
In the first group, heterogeneity values range from 8.79 (H1) to 4.48 (H3), indicating significant differences in streamflow characteristics among the stations. Thus, although no station in this group is discordant, the group as a whole shows moderate to high variation in streamflow characteristics. These sites span latitudes from 24.0834° to 25.1593°, longitudes from 91.2508° to 91.7538°, and elevations from 8 m (SW135A) to 31 m (SW131.5). Catchment areas range between 350.72 km² (SW332) and 530 km² (SW333A), annual precipitation ranges from 2322.55 mm (SW264A) to 4996.03 mm (SW341), and temperatures range between 18.06 °C (SW333) and 24.50 °C (SW135A).
In contrast, the second group exhibits comparatively lower heterogeneity values, ranging from 4.53 (H1) to 2.58 (H3), pointing towards moderate variation. The stations within this group are more similar to each other than those in the first group. These sites lie between 24.0834° and 25.1624° latitude and between 91.3506° and 92.1729° longitude, with elevations from 21 m to 54 m, basin areas ranging from 726.41 km² (SW326) to 2206.57 km² (SW265), and annual rainfall ranging between 2521.37 mm (SW233A) and 4388.25 mm (SW333). Temperatures vary from 18.06 °C (SW333) to 24.50 °C (SW135A). The third group, composed of SW173 (24.8911° N, 92.1915° E) and SW175.5 (24.6303° N, 91.6813° E), with elevations of 59 m and 28 m, respectively, shows moderate heterogeneity with H2 = 4.30 and H1 = 2.06. This group’s catchments vary widely: 25,337.01 km² for SW173 versus 33,855.2 km² for SW175.5, with precipitation around 2253.36 mm (SW173) and 2360.13 mm (SW175.5) and temperatures of 20.89 °C and 21.71 °C, respectively. The fourth group exhibits high heterogeneity, with H-values as high as H3 = 8.78 and H1 = 6.63. These sites are spread across latitudes of 24.2930° to 24.5902° N and longitudes of 91.5465° to 92.1177° E, with basin areas of 166.96 km² (SW138), 789.53 km² (SW135), and 2292.71 km² (SW201). Precipitation ranges from 2041.92 mm (SW201) to 2261.90 mm (SW138), while temperatures span 23.84 °C to 24.50 °C, indicating notable hydrological and climatic divergence. Lastly, the fifth group is distinguished by its extremely low heterogeneity, with values ranging from H1 = 1.10 to H2 = 0.08, signaling a highly homogeneous hydrological response among the stations. These sites are closely clustered between 24.0838° and 25.1301° N and between 91.3506° and 91.8489° E, with elevations from 21 m (SW264A) to 25 m (SW157). Catchment areas are generally modest, between 61.29 km² (SW192) and 1142.76 km² (SW157).
Rainfall ranges from 2199.38 mm (SW157) to 4474.48 mm (SW341), and temperature values are consistently between 24.12 °C and 24.55 °C.
In general, although most groups are homogeneous in terms of discordancy (Di < Dcritical), the heterogeneity measures (assessed against the threshold H = 2) indicate a range of variability within the groups. Some groups show high heterogeneity (groups 1 and 4), whereas group 5 shows minimal variation.
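The discordancy measure compared against Dcritical above can be sketched as follows, using the usual definition D_i = (N/3)(u_i − ū)ᵀ A⁻¹ (u_i − ū), where u_i holds the L-CV, L-skewness, and L-kurtosis of site i and A is the sum of squares and cross-products matrix of the deviations. The function names and the seven sets of L-moment ratios are hypothetical, and the critical value is passed in by the user rather than looked up.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-15:
            raise ValueError("singular matrix; too few or degenerate sites")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def discordancy(ratios):
    """Discordancy D_i for each site from its (L-CV, L-skewness, L-kurtosis)."""
    n = len(ratios)
    mean = [sum(u[j] for u in ratios) / n for j in range(3)]
    dev = [[u[j] - mean[j] for j in range(3)] for u in ratios]
    A = [[sum(d[i] * d[j] for d in dev) for j in range(3)] for i in range(3)]
    result = []
    for d in dev:
        z = solve_linear(A, d)  # z = A^{-1} d without forming the inverse
        result.append(n / 3.0 * sum(di * zi for di, zi in zip(d, z)))
    return result


def discordant_sites(ratios, d_critical):
    return [i for i, d in enumerate(discordancy(ratios)) if d > d_critical]


# Hypothetical L-moment ratios for a seven-site group.
group = [(0.20, 0.15, 0.18), (0.22, 0.18, 0.20), (0.19, 0.12, 0.16),
         (0.25, 0.22, 0.21), (0.21, 0.16, 0.19), (0.23, 0.20, 0.17),
         (0.40, 0.05, 0.35)]
print([round(d, 2) for d in discordancy(group)])
print(discordant_sites(group, d_critical=1.92))
```

A useful sanity check is that the D_i values of a group always sum to the number of sites N, and no single value can exceed (N − 1)/3.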

3.3. Formation of Homogeneous Regions by Discordancy and Heterogeneity Measures

As shown in Table 4, although the discordancy levels were mostly acceptable, the high heterogeneity of most initial clusters required an intensive refinement process to obtain more homogeneous regional groupings. This process involved a series of iterations in which the stations were regrouped based on heterogeneity checks and discordancy assessments. In each iteration, the groups were rearranged carefully, and the discordancy and heterogeneity values were recalculated to ensure that all regions met the specified standards (i.e., Di < Dcritical and H < 2), as shown in Table 5. The final grouping resulted in four well-defined regions exhibiting acceptable homogeneity.
As shown in Table 5, region 1 consists of four stations: SW175.5, SW266, SW267, and SW269. These stations are geographically confined to a small area and exhibit similar hydrological characteristics. Region 2 comprises seven stations: SW131.5, SW138, SW158.1, SW192, SW332, SW233A, and SW333A. Although these stations are reasonably dispersed across the study area, they share similar runoff characteristics. Region 3 consists of eight stations: SW135, SW173, SW233, SW251, SW265, SW326, SW337, and SW333. This region covers a diverse area, including both upstream and downstream basins, which results in substantial variation in elevation and basin area. Region 4 comprises six stations: SW67, SW157, SW280, SW341, SW135A, and SW264A. Lastly, SW201 could not be assigned to any region and was therefore omitted, because including it in any region caused the discordancy (Di) and heterogeneity (H) values to exceed their acceptable thresholds, violating the grouping criteria (i.e., Di < Dcritical and H < 2). The final arrangement outlined in Table 5 is the best categorization obtained: it contains no discordant stations except SW333 in region 3, which slightly exceeded the critical discordancy value (Di = 2.26 > Dcritical = 2.14) but passed the heterogeneity test. The locations of all four homogeneous regions are shown in Figure 7.

3.4. Goodness-of-Fit Test and Selection of the Best Parent Distribution, Along with the Derivation of the Growth Curve for Each Homogeneous Region

We created L-moment diagrams to assess the probability distributions in the study area. Figure 8 presents a comparison between the observed data and theoretical distribution patterns. However, finding a suitable probability distribution to fit most of the observed regional data is challenging.
The variation in hydrological conditions among sites results in wide-ranging L-moment characteristic values. A single distribution therefore does not always yield the best possible fit for every site. Whereas some datasets are fitted reasonably well by the Generalized Extreme Value (GEV) or Pearson Type III distributions, others show marked disagreement. This suggests that a flexible or site-specific approach to distribution fitting may be necessary to represent the underlying hydrological processes in the region accurately.
To identify suitable parent distributions for each homogeneous region, the goodness-of-fit statistic $Z^{DIST}$ was determined for several candidate distributions: Generalized Logistic (GLO), Generalized Extreme Value (GEV), Generalized Normal (GNO), Generalized Pareto (GPA), and Pearson Type III (PE3). Any distribution whose calculated value did not exceed the threshold ($|Z^{DIST}| \le 1.64$) was considered a potential candidate, as denoted in Table 6. In cases where multiple distributions fell within the acceptable range for a region, a simulation-based approach was adopted to select the most robust distribution [42]. Table 7 shows the regional L-moment parameter estimates for the suitable probability distributions.
For regions 1, 2, and 4, more than one distribution satisfied the critical value criterion. To narrow down the selection and determine the best distribution, we compared the Root Mean Square Error (RMSE) values and 95% confidence intervals of the regional growth curves for each qualifying distribution. For example, in region 1, the GLO, GEV, GNO, and PE3 distributions all passed the critical value test. Among these distributions, GLO showed the narrowest 95% confidence intervals and the lowest RMSE, particularly for the 50- and 100-year return periods, indicating higher accuracy and less uncertainty in extreme value estimation. Therefore, GLO was selected as the most appropriate distribution for region 1. The same procedure was followed for regions 2, 3, and 4.
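The two-stage selection rule described above (screen by the goodness-of-fit statistic, then rank the survivors by simulated accuracy) can be expressed as a small sketch. The dictionary layout, the function name, and all numerical values are hypothetical; they are not the entries of Tables 6 to 8.

```python
def select_distribution(candidates, z_crit=1.64):
    """Pick the distribution with |Z| <= z_crit, lowest RMSE, then narrowest bounds."""
    passing = {name: c for name, c in candidates.items() if abs(c["z"]) <= z_crit}
    if not passing:
        return None  # no candidate satisfies the goodness-of-fit criterion
    return min(passing, key=lambda name: (passing[name]["rmse"], passing[name]["width"]))


# Hypothetical statistics for one region: Z value, growth-curve RMSE,
# and width of the 95% error bounds at a chosen return period.
region = {
    "GLO": {"z": 0.41, "rmse": 0.032, "width": 0.09},
    "GEV": {"z": -1.12, "rmse": 0.047, "width": 0.13},
    "GNO": {"z": -1.35, "rmse": 0.051, "width": 0.14},
    "GPA": {"z": -3.20, "rmse": 0.020, "width": 0.05},
    "PE3": {"z": -1.60, "rmse": 0.058, "width": 0.16},
}
print(select_distribution(region))
```

In this sketch RMSE is the primary ranking key and the bound width only breaks ties; other weightings of the two criteria are equally plausible readings of the procedure.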
Although more than one distribution passed the initial critical value threshold, GLO always had the lowest RMSE or the smallest and most consistent 95% error bounds for all significant return periods. For this reason, GLO was selected as the best fit for all four regions. Moreover, regional growth curves with 95% error bounds for GLO distribution are given in Figure 9. As presented in Table 8, quantile estimates across 50-, 100-, 200-, and 1000-year return periods (f = 0.98, 0.99, 0.995, 0.999) reveal significant regional variations in precision and uncertainty. Region 1 demonstrates exceptional stability, with consistently low RMSEs (0.0243 for 50-year, 0.0323 for 100-year, 0.0424 for 200-year, and 0.0740 for 1000-year) and narrow confidence bounds (1.1337–1.2077 for 50-year, 1.1507–1.2426 for 100-year, 1.1653–1.2774 for 200-year, and 1.1884–1.3575 for 1000-year), indicating highly reliable predictions across all return periods. Region 2 shows dramatically increasing uncertainty, with RMSEs surging from 0.4621 (50-year) to 0.7097 (100-year), 1.0853 (200-year), and 2.7769 (1000-year), accompanied by rapidly expanding bounds (2.6369–3.9559 to 3.0998–4.8887 to 3.6054–5.9676 to 4.9954–9.1964), highlighting difficulties in extreme value estimation.
Region 3 maintains moderate precision throughout, with RMSEs ranging from 0.1204 to 0.2068 and bounds (1.2560–1.5997 to 1.2679–1.6481 to 1.2759–1.6885 to 1.2813–1.7572) that suggest reasonable reliability even for rare events. Region 4 exhibits concerning variability, particularly for more extended return periods, with RMSEs climbing from 0.3499 (50-year) to 0.5461 (100-year), 0.8468 (200-year), and 2.1913 (1000-year) and bounds widening significantly (2.5513–3.5651 to 2.9598–4.3449 to 3.4013–5.2387 to 4.5394–7.8847). These patterns underscore critical regional differences in extreme value behavior, with regions 1 and 3 offering relatively stable estimates suitable for precise risk assessment. In contrast, region 2 and region 4 require cautious interpretation and potentially more conservative approaches in engineering and planning applications due to their pronounced uncertainty, especially for 200-year and 1000-year events. Regional flood quantile estimates with associated uncertainty bands across different return periods are plotted in Figure 10.
The log-log return period plot in Figure 10 shows that the regional q(F) curves both intersect and diverge at different return periods, highlighting important hydrological differences among the regions. At lower return periods (around 2 to 5 years), region 3 initially exhibits higher quantile values compared to region 2 and region 4, indicating a greater frequency of moderate events; however, as the return period increases, the curves for regions 2 and 4 rise more steeply and overtake region 3, suggesting a higher magnitude of extreme events in those regions. This crossing pattern implies that the severity of events relative to other regions shifts depending on the frequency of occurrence. Beyond approximately 10 to 20 years, the curves begin to diverge significantly—regions 2 and 4 continue to rise rapidly, while regions 1 and 3 remain relatively flat, indicating lower susceptibility to rare, extreme events. This divergence at higher return periods underscores the need for region-specific planning, as some areas face far greater risks of severe hydrological extremes than others.

3.5. Regional Flood Frequency Relationship for Ungauged Catchments

To estimate the T-year return period flood at a site, the mean annual peak flow must first be determined. However, for ungauged catchments this site-specific mean cannot be calculated because no observed flow data are available. In such cases, it becomes essential to develop a relationship between the peak flows of gauged catchments in the region and their corresponding physiographic and climatic characteristics. This regional relationship can then be used to estimate peak flows at ungauged sites for different return periods.
At-site annual maximum discharge values for various return periods were derived by multiplying the regional growth curve values by the mean annual peak discharge specific to each site. Table 9 provides the estimated maximum flood discharges corresponding to each gauging station, reflecting the expected discharge levels for different return intervals. Key geomorphological parameters were extracted using Geographic Information System (GIS) software (ArcGIS Desktop 10.8) and presented in Table 10. These include Total Stream Length and Number of Streams, representing the cumulative length and count of stream segments within the watershed. Additional attributes such as Perimeter, Main Channel Length, Maximum Basin Length, and Maximum Stream Order describe the watershed boundary, principal flow path, longest dimension, and drainage network complexity—factors that significantly influence hydrological response and flood behavior.
Equations for the prediction of peak flood discharge for various return periods were developed using the Multiple Non-Linear Regression (MNLR) model, based on watershed and climatic variables, as shown in Table 11. These equations incorporate key parameters such as area (km²) (A), elevation (m) (E), Total Stream Length (km) (TSL), Number of Streams (NS), Perimeter (km) (P), Main Channel Length (km) (MCL), Maximum Basin Length (km) (MBL), Maximum Stream Order (MSO), Mean Annual Precipitation (mm) (MAP), and Mean Annual Temperature (°C) (MAT) for estimating peak discharge.
The performance of the models, as indicated by the Coefficient of Determination (R²), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE), demonstrates strong predictive accuracy, as illustrated in Figure 11, particularly for lower return periods. Predictive reliability decreases with increasing return periods, reflecting growing uncertainty in extreme flood estimation.
It should be noted that the equations are only applicable to the four homogeneous regions created earlier and are intended for use in both gauged and ungauged catchments within those regions. As the return period increases, predictive performance weakens, evidenced by the decline in R² values and the increase in both RMSE and MAPE (Table 11). The pattern shows the growing uncertainty and complexity in quantifying long-return-period extreme flood events. Lower values of R² and larger errors at longer return periods highlight model limitations in reliability for extreme floods and imply large variance. Hence, these uncertainties and associated errors must be cautiously quantified while utilizing the models, especially for long-return-period events.
Figure 12 illustrates the average impact of various physiographic and climatic variables on peak discharge, based on the absolute values of exponents in Multiple Non-Linear Regression equations. Among the variables considered, Mean Annual Precipitation and Mean Annual Temperature exhibit the largest effect values, 2.4 and 2.0, respectively, indicating their dominant control on peak discharge. Elevation (1.6), Number of Streams (1.1), and Perimeter (0.72) also reflect comparatively high control. Geometric parameters such as area (0.1), Maximum Stream Order (0.33), and Main Channel Length (0.36) exhibit lower values of influence, reflecting comparatively less impact on peak flow generation. The trend reveals the significant contribution of climatic and topographic factors over the simple morphometric parameters in the formation of flood peaks.

4. Discussion

This study presents a comprehensive framework for regional flood frequency analysis (RFFA) using L-moments in northeastern Bangladesh, addressing critical gaps in existing methodologies by integrating rigorous data screening via the Mann–Kendall test, the Standard Normal Homogeneity Test, autocorrelation, partial autocorrelation, and Moran’s I test to ensure stationarity and spatial independence—an essential step often overlooked [20,21,23,24,25]. This contrasts with earlier assumptions that flood flows strictly follow specific probability distributions, and the current study highlights the importance of flexible, region-specific approaches in flood frequency analysis. In many cases, the scale and shape parameters of the distributions were estimated using the method of moments. A key finding from several studies conducted in Bangladesh is the superiority of the Generalized Logistic (GLO) distribution over traditionally used distributions such as Log-Normal (both two-parameter (LN2) and three-parameter (LN3)), Extreme Value Type-I (EV1), and Log Pearson Type III (LP3) distributions [21]; the Gumbel and Powell method [23]; and Gumbel, Powell, and Ven Te Chow and Stochastic methods [24]. The GLO distribution demonstrated narrower confidence intervals and lower RMSE, providing more reliable quantile estimates across all return periods. The study enhances the initial regional grouping by applying Ward’s hierarchical clustering algorithm, with cluster validity evaluated using silhouette width analysis, gap statistics, and the elbow method. The study strengthens the analysis by incorporating both discordancy measures (Di statistic) and heterogeneity measures (H-statistics), which were not employed in the previous approach in Bangladesh [20]. 
For ungauged basins, Multiple Non-Linear Regression (MNLR) models incorporating geomorphologic and climatic predictors, previously unexplored in the context of Bangladesh, achieved strong performance with R² = 0.61–0.87 for 2- to 100-year floods. Although uncertainty increased for extreme events (MAPE = 41–74%, RMSE up to 2726 m³/s for 1000-year floods), the results are consistent with findings from similar studies conducted globally [12]. In contrast, some previous studies outside Bangladesh [16] used only a single predictive variable (area), whereas this study employs 10 geomorphologic and climatic variables to improve prediction accuracy.

5. Conclusions

The present study successfully identified four hydrologically homogeneous regions within the Sylhet Division of Bangladesh using a robust framework integrating L-moments, hierarchical clustering, and physiographic–statistical analysis. This approach addresses critical gaps in regional flood frequency analysis (RFFA) for data-scarce, monsoon-dominated regions, offering scalable solutions for flood risk management. The GLO distribution emerged as the most suitable model across all regions, demonstrating narrow confidence intervals and low Root Mean Square Error (RMSE) values. This contrasts with earlier studies in Bangladesh that favored Log Pearson Type III or Gumbel distributions, highlighting the importance of region-specific methodologies. The superiority of GLO may stem from its flexibility in capturing the tail behavior of flood data in monsoonal climates, where extreme rainfall and rapid runoff responses dominate. The L-moments approach, being less sensitive to outliers than conventional moments, likely enhanced the robustness of distribution selection, particularly in regions with limited data. Multiple Non-Linear Regression models could predict peak discharges in ungauged catchments from geomorphological and climatic predictors. Still, predictive uncertainty increased for extreme events. Future work should incorporate climate change projections to assess non-stationarity, as rising temperatures and intensifying monsoons could alter flood regimes. Expanding the framework to include socio-economic vulnerability metrics could enhance its utility for holistic flood risk management. Overall, this study establishes a replicable L-moments and clustering framework for flood frequency analysis tailored to the complex hydro-climatic setting of northeastern Bangladesh. 
By identifying region-specific flood behaviors and providing tools for ungauged basins, the findings directly support climate-resilient infrastructure planning and agricultural protection in one of Bangladesh’s most flood-prone regions, and the integration of advanced statistical validation and geospatial techniques sets a precedent for similar monsoonal basins globally, where balancing data limitations with hydrological complexity remains a critical challenge.

Author Contributions

Conceptualization, S.D. (Sujoy Dey), S.M.T.Z., and S.D. (Saptaporna Dey); methodology, S.D. (Saptaporna Dey), S.M.T.Z., and S.D. (Sujoy Dey); software, S.D. (Sujoy Dey), S.M.T.Z., and S.D. (Saptaporna Dey); data curation, S.M.T.Z.; formal analysis, S.D. (Saptaporna Dey); validation, S.M.T.Z. and S.D. (Saptaporna Dey); resources, A.K.M.S.I.; investigation, S.D. (Saptaporna Dey); writing—original draft preparation, S.D. (Saptaporna Dey) and S.M.T.Z.; writing—review and editing, S.D. (Sujoy Dey), K.M.A.R., and A.K.M.S.I.; supervision, A.K.M.S.I. and K.M.A.R.; data collection and feedback, K.M.A.R. and A.K.M.S.I.; visualization, S.M.T.Z. and S.D. (Saptaporna Dey). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data have been collected from the governmental organization Bangladesh Water Development Board, which does not allow data redistribution but is available on purchase.

Acknowledgments

We express our sincere gratitude to Sarder Udoy Raihan, Executive Engineer (Civil) at the Flood Forecasting and Warning Centre (FFWC), Bangladesh Water Development Board (BWDB), Dhaka, for his valuable support in assisting in the data collection.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Meddi, M.; Toumi, S.; Assani, A.A. Application of the L-Moments Approach to the Analysis of Regional Flood Frequency in Northern Algeria. Int. J. Hydrol. Sci. Technol. 2017, 7, 77.
  2. Tehrany, M.S.; Lee, M.-J.; Pradhan, B.; Jebur, M.N.; Lee, S. Flood Susceptibility Mapping Using Integrated Bivariate and Multivariate Statistical Models. Environ. Earth Sci. 2014, 72, 4001–4015.
  3. Fatahi Nafchi, R.; Yaghoobi, P.; Reaisi Vanani, H.; Ostad-Ali-Askari, K.; Nouri, J.; Maghsoudlou, B. Eco-Hydrologic Stability Zonation of Dams and Power Plants Using the Combined Models of SMCE and CEQUALW2. Appl. Water Sci. 2021, 11, 109.
  4. Pathan, A.I.; Agnihotri, P.G. Application of New HEC-RAS Version 5 for 1D Hydrodynamic Flood Modeling with Special Reference through Geospatial Techniques: A Case of River Purna at Navsari, Gujarat, India. Model. Earth Syst. Environ. 2021, 7, 1133–1144.
  5. Anılan, T.; Marangoz, H.O.; Wara, M.G. L-Moments Based Regional Frequency Analysis on 1D Flood Analysis by Solving Regular Energy Equations in the Urban Areas. Arab. J. Geosci. 2025, 18, 83.
  6. Leščešen, I.; Urošev, M.; Dolinaj, D.; Pantelić, M.; Telbisz, T.; Varga, G.; Savić, S.; Milošević, D. Regional Flood Frequency Analysis Based on L-Moment Approach (Case Study Tisza River Basin). Water Resour. 2019, 46, 853–860.
  7. Eslamian, S.; Parvizi, S.; Ostad-Ali-Askari, K.; Talebmorad, H. Water. In Encyclopedia of Engineering Geology; Bobrowsky, P., Marker, B., Eds.; Encyclopedia of Earth Sciences Series; Springer International Publishing: Cham, Switzerland, 2018; pp. 1–5. ISBN 978-3-319-12127-7.
  8. Peker, İ.B.; Gülbaz, S.; Demir, V.; Orhan, O.; Beden, N. Integration of HEC-RAS and HEC-HMS with GIS in Flood Modeling and Flood Hazard Mapping. Sustainability 2024, 16, 1226.
  9. Khan, M.S.R.; Hussain, Z.; Ahmad, I. Regional Flood Frequency Analysis, Using Lmoments, Artificial Neural Networks and OLS Regression, of Various Sites of Khyberpakhtunkhwa, Pakistan. Appl. Ecol. Environ. Res. 2021, 19, 471–489.
  10. Hussain, Z.; Pasha, G.R. Regional Flood Frequency Analysis of the Seven Sites of Punjab, Pakistan, Using L-Moments. Water Resour. Manag. 2009, 23, 1917–1933.
  11. Jaiswal, R.; Nayak, T.R.; Lohani, A.K.; Galkate, R.V. Regional Flood Frequency Modeling for a Large Basin in India. Nat. Hazards 2021, 111, 1845–1861.
  12. Requena, A.I.; Ouarda, T.B.M.J.; Chebana, F. Flood Frequency Analysis at Ungauged Sites Based on Regionally Estimated Streamflows. J. Hydrometeorol. 2017, 18, 2521–2539.
  13. Hailegeorgis, T.T.; Alfredsen, K. Regional Flood Frequency Analysis and Prediction in Ungauged Basins Including Estimation of Major Uncertainties for Mid-Norway. J. Hydrol. Reg. Stud. 2017, 9, 104–126.
  14. Mesbahzadeh, T.; Soleimani Sardoo, F.; Kouhestani, S. Flood Frequency Analysis for the Iranian Interior Deserts Using the Method of L-moments: A Case Study in the Loot River Basin. Nat. Resour. Model. 2019, 32, e12208.
  15. Alam, J.; Muzzammil, M.; Khan, M.K. Regional Flood Frequency Analysis: Comparison of L-Moment and Conventional Approaches for an Indian Catchment. ISH J. Hydraul. Eng. 2016, 22, 247–253.
  16. Lee, D.-H.; Kim, N.W. Regional Flood Frequency Analysis for a Poorly Gauged Basin Using the Simulated Flood Data and L-Moment Method. Water 2019, 11, 1717.
  17. Yang, T.; Xu, C.-Y.; Shao, Q.-X.; Chen, X. Regional Flood Frequency and Spatial Patterns Analysis in the Pearl River Delta Region Using L-Moments Approach. Stoch. Environ. Res. Risk Assess. 2010, 24, 165–182.
  18. Aydoğan, D.; Kankal, M.; Önsoy, H. Regional Flood Frequency Analysis for Çoruh Basin of Turkey with L-moments Approach. J. Flood Risk Manag. 2016, 9, 69–86.
  19. Greenwood, J.A.; Landwehr, J.M.; Matalas, N.C.; Wallis, J.R. Probability Weighted Moments: Definition and Relation to Parameters of Several Distributions Expressable in Inverse Form. Water Resour. Res. 1979, 15, 1049–1054. [Google Scholar] [CrossRef]
  20. Karim, M.A.; Chowdhury, J.U. A Comparison of Four Distributions Used in Flood Frequency Analysis in Bangladesh. Hydrol. Sci. J. 1995, 40, 55–66. [Google Scholar] [CrossRef]
  21. Ferdows, M.; Hossain, M. Flood Frequency Analysisat Different Riversin Bangladesh: A Comparison Study on Probabilitv Distribution Functions. Sci. Technol. 2005, 10, 53–62. [Google Scholar]
  22. Bhattacharya, B.; Islam, T.; Masud, S.; Suman, A.; Solomatine, D.P. The Use of a Flood Index to Characterise Flooding in the North-Eastern Region of Bangladesh. E3S Web Conf. 2016, 7, 10003. [Google Scholar] [CrossRef]
  23. Asad, M.A.; Ahmeduzzaman, M.; Kar, S.; Khan, M.A.; Rahman, M.N.; Islam, S. Flood Frequency Modeling Using Gumbel’s and Powell’s Method for Dudhkumar River. J. Water Resour. Ocean Sci. 2013, 2, 25–28. [Google Scholar] [CrossRef]
  24. Opu, R.K.; Masum, A.A.; Biswas, R.; Islam, S. Flood Frequency Analysis by Probability and Stochastic Method for Padma River, Bangladesh. Am. J. Civ. Eng. 2014, 2, 8–11. [Google Scholar] [CrossRef]
  25. Alam, M.A.; Farnham, C.; Emura, K. Bayesian Modeling of Flood Frequency Analysis in Bangladesh Using Hamiltonian Monte Carlo Techniques. Water 2018, 10, 900. [Google Scholar] [CrossRef]
  26. Roy, B.; Islam, A.K.M.S.; Islam, G.M.T.; Khan, M.J.U.; Bhattacharya, B.; Ali, M.H.; Khan, A.S.; Hossain, M.S.; Sarker, G.C.; Pieu, N.M. Frequency Analysis of Flash Floods for Establishing New Danger Levels for the Rivers in the Northeast Haor Region of Bangladesh. J. Hydrol. Eng. 2019, 24, 05019004. [Google Scholar] [CrossRef]
  27. Nowreen, S.; Murshed, S.B.; Islam, A.K.M.S.; Bhaskaran, B. Change of Future Climate Extremes for the Haor Basin Area of Bangladesh. In Proceedings of the 4th International Conference on Water and Flood Management, Dhaka, Bangladesh, 5 October 2013; Bangladesh University of Engineering and Technology: Dhaka, Bangladesh, 2013; Volume 2, pp. 545–556. [Google Scholar]
  28. Huda, M.K. Experience with Modern and Hybrid Rice Varieties in Haor Ecosystem: Emerging Technologies for Sustainable Rice Production. In Proceedings of the Twentieth National Workshop on Rice Research and Extension in Bangladesh. Bangladesh Rice Research Institute, Gazipur, Bangladesh, 21 April 2004; Bangladesh Rice Research Institute: Gazipur, Bangladesh, 2004; pp. 94–97. [Google Scholar]
  29. Khan, M.; Mia, M.; Hossain, M. Impacts of Flood on Crop Production in Haor Areas of Two Upazillas in Kishoregonj. J. Environ. Sci. Nat. Resour. 2012, 5, 193–198. [Google Scholar] [CrossRef]
  30. Hossain, G.M.; Nishat, A. Environmental Considerations for Water Resources Development in Haor Areas of Northeastern Bangladesh. In Proceedings of the North American Water and Environment Congress & Destructive Water, Anaheim, CA, USA, 28 June 1996; American Society of Civil Engineers: Anaheim, CA, USA; pp. 1063–1068. [Google Scholar]
  31. Bangladesh Red Crescent Society. Bangladesh: Sylhet Flash Flood—Situation Report #03 (June 18); Bangladesh Red Crescent Society: Dhaka, Bangladesh, 2022. [Google Scholar]
  32. Sylhet Division—Banglapedia. Available online: http://en.banglapedia.org/index.php/Sylhet_Division (accessed on 19 April 2025).
  33. Rafiuddin, M.; Uyeda, H.; Islam, M.N. Characteristics of Monsoon Precipitation Systems in and around Bangladesh. Int. J. Climatol. 2010, 30, 1042–1055. [Google Scholar] [CrossRef]
  34. Islam, M.; Uyeda, H. Use of TRMM in Determining the Climatic Characteristics of Rainfall over Bangladesh. Remote Sens. Environ. 2007, 108, 264–276. [Google Scholar] [CrossRef]
  35. Hasan, G.M.J.; Alam, R.; Islam, Q.N.; Hossain, S. Frequency Structure of Major Rainfall Events in the North-Eastern Part of Bangladesh. J. Eng. Sci. Technol. 2012, 7, 690–700. [Google Scholar]
  36. Ohsawa, T.; Ueda, H.; Hayashi, T.; Watanabe, A.; Matsumoto, J. Diurnal Variations of Convective Activity and Rainfall in Tropical Asia. J. Meteorol. Soc. Jpn. Ser II 2001, 79, 333–352. [Google Scholar] [CrossRef]
  37. Mahanta, R.; Sarma, D.; Choudhury, A. Heavy Rainfall Occurrences in Northeast India. Int. J. Climatol. 2013, 33, 1456–1469. [Google Scholar] [CrossRef]
  38. Sato, T. Mechanism of Orographic Precipitation around the Meghalaya Plateau Associated with Intraseasonal Oscillation and the Diurnal Cycle. Mon. Weather Rev. 2013, 141, 2451–2466. [Google Scholar] [CrossRef]
  39. Stiller-Reeve, M.A.; Syed, M.A.; Spengler, T.; Spinney, J.A.; Hossain, R. Complementing Scientific Monsoon Definitions with Social Perception in Bangladesh. Bull. Am. Meteorol. Soc. 2015, 96, 49–57. [Google Scholar] [CrossRef]
  40. Haque, A.; Jahan, S. Impact of Flood Disasters in Bangladesh: A Multi-Sector Regional Analysis. Int. J. Disaster Risk Reduct. 2015, 13, 266–275. [Google Scholar] [CrossRef]
  41. Hosking, J.R.M. L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics. J. R. Stat. Soc. Ser. B Stat. Methodol. 1990, 52, 105–124. [Google Scholar] [CrossRef]
  42. Hosking, J.R.M.; Wallis, J.R. Regional Frequency Analysis: An Approach Based on L-Moments, 1st ed.; Cambridge University Press: Cambridge, UK, 1997; ISBN 978-0-521-43045-6. [Google Scholar]
  43. Sahu, R.T.; Verma, M.K.; Ahmad, I. Regional Frequency Analysis Using L-Moment Methodology—A Review. In Recent Trends in Civil Engineering; Pathak, K.K., Bandara, J.M.S.J., Agrawal, R., Eds.; Lecture Notes in Civil Engineering; Springer: Singapore, 2021; Volume 77, pp. 811–832. ISBN 978-981-15-5194-9. [Google Scholar]
  44. Wang, Q.J. Estimation of the GEV Distribution from Censored Samples by Method of Partial Probability Weighted Moments. J. Hydrol. 1990, 120, 103–114. [Google Scholar] [CrossRef]
  45. Khan, S.A.; Hussain, I.; Hussain, T.; Faisal, M.; Muhammad, Y.S.; Mohamd Shoukry, A. Regional Frequency Analysis of Extremes Precipitation Using L-Moments and Partial L-Moments. Adv. Meteorol. 2017, 2017, 6954902. [Google Scholar] [CrossRef]
  46. Mann, H.B. Nonparametric Tests Against Trend. Econometrica 1945, 13, 245. [Google Scholar] [CrossRef]
  47. Kendall, M.G. Rank Correlation Methods, 4th ed.; 2d impression; Griffin: London, UK, 1975; ISBN 978-0-85264-199-6. [Google Scholar]
  48. Moran, P.A.P. Notes on Continuous Stochastic Phenomena. Biometrika 1950, 37, 17. [Google Scholar] [CrossRef]
  49. Ngongondo, C.S.; Xu, C.-Y.; Tallaksen, L.M.; Alemaw, B.; Chirwa, T. Regional Frequency Analysis of Rainfall Extremes in Southern Malawi Using the Index Rainfall and L-Moments Approaches. Stoch. Environ. Res. Risk Assess. 2011, 25, 939–955. [Google Scholar] [CrossRef]
  50. Frimpong, B.F.; Koranteng, A.; Molkenthin, F. Analysis of Temperature Variability Utilising Mann–Kendall and Sen’s Slope Estimator Tests in the Accra and Kumasi Metropolises in Ghana. Environ. Syst. Res. 2022, 11, 24. [Google Scholar] [CrossRef]
  51. Yue, S.; Wang, C. The Mann-Kendall Test Modified by Effective Sample Size to Detect Trend in Serially Correlated Hydrological Series. Water Resour. Manag. 2004, 18, 201–218. [Google Scholar] [CrossRef]
  52. Yi, X.; Li, G.; Yin, Y. Spatio-Temporal Variation of Precipitation in the Three-River Headwater Region from 1961 to 2010. J. Geogr. Sci. 2013, 23, 447–464. [Google Scholar] [CrossRef]
  53. Akhundzadah, N.A. Analyzing Temperature, Precipitation, and River Discharge Trends in Afghanistan’s Main River Basins Using Innovative Trend Analysis, Mann–Kendall, and Sen’s Slope Methods. Climate 2024, 12, 196. [Google Scholar] [CrossRef]
  54. Kendall, M.G. A New Measure OF Rank Correlation. Biometrika 1938, 30, 81–93. [Google Scholar] [CrossRef]
  55. Alexandersson, H. A Homogeneity Test Applied to Precipitation Data. J. Climatol. 1986, 6, 661–675. [Google Scholar] [CrossRef]
  56. Pandžić, K.; Kobold, M.; Oskoruš, D.; Biondić, B.; Biondić, R.; Bonacci, O.; Likso, T.; Curić, O. Standard Normal Homogeneity Test as a Tool to Detect Change Points in Climate-Related River Discharge Variation: Case Study of the Kupa River Basin. Hydrol. Sci. J. 2020, 65, 227–241. [Google Scholar] [CrossRef]
  57. Toreti, A.; Kuglitsch, F.G.; Xoplaki, E.; Della-Marta, P.M.; Aguilar, E.; Prohom, M.; Luterbacher, J. A Note on the Use of the Standard Normal Homogeneity Test to Detect Inhomogeneities in Climatic Time Series. Int. J. Climatol. 2011, 31, 630–632. [Google Scholar] [CrossRef]
  58. Khaliq, M.N.; Ouarda, T.B.M.J. On the Critical Values of the Standard Normal Homogeneity Test (SNHT). Int. J. Climatol. 2007, 27, 681–687. [Google Scholar] [CrossRef]
  59. Hassani, H.; Royer-Carenzi, M.; Mashhad, L.M.; Yarmohammadi, M.; Yeganegi, M.R. Exploring the Depths of the Autocorrelation Function: Its Departure from Normality. Information 2024, 15, 449. [Google Scholar] [CrossRef]
  60. Nounou, M.N.; Bakshi, B.R. Multiscale Methods for Denoising and Compression. In Data Handling in Science and Technology; Elsevier: Amsterdam, The Netherlands, 2000; Volume 22, pp. 119–150. ISBN 978-0-444-50111-0. [Google Scholar]
  61. Bhattacharya, P.K.; Burman, P. Time Series. In Theory and Methods of Statistics; Elsevier: Amsterdam, The Netherlands, 2016; pp. 431–489. ISBN 978-0-12-802440-9. [Google Scholar]
  62. Rius, A.; Ruisánchez, I.; Callao, M.P.; Rius, F.X. Reliability of Analytical Systems: Use of Control Charts, Time Series Models and Recurrent Neural Networks (RNN). Chemom. Intell. Lab. Syst. 1998, 40, 1–18. [Google Scholar] [CrossRef]
  63. Ward, J.H. Hierarchical Grouping to Optimize an Objective Function. J. Am. Stat. Assoc. 1963, 58, 236–244. [Google Scholar] [CrossRef]
  64. Rousseeuw, P.J. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef]
  65. Tibshirani, R.; Walther, G.; Hastie, T. Estimating the Number of Clusters in a Data Set Via the Gap Statistic. J. R. Stat. Soc. Ser. B Stat. Methodol. 2001, 63, 411–423. [Google Scholar] [CrossRef]
  66. Thorndike, R.L. Who Belongs in the Family? Psychometrika 1953, 18, 267–276. [Google Scholar] [CrossRef]
  67. Mulaomerović-Šeta, A.; Blagojević, B.; Mihailović, V.; Petroselli, A. A Silhouette-Width-Induced Hierarchical Clustering for Defining Flood Estimation Regions. Hydrology 2023, 10, 126. [Google Scholar] [CrossRef]
  68. Hosking, J.R.M.; Wallis, J.R. Some Statistics Useful in Regional Frequency Analysis. Water Resour. Res. 1993, 29, 271–281. [Google Scholar] [CrossRef]
  69. Khan, M.S.U.R.; Hussain, Z.; Ahmad, I.; Noor, F. Modeling of Flood Extremes Using Regional Frequency Analysis of Sites of Khyber Pakhtunkhwa, Pakistan. J. Flood Risk Manag. 2021, 14, e12751. [Google Scholar] [CrossRef]
  70. Dalrymple, T. Flood-Frequency Analyses, Manual of Hydrology: Part 3; U.S. Government Printing Office: Washington, DC, USA, 1960. [Google Scholar]
  71. Malekinezhad, H.; Zare-Garizi, A. Regional Frequency Analysis of Daily Rainfall Extremes Using L-Moments Approach. Atmósfera 2014, 27, 411–427. [Google Scholar] [CrossRef]
  72. Pandey, G.R.; Nguyen, V.-T.-V. A Comparative Study of Regression Based Methods in Regional Flood Frequency Analysis. J. Hydrol. 1999, 225, 92–101. [Google Scholar] [CrossRef]
  73. Stedinger, J.R.; Vogel, R.M.; Foufoula-Georgiou, E. Frequency Analysis of Extreme Events. Handb. Hydrol. 1993, 18.1–18.66. [Google Scholar]
  74. Ouarda, T.B.M.J.; Girard, C.; Cavadias, G.S.; Bobée, B. Regional Flood Frequency Estimation with Canonical Correlation Analysis. J. Hydrol. 2001, 254, 157–173. [Google Scholar] [CrossRef]
  75. Shu, C.; Ouarda, T.B.M.J. Regional Flood Frequency Analysis at Ungauged Sites Using the Adaptive Neuro-Fuzzy Inference System. J. Hydrol. 2008, 349, 31–43. [Google Scholar] [CrossRef]
  76. Gaume, E.; Bain, V.; Bernardara, P.; Newinger, O.; Barbuc, M.; Bateman, A.; Blaškovičová, L.; Blöschl, G.; Borga, M.; Dumitrescu, A.; et al. A Compilation of Data on European Flash Floods. J. Hydrol. 2009, 367, 70–78. [Google Scholar] [CrossRef]
  77. Burn, D.H. Catchment Similarity for Regional Flood Frequency Analysis Using Seasonality Measures. J. Hydrol. 1997, 202, 212–230. [Google Scholar] [CrossRef]
  78. Rao, A.R.; Srinivas, V.V. Regionalization of Watersheds; Water Science and Technology Library; Springer: Dordrecht, The Netherlands, 2008; Volume 58, ISBN 978-1-4020-6851-5. [Google Scholar]
Figure 1. Geographical location of the Sylhet Division, showing the positions of the 26 streamflow gauging stations across the region.
Figure 2. Temporal coverage of station data highlighting periods of availability and missing years.
Figure 3. Architecture of the proposed framework.
Figure 4. ACF plots of the annual maximum streamflow series at the 26 streamflow gauging stations; horizontal dashed lines mark the 95% confidence bounds.
Figure 5. PACF plots of the annual maximum streamflow series at the 26 streamflow gauging stations; horizontal dashed lines mark the 95% confidence bounds.
Figure 6. (a) Silhouette method showing average silhouette width across varying cluster counts (k), with a peak value at k = 5 (indicated by vertical dashed line). (b) The elbow method illustrates the total within-sum-of-squares against the number of clusters k, with an identified inflection point at k = 4 (indicated by a vertical dashed line). (c) The gap statistic method displays gap values with standard error bars for different cluster configurations (the ideal number of clusters is shown by a vertical dashed line). (d) Hierarchical clustering dendrogram demonstrating sample relationships and potential cluster formations based on distance metrics (different groups are highlighted by colored rectangles).
Figure 7. Location of homogeneous regions in Sylhet Division.
Figure 8. L-moment ratio diagrams for annual maximum streamflow series for four homogeneous regions.
Figure 9. Regional growth curves with 95% confidence intervals for all four homogeneous regions.
Figure 10. Regional flood quantile estimates with associated uncertainty bands across different return periods.
Figure 11. Heatmap of model evaluation metrics (R2, RMSE, and MAPE) for different return periods.
Figure 12. Average influence of variables on peak discharge (based on absolute exponent values).
Table 1. Data summary of 26 streamflow gauging stations.
| Station ID | Station Name | Latitude (°) | Longitude (°) | Data Period | Area (km²) | *MAP (mm) | *MAT (°C) |
|---|---|---|---|---|---|---|---|
| SW264A | Montala | 24.0834 | 91.3506 | 1998–2024 | 91 | 2322.55 | 24.5 |
| SW135A | JuriCont_Silghat | 24.6032 | 92.12 | 1990–2012 | 800 | 2521.37 | 24.29 |
| SW341 | Chella Sonapur | 25.1301 | 91.6685 | 1990–2016 | 511.42 | 4474.48 | 17.23 |
| SW280 | Sutang Rly. Bridge | 24.2828 | 91.403 | 1964–2024 | 246.42 | 2300.26 | 24.55 |
| SW157 | Ballah | 24.0838 | 91.5961 | 1999–2023 | 1142.76 | 2199.38 | 24.15 |
| SW67 | Kamalganj | 24.3558 | 91.8489 | 1990–2024 | 731.08 | 2241.92 | 24.12 |
| SW333 | Muslimpur | 25.1109 | 91.3914 | 1996–2023 | 543.84 | 4388.25 | 18.06 |
| SW337 | Urargaon | 25.1209 | 91.5936 | 1990–2016 | 79.3 | 4996.03 | 22.13 |
| SW326 | Lubachara | 25.0515 | 92.3085 | 1985–2022 | 726.41 | 3402.17 | 20.82 |
| SW265 | Jaldhup | 24.7738 | 92.1729 | 1964–2016 | 2206.57 | 2630.47 | 23.98 |
| SW251 | Sarighat | 25.0939 | 92.1173 | 2000–2023 | 821.7 | 3578.93 | 19.34 |
| SW233 | Ratnar bhanga (Piyan gang) | 25.1624 | 91.967 | 1985–2022 | 844 | 3981.28 | 18.15 |
| SW173 | Sheola | 24.8911 | 92.1915 | 1996–2023 | 25,337.01 | 2253.36 | 20.89 |
| SW135 | Juri_Silghat | 24.5902 | 92.1177 | 1964–2022 | 789.53 | 2246.75 | 24.3 |
| SW333A | Dulura | 25.1593 | 91.3878 | 2003–2024 | 530 | 4388.25 | 18.06 |
| SW233A | Jaflong_Spill | 25.1568 | 92.0167 | 1994–2024 | 850 | 3981.28 | 18.15 |
| SW332 | Islampur | 25.1302 | 91.7538 | 1988–2016 | 350.72 | 4733.84 | 18.73 |
| SW192 | Motiganj | 24.3065 | 91.6822 | 1964–2023 | 61.29 | 2246.75 | 24.3 |
| SW158.1 | Shaistaganj | 24.2731 | 91.4736 | 1964–2024 | 1400 | 2225.64 | 24.21 |
| SW138 | Sofiabad | 24.293 | 91.5465 | 1964–2024 | 166.96 | 2261.9 | 24.5 |
| SW131.5 | Laurergarh Saktiarkhola | 25.1891 | 91.2508 | 1993–2024 | 2483.8 | 3302.25 | 19.25 |
| SW175.5 | Sherpur | 24.6303 | 91.6813 | 1996–2023 | 33,855.2 | 2360.13 | 21.71 |
| SW266 | Kanairghat | 24.998 | 92.2594 | 2000–2023 | 1501.7 | 3377.6 | 22.01 |
| SW267 | Sylhet | 24.8879 | 91.868 | 2000–2023 | 1593.12 | 3377.6 | 22.01 |
| SW269 | Sunamganj | 25.0792 | 91.4121 | 2000–2023 | 7176.02 | 4236.93 | 21.1 |
| SW201 | Monu Rly. Bridge | 24.4256 | 91.9404 | 1996–2023 | 2292.71 | 2041.92 | 23.83 |

Note: *MAP (mm) and *MAT (°C) represent the Mean Annual Precipitation (mm) and Mean Annual Temperature (°C), respectively.
Table 2. Interpretation of H-statistic values.
| H Value | Interpretation |
|---|---|
| H < 1 | The region is “acceptably homogeneous.” |
| 1 ≤ H < 2 | The region is “possibly heterogeneous.” |
| H ≥ 2 | The region is “definitely heterogeneous.” |
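The thresholds in Table 2 can be written as a small lookup. The sketch below is illustrative only: the function name `classify_heterogeneity` is an assumption introduced here, and the sample values are the H1 entries reported in Tables 4 and 5.

```python
def classify_heterogeneity(h: float) -> str:
    """Map a heterogeneity statistic H to the interpretation in Table 2."""
    if h < 1:
        return "acceptably homogeneous"
    if h < 2:
        return "possibly heterogeneous"
    return "definitely heterogeneous"


# H1 values as reported in Table 5 (final regions) and Table 4 (Group 1).
examples = {
    "Region 1 (Table 5)": 0.94,
    "Region 2 (Table 5)": 1.31,
    "Group 1 (Table 4)": 8.79,
}

for label, h1 in examples.items():
    print(f"{label}: H1 = {h1} -> {classify_heterogeneity(h1)}")
```

The boundaries are half-open, so H = 1 falls in the "possibly heterogeneous" class and H = 2 in the "definitely heterogeneous" class.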
Table 3. Results of the Mann–Kendall trend test (critical Z = ±1.96) and the SNHT (critical value = 6.95) at the 26 gauging stations, evaluated at the 5% significance level (α = 0.05).

| Station | Z-Value | p-Value (MK) | Kendall’s Tau | SNHT Statistic | Homogeneous | p-Value (SNHT) |
|---|---|---|---|---|---|---|
| SW131.5 | 1.3257 | 0.1849 | 0.1698 | 4.0221 | TRUE | 0.2 |
| SW135A | −0.1788 | 0.858 | −0.0666 | 3.6729 | TRUE | 0.2 |
| SW135 | −2.5264 * | 0.0115 * | −0.2350 * | 25.9143 * | FALSE | 0.01 * |
| SW138 | −2.2254 * | 0.0260 * | −0.2156 * | 6.8564 | TRUE | 0.1 |
| SW157 | −1.6114 | 0.107 | −0.2333 | 12.897 * | FALSE | 0.01 * |
| SW158.1 | −2.6223 * | 0.0087 * | −0.2349 * | 13.4951 * | FALSE | 0.01 * |
| SW173 | 2.0349 * | 0.0418 * | 0.2751 * | 4.6849 | TRUE | 0.2 |
| SW175.5 | 2.8646 * | 0.0041 * | 0.3862 * | 7.4874 * | FALSE | 0.05 * |
| SW192 | −1.4577 | 0.1449 | −0.1515 | 8.3583 * | FALSE | 0.05 * |
| SW201 | −1.047 | 0.295 | −0.1428 | 5.1011 | TRUE | 0.2 |
| SW233A | 1.6656 | 0.0957 | 0.2129 | 3.6703 | TRUE | 0.2 |
| SW233 | −0.8893 | 0.3737 | −0.1216 | 3.2212 | TRUE | 0.2 |
| SW251 | −0.4216 | 0.6732 | −0.0652 | 1.1437 | TRUE | 0.2 |
| SW264A | 0.9698 | 0.3321 | 0.1384 | 3.1204 | TRUE | 0.2 |
| SW265 | 0.3379 | 0.7354 | 0.0384 | 2.1146 | TRUE | 0.2 |
| SW266 | −0.124 | 0.9012 | −0.0217 | 1.3569 | TRUE | 0.2 |
| SW267 | 1.6872 | 0.0915 | 0.25 | 4.2844 | TRUE | 0.2 |
| SW269 | 1.4138 | 0.1574 | 0.2101 | 3.7253 | TRUE | 0.2 |
| SW280 | −1.8115 | 0.07 | −0.2126 | 5.3187 | TRUE | 0.2 |
| SW326 | −0.8893 | 0.3737 | −0.1216 | 3.2212 | TRUE | 0.2 |
| SW332 | −1.4192 | 0.1558 | −0.2285 | 6.2345 | TRUE | 0.2 |
| SW333A | 0.2113 | 0.8325 | 0.038 | 1.3085 | TRUE | 0.2 |
| SW333 | −1.047 | 0.295 | −0.1428 | 5.1011 | TRUE | 0.2 |
| SW337 | 0.3324 | 0.7395 | 0.0571 | 1.3458 | TRUE | 0.2 |
| SW341 | −0.1511 | 0.8798 | −0.0285 | 4.5678 | TRUE | 0.2 |
| SW67 | 0.1136 | 0.9095 | 0.0151 | 12.5919 * | FALSE | 0.01 * |

Note: * Indicates a statistically significant result at α = 0.05.
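For readers who want to see how the Z-values, p-values, and Kendall's tau in Table 3 relate, the following is a minimal sketch of the standard Mann–Kendall test using the normal approximation. It assumes no tied observations (so no tie correction is applied to the variance), and the function names are introduced here for illustration; it is not necessarily the implementation used in the study.

```python
import math
from itertools import combinations


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def mann_kendall(series):
    """Return (Z, p, tau) for the Mann-Kendall trend test (no tie correction)."""
    n = len(series)
    # S counts concordant minus discordant pairs over all i < j.
    s = sum((later > earlier) - (later < earlier)
            for earlier, later in combinations(series, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity correction: shift S by one unit toward zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    tau = s / (n * (n - 1) / 2.0)
    return z, two_sided_p(z), tau


# Synthetic, strictly increasing series (not study data).
z, p, tau = mann_kendall(list(range(10)))
print(f"Z = {z:.4f}, p = {p:.6f}, tau = {tau:.2f}")

# Converting a tabulated Z-value to its two-sided p-value.
print(f"p for Z = 1.3257: {two_sided_p(1.3257):.4f}")
```

A station is flagged as having a significant trend when |Z| exceeds 1.96, which is equivalent to p < 0.05 under this approximation.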
Table 4. Summary of heterogeneity measures and station discordancy values for the five homogeneous groups. Values in parentheses are each station’s discordancy value.

| Group | Dcritical | Station Names and Their Discordancy Values (Di) | H1 | H2 | H3 |
|---|---|---|---|---|---|
| 1 | 1.92 | SW337 (0.17), SW332 (1.74), SW341 (0.91), SW131.5 (1.67), SW269 (1.86), SW333 (0.34), SW333A (0.31) | 8.79 | 6.04 | 4.48 |
| 2 | 2.14 | SW267 (0.68), SW233 (0.94), SW251 (0.34), SW233A (1.81), SW266 (0.99), SW326 (0.68), SW265 (1.51), SW264A (1.04) | 4.53 | 3.55 | 2.58 |
| 3 | 3 | SW173 (1.00), SW175.5 (1.00) | 2.06 | 4.3 | 3.86 |
| 4 | 3 | SW138 (1.00), SW135 (1.00), SW201 (1.00) | 6.63 | 8.94 | 8.78 |
| 5 | 1.65 | SW264A (1.36), SW67 (0.70), SW158.1 (1.04), SW280 (0.67), SW157 (0.93), SW192 (1.30) | 1.1 | 0.08 | −0.31 |
Table 5. The regional homogeneity assessment results show site groupings, discordancy statistics (Di), and heterogeneity measures (H1, H2, H3).
| Region | Sites (Di) * | Dcrit. | H1 | H2 | H3 |
|---|---|---|---|---|---|
| 1 | SW175.5 (1.00), SW266 (1.00), SW267 (1.00), SW269 (1.00) | 3.00 | 0.94 | 0.91 | 1.16 |
| 2 | SW131.5 (0.60), SW138 (0.88), SW158.1 (1.41), SW192 (0.52), SW332 (1.46), SW233A (1.47), SW333A (0.66) | 1.92 | 1.31 | 1.76 | 1.77 |
| 3 | SW135 (0.46), SW173 (2.10), SW233 (0.15), SW251 (2.16), SW265 (0.46), SW326 (0.15), SW333 (2.26), SW337 (0.26) | 2.14 | 0.34 | 0.89 | 1.04 |
| 4 | SW67 (1.21), SW157 (0.71), SW280 (1.64), SW341 (0.35), SW135A (1.40), SW264A (0.68) | 1.65 | 0.70 | 0.34 | 0.26 |

Note: * Values in parentheses are each station’s discordancy value.
Table 6. Values of ZDIST statistics for candidate distributions.
| Serial No. | Region | GLO | GEV | GNO | PE3 | GPA |
|---|---|---|---|---|---|---|
| 1 | Region 1 | 0.32 | 1.20 | 0.83 | 0.83 | 3.97 a |
| 2 | Region 2 | 0.05 | 0.87 | 1.37 | 2.27 a | 3.01 a |
| 3 | Region 3 | 1.01 | 3.55 a | 2.44 a | 3.02 a | 7.28 a |
| 4 | Region 4 | 0.77 | 1.46 | 1.79 a | 2.39 a | 3.18 a |

Note: a Indicates the calculated values exceeding the critical value of 1.64.
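The selection rule implied by Table 6 can be sketched as follows: a candidate distribution is treated as acceptable when its ZDIST value does not exceed the critical value of 1.64, and the acceptable candidate with the smallest value is preferred. The dictionary layout and variable names below are assumptions made for illustration; the numbers are copied from Table 6.

```python
Z_CRITICAL = 1.64

# ZDIST values from Table 6, keyed by region and candidate distribution.
Z_DIST = {
    "Region 1": {"GLO": 0.32, "GEV": 1.20, "GNO": 0.83, "PE3": 0.83, "GPA": 3.97},
    "Region 2": {"GLO": 0.05, "GEV": 0.87, "GNO": 1.37, "PE3": 2.27, "GPA": 3.01},
    "Region 3": {"GLO": 1.01, "GEV": 3.55, "GNO": 2.44, "PE3": 3.02, "GPA": 7.28},
    "Region 4": {"GLO": 0.77, "GEV": 1.46, "GNO": 1.79, "PE3": 2.39, "GPA": 3.18},
}

# Candidates passing the goodness-of-fit criterion, in table order.
accepted = {
    region: [name for name, z in scores.items() if abs(z) <= Z_CRITICAL]
    for region, scores in Z_DIST.items()
}

# Candidate with the smallest |ZDIST| in each region.
best = {
    region: min(scores, key=lambda name: abs(scores[name]))
    for region, scores in Z_DIST.items()
}

for region in Z_DIST:
    print(f"{region}: accepted = {accepted[region]}, best = {best[region]}")
```

Under this rule GLO has the smallest value in every region, and the accepted sets line up with the distributions for which parameters are listed in Table 7.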
Table 7. Regional parameters for the four candidate distributions for L-moments.
| Region | Distribution | ξ (Location) | α (Scale) | K (Shape) |
|---|---|---|---|---|
| Region 1 | Gen. Logistic | 0.9992 | 0.045 | −0.0112 |
| Region 1 | Gen. Extreme Value | 0.9713 | 0.0786 | 0.2638 |
| Region 1 | Gen. Normal | 0.9991 | 0.0797 | −0.0229 |
| Region 1 | Pearson Type III | 1 | 0.0798 | 0.0688 |
| Region 2 | Gen. Logistic | 0.8210 | 0.3617 | −0.2749 |
| Region 2 | Gen. Extreme Value | 0.6193 | 0.5014 | −0.1570 |
| Region 2 | Gen. Normal | 0.8023 | 0.6350 | −0.5729 |
| Region 3 | Gen. Logistic | 1.0511 | 0.1483 | 0.1997 |
| Region 4 | Gen. Logistic | 0.8471 | 0.3412 | −0.2525 |
| Region 4 | Gen. Extreme Value | 0.6549 | 0.4816 | −0.1246 |
Table 8. Estimated regional quantiles and corresponding accuracy metrics for the four identified homogeneous regions for GLO distribution.
| Region | Statistic | 2 Years (f = 0.5) | 5 Years (f = 0.8) | 10 Years (f = 0.9) | 20 Years (f = 0.95) | 50 Years (f = 0.98) | 100 Years (f = 0.99) | 200 Years (f = 0.995) | 1000 Years (f = 0.999) |
|---|---|---|---|---|---|---|---|---|---|
| Region 1 | q(F) | 0.9991 | 1.0620 | 1.0992 | 1.1338 | 1.1781 | 1.2113 | 1.2445 | 1.3222 |
| Region 1 | RMSE | 0.0025 | 0.0082 | 0.0123 | 0.0166 | 0.0243 | 0.0323 | 0.0424 | 0.0740 |
| Region 1 | Lower Bound | 0.9951 | 1.0526 | 1.0820 | 1.1063 | 1.1337 | 1.1507 | 1.1653 | 1.1884 |
| Region 1 | Upper Bound | 1.0033 | 1.0773 | 1.1208 | 1.1597 | 1.2077 | 1.2426 | 1.2774 | 1.3575 |
| Region 2 | q(F) | 0.8210 | 1.4314 | 1.9123 | 2.4612 | 3.3405 | 4.1586 | 5.1431 | 8.2901 |
| Region 2 | RMSE | 0.0189 | 0.0856 | 0.1634 | 0.2612 | 0.4621 | 0.7097 | 1.0853 | 2.7769 |
| Region 2 | Lower Bound | 0.7869 | 1.3347 | 1.7030 | 2.0859 | 2.6369 | 3.0998 | 3.6054 | 4.9954 |
| Region 2 | Upper Bound | 0.8440 | 1.5995 | 2.2242 | 2.9136 | 3.9559 | 4.8887 | 5.9676 | 9.1964 |
| Region 3 | q(F) | 1.0511 | 1.2307 | 1.3149 | 1.3813 | 1.4524 | 1.4972 | 1.5358 | 1.6069 |
| Region 3 | RMSE | 0.0073 | 0.0517 | 0.0746 | 0.0946 | 0.1204 | 0.1401 | 0.1601 | 0.2068 |
| Region 3 | Lower Bound | 1.0408 | 1.1577 | 1.2024 | 1.2316 | 1.2560 | 1.2679 | 1.2759 | 1.2813 |
| Region 3 | Upper Bound | 1.0641 | 1.3186 | 1.4326 | 1.5167 | 1.5997 | 1.6481 | 1.6885 | 1.7572 |
| Region 4 | q(F) | 0.8471 | 1.4135 | 1.8492 | 2.3379 | 3.1059 | 3.8075 | 4.6387 | 7.2253 |
| Region 4 | RMSE | 0.0175 | 0.0677 | 0.1246 | 0.1961 | 0.3499 | 0.5461 | 0.8468 | 2.1913 |
| Region 4 | Lower Bound | 0.8161 | 1.3418 | 1.6929 | 2.0513 | 2.5513 | 2.9598 | 3.4013 | 4.5394 |
| Region 4 | Upper Bound | 0.8697 | 1.5481 | 2.0904 | 2.6808 | 3.5651 | 4.3449 | 5.2387 | 7.8847 |
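The q(F) rows in Table 8 can be approximately recovered from the GLO parameters in Table 7. The sketch below assumes the Hosking and Wallis parameterisation of the Generalized Logistic quantile function, x(F) = ξ + (α/k)[1 − ((1 − F)/F)^k], and the usual relation F = 1 − 1/T between return period and non-exceedance probability. Function and variable names are illustrative, and small differences from the table are expected because the published parameters are rounded.

```python
import math

# GLO parameters (location xi, scale alpha, shape k) from Table 7.
GLO_PARAMS = {
    "Region 1": (0.9992, 0.045, -0.0112),
    "Region 2": (0.8210, 0.3617, -0.2749),
    "Region 3": (1.0511, 0.1483, 0.1997),
    "Region 4": (0.8471, 0.3412, -0.2525),
}

RETURN_PERIODS = [2, 5, 10, 20, 50, 100, 200, 1000]


def nonexceedance(return_period: float) -> float:
    """Non-exceedance probability F for a return period T in years."""
    return 1.0 - 1.0 / return_period


def glo_quantile(f: float, xi: float, alpha: float, k: float) -> float:
    """Generalized Logistic quantile (growth factor) at probability f."""
    y = math.log((1.0 - f) / f)
    if abs(k) < 1e-12:
        # Limiting case k -> 0 reduces to the logistic distribution.
        return xi - alpha * y
    return xi + (alpha / k) * (1.0 - math.exp(k * y))


for region, params in GLO_PARAMS.items():
    growth = [glo_quantile(nonexceedance(t), *params) for t in RETURN_PERIODS]
    print(region, " ".join(f"{q:.4f}" for q in growth))
```

Regions 2 and 4 have negative shape parameters, which gives a heavy upper tail and explains why their growth factors rise much faster with return period than those of Regions 1 and 3.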
Table 9. Estimated maximum flood discharges corresponding to various non-exceedance probabilities (return periods).
| Station ID | 2 Years (f = 0.5) | 5 Years (f = 0.8) | 10 Years (f = 0.9) | 20 Years (f = 0.95) | 50 Years (f = 0.98) | 100 Years (f = 0.99) | 200 Years (f = 0.995) | 1000 Years (f = 0.999) |
|---|---|---|---|---|---|---|---|---|
| SW264A | 18.83 | 31.41 | 41.10 | 51.96 | 69.03 | 84.63 | 103.11 | 160.60 |
| SW135A | 125.94 | 210.14 | 274.93 | 347.58 | 461.77 | 566.08 | 689.67 | 1074.22 |
| SW341 | 1031.48 | 1721.04 | 2251.59 | 2846.59 | 3781.80 | 4636.03 | 5648.18 | 8797.55 |
| SW280 | 47.72 | 79.62 | 104.17 | 131.70 | 174.97 | 214.49 | 261.32 | 407.04 |
| SW157 | 263.38 | 439.46 | 574.94 | 726.87 | 965.68 | 1183.80 | 1442.25 | 2246.44 |
| SW67 | 91.39 | 152.49 | 199.49 | 252.21 | 335.08 | 410.76 | 500.44 | 779.49 |
| SW333 | 166.68 | 195.17 | 208.53 | 219.06 | 230.34 | 237.44 | 243.56 | 254.84 |
| SW337 | 541.23 | 633.74 | 677.10 | 711.29 | 747.92 | 770.97 | 790.86 | 827.48 |
| SW326 | 811.03 | 949.64 | 1014.62 | 1065.86 | 1120.75 | 1155.28 | 1185.08 | 1239.95 |
| SW265 | 428.34 | 501.55 | 535.87 | 562.93 | 591.92 | 610.15 | 625.89 | 654.88 |
| SW251 | 987.43 | 1156.19 | 1235.30 | 1297.69 | 1364.51 | 1406.55 | 1442.84 | 1509.65 |
| SW233 | 811.03 | 949.64 | 1014.62 | 1065.86 | 1120.75 | 1155.28 | 1185.08 | 1239.95 |
| SW173 | 1965.57 | 2301.51 | 2458.99 | 2583.17 | 2716.19 | 2799.87 | 2872.11 | 3005.10 |
| SW135 | 1968.63 | 2305.09 | 2462.81 | 2587.19 | 2720.41 | 2804.23 | 2876.58 | 3009.77 |
| SW333A | 486.15 | 847.56 | 1132.37 | 1457.38 | 1978.08 | 2462.45 | 3045.45 | 4908.90 |
| SW233A | 586.20 | 1022 | 1365.42 | 1757.32 | 2385.18 | 2969.25 | 3672.23 | 5919.20 |
| SW332 | 1401.72 | 2443.78 | 3264.98 | 4202.07 | 5703.40 | 7100.01 | 8780.95 | 14,153.86 |
| SW192 | 40.51 | 70.62 | 94.36 | 121.44 | 164.83 | 205.20 | 253.78 | 409.06 |
| SW158.1 | 199.94 | 348.58 | 465.72 | 599.39 | 813.54 | 1012.76 | 1252.53 | 2018.94 |
| SW138 | 57.34 | 99.98 | 133.57 | 171.91 | 233.34 | 290.48 | 359.25 | 579.07 |
| SW131.5 | 844.89 | 1473.03 | 1967.98 | 2532.82 | 3437.76 | 4279.57 | 5292.76 | 8531.31 |
| SW175.5 | 2001.96 | 2127.87 | 2202.43 | 2271.74 | 2360.45 | 2426.92 | 2493.43 | 2649.13 |
| SW266 | 1951.70 | 2074.45 | 2147.14 | 2214.71 | 2301.20 | 2366.05 | 2430.84 | 2582.62 |
| SW267 | 1748.16 | 1858.10 | 1923.21 | 1983.74 | 2061.20 | 2119.25 | 2177.32 | 2313.28 |
| SW269 | 2886.04 | 3067.55 | 3175.04 | 3274.96 | 3402.85 | 3498.67 | 3594.55 | 3819.03 |
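The station quantiles in Table 9 appear consistent with index-flood scaling, in which each at-site quantile is the station's index flood multiplied by the regional growth factor q(F) from Table 8. The sketch below illustrates this for station SW264A (Region 4 in Table 5). The index flood is not reported in these tables, so it is backed out here from the 2-year estimate; that step, and all names in the code, are assumptions made for illustration.

```python
# Region 4 GLO growth factors q(F) from Table 8, keyed by return period (years).
GROWTH_REGION4 = {
    2: 0.8471, 5: 1.4135, 10: 1.8492, 20: 2.3379,
    50: 3.1059, 100: 3.8075, 200: 4.6387, 1000: 7.2253,
}

# Table 9 row for SW264A, used only for comparison.
TABLE9_SW264A = {
    2: 18.83, 5: 31.41, 10: 41.10, 20: 51.96,
    50: 69.03, 100: 84.63, 200: 103.11, 1000: 160.60,
}


def scale_by_index_flood(index_flood, growth_factors):
    """Multiply every regional growth factor by the at-site index flood."""
    return {t: index_flood * q for t, q in growth_factors.items()}


# Assumption: infer the index flood from the tabulated 2-year discharge.
implied_index = TABLE9_SW264A[2] / GROWTH_REGION4[2]
estimates = scale_by_index_flood(implied_index, GROWTH_REGION4)

for t, q_t in estimates.items():
    print(f"T = {t:>4} yr: estimated {q_t:8.2f}, Table 9 {TABLE9_SW264A[t]:8.2f}")
```

Because only the growth curve is shared within a region, two stations in the same region differ by a constant factor across all return periods.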
Table 10. Geomorphological parameters of the watersheds derived from GIS analysis.
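| Station ID | Total Stream Length (km) | Number of Streams | Perimeter (km) | Main Channel Length (km) | Max Basin Length (km) | Max Stream Order |
|---|---|---|---|---|---|---|
| SW264A | 5.79 | 5 | 53.87 | 1.78 | 17.85 | 2 |
| SW135A | 122.22 | 21 | 182.66 | 19.18 | 57.89 | 3 |
| SW341 | 84.94 | 9 | 173.61 | 22.39 | 59.11 | 2 |
| SW280 | 33.66 | 5 | 89.23 | 20.38 | 30.7 | 2 |
| SW157 | 168.65 | 37 | 226.41 | 17.11 | 67.57 | 2 |
| SW67 | 104.94 | 21 | 193.68 | 24.08 | 63.5 | 2 |
| SW333 | 94.44 | 7 | 202.86 | 29.93 | 62.99 | 2 |
| SW337 | 10.25 | 1 | 49.08 | 10.25 | 18.41 | 1 |
| SW326 | 108.02 | 17 | 187.77 | 15.99 | 56.92 | 3 |
| SW265 | 414.65 | 43 | 358.97 | 29.07 | 124.04 | 3 |
| SW251 | 146.51 | 11 | 214.97 | 36.61 | 57.45 | 2 |
| SW233 | 137.25 | 15 | 184.06 | 33.4 | 54.89 | 3 |
| SW173 | 2644.87 | 264 | 1426.02 | 47.11 | 378.68 | 5 |
| SW135 | 119.4 | 21 | 181.15 | 19.18 | 57.89 | 3 |
| SW333A | 86.93 | 7 | 191 | 23.71 | 59.09 | 2 |
| SW233A | 139.71 | 15 | 186.23 | 33.4 | 54.89 | 3 |
| SW332 | 53.72 | 7 | 112.65 | 16.65 | 40.67 | 3 |
| SW192 | 12.06 | 1 | 51.5 | 12.06 | 17.65 | 1 |
| SW158.1 | 232.26 | 43 | 294.17 | 26.15 | 85.2 | 3 |
| SW138 | 26.91 | 1 | 94.74 | 26.91 | 31.33 | 1 |
| SW131.5 | 430.38 | 51 | 392.39 | 36.15 | 105.25 | 4 |
| SW175.5 | 3636.26 | 360 | 1616.52 | 74.07 | 410.46 | 5 |
| SW266 | 276.17 | 37 | 277.08 | 25.64 | 76.6 | 3 |
| SW267 | 340.78 | 37 | 434.36 | 71.75 | 106.61 | 3 |
| SW269 | 762.12 | 75 | 683.95 | 78.48 | 159.95 | 5 |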
Table 11. Annual maximum flood discharge equations for different return periods.
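| Return Period | Equation | R² | RMSE (m³/s) | MAPE (%) |
|---|---|---|---|---|
| Q2 | $Q = 2.48 \times 10^{-16} \times A^{0.2325} \times E^{1.7621} \times \mathrm{TSL}^{-0.3745} \times \mathrm{NS}^{1.0083} \times P^{0.6086} \times \mathrm{MCL}^{0.0850} \times \mathrm{MBL}^{-0.9550} \times \mathrm{MSO}^{-0.5222} \times \mathrm{MAP}^{3.8834} \times \mathrm{MAT}^{1.0942}$ | 0.869 | 438.34 | 42.68 |
| Q5 | $Q = 1.68 \times 10^{-11} \times A^{0.1712} \times E^{1.6987} \times \mathrm{TSL}^{-0.4499} \times \mathrm{NS}^{1.0641} \times P^{0.0461} \times \mathrm{MCL}^{0.1968} \times \mathrm{MBL}^{-0.4628} \times \mathrm{MSO}^{-0.4407} \times \mathrm{MAP}^{3.2083} \times \mathrm{MAT}^{-0.2086}$ | 0.865 | 522.34 | 41.00 |
| Q10 | $Q = 5.81 \times 10^{-9} \times A^{0.1378} \times E^{1.6584} \times \mathrm{TSL}^{-0.4958} \times \mathrm{NS}^{1.0969} \times P^{-0.2345} \times \mathrm{MCL}^{0.2628} \times \mathrm{MBL}^{-0.2173} \times \mathrm{MSO}^{-0.3964} \times \mathrm{MAP}^{2.8488} \times \mathrm{MAT}^{-0.8830}$ | 0.852 | 615.60 | 42.32 |
| Q20 | $Q = 9.74 \times 10^{-7} \times A^{0.1078} \times E^{1.6195} \times \mathrm{TSL}^{-0.5386} \times \mathrm{NS}^{1.1269} \times P^{-0.4722} \times \mathrm{MCL}^{0.3239} \times \mathrm{MBL}^{-0.0096} \times \mathrm{MSO}^{-0.3564} \times \mathrm{MAP}^{2.5317} \times \mathrm{MAT}^{-1.4685}$ | 0.833 | 746.88 | 45.32 |
| Q50 | $Q = 4.97 \times 10^{-4} \times A^{0.0702} \times E^{1.5684} \times \mathrm{TSL}^{-0.5931} \times \mathrm{NS}^{1.1646} \times P^{-0.7521} \times \mathrm{MCL}^{0.4018} \times \mathrm{MBL}^{0.2351} \times \mathrm{MSO}^{-0.3062} \times \mathrm{MAP}^{2.1430} \times \mathrm{MAT}^{-2.1755}$ | 0.797 | 995.60 | 49.74 |
| Q100 | $Q = 4.44 \times 10^{-2} \times A^{0.0425} \times E^{1.5294} \times \mathrm{TSL}^{-0.6337} \times \mathrm{NS}^{1.1923} \times P^{-0.9482} \times \mathrm{MCL}^{0.4599} \times \mathrm{MBL}^{0.4065} \times \mathrm{MSO}^{-0.2691} \times \mathrm{MAP}^{1.8615} \times \mathrm{MAT}^{-2.6816}$ | 0.762 | 1253.85 | 53.95 |
| Q200 | $Q = 3.52 \times A^{0.0150} \times E^{1.4899} \times \mathrm{TSL}^{-0.6740} \times \mathrm{NS}^{1.2197} \times P^{-1.1353} \times \mathrm{MCL}^{0.5178} \times \mathrm{MBL}^{0.5700} \times \mathrm{MSO}^{-0.2321} \times \mathrm{MAP}^{1.5862} \times \mathrm{MAT}^{-3.1723}$ | 0.721 | 1585.62 | 58.97 |
| Q1000 | $Q = 6.90 \times 10^{4} \times A^{-0.0488} \times E^{1.3965} \times \mathrm{TSL}^{-0.7670} \times \mathrm{NS}^{1.2821} \times P^{-1.5465} \times \mathrm{MCL}^{0.6522} \times \mathrm{MBL}^{0.9292} \times \mathrm{MSO}^{-0.1463} \times \mathrm{MAP}^{0.9612} \times \mathrm{MAT}^{-4.2745}$ | 0.610 | 2726.48 | 73.62 |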
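The equations in Table 11 are multiplicative power laws, so they are convenient to evaluate in log space. The sketch below encodes the Q2 equation as a coefficient plus a dictionary of exponents. The reading of the abbreviations (A = area, E = elevation, TSL = total stream length, NS = number of streams, P = perimeter, MCL = main channel length, MBL = maximum basin length, MSO = maximum stream order) is inferred from Tables 1 and 10 and the abstract, and the predictor units are assumed to match those tables. The catchment values are taken from the SW264A rows of Tables 1 and 10, except the elevation, which is not tabulated here and is a purely hypothetical placeholder.

```python
import math

# Coefficient and exponents of the Q2 equation in Table 11.
Q2_COEFF = 2.48e-16
Q2_EXPONENTS = {
    "A": 0.2325, "E": 1.7621, "TSL": -0.3745, "NS": 1.0083, "P": 0.6086,
    "MCL": 0.0850, "MBL": -0.9550, "MSO": -0.5222, "MAP": 3.8834, "MAT": 1.0942,
}


def power_law_discharge(coeff, exponents, predictors):
    """Evaluate Q = coeff * prod(x_i ** b_i) via logs (all x_i must be > 0)."""
    log_q = math.log(coeff)
    for name, exponent in exponents.items():
        log_q += exponent * math.log(predictors[name])
    return math.exp(log_q)


# Predictors for SW264A from Tables 1 and 10; E is hypothetical (units assumed).
catchment = {
    "A": 91.0, "E": 10.0, "TSL": 5.79, "NS": 5.0, "P": 53.87,
    "MCL": 1.78, "MBL": 17.85, "MSO": 2.0, "MAP": 2322.55, "MAT": 24.5,
}

q2 = power_law_discharge(Q2_COEFF, Q2_EXPONENTS, catchment)
print(f"Illustrative Q2 estimate: {q2:.2f}")
```

Each exponent acts as an elasticity: for example, doubling NS while holding the other predictors fixed multiplies the Q2 estimate by 2^1.0083. Given the MAPE values of 41–74% reported in Table 11, any single estimate from such an equation carries substantial uncertainty.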
Dey, S.; Zahid, S.M.T.; Dey, S.; Rahaman, K.M.A.; Islam, A.K.M.S. Regional Flood Frequency Analysis in Northeastern Bangladesh Using L-Moments for Peak Discharge Estimation at Various Return Periods in Ungauged Catchments. Water 2025, 17, 1771. https://doi.org/10.3390/w17121771