1. Introduction
Held and Soden [1] highlighted that variations in the hydrological cycle have profound impacts on human activities, including an increased risk of flooding as well as the occurrence and severity of droughts. Floods are among the most damaging natural hazards worldwide, and their impacts have intensified due to factors such as irregular human settlements near rivers, deforestation, and continuous land-use changes.
Mexico is especially vulnerable to hydrometeorological events that cause widespread damage across its territory. Between 2007 and 2020, heavy rains, floods, and tropical cyclones affected 27.8 million people, resulting in 1173 fatalities and economic losses exceeding 24.6 billion USD [2]. Analyses of rainfall patterns in Mexico [3,4] indicate that floods caused by extreme rainfall events are likely to become more frequent and intense in the future. Given the potential consequences of underestimating or overestimating rainfall quantiles (ranging from hydraulic structure failure to unnecessarily inflated construction costs), reliable statistical tools are essential for estimating their magnitude and frequency.
Numerous at-site and regional rainfall frequency analyses have been conducted worldwide, and results consistently demonstrate that no single distribution universally fits annual maximum daily rainfall (AMDR) data. Although the Gumbel distribution (G) is widely applied, studies such as Gado et al. [5] in Egypt have shown that it is not always appropriate, while in other regions it has been identified as the most suitable model [6,7,8,9,10,11,12,13,14,15].
In Italy, Moccia et al. [16] analyzed AMDR records from 297 gauging stations and found that the Fréchet distribution (EVII) provided the best fit. Their study also revealed significant differences in return periods and return levels compared to results from G, reversed Weibull (EVIII), Pareto, Lognormal, and Gamma distributions, echoing earlier findings by Koutsoyiannis [17,18]. The Gumbel (EVI), EVII, and EVIII distributions are all special cases of the Generalized Extreme Value (GEV) distribution, which has often been identified as the best-fitting model for AMDR samples in multiple regions [19,20,21,22,23,24,25,26,27,28]. The Log-Pearson Type III (LPIII) distribution has also proven effective in modeling AMDR series [29,30,31,32,33,34].
The Exponentiated Gumbel (EG) distribution was proposed by Nadarajah [35], who demonstrated its improved performance in climate modeling compared to the EVI and GEV distributions. Escalante-Sandoval [36] applied EG, exponentiated Weibull (EW), exponentiated Fréchet (EF), and their mixed forms to AMDR series from 19 Mexican stations, showing that exponentiated and mixed exponentiated distributions provide flexible and reliable alternatives for modeling extreme rainfall. More recently, Soleiman and Abdollahi [37] compared the G and EG distributions, concluding, based on AIC and BIC criteria, that EG offered a more flexible fit for hydrological datasets, corroborating earlier findings by Nadarajah [35].
Regional frequency analysis has also emerged as a practical approach to reduce uncertainties associated with short or incomplete rainfall records at gauged sites. In this context, multivariate joint estimation models have proven valuable, as they enhance the estimation of marginal distribution parameters and improve regional at-site estimates of return levels by incorporating information from neighboring sites within homogeneous regions. Bivariate approaches have shown promise in flood frequency analysis [38,39,40,41,42,43,44,45,46].
In multivariate flood modeling, copula-based approaches remain standard because they decouple marginal behavior from the joint structure and are widely used in hydrology [47,48,49,50,51,52,53]. By contrast, the multivariate logistic model provides a direct parametric description of dependence, avoids the need to select a copula family a priori, and can reduce modeling and computational burden while preserving the ability to capture joint behavior [54,55].
In this study, a bivariate distribution with Exponentiated Gumbel marginals (BEG) is proposed to improve the estimation of marginal parameters and corresponding quantiles. The performance of this model is compared with that of univariate probability distributions, namely G, GEV, and EG.
2. Materials and Methods
2.1. Study Area and Data
Mexico is characterized by a summer–rainy and winter–dry regime, except in the northwest. Annual precipitation ranges from <500 mm in the north and northwest to >2000 mm in the humid south and southeast. Two states that illustrate this climatic contrast are Coahuila, in northern Mexico, and Tabasco, in the southeast (Figure 1).
Coahuila (151,563 km²) has a semi-warm summer and cold winter climate, with a mean annual temperature of 20 °C (min. 4 °C, max. 30 °C). Rainfall is scarce, averaging ~400 mm yr⁻¹, concentrated in summer. Maximum daily rainfall varies between 11.3 and 453 mm.
Tabasco (25,267 km²) is predominantly warm-humid, with abundant summer rains (75.97%), year-round humid conditions (19.64%), and sub-humid summer rains (4.39%). The mean annual temperature is 27 °C (min. 18.5 °C, max. 36 °C). Rainfall occurs throughout the year, peaks from June to October, and averages 2550 mm yr⁻¹. Maximum daily rainfall ranges from 38.8 to 816.9 mm.
The dataset consists of records from 106 rain gauge stations in Coahuila and 75 stations in Tabasco, obtained from the National Water Commission (CONAGUA) [56]. The location of each station is detailed in Appendix A.
2.2. Delineation of Homogeneous Regions
The joint parameter estimation model requires that all samples belong to the same homogeneous region. In this study, regions were delineated using the Hosking–Wallis L-moments framework (L-CV, τ2; L-skewness, τ3; L-kurtosis, τ4) [57]. We assessed regional homogeneity via the H statistic.
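As an illustration of the quantities entering this delineation, the sketch below computes the sample L-moment ratios of a single series from unbiased probability-weighted moments. It is a minimal example, not the authors' implementation: the function name is hypothetical, and the H statistic itself (which requires a regional simulation step) is not reproduced.

```python
def sample_l_moment_ratios(values):
    """Return (L-CV, L-skewness, L-kurtosis) from unbiased PWM estimators."""
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = b1 = b2 = b3 = 0.0
    for i, xi in enumerate(x):  # i is the zero-based rank of xi
        b0 += xi
        b1 += xi * i / (n - 1)
        b2 += xi * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += xi * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = (b / n for b in (b0, b1, b2, b3))
    # First four sample L-moments as linear combinations of the PWMs
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l2 / l1, l3 / l2, l4 / l2
```

Stations whose ratios cluster together are natural candidates for a common region, which the homogeneity measure then tests formally.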
2.3. Univariate Distributions
We consider the G, GEV, and EG distributions as candidate marginals for AMDR. All parameters are estimated by maximum likelihood (MLE) under standard regularity conditions; detailed formulas and log-likelihoods are compiled in Appendix B. The Rosenbrock optimization algorithm for constrained variables [58] was used to estimate the univariate parameters by direct maximization of the corresponding log-likelihood functions.
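To make the maximized objective concrete, the following sketch evaluates an EG log-likelihood. It assumes the form F(x) = 1 − [1 − exp(−exp(−(x − μ)/σ))]^α attributed to Nadarajah; the parameterization in Appendix B may differ, and the function and argument names are illustrative. The optimizer is not shown: any constrained routine, including the Rosenbrock method cited above, could maximize this function.

```python
import math

def eg_log_likelihood(sample, mu, sigma, alpha):
    """Log-likelihood under the assumed EG form F = 1 - [1 - exp(-exp(-z))]^alpha."""
    if sigma <= 0 or alpha <= 0:
        return -math.inf  # outside the admissible parameter space
    total = 0.0
    for x in sample:
        z = (x - mu) / sigma
        e = math.exp(-z)
        # 1 - G(x) computed via expm1 to stay accurate when exp(-z) is tiny
        one_minus_g = -math.expm1(-e)
        # log f(x) = log(alpha/sigma) + (alpha - 1) log(1 - G) - z - exp(-z)
        total += (math.log(alpha / sigma)
                  + (alpha - 1.0) * math.log(one_minus_g)
                  - z - e)
    return total
```

Setting alpha = 1 recovers the Gumbel log-likelihood, which gives a quick consistency check between the G and EG fits.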
2.4. Bivariate Exponentiated Gumbel Distribution (BEG)
As already mentioned, multivariate extreme value distributions have been shown to be a reliable option for fitting hydrological variables. Yue and Wang [54] compared the mixed and logistic bivariate models and concluded that the logistic model (LM) is the best option in flood frequency analysis.
As outlined by Collali et al. [51], the LM is written as follows:
$$F(x, y) = \exp\left\{ -\left[ \left(-\ln F(x)\right)^{m} + \left(-\ln F(y)\right)^{m} \right]^{1/m} \right\}, \quad m \geq 1$$
where x and y denote the extreme events (e.g., annual maximum daily rainfall) gauged at a pair of neighboring sites, m is the association parameter, and F(x) and F(y) are the marginal distributions, specified here as Exponentiated Gumbel (EG). As in the univariate case, we jointly estimate the parameters using the Rosenbrock optimization algorithm under the requisite parameter constraints [58].
Appendix C presents the log-likelihood formulation and the parameter-estimation procedure in detail.
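The sketch below evaluates the joint distribution function, assuming the standard bivariate logistic form F(x, y) = exp{−[(−ln F(x))^m + (−ln F(y))^m]^(1/m)} with m ≥ 1 and the Nadarajah form of the EG marginal. Both forms and all names are assumptions made for illustration; the formulation in Appendix C is authoritative.

```python
import math

def eg_cdf(x, mu, sigma, alpha):
    """Assumed EG marginal: F(x) = 1 - [1 - exp(-exp(-z))]^alpha."""
    z = (x - mu) / sigma
    return 1.0 - (1.0 - math.exp(-math.exp(-z))) ** alpha

def logistic_joint_cdf(fx, fy, m):
    """Bivariate logistic model at marginal probabilities fx, fy in (0, 1)."""
    if m < 1:
        raise ValueError("association parameter m must be >= 1")
    s = (-math.log(fx)) ** m + (-math.log(fy)) ** m
    return math.exp(-(s ** (1.0 / m)))

def beg_joint_cdf(x, y, params_x, params_y, m):
    """params_x and params_y are (mu, sigma, alpha) tuples for each site."""
    return logistic_joint_cdf(eg_cdf(x, *params_x), eg_cdf(y, *params_y), m)
```

With m = 1 the joint probability reduces to the product of the marginals (independence), and as m grows it approaches the smaller of the two marginal probabilities (complete dependence).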
2.5. Selection of Best Fit
Model selection among the candidate distributions fitted to the AMDR series was based on the AICc and BIC information criteria.
The AIC was proposed by Akaike [59]:
$$\mathrm{AIC} = -2\ln L + 2p$$
while the BIC is [60]:
$$\mathrm{BIC} = -2\ln L + p\ln n$$
where ln L represents the maximized log-likelihood of the fitted distribution, p the number of maximum-likelihood estimates of the parameters, and n the length of record.
For small n, a corrected version of AIC is proposed as [59]:
$$\mathrm{AICc} = \mathrm{AIC} + \frac{2p(p+1)}{n-p-1}$$
The distribution with the lowest AICc is considered the best model; BIC is used to break ties when the difference in corrected AIC (ΔAICc) falls below the adopted tie threshold. Model preference is evaluated per station.
Here, AICc and BIC are computed from the same maximized log-likelihood for each candidate model.
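A minimal sketch of this selection rule is given below, using the standard definitions AIC = −2 ln L + 2p, BIC = −2 ln L + p ln n, and AICc = AIC + 2p(p + 1)/(n − p − 1). The argument tie_threshold is a hypothetical parameter standing in for the ΔAICc cut-off, and treating every model within that distance of the minimum AICc as tied is one possible reading of the rule.

```python
import math

def aic(loglik, p):
    return -2.0 * loglik + 2.0 * p

def aicc(loglik, p, n):
    if n - p - 1 <= 0:
        raise ValueError("AICc requires n > p + 1")
    return aic(loglik, p) + 2.0 * p * (p + 1) / (n - p - 1)

def bic(loglik, p, n):
    return -2.0 * loglik + p * math.log(n)

def select_model(fits, n, tie_threshold):
    """fits maps model name -> (maximized log-likelihood, number of parameters)."""
    scores = {name: (aicc(ll, p, n), bic(ll, p, n)) for name, (ll, p) in fits.items()}
    best_aicc = min(s[0] for s in scores.values())
    # Models within tie_threshold of the minimum AICc are treated as tied
    tied = [name for name, s in scores.items() if s[0] - best_aicc < tie_threshold]
    # Ties are broken by the lowest BIC
    return min(tied, key=lambda name: scores[name][1])
```

For a station with n = 30, calling select_model({"G": (-100.0, 2), "EG": (-99.5, 3)}, n=30, tie_threshold=2.0) compares a two-parameter and a three-parameter fit from their maximized log-likelihoods; the numbers here are invented for illustration.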
2.6. Reliability of Estimated Quantiles
It is very important to evaluate whether the joint estimation of BEG distribution provides more accurate and reliable marginal quantiles than univariate fits. This evaluation is essential in hydrological frequency analysis, since underestimation of quantiles can increase the risk of hydraulic structure failure, while overestimation may result in unnecessarily high construction costs. To assess reliability, quantile estimates obtained from the BEG model were compared with those from univariate distributions using two statistical criteria: bias (BIAS) and mean squared error (MSE). This framework allowed us to determine not only the accuracy of the estimated quantiles but also the extent to which the joint estimation procedure enhances the transfer of information across sites, thereby improving the robustness of extreme rainfall frequency analysis.
To formalize this evaluation, the reliability of estimated quantiles was expressed in terms of bias and mean squared error, which are defined as follows:
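Let $\hat{Q}_T$ be the marginal return level for the base station, where the marginal is EG, but its parameters are estimated jointly with a neighbor via the BEG likelihood, and let $Q_T$ be the corresponding population value.
For the number of simulated samples N:
$$\mathrm{BIAS} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{Q}_T^{(i)} - Q_T\right)$$
and
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{Q}_T^{(i)} - Q_T\right)^{2}$$
When estimating quantiles, it is desirable to have unbiased, minimum-MSE estimators.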
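The two criteria can be sketched as follows, assuming the usual definitions in which the error of each simulated estimate is taken as estimate minus population value. The function name and sign convention are assumptions for illustration.

```python
def bias_and_mse(estimates, true_value):
    """Return (BIAS, MSE) of simulated quantile estimates against the population quantile."""
    n_sim = len(estimates)
    if n_sim == 0:
        raise ValueError("at least one simulated estimate is required")
    errors = [q - true_value for q in estimates]
    bias = sum(errors) / n_sim
    mse = sum(e * e for e in errors) / n_sim
    return bias, mse
```

Because MSE equals the variance of the estimates plus the squared bias, an estimator can only reach a low MSE if it is both nearly unbiased and stable across samples.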
Bivariate Bootstrap for Uncertainty Quantification (BEG)
We quantified uncertainty with a nonparametric bivariate bootstrap aligned to the BEG likelihood. Within the common period (CP), we resampled year indices as pairs (Xt, Yt) with replacement to preserve the empirical cross-site dependence; non-overlap segments (before/after the CP) were resampled independently within each station. For each bootstrap replicate (B = 1000 per station–neighbor pair, unless noted), we re-estimated the BEG model (EG marginals and logistic model) by maximum likelihood and computed the base-station marginal return level. The 95% CIs were taken as the percentile interval [q0.025, q0.975] across replicates.
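The resampling scheme can be sketched as below. The example assumes each station's record is stored as a dictionary mapping year to annual maximum, which is a choice made here for illustration; the BEG re-estimation applied to each replicate is deliberately omitted, and the percentile interval uses linear interpolation between order statistics as one common convention.

```python
import random

def resample(values, rng):
    """Draw len(values) items with replacement; an empty input stays empty."""
    return rng.choices(values, k=len(values)) if values else []

def paired_bootstrap_replicate(base, neighbor, rng):
    """base, neighbor: dicts mapping year -> annual maximum daily rainfall."""
    common = sorted(set(base) & set(neighbor))
    # Common period: resample years once so each (X_t, Y_t) pair stays together
    pairs = [(base[y], neighbor[y]) for y in resample(common, rng)]
    # Non-overlap segments: resample independently within each station
    base_only = resample([base[y] for y in sorted(set(base) - set(neighbor))], rng)
    neighbor_only = resample([neighbor[y] for y in sorted(set(neighbor) - set(base))], rng)
    return pairs, base_only, neighbor_only

def percentile_interval(values, lower=0.025, upper=0.975):
    """Percentile interval with linear interpolation between order statistics."""
    s = sorted(values)
    def q(p):
        h = p * (len(s) - 1)
        lo = int(h)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (h - lo) * (s[hi] - s[lo])
    return q(lower), q(upper)
```

In use, each replicate would be passed to the joint fitting routine, the resulting return level stored, and percentile_interval applied to the stored values once all replicates are complete.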
3. Results
3.1. Quality Control Analysis
For each station, annual maximum daily rainfall values were analyzed. The data underwent a comprehensive quality control process, including outlier detection (Grubbs test, three-sigma rule), independence verification (Anderson–Mantilla–Amigó test), homogeneity assessment (Helmert, Student’s t, Cramer, Pettitt, Standard Normal, Buishand, and Von Neumann tests), and trend analysis (Spearman and Mann–Kendall tests).
Table 1 presents the results of this quality analysis for a selection of climatological stations in Coahuila.
3.2. Delineation of Homogeneous Regions
The rain gauge stations in Coahuila and Tabasco were classified into homogeneous regions by applying the L-moments delineation procedure previously described. Six regions in Coahuila and eight regions in Tabasco were ultimately retained (Appendix D), ensuring consistency with the adopted regionalization framework.
3.3. Univariate and Bivariate Frequency Analysis
The analysis of the Coahuila dataset considered all possible pairwise station combinations within each homogeneous region to evaluate the dependence structure of extreme rainfall.
Appendix E illustrates a representative example for station 5001, showing the BEG model combinations applied and the corresponding return level estimates.
It is important to clarify that return levels are computed from the base-station marginal EG distribution, where its parameters are estimated jointly with the neighbor through the BEG likelihood. Thus, although BEG is bivariate, the reported return levels are univariate design values for the base station.
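As a worked illustration, and assuming the Nadarajah form of the EG marginal, F(x) = 1 − [1 − exp(−exp(−(x − μ)/σ))]^α (the symbols μ, σ, α are introduced here for illustration; the parameterization in Appendix B is authoritative), the T-year return level follows from setting F(x_T) = 1 − 1/T:

```latex
\begin{aligned}
1 - \frac{1}{T} &= 1 - \left[ 1 - \exp\left( -e^{-(x_T - \mu)/\sigma} \right) \right]^{\alpha} \\
\exp\left( -e^{-(x_T - \mu)/\sigma} \right) &= 1 - T^{-1/\alpha} \\
x_T &= \mu - \sigma \ln\left\{ -\ln\left[ 1 - T^{-1/\alpha} \right] \right\}
\end{aligned}
```

For α = 1 this reduces to the Gumbel return level, x_T = μ − σ ln{−ln(1 − 1/T)}. Under BEG, the same expression would be evaluated with the base-station parameters obtained from the joint fit.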
When extended to the complete set of stations, the procedure enabled a systematic assessment of return levels using the BEG framework. For comparison, alternative univariate models (G, GEV, and EG) were also fitted at each station, and the resulting fits were ranked according to goodness-of-fit statistics, including the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). Notably, in 74% of the stations in Coahuila the best fit was achieved with the BEG distribution (Table 2), highlighting its reliability for modeling extreme rainfall in this region.
A similar procedure was applied in Tabasco, where all possible station pairs within each homogeneous region were analyzed. In this case, 65% of the stations achieved the best fit with the BEG distribution (Table 3), confirming the robustness of this approach under the markedly different climatic conditions of southeastern Mexico. Among the 181 total cases analyzed across 14 regions, the BEG distribution demonstrated the highest suitability, fitting 70% of the cases. This was followed by the G (12%), EG (10%), and GEV (8%) distributions. The BEG model showed particularly strong performance in regions 4 and 6 in Coahuila, accounting for 24 and 28 best fits, respectively.
3.4. Reliability of Estimated Quantiles
A data generation experiment was performed to assess whether the quantiles obtained through the bivariate joint estimation of parameters are more reliable than those obtained from the univariate fit.
For the EG distribution, data were generated using population parameters , and with sample sizes n = 10, 20 and 50. A total of 1000 simulated samples were considered for each n.
For the BEG distribution, quantiles were obtained by combining samples with sizes n1–n2: 10–10, 10–20, 20–20, 20–50, 50–50 and 50–100. Comparisons were performed for non-exceedance probabilities of 0.50, 0.80, 0.90, 0.95, 0.98 and 0.99. The associated site has population parameters , and .
Results (Table 4 and Table 5) indicate that as n2 increased relative to n1, both BIAS and MSE of the shorter series decreased. This demonstrates an effective transfer of information when parameters are jointly estimated, supporting the conclusion that quantiles computed using the BEG distribution are more reliable than those obtained from the univariate case. For this analysis, n1 represents the base station length and n2 the neighbor length/common period; a larger n2 emulates borrowing information.
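The univariate part of such an experiment can be sketched by inverse-transform sampling. The example assumes the Nadarajah form of the EG distribution, and the parameter values used in the usage note are hypothetical, since the population parameters are not reproduced here. Generating dependent pairs under the logistic model would need an additional step that is not shown.

```python
import math
import random

def eg_cdf(x, mu, sigma, alpha):
    """Assumed EG form: F(x) = 1 - [1 - exp(-exp(-z))]^alpha."""
    z = (x - mu) / sigma
    return 1.0 - (1.0 - math.exp(-math.exp(-z))) ** alpha

def eg_quantile(p, mu, sigma, alpha):
    """Inverse of eg_cdf for 0 < p < 1."""
    t = (1.0 - p) ** (1.0 / alpha)   # t equals 1 - G(x), with G the Gumbel CDF
    return mu - sigma * math.log(-math.log1p(-t))

def simulate_eg_samples(mu, sigma, alpha, n, n_sim, seed=0):
    """Return n_sim samples of length n drawn by inverse-transform sampling."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_sim):
        # Clamp uniforms away from 0 and 1 as a numerical safeguard
        u = [min(max(rng.random(), 1e-12), 1.0 - 1e-12) for _ in range(n)]
        samples.append([eg_quantile(p, mu, sigma, alpha) for p in u])
    return samples
```

For example, simulate_eg_samples(50.0, 15.0, 1.5, n=10, n_sim=1000) would produce 1000 short records from which quantile estimates, and then their BIAS and MSE against eg_quantile at the same non-exceedance probability, can be computed.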
3.5. Bivariate Bootstrap for Uncertainty Quantification (BEG)
We quantify uncertainty via bootstrap 95% Confidence Intervals for marginal return levels under BEG (joint estimation) and EG (univariate). For BEG we use paired resampling over the common period (CP) to preserve dependence and independent resampling within non-overlap segments. Each bootstrap replicate re-estimates parameters and recomputes the return levels. We denote 95% percentile confidence intervals (CIs) as [CIL, CIU], where CIL and CIU represent the lower and upper bounds of the interval, respectively.
For the illustrative bivariate combination (station 5001–station 5146), we used a nested bootstrap (1000 base resamples × 1000 neighbor resamples). Results are presented in Table 6. Bootstrap CIs are narrower under BEG than under EG, indicating reduced quantile uncertainty via information sharing.
These results demonstrate that the BEG distribution consistently outperforms classical models, providing the best fit in both the arid to semi-arid conditions of Coahuila and the humid tropical environment of Tabasco. This consistency across regions and evaluation metrics underscores the versatility of the BEG framework for regional frequency analysis in contrasting hydroclimatic settings.
4. Discussion
The results obtained in Coahuila and Tabasco demonstrate the advantages of adopting flexible probability models such as the Bivariate Exponentiated Gumbel (BEG) distribution for extreme rainfall analysis. In both regions, the BEG framework provided the best fit for a majority of stations—74% in Coahuila and 65% in Tabasco—when compared to the classical Gumbel, Generalized Extreme Value (GEV), and Exponentiated Gumbel distributions. This superior performance, confirmed by goodness-of-fit criteria including AIC and BIC, highlights the ability of the BEG model to capture the dependence structure of extreme events more effectively than univariate alternatives.
The simulation experiments reinforce this conclusion. By jointly estimating parameters across paired samples, the BEG approach substantially reduced bias and mean squared error in quantile estimation, particularly as one sample length increased relative to the other. This result demonstrates the effective transfer of information between series, a feature that is especially relevant in hydrological contexts where records are often short, fragmented, or incomplete. In contrast, univariate models produced higher estimation errors under identical conditions, confirming the greater robustness of BEG.
An important aspect of these findings is the consistency of the BEG performance across contrasting climatic contexts. In Coahuila, a predominantly arid to semi-arid region, rainfall extremes are generally short-lived and spatially heterogeneous, which complicates regionalization and frequency analysis. Conversely, Tabasco is located in a humid tropical environment where rainfall extremes are often more spatially extensive and influenced by large-scale atmospheric dynamics. Despite these marked hydroclimatic differences, the BEG distribution consistently outperformed traditional models, underscoring its robustness and adaptability.
The systematic evaluation of all possible station pairs within homogeneous regions further strengthens the validity of the results. This approach ensures that spatial dependence is explicitly considered, rather than assuming independence between stations—a limitation common in univariate frequency analysis. The strong performance of the BEG distribution suggests that bivariate or multivariate models may be more appropriate for regions with high spatial variability, particularly where water resource planning and hydraulic design require accurate estimation of joint extremes.
From a practical perspective, the improved fit obtained with the BEG distribution has direct implications for risk management and infrastructure design. Underestimation of return levels, especially for long return periods, can lead to inadequate sizing of hydraulic structures and increased vulnerability to extreme events. By providing more reliable estimates, the BEG framework contributes to reducing uncertainty in hydrological design, supporting more resilient adaptation strategies in the face of climate variability and change.
Finally, these results align with recent studies emphasizing the importance of moving beyond stationary univariate models in hydrology. The incorporation of flexible, bivariate approaches not only improves statistical performance but also offers a more realistic representation of rainfall extremes, particularly in regions with complex climatic dynamics. The outcomes from Coahuila and Tabasco therefore provide empirical evidence supporting the broader adoption of BEG-based regional frequency analysis in Mexico and comparable hydroclimatic contexts.
Limitations of the Proposed Model
Despite its advantages, the BEG (Bivariate Exponentiated Gumbel) distribution has several limitations that should be acknowledged. First, it assumes that both marginals follow the Exponentiated Gumbel form and that dependence between stations is adequately captured by the logistic parameter m; if either assumption is violated (e.g., heavy-tail behavior not well described by EG, or nonlinear/spatially anisotropic dependence not well represented by the logistic model), performance may degrade. Second, the method relies on the definition of homogeneous regions and on pairing gauges that are assumed to be hydrologically comparable; when regions are poorly defined or strongly affected by localized processes (orography, land use, tropical cyclones, convective cells), the “information transfer” from a long-record neighbor to a short-record base station may introduce bias rather than reduce it. Third, the framework is essentially stationary: it assumes that extremes come from a single, time-invariant process after quality control and trend screening. Long-term non-stationarity driven by climate variability or land-use change is not modeled explicitly, so extrapolated return levels at long return periods may still be optimistic or conservative. Fourth, BEG is pairwise by construction—parameters for a target station depend on the specific neighbor used in the joint fit—which means estimates can change if a different neighbor is selected, and the approach does not yet exploit more than two sites at once. Finally, while bootstrap confidence intervals help quantify uncertainty, we have not formally tested whether reductions in bias and mean squared error are statistically significant across all stations, so gains in reliability should be interpreted as practical rather than universally guaranteed.
5. Conclusions
This study applied the Bivariate Exponentiated Gumbel (BEG) distribution to extreme rainfall analysis in Coahuila and Tabasco, two Mexican states with contrasting climatic conditions. The main conclusions are as follows:
Across the stations analyzed, the BEG distribution provided the best fit in 74% of the cases in Coahuila and 65% in Tabasco, outperforming classical alternatives such as the Gumbel, Generalized Extreme Value (GEV), and Exponentiated Gumbel distributions. This performance was consistently confirmed by statistical indicators, including AIC and BIC.
Simulation experiments showed that BEG reduced bias and mean squared error when estimating quantiles, especially for short or heterogeneous samples. The joint estimation of parameters allowed effective transfer of information, resulting in more reliable quantiles than those obtained from univariate approaches.
The BEG model demonstrated strong adaptability in both arid to semi-arid conditions (Coahuila) and humid tropical conditions (Tabasco), underscoring its versatility as a tool for regional frequency analysis of extreme rainfall.
The systematic evaluation of all possible pairwise station combinations within homogeneous regions captured spatial dependence more effectively than univariate models. This methodological strength enhances the reliability of rainfall return level estimates, particularly for long return periods.
The improved accuracy of return level estimates obtained with the BEG distribution can reduce underestimation of design events, thereby contributing to safer and more resilient hydraulic infrastructure, as well as supporting climate adaptation planning in water management.
In summary, the BEG distribution proves to be a reliable and flexible alternative for regional extreme rainfall analysis in Mexico. Its demonstrated robustness suggests that it can be extended to other regions with similar hydroclimatic variability, providing a valuable tool for both scientific research and applied hydrological practice. Future research should explore the application of the BEG model under non-stationary conditions, test its performance in multivariate settings that include variables such as streamflow or temperature, and assess its potential for integration into climate change impact studies.