2.1. Study Area
Mexico covers almost 2 million km² and exhibits pronounced hydro-climatic contrasts, from arid deserts in the north to humid tropical basins in the southeast. This variability is reflected in the 37 RHS of Mexico, officially defined by the National Water Commission (CONAGUA, Mexico) [13], which serve as the basis for national water planning and management (Figure 1, Table 1). Each region integrates climatic, physiographic, and drainage-basin characteristics, providing a consistent framework for analyzing low-flow regimes and for interpreting the results in terms of practical water management needs.
To aid interpretation of national patterns, the 37 RHS were further grouped into four broader hydro-climatic classes used throughout the analysis: (i) northern and northwestern arid regions; (ii) central highlands and transition zones; (iii) Gulf of Mexico basins; and (iv) southern tropical basins. This grouping is process-based rather than purely geographic. It reflects dominant climate drivers (arid/semi-arid vs. humid tropical rainfall regimes), hydrologic controls on baseflow (groundwater-fed recession vs. perennial runoff), and water-use pressures (e.g., intensive regulation and abstraction in Lerma–Santiago and Balsas versus largely perennial, high-yield systems in the Grijalva–Usumacinta). Operationally, we delineated these groups using mean annual precipitation, persistence of dry-season flow, and known management stress (aquifer over-extraction, reservoir operation), and then aligned them with CONAGUA’s planning regions [13]. In other words, the classification links hydrologic behavior to the actual institutional units used for water allocation and drought planning.
The spatial hydro-climatic gradients that motivate this classification are illustrated in Figure 2, which shows mean annual precipitation alongside the regional boundaries in Figure 1. Rainfall in Mexico spans more than an order of magnitude: from below 300–400 mm yr⁻¹ in Baja California, Sonora, and Mapimí to above 2000 mm yr⁻¹ in the Grijalva–Usumacinta and Papaloapan basins of the southeast [13,14]. Temperature and evaporative demand follow an inverse pattern: northern and interior basins experience high potential evapotranspiration and strong summer drought stress, while humid tropical basins sustain high rainfall and perennial baseflow even through the dry season. As a result, streamflow regimes differ sharply across regions. In the northwest and interior plateau, many rivers are intermittent or strongly regulated; dry-season discharges can approach zero for weeks, and observed 7-day minimum averages (7Q) at individual stations often fall below 0.1–0.5 m³ s⁻¹. In contrast, tropical rivers in southern and Gulf-slope basins (e.g., Papaloapan, Grijalva–Usumacinta) are perennial; even during the low-flow season, 7Q values commonly remain one to two orders of magnitude higher than in arid basins. Transitional systems in central Mexico (e.g., Lerma–Santiago, Balsas) show mixed behavior: groundwater sustains some baseflow, but chronic over-extraction and reservoir operation can impose prolonged artificial low-flow periods during the dry season. These contrasts underline a key statistical challenge: no single unimodal probability distribution is likely to capture both (i) near-zero, drought-driven intermittence in semi-arid rivers and (ii) persistently positive perennial baseflows in humid tropical systems. This motivates testing flexible mixture distributions for national-scale low-flow frequency analysis.
As shown in Figure 1 and Figure 2, the 37 RHS of Mexico encompass a wide range of climatic and physiographic settings, from the arid basins of Baja California and the northern plateau to the humid tropical systems of the southeast. This spatial diversity justifies the regional grouping adopted in this study, whereby RHS were classified into (i) northern and northwestern arid regions, (ii) central highlands and transition zones, (iii) Gulf of Mexico basins, and (iv) southern tropical basins. Such grouping facilitates the interpretation of low-flow dynamics in relation to climate gradients, water use pressures, and management challenges across the country. Importantly, these sharp hydro-climatic contrasts highlight the limitations of single univariate models and motivate the use of Mixture Probability Distributions, which can better capture the diverse low-flow regimes observed across Mexico.
The northern and northwestern arid regions cover much of Baja California, Sonora, and the inland arid basins of northern Mexico, including Baja California Noroeste (RHS 1), Baja California Suroeste (RHS 3), Sonora Norte (RHS 8), Sonora Sur (RHS 9), Sinaloa (RHS 10), Presidio–San Pedro (RHS 11), Cuencas Cerradas del Norte (RHS 34), Mapimí (RHS 35), and El Salado (RHS 37). These areas are characterized by arid and semi-arid climates, with annual precipitation often below 400 mm and strong seasonality. Rivers are intermittent or highly regulated, and low flows are largely controlled by groundwater recession and reservoir operation. Droughts are recurrent, and evapotranspiration rates are among the highest in the country.
The central highlands and transition zones include Lerma–Santiago (RHS 12), Río Huicicila (RHS 13), Río Ameca (RHS 14), Costa de Jalisco (RHS 15), Armería–Coahuayana (RHS 16), Costa de Michoacán (RHS 17), Balsas (RHS 18), Costa Grande de Guerrero (RHS 19), Costa Chica de Guerrero (RHS 20), and Nazas–Aguanaval (RHS 36). These regions have temperate to semi-humid climates, with rainfall ranging from 700 to 1200 mm per year. The Lerma–Santiago basin (RHS 12) is heavily populated and industrialized, and together with the Balsas basin (RHS 18) represents one of the most water-stressed parts of the country due to aquifer overexploitation and high demand.
The Gulf of Mexico basins (Gulf slope) include Costa de Oaxaca (RHS 21), Tehuantepec (RHS 22), Costa de Chiapas (RHS 23), Bravo–Conchos (RHS 24), San Fernando–Soto La Marina (RHS 25), Pánuco (RHS 26), Norte de Veracruz (RHS 27), Papaloapan (RHS 28), and Coatzacoalcos (RHS 29). These regions are generally humid subtropical to tropical, with annual rainfall often exceeding 1500 mm. Rivers are perennial but exhibit sharp seasonal contrasts. They are important for hydropower, irrigation, and ecosystems, but low-flow droughts can still stress ecological flows and agricultural activities.
The southern tropical basins include the Grijalva–Usumacinta (RHS 30) and the three Yucatán regions (RHS 31–33). The Grijalva–Usumacinta system sustains some of the largest discharges in Mexico, with annual rainfall frequently above 2000 mm. By contrast, the Yucatán basins are karstic, with few surface rivers but extensive groundwater networks, making low-flow processes distinct and tied to aquifer dynamics.
This spatial diversity across 37 RHS explains why no single univariate distribution can adequately represent low-flow regimes at the national scale. Arid northern basins require flexible models to capture intermittent and drought-dominated patterns, while humid southern basins demand approaches that represent persistent perennial baseflows. Mixture Probability Distributions provide a natural solution to capture this variability, consistent with Mexico’s regionalized water management framework.
2.3. Probability Distributions for Low-Flow Frequency Analysis
To evaluate the statistical behavior of 7Q flows, both univariate and Mixture Probability Distributions were considered. The univariate case included four classical families: the three-parameter lognormal (LN3), three-parameter gamma (G3), Gumbel (G), and three-parameter Weibull (W3) distributions, whose probability density functions (PDFs) are defined by combinations of location, scale, and shape parameters. These models provide a straightforward framework for estimating low-flow quantiles, but they may lack flexibility when confronted with multimodal, highly skewed, or heterogeneous samples.
To address these limitations, two-component mixture distributions were also applied. A mixture model combines two distinct univariate distributions, each with its own parameter set, weighted by a mixing proportion “p” constrained between 0 and 1. Mathematically, the overall PDF is expressed as a convex combination of the component densities, enabling mixtures to capture complex low-flow regimes that cannot be adequately represented by a single distribution. This flexibility makes mixture models particularly suited for hydro-climatic contexts where diverse generating processes govern low-flow behavior.
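Written out, the convex combination described above takes the following form. The notation is assumed here for illustration: $f_1$ and $f_2$ denote the two component densities, and $\theta_1$ and $\theta_2$ their respective parameter vectors.

```latex
f(x;\,p,\theta_1,\theta_2) = p\,f_1(x;\theta_1) + (1-p)\,f_2(x;\theta_2),
\qquad 0 < p < 1
```

Because $f_1$ and $f_2$ each integrate to one and the weights sum to one, the mixture is itself a valid density. Setting $p$ to 0 or 1 recovers a single-component model.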
2.3.1. Univariate Distributions
Lognormal 3-parameter distribution (LN3) [3,5], defined by a location parameter, a scale parameter, and a shape parameter.
Gamma 3-parameter distribution (G3) [16,17], defined by a location parameter, a scale parameter, and a shape parameter.
Gumbel distribution (G) [18,19], defined by a location parameter and a scale parameter.
Weibull 3-parameter distribution (W3) [20,21], defined by a location parameter, a scale parameter, and a shape parameter.
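For reference, the four families are often written in the forms sketched below. The symbols are assumptions chosen for this illustration ($x_0$ or $\upsilon$ for location; $\mu_y$ or $\alpha$ for scale; $\sigma_y$ or $\beta$ for shape), and the Gumbel density is shown in its minimum-value form, which is only one of the two standard conventions. The parameterizations used in the cited references may differ.

```latex
% LN3: location x_0, scale \mu_y, shape \sigma_y
f(x) = \frac{1}{(x - x_0)\,\sigma_y\sqrt{2\pi}}
       \exp\left\{-\frac{1}{2}\left[\frac{\ln(x - x_0) - \mu_y}{\sigma_y}\right]^2\right\},
       \quad x > x_0

% G3: location x_0, scale \alpha, shape \beta
f(x) = \frac{1}{\alpha\,\Gamma(\beta)}
       \left(\frac{x - x_0}{\alpha}\right)^{\beta - 1}
       \exp\left(-\frac{x - x_0}{\alpha}\right),
       \quad x \ge x_0

% Gumbel (minimum-value form): location \upsilon, scale \alpha
f(x) = \frac{1}{\alpha}
       \exp\left[\frac{x - \upsilon}{\alpha} - \exp\left(\frac{x - \upsilon}{\alpha}\right)\right],
       \quad -\infty < x < \infty

% W3: location x_0, scale \alpha, shape \beta
f(x) = \frac{\beta}{\alpha}
       \left(\frac{x - x_0}{\alpha}\right)^{\beta - 1}
       \exp\left[-\left(\frac{x - x_0}{\alpha}\right)^{\beta}\right],
       \quad x \ge x_0
```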
2.3.2. Mixture Distributions
Annual minimum flows result from the progressive depletion of a basin’s water storage until discharge reaches its lowest level. In some rivers, this recession is primarily driven by evaporation, whereas in others it results from the combined effects of evaporation and the lack of rainfall-driven recharge [22]. When such distinct mechanisms produce events that belong to different subpopulations, their combined behavior can be represented by a model that accounts for both groups simultaneously. This approach is formalized through mixture distributions [12,23], also known as blended distributions [23]:

$$F(x) = p\,F_1(x) + (1 - p)\,F_2(x)$$

where $F(x)$ is the cumulative distribution function (CDF) of the mixture, $F_1(x)$ and $F_2(x)$ are the component CDFs, and $p$ is the mixing proportion that determines the relative contribution of each component.
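As a minimal sketch of how such a mixture could be evaluated in practice, the Python example below builds a two-component CDF and inverts it numerically to obtain a low-flow quantile. Several things here are assumptions made for illustration and are not taken from the study: the W3 parameterization, the component parameters, the mixing proportion `P_MIX`, the bisection bracket, and the convention that a T-year low flow corresponds to a non-exceedance probability of 1/T.

```python
import math


def weibull3_cdf(x, loc, scale, shape):
    """Three-parameter Weibull CDF (assumed form); zero at or below loc."""
    if x <= loc:
        return 0.0
    return 1.0 - math.exp(-(((x - loc) / scale) ** shape))


def mixture_cdf(x, p, cdf1, cdf2):
    """Two-component mixture: F(x) = p*F1(x) + (1 - p)*F2(x)."""
    return p * cdf1(x) + (1.0 - p) * cdf2(x)


def mixture_quantile(prob, p, cdf1, cdf2, lo, hi, tol=1e-10):
    """Solve F(x) = prob by bisection on the bracket [lo, hi]."""
    if not mixture_cdf(lo, p, cdf1, cdf2) <= prob <= mixture_cdf(hi, p, cdf1, cdf2):
        raise ValueError("bracket [lo, hi] does not contain the requested quantile")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, p, cdf1, cdf2) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical W3-W3 mixture: a near-zero (intermittent) component and a
# sustained-baseflow component. All parameter values are illustrative only.
def low_component(x):
    return weibull3_cdf(x, loc=0.0, scale=0.5, shape=1.2)


def high_component(x):
    return weibull3_cdf(x, loc=1.0, scale=10.0, shape=2.0)


P_MIX = 0.4  # assumed weight of the near-zero component

# 10-year low flow, taken here as the quantile with non-exceedance probability 0.1
q10 = mixture_quantile(0.1, P_MIX, low_component, high_component, lo=0.0, hi=100.0)
print(f"7Q10 estimate: {q10:.4f} m3/s")
```

Bisection is used because a mixture CDF generally has no closed-form inverse. It is slow but robust as long as the bracket contains the target quantile.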
2.5. Goodness-of-Fit Tests and Decision Criterion
The adequacy of each candidate distribution is evaluated using two empirical distribution function (EDF) tests: the Kolmogorov–Smirnov (K–S) and Anderson–Darling (A–D) statistics [25,26,27,28,29,30,31,32]. Both compare the empirical cumulative distribution function (ECDF) of the sample with the fitted theoretical cumulative distribution function $F(x)$, but they weight deviations differently, a distinction that is particularly relevant in extreme-value applications, where tail behavior is critical.
2.5.1. Kolmogorov–Smirnov Test
Let $F(x)$ be a continuous candidate CDF and $x_1, \ldots, x_n$ a sample of size $n$ with empirical distribution function:

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}$$

The K–S statistic is defined as the maximum vertical distance between $F_n(x)$ and $F(x)$:

$$D_n = \sup_x \left| F_n(x) - F(x) \right|$$

The null hypothesis H0 states that the data are an i.i.d. sample from $F$. Large values of $D_n$ indicate poor agreement between the ECDF and the fitted CDF. Critical values and p-values are obtained from the (asymptotic) sampling distribution of $D_n$ under H0, with appropriate adjustments when parameters are estimated from the data. The K–S test is a global goodness-of-fit (GOF) measure and is most sensitive near the center of the distribution, with comparatively less sensitivity in the tails, which is an important limitation in hydrological and extreme-value analyses.
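The K–S statistic can be computed directly from the sorted sample, because the supremum is always attained at a data point, just before or just after a jump of the ECDF. The sketch below illustrates this with a hypothetical sample and an exponential CDF chosen only for demonstration. It returns the statistic only and does not attempt the p-value adjustment needed when parameters are estimated from the same data.

```python
import math


def ks_statistic(sample, cdf):
    """Maximum vertical distance between the ECDF and a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The ECDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


# Hypothetical 7Q sample (m3/s) and an assumed exponential candidate CDF
flows = [0.12, 0.35, 0.48, 0.80, 1.10, 1.65, 2.40]


def exponential_cdf(x, mean=1.0):
    return 1.0 - math.exp(-x / mean)


print(f"D_n = {ks_statistic(flows, exponential_cdf):.4f}")
```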
2.5.2. Anderson–Darling Test
The Anderson–Darling test refines EDF-based GOF assessment by assigning greater weight to discrepancies in the distribution tails. For a continuous CDF $F(x)$ and ordered sample $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$, define

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F\left(x_{(i)}\right) + \ln\left(1 - F\left(x_{(n+1-i)}\right)\right)\right]$$

Under H0, the data are i.i.d. from $F$, and large values of $A^2$ indicate poor agreement between the sample and the fitted distribution. For many distributions, the A–D test has higher power than the K–S test to detect subtle but important deviations, especially in the tails, which is crucial for modeling extreme rainfall, floods, droughts, and wind speeds.
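A minimal sketch of the A–D computation follows, using the standard computational form of the statistic. The sample values and the exponential candidate CDF are hypothetical. Critical values depend on the fitted family and on whether parameters were estimated, so the sketch returns only the statistic.

```python
import math


def ad_statistic(sample, cdf):
    """Anderson-Darling A^2 for a fully specified continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    u = [cdf(x) for x in xs]
    if any(v <= 0.0 or v >= 1.0 for v in u):
        raise ValueError("CDF values must lie strictly between 0 and 1")
    total = 0.0
    for i in range(1, n + 1):
        # u[i - 1] is F(x_(i)); u[n - i] is F(x_(n + 1 - i))
        total += (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
    return -n - total / n


# Hypothetical 7Q sample (m3/s) and an assumed exponential candidate CDF
flows = [0.12, 0.35, 0.48, 0.80, 1.10, 1.65, 2.40]


def exponential_cdf(x, mean=1.0):
    return 1.0 - math.exp(-x / mean)


print(f"A^2 = {ad_statistic(flows, exponential_cdf):.4f}")
```

The logarithms grow without bound as $F$ approaches 0 or 1, which is what gives the statistic its extra sensitivity to the tails.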
2.5.3. Integrated Decision Criterion for Model Selection
K–S and A–D tests are applied to each candidate distribution (e.g., LN3, G3, G, W3, G–G, G–W3, W3–G, W3–W3) using the fitted CDFs with parameters estimated from the sample. Goodness-of-fit results are combined through the following hierarchical decision rule:
- (a) At significance level α = 0.05, a model is considered acceptable if its p-value > α.
- (b) Among acceptable models, preference is given to those with smaller K–S and A–D statistics and larger p-values (especially from the A–D test).
- (c) Because of its emphasis on tail discrepancies, the A–D test is taken as the primary GOF indicator for extreme-value applications and design return levels.
- (d) If competing models exhibit similar A–D performance, the model with the smaller K–S statistic is preferred, reflecting better overall agreement with the ECDF.
- (e) If all models are rejected by at least one test, or if GOF statistics are very similar, the final choice is based on likelihood-based criteria (AIC, BIC [33,34]; lower values indicating a better balance between fit and parsimony) together with graphical diagnostics (P–P and Q–Q plots). In such cases, the selected distribution is explicitly reported as the least inadequate rather than formally adequate.
- (f) If a model passes the K–S test but fails the A–D test, this typically indicates an adequate global shape but poor tail representation; such models are treated with caution or discarded for extreme-quantile estimation.
- (g) If a model passes the A–D test and the K–S result is borderline, and likelihood-based criteria and tail-focused plots are satisfactory, the model may still be accepted for engineering purposes, prioritizing correct tail behavior.
This combined framework exploits the complementary strengths of the K–S and A–D tests and aligns the selection criterion with the primary objective of reliable estimation of extreme quantiles.
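One possible way to encode the core of this rule is sketched below. It is an illustrative simplification, not the procedure used in the study. It treats a model as acceptable only if it passes both tests, interprets "similar A–D performance" through a hypothetical tolerance `ad_tol`, and falls back to AIC when no model is acceptable. The judgment-based steps (borderline K–S results, P–P and Q–Q plots) are left out. The class name, field names, and numerical values are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class GofResult:
    name: str
    ks_stat: float
    ks_p: float
    ad_stat: float
    ad_p: float
    aic: float


def aic(n_params, log_likelihood):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(n_params, n_obs, log_likelihood):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_model(results, alpha=0.05, ad_tol=0.05):
    """Return (chosen result, status) following rules (a)-(e) in simplified form."""
    # (a) keep models not rejected by either test at level alpha
    acceptable = [r for r in results if r.ks_p > alpha and r.ad_p > alpha]
    if not acceptable:
        # (e) every model rejected by at least one test: fall back to AIC
        return min(results, key=lambda r: r.aic), "least inadequate"
    # (b), (c) the A-D statistic is the primary indicator
    best_ad = min(r.ad_stat for r in acceptable)
    # (d) near-ties in A-D (within the assumed tolerance) are broken by K-S, then AIC
    tied = [r for r in acceptable if r.ad_stat - best_ad <= ad_tol]
    return min(tied, key=lambda r: (r.ks_stat, r.aic)), "acceptable"


# Hypothetical results for one station; the numbers are made up for illustration
candidates = [
    GofResult("LN3", ks_stat=0.12, ks_p=0.30, ad_stat=0.80, ad_p=0.04, aic=210.0),
    GofResult("W3", ks_stat=0.10, ks_p=0.45, ad_stat=0.40, ad_p=0.30, aic=205.0),
    GofResult("G-W3", ks_stat=0.08, ks_p=0.60, ad_stat=0.42, ad_p=0.28, aic=207.0),
]
chosen, status = select_model(candidates)
print(chosen.name, status)
```

Keeping the status string separate from the chosen model makes it straightforward to report a fallback selection as the least inadequate option rather than as a formally adequate fit.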