Statistical Development of Rainfall IDF Curves and Machine Learning-Based Bias Assessment: A Case Study of Wadi Al-Rummah, Saudi Arabia

Alhbib, Ibrahim T.; Elsebaie, Ibrahim H.; Alhathloul, Saleh H.

doi:10.3390/hydrology13030096


by Ibrahim T. Alhbib *, Ibrahim H. Elsebaie and Saleh H. Alhathloul

Civil Engineering Department, King Saud University, P.O. Box 800, Riyadh 11421, Saudi Arabia

* Author to whom correspondence should be addressed.

Hydrology 2026, 13(3), 96; https://doi.org/10.3390/hydrology13030096

Submission received: 8 February 2026 / Revised: 12 March 2026 / Accepted: 12 March 2026 / Published: 16 March 2026

(This article belongs to the Section Statistical Hydrology)


Abstract

Reliable estimation of extreme rainfall is essential for hydraulic design and flood risk mitigation, particularly in arid regions where rainfall exhibits strong temporal and spatial variability. This study presents a statistical framework for developing rainfall intensity-duration-frequency (IDF) curves, complemented by a machine learning-based assessment of model bias and performance. The analysis was conducted using data from ten rainfall stations located within or near the Wadi Al-Rummah Basin. Annual maximum series (AMS) from 1969 to 2024 were first reconstructed to address missing years using a modified normal ratio method (NRM) combined with nearest-station selection, ensuring spatial consistency while preserving station-specific rainfall characteristics. Six probability distributions (Weibull, Gumbel, gamma, lognormal, generalized extreme value (GEV), and generalized Pareto) were fitted to each station, and the best-fit distribution was identified using multiple goodness-of-fit (GOF) criteria, including the Kolmogorov–Smirnov (K-S) test, Anderson–Darling (A-D) test, root mean square error (RMSE), chi-square (χ²) statistic, Akaike information criterion (AIC), Bayesian information criterion (BIC), and the coefficient of determination (R²). Statistical IDF curves were then developed for durations ranging from 5 to 1440 min and return periods from 2 to 1000 years. To evaluate the robustness of the statistically derived IDF curves, three machine learning (ML) models, multiple linear regression (MLR), regression random forest (RRF), and multilayer feed-forward neural network (MFFNN), were trained as surrogate models using duration, return period, and station geographic attributes as predictor variables. Model performance was evaluated using RMSE, MAE, and mean bias metrics across stations and return periods. The lognormal distribution emerged as the best-fit model for four stations, while the Gumbel and gamma distributions were selected for two stations each. 
Overall, no single probability distribution consistently outperformed others, indicating station-dependent behavior. Among the machine learning models, the MFFNN achieved the closest agreement with statistical IDF estimates (RMSE ≈ 0.97, MAE ≈ 0.65, bias ≈ −0.02), followed by RRF and MLR based on global average performance across all stations and return periods. The proposed framework offers a reliable approach for rainfall IDF development and evaluation in arid region watersheds.

Keywords:

rainfall extremes; intensity-duration-frequency (IDF) curves; annual maximum series (AMS); statistical frequency analysis; goodness-of-fit analysis; machine learning; bias assessment; arid regions; Wadi Al-Rummah Basin

1. Introduction

Extreme rainfall events represent one of the primary drivers of flood hazards and infrastructure stress in arid basins [1,2,3]. Their infrequent yet intense nature introduces substantial uncertainty into hydraulic design and water resources planning. In such environments, rainfall events are typically short in duration but high in intensity, frequently triggering flash floods that challenge infrastructure resilience and flood risk mitigation strategies [4,5].

Intensity-duration-frequency (IDF) curves provide a practical and widely adopted framework for describing the probabilistic behavior of extreme rainfall and are commonly used in the design of urban drainage systems, culverts, detention facilities, and flood mitigation structures [1,6,7]. Conventional IDF development is based on statistical frequency analysis of annual maximum series (AMS), in which suitable probability distributions are fitted to observed rainfall extremes and extrapolated to selected return periods. Despite its widespread use, this approach is sensitive to data completeness, record length, and the choice of probability distribution, particularly when estimating design rainfall for long return periods [8,9,10].

In arid regions, rainfall records frequently contain missing years, short observation periods, or inconsistencies among nearby stations due to sparse gauge networks and operational constraints. Addressing missing data prior to frequency analysis is a critical step, especially when extreme values are of interest. Previous studies have demonstrated that inappropriate infilling techniques may distort the statistical properties of rainfall extremes and introduce systematic bias into IDF estimates [11,12,13]. Consequently, reconstruction approaches that preserve spatial coherence while maintaining station-specific rainfall characteristics are essential for defensible IDF development.

Another key challenge in rainfall frequency analysis is the selection of an appropriate probability distribution. No single distribution can be presumed to perform best across all climates and rainfall patterns. Comparative evaluation using multiple goodness-of-fit (GOF) criteria is therefore widely recommended to reduce subjectivity in model selection and to improve the robustness of design rainfall estimates [14,15,16].

In parallel with traditional statistical approaches, machine learning (ML) techniques have gained increasing attention in hydrology as flexible data-driven tools capable of capturing nonlinear relationships between rainfall characteristics, temporal scales, and geographic controls [17,18,19,20]. Several studies have demonstrated that machine learning models, particularly artificial neural networks and ensemble-based learners, can reproduce rainfall extremes and short-duration intensity patterns with strong predictive power, often achieving coefficients of determination exceeding 0.90 when applied to IDF-related analyses [20,21]. More recent investigations extended these approaches by employing advanced machine learning and deep learning architectures to construct IDF curves directly from satellite-derived or climate-projected precipitation data, reporting improved performance over conventional Gumbel-based formulations, especially for higher return periods where extrapolation uncertainty beyond the observed range is pronounced [22,23]. While linear regression models are commonly retained as easy-to-interpret baselines, nonlinear learners such as neural networks and tree-based methods have generally been found to provide superior predictive skill under complex rainfall patterns [17,18,21,22,23].

Rather than replacing classical frequency analysis, machine learning models have been recognized as independent diagnostic tools for examining agreement patterns and identifying systematic deviations between data-driven predictions and statistically derived IDF curves [18,19,24]. Unlike most existing studies that employ machine learning as a direct predictive substitute for statistical IDF modeling, often within climate change impact assessments, the present study adopts machine learning as an independent evaluation framework to detect bias and assess the reliability of statistically derived IDF curves, rather than to generate IDF relationships directly.

Hydraulic structures within arid basins are fundamentally designed based on IDF relationships, which directly govern design floods, spillway sizing, culvert capacities, and overall safety margins. The Wadi Al-Rummah Basin contains numerous hydraulic crossings, including culverts and bridges, in addition to 39 dams constructed between 1977 and 2018. The storage capacities of these dams range from 60,000 m³ to 3,000,000 m³, reflecting varying design objectives and hydrological conditions. Notably, the Wadi Al-Rummah Dam, constructed in 1977 and the oldest dam in the basin, has a storage capacity of 1.5 million m³ and is located in the downstream reach of Wadi Al-Rummah. These characteristics highlight the importance of reliable IDF curve development to ensure the adequacy and long-term safety of flood control infrastructure within the basin [25].

The main objective of this study is to develop station-specific IDF curves for the Wadi Al-Rummah Basin through a comprehensive framework that integrates data reconstruction, statistical modeling, and machine learning-based evaluation. Specifically, missing rainfall records are reconstructed using a distance-informed normal ratio method (NRM), and optimal probability distributions are objectively identified through multi-criteria performance ranking. The resulting statistical IDF estimates are subsequently compared with machine learning-based surrogate predictions to provide an additional layer of diagnostic consistency assessment.

2. Study Area

Al-Rammah, meaning a valley that swallows everything, describes how Wadi Al-Rummah floods sweep debris, animals and even humans along their path [26,27,28]. Wadi Al-Rummah Basin is the largest wadi system in the Arabian Peninsula and represents a typical hyper-arid to arid region watershed characterized by strong rainfall intermittency and pronounced spatial heterogeneity. The basin spans more than 500 km from the volcanic highlands near Al-Madinah region, passing through Hail and Qassim, and terminating within the Al-Thuwayrat sand dunes in central Saudi Arabia [28,29,30,31,32,33]. The basin receives runoff from approximately 320–600 tributaries, dominated by major wadis such as Wadi Al-Jarir and Wadi Al-Shuabah [26,27,28,32]. It crosses the Arabian Shield, composed of igneous and metamorphic rocks that promote high runoff, and the Arabian Shelf, where sedimentary formations such as the Al-Saq Aquifer dominate [28,33]. The present downstream termination of Wadi Al-Rummah is commonly identified near the coordinates (26.648° N, 44.307° E) according to the global watershed dataset [31]. Geological and historical evidence suggests that the Wadi Al-Rummah system was much larger in the past. During wetter climatic periods approximately 10,000 years ago, the drainage system extended farther northeast before progressive aridification and the eastward migration of the Al-Thuwayrat and Al-Dahna sand dunes blocked the original river pathway. This process divided the ancient channel into three major segments: Wadi Al-Rummah, Wadi Al-Ajradi, and Wadi Al-Batin, the latter eventually draining toward the Shatt Al-Arab near Basra city [28,34,35]. Collectively, these connected valleys once formed what has been described as the Great Valley of the Arabian Peninsula, also referred to as the Great River, considered the longest dry river system in the world with an estimated total length of 950–1209 km [28,34]. The basin covers an estimated area of about 94,900 km² and drains a large arid catchment characterized by low mean annual rainfall and strong spatial rainfall variability [31].

Topographic characteristics were derived from the Copernicus Digital Elevation Model (DEM) at 30 m spatial resolution. Copernicus DEM has demonstrated high vertical accuracy in regional assessments, outperforming other commonly used datasets such as SRTM and ASTER [36,37]. Accordingly, it was adopted in this study for basin delineation and terrain analysis as illustrated in Figure 1a,b. Figure 1a presents the delineated drainage network of the Wadi Al-Rummah Basin derived using the Strahler stream ordering method, which classifies channels according to the hierarchical structure of the river network. Figure 1b illustrates the spatial distribution of elevations within the basin, where terrain elevations range approximately from 562 m to 1938 m above sea level, reflecting the gradual topographic transition.

Hydrologically, Wadi Al-Rummah exhibits a seasonal flow pattern, with surface runoff occurring primarily during short-duration, high-intensity rainfall events. Although such events are infrequent, they can generate significant flash floods that pose substantial risks to infrastructure and land use across arid region basins, including Wadi Al-Rummah [16], consistent with its documented seasonal flow pattern [29]. Recent observations, including rare multiple flow events within a single year that occurred in 2023, further highlight the sensitivity of the basin to extreme rainfall and emphasize the need for reliable characterization of rainfall extremes [30].

Figure 2 presents the long-term monthly climatology derived from 93 rainfall stations across Wadi Al-Rummah Basin during the period 1963–2024. The results reveal a pronounced seasonal concentration of rainfall between October and April, while precipitation during the summer months (June–September) remains minimal. This seasonal concentration is characteristic of convective-dominated rainfall regimes in arid and semi-arid environments, which tend to produce short-duration, high-intensity storm events. Such climatological behavior partially explains the positive skewness observed in the annual maximum series used for IDF development.

Mean annual precipitation across the ten representative stations ranges from approximately 46 to 105 mm/year, reflecting the generally arid climatic setting of the Wadi Al-Rummah Basin. Nevertheless, maximum observed daily rainfall reaches 53–108 mm at several stations, indicating that a substantial portion of the annual rainfall may occur during a single intense storm event. Recent basin-specific research defined extreme precipitation events (EPEs) in the Wadi Al-Rummah Basin using a threshold of 22.5 mm, corresponding to the 80th percentile of a 3-day rolling GPM/IMERG rainfall accumulation (2000–2024) and aligned with documented flood-triggering events [38]. The fact that several observed daily extremes in the present dataset substantially exceed this threshold further confirms the dominance of short-duration, high-intensity rainfall in the basin. Such concentration of rainfall into intense events is consistent with documented extreme rainfall behavior in arid and semi-arid environments and contributes to positively skewed annual maximum rainfall series used for IDF development [1,8,15].

3. Data and Methods

This section summarizes the adopted methods. AMS were first extracted for each station. Missing AMS years were then reconstructed using a modified NRM based on the nearest available station. Frequency analysis and GOF-based ranking were subsequently used to select the best-fitting distributions and construct station-specific IDF curves, which were then evaluated using machine learning models for bias assessment, as shown in Figure 3.

3.1. Rainfall Data and Station Network

This study is based on rainfall observations from ten meteorological stations distributed within and around the Wadi Al-Rummah Basin, all of which have continuous records exceeding 30 years and were therefore deemed suitable for extreme rainfall frequency analysis, as shown in Figure 4.

Among the external stations, MDRO00166 and HIRM00388 are situated near the basin boundary and exert spatial influence on basin rainfall patterns, as identified using the Thiessen polygon method. These polygons were generated to illustrate the relative spatial influence of each station within the basin. The remaining three outside stations were used exclusively to support missing data reconstruction for stations located inside or near the basin. In particular, stations MDRO00194 and MDRM00212, which are located farther from the basin, were included solely to support the reconstruction of missing records for station MDRO00166, consistent with standard practices for rainfall infilling in data-sparse arid regions [11,12].

Station elevations range from 185 m to 976 m above mean sea level. Geographic coordinates and other detailed station attributes are summarized in Table 1.

Figure 5 illustrates the interannual variation in total annual rainfall across all stations during 1969–2024. The boxplots summarize rainfall distributions at each station, where the median, mean (×), interquartile range (Q1–Q3), and whiskers indicate the spread of non-outlier values, while points above the whiskers represent unusually wet years. The results reveal clear spatial differences in rainfall variability across the Wadi Al-Rummah Basin and indicate the occurrence of occasional wet years within an otherwise predominantly arid climate.

3.2. Missing Data Reconstruction

Rainfall records in arid regions are often affected by missing observations due to sparse gauge networks, operational interruptions, and data quality issues. Initial data processing involved organizing rainfall records and extracting AMS for each station and each year from the beginning of 1969 to the end of 2024. Missing records were then identified within the AMS, as shown in Table 2, and reconstructed prior to subsequent statistical frequency analysis and machine learning-based evaluation, ensuring internally consistent extreme rainfall series. Because extreme rainfall values play a critical role in IDF development, missing rainfall values in this study were reconstructed using an NRM based on the closest station only, as shown in Equation (1).

Inter-station correlations across the network are generally moderate; the geographically nearest station to each target typically exhibited the strongest rainfall similarity within the dataset (e.g., $r = 0.577$ between MDRM00212 and MDRO00194). These correlations were computed using the annual maximum daily rainfall series (AMS) at each station. A regression analysis between the log-transformed inter-station distance and the absolute correlation coefficient ($|r|$) indicated a statistically significant negative relationship, meaning that as the distance between stations increases, the similarity in extreme rainfall behavior decreases. The estimated slope ($\beta$) was $-0.349$, with a 95% confidence interval ranging from $-0.457$ to $-0.241$, and a p-value less than 0.001. The model explained approximately 50% of the variability in rainfall similarity ($R^{2} = 0.5$). Together, these findings provide quantitative evidence supporting the selection of nearby stations for missing-data reconstruction under the normal ratio framework.

To comprehensively evaluate the effectiveness of this approach, a systematic cross-validation framework was conducted across the ten rainfall stations selected for this study. Artificial data removal (20% of available years) was performed for each station [11,12,13,39], and the removed values were reconstructed using four competing approaches: (i) arithmetic mean method (AMM), (ii) inverse distance weighting (IDW), (iii) conventional NRM, and (iv) NRM combined with nearest-station selection. Performance was evaluated using Pearson’s correlation coefficient ($|r|$), root mean square error (RMSE), and mean absolute error (MAE). The comparative results for all ten stations are summarized in Table 3.
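The hold-out step of this validation can be illustrated with a minimal sketch. It assumes that the withheld years are drawn uniformly at random with a fixed seed; the function names, the seed, and the example years are hypothetical and are not taken from the study.

```python
import math
import random

def holdout_years(years, fraction=0.20, seed=0):
    """Pick a reproducible subset of observed years to hide for validation."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(years)))
    return sorted(rng.sample(list(years), k))

def mae(observed, estimated):
    """Mean absolute error between withheld and reconstructed values."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical example: 50 observed years, 20% of them withheld.
years = list(range(1975, 2025))
hidden = holdout_years(years)
```

Each candidate infilling method would then be applied to the withheld years only, and the reconstructed values compared with the hidden observations using these metrics.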

Unlike direct interpolation approaches applied to raw time series, the adopted reconstruction strategy explicitly incorporated both temporal and spatial considerations. Spatial closeness between stations was accounted for through a distance matrix (Table 4).

The adopted approach differs from inverse distance or multi-station averaging methods in that it prioritizes the nearest station rather than combining multiple neighboring stations. This conservative strategy was selected to minimize excessive smoothing and to preserve the statistical structure of extreme rainfall events, which is essential for subsequent AMS-based frequency analysis and IDF curve construction. Similar closeness-based and NRM approaches have been shown to perform reliably in arid regions when extreme rainfall preservation is required [11,12]. The nearest-station NRM is expressed in Equation (1):

$$P_{x} = P_{r} \times \left(\frac{N_{x}}{N_{r}}\right) \tag{1}$$

where $P_{x}$ is the estimated rainfall (mm) at the target station $x$, i.e., the missing value; $P_{r}$ is the observed rainfall at the selected nearest reference station, with the subscript $r$ denoting the single nearest station selected based on minimum spatial distance; $N_{x}$ is the long-term mean rainfall at the target station; and $N_{r}$ is the long-term mean rainfall at the selected reference station.
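As an illustration of Equation (1), the following sketch selects the nearest reference station and scales its observation by the ratio of long-term means. The station IDs, coordinates, and rainfall values are hypothetical, and a planar distance on projected coordinates is assumed here in place of the distance matrix of Table 4.

```python
import math

def nearest_station(target, coords):
    """Return the ID of the station closest to `target` (planar distance)."""
    tx, ty = coords[target]
    others = (s for s in coords if s != target)
    return min(others, key=lambda s: math.hypot(coords[s][0] - tx, coords[s][1] - ty))

def nrm_nearest(p_ref, mean_target, mean_ref):
    """Equation (1): P_x = P_r * (N_x / N_r)."""
    return p_ref * (mean_target / mean_ref)

# Hypothetical stations: projected coordinates (km) and long-term means (mm).
coords = {"A": (0.0, 0.0), "B": (30.0, 40.0), "C": (6.0, 8.0)}
long_term_mean = {"A": 20.0, "B": 35.0, "C": 25.0}

ref = nearest_station("A", coords)  # C is 10 km from A, B is 50 km from A
p_x = nrm_nearest(p_ref=30.0,
                  mean_target=long_term_mean["A"],
                  mean_ref=long_term_mean[ref])
```

In this toy case the reference value of 30 mm is scaled by 20/25, so the infilled value at the target station is about 24 mm.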

To explicitly clarify the extraction of the AMS and the reconstruction of missing years, a detailed station-level demonstration was conducted for two representative stations: QARS00249 (Qassim region) and MDRO00166 (Al-Madinah region).

First, the AMS was extracted directly from the daily rainfall records by identifying the maximum daily rainfall value for each hydrological year. The extracted annual maxima are presented alongside the full daily observations to visually demonstrate how each yearly extreme originates from the raw dataset, as shown in Figure 6 and Figure 7.
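The extraction step can be sketched as follows, assuming that each daily record already carries the label of the hydrological year it belongs to and that missing observations are stored as `None`. The record layout and the values are hypothetical.

```python
def annual_maximum_series(daily_records):
    """Return {year: maximum daily rainfall (mm)} from (year, rainfall_mm) pairs.

    Missing observations (None) are skipped, so a year without any valid
    record is absent from the result and must be reconstructed separately.
    """
    ams = {}
    for year, rain_mm in daily_records:
        if rain_mm is None:
            continue
        if year not in ams or rain_mm > ams[year]:
            ams[year] = rain_mm
    return dict(sorted(ams.items()))

# Hypothetical daily records, not data from the study.
records = [(1969, 0.0), (1969, 12.4), (1969, 3.1),
           (1970, None), (1971, 7.8), (1971, 25.0)]
ams = annual_maximum_series(records)
missing_years = [y for y in range(1969, 1972) if y not in ams]
```

Years that end up in `missing_years` correspond to the gaps that the reconstruction step has to fill.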

Second, missing AMS years were reconstructed using the method identified as the best-performing approach in Table 3. The selection was based on multi-station validation results, where the NRM combined with nearest-station selection demonstrated superior agreement and lower reconstruction errors compared with alternative methods. The completed AMS for representative stations is presented in Figure 8 and Figure 9, where reconstructed years are explicitly distinguished from observed values to illustrate the temporal continuity achieved after data infilling.

This graphical validation confirms that the adopted methodology maintains the statistical behavior of the extreme rainfall series while ensuring temporal continuity. Figure 10 illustrates the mean annual rainfall and the annual maximum values before and after the infilling process, demonstrating that the reconstruction does not distort the magnitude or variability of extreme events and thereby strengthens the reliability of the IDF development process.

3.3. Statistical Frequency Analysis

Rainfall frequency analysis was conducted using the AMS extracted for each rainfall station following the completion of missing data reconstruction. The AMS approach was selected because it is widely adopted in rainfall IDF analysis and provides a consistent framework for modeling extreme rainfall events relevant to hydraulic design applications [6,8,9,40].

For each station, AMS rainfall depths corresponding to durations ranging from 5 to 1440 min were derived and analyzed. The best-fit probability distribution was selected individually for each station using the combined GOF-based ranking framework. This station-specific distribution was then consistently applied to all durations when constructing the IDF curves.

Six probability distributions commonly applied in hydrological extreme value analysis were evaluated: Weibull, Gumbel, gamma, lognormal, generalized extreme value (GEV), and generalized Pareto (GP) distributions [8,9,10,16,20,41,42,43,44,45,46]. Distribution parameters were estimated using the maximum likelihood estimation (MLE) procedure.

The suitability of each probability distribution was evaluated using multiple GOF criteria to reduce subjectivity associated with single-metric selection. The applied GOF measures include the Kolmogorov–Smirnov (K-S) test, Anderson–Darling (A-D) test, RMSE, chi-square ($\chi^{2}$) statistic, Akaike information criterion (AIC), Bayesian information criterion (BIC), and the coefficient of determination ($R^{2}$) [8,14,16,46,47,48]. These criteria collectively assess distributional performance in terms of tail behavior, overall fit, and simplicity of the model. The GOF tests adopted in this study are summarized as follows:

The K–S test (Equation (2)) quantifies the maximum deviation between the empirical and theoretical cumulative distribution functions, whereas the A–D test (Equation (3)) accumulates weighted deviations over all ordered observations and places greater emphasis on the distribution tails.

$$D = \max_{x} \left| F_{o}(x) - F_{t}(x) \right| \tag{2}$$

$$A^{2} = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[ \ln F_{t}(x_{i}) + \ln\left(1 - F_{t}(x_{n+1-i})\right) \right] \tag{3}$$

where $F_{o}(x)$ and $F_{t}(x)$ are the empirical and theoretical cumulative distribution functions, $x_{i}$ is the i-th ordered observation, and $n$ is the sample size.
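Equations (2) and (3) can be evaluated directly from an ordered sample, as in the sketch below. The Gumbel CDF, its parameters, and the sample values are assumptions chosen for illustration only; any fitted CDF could be passed in instead.

```python
import math

def gumbel_cdf(x, loc, scale):
    """Gumbel (EVI) cumulative distribution function."""
    return math.exp(-math.exp(-(x - loc) / scale))

def ks_statistic(sample, cdf):
    """Equation (2): largest gap between empirical and theoretical CDFs."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x: check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ad_statistic(sample, cdf):
    """Equation (3): Anderson-Darling A^2 for a fully specified CDF."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        # xs[i - 1] is x_i and xs[n - i] is x_{n+1-i} in 1-based notation.
        total += (2 * i - 1) * (math.log(cdf(xs[i - 1]))
                                + math.log(1.0 - cdf(xs[n - i])))
    return -n - total / n

# Hypothetical annual maxima (mm) and Gumbel parameters.
sample = [18.0, 22.0, 25.0, 31.0, 40.0, 55.0]
fitted = lambda x: gumbel_cdf(x, loc=25.7, scale=10.7)
d_stat = ks_statistic(sample, fitted)
a2_stat = ad_statistic(sample, fitted)
```

Smaller values of both statistics indicate closer agreement between the fitted distribution and the observed annual maxima.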

The chi-square ($\chi^{2}$) statistic (Equation (4)) was applied to evaluate discrepancies between observed and expected frequencies across $k$ classes.

$$\chi^{2} = \sum_{i=1}^{k} \frac{(O_{i} - E_{i})^{2}}{E_{i}} \tag{4}$$

where $O_{i}$ and $E_{i}$ are the observed and expected frequencies in class $i$, respectively, and $k$ is the number of classes.

Overall predictive accuracy was evaluated using the root mean square error (RMSE) and the coefficient of determination ($R^{2}$), as defined in Equations (5) and (6), respectively.

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{i} - \hat{x}_{i})^{2}} \tag{5}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (x_{i} - \hat{x}_{i})^{2}}{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}} \tag{6}$$

where $x_{i}$ denotes the observed AMS rainfall values, $\hat{x}_{i}$ represents the corresponding model estimates, $\bar{x}$ is the sample mean of observations, and $n$ is the sample size.
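A short worked example of Equations (5) and (6) is given below. The observed and estimated values are hypothetical and serve only to show the arithmetic.

```python
import math

def rmse(obs, est):
    """Equation (5): root mean square error."""
    return math.sqrt(sum((x - xh) ** 2 for x, xh in zip(obs, est)) / len(obs))

def r_squared(obs, est):
    """Equation (6): 1 minus residual sum of squares over total sum of squares."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((x - xh) ** 2 for x, xh in zip(obs, est))
    ss_tot = sum((x - mean_obs) ** 2 for x in obs)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and estimated rainfall values (mm).
obs = [10.0, 20.0, 30.0, 40.0]
est = [12.0, 18.0, 33.0, 39.0]
error = rmse(obs, est)       # sqrt(18 / 4)
fit = r_squared(obs, est)    # 1 - 18 / 500
```

Here the residual sum of squares is 18 and the total sum of squares about the mean of 25 is 500, giving an RMSE of about 2.12 mm and an $R^{2}$ of 0.964.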

Model parsimony was evaluated using information criteria, namely the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), as defined in Equations (7) and (8), respectively.

$$AIC = 2p - 2\ln(L) \tag{7}$$

$$BIC = p \times \ln(n) - 2\ln(L) \tag{8}$$

where $p$ is the number of estimated parameters in the fitted distribution, $n$ is the sample size, and $L$ is the maximized likelihood.
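The penalty that Equations (7) and (8) place on additional parameters can be seen in a small sketch. The log-likelihood values are hypothetical, and $n = 56$ assumes one annual maximum per year over 1969–2024.

```python
import math

def aic(log_likelihood, p):
    """Equation (7): AIC = 2p - 2 ln(L)."""
    return 2 * p - 2 * log_likelihood

def bic(log_likelihood, p, n):
    """Equation (8): BIC = p ln(n) - 2 ln(L)."""
    return p * math.log(n) - 2 * log_likelihood

n_years = 56  # assumed record length, 1969-2024 inclusive

# Hypothetical maximized log-likelihoods for a 2P and a 3P distribution.
aic_2p = aic(-200.0, p=2)
aic_3p = aic(-199.5, p=3)
bic_2p = bic(-200.0, p=2, n=n_years)
bic_3p = bic(-199.5, p=3, n=n_years)
```

In this example the three-parameter model fits slightly better, yet both criteria still favour the two-parameter model because the gain in likelihood does not offset the extra parameter.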

The parameters of each probability distribution were estimated using MLE, where the log-likelihood function was constructed from the corresponding probability density function (PDF) and numerically optimized. For two-parameter (2P) distributions (Weibull, Gumbel (EVI), gamma, and lognormal), the parameter vector includes two parameters (e.g., location and scale or shape and scale, depending on the distribution). For three-parameter (3P) distributions (GEV and GP), a shape parameter was included to account for tail behavior.

For the 2P distributions, the log-likelihood was optimized numerically using the fmincon solver (a constrained nonlinear optimization solver) with the interior-point algorithm. Parameter bounds were imposed to ensure physically meaningful solutions (e.g., positive scale parameters). Initial parameter values were obtained using the method of moments to enhance numerical stability and reduce the risk of convergence to local minima. Convergence followed standard stopping criteria based on objective function stabilization and first-order optimality conditions.
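For the Gumbel case, the starting values and the objective function handed to the optimizer might look like the sketch below. It is written in Python rather than in the environment used in the study, omits the constrained optimization itself, and uses a hypothetical sample; the function names are illustrative.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

def gumbel_moment_estimates(sample):
    """Method-of-moments starting values (loc, scale) for the Gumbel distribution."""
    mean = statistics.fmean(sample)
    std = statistics.stdev(sample)
    scale = std * math.sqrt(6.0) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale

def gumbel_neg_log_likelihood(sample, loc, scale):
    """Negative log-likelihood to be minimized; infinite outside the bound scale > 0."""
    if scale <= 0:
        return math.inf
    total = 0.0
    for x in sample:
        z = (x - loc) / scale
        total += math.log(scale) + z + math.exp(-z)
    return total

# Hypothetical annual maxima (mm).
sample = [18.0, 22.0, 25.0, 31.0, 40.0, 55.0]
loc0, scale0 = gumbel_moment_estimates(sample)
nll0 = gumbel_neg_log_likelihood(sample, loc0, scale0)
```

A constrained optimizer would start from `(loc0, scale0)` and reduce `nll0` further while keeping the scale parameter positive.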

For 3P distributions (GEV and GP), built-in MLE procedures (gevfit for the GEV distribution and gpfit for the GP distribution) were applied.

All quantiles were subsequently computed using the corresponding inverse cumulative distribution functions (inverse CDFs) to ensure consistency in return-level estimation.

The logarithmic transformation was applied solely within the likelihood formulation to improve numerical stability during optimization. All final rainfall intensities and return levels were computed and reported in the original physical units (mm/hr), while retaining the original scale of extreme observations.

To identify the most appropriate distribution for each station, a unified ranking framework was adopted. For each GOF metric, distributions were ranked according to their relative performance, and a total score was computed by combining individual rankings. The distribution with the lowest total score was selected as the best-fitting model for that station. This multi-criteria ranking approach minimizes bias arising from reliance on a single GOF statistic and enhances the robustness of the selected frequency model, as expressed in Equation (9) [8,14].

$$Total\ Score = \sum_{j=1}^{m} R_{j} \tag{9}$$

where $R_{j}$ denotes the rank assigned to a given distribution under the j-th evaluation criterion, and $m$ is the total number of goodness-of-fit measures considered.
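The rank-sum selection of Equation (9) can be sketched as follows. The metric values are hypothetical, only four of the seven criteria are shown, and ties are broken by input order, which is an assumption since the tie-handling rule is not stated.

```python
def rank_sum_selection(scores, higher_is_better=("R2",)):
    """Rank distributions per metric (1 = best) and sum the ranks, Equation (9).

    scores: {distribution: {metric: value}}. Returns (best, totals).
    """
    dists = list(scores)
    metrics = list(next(iter(scores.values())))
    totals = {d: 0 for d in dists}
    for m in metrics:
        ordered = sorted(dists, key=lambda d: scores[d][m],
                         reverse=(m in higher_is_better))
        for rank, d in enumerate(ordered, start=1):
            totals[d] += rank
    best = min(totals, key=totals.get)
    return best, totals

# Hypothetical GOF results for one station.
scores = {
    "Gumbel":    {"KS": 0.08, "RMSE": 2.1, "AIC": 410.0, "R2": 0.97},
    "Lognormal": {"KS": 0.06, "RMSE": 1.8, "AIC": 408.0, "R2": 0.98},
    "GEV":       {"KS": 0.07, "RMSE": 1.9, "AIC": 411.0, "R2": 0.96},
}
best, totals = rank_sum_selection(scores)
```

With these values the lognormal distribution ranks first on every criterion and therefore receives the lowest total score.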

Once the optimal probability distribution was identified for each station, quantile estimates corresponding to return periods ranging from 2 to 1000 years were derived. These quantiles were subsequently used to construct station-specific rainfall IDF curves, which served as the statistical reference for subsequent machine learning-based bias assessment.
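For a station where the Gumbel distribution is selected, the return-level computation reduces to the inverse CDF evaluated at the non-exceedance probability $1 - 1/T$. The sketch below assumes hypothetical parameters and an illustrative set of seven return periods; the exact values used in the study are not listed in this section.

```python
import math

def gumbel_quantile(return_period, loc, scale):
    """Inverse Gumbel CDF at non-exceedance probability 1 - 1/T."""
    p = 1.0 - 1.0 / return_period
    return loc - scale * math.log(-math.log(p))

# Hypothetical parameters (mm) and an assumed set of return periods (years).
loc, scale = 25.0, 10.0
return_periods = [2, 5, 10, 25, 50, 100, 1000]
levels = {t: gumbel_quantile(t, loc, scale) for t in return_periods}
```

The 2-year level equals `loc - scale * ln(ln 2)`, roughly 28.7 mm here, and the estimates grow monotonically with the return period.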

3.4. Machine Learning Bias Assessment Framework

Machine learning models were incorporated strictly as independent diagnostic tools to evaluate the agreement between data-driven predictions and statistically derived IDF intensities. These statistical estimates correspond to intensities derived from the best-performing probability distribution selected by the GOF ranking framework and were treated as the methodological reference in this study.

The ML-based assessment was implemented to quantify predictive consistency by comparing ML-predicted rainfall intensities against the statistically derived IDF values identified in Section 3.3, without implying methodological superiority or replacement of the conventional frequency-analysis approach.

Three machine learning models with different levels of complexity and interpretability were selected: multiple linear regression (MLR), regression random forest (RRF), and multilayer feed-forward neural network (MFFNN). MLR was included as a baseline model to represent linear relationships, while RRF and MFFNN models were adopted to capture nonlinear interactions between rainfall characteristics and controlling factors [17,18,19,24]. This combination allowed for a balanced comparison between simple and advanced data-driven approaches. In all machine learning models, five predictor variables (p = 5) were employed: log10(rainfall duration), log10(return period), latitude, longitude, and elevation.

MLR assumes a linear combination of the input features and serves as a reference benchmark for evaluating the performance gains achieved by more advanced machine learning models. The MLR formulation is expressed in Equation (10).

$$\hat{y} = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \dots + \beta_{p} x_{p} \tag{10}$$

where $\hat{y}$ denotes the predicted rainfall intensity; $x_{1}, x_{2}, \dots, x_{p}$ represent the five input features; $\beta_{0}$ is the intercept term; $\beta_{1}, \beta_{2}, \dots, \beta_{p}$ are the regression coefficients; and $p$ is the number of predictors. Despite its inability to capture nonlinear interactions, the MLR model provides a transparent and interpretable baseline for comparative assessment.

RRF was employed to model nonlinear relationships between rainfall characteristics and intensity by combining predictions from an ensemble of decision trees. Each tree was independently trained using a bootstrap sample of the training data and a random subset of input features, and the final prediction was obtained by averaging the outputs of all trees. In this study, the ensemble consisted of 200 decision trees to ensure stable aggregation and improved generalization performance, as expressed in Equation (11).

$$\hat{y} = \frac{1}{T} \sum_{t = 1}^{T} f_{t}(x)$$

(11)

where $\hat{y}$ denotes the predicted rainfall intensity, $T$ is the total number of decision trees, $f_{t}(x)$ represents the prediction of the $t$-th decision tree, and $x$ is the vector of input features supplied to the model. Averaging across trees reduces variance and improves generalization compared to individual decision trees. A conceptual schematic illustrating the RRF workflow adopted in this study is provided in Figure 11.
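The averaging in Equation (11) can be illustrated with a deliberately simplified ensemble. In the sketch below, each "tree" is a depth-1 regression stump fitted to a bootstrap sample on one randomly chosen feature; this stands in for the full decision trees of the study and is an assumption made for brevity. The toy intensities are hypothetical.

```python
import math
import random

def fit_stump(X, y, feature):
    """Fit a depth-1 regression tree on one feature by minimizing squared error."""
    values = sorted(set(row[feature] for row in X))
    mean_all = sum(y) / len(y)
    best = (float("inf"), None, mean_all, mean_all)
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [yi for row, yi in zip(X, y) if row[feature] <= thr]
        right = [yi for row, yi in zip(X, y) if row[feature] > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    if thr is None:  # the bootstrap sample had a single distinct feature value
        return lambda x: mean_all
    return lambda x: ml if x[feature] <= thr else mr

def fit_forest(X, y, n_trees=200, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(p)                  # random feature subset of size 1
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return trees

def forest_predict(trees, x):
    """Equation (11): average the predictions of all trees."""
    return sum(f(x) for f in trees) / len(trees)

# Hypothetical toy data: [log10(duration), log10(return period)] -> intensity.
X = [[math.log10(d), math.log10(T)] for d in (5, 60, 1440) for T in (2, 100)]
y = [60.0, 110.0, 12.0, 25.0, 1.5, 3.0]
trees = fit_forest(X, y, n_trees=200, seed=0)
y_hat = forest_predict(trees, X[2])
```

Each stump returns an average of training targets, so the ensemble prediction cannot leave the range of the training intensities. This is the same property that limits tree-based learners when they are queried beyond the support of the training data.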

MFFNN was adopted to represent nonlinear relationships between rainfall characteristics and intensity. The network comprised two fully connected hidden layers with rectified linear unit (ReLU) activation and a linear output layer. The first and second hidden layers contained 20 and 10 neurons, respectively. The forward mapping is given by Equations (12)–(14).

$$h_{1} = \mathrm{ReLU}(W_{1} x + b_{1})$$

(12)

$$h_{2} = \mathrm{ReLU}(W_{2} h_{1} + b_{2})$$

(13)

$$\hat{y} = W_{3} h_{2} + b_{3}$$

(14)

where $x$ is the input feature vector; $h_{1}$ and $h_{2}$ are the hidden-layer activation vectors; $W_{1}$, $W_{2}$, and $W_{3}$ are the weight matrices; $b_{1}$, $b_{2}$, and $b_{3}$ are the corresponding bias vectors; and $\hat{y}$ is the predicted rainfall intensity. The ReLU activation is defined as $\mathrm{ReLU}(z) = \max(0, z)$, where $z$ denotes the linear transformation of the inputs at each layer. A conceptual schematic of the adopted MFFNN architecture is presented in Figure 12.
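The forward mapping of Equations (12)–(14) can be sketched in plain Python for the stated 5–20–10–1 layout. The weights below are random placeholders rather than trained values, and the input vector is a hypothetical, already scaled example; the training procedure itself is not represented.

```python
import random

def relu(v):
    """Element-wise ReLU(z) = max(0, z)."""
    return [max(0.0, z) for z in v]

def affine(W, b, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def init_layer(n_out, n_in, rng):
    """Placeholder initialization: small uniform weights and zero biases."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def forward(params, x):
    """Equations (12)-(14): two ReLU hidden layers and a linear output."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(affine(W1, b1, x))
    h2 = relu(affine(W2, b2, h1))
    return affine(W3, b3, h2)[0]

rng = random.Random(42)
params = [init_layer(20, 5, rng), init_layer(10, 20, rng), init_layer(1, 10, rng)]
x = [1.78, 2.0, 0.1, -0.2, 0.5]  # hypothetical scaled inputs
y_hat = forward(params, x)

# Number of trainable parameters implied by the 5-20-10-1 layout.
n_params = sum(len(W) * len(W[0]) + len(b) for W, b in params)
```

For this layout the parameter count is 341, which is of the same order as the number of training samples and is relevant to the overfitting considerations discussed for neural networks on moderate-sized datasets.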

A unified machine learning dataset was constructed by combining the statistically derived IDF intensities for all stations, durations, and return periods. The final dataset consisted of 700 samples (10 stations × 10 durations × 7 return periods). To ensure reproducibility and objective evaluation, the dataset was randomly partitioned into 70% training and 30% testing subsets using a fixed random seed. All models were trained exclusively on the training subset, while performance metrics were computed on the unseen testing subset.
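The dataset dimensions and the reproducible split can be sketched as follows. The station identifiers, the individual duration values, the individual return-period values, and the seed are assumptions made for illustration; only the counts (10 × 10 × 7), the 5–1440 min and 2–1000 year ranges, and the 70/30 proportion come from the text.

```python
import itertools
import random

# Hypothetical labels and grids that match the stated counts and ranges.
stations = [f"S{k:02d}" for k in range(1, 11)]
durations_min = [5, 10, 15, 30, 60, 120, 180, 360, 720, 1440]
return_periods_yr = [2, 5, 10, 25, 50, 100, 1000]

samples = list(itertools.product(stations, durations_min, return_periods_yr))

def split_70_30(items, seed=123):
    """Shuffle a copy with a fixed seed, then cut at 70% of the samples."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    return shuffled[:n_train], shuffled[n_train:]

train, test = split_70_30(samples)
```

With 700 samples, this partition yields 490 training and 210 testing samples, and repeating the call with the same seed reproduces the same subsets.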

The five predictor variables were selected to represent both temporal characteristics of rainfall extremes and spatial controls related to geographic location. The model output was rainfall intensity, which was directly comparable to the target variable, namely the rainfall intensity derived from the best-fitting statistical IDF distributions.

The ML models were trained on the combined dataset, which included all stations simultaneously, and performance was subsequently evaluated at the station and return-period levels. Model performance was evaluated by comparing ML-predicted rainfall intensities with the statistical reference values across all considered durations and return periods. Three primary evaluation metrics were used: RMSE, MAE, and mean bias. RMSE quantified the overall deviation between machine learning predictions and the corresponding statistically derived rainfall intensities and follows the standard RMSE formulation used in Equation (5), here expressed in terms of the difference between the best-fitting statistical IDF intensities and the ML-predicted values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (I_{i} - \hat{I}_{i})^{2}}$$

(15)

$$\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |\hat{I}_{i} - I_{i}|$$

(16)

$$\mathrm{Mean\ Bias} = \frac{1}{n} \sum_{i = 1}^{n} (\hat{I}_{i} - I_{i})$$

(17)

where $I_{i}$ denotes the statistically derived reference rainfall intensity, $\hat{I}_{i}$ represents the corresponding machine learning prediction, and $n$ is the total number of evaluated samples.
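Equations (15)–(17) translate directly into code. The sketch below uses hypothetical reference and predicted intensities (mm/hr) solely to show how the three metrics respond to the same errors.

```python
import math

def rmse(ref, pred):
    """Equation (15): root mean square error."""
    return math.sqrt(sum((i - ih) ** 2 for i, ih in zip(ref, pred)) / len(ref))

def mae(ref, pred):
    """Equation (16): mean absolute error."""
    return sum(abs(ih - i) for i, ih in zip(ref, pred)) / len(ref)

def mean_bias(ref, pred):
    """Equation (17): mean signed difference, prediction minus reference."""
    return sum(ih - i for i, ih in zip(ref, pred)) / len(ref)

# Hypothetical values with errors of +2, -2, and +3 mm/hr.
ref = [10.0, 20.0, 30.0]
pred = [12.0, 18.0, 33.0]
metrics = (rmse(ref, pred), mae(ref, pred), mean_bias(ref, pred))
```

Here the RMSE is about 2.38 mm/hr, the MAE is about 2.33 mm/hr, and the mean bias is 1.0 mm/hr. Errors of opposite sign partly cancel in the mean bias but not in the MAE, which is why a near-zero bias alone does not imply small errors.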

Bias was computed as the mean difference between ML-predicted and statistically derived rainfall intensities, with values close to zero indicating minimal systematic deviation. This formulation enabled direct comparison of model performance across stations and return periods and facilitated identification of consistent over- or under-prediction tendencies.

The combined use of RMSE, MAE, and mean bias provides a comprehensive evaluation framework: RMSE measures overall error magnitude with sensitivity to larger deviations, MAE captures the average absolute discrepancy independent of sign, and mean bias identifies systematic directional tendencies. Such multi-metric evaluation is commonly adopted in hydrological model assessment to ensure that both accuracy and systematic behavior are properly diagnosed [48,49,50,51,52,53].

Model evaluation was conducted at three levels: (i) individual stations, (ii) individual return periods, and (iii) overall average performance across all stations and return periods.
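The three evaluation levels amount to grouping the same test-set errors by different keys. A minimal sketch follows, with hypothetical records and field names; the overall value is computed here by pooling all samples, which is one possible reading of "overall average performance" and is an assumption.

```python
import math
from collections import defaultdict

def grouped_rmse(records, key):
    """RMSE computed separately for each distinct value of `key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append((r["pred"] - r["ref"]) ** 2)
    return {k: math.sqrt(sum(v) / len(v)) for k, v in groups.items()}

# Hypothetical test-set records (intensities in mm/hr).
records = [
    {"station": "A", "T": 2, "ref": 10.0, "pred": 11.0},
    {"station": "A", "T": 100, "ref": 40.0, "pred": 37.0},
    {"station": "B", "T": 2, "ref": 8.0, "pred": 8.0},
    {"station": "B", "T": 100, "ref": 30.0, "pred": 34.0},
]

by_station = grouped_rmse(records, "station")     # level (i)
by_return_period = grouped_rmse(records, "T")     # level (ii)
overall = math.sqrt(                              # level (iii), pooled
    sum((r["pred"] - r["ref"]) ** 2 for r in records) / len(records)
)
```

In this toy example, the return-period grouping isolates the larger errors at T = 100 years, which the station grouping and the pooled value both partly average out.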

Although both RRF and MFFNN are capable of modeling nonlinear relationships, their behavior differs with respect to dataset size and model complexity. Random forest models are generally more stable for moderate-sized datasets due to their ensemble-based variance reduction and lower sensitivity to parameter selection [18,24]. By comparison, neural networks typically benefit from larger datasets, where their multilayer structure can better capture complex nonlinear patterns. However, they may exhibit overfitting when training data are limited [17,19].

In the present study (700 samples), the inclusion of both RRF and MFFNN allowed evaluation of bias behavior under two distinct nonlinear learning approaches: an ensemble tree-based approach (RRF) emphasizing stability and variance reduction [24], and a layered parametric architecture (MFFNN) emphasizing representational flexibility [19]. This dual-model comparison strengthens the robustness of the bias-assessment framework by ensuring that conclusions are not dependent on a single learning mechanism.

Within this framework, ML models were used as an independent diagnostic layer to examine the level of agreement between data-driven predictions and the statistically derived IDF intensities, which remained the primary reference of the analysis. This approach should not be interpreted as methodological superiority but rather provides an additional perspective for assessing predictive consistency.

4. Results and Discussion

4.1. Best-Fit Probability Distributions Across Stations

The best-fitting probability distribution for each rainfall station was identified using the proposed multi-criteria ranking framework that integrates GOF statistics and information criteria, as summarized in Table 5. The results confirm that no single distribution can be assumed universally optimal across the station network, which is consistent with the well-established dependence of extreme-rainfall frequency behavior on local climatology, record characteristics, and tail properties.

After evaluating candidate distributions for each station using the seven GOF metrics, the distribution with the lowest total score (sum of ranks) was selected as the best-fitting model, as illustrated in Equation (9).
For station MDRM00212, two distributions exhibited comparable performance across the applied GOF criteria, resulting in a shared ranking.
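A sum-of-ranks selection of this kind can be sketched as follows, assuming that the total score is a plain, unweighted sum of per-metric ranks. The metric values are hypothetical, only three of the seven criteria are shown, and the average-rank treatment of ties is an assumption rather than the documented procedure.

```python
def rank_sum_select(scores, higher_is_better=("R2",)):
    """Rank distributions on each metric (1 = best) and sum the ranks.

    scores: {distribution: {metric: value}}. Metrics named in
    `higher_is_better` are ranked in descending order; all others ascending.
    Returns the list of distributions sharing the lowest total, and all totals.
    """
    dists = list(scores)
    metrics = list(next(iter(scores.values())))
    totals = {d: 0.0 for d in dists}
    for m in metrics:
        sign = -1.0 if m in higher_is_better else 1.0
        vals = {d: sign * scores[d][m] for d in dists}
        for d in dists:
            better = sum(1 for o in dists if vals[o] < vals[d])
            tied = sum(1 for o in dists if vals[o] == vals[d]) - 1
            totals[d] += 1 + better + tied / 2.0  # average rank for ties
    best_total = min(totals.values())
    return [d for d in dists if totals[d] == best_total], totals

# Hypothetical goodness-of-fit values for one station.
scores = {
    "lognormal": {"KS": 0.08, "RMSE": 1.2, "R2": 0.98},
    "gumbel": {"KS": 0.10, "RMSE": 1.5, "R2": 0.97},
    "gamma": {"KS": 0.09, "RMSE": 1.6, "R2": 0.95},
}
best, totals = rank_sum_select(scores)
```

When two distributions reach the same minimum total, the function returns both, which mirrors the shared ranking reported for station MDRM00212.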

Across the ten stations, the lognormal distribution was identified as the best-fitting model for four stations, representing the most frequently selected distribution in this study. Gamma and Gumbel distributions were selected for two stations each, while GEV and Weibull were selected once each. The generalized Pareto distribution did not appear as a unique best-fit model but showed competitive performance at one station, resulting in a shared ranking with Gumbel. This distributional diversity highlights the spatial heterogeneity of extreme rainfall within and around the Wadi Al-Rummah Basin.

To support transparency in model selection, the station-level ranking outcomes and competitive distributions are summarized in Table 6 to illustrate how close-performing candidates may emerge under multi-criteria evaluation, particularly when sample size is limited and several distributions fit the central quantiles similarly.

The observed dominance of lognormal distributions at multiple stations is consistent with the tendency of positively skewed distributions to provide reliable fits for rainfall extremes in arid settings, particularly when annual maxima exhibit pronounced right tails [1,8,9]. However, Saudi-based IDF studies have not converged on a single dominant distribution across regions, as best-fit outcomes often vary with the adopted testing framework, record length, and regional rainfall flow pattern [1,2,6,44]. Overall, the station-specific best-fit selections support the use of a multi-criteria approach to reduce subjectivity in distribution choice and strengthen the defensibility of design-rainfall estimates. These selected distributions were then used to derive station-specific quantiles for return periods of 2–1000 years, forming the statistical reference IDF curves used later in the ML-based bias assessment.

4.2. Statistical IDF Curves and Spatial Characteristics

The station-specific IDF analysis revealed clear spatial heterogeneity in design rainfall intensity, and this heterogeneity became more pronounced as the return period increased.

The IDF relationships constructed from the selected best-fit distribution for each station across the considered durations are shown in Figure 13. The resulting IDF curves provide a consistent frequency-based reference for characterizing rainfall extremes across the Wadi Al-Rummah station network.

To provide a compact, quantitative comparison of the station-specific IDF relationships derived using each station’s best-fit distribution, the range of design rainfall intensities across the ten stations was computed for every duration-return period combination. Specifically, for each duration and return period, the minimum and maximum intensity values among all stations were extracted. Table 7 provides a quantitative summary of the intensity range, while Figure 14a–g illustrates the full set of station-specific IDF curves for each return period, allowing direct visual comparison of inter-station variability.

This presentation directly measures the spatial spread of design intensities without relying on visual inspection alone, and it highlights how inter-station contrasts persist across the full range of durations (5–1440 min) and frequencies (2–1000 years).

To better describe how rainfall intensities differ between stations at each return period, the stations producing the highest and lowest design intensities were identified for every return period, as summarized in Table 8. A simple comparison ratio (max/min intensity) was then calculated to show how large the difference is between stations at the same frequency level.

The results indicate that this ratio increases as the return period increases, meaning that differences between stations become larger for more extreme events. In other words, while the general shape of the IDF curves remains similar across stations (Figure 14), the actual rainfall intensity values (mm/hr) separate more clearly at higher return periods. This pattern indicates that inter-station differences are not only persistent but magnified under rarer events, highlighting the practical importance of site-specific IDF estimates for high-return period design.
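The max/min comparison ratio can be computed as in the sketch below. The station labels and intensities are hypothetical and are chosen only to reproduce the qualitative pattern of a ratio that grows with return period; they are not values from Table 8.

```python
def max_min_ratio(intensity_by_station):
    """Return (station with highest intensity, station with lowest, max/min ratio)."""
    hi = max(intensity_by_station, key=intensity_by_station.get)
    lo = min(intensity_by_station, key=intensity_by_station.get)
    return hi, lo, intensity_by_station[hi] / intensity_by_station[lo]

# Hypothetical design intensities (mm/hr) for one duration.
by_return_period = {
    2: {"A": 40.0, "B": 50.0, "C": 60.0},
    1000: {"A": 120.0, "B": 200.0, "C": 300.0},
}
ratios = {T: max_min_ratio(v) for T, v in by_return_period.items()}
```

In this toy case, the ratio rises from 1.5 at the 2-year level to 2.5 at the 1000-year level, with the same stations at the extremes.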

Together, Tables 7 and 8 and Figure 14 provide a clear and consistent evaluation of spatial variability: Table 7 summarizes the overall range of intensities across durations, Figure 14 shows the relative position of each station visually, and Table 8 quantifies how much intensities differ between stations for each return period.

To further illustrate the spatial variability of design rainfall intensities across the Wadi Al-Rummah Basin, IDW interpolation was applied to the statistically derived best-fitting IDF values for the 5 min duration in Figure 15.

The 5 min duration was selected as a representative short-duration design reference commonly adopted in urban stormwater practice, where rainfall intensity is typically evaluated for a duration equal to the time of concentration (Tc) under the Rational Method framework [6,7,54]. Several drainage design manuals explicitly adopt a minimum Tc of 5 min when computed response times are shorter [55,56,57]. Accordingly, the 5 min intensity was used as a consistent short-duration reference to visualize the upper bound of design rainfall across stations without implying a single basin-wide hydrologic response time.

The IDW method assigns weights inversely proportional to distance [46,58], allowing nearby stations to exert greater influence on local intensity estimates while preserving spatial gradients. It is emphasized that IDW was employed solely for spatial visualization purposes and does not modify the underlying station-based statistical IDF derivation.
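A minimal IDW estimator is sketched below. The power exponent of 2 and the use of planar coordinates are assumptions, since neither is specified in the text, and the station coordinates and intensities are hypothetical.

```python
import math

def idw(x, y, stations, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations: list of (xs, ys, value) tuples in the same planar units as x, y.
    A query point that coincides with a station returns that station's value.
    """
    num = den = 0.0
    for xs, ys, value in stations:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical stations: (x, y, 5 min intensity in mm/hr).
stations = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0)]
midpoint = idw(0.5, 0.0, stations)
near_first = idw(0.1, 0.0, stations)
```

The estimate is a convex combination of the station values, so it never exceeds the observed maximum or falls below the observed minimum. This is consistent with its use here for visualization only.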

The resulting maps reveal a clear east–west gradient in rainfall intensity that strengthens at higher return periods. This amplification is consistent with the statistical behavior of heavy-tailed probability distributions identified at several stations (e.g., lognormal and GEV), which tend to produce comparatively higher design intensities at extended return periods [40]. In contrast, stations characterized by more moderately skewed distributions (e.g., gamma or Gumbel) exhibit relatively restrained growth in extreme quantiles.

To further evaluate the uncertainty associated with high return period extrapolation, 95% confidence intervals were estimated for the 100- and 1000-year design rainfall intensities using a non-parametric bootstrap procedure. Figure 16 and Figure 17 present the resulting confidence bounds for three representative durations across all stations. As expected, the uncertainty band widens at the 1000-year return period, reflecting the increased extrapolation beyond the observed data range, while the relative ranking among stations remains consistent.
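A non-parametric bootstrap interval of this kind can be sketched as follows. For brevity, every resample is refitted with a Gumbel distribution by the method of moments; this is a stand-in assumption, whereas the study uses the station-specific best-fit distribution. The annual maxima, the number of resamples, and the seed are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def gumbel_quantile(sample, T):
    """T-year quantile from a method-of-moments Gumbel fit."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    scale = sd * math.sqrt(6.0) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc - scale * math.log(-math.log(1.0 - 1.0 / T))

def bootstrap_ci(sample, T, n_boot=2000, alpha=0.05, seed=1):
    """Percentile confidence interval from resampling with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        gumbel_quantile([rng.choice(sample) for _ in sample], T)
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2.0) * (n_boot - 1))]
    upper = estimates[int((1.0 - alpha / 2.0) * (n_boot - 1))]
    return lower, upper

# Hypothetical annual maximum series (mm).
ams = [12, 18, 25, 9, 40, 15, 22, 30, 11, 55, 17, 20, 28, 14, 35]
q100 = gumbel_quantile(ams, 100)
ci100 = bootstrap_ci(ams, 100)
ci1000 = bootstrap_ci(ams, 1000)
```

Because the same seed produces the same resamples for both return periods, the difference in interval width reflects only how strongly each quantile depends on the resampled standard deviation.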

4.3. Machine Learning Bias Assessment Results

ML models were used as an independent diagnostic framework to assess how well data-driven models can reproduce the statistically derived reference IDF intensities, which were considered the primary and physically interpretable basis of the study, rather than as substitutes for conventional frequency analysis. Therefore, the assessment was performed for each station and return period, and then combined to provide overall performance indicators, as shown in Table 9.

Overall, the MFFNN exhibited the strongest agreement with the statistical IDF reference, yielding the lowest RMSE and MAE values together with a near-zero mean bias, indicating both reduced overall error magnitude and negligible systematic deviation across the tested stations and return periods.

In contrast, RRF and the MLR baseline showed larger deviations. While RRF and MLR captured the general structure of intensity variation with duration and return period, they were less effective than MFFNN in reproducing the finer station frequency patterns embedded in the reference IDF curves.

While Table 9 summarizes the overall results, a closer visual comparison across stations and return periods helps clarify the agreement between the statistical reference and the machine learning models. Accordingly, Figure 18, Figure 19 and Figure 20 illustrate these comparisons for three representative stations, with one station selected from each region, showing the relationship between the best-fit statistical IDF curves and the corresponding machine learning estimates. Visual inspection of the log–log IDF comparison figures reveals localized non-smooth behavior in several neural network-based curves, where sudden drops appear at specific intermediate durations. Verification of the plotting procedure confirmed that durations were consistently sorted prior to visualization and that the curves directly reflect the predicted intensity values stored in the combined ML–statistical dataset. Therefore, the observed drops originate from the machine learning predictions rather than from plotting or sorting errors. For completeness, the corresponding comparisons for the remaining stations are provided in the Supplementary Material (Figures S1–S7).

The MLR model was implemented in a log–log form to reflect the commonly observed power-law scaling behavior in IDF relationships and to provide a transparent baseline. The superior performance of the MFFNN indicates that nonlinear interactions among predictors (duration, return period, latitude, longitude, and elevation) influence station-specific IDF behavior and are captured more effectively by the neural network architecture. Table 10 and Table 11 present the station-wise and return period-specific RMSE evaluation of the three ML models relative to the statistical IDF reference.

Table 10 presents the full RMSE matrix for all stations, return periods, and machine learning models. A general pattern can be observed across the station network where the magnitude of RMSE varies with both the model structure and the return period. The MFFNN model generally maintains lower RMSE values compared with both MLR and RRF, indicating closer agreement with the statistical IDF reference. In contrast, both MLR and RRF exhibit noticeably larger RMSE values at several stations, particularly as return periods increase. The largest discrepancies appear at the 1000-year return period, where error magnitudes increase substantially for several stations, reflecting the difficulty of extrapolating rare extreme events.

Across the 70 station–return period combinations evaluated in Table 11 (minimum RMSE values), the MFFNN model achieved the lowest RMSE in 63 cases (90.0%), whereas MLR and RRF were identified as the best-performing models in only four (5.7%) and three (4.3%) cases, respectively. This dominance indicates that the MFFNN architecture provides closer agreement with the statistical IDF reference than MLR and RRF.
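The tally of best-performing models can be reproduced with a simple count of per-case minima. In the sketch below, the toy RMSE table is hypothetical, while the final line recomputes the reported percentages from the stated case counts (63, 4, and 3 of 70).

```python
from collections import Counter

def best_model_counts(table):
    """Count how often each model attains the minimum RMSE.

    table: {(station, return_period): {model: rmse}}.
    Returns {model: (number of wins, percentage of all cases)}.
    """
    winners = Counter(min(scores, key=scores.get) for scores in table.values())
    n_cases = len(table)
    return {m: (c, round(100.0 * c / n_cases, 1)) for m, c in winners.items()}

# Hypothetical RMSE values (mm/hr) for four station-return period cases.
toy = {
    ("S01", 2): {"MLR": 3.1, "RRF": 2.4, "MFFNN": 0.9},
    ("S01", 1000): {"MLR": 9.8, "RRF": 11.2, "MFFNN": 1.7},
    ("S02", 2): {"MLR": 1.1, "RRF": 2.0, "MFFNN": 1.3},
    ("S02", 1000): {"MLR": 7.5, "RRF": 6.0, "MFFNN": 2.2},
}
counts = best_model_counts(toy)

# Percentages implied by the case counts stated in the text.
reported = {m: round(100.0 * c / 70, 1)
            for m, c in {"MFFNN": 63, "MLR": 4, "RRF": 3}.items()}
```

The recomputed shares are 90.0%, 5.7%, and 4.3%, in agreement with the values quoted above.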

The superiority of MFFNN is particularly evident for larger return periods. While RMSE values for MLR and RRF increase toward the extreme tail (e.g., T = 1000 years), the neural network maintains relatively stable errors across most stations. This suggests that the nonlinear structure of the neural network better captures the scaling behavior of rainfall intensity across return periods.

In addition to RMSE, model bias was evaluated to determine whether ML predictions systematically over- or underestimate the statistical IDF intensities. Mean bias values are presented in Table 12, which reports the station-wise bias values for the three machine learning models. The results reveal that both MLR and RRF exhibit systematic positive or negative biases depending on station and return period, indicating either overestimation or underestimation of the reference IDF intensities. In contrast, the MFFNN model generally produces bias values much closer to zero across most stations and return periods, suggesting a more balanced agreement with the statistical reference curves.

To provide a clearer comparison, Table 13 identifies the model that produced the bias value closest to zero for each station and return period. Among the 70 evaluated cases, the MFFNN model achieved the smallest absolute bias in 60 cases (85.7%), while RRF and MLR were identified as the best-performing models in nine (12.9%) and one (1.4%) cases, respectively. These results further confirm the overall stability of the MFFNN model.

The combined RMSE and mean bias diagnostics, therefore, indicate that the MFFNN model provides the most consistent approximation of the statistically derived IDF relationships across the station network. The superior performance of MFFNN likely reflects its ability to capture nonlinear interactions between rainfall duration, return period, and spatial predictors (latitude, longitude, and elevation). Such nonlinear relationships are difficult to represent using linear regression models and may not be fully captured by tree-based ensemble approaches when extrapolating toward rare extreme rainfall events.

To provide a clearer interpretation of the machine learning diagnostic results presented in Table 10, Table 11, Table 12 and Table 13, an additional cross-analysis was conducted by grouping the performance metrics according to the best-fitting probability distributions identified in Table 5. In this analysis, the RMSE and mean bias statistics obtained from the three ML models (MLR, RRF, and MFFNN) were combined across stations sharing the same statistical distribution type, as illustrated in Table 14. This distribution-based aggregation enables a clearer assessment of how the structural characteristics of the underlying statistical IDF representation may influence the predictive agreement between the machine learning models and the statistically derived IDF curves.

The combined results reveal clear and consistent differences in predictive behavior across the tested machine learning models. For stations characterized by lognormal distributions, the MFFNN model achieved a substantially lower average RMSE (approximately 1.00 mm/hr) compared with 5.44 mm/hr for MLR and 5.96 mm/hr for RRF. A similar pattern is observed for gamma-distributed stations, where the average RMSE of MFFNN is about 0.90 mm/hr, compared with 7.34 mm/hr and 4.87 mm/hr for MLR and RRF, respectively.

The same trend remains evident for distributions associated with extreme rainfall behavior. For GEV, MFFNN produced an average RMSE of approximately 1.22 mm/hr, substantially lower than those obtained by MLR (4.65 mm/hr) and RRF (5.37 mm/hr). Likewise, under the Gumbel distribution, MFFNN achieved an average RMSE of about 0.97 mm/hr, compared with 5.80 mm/hr and 3.59 mm/hr for MLR and RRF. The Weibull-distributed station also exhibits the same behavior, where MFFNN provides the smallest average RMSE (0.73 mm/hr) and the lowest absolute bias among the evaluated models. Therefore, the MFFNN model consistently exhibits the lowest prediction errors in the distribution-based comparison summarized in Table 14, together with the smallest bias values across most cases.

Finally, the station-by-station and return period-specific diagnostics presented in Table 10, Table 11, Table 12, Table 13 and Table 14 enable a targeted assessment of model behavior, allowing the identification of stations and frequency levels that exhibit relatively higher disagreement with the reference curves. Overall, the results support the robustness of the statistically derived IDF framework and demonstrate the usefulness of machine learning models, particularly MFFNN, as an independent diagnostic tool for evaluating the consistency of frequency-based rainfall estimates across heterogeneous arid-climate stations. The suitability of MFFNN for this role was evidenced not only by the lowest global RMSE, MAE, and mean bias values, but also by its dominance in the station-wise and return period-specific comparisons.

4.4. Discussion and Implications

The findings carry two main implications for design rainfall estimation in arid region watersheds. First, the best-fit probability distribution is station-dependent rather than universal. The ranking outcomes show meaningful variability in the selected best model across the network, implying that adopting a single distribution for all stations can propagate avoidable errors into design quantiles, particularly toward the upper tail and longer return periods. This reinforces the need for station-specific model selection supported by multiple GOF and information criteria, especially in large basins where extreme rainfall processes and tail behavior may differ spatially [1,8,44,46]. Such variability in upper-tail rainfall estimation is not merely theoretical, but has practical implications for hydraulic infrastructure designed using earlier statistical assumptions. Wadi Al-Rummah Dam was designed based on the hydro-meteorological knowledge and rainfall records available in 1977, which were limited in temporal coverage and statistical representativeness. Recent hydrological reassessment studies conducted on old dams in the Wadi Hanifah basin, Riyadh region, Saudi Arabia, have demonstrated that short rainfall records and the reliance on traditional distributions (e.g., Gumbel Type I) may underestimate design rainfall associated with higher return periods, particularly beyond 5–10 years [36]. Therefore, the updated IDF curves developed in this study provide a more statistically robust basis for contemporary and future hydraulic infrastructure design within the Wadi Al-Rummah Basin. This is particularly relevant in arid environments where extreme rainfall events, although infrequent, largely govern the safety margins of flood control structures.

The performance of the NRM based on the nearest available station is notable given the large spatial distances between some of the rainfall stations used for data reconstruction, because the normal ratio method is typically applied using nearby stations with comparable rainfall behavior [6,12,13,39]. In several cases, the distance between the target station and the selected neighboring station exceeded several hundred kilometers. Nevertheless, the cross-validation results showed that the NRM combined with nearest-station selection provided the most reliable reconstruction performance among the tested methods. This behavior can be explained by the rainfall characteristics of the Wadi Al-Rummah Basin. Although the basin covers a large geographic area with elevations ranging from approximately 562 m to 1238 m above sea level, the rainfall pattern across the basin remains relatively consistent due to its location within a hyper-arid to arid climatic zone [15,16]. This spatial consistency is also supported by the monthly rainfall patterns shown in Figure 3. In addition, the inter-station analysis confirmed the presence of a measurable relationship between rainfall stations. Regression analysis between the log-transformed inter-station distance and the absolute correlation coefficient ($|r|$) showed a statistically significant negative relationship, explaining about 50% of the variability in rainfall similarity ($R^{2} = 0.5$), a behavior commonly observed in rainfall networks where correlation decreases with increasing station separation distance [6,46].

A more detailed interpretation emerges when the ML diagnostic results are examined by statistical distribution type. As shown in Table 14, the influence of the underlying distribution is reflected not only in the magnitude of RMSE but also in the direction of the signed bias. In the present dataset, stations represented by lognormal and GEV distributions are associated with negative average signed bias across the MLR and RRF models, indicating a tendency toward underestimation relative to the statistical IDF reference. By contrast, gamma and Weibull-based stations show positive average signed bias, suggesting a tendency toward overestimation, while Gumbel-based stations display the most balanced behavior, with signed bias values remaining relatively close to zero. These results indicate that the statistical form of the fitted IDF model affects not only how accurately the ML models reproduce the reference curves but also the direction in which deviations occur. It should be noted that this interpretation is based on a limited set of ten stations, and some probability distributions (e.g., GEV or Weibull) are represented by only a single station. Therefore, the observed patterns should be interpreted as indicative rather than definitive, providing an initial diagnostic view of how the ML models respond to different statistical distribution forms. One possible explanation is that different probability distributions introduce varying levels of curvature and tail behavior in the derived IDF surface, which can influence how easily surrogate models generalize across durations and return periods, a behavior commonly reported in extreme rainfall modeling studies [1,8]. In this context, the MFFNN model appears less affected by these distribution-related differences than MLR and RRF, as it consistently produces lower mean RMSE values and near-zero mean bias across all distribution types. 
Similar advantages of neural network approaches in capturing nonlinear relationships in IDF modeling have also been reported in previous studies [20,21]. Overall, the distribution-based comparison suggests that the agreement between ML predictions and the statistical IDF reference is partly influenced by the adopted probability distribution, while also confirming that the MFFNN architecture provides the most consistent performance among the evaluated models, consistent with widely used model evaluation principles based on RMSE and bias diagnostics [49].

Although the ML bias assessment provides a complementary diagnostic perspective on the derived statistical IDF reference, the results reveal clear differences in model agreement with the station-specific IDF intensities across stations and return periods. In particular, the MFFNN demonstrated the strongest overall consistency, achieving the lowest average RMSE and near-zero mean bias across the evaluated cases. A summary of the frequency of best-performing models based on both RMSE and mean bias is provided in Table 15.

Importantly, this analysis does not represent an external validation of the adopted statistical distributions, as the ML models are evaluated against the statistical IDF reference rather than independent observed extremes at long return periods. Instead, the analysis serves as an internal consistency diagnostic that measures agreement, highlights systematic tendencies, and identifies station–return period combinations where deviations are more pronounced, particularly at very long return periods. This diagnostic use of ML aligns with prior hydrological applications in which nonlinear learners, particularly neural networks, have demonstrated strong agreement with reference IDF representations and improved handling of nonlinearity and uncertainty [21,22,23]. For example, previous studies reported prediction interval coverage exceeding 93%, indicating that most observations fall within the estimated uncertainty bounds, which is consistent with using ML as a diagnostic benchmark alongside frequency analysis rather than treating it as a standalone replacement [22].

Visual inspection of the station-wise log–log IDF comparison plots (Figure 18, Figure 19 and Figure 20) indicates localized non-smooth behavior in a limited subset of MFFNN panels, where sudden drops appear at specific durations. These features are not plotting errors: the comparison procedure reads predictions directly from the unified ML–statistical results and applies a consistent duration-based sorting prior to visualization. The drops, therefore, originate from the MFFNN outputs and reflect model behavior rather than visualization bias. This is reasonable given the adopted ReLU-based MFFNN architecture (two hidden layers with 20 and 10 neurons), which implements a piecewise linear mapping in feature space; consequently, small changes in log-duration can switch individual units on or off and appear as localized kinks on log–log curves [19]. In our case, the effect is amplified by the moderate dataset size ($n = 700$) and the random partitioning into a 70% training subset and a 30% testing subset using a fixed random seed. Importantly, such localized deviations may be weakly expressed in aggregated RMSE, MAE, and mean bias summaries because these criteria average errors across all durations, allowing a small number of problematic points to be masked by otherwise accurate predictions, an interpretation consistent with broader guidance that single-number performance measures can hide systematic, scale-dependent model deficiencies [48,49,50,52].

By contrast, the regression random forest (200 trees) generally yields smoother IDF curves because ensemble averaging reduces variance and stabilizes pointwise predictions [24]. However, at the far tail (e.g., $T = 1000$ years), RRF predictions may appear overly smoothed or quasi-linear, which is consistent with the limited extrapolation capability of tree-based learners when queried beyond the dominant support of the training data [17,18]. From a hydrologic-frequency perspective, such tail sensitivity is expected: extremes-based quantiles are inherently uncertain and strongly distribution- and sample-dependent [1,8], particularly in arid and convective regimes where spatial heterogeneity is pronounced [2,15]. Therefore, we interpret these ML behaviors as diagnostic signals of where the statistical IDF surface is most challenging to learn (or safely extrapolate), rather than as contradictions of the frequency-analysis framework.

Several practical solutions can reduce the observed non-smoothness and improve ML accuracy in future implementations. First, training stability can be strengthened by increasing the effective sample size through expanded station networks, longer records where available, or resampling approaches and repeated cross-validation rather than relying on a single holdout split, which reduces sensitivity to a particular random partition [17]. Second, the MFFNN can be regularized and tuned (e.g., weight decay, dropout, early stopping, and architecture/hyper-parameter search) to reduce local oscillations, while also adopting loss functions or weighting strategies that emphasize tail cases (large return periods) to improve extreme-event behavior. Third, because engineering IDF relationships are expected to be monotone with respect to return period and (for fixed return period) non-increasing with duration in intensity form, incorporating explicit shape constraints can prevent physically implausible kinks (irregular bends) either through constrained or partially monotone neural architectures [59], or more recent lattice-based monotonic network formulations that enforce shape constraints by design [60]. Collectively, these measures would preserve the role of ML as a bias-diagnostic layer while improving the smoothness and interpretability of the learned IDF surfaces, particularly for high-return period extrapolation.
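Before any shape constraint is built into a model, the expected IDF shape can be checked after the fact on a grid of predictions. The following is a minimal sketch of such a check, assuming intensities are stored in a dictionary keyed by (return period, duration). The function name, the scale factors, and the injected kink are hypothetical and serve only to illustrate the idea.

```python
def shape_violations(idf, durations, return_periods, tol=1e-9):
    """List violations of the expected IDF shape.

    idf[(T, d)] is the intensity for return period T (years) and
    duration d (min). Each violation is reported as (kind, T, d).
    """
    violations = []
    ds = sorted(durations)
    ts = sorted(return_periods)
    # For a fixed return period, intensity should not increase with duration.
    for t in ts:
        for d_short, d_long in zip(ds, ds[1:]):
            if idf[(t, d_long)] > idf[(t, d_short)] + tol:
                violations.append(("duration", t, d_long))
    # For a fixed duration, intensity should not decrease with return period.
    for d in ds:
        for t_low, t_high in zip(ts, ts[1:]):
            if idf[(t_high, d)] < idf[(t_low, d)] - tol:
                violations.append(("return_period", t_high, d))
    return violations


DURATIONS_MIN = [5, 10, 15, 30, 60, 120, 180, 360, 720, 1440]
RETURN_PERIODS = [2, 5, 10, 25, 50, 100, 1000]
# Hypothetical 5 min intensities (mm/hr) per return period, for illustration.
SCALE = {2: 30.0, 5: 45.0, 10: 55.0, 25: 70.0, 50: 80.0, 100: 90.0, 1000: 130.0}

smooth = {
    (t, d): SCALE[t] * (d / 5.0) ** -0.65
    for t in RETURN_PERIODS
    for d in DURATIONS_MIN
}
kinked = dict(smooth)
kinked[(100, 30)] *= 0.60  # inject one sudden drop (assumed)

print(shape_violations(smooth, DURATIONS_MIN, RETURN_PERIODS))
print(shape_violations(kinked, DURATIONS_MIN, RETURN_PERIODS))
```

A check of this kind only flags violations; it does not repair them. Enforcing the shape during training would require the constrained architectures cited above.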

5. Conclusions

The results of this study demonstrate that rainfall IDF behavior across the Wadi Al-Rummah station network is strongly station-dependent, both in terms of the selected best-fit probability distributions and the resulting design rainfall intensities. To investigate this behavior, the study developed a reproducible framework for constructing station-specific rainfall IDF relationships across the Wadi Al-Rummah station network, while simultaneously evaluating the robustness of the resulting frequency-based design intensities using an independent machine learning-based diagnostic layer. AMS were first completed using an NRM reconstruction supported by an inter-station distance-rank matrix to minimize bias associated with missing years. Candidate probability distributions were subsequently evaluated through a multi-criteria ranking framework integrating several GOF measures and information criteria. The selected best-fit distribution at each station was then used to estimate design quantiles for return periods ranging from 2 to 1000 years and to derive IDF curves over durations of 5–1440 min. In parallel, multiple machine learning models were employed to assess the bias characteristics and relative stability of the statistically derived IDF estimates across stations and return periods, providing an additional layer of diagnostic insight rather than a replacement for conventional frequency analysis.

Key conclusions are summarized as follows:

Reliability of rainfall data reconstruction using NRM based on the nearest available station. Missing years in the AMS were reconstructed using an NRM implementation based on the closest available rainfall station. A comparative evaluation of four gap-filling methods (AMM, IDW, standard NRM, and NRM using the closest station) showed that this approach provided the most reliable reconstruction performance. As summarized in Table 3, the closest-station NRM achieved the lowest RMSE (14.18 mm) and MAE (10.21 mm) values and obtained the best overall ranking score (1.33), outperforming the alternative reconstruction methods. The method remained effective despite the relatively large distances between some rainfall stations, which exceed several hundred kilometers in certain cases. Regression analysis between the log-transformed inter-station distance and the absolute correlation coefficient (|r|) revealed a statistically significant negative relationship explaining approximately 50% of the variability in rainfall similarity ($R^{2} = 0.5$). Although rainfall correlation decreases with increasing station separation, the overall rainfall pattern across the Wadi Al-Rummah Basin remains relatively consistent due to its location within a hyper-arid to arid climatic zone, allowing the nearest-station NRM approach to provide stable and reliable rainfall reconstruction even under large inter-station spacing and elevation differences. This suggests that the nearest-station NRM approach can also be applied for rainfall data reconstruction in other station networks, provided that a measurable inter-station rainfall relationship exists and that the involved stations possess sufficiently long rainfall records to allow such relationships to be reliably identified. This condition is particularly relevant when stations exhibit broadly similar rainfall characteristics and climatic conditions.
Best-fit distributions are station-dependent. No single distribution provided a universal best fit across the network, confirming that distribution choice is sensitive to local rainfall patterns and record characteristics. Across the evaluated stations, the lognormal distribution was most frequently selected (four stations), gamma and Gumbel were each selected for two stations, GEV and Weibull were each selected once, and generalized Pareto did not emerge as a unique best-fit choice. This outcome supports the adoption of station-specific frequency modeling rather than assuming a single distribution for the entire basin.
IDF curves show measurable spatial variability across the basin. The statistical IDF results indicate clear inter-station differences in design intensities for the same duration and return period. When summarized as intensity ranges across stations, spatial contrast becomes more pronounced at less frequent events: the ratio of maximum-to-minimum station intensity increased up to 2.463 (max is 246.3% of min) at the 1000-year return period, demonstrating that basin-scale heterogeneity can materially affect design rainfall estimates and that site-specific IDF information is essential for defensible infrastructure design in large arid basins.
Machine learning diagnostics confirm strong internal reproducibility of the statistical IDF reference. Three global ML models (MLR, RRF, and MFFNN) were trained using rainfall duration, return period, and spatial descriptors (latitude, longitude, and elevation) to reproduce the station-specific statistical IDF intensities. Among the evaluated models, the MFFNN demonstrated the strongest overall agreement with the statistical reference, exhibiting the lowest global error statistics (RMSE = 0.97 mm/hr, MAE = 0.65 mm/hr) and near-zero systematic bias across the station network. Across the 70 evaluated station–return period combinations (10 stations across 7 return periods), the MFFNN achieved the lowest RMSE in 63 cases (90%), whereas MLR and RRF performed best in only four (5.7%) and three cases (4.3%), respectively. A similar pattern was observed for the bias diagnostics, where the MFFNN produced the mean bias closest to zero in 60 cases (85.7%), compared with nine cases (12.9%) for RRF and one case (1.4%) for MLR. A summary of the frequency of best-performing models based on both RMSE and mean bias is provided in Table 15, highlighting the strong agreement between the MFFNN predictions and the statistical IDF reference and indicating that the statistical IDF surface derived from the selected distributions is largely learnable and internally coherent when examined through an independent modeling framework. Furthermore, the distribution-based comparison presented in Table 14 indicates that the direction of prediction bias varies with the adopted probability distribution. Despite this variability, the MFFNN model consistently maintained the lowest prediction errors (average RMSE ranging from 0.73 to 1.22 mm/hr) and near-zero bias across all evaluated distribution types.
Diagnostic value rather than replacement of frequency analysis. The ML component is positioned as a quality-assurance (QA) and bias-diagnostic layer, not as a substitute for conventional frequency analysis. The distribution-based approach remains the engineering-standard, physically interpretable basis for quantile estimation, while ML diagnostics add value by (i) measuring how closely the ML outputs match the statistical IDF reference, (ii) showing whether a model consistently overestimates or underestimates the reference, and (iii) pinpointing the specific station–return period cases where errors are largest, i.e., where design quantiles appear most sensitive and should be interpreted with extra caution. Future work may further enhance the smoothness and physical consistency of the diagnostic ML layer by incorporating monotonic constraints with respect to duration and return period and by expanding the training sample support, particularly for extreme return periods.
Practical implications for design in arid region basins. The combination of station-specific distribution selection, transparent ranking outputs (including close competitors), and performance heat scale provides a defensible pathway for reporting IDF products to end users and stakeholders. In particular, the results show that using a single universal distribution can introduce avoidable bias, especially at long return periods, whereas the proposed station-specific workflow better reflects the basin’s spatial heterogeneity.
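For the gap-filling step summarized in the first conclusion, the single-neighbor form of the normal ratio method can be sketched as follows. This is a minimal illustration of the standard NRM scaling, $P_x = (N_x / N_i) P_i$, applied to the closest station that has an observation in the missing year. The study's modified implementation may differ in its details, and the function name, station normals, distances, and rainfall values below are hypothetical.

```python
def nrm_closest_station(target_normal, neighbors, year):
    """Estimate a missing value at the target station for `year`.

    neighbors: list of dicts with keys
        'distance_km' - distance to the target station
        'normal'      - long-term mean of the neighbor's series
        'series'      - {year: value} for years with observations
    Returns None when no neighbor has an observation in `year`.
    """
    candidates = [nb for nb in neighbors if year in nb["series"]]
    if not candidates:
        return None
    closest = min(candidates, key=lambda nb: nb["distance_km"])
    # Standard normal-ratio scaling with a single neighbor.
    return target_normal / closest["normal"] * closest["series"][year]


# Hypothetical neighbors of a target station whose long-term mean is 20.0 mm.
neighbors = [
    {"distance_km": 58.0, "normal": 25.0, "series": {1990: 30.0}},
    {"distance_km": 75.0, "normal": 22.0, "series": {1990: 11.0, 1991: 33.0}},
]

estimate_1990 = nrm_closest_station(20.0, neighbors, 1990)  # uses the 58 km station
estimate_1991 = nrm_closest_station(20.0, neighbors, 1991)  # falls back to 75 km
estimate_1992 = nrm_closest_station(20.0, neighbors, 1992)  # no data anywhere
```

Falling back to the next-closest station when the nearest one is also missing is one plausible reading of a closeness-rank matrix; it is stated here as an assumption rather than as the authors' procedure.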

Limitations and Future Work

Two limitations should be acknowledged. First, uncertainty at very long return periods (e.g., 1000 years) is inherently influenced by AMS sample size and record completeness, even after reconstruction. Second, the ML diagnostics evaluate internal consistency relative to the statistical reference rather than providing external validation against independent extreme event observations. Future work can extend the framework by (i) quantifying uncertainty bands through bootstrap resampling or Bayesian frequency analysis, (ii) using regional information from nearby similar stations to improve estimates at stations with limited records, (iii) testing additional explanatory variables (e.g., climate indicators or satellite-based storm characteristics) to better explain spatial differences in extreme rainfall, and (iv) evaluating additional machine learning algorithms, alternative model architectures, and expanded predictor-variable sets to assess whether they improve the representation of nonlinear relationships and spatial variability in extreme rainfall.
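As an illustration of the bootstrap option in item (i), the sketch below computes a percentile bootstrap interval for a Gumbel design quantile fitted by the method of moments. It is not the procedure used in this study. The sample values are hypothetical, and the choice of distribution, estimator, number of resamples, and seed are assumptions made for the example.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649


def gumbel_quantile(sample, return_period):
    """Method-of-moments Gumbel quantile for a return period in years."""
    scale = statistics.stdev(sample) * math.sqrt(6) / math.pi
    location = statistics.mean(sample) - EULER_GAMMA * scale
    non_exceedance = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(non_exceedance))


def bootstrap_interval(sample, return_period, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the quantile estimate."""
    rng = random.Random(seed)
    estimates = sorted(
        gumbel_quantile(rng.choices(sample, k=len(sample)), return_period)
        for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical annual maximum daily rainfall series (mm), for illustration.
ams = [12.0, 18.5, 7.2, 25.1, 30.4, 9.8, 15.3, 41.0, 22.7, 11.4,
       19.9, 27.6, 8.5, 35.2, 14.1, 16.8, 23.3, 10.2, 29.0, 20.5]

q2 = gumbel_quantile(ams, 2)
q100 = gumbel_quantile(ams, 100)
lower, upper = bootstrap_interval(ams, 100)
print(f"T=100: estimate={q100:.1f} mm, 95% interval=({lower:.1f}, {upper:.1f}) mm")
```

Resampling the completed AMS treats reconstructed and observed years alike, so an interval obtained this way would not reflect the additional uncertainty introduced by gap filling.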

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/hydrology13030096/s1. The Supplementary Materials include additional station-level comparisons between the best-fit statistical intensity–duration–frequency (IDF) curves and the corresponding machine-learning estimates for the stations not presented in the main text. These figures extend the comparative analysis provided in the Results section and offer a complete station-by-station evaluation of the agreement between the statistical frequency analysis and the machine-learning models. Figure S1: Best-fit statistical distribution curve evaluated through machine-learning models for Station HIRM00386. Panels (a)–(g) correspond to T = 2–1000 years. Figure S2: Best-fit statistical distribution curve evaluated through machine-learning models for Station HIRS00379. Panels (a)–(g) correspond to T = 2–1000 years. Figure S3: Best-fit statistical distribution curve evaluated through machine-learning models for Station MDRM00212. Panels (a)–(g) correspond to T = 2–1000 years. Figure S4: Best-fit statistical distribution curve evaluated through machine-learning models for Station MDRO00166. Panels (a)–(g) correspond to T = 2–1000 years. Figure S5: Best-fit statistical distribution curve evaluated through machine-learning models for Station QARM00256. Panels (a)–(g) correspond to T = 2–1000 years. Figure S6: Best-fit statistical distribution curve evaluated through machine-learning models for Station QARO00253. Panels (a)–(g) correspond to T = 2–1000 years. Figure S7: Best-fit statistical distribution curve evaluated through machine-learning models for Station QARS00249. Panels (a)–(g) correspond to T = 2–1000 years.

Author Contributions

Conceptualization, I.T.A.; methodology, I.T.A.; software, I.T.A.; validation, I.T.A.; formal analysis, I.T.A.; investigation, I.T.A.; resources, I.T.A.; data curation, I.T.A.; writing—original draft preparation, I.T.A.; writing—review and editing, I.T.A.; visualization, I.T.A.; supervision (Methodology, Results and discussion review), S.H.A., I.H.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Rainfall AMS and station information (coordinates) were obtained from the Saudi Ministry of Environment, Water and Agriculture (MEWA). Due to data-use restrictions associated with these official records, the raw datasets are not publicly available. Derived datasets and scripts supporting the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors gratefully acknowledge the Saudi Ministry of Environment, Water and Agriculture (MEWA) for providing the rainfall station records, supporting station information, and the dam data used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Katz, R.W.; Parlange, M.B.; Naveau, P. Statistics of extremes in hydrology. Adv. Water Resour. 2002, 25, 1287–1304.
2. Elsebaie, I.H. Developing rainfall intensity-duration-frequency relationships for two regions in Saudi Arabia. J. King Saud Univ. Eng. Sci. 2012, 24, 131–140.
3. Kotb, A.; Taha, A.I.; Elnazer, A.A.; Basheer, A.A. Global insights on flood-risk mitigation in arid regions using geomorphological and geophysical modeling. Sci. Rep. 2024, 14, 19975.
4. Elsebaie, I.H.; Kawara, A.Q.; Alnahit, A.O. Mapping and assessment of flood risk in the Wadi Al-Lith Basin, Saudi Arabia. Water 2023, 15, 902.
5. Hussain Shah, S.M.; Alharbi, O.; Basahi, J.M.; Alqurashi, A.F.; Sharif, H.O. Flood risk and vulnerability from a changing climate perspective: An overview focusing on flash floods and associated hazards in Jeddah. Water 2023, 15, 3641.
6. Chow, V.T.; Maidment, D.R.; Mays, L.W. Applied Hydrology; McGraw-Hill: New York, NY, USA, 1988.
7. FHWA. Urban Hydrology for Small Watersheds (HEC-22); Federal Highway Administration: Washington, DC, USA, 2002.
8. Papalexiou, S.M.; Koutsoyiannis, D. Battle of extreme value distributions: A global survey on extreme daily rainfall. Water Resour. Res. 2013, 49, 187–201.
9. Yilmaz, A.G.; Perera, B.J.C. Extreme rainfall nonstationarity investigation and intensity–frequency–duration relationship. J. Hydrol. Eng. 2014, 19, 1160–1172.
10. Singh, V.P. Entropy-Based Parameter Estimation in Hydrology; Springer: Dordrecht, The Netherlands, 1998.
11. Burhanuddin, S.R.M. Revised normal ratio methods for imputation of missing rainfall data. Sci. Res. J. 2016, 1, 15–27.
12. Yaseen, A.; Al-Salihi, A.; Al-Yasiri, K.; Al-Sudani, H.; Neama, S.; Khudhair, H. Missing rainfall data estimation—An approach to investigate different methods: Case study of Baghdad. Arab. J. Geosci. 2022, 15, 1740.
13. Abuvabor, G.D.D.; Pacursa, M.M.M.; Logronio, R.A. Comparison of normal ratio method and distance power method for estimating missing rainfall data with three neighboring stations. J. Eng. Res. Rep. 2021, 21, 1–9.
14. Vogel, R.M.; McMartin, D.E. Probability plot goodness-of-fit and skewness estimation procedures for the Pearson Type 3 distribution. Water Resour. Res. 1991, 27, 3149–3158.
15. Almazroui, M. Rainfall Trends and Extremes in Saudi Arabia in Recent Decades. Atmosphere 2020, 11, 964.
16. Al-Rakathi, M.; Alodah, A. Assessing the impact of climate change on IDF curves for the Qassim Region, Saudi Arabia. Atmosphere 2025, 16, 59.
17. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning, 2nd ed.; Springer: New York, NY, USA, 2021.
18. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning, 2nd ed.; Springer: New York, NY, USA, 2009.
19. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
20. Mirhosseini, G.; Srivastava, P.; Fang, X. Developing rainfall intensity-duration-frequency curves for Alabama under future climate scenarios using artificial neural networks. J. Hydrol. Eng. 2014, 19, 04014022.
21. Acar, R.; Çelik, S.; Senocak, S. Rainfall intensity–duration–frequency (IDF) model using an artificial neural network approach. J. Sci. Ind. Res. 2008, 67, 198–202.
22. Dargham, E.; Andraos, C. Development of intensity-duration-frequency curves using machine learning and satellite-derived precipitation data. Front. Water 2026, 8, 1727182.
23. Bakheit Taha, A.T.; Aldrees, A.; Mustafa Mohamed, A.; Hayder, G.; Babur, M.; Haq, S. Integrating statistical distributions with machine learning to model IDF curve shifts under future climate pathways. Front. Environ. Sci. 2025, 13, 1671320.
24. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
25. Ministry of Environment, Water and Agriculture (MEWA); Water Agency; General Directorate of Water Resources; Dams Department. Official Dam Data for Wadi Al-Rummah Basin, Saudi Arabia; Unpublished governmental data; MEWA: Riyadh, Saudi Arabia, 2025.
26. Yaqut al-Hamawi. Muʿjam Al-Buldan; Dar Sader: Beirut, Lebanon, 2019; Volume 3, pp. 71–72.
27. Ibn Durayd. Amharat Al-Lughah; Dar Al-Ilm: Beirut, Lebanon, 2014; Volume 2, p. 803.
28. Bin Laaboun, A.A. Geological Landmarks of Wadi Al-Rummah; Wadi Al-Rummah Development Center: Al-Qassim Region, Saudi Arabia, 2021.
29. Al-Juaid, M.; Al-Zahrani, M.; Sharif, H.O. Hydrological characterization of major wadi systems in Saudi Arabia using remote sensing and GIS techniques. Environ. Earth Sci. 2018, 77, 356.
30. Saudipedia. Wadi Al-Rummah. 2023. Available online: https://saudipedia.com/en/article/2642/geography/valleys/wadi-al-rummah (accessed on 15 January 2025).
31. Global Watersheds. Watershed Data Report: Wadi Al-Rummah Basin. 2023. Available online: https://mghydro.com/app/report?lat=26.641&lng=44.315&precision=low&simplify=true (accessed on 15 January 2025).
32. Saudi Press Agency (SPA). Geographical report on Wadi Al-Rummah floods. 2008. Available online: https://www.spa.gov.sa/w1898062 (accessed on 15 January 2025).
33. Al-Sultan, S. Wadi Ar-Rumah: The Earth’s Longest Dry Watershed Analysis System Using Remote Sensing Thermal Data; Al-Qassim University: Al-Qassim, Saudi Arabia, 1999.
34. Pro, T. The History of the Ancient Arabs; In Chapter 2: Geography of the Arabian Peninsula—Water Drainage System in the Arabian Peninsula (Wadi Al-Rummah); Dar Al-Fikr: Damascus, Syria, 2001; Available online: https://shamela.ws/book/11084/43 (accessed on 1 March 2025).
35. Al-Anazi, M.A. Environmental Indicators of Wadi Al-Ajradi within the Imam Turki bin Abdullah Royal Reserve during the Holocene Period. Int. J. Res. Stud. Publ. 2023, 8, 56–77.
36. Hidayatulloh, A.; Chaabani, A.; Zhang, L.; Elhag, M. DEM study on hydrological response in Makkah City, Saudi Arabia. Sustainability 2022, 14, 13369.
37. Carrera-Hernández, J.J. Not all DEMs are equal: An evaluation of six globally available 30 m resolution DEMs with geodetic benchmarks and LiDAR in Mexico. Remote Sens. Environ. 2021, 261, 112474.
38. Karimi, H.; Sultan, M.; Yan, E.; Elhaddad, H.; Saleh, H.; Abdelmohsen, K.; Emil, M.K. Climate-Extreme Modeling Framework for Sustainable Flood Management in the Arabian Peninsula: Application to the Wadi Al-Rummah Basin. J. Environ. Manage. 2025, 393, 127074.
39. Teegavarapu, R.S.V.; Chandramouli, V. Improved weighting methods for estimation of missing precipitation records. J. Hydrol. 2005, 312, 191–206.
40. Elfeki, A.; Kamis, A.S.; Marko, K. Hydrological assessment of some old dams in Saudi Arabia under the current climate, environmental conditions, and the use of advanced technology. Appl. Water Sci. 2023, 13, 188.
41. Simonovic, S.P.; Schardong, A.; Sandink, D.; Srivastav, R. A web-based tool for the development of Intensity Duration Frequency curves under changing climate. Environ. Model. Softw. 2016, 81, 136–153.
42. Kumar, P.; Singh, R. Use of SciPy and NumPy for extreme rainfall modeling under data-scarce conditions. Hydrol. Res. 2022, 53, 1583–1598.
43. Yousaf, M.; Ahmad, I.; Hussain, S.; Khan, M.; Shahid, S.; Ismail, T.; Nawaz, N.; Rahman, A.; Khan, N.; Ali, R.; et al. Evaluation of extreme rainfall distributions using Python libraries for semi-arid environments in North Africa. Theor. Appl. Climatol. 2023, 154, 441–455.
44. Ewea, H.A.; Elfeki, A.M.; Bahrawi, J.A.; Al-Amri, N.S. Modeling of IDF curves for stormwater design in the Makkah Al-Mukarramah region, Saudi Arabia. Open Geosci. 2018, 10, 954–969.
45. Elkollaly, M.; Al-Ghazali, R.; Al-Barakat, A. Comparative frequency analysis of extreme rainfall using multiple probability distributions including Lognormal (LN). J. Hydrol. 2025, 655, 132959.
46. Kottegoda, N.T.; Rosso, R. Applied Statistics for Civil and Environmental Engineers; Blackwell Publishing: Oxford, UK, 2008.
47. Burn, D.H. Evaluation of regional flood frequency analysis with a region of influence approach. Water Resour. Res. 1990, 26, 2257–2265.
48. Legates, D.R.; McCabe, G.J., Jr. Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 1999, 35, 233–241.
49. Moriasi, D.N.; Arnold, J.G.; Van Liew, M.W.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE 2007, 50, 885–900.
50. Moriasi, D.N.; Gitau, M.W.; Pai, N.; Daggupati, P. Hydrologic and water quality models: Performance measures and evaluation criteria. Trans. ASABE 2015, 58, 1763–1785.
51. Ritter, A.; Muñoz-Carpena, R. Performance evaluation of hydrological models: Statistical significance for reducing subjectivity in goodness-of-fit assessments. J. Hydrol. 2013, 480, 33–45.
52. Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol. 2009, 377, 80–91.
53. Cinkus, G.; Mazzilli, N.; Jourde, H.; Wunsch, A.; Liesch, T.; Ravbar, N.; Chen, Z.; Goldscheider, N. When best is the enemy of good – critical evaluation of performance criteria in hydrological models. Hydrol. Earth Syst. Sci. 2023, 27, 2397–2415.
54. WSDOT. Hydraulics Manual; Washington State Department of Transportation: Olympia, WA, USA, 2025.
55. NYSDOT. Highway Design Manual—Chapter 8: Hydrology; New York State Department of Transportation: Albany, NY, USA, 2021.
56. GDOT. Drainage Design Policy Manual; Georgia Department of Transportation: Atlanta, GA, USA, 2025.
57. Shepard, D. A two-dimensional interpolation function for irregularly spaced data. In Proceedings of the 1968 ACM National Conference; ACM: New York, NY, USA, 1968; pp. 517–524.
58. Burrough, P.A.; McDonnell, R.A. Principles of Geographical Information Systems; Oxford University Press: Oxford, UK, 1998.
59. Sill, J.; Abu-Mostafa, Y.S. Monotonicity Hints. In Advances in Neural Information Processing Systems 9 (NIPS 1996); MIT Press: Cambridge, MA, USA, 1997; pp. 634–640.
60. You, S.; Ding, D.; Canini, K.R.; Pfeifer, J.; Gupta, M.R. Deep Lattice Networks and Partial Monotonic Functions. Adv. Neural Inf. Process. Syst. 2017, 30, 2981–2989.

Figure 1. Study area of Wadi Al-Rummah Basin: (a) elevation map derived from DEM, (b) extracted stream network and basin delineation. The red box in the inset map indicates the location of the study area within Saudi Arabia.

Figure 2. Long-term monthly average precipitation across 93 rainfall stations in Wadi Al-Rummah Basin (1963–2024).

Figure 3. Overall methodological framework.

Figure 4. The spatial distribution of the rainfall stations and their relation to the Wadi Al-Rummah Basin. The red box in the inset map indicates the location of the study area within Saudi Arabia.

Figure 5. Inter-annual variability of annual precipitation across stations in Wadi Al-Rummah Basin (prior to reconstruction).

Figure 6. Annual maximum series (AMS) derived from daily rainfall records (station QARS00249).

Figure 7. AMS derived from daily rainfall records (station MDRO00166).

Figure 8. Completed AMS for station QARS00249, showing observed and reconstructed years.

Figure 9. Completed AMS for station MDRO00166, showing observed and reconstructed years.

Figure 10. Mean and maximum annual rainfall before and after series reconstruction.

Figure 11. Conceptual schematic of the regression random forest (RRF) workflow adopted in this study.

Figure 12. Conceptual schematic of the MFFNN workflow adopted in this study.

Figure 13. Statistical intensity-duration-frequency (IDF) curves (log-log) for the ten stations using the station-specific best-fit distribution. Panels are labeled (a–j), corresponding to: (a) QARO00253, (b) QARS00249, (c) QARS00238, (d) QARM00256, (e) HIRM00386, (f) HIRM00388, (g) HIRS00379, (h) MDRM00212, (i) MDRO00194, (j) MDRO00166. (2P and 3P) denote two-parameter and three-parameter forms of the fitted probability distributions, respectively.

Figure 14. Best-fit statistical IDF curves for all stations under different return periods. Panels (a–g) correspond to T = 2–1000 years. Legends are arranged in ascending order of rainfall intensity from the lowest to the highest-intensity station (bottom to top).

Figure 15. Spatial distribution of 5 min design rainfall intensity across Wadi Al-Rummah Basin based on the best-fitting statistical distributions using IDW interpolation for different return periods (a–d). The red box in the inset map indicates the location of the study area within Saudi Arabia.

Figure 16. Design rainfall intensity with 95% confidence intervals (T = 100 years).

Figure 17. Design rainfall intensity with 95% confidence intervals (T = 1000 years).

Figure 18. Best-fit statistical distribution curve evaluated through machine learning models for station HIRM00388. Panels (a–g) correspond to T = 2–1000 years.

Figure 19. Best-fit statistical distribution curve evaluated through machine learning models for station MDRO00194. Panels (a–g) correspond to T = 2–1000 years.

Figure 20. Best-fit statistical distribution curve evaluated through machine learning models for station QARS00238. Panels (a–g) correspond to T = 2–1000 years.

Table 1. Rainfall station information and location relative to the Wadi Al-Rummah Basin.

Station ID	Longitude (°E)	Latitude (°N)	Elevation (m)	Location Relative to Basin
QARO00253	42.6675	26.0613	816.8	Inside basin
QARS00249	43.9469	26.3745	624.1	Inside basin
QARS00238	43.6055	25.3596	850.9	Inside basin
QARM00256	43.4558	25.8592	686.9	Inside basin
HIRM00386	41.3662	26.2857	969.7	Inside basin
HIRM00388	40.6234	27.1585	976.2	Close to basin boundary
HIRS00379	42.3978	27.9159	717.7	Outside basin
MDRM00212	39.5398	23.2024	407.2	Outside basin
MDRO00194	39.2280	23.4324	185.4	Outside basin
MDRO00166	40.4863	25.0706	932.2	Close to basin boundary

Table 2. Annual rainfall records at each station.

Station ID	Total Record (Years)	Missing Years Reconstructed (No., %)
QARO00253	46	10 (17.9%)
QARS00249	54	2 (3.6%)
QARS00238	54	2 (3.6%)
QARM00256	53	3 (5.4%)
HIRM00386	41	15 (26.8%)
HIRM00388	35	21 (37.5%)
HIRS00379	52	4 (7.1%)
MDRM00212	54	2 (3.6%)
MDRO00194	46	10 (17.9%)
MDRO00166	33	23 (41.1%)

Table 3. Evaluation of rainfall data imputation methods using |r|, RMSE, and MAE.

Methods	Mean |r|	Mean RMSE (mm)	Mean MAE (mm)	Rank |r|	Rank RMSE	Rank MAE	Average of Overall Rank Score	Final Rank
Arithmetic mean method (AMM)	0.408	14.80	11.71	4	4	4	4	4
Inverse distance weighting (IDW)	0.429	14.61	11.34	1	3	2	2	2
Normal ratio method (NRM)	0.410	14.35	11.40	3	2	3	2.67	3
NRM using Closest station	0.424	14.18	10.21	2	1	1	1.33	1
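The rank columns in Table 3 can be reproduced from the three mean metrics, assuming that a higher mean |r| is better, that lower RMSE and MAE are better, and that the overall score is the plain average of the three ranks. The sketch below uses the values reported in the table; the function name and the method labels are illustrative.

```python
def rank_values(values, higher_is_better):
    """Rank entries 1..n (1 = best); assumes no ties among the values."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Mean metrics as reported in Table 3.
mean_abs_r = {"AMM": 0.408, "IDW": 0.429, "NRM": 0.410, "NRM-closest": 0.424}
mean_rmse = {"AMM": 14.80, "IDW": 14.61, "NRM": 14.35, "NRM-closest": 14.18}
mean_mae = {"AMM": 11.71, "IDW": 11.34, "NRM": 11.40, "NRM-closest": 10.21}

ranks = [
    rank_values(mean_abs_r, higher_is_better=True),
    rank_values(mean_rmse, higher_is_better=False),
    rank_values(mean_mae, higher_is_better=False),
]
overall = {m: sum(r[m] for r in ranks) / len(ranks) for m in mean_abs_r}
final_rank = rank_values(overall, higher_is_better=False)

for method in mean_abs_r:
    print(f"{method}: score={overall[method]:.2f}, final rank={final_rank[method]}")
```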

Table 4. Inter-station distance and closeness rank matrix.

Station ID	QARO00253	QARS00249	QARS00238	QARM00256	HIRM00386	HIRM00388	HIRS00379	MDRM00212	MDRO00194	MDRO00166
QARO00253	—	132 (3)	122 (3)	82 (3)	132 (3)	237 (4)	132 (3)	448 (5)	454 (6)	245 (6)
QARS00249	(5)	—	118 (2)	75 (2)	257 (7)	341 (6)	242 (6)	568 (9)	577 (9)	376 (9)
QARS00238	(2)	(2)	—	58 (1)	247 (6)	358 (7)	253 (7)	477 (7)	492 (7)	315 (8)
QARM00256	(1)	(1)	(1)	—	214 (5)	317 (5)	212 (5)	494 (8)	505 (8)	311 (7)
HIRM00386	(4)	(5)	(4)	(5)	—	122 (2)	46 (1)	389 (3)	384 (3)	161 (1)
HIRM00388	(6)	(6)	(7)	(7)	(2)	—	105 (2)	453 (6)	437 (5)	233 (5)
HIRS00379	(3)	(4)	(5)	(4)	(1)	(1)	—	435 (4)	429 (4)	207 (2)
MDRM00212	(8)	(8)	(8)	(8)	(9)	(9)	(9)	—	41 (1)	229 (4)
MDRO00194	(9)	(9)	(9)	(9)	(8)	(8)	(8)	(1)	—	222 (3)
MDRO00166	(7)	(7)	(6)	(6)	(4)	(3)	(4)	(2)	(2)	—

Note: The inter-station distance matrix is symmetric. Distances (km) are shown in the upper triangle, and values in parentheses indicate closeness rank (1 = closest).
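Inter-station distances of this kind can be approximated from the station coordinates with the haversine great-circle formula. The sketch below assumes that formula and a mean Earth radius of 6371 km; the distance method actually used for Table 4 is not stated in this section. The coordinates are taken from Table 1, reading the values near 23–26° as latitude (°N) and those near 39–44° as longitude (°E), and only two station pairs are shown.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# (latitude °N, longitude °E) for four stations listed in Table 1.
stations = {
    "QARM00256": (25.8592, 43.4558),
    "QARS00238": (25.3596, 43.6055),
    "MDRM00212": (23.2024, 39.5398),
    "MDRO00194": (23.4324, 39.2280),
}

d_qassim = haversine_km(*stations["QARM00256"], *stations["QARS00238"])
d_madinah = haversine_km(*stations["MDRM00212"], *stations["MDRO00194"])
print(f"QARM00256-QARS00238: {d_qassim:.1f} km")
print(f"MDRM00212-MDRO00194: {d_madinah:.1f} km")
```

For these two pairs the computed values fall within about 1 km of the 58 km and 41 km entries in Table 4.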

Table 5. Best-fitting probability distribution for each rainfall station based on the multi-criteria ranking framework.

Station ID	Best Distribution	Total Score	K-S	A-D	$χ^{2}$	RMSE	$R^{2}$	AIC	BIC
QARO00253	Gamma	16	0.05	0.19	4.43	0.02	0.988	422.78	426.83
QARS00249	Lognormal	13	0.11	0.58	9.61	0.04	0.983	460.25	464.30
QARS00238	Lognormal	11	0.05	0.17	7.1	0.02	0.959	447.56	451.61
QARM00256	Generalized Extreme Value (GEV)	7	0.09	0.39	4.91	0.03	0.950	458.88	464.96
HIRM00386	Gumbel (EVI)	13	0.1	0.72	8.2	0.05	0.791	457.53	461.58
HIRM00388	Lognormal	13	0.1	0.49	5.05	0.04	0.966	441.28	445.33
HIRS00379	Weibull	16	0.08	0.34	17.26	0.03	0.984	444.57	448.62
MDRM00212	Gumbel (EVI) and generalized Pareto (GP)	14	0.13	1.17	8.14	0.06	0.766	494.35	498.40
MDRO00194	Gamma	17	0.07	0.26	7.85	0.03	0.988	451.94	455.99
MDRO00166	Lognormal	18	0.06	0.27	6.08	0.03	0.962	434.29	438.34

Table 6. Comparative performance of fitted distributions across stations using the goodness-of-fit (GOF)-based scoring framework.

Station ID	Statistical Distribution
Station ID	Weibull	Gumbel (EVI)	Gamma	Lognormal	GEV	GP
QARO00253	27 (5)	21 (3)	16 (1)	20 (2)	25 (4)	38 (6)
QARS00249	26 (4)	24 (3)	20 (2)	13 (1)	30 (5)	34 (6)
QARS00238	30 (4)	33 (5)	27 (3)	11 (1)	12 (2)	34 (6)
QARM00256	31 (5)	25 (3)	23 (2)	27 (4)	7 (1)	34 (6)
HIRM00386	21 (2)	13 (1)	35 (6)	30 (5)	25 (4)	23 (3)
HIRM00388	20 (2)	24 (4)	23 (3)	13 (1)	34 (6)	33 (5)
HIRS00379	16 (1)	18 (2)	27 (4)	26 (3)	27 (4)	33 (5)
MDRM00212	31 (3)	14 (1)	25 (2)	31 (3)	32 (4)	14 (1)
MDRO00194	23 (3)	19 (2)	17 (1)	23 (3)	31 (4)	34 (5)
MDRO00166	27 (5)	25 (3)	22 (2)	18 (1)	29 (6)	26 (4)

Note: Each cell reports the total score from the combined GOF-based ranking framework, followed by the ranking number in parentheses, where (1) indicates the best (lowest total score). Color shading reflects the magnitude of the total score, with lighter shades indicating better performance.

Table 7. Range of intensities (mm/h) based on the best-fit distribution across the Wadi Al-Rummah station network.

Duration (min)	Return Period (Years)
Duration (min)	2	5	10	25	50	100	1000
5	19.4–31.6	35.7–53.8	45.7–68.5	55.6–87.1	62.6–107.2	69.4–130.1	90.8–223.8
10	12.8–20.8	23.6–35.5	30.1–45.2	36.7–57.4	41.3–70.8	45.8–85.9	59.9–147.6
15	10.1–16.3	18.5–27.8	23.6–35.4	28.7–45.0	32.4–55.5	35.9–67.3	46.0–115.8
30	6.6–10.8	12.2–18.4	15.6–23.4	19.0–29.7	21.4–36.6	23.7–44.4	31.0–76.4
60	4.4–7.1	8.0–12.1	10.3–15.4	12.5–19.6	14.1–24.1	15.6–29.3	20.5–50.4
120	2.9–4.7	5.3–8.0	6.8–10.2	8.3–12.9	9.3–15.9	10.3–19.3	13.5–33.2
180	2.3–3.7	4.2–6.3	5.3–8.0	6.5–10.1	7.3–12.5	8.1–15.2	10.6–26.1
360	1.5–2.4	2.7–4.1	3.5–5.3	4.3–6.7	4.8–8.2	5.3–10.0	7.0–17.2
720	1.0–1.6	1.8–2.7	2.3–3.5	2.8–4.4	3.2–5.4	3.5–6.6	4.6–11.4
1440	0.7–1.1	1.2–1.8	1.5–2.3	1.9–2.9	2.1–3.6	2.3–4.4	3.0–7.5

Table 8. Maximum and minimum design intensities by return period and spatial contrast ratio (max/min).

Return Period (Years)	Maximum Station	Minimum Station	Ratio (max/min)
2	MDRM00212	MDRO00166	1.625
5	MDRM00212	MDRO00166	1.506
10	MDRM00212	QARO00253	1.500
25	MDRM00212	QARO00253	1.567
50	QARS00249	QARO00253	1.712
100	QARS00249	QARO00253	1.874
1000	QARS00249	QARO00253	2.463
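As a rough cross-check, the contrast ratio can be approximated from the range endpoints in Table 7. The sketch below assumes the ratio is simply the network maximum divided by the network minimum at a single duration (10 min is used here); the duration actually used for Table 8 is not stated, and the Table 7 values are rounded to one decimal, so only approximate agreement should be expected.

```python
# 10-min intensity ranges (mm/h) copied from Table 7:
# return period (years) -> (network minimum, network maximum).
ranges_10min = {
    2: (12.8, 20.8),
    100: (45.8, 85.9),
    1000: (59.9, 147.6),
}


def contrast_ratio(low, high):
    """Spatial contrast ratio: maximum intensity divided by minimum intensity."""
    return high / low


ratios = {
    period: round(contrast_ratio(low, high), 3)
    for period, (low, high) in ranges_10min.items()
}
print(ratios)
```

The values obtained this way agree with the tabulated ratios to within about 0.01, which is consistent with rounding in the published intensities.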

Table 9. Global machine learning (ML) bias-assessment metrics versus the statistical intensity-duration-frequency (IDF) reference (RMSE, MAE, Mean Bias).

Models	Mean RMSE (mm/hr)	Mean MAE (mm/hr)	Mean Bias (mm/hr)
MLR	5.87	4.25	−0.17
RRF	5.10	3.38	−0.47
MFFNN	0.97	0.65	−0.02
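The three metrics in Table 9 follow their standard definitions. The sketch below is illustrative only: it takes the error as ML prediction minus statistical IDF reference, so that positive bias means overestimation (the sign convention used in the notes to the bias tables), and the input values are hypothetical rather than taken from the study. How the per-station values are averaged into the global means is not reproduced here.

```python
import math


def bias_metrics(predicted, reference):
    """Return (RMSE, MAE, mean bias) of predictions against a reference."""
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_bias = sum(errors) / n  # positive means overestimation
    return rmse, mae, mean_bias


# Hypothetical intensities (mm/hr), for illustration only.
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 18.0, 30.0, 43.0]

rmse, mae, mean_bias = bias_metrics(predicted, reference)
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  bias={mean_bias:.2f}")
```

Because positive and negative errors cancel in the mean bias but not in RMSE or MAE, a model can show near-zero bias while still having a large RMSE, as the Gumbel MLR row of Table 14 illustrates.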

Table 10. Station-wise RMSE (mm/hr) of machine learning (ML) models across return periods.

Station ID	Return Period (Years)							ML Model
Station ID	2	5	10	25	50	100	1000	ML Model
HIRM00386	2.38	0.95	2.09	2.10	0.90	1.54	22.13	MLR
HIRM00388	2.10	1.77	4.30	7.22	9.02	10.31	8.19
HIRS00379	1.07	2.26	2.94	1.90	0.37	4.05	29.52
MDRM00212	0.77	4.66	7.03	8.43	8.08	6.31	13.77
MDRO00166	4.74	1.50	0.67	3.24	4.84	5.99	3.56
MDRO00194	3.91	0.17	1.20	1.26	0.15	3.03	27.11
QARM00256	3.37	0.20	2.35	4.82	6.41	7.69	7.71
QARO00253	2.97	1.37	1.61	3.61	6.53	10.85	39.00
QARS00238	4.86	1.21	1.17	3.91	5.54	6.61	2.73
QARS00249	3.43	1.29	4.57	8.67	11.47	13.87	15.66
HIRM00386	1.15	0.94	0.97	2.79	3.67	5.20	4.72	RRF
HIRM00388	1.66	0.93	1.45	0.98	2.19	2.62	20.67
HIRS00379	0.84	1.07	0.98	4.03	6.33	9.22	11.53
MDRM00212	0.65	5.18	6.26	3.46	3.02	2.09	10.18
MDRO00166	2.87	1.42	1.53	2.24	1.86	1.54	21.57
MDRO00194	2.27	1.23	1.73	3.18	4.41	6.41	5.34
QARM00256	2.42	0.65	1.29	1.05	1.68	1.29	29.24
QARO00253	1.15	1.00	2.54	6.44	8.14	10.95	13.39
QARS00238	2.58	0.61	1.37	1.38	3.52	4.17	28.80
QARS00249	2.93	1.24	2.92	3.11	5.04	6.70	38.96
HIRM00386	0.50	0.84	0.47	0.62	0.50	0.85	3.24	MFFNN
HIRM00388	0.40	0.96	1.21	0.52	0.43	0.48	1.91
HIRS00379	0.98	0.38	0.98	0.53	0.50	0.42	1.35
MDRM00212	0.39	1.39	1.88	0.73	0.53	0.75	0.92
MDRO00166	0.26	0.40	0.48	0.34	0.39	0.61	3.27
MDRO00194	0.91	0.67	0.59	0.53	0.37	0.99	0.93
QARM00256	0.86	0.76	0.73	0.99	1.10	1.22	2.90
QARO00253	1.09	0.61	1.21	0.97	0.52	0.59	2.65
QARS00238	0.68	0.62	0.57	0.36	0.72	0.54	1.91
QARS00249	1.30	1.04	0.56	0.60	0.83	0.88	5.63

Note: Each cell reports the RMSE (mm/hr) of the indicated machine learning model relative to the statistical IDF reference for the corresponding station and return period. Cell shading represents the RMSE magnitude, where darker (red) cells indicate higher RMSE (poorer agreement) and lighter cells indicate lower RMSE (better agreement).

Table 11. Best-performing machine learning (ML) model based on RMSE across stations and return periods (relative to statistical IDF).

Station ID	Return Period (Years)
Station ID	2-yr	5-yr	10-yr	25-yr	50-yr	100-yr	1000-yr
HIRM00386	0.50 MFFNN	0.84 MFFNN	0.47 MFFNN	0.62 MFFNN	0.50 MFFNN	0.85 MFFNN	3.24 MFFNN
HIRM00388	0.40 MFFNN	0.93 RRF	1.21 MFFNN	0.52 MFFNN	0.43 MFFNN	0.48 MFFNN	1.91 MFFNN
HIRS00379	0.84 RRF	0.38 MFFNN	0.98 MFFNN	0.53 MFFNN	0.37 MLR	0.42 MFFNN	1.35 MFFNN
MDRM00212	0.39 MFFNN	1.39 MFFNN	1.88 MFFNN	0.73 MFFNN	0.53 MFFNN	0.75 MFFNN	0.92 MFFNN
MDRO00166	0.26 MFFNN	0.40 MFFNN	0.48 MFFNN	0.34 MFFNN	0.39 MFFNN	0.61 MFFNN	3.27 MFFNN
MDRO00194	0.91 MFFNN	0.17 MLR	0.59 MFFNN	0.53 MFFNN	0.15 MLR	0.99 MFFNN	0.93 MFFNN
QARM00256	0.86 MFFNN	0.20 MLR	0.73 MFFNN	0.99 MFFNN	1.10 MFFNN	1.22 MFFNN	2.90 MFFNN
QARO00253	1.09 MFFNN	0.61 MFFNN	1.21 MFFNN	0.97 MFFNN	0.52 MFFNN	0.59 MFFNN	2.65 MFFNN
QARS00238	0.68 MFFNN	0.61 RRF	0.57 MFFNN	0.36 MFFNN	0.72 MFFNN	0.54 MFFNN	1.91 MFFNN
QARS00249	1.30 MFFNN	1.04 MFFNN	0.56 MFFNN	0.60 MFFNN	0.83 MFFNN	0.88 MFFNN	5.63 MFFNN

Note: Each cell reports the minimum RMSE (mm/hr) among the evaluated ML models for a given station and return period. Cell colors represent a heat scale of RMSE magnitude, where darker (red) cells indicate higher RMSE (poorer agreement with the statistical IDF reference) and lighter cells indicate lower RMSE (better agreement).
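The selection rule behind Table 11 can be sketched for a single station as follows. The RMSE values are copied from the MDRO00194 rows of Table 10; the variable names are illustrative, and the handling of exact ties (here, the first model in dictionary order wins) is an assumption, since no tie-breaking rule is stated.

```python
return_periods = [2, 5, 10, 25, 50, 100, 1000]

# RMSE (mm/hr) for station MDRO00194, copied from Table 10.
rmse = {
    "MLR":   [3.91, 0.17, 1.20, 1.26, 0.15, 3.03, 27.11],
    "RRF":   [2.27, 1.23, 1.73, 3.18, 4.41, 6.41, 5.34],
    "MFFNN": [0.91, 0.67, 0.59, 0.53, 0.37, 0.99, 0.93],
}

best = {}
for i, period in enumerate(return_periods):
    # Pick the model with the smallest RMSE at this return period.
    model = min(rmse, key=lambda m: rmse[m][i])
    best[period] = (rmse[model][i], model)

for period, (value, model) in best.items():
    print(f"{period}-yr: {value:.2f} {model}")
```

For this station, MLR is selected at the 5- and 50-year return periods and MFFNN elsewhere, in agreement with the MDRO00194 row of Table 11.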

Table 12. Station-wise mean bias (mm/hr) of machine learning (ML) models across return periods.

Station ID	Return Period (Years)							ML Model
Station ID	2	5	10	25	50	100	1000	ML Model
HIRM00386	1.72	−0.69	−1.51	−1.52	−0.65	1.12	16.02	MLR
HIRM00388	1.52	−1.28	−3.11	−5.23	−6.53	−7.47	−5.93
HIRS00379	0.77	−1.64	−2.13	−1.37	0.27	2.93	21.38
MDRM00212	0.56	−3.37	−5.09	−6.10	−5.85	−4.57	9.97
MDRO00166	3.43	1.09	−0.48	−2.34	−3.50	−4.34	−2.58
MDRO00194	2.83	0.13	−0.87	−0.91	0.11	2.19	19.63
QARM00256	2.44	−0.14	−1.70	−3.49	−4.64	−5.57	−5.58
QARO00253	2.15	0.99	1.16	2.62	4.73	7.86	28.24
QARS00238	3.52	0.88	−0.85	−2.83	−4.01	−4.79	−1.98
QARS00249	2.48	−0.93	−3.31	−6.28	−8.31	−10.04	−11.34
HIRM00386	0.77	−0.22	0.36	1.84	2.55	3.29	3.42	RRF
HIRM00388	1.21	0.02	−0.26	−0.07	−0.38	−1.92	−12.80
HIRS00379	0.42	−0.43	0.51	2.70	4.32	5.97	9.08
MDRM00212	−0.24	−3.27	−3.78	−2.50	−2.00	−1.43	−4.49
MDRO00166	2.11	1.10	1.02	1.44	0.37	−1.01	−12.74
MDRO00194	1.68	−0.06	0.09	2.15	3.15	4.17	4.38
QARM00256	1.42	−0.19	−0.48	−0.04	−0.19	−0.76	−18.67
QARO00253	0.72	0.73	2.05	4.45	5.75	7.29	10.52
QARS00238	1.73	−0.01	−0.37	−0.45	−1.37	−2.64	−17.97
QARS00249	2.07	−0.42	−1.50	−1.92	−3.00	−4.48	−25.72
HIRM00386	−0.33	−0.51	−0.07	0.03	0.22	0.18	0.93	MFFNN
HIRM00388	0.09	0.38	0.48	0.15	0.08	−0.11	0.27
HIRS00379	−0.01	0.00	0.35	0.03	−0.12	−0.02	−0.01
MDRM00212	−0.06	−0.71	−0.67	0.15	0.19	0.34	−0.33
MDRO00166	0.03	0.15	0.26	−0.14	0.13	0.00	−1.00
MDRO00194	0.20	0.17	0.09	−0.29	−0.07	0.13	−0.40
QARM00256	0.47	−0.15	−0.07	−0.42	−0.56	−0.65	−0.33
QARO00253	−0.65	−0.05	0.57	0.07	0.10	0.17	1.22
QARS00238	0.41	−0.16	−0.24	−0.07	−0.14	0.26	0.69
QARS00249	0.82	0.02	−0.16	−0.33	−0.24	−0.22	−1.68

Note: Each cell reports the mean bias (mm/hr) of the indicated machine learning model relative to the statistical IDF reference for the corresponding station and return period. Positive values indicate overestimation of rainfall intensity (red), whereas negative values indicate underestimation relative to the statistical reference (blue).

Table 13. Best-performing machine learning (ML) model based on mean bias across stations and return periods (relative to statistical IDF).

Station ID	Return Period (Years)
Station ID	2-yr	5-yr	10-yr	25-yr	50-yr	100-yr	1000-yr
HIRM00386	−0.33 MFFNN	−0.22 RRF	−0.07 MFFNN	0.03 MFFNN	0.22 MFFNN	0.18 MFFNN	0.93 MFFNN
HIRM00388	0.09 MFFNN	0.02 RRF	−0.26 RRF	−0.07 RRF	0.08 MFFNN	−0.11 MFFNN	0.27 MFFNN
HIRS00379	−0.01 MFFNN	0.01 MFFNN	0.35 MFFNN	0.03 MFFNN	−0.12 MFFNN	−0.02 MFFNN	−0.01 MFFNN
MDRM00212	−0.06 MFFNN	−0.71 MFFNN	−0.67 MFFNN	0.15 MFFNN	0.19 MFFNN	0.34 MFFNN	−0.33 MFFNN
MDRO00166	0.03 MFFNN	0.15 MFFNN	0.26 MFFNN	−0.14 MFFNN	0.13 MFFNN	0.01 MFFNN	−1.00 MFFNN
MDRO00194	0.20 MFFNN	−0.06 RRF	0.09 RRF	−0.29 MFFNN	−0.07 MFFNN	0.13 MFFNN	−0.40 MFFNN
QARM00256	0.47 MFFNN	−0.14 MLR	−0.07 MFFNN	−0.04 RRF	−0.19 RRF	−0.65 MFFNN	−0.33 MFFNN
QARO00253	−0.65 MFFNN	−0.05 MFFNN	0.57 MFFNN	0.07 MFFNN	0.10 MFFNN	0.17 MFFNN	1.22 MFFNN
QARS00238	0.41 MFFNN	−0.01 RRF	−0.24 MFFNN	−0.07 MFFNN	−0.14 MFFNN	0.26 MFFNN	0.69 MFFNN
QARS00249	0.82 MFFNN	0.02 MFFNN	−0.16 MFFNN	−0.33 MFFNN	−0.24 MFFNN	−0.22 MFFNN	−1.68 MFFNN

Note: Each cell reports the bias value closest to zero among the evaluated machine learning models for a given station and return period. Values closest to zero indicate better agreement with the statistical IDF reference, while larger absolute values represent stronger systematic overestimation (red) or underestimation (blue).
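Selecting the bias closest to zero differs from a plain minimum because the comparison is made on absolute values while the signed value is reported. A minimal sketch for one station, using the QARM00256 rows of Table 12 (variable names are illustrative; ties, if any, fall to the first model in dictionary order, which is an assumption):

```python
return_periods = [2, 5, 10, 25, 50, 100, 1000]

# Mean bias (mm/hr) for station QARM00256, copied from Table 12.
bias = {
    "MLR":   [2.44, -0.14, -1.70, -3.49, -4.64, -5.57, -5.58],
    "RRF":   [1.42, -0.19, -0.48, -0.04, -0.19, -0.76, -18.67],
    "MFFNN": [0.47, -0.15, -0.07, -0.42, -0.56, -0.65, -0.33],
}

least_biased = {}
for i, period in enumerate(return_periods):
    # Compare magnitudes, but keep the signed value for reporting.
    model = min(bias, key=lambda m: abs(bias[m][i]))
    least_biased[period] = (bias[model][i], model)

for period, (value, model) in least_biased.items():
    print(f"{period}-yr: {value:+.2f} {model}")
```

This reproduces the QARM00256 row of Table 13, the only row in which all three models are selected at least once.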

Table 14. Combined machine learning performance metrics across statistical distributions.

Statistical Distribution	Model	Average RMSE	Average Signed Bias
Lognormal	MLR	5.44	−3.02
Lognormal	RRF	5.96	−2.78
Lognormal	MFFNN	1.00	−0.01
Gamma	MLR	7.34	5.06
Gamma	RRF	4.87	3.36
Gamma	MFFNN	0.90	0.09
GEV	MLR	4.65	−2.67
GEV	RRF	5.37	−2.70
GEV	MFFNN	1.22	−0.24
Gumbel	MLR	5.80	0.00
Gumbel	RRF	3.59	−0.41
Gumbel	MFFNN	0.97	−0.05
Weibull	MLR	6.02	2.89
Weibull	RRF	4.86	3.22
Weibull	MFFNN	0.73	0.03

Note: (1) Average RMSE: Indicates the average prediction error relative to the statistical IDF reference; higher values represent larger deviations. Red shading highlights higher RMSE values. (2) Average Signed Bias: Indicates systematic overestimation (positive values) or underestimation (negative values) of rainfall intensity; values closer to zero indicate better agreement with the statistical IDF reference. Blue shading highlights negative bias values.
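The aggregation in Table 14 appears to be a simple mean over the stations assigned to each distribution and over the seven return periods. This is an inference from the numbers rather than a stated procedure. The sketch below checks it for the Weibull rows, using HIRS00379 (the only station with Weibull as its best-fit distribution in Table 5) and the values from Tables 10 and 12.

```python
from statistics import mean

# Per-return-period values for HIRS00379 (2, 5, 10, 25, 50, 100, 1000 years).
rmse = {  # copied from Table 10, mm/hr
    "MLR":   [1.07, 2.26, 2.94, 1.90, 0.37, 4.05, 29.52],
    "RRF":   [0.84, 1.07, 0.98, 4.03, 6.33, 9.22, 11.53],
    "MFFNN": [0.98, 0.38, 0.98, 0.53, 0.50, 0.42, 1.35],
}
bias = {  # copied from Table 12, mm/hr
    "MLR":   [0.77, -1.64, -2.13, -1.37, 0.27, 2.93, 21.38],
    "RRF":   [0.42, -0.43, 0.51, 2.70, 4.32, 5.97, 9.08],
    "MFFNN": [-0.01, 0.00, 0.35, 0.03, -0.12, -0.02, -0.01],
}

# Average RMSE and average signed bias per model, rounded as in Table 14.
summary = {
    model: (round(mean(rmse[model]), 2), round(mean(bias[model]), 2))
    for model in rmse
}
for model, (avg_rmse, avg_bias) in summary.items():
    print(f"Weibull  {model}: RMSE={avg_rmse:.2f}  bias={avg_bias:+.2f}")
```

The resulting pairs match the three Weibull rows of Table 14. The MLR average RMSE is dominated by the 1000-year value, which shows how a single extreme return period can drive the distribution-level averages.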

Table 15. Summary of best-performing ML models based on RMSE and mean bias across all stations and return periods.

Model	Best RMSE Cases	Best Mean Bias Cases	Total Best Cases
MLR	4	1	5
RRF	3	9	12
MFFNN	63	60	123
