1. Introduction
Spatial rainfall distribution plays a significant role in hydrological studies. It has important impacts on, e.g., flood occurrence, crop production, urban drainage systems, and the design of hydraulic structures. It also influences decision-making and risk assessment in environmental management. Arid regions are especially vulnerable to extreme hydrological events such as floods and droughts [1]. Providing accurate and gap-free rainfall records is vital for effective flood mitigation and water resource management in these and the downstream areas. However, these areas often suffer from insufficient precipitation records. To overcome this challenge, more advanced techniques such as satellite data and geostatistical interpolation need to be utilized. These methods, however, should be assessed and validated to optimize the design processes in water resource management.
Water scarcity is a global challenge for human development and the achievement of economic goals. The world faces an increasing demand for water resources due to growing urbanization, industrial activity, and population, as well as the overexploitation of aquifers [2,3]. In total, 700 million people suffer from a shortage of drinking water, and more than 40% of the world’s population suffers from water scarcity [4]. In arid regions, the growing stress on water resources caused by population growth and climate change intensifies water scarcity [2]. Moreover, floods pose a threat to lives and economies. On 10 September 2023, Storm Daniel made landfall in Libya. The massive flooding killed more than 4300 people, while more than 8500 people are still missing. On 7 March 2024, deadly floods struck West Sumatra in Indonesia, damaging homes and forcing people to migrate. The maximum yearly precipitation forms the backbone for the design of flood mitigation measures, while the total yearly precipitation is fundamental for sustainable water resource management. Filling the gaps in rainfall data can contribute to managing water resources and meeting the increasing demand for fresh water.
Rainfall distribution is affected by mountainous terrain. The complex topography of mountainous regions makes the spatial rainfall distribution different from that of plain regions [5]. Different spatial interpolation methods are used to model rainfall; all rely on data from sparse stations to predict the rainfall distribution, but they differ in their mathematical concepts. Spatial interpolation predicts values at locations with no observations [6]. Interpolation methods can be divided into two main groups: deterministic and geostatistical approaches. Deterministic interpolation techniques, like inverse distance weighting (IDW) and Thiessen polygons, assign values to locations according to the neighboring observed values and to mathematical formulas that set the degree of smoothness of the generated surface. Geostatistical methods like kriging depend on statistical models that include autocorrelation (the statistical relationship among the measured points). Hence, geostatistical approaches not only generate a prediction surface but also provide some measure of the accuracy of the estimations. Deterministic methods do not use probability theory or give an indication of the extent of possible errors, while geostatistical methods provide probabilistic estimates or use the concept of randomness [7,8]. Kriging is the most widely used geostatistical method for spatial interpolation, in which the neighboring observed values are weighted to produce an estimated value for an unmeasured point. The weights depend on the distance between the observed locations and the estimation point and on the spatial arrangement among the observed points. The weights are calculated such that points near the target location are given more weight than those farther away. Kriging methods consider not only the distance between the measured values but also the spatial structure in the data; they depend on a spatial model between observations defined by a variogram. Kriging methods are divided into two categories: univariate and multivariate methods. The methods that can use secondary information are called “multivariate”, while those that ignore secondary information are known as “univariate” methods [9].
Table 1 shows a summary of different geostatistical and deterministic interpolation techniques, while Table 2 presents univariate and multivariate kriging methods.
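As an illustration of the deterministic approach described above, the following is a minimal sketch of inverse distance weighting. The gauge coordinates and rainfall values are hypothetical, and the function name `idw_predict` is chosen here for illustration only; it is not the implementation used in the study.

```python
import math

def idw_predict(stations, target, power=2.0):
    """Estimate the value at `target` from (x, y, value) station tuples.

    Each station is weighted by 1 / distance**power, so nearer stations
    contribute more to the estimate than distant ones.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a gauge: return its value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical gauges: (x_km, y_km, rainfall_mm)
gauges = [(0.0, 0.0, 10.0), (10.0, 0.0, 30.0), (0.0, 10.0, 50.0)]
estimate_p2 = idw_predict(gauges, (2.0, 2.0), power=2.0)
estimate_p5 = idw_predict(gauges, (2.0, 2.0), power=5.0)
```

A larger power concentrates the weight on the nearest gauge, so `estimate_p5` lies closer to the value of the nearest gauge than `estimate_p2` does.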
A variogram, also called a semivariogram, is a visual representation of the spatial dependence between pairs of points in the sampled data. The semivariance (i.e., gamma value) is half the mean squared difference between observations separated by a given distance. The semivariance is plotted against the distance (i.e., lag) between the pairs of points in the sampled data [10]. Several variogram models exist; the most commonly used are exponential, spherical, linear, Gaussian, and circular [11].
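The definition of the semivariance above can be made concrete with a short sketch that bins station pairs by separation distance and takes half the mean squared difference in each bin. The bin width, the number of bins, and the sample points are assumptions made for illustration; fitting a model (exponential, spherical, etc.) to the resulting points is not shown.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, lag_width, n_lags):
    """Return [(mean lag distance, semivariance, pair count)] per lag bin.

    `points` holds (x, y, value) tuples. Pairs farther apart than
    lag_width * n_lags are ignored.
    """
    squared_diff = defaultdict(float)
    distance_sum = defaultdict(float)
    pair_count = defaultdict(int)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag_width)  # index of the lag bin for this pair
            if k >= n_lags:
                continue
            squared_diff[k] += (zi - zj) ** 2
            distance_sum[k] += h
            pair_count[k] += 1
    result = []
    for k in sorted(pair_count):
        n = pair_count[k]
        # semivariance = half the mean squared difference in the bin
        result.append((distance_sum[k] / n, squared_diff[k] / (2.0 * n), n))
    return result

# Hypothetical observations along a transect: (x, y, rainfall)
sample = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (2.0, 0.0, 5.0)]
variogram_points = empirical_semivariogram(sample, lag_width=1.0, n_lags=3)
```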
Spatial interpolation of annual rainfall was performed in South Africa using different interpolation techniques (universal kriging, ordinary kriging, co-kriging, and IDW), and cross-validation was applied to determine the best interpolation approach [12]. The best performance was produced by ordinary kriging. The results showed that the kriging methods outperformed IDW. For the variogram models, the circular model was the best regardless of the technique used. Kyriakidis et al. [13] conducted a study to map rainfall distribution from rain gauge data using different interpolation techniques in the coastal region of northern California, which is characterized by extreme seasonal variability in rainfall and complex terrain. The results showed that using secondary information such as terrain and atmospheric characteristics in predicting the rainfall distribution could improve estimates and result in more accurate representations of rainfall distribution. Moreover, the magnitude of the improvement depends on the spatial variability of the rainfall field, the density of rain gauge stations, and the degree of correlation between rainfall and predictors.
Sadeghi et al. [14] performed a study to predict rainfall distribution in Iran. The study used average rainfall data from 35 stations and rainfall observations for the period from 1982 to 2012 with different interpolation methods (i.e., kriging, co-kriging, local polynomial interpolation, global polynomial interpolation, RBF, and IDW). These techniques were evaluated using cross-validation. The results showed that the simple co-kriging (exponential) method was the most suitable method to predict rainfall distribution over the study area, while IDW with power 5 had the poorest performance. The study found that a correlation existed between elevation variations and the precision of the co-kriging method. Haberlandt [15] applied multivariate methods (i.e., indicator kriging with an external drift (IKED) and kriging with an external drift (KED)) to predict the spatial distribution of hourly rainfall from rain gauges with the use of secondary information from elevation, rainfall from a daily network, and radar, in a region of 25,000 km² in southeast Germany. Cross-validation was used to compare the performance of IKED and KED incorporating secondary information with that of univariate techniques (Thiessen polygon, IDW, ordinary indicator kriging (IK), and ordinary kriging (OK)). The results showed that the multivariate techniques IKED and KED clearly outperformed the univariate techniques due to the use of secondary information from elevation, daily rainfall from the network, and radar. In addition, the best result was produced when all secondary information was used together with KED.
Rata et al. [16] compared three interpolation approaches for mapping the annual rainfall distribution of the Cheliff watershed, Algeria. The mean annual rainfall data with elevation from 58 stations for the period from 1972 to 2012 were used in the study. OK, KED, and regression kriging (RK) were considered, and the interpolation approaches were compared using cross-validation. It was found that KED was the best interpolation technique to represent mean annual rainfall distribution for the study area. Goovaerts [17] used three multivariate interpolation methods (co-located co-kriging, kriging with an external drift, and simple kriging with varying local means) to incorporate elevation into the spatial interpolation of rainfall in a region of 5000 km² in Portugal. The data of monthly and annual rainfall from 36 climatic stations were used, and cross-validation was also utilized to compare the multivariate methods with three univariate ones (ordinary kriging, inverse square distance, and Thiessen polygon). The results showed that the three multivariate interpolation methods outperformed the univariate ones, as higher prediction errors were produced by the univariate techniques that ignore rainfall measurements at neighboring stations and elevation.
Frequency analysis is used to estimate how often rainfall of a given magnitude can be expected to occur in the future. Data from rainfall frequency analysis are vital for many sectors of water resource engineering such as dam and sewage system design and flood mitigation [18,19]. Rainfall frequency analysis is often performed using maximum yearly rainfall series [20]. The quantity of rainfall over a certain location in each period is characterized by a high variation in time. This variation depends on, e.g., the length of the given period, climate type, and topographic conditions. Arid areas usually display higher variation. Normally, the management and design of flood mitigation systems, hydraulic structures, and irrigation water supply depend on rainfall amounts that are predicted for a certain return period. This rainfall is determined using frequency analysis of long time series [21]. The return period in the frequency analysis of precipitation is the average time interval between occurrences of rainfall of a specific quantity or intensity [22]. Rainfall records in frequency analysis are treated as independent and identically distributed random variables [21,22]. The frequency analysis is performed using different probability distributions such as two-parameter distributions (gamma and Gumbel) and three-parameter distributions (generalized Pareto (GPA), Pearson type III (PE-III), generalized normal (GNO), generalized extreme value (GEV), and log-Pearson type III (LP-III)) [23].
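To illustrate how a return period maps to a design rainfall, the sketch below fits a Gumbel distribution to an annual maximum series by the method of moments and evaluates the rainfall depth whose annual non-exceedance probability is 1 − 1/T. The Gumbel distribution and the moment-based fit are chosen here only because they have closed forms; the rainfall series is hypothetical, and the study itself may have used other distributions and estimation methods.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel_moments(annual_maxima):
    """Method-of-moments estimates (location, scale) of a Gumbel distribution."""
    mean = statistics.fmean(annual_maxima)
    std = statistics.stdev(annual_maxima)
    scale = std * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale

def return_level(location, scale, return_period_years):
    """Rainfall depth with annual non-exceedance probability 1 - 1/T."""
    non_exceedance = 1.0 - 1.0 / return_period_years
    return location - scale * math.log(-math.log(non_exceedance))

# Hypothetical annual maximum daily rainfall series (mm)
series = [22.0, 35.5, 18.0, 60.2, 41.0, 27.3, 75.8, 30.1, 48.6, 25.4]
location, scale = fit_gumbel_moments(series)
design_depths = {T: return_level(location, scale, T) for T in (2, 5, 10, 25, 50, 100)}
```

The resulting depths increase with the return period, which is the behavior expected of any fitted distribution used for design rainfall.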
The chi-squared test can be used to determine the goodness of fit of probability distributions [24,25]. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are also commonly used for model selection [26,27]. The performance of probability distributions for the frequency analysis of rainfall was assessed using rainfall data for the period from 1973 to 2012 for Dois Vizinhos, Brazil [28]. The study showed that the Weibull and gamma distributions were the most suitable. Waghaye et al. [29] compared probability distributions for monthly rainfall data from 1972 to 2001 in the Adilabad district of Telangana, India. The chi-squared test was conducted, and the gamma, GEV, and Gumbel distributions were the best. Yuan et al. [30] considered five probability distributions (i.e., gamma, Gumbel, normal, log-Pearson type III, and log-normal) for the frequency analysis of annual maximum hourly rainfall for the period from 1981 to 2000 for 15 locations in Japan. The chi-squared test was used to evaluate the goodness of fit. It was found that log-Pearson type III was the most suitable for the data.
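As a sketch of how AIC and BIC rank candidate distributions, the example below compares a normal and a two-parameter log-normal fit, both of which have closed-form maximum likelihood estimates. These two candidates and the sample values are assumptions chosen for brevity; they are not the set of distributions or the data evaluated in the study.

```python
import math

def normal_loglik(values):
    """Maximized log-likelihood of a normal distribution (MLE mean, variance)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return -0.5 * n * (math.log(2.0 * math.pi * variance) + 1.0)

def lognormal_loglik(values):
    """Maximized log-likelihood of a two-parameter log-normal distribution."""
    logs = [math.log(v) for v in values]
    # Change of variables: density of x equals normal density of ln(x) divided by x
    return normal_loglik(logs) - sum(logs)

def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik

def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2.0 * loglik

# Hypothetical annual maximum rainfall sample (mm)
sample = [12.0, 15.5, 18.0, 21.3, 25.4, 30.1, 41.0, 60.2, 95.8]
scores = {
    "normal": (aic(normal_loglik(sample), 2), bic(normal_loglik(sample), 2, len(sample))),
    "log-normal": (aic(lognormal_loglik(sample), 2), bic(lognormal_loglik(sample), 2, len(sample))),
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
```

The candidate with the lowest AIC (or BIC) is preferred; BIC penalizes additional parameters more strongly as the number of observations grows.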
Based on the above, geospatial interpolation is important for overcoming insufficient rain gauge data, especially in arid mountainous regions. Therefore, the objectives of the current study were (1) to use different geospatial interpolation models to fill the precipitation data gaps in the mountainous region of arid northern Oman based on data from 279 rain gauges spanning from 1975 to 2009, (2) to compare the performance of the investigated geospatial interpolation methods, (3) to generate spatial distributions of annual maximum and total annual rainfall over the study area using different geospatial interpolation techniques, and (4) to perform frequency analysis on the annual maximum rainfall considering different probability distributions to produce predictions of the annual maximum rainfall for different return periods over the study area.
3. Results and Discussion
The interpolation methods produced different spatial rainfall distribution patterns, as shown in Figure 6; the colored legend goes from blue for low rainfall values to red for the highest rainfall values. Figure 6 shows only slight differences between the rainfall distribution patterns generated by the methods. Based on the cross-validation results, the OCK exponential method outperformed the other three methods (i.e., EBK L, OK stable, and RBF ST) in representing the spatial distribution of the maximum rainfall for the year 1991, as it had a lower RMSE and a higher R.
For maximum yearly rainfall interpolation, the results showed that the kriging methods outperformed the deterministic ones (IDW and RBF). IDW with power 2 outperformed the other power values. The spline with the tension basis function was more efficient than the thin-plate spline. On average, universal co-kriging using the J Bessel variogram model was superior to the other methods. The exponential model performed as well as the circular model, while the stable model had the lowest performance. IDW with power 5 and the RBF method with the thin-plate spline basis function had the poorest performance in predicting the rainfall distribution over the study area. The results of the cross-validation for the interpolation methods used for maximum rainfall for the year 1999 are shown in Table 5. Figure 7 and Figure 8 show the variation in RMSE and R across the interpolation methods for the maximum rainfall for the year 1999.
The results for the maximum rainfall in 1999 showed that OK using the circular variogram model was the best method for predicting the rainfall distribution, as it had the lowest RMSE (17.02) and the highest R (0.63). EBK P had the second-best performance, with an RMSE of 17.11 and an R of 0.62. IDW with power 2 performed better than the other power values. For the RBF method, the spline with the tension basis function outperformed the thin-plate spline basis function. The RBF method with the thin-plate spline basis function had the poorest performance, as it had the highest RMSE (20.53) and the lowest R (0.52). Figure 7 and Figure 8 indicate that the RBF TPS method was clearly less accurate than the other interpolation techniques, with the highest RMSE and the lowest R.
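The RMSE and R values discussed above come from cross-validation. A minimal leave-one-out sketch is given below, assuming that R denotes the Pearson correlation coefficient between observed and predicted values and using a nearest-gauge (Thiessen polygon) predictor as a simple stand-in for the interpolators compared in the study. The gauge data and function names are hypothetical.

```python
import math

def thiessen_predict(stations, target):
    """Return the value of the station nearest to `target` (Thiessen polygon rule)."""
    nearest = min(stations, key=lambda s: math.hypot(s[0] - target[0], s[1] - target[1]))
    return nearest[2]

def leave_one_out(stations, predict):
    """Predict each station from all the others; return (observed, predicted)."""
    observed, predicted = [], []
    for i, (x, y, value) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        observed.append(value)
        predicted.append(predict(others, (x, y)))
    return observed, predicted

def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson_r(observed, predicted):
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    covariance = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return covariance / (spread_o * spread_p)

# Hypothetical gauges: (x_km, y_km, rainfall_mm)
gauges = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (3.0, 0.0, 40.0), (3.0, 2.0, 55.0), (6.0, 1.0, 70.0)]
observed, predicted = leave_one_out(gauges, thiessen_predict)
cv_rmse = rmse(observed, predicted)
cv_r = pearson_r(observed, predicted)
```

Any interpolator with the same `(stations, target)` signature can be passed to `leave_one_out`, which makes it straightforward to rank methods by lower RMSE and higher R.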
Figure 9 shows the spatial distribution of the maximum rainfall for the year 1999 produced by the OK circular method. The colored legend goes from blue for low rainfall values to red for the highest rainfall values. Figure 9 shows that the rainfall distribution is uneven; the western and northern areas show low rainfall values, while the high rainfall values are concentrated in the southern area.
Table 6 shows the best interpolation method for annual daily maximum rainfall for each year for the period 1975–2009. Table 6 makes it clear that geostatistical methods outperformed deterministic ones in predicting maximum yearly rainfall distribution over the study area.
For total annual rainfall interpolation, the kriging methods outperformed the deterministic ones (IDW and RBF). IDW with power 2 outperformed the other power values. The RBF method performed better than IDW. The spline with the tension basis function was more efficient than the thin-plate spline. On average, ordinary co-kriging and universal co-kriging using the K Bessel variogram model were superior to the other methods. The J Bessel and K Bessel variogram models performed well, while the Gaussian and exponential models had the lowest performance. IDW with power 5 and the RBF method with the thin-plate spline basis function had the poorest performance in predicting the rainfall distribution over the study area. The results of the cross-validation of the interpolation methods used for total rainfall for the year 2000 are shown in Table 7. Figure 10 and Figure 11 show the variation in RMSE and R across the interpolation methods for the total rainfall for 2000.
The results for the total annual rainfall in 2000 showed that OK using the J Bessel variogram model and UK with the J Bessel variogram had the best performance in predicting the rainfall distribution, as they had the lowest RMSE (47.77 mm/year) and the highest R (0.81). OK using the exponential variogram and UK with the exponential variogram performed similarly well. IDW with power 2 performed better than the other power values. For the RBF method, the spline with the tension basis function outperformed the thin-plate spline basis function. The RBF method with the thin-plate spline basis function had the poorest performance, as it had the highest RMSE (64.82 mm/year) and the lowest R (0.69). Figure 10 and Figure 11 show that the RBF TPS method was clearly less accurate than the other interpolation techniques, with the highest RMSE and the lowest R.
Figure 12 shows the spatial distribution of the total rainfall for 2000 produced by the UK J Bessel method. The colored legend goes from blue for low rainfall values to red for the highest rainfall values. Figure 12 shows that the rainfall distribution is uneven; the northern and southern areas have low rainfall values, while the high rainfall values are concentrated in the western area.
Table 8 shows the best interpolation method for total yearly rainfall for each year for the period from 1975 to 2009. In general, the results clearly showed that geostatistical methods outperformed deterministic ones in predicting maximum yearly and total yearly rainfall over the study area. Figure 13 shows that for kriging methods, the J Bessel variogram model was superior to the other variogram models in predicting maximum yearly and total yearly rainfall over the study area regardless of the kriging method used, while the Gaussian model had the poorest performance.
A frequency analysis of the maximum yearly rainfall was performed for the original data and for the data after filling the gaps. For the original data analysis, 50 stations were excluded because they had fewer than 10 records and would not have produced appropriate results. Table 9 shows the results of the frequency analysis of the original sampled rainfall data for return periods of 2, 5, 10, 25, 50, and 100 years. Table 10 shows the frequency analysis of the data after filling the gaps. The WEI2 distribution provided the best fit for both the original data analysis and the gap-free data analysis, as it had the lowest AIC and BIC. Figure 14 shows a comparison between the probability distributions in fitting the rainfall data. In general, the frequency analysis results showed that the WEI2 distribution was the best to fit the data, followed by the gamma distribution, while the LN3 distribution had the poorest performance.
Table 11 shows the ratio between the average of the rainfall values predicted by the gap-free data analysis and the average of the values predicted by the original data analysis. For the return period of 2 years, the average predicted by the original data analysis was lower than that predicted by the gap-free data analysis. For the other return periods, by contrast, the average predicted by the original data analysis was higher than that predicted by the gap-free data analysis. Figure 15 represents the trend in the frequency analysis results for the original sampled data analysis and gap-free data analysis. It appears that for return periods greater than 2 years, the predicted rainfall value from the original data analysis was higher than that obtained by the gap-free data analysis.
4. Conclusions and Recommendations
In the current research, the performance of various interpolation methods in predicting the spatial distribution of rainfall over the mountainous region of Oman was evaluated. The results demonstrated that geostatistical interpolation techniques (i.e., SK, SCK, OK, OCK, UK, UCK, and EBK) outperformed deterministic interpolation techniques (i.e., IDW and the RBF) in generating the spatial distribution of maximum and total yearly records over the study area in Oman.
Universal co-kriging was the most effective method at predicting the maximum yearly rainfall distribution across the study area. On the other hand, both ordinary co-kriging and universal co-kriging showed comparable performance in interpolating the total yearly rainfall distribution in Oman. These findings highlight the reliability of co-kriging methods, particularly when incorporating additional variables such as elevation, for accurate spatial rainfall estimation in complex terrains. The J Bessel variogram outperformed the other six evaluated variograms (i.e., exponential, circular, Gaussian, stable, spherical, and K Bessel) in representing the spatial variance of the rainfall records.
The frequency analysis was conducted using data from the available rain gauges, and the two-parameter Weibull distribution outperformed the other seven tested statistical distributions (i.e., GEV, LN3, P-III, EV1, LN2, EXPN, and gamma) in predicting the design storm for different return periods (i.e., 2, 5, 10, 25, 50, and 100 years). The results indicated that for return periods greater than two years, the corresponding rainfall depth derived from the raw rain gauge data (before filling data gaps) was higher than that obtained from the gap-filled data. This suggests that flood protection design based on rainfall depths derived from raw data without gap filling is on the conservative side, ensuring a higher margin of safety.