Regional Ombrian Curves: Design Rainfall Estimation for a Spatially Diverse Rainfall Regime

: Ombrian curves, i.e., curves linking rainfall intensity to return period and time scale, are well-established engineering tools crucial to the design against stormwaters and ﬂoods. Though the at-site construction of such curves is considered a standard hydrological task, it is a rather challenging one when large regions are of interest. Regional modeling of ombrian curves is particularly complex due to the need to account for spatial dependence together with the increased variability of rainfall extremes in space. We develop a framework for the parsimonious modeling of the extreme rainfall properties at any point in a given area. This is achieved by assuming a common ombrian model structure, except for a spatially varying scale parameter which is itself modeled by a spatial smoothing model for the 24 h average annual rainfall maxima that employs elevation as an additional explanatory variable. The ﬁtting is performed on the pooled all-stations data using an advanced estimation procedure (K-moments) that allows both for reliable high-order moment estimation and simultaneous handling of space-dependence bias. The methodology is applied in the Thessaly region, a 13,700 km 2 water district of Greece characterized by varying topography and hydrometeorological properties.


Introduction
Ombrian (from the Greek word 'óµβρoς' meaning rainfall) curves are a standard engineering tool in the form of a mathematical relationship linking rainfall intensity to timescale and return period, usually known as 'intensity-duration-frequency' curves. This term, albeit widely used, appears to be a misnomer, considering that 'duration' does not refer to the actual duration of a rainfall event but rather to the (arbitrary) time scale of averaging the rainfall intensity, while 'frequency' is not meant to be frequency but its reciprocal, i.e., return period. To oppose this common confusion (and having in mind the Aristotelian principle that science presupposes clarity-or saphenia [1]), the term 'ombrian curves' has been used as an alternative name in the past [2][3][4][5] and has been adopted here as well.
Ombrian curves have been used in hydrology since the works of Sherman [6] and Bernard [7] and own their popularity to their practical benefits when design problems affected by rainfall extremes are of interest. Although the typical curves have been constructed mostly following an empirical fashion, over the past decades, there have been several attempts to provide a theoretical basis for their modeling, e.g., [8][9][10], with the most recent advance being their full upgrade to multi-scale models of rainfall intensity [2]. At this point, it could be argued that the issue of deriving the curves for single sites has been efficiently tackled both at the practical level, with diverse methodologies providing satisfactory results for small scales, of the order of minutes to a few days (see the review by Frameworks based on the first approach are older yet still in use, e.g., [12][13][14], as this approach is easier to apply. Indeed, the task of constructing the curves for a single site is more straightforward, and the same is true for the mapping of the parameters in space, given the ample availability of geostatistical software. However, an important limitation of this approach is that results are very sensitive to data from single stations, which are often short and fragmented and can be impacted by large sampling uncertainty. A further issue with automated spatial interpolation methods is that their structure is obscure and cannot be easily modified, making their results not always interpretable. For instance, it is known that both the number and spatial distribution of available stations impact the reliability of the geostatistical method, but the former is not easy to assess. In this respect, Malamos and Koutsoyiannis [15] note that kriging requires a large number of available data points (at least 100, according to Oliver and Webster, [16], or 50-100 according to other studies [17]), in order to produce a reliable estimation of the variogram. The same authors propose a bilinear surface smoothing (BSS) model [18,19] that is shown to have superior performance to Universal kriging in terms of bias and heteroscedastic behaviors of the data, as well as a more interpretable theoretical structure.
On the other hand, having a single spatial model is a theoretically more powerful approach since it allows simultaneously using all observations to limit uncertainty by 'substituting space for time'; i.e., the principle behind the well-known 'regional frequency analysis' [20]. Nonetheless, the simultaneous use of all stations in the estimation process is challenging, for it requires a rigorous estimation framework to handle the underlying assumptions that control the information content of the pooled records. These are related to the different record lengths of the stations and the presence or not of spatial dependence among them. The most well-known framework of this type is the regional frequency estimation based on the L-moments [20,21]. Yet the latter is founded on the inter-site independence assumption and has been shown to decrease in accuracy as the level of dependence increases [22]. To explicitly address the effect of the spatial dependence in the data, a new regional frequency estimation framework has been proposed that allows high-order properties estimation from spatially correlated data by means of 'knowable' moments' (K-moments of high order [2,23]). Separate space dependence models have been employed in other approaches (e.g., [24]). Evidently, the issue of regional frequency analysis is still an evolving research subject.
Aside from the choice of the theoretical framework for the regional estimation, the formulation of a regionalization approach for the extreme rainfall properties is an additional demanding task that has to be performed in approach (b). The literature on the different types of regionalization techniques for frequency analysis is vast and extensively covered by previous works [11,25]. The simpler approach considered is the estimation of a common frequency distribution by directly pooling together all stations of a homogenous region [26]. This is a reasonable approach for small areas or areas with limited spatial variability but is less efficient for complex regimes. In such cases, an extension of the same concept is to delineate the region by identifying homogenous sub-regions and pool the data within them, allowing for the possible variation only of a site-specific scale parameter, known as the 'index-rainfall'. This is a widely used approach [11,20,25,27,28], inspired by a similar method in flood frequency analysis, the so-called 'index-flood' method [29]. Considerable limitations of the approach relate to the subjectivity of detecting clusters in complex regimes and the arising of spatial discontinuities at the clusters' boundaries. Remediation to these issues is the 'region of influence approach', which, instead of using fixed boundaries regions, employs individual regions of varying boundaries centered at the site of interest [30]. Irrespective of the arrangements of each method, the need to model the spatial variability of the site-specific parameter(s) still emerges and is often treated through standard interpolation and geostatistical methods as in approach (a).
In this sense, several frameworks resort to a combination of the two approaches (e.g., [28,31,32]). Namely, the data are pooled together for the distribution fitting after being proportionally adjusted by a site-specific parameter, which itself is obtained for a given grid by interpolation methods. This approach bypasses the difficulty of identifying sub-regions based on hydrological similarity and is characteristically called 'regionless' or 'boundaryless' [25,32].
The scope of this work is to construct regional ombrian curves by exploiting and combining recent advances in the different methodological components of the analysis and integrating them into a new framework for regional frequency analysis of rainfall extremes. The latter is based on: 1.
An advanced estimation method of extreme values based on high-order moment estimation while respecting space dependence [2].

2.
A couple of flexible spatial smoothing models [18,19] to describe the spatial variability of extreme rainfall without resorting to uncontrolled interpolation.

3.
A formulation of ombrian curves, which is revisited through recent theoretical developments in the field [2]. This is the first time that the K-moments framework, by now used in frequency estimation for several hydrometeorological variables [23,33,34], has been put into practice in regional frequency estimation of rainfall extremes. This is also the first time that the BSS model framework has been employed for the regionalization of extreme rainfall. As a proof-of-concept, the framework is applied in the region of Thessaly (Greece), utilizing data from 55 stations over a 13,700 km 2 basin area. The large spatial scale, together with the hydrological complexity of the case study, form a challenging test for the methodology, from which new insights into regional rainfall frequency analysis are gained.

Mathematical Form of the Ombrian Relationship
Koutsoyiannis ([2]; Chapter 8) recently developed a framework advancing typical ombrian curves to stochastic models of rainfall intensity, valid over any scale supported by the data. This type of ombrian model arises directly from the stochastic properties (dependence structure and marginal distribution) of rainfall intensity and their (nonsimple) scaling behavior, and, as such, it can also be applied for simulation of the rainfall process [35]. The framework in [2] can be applied at any time scale, arbitrarily large, to produce the ombrian relationship linking rainfall intensity x to any timescale k and return period T. We note, though, that for large time scales the mathematics becomes somewhat involved. Here we apply the framework only for small time scales, for which a Pareto distribution for the non-zero rainfall intensity is justified. (For larger scales, this should be replaced by a Pareto-Burr-Feller distribution.) In this case, the Pareto distribution quantile is given as [2]: 1 is the probability wet, λ(k) is a scale parameter and ξ is the tail-index of the Pareto distribution. Both P (k) 1 and λ(k) are functions of the timescale obtained as [2]: where µ is the mean intensity (constant at all time scales) and γ(k) is the climacogram of the process, i.e., the function of the variance across timescale, which can follow different models [2]. A simplification of Equations (1)- (3) is possible based on the following assumptions stemming from the fine-scale behavior of rainfall. For small time scales, of the order of minutes to a few days: 1 ∝ k, and hence we can set the quantity β(k) := k/P (k) , and thus we can neglect the latter term in their sum. • we select the generalized Cauchy-type model for the climacogram: where α and λ 1 are scale parameters, with dimensions of time [t] and [x], respectively, and H, M are dimensionless parameters in the interval (0, 1), controlling the long-range (Hurst-Kolmogorov dynamics) and local scaling of the process (fractal behavior) of the process, respectively. For M we take the neutral value M = 1/2 as default. We note though that if the focus is on even smaller temporal scales, this value (M = 1/2) can be inappropriate. These simplifying assumptions result in some violations of a full stochastic consistency, as detailed in [2]. However, at small scales, the violations are negligible [2,3]. By virtue of these simplifications, the ombrian relationship is given as: Setting λ = (1/2 − ξ)λ 2 1 / ξµ and η = 2 − 2H, Equation (5) can be rewritten as: where the following five parameters are involved, λ an intensity scale parameter in units of x (e.g., mm/h), β a timescale parameter in units of the return period (e.g., years), α a timescale parameter in units of timescale (e.g., h) with α ≥ 0, η a dimensionless parameter with 0 < η < 1, and ξ > 0 the tail index of the process. It is easily observed that Equation (6) can be written concisely as the quotient of two separable functions b(T) and a(k) of the return period and the timescale, respectively, in the form: Hydrology 2022, 9, 67 5 of 22 with a(k) = 1 + k α η (8) being a function of time scale, in wide use as an approximation [2,36] but here resulting as a consequence of the climacogram, and b(T) a function of the return period that is analytically derived from the distribution function of the rainfall intensity.
In the case that the return period of the rainfall intensity is determined based on rainfall exceedances extracted from the full series (i.e., Peaks over Threshold, POT), a Pareto distribution can be generally assumed for modeling the rainfall intensity, as implied in Equation (6). However, if the return period is determined based on series of annual maxima (AM) of rainfall intensity, then both long-term empirical evidence and theoretical arguments support the use of the Extreme Value Type 2 (EV2) distribution from the Generalized Extreme Value (GEV) distribution family [37][38][39]: where ψ (dimensionless), ν > 0 (units same as in y) and ξ > 0 (dimensionless) are location, scale, and shape parameters, respectively. It should be mentioned that the case of ξ < 0 is not appropriate for maximum rainfall, since it presumes the existence of an upper limit for the variable, which is inconsistent with the physical reality. Furthermore, the case of ξ = 0, i.e., assuming a Gumbel (Extreme Value Type 1-EV1) distribution for the maximum rainfall intensity, is also not supported by worldwide empirical evidence and is to be avoided in general [37]. Therefore, it is not developed herein, but further details for this case are given in Koutsoyiannis [2]. Equivalently, the EV2 distribution as given by Equation (9) can be re-parameterized consistently to Equation (6) as follows: where β = (1 − ξψ) 1/ξ ∆ and λ = (1 − ξψ) ν/ξ and ξ > 0. The variable y represents either the rainfall intensity x or, equivalently, the product x a(k) (Equation (7)). Solving Equation (10) in terms of y and substituting F(y) = 1 − ∆/T, where ∆ = 1 year for annual maxima, yields: Therefore, by substituting Equations (8) and (11) in (7), the following generalized form of ombrian curves for annual maxima is derived: It is easily shown that for small return periods, Equation (6) deriving from a Pareto distribution yields higher intensity than Equation (12), whereas for larger return periods (T > 10 years), the two are practically indistinguishable, given that for small ∆/T holds ln [1 − (∆/T)] = −(∆/T) − (∆/T) 2 − · · · ≈ −∆/T. Therefore, even in the case that the model fitting is based on Equation (12), i.e., when annual maxima are used (which is also the case here), it is safer, from an engineering point of view, to express the final model used to obtain design rainfall in the form of Equation (6). Yet, the availability of either POT or AM series determines which of Equation (6) or (12), respectively, will be used for model fitting. Following a slightly different parameterization, Equations (6) and (12) were also proposed by Koutsoyiannis et al. [8] for small scales, which nevertheless are sufficient for most engineering applications of ombrian curves, namely those involving flood analyses.
The advantage of this simplified approach is precisely the separability of a(k) and b(T) functions that allows for a two-step procedure for the parameter estimation. This turns out to be convenient for typical applications and even more for regional analyses, as will be shown next. It also has attractive flexibility in using different sources of data. Namely, a reliable determination of the timescale parameters α and η requires data from fine-scale records, whereas the parameters of the function b(T), including the highly uncertain tail index, are better inferred from daily raingauge data, which are usually more systematic and less prone to erroneous recordings [8].

Regionalization Method: Bilinear Surface Smoothing Models for the 24 h Average Annual Rainfall Maxima
The target of the regional model is to generalize Equation (6) or (12) in space, achieving their applicability for any set of coordinates in a given area. To efficiently describe the spatial heterogeneity in a region without resorting to a great number of parameters and uncontrolled interpolation, we should make an assumption on which parameters we consider as regionally varying. It is reasonable to begin by applying diagnostics for the spatial heterogeneity of rainfall in the study area, both in terms of the a(k) and b(T) functions' parameters, bearing in mind that selected parameters should be reliably estimated from single-site data. Following this rationale and supported by the available data, which in our case are series of annual maxima at different temporal scales, as shown in Section 3.2, we choose to spatially model the parameter λ of Equation (12). This corresponds to the scale parameter of the EV2 distribution, which is proportional to the mean value of the process, i.e., the average annual maxima at each location for any timescale. Since the transformation between timescales is controlled by the a(k) function, it suffices to model the average maxima at a single, convenient timescale. We choose the 24 h scale due to the much greater availability of daily data in the study region. Depending on the need to apply further complexity, the location parameter of the EV2 distribution could be another option for spatial modeling. Yet it is not advisable to spatially model the highly uncertain shape parameter (i.e., the tail index) based on single-site data unless long-term empirical evidence supports a spatial variation thereof. Parameters α and η, which control the timescale transformations of the curves, are very sensitive to the existence of sub-daily data [8], which are scarce in the study region, and thus, they do not constitute good choices for spatial modeling either. For these reasons, we apply common values for the rest of the parameters over the whole area. The choice of the mean of the AM distribution as the parameter to be regionalized has been proven to be a robust choice in regional frequency analysis, dating back to the index-flood method [29] and several applications thereafter (see [11]).
Having chosen to spatially model only the λ parameter related to the mean value of the annual maxima series, we have to identify a model for the spatial variation. As already discussed in the introduction, a large region with complex topography cannot be efficiently treated as a homogenous area. In this case, instead of identifying several sub-regions, which may give rise to abrupt changes in the final results with questionable physical basis, it is better to explicitly model the process at any given point in the area of interest. Towards this 'regionless' modeling approach, we apply a framework of spatial smoothing modeling that is described next.
We apply two versions of a bilinear surface smoothing model proposed by Malamos and Koutsoyiannis [18,19], generalizing in 2D a previous numerical smoothing and interpolation method [40,41]. A brief overview of the mathematical framework is presented since it is detailed in the aforementioned publications. The general idea behind both methods is to compromise the trade-off between the objectives of minimizing the fitting error and the roughness of the fitted bilinear surface, therefore termed bilinear surface smoothing (BSS). The larger the weight of the first objective, the rougher the surface will appear, while the opposite is true for a larger weight of the second objective.
The mathematical framework of BSS suggests that fitting is meant in terms of minimizing the generalized cross-validation error (GCV; [42]) between the set of the given data points and the corresponding estimates. The general estimation function,ẑ u , for point u on a plane, according to the BSS method, is:ẑ where d u is the value of the fitted bilinear surface d at that point. The BSS method can be extended by the introduction of an additional explanatory variable (bilinear surface smoothing with an explanatory variable; BSSE) at a denser dataset compared to that of the main variable, as follows. We assume that at the locations of the given data points, we also know the value of an explanatory variable t, and therefore for each point z there corresponds a value t. In this case, the general estimation function for point u is:ẑ where d u , e u are the values of two fitted bilinear surfaces at that point, namely d and e, while t u is the value of the explanatory variable at that point. This is not a global linear relationship but a local linear one as the quantities d u and e u change in space.
In the case of the BSS, there are four adjustable parameters for surface d: the numbers of intervals along the horizontal and vertical direction, respectively, i.e., mx, my, and the corresponding smoothing parameters τ λx and τ λy . The incorporation of the explanatory variable for the BSSE case adds two more adjustable parameters: the smoothing parameters τ µx and τ µy corresponding to surface e. The values of all the smoothing parameters are restricted in the interval [0, 1) for both directions [18]. When the smoothing parameters are close to 1, the resulting bilinear surfaces exhibit greater smoothness, whereas, for small values of these parameters, interpolation among the known points is obtained.
A desirable feature of the method for regional analyses is the fact that it is proven reliable even in the case of few and scarce data, in contrast to common geostatistical methods that require a denser data network to be applied reliably (e.g., to estimate a semivariogram) [15]. It is important to note that the method is also parsimonious in terms of the number of parameters and the choices involved in the modeling. This is evident when compared to the standard kriging framework, in which one has to decide among the n different types of the method, among the 4 n standard variogram types (i.e., spherical, exponential, Gaussian and power), and also identify the values for the range, sill, and nugget, resulting in a total of 12 n choices, depending on the selected number of kriging methods. This increases the complexity of the approach and may increase the subjectivity as well, given that an objective framework for basing these decisions is lacking.

Bilinear Surface Smoothing Model Parameters Estimation
As mentioned, the parameter estimation methodology for both the BSS and BSSE methods is based on the minimization of GCV error, and therefore, there is no effect in terms of heteroscedasticity. For a given combination of the bilinear surface segments, mx and my, the minimization of GCV error results in the optimal values of τ λx , τ λy and τ µx , τ µy . This can be repeated for several trial combinations of mx and my values until the global minimum of GCV is reached.
Both variants of the method are applied, one taking into account only the coordinates of the stations (BSS) and the second one exploiting as well the elevation of the stations (BSSE) as an additional explanatory variable. In the second case, a digital elevation model of the wider area (Thessaly and neighboring areas) with 90 m resolution at the equator (SRTM; [43]) is employed both for extracting the elevation at the stations' coordinates and for making estimations for any point in the given grid.
For the objective evaluation of the two methods, two statistical indices, the root mean square error (RMSE) and the mean absolute error (MAE), are compared in terms of (a) performance using all data and (b) the leave-one-out cross-validation performance, i.e., when the estimation at each coordinate set is performed by omitting the known value at that position. This analysis is presented in Section 4.1.

Timescale Parameters Estimation
The simplified version of the ombrian model also allows for a simplified fitting procedure. By utilizing the separability of functions a(k) and b(T) in this version, an independent, two-step fitting approach can be used, as introduced by Koutsoyiannis et al. [8]. Namely, Equation (12) can be written as: From this expression, it is easy to see that for the different timescales k j the stochastic variables y j := a k j x = (1 + k/α) η x have a common distribution function, with the y j for the different k j being samples of it. Let then, y ji := a k j x ji of length n = ∑ j n j denote the merged sample of all sub-samples x ji of size n j corresponding to timescale k j . Let also r ji denote the rank of each sub-sample x ji in the merged sample y ji so that the mean rank of each sub-sample is given as r j = ∑ i r ji /n j . Replacing all r ji with the mean rank value r j we obtain a sample of n values, with n 1 equal to r 1 , n 2 equal to r 2 etc. Then the mean and variance estimators are, respectively: If no ties are present among the different ranks, then r = (n + 1)/2. Following the assumption that the samples are from the same distribution, given by the right-hand side of Equation (15), then each r j should be close to the mean r while the variance should be minimal. Therefore, we can find the parameters α and η as the values that minimize the estimate of the variance γ r from the observations x ji . The original values y ji could be used as well instead of the ranks, yet the use of the ranks makes the estimation process more robust to outliers. In order to improve the fit to the higher quantile region, we could also use a part of the data of each sample belonging to the highest 1/2 or 1/3 of the data [8]. In this study, the highest 1/2 is used. The limited availability of sub-daily stations, in combination with their short records, hinder the reliable estimation of the time scale function parameters at each station separately. It is also pointed out that parameter α is very sensitive to small scales intensities and ideally requires sub-hourly data to be reliably estimated [8]. Given that few stations have data at such scales, to limit uncertainty, we apply the methodology described above simultaneously to the sample of all fine-scale raingauges. In particular, we identify the set of the time scale parameter values as the one that minimizes the sum of all stations' variances, each of which is given by Equation (17).

Regional Estimation of Distribution Parameters through K-Moments
Having estimated the α and η parameters, it remains to specify the parameters of the b(T) function through the following procedure. We form the pooled sample comprising all stations' annual rainfall maxima at the 24 h scale after first standardizing (dividing) them by their theoretical mean value, given by the spatial smoothing model. To the pooled standardized sample of annual maxima, we fit the EV2 distribution using the method of the non-central K-moments [2,23]. K-moments are newly proposed moments developed with the aims of being knowable for very large orders (depending on the sample size) and also interpretable in terms of order statistics. They are closely related to the probabilityweighted moments [44], but their formulation is simpler and more intuitive, which are attractive qualities similar to the ones of the L-moments [21]. The distinctive feature of K-moments, though, is that they are tailored to perform extreme-oriented analyses, as they enable reliable estimation of very high-order moments. What is more, each high-order K-moment estimate can be directly assigned a return period, which provides a direct means to empirical estimation of probability, alternative to order statistics. Furthermore, this estimation can be appropriately modified in the case that there is bias due to dependence [2]. In particular, K-moments allow straightforward estimation of high-order moments even for spatially dependent data, which is a rare property. Namely, typical regional applications of L-moments do not go beyond the 4th moment (L-kurtosis) estimation. This advantage of K-moments is exploited in the regional frequency analysis, which is particularly sensitive to high-order moments.
Koutsoyiannis [2,23] has introduced several variants of K-moments, of which here we use the simplest non-central variant, defined as: for the moment order p ≥ 1. K p has the important property that it equals the expected value of the maximum of p independent stochastic variables identical to x, i.e., The estimators of the non-central K-moments are given by the following formulae: where x (i:n) is the ith smallest variable in a sample x, of size n, (the ith item of the sample in ascending order) and p is the order of the moment, which can be any positive number p ≤ n. In addition, the following holds: The fact that b inp = 0 for i < p means that as the moment order increases, fewer data are used in the estimation, until only one is left, the maximum, when p = n, and b nnn = 1. For p > n, b inp = 0 for every i, 1 ≤ i ≤ n, and the estimation becomes impossible. The first order non-central K-moment is the mean value of the sample.
K-moment values can be assigned a return period as follows [2]: where Λ 1 , Λ ∞ are coefficients depending on the distribution function. For the EV2 distribution it is shown [2] that the Λ coefficients are functions of the shape parameter ξ: For validation purposes, the following relationship of empirical return periods based on order statistics is also used, which is shown to provide an unbiased estimate of the logarithm of the return period [2]: The procedure outlined above could be directly applied for assigning return periods to the K-moments of any single station, and the parameters of the EV2 distribution could be obtained by minimizing an error metric (e.g., MAE or RMSE) between the theoretical quantiles and the empirical K-moments, or between the corresponding return periods. However, for the pooled sample, the resulting information gain, and thus the maximum return period that can be estimated from the data, is a function of the sample's dependence structure. It is well known that in the case of cross-correlated variables, the quantity of information for the variable corresponds to a smaller sample of length compared to the case of independence but still greater than that of an individual station.
The framework of K-moments allows for the effect of dependence to be explicitly accounted for in the estimation of the return period. This is achieved through proper modification of the order of the moments of the unified sample, p', which in turn modifies the estimation of the return period. In particular, let n 1 denote the sample length of each station, m denote the number of stations, and n = m n 1 denote the size of the merged sample, and then the following methodology is applied [2]: For p > n 1 the following approximation is used. We estimate the equivalent Hurst parameter H, based on the spatial correlation of the stations ρ: Based on the estimated H the following coefficient is used for bias correction, Θ HK : Then the modified orders of the moments are obtained as: and their corresponding return periods are adjusted accordingly based on Equation (23).
It is obvious that n 1 controls the maximum moment order, which is unaffected by dependence. In the case that the stations have different lengths, n 1 can be estimated as the average record length of all stations (here, n 1 = 42). An increased value of n 1 suggests that the information gain is also increasing, as fewer return periods are modified downwards, while the opposite is true when n 1 decreases. In order to bypass the uncertainty regarding the modification of the return periods based on the estimated correlation structure, a good strategy is to use for model calibration only the moment orders up to n 1 , and employ the higher moments for validation. In so doing, the moments used in the calibration are still much more than the ones used in regular moment fitting procedures (typically up to 3 or 4 orders), while a second-moment set is also available for validation.
To transform the parameters of the EV2 distribution (expressed as in Equation (10)) of the standardized 24 h rainfall maxima to the final ombrian b(T) parameters, we use the following procedure. Let u T denote the 24 h annual maximum rainfall value for return period T standardized by its mean value µ. Then the rainfall intensity for any station at the 24 h timescale is x (1 + 24/α) η , whose distribution defines the function b(T) of the ombrian relationship will be y T = µ u T (1 + 24/α) η /24. Consequently, the variable y shares the same distribution function with the variable u with the same shape and location parameters, and scale parameter proportional to the one of u by a factor of µ (1 + 24/α) η /24, i.e., where the subscript u denotes the standardized rainfall maxima at the 24 h scale and α is expressed in h. It is recalled that the mean value µ for each location used in the standardization is derived from the BSS/BSSE models. In this way, parameters α and η, which are estimated simultaneously from all stations, in combination with the parameters of the distribution of the standardized maximum 24 h rainfall values and the spatially modeled mean value of the process by the BSS/BSSE models fully determine the ombrian curves, as given by Equations (6) and (12).

Study Area
The study region is the geographic area of the Water District (WD) of Thessaly (~13,700 km 2 ) which is one of the 14 WD of Greece. The district extends mostly within the administrative region of Thessaly, while it also includes a small part of the region of Central Greece and a small part of the Western and Central Macedonia region. The topography of the area is characterized by the existence of four mountain ranges in its perimeters, Olympus-Kamvounia in the north, Pindus in the west, Othrys in the south, and Pelion-Ossa in the east, which surround the Thessalian plain that rests in the central area ( Figure 1a). The Thessalian plain contains the largest part of the water bodies of the district and is traversed by the Pineios river and its tributaries. It is also the largest agricultural area in Greece, with lowland topography making it prone to frequent and heavy flooding [45,46]. A recent flood event of the 18-19 September 2020 triggered by a Mediterranean cyclone has caused human and livestock losses and extensive structural and agricultural damages to the area, sparking a revitalization of the decade-long initiatives for improving the area's flood protection design and strategy [47]. The climate of the Western region is continental, while the Eastern region has a typical Mediterranean climate, while the rainfall pattern exhibits strong differences between the lowlands and the mountain regions [48]. These characteristics of the study region, namely its hydrometeorological diversity, vast spatial extent, and criticality of flood risk, make it a challenging case study for the application of the methodology.

Data Processing and Quality Control
The construction of ombrian curves is based on rainfall intensity data at a range of timescales, typically starting from fine scales, i.e., 5 to 60 min, and extending to the 24 or 48 h scale for common applications. To this aim, we assemble a set of 17 rainfall records from tipping buckets and telemetric stations, recording data at the 5-30 min timescale and 61 rainfall records from daily raingauges. The data are obtained from the Public Power Corporation (PPC) of Greece, the Hellenic Ministry of Environment and Energy (HMEE), the Hellenic Ministry of Agricultural Development and Food (HMADF), the Hellenic Ministry of Agriculture (HMA), the Hellenic National Meteorological Service (HNMS) and the meteo network [49] of the National Observatory of Athens. The properties of the stations are detailed in Tables S1 and S2 of the Supplementary material.
We aggregate the original series at a range of timescales from 5 min to 48 h, with k =0.08, 0.17, 0.25, 0.5, 1, 2, 6, 12, 24, 48 h (depending on data available at the finest scale), and we extract the maximum rainfall depth at each scale h (k) for all hydrological years. Accordingly, we compute the corresponding rainfall intensity at the given scale as x (k) = h (k) /k, thus deriving the empirical rainfall intensities corresponding to the annual maxima of the hydrological years. The reason for using AM series instead of POT or even the full data series for the estimation of the extremes is that several historical records are available only in this form.

Data Processing and Quality Control
The construction of ombrian curves is based on rainfall intensity data at a range of timescales, typically starting from fine scales, i.e., 5 to 60 min, and extending to the 24 or 48 h scale for common applications. To this aim, we assemble a set of 17 rainfall records from tipping buckets and telemetric stations, recording data at the 5-30 min timescale and 61 rainfall records from daily raingauges. The data are obtained from the Public Power Corporation (PPC) of Greece, the Hellenic Ministry of Environment and Energy (HMEE), the Hellenic Ministry of Agricultural Development and Food (HMADF), the Hellenic Ministry of Agriculture (HMA), the Hellenic National Meteorological Service (HNMS) and the meteo network [49] of the National Observatory of Athens. The properties of the stations are detailed in Tables S1 and S2 of the Supplementary material.
We aggregate the original series at a range of timescales from 5 min to 48 h, with = 0.08, 0.17, 0.25, 0.5, 1, 2, 6, 12, 24, 48 h (depending on data available at the finest scale), and we extract the maximum rainfall depth at each scale ( ) for all hydrological years. Accordingly, we compute the corresponding rainfall intensity at the given scale as ( ) = ( ) / , thus deriving the empirical rainfall intensities corresponding to the annual maxima of the hydrological years. The reason for using AM series instead of POT or even the full data series for the estimation of the extremes is that several historical records are available only in this form. We note that the choice of the starting point for the aggregation is arbitrary, and a change thereof would likely result in a different estimate. For this reason, it was a common hydrological practice in the past to either take the maximum estimate resulting from all possible positions of the starting point or 'inflate' the given estimate by a specific factor, known as the Hershfield coefficient [50]. Although this practice aims for safer estimates from an engineering point of view, in theory, all realizations of a stochastic process are equivalent, and there is no theoretical basis to 'correct' them. In fact, by correcting the series, we distort its stochastic properties and, instead of studying ( ) , the behavior of ( ) : = max ( ) , = 0, … − 1 is studied, which is a different stochastic process [2].
Hence, we do not apply the Hershfield coefficient.
To ensure a good quality dataset for our analysis, we use stations that have at least 12 years of data, and we undertake quality checks based on hydrological experience in the study area. In particular, we perform spatial consistency checks excluding stations with We note that the choice of the starting point for the aggregation is arbitrary, and a change thereof would likely result in adifferent estimate. For this reason, it was a common hydrological practice in the past to either take the maximum estimate resulting from all possible positions of the starting point or 'inflate' the given estimate by a specific factor, known as the Hershfield coefficient [50]. Although this practice aims for safer estimates from an engineering point of view, in theory, all realizations of a stochastic process are equivalent, and there is no theoretical basis to 'correct' them. In fact, by correcting the series, we distort its stochastic properties and, instead of studying x τ (k) , the behavior of w τ (k) := max j x τ+j (k) , j = 0, . . . k − 1 is studied, which is a different stochastic process [2].
Hence, we do not apply the Hershfield coefficient.
To ensure a good quality dataset for our analysis, we use stations that have at least 12 years of data, and we undertake quality checks based on hydrological experience in the study area. In particular, we perform spatial consistency checks excluding stations with systematically lower recordings in comparison to neighboring ones. In addition, we perform hydrological consistency checks, i.e., to ensure that single-site empirical maximum rainfall is consistent with hydrological experience worldwide, suggesting an unbounded right tail of sub-exponential type [37,38]. We note that poorly maintained raingauge records (e.g., in remote mountainous areas) sometimes exhibit maximum rainfall recordings of the same (or nearly the same) amount due to spillage effects during storm events. In this case, a bounded GEV distribution might falsely emerge.
After screening with these criteria and excluding stations with severe inconsistencies, the resulting set of stations includes 48 daily raingauge stations, 7 of which are at locations gauged by tipping buckets as well, and 14 sub-hourly tipping bucket/telemetric stations. The estimation of the extreme properties and of the average 24 h AM rainfall (Sections 2.3 and 2.5) is based on the set of daily raingauge stations due to the latter being more and of larger record lengths. Yet, maximum rainfall data at the 24 h scale are also employed from the fine-scale rainfall stations when daily raingauge data are not available at the same location. Taking the latter into account, the distribution properties of the maximum rainfall are estimated using a combined set of 55 samples of 24 h annual rainfall maxima, whose spatial distribution is depicted in Figure 1b. The set of the 14 fine-scale rainfall stations is used in the estimation of the timescale parameters (Section 2.4).

Fitting of the Bilinear Surface Smoothing Models
Before fitting the regional model, we explore the spatial variability of the rainfall regime by identifying the variations in the mean and standard deviations of the rainfall intensity and the possible association with the elevation of the stations. In Figure 2, it can be seen that the first two moments of the rainfall intensity across scales follow a similar statistical behavior, and, therefore, it is reasonable to apply common timescale parameters (i.e., the function a(k)). In terms of the average annual maxima at the 24 h scale, there also appears to be a positive association with the stations' elevation, although this is not verified in all cases ( Figure 3). Therefore, it seems that the elevation might serve as an explanatory variable, but it needs to be incorporated into a more general spatial model identifying additional patterns of the rainfall maxima in space. gauged by tipping buckets as well, and 14 sub-hourly tipping bucket/telemetric stations. The estimation of the extreme properties and of the average 24 h AM rainfall (Sections 2.3 and 2.5) is based on the set of daily raingauge stations due to the latter being more and of larger record lengths. Yet, maximum rainfall data at the 24 h scale are also employed from the fine-scale rainfall stations when daily raingauge data are not available at the same location. Taking the latter into account, the distribution properties of the maximum rainfall are estimated using a combined set of 55 samples of 24 h annual rainfall maxima, whose spatial distribution is depicted in Figure 1b. The set of the 14 fine-scale rainfall stations is used in the estimation of the timescale parameters (Section 2.4).

Fitting of the Bilinear Surface Smoothing Models
Before fitting the regional model, we explore the spatial variability of the rainfall regime by identifying the variations in the mean and standard deviations of the rainfall intensity and the possible association with the elevation of the stations. In Figure 2, it can be seen that the first two moments of the rainfall intensity across scales follow a similar statistical behavior, and, therefore, it is reasonable to apply common timescale parameters (i.e., the function ( )). In terms of the average annual maxima at the 24 h scale, there also appears to be a positive association with the stations' elevation, although this is not verified in all cases ( Figure 3). Therefore, it seems that the elevation might serve as an explanatory variable, but it needs to be incorporated into a more general spatial model identifying additional patterns of the rainfall maxima in space.   To explore the suitability of elevation as an explanatory variable in an objective manner, we evaluate its performance within the BSS/BSSE model framework, comparing the results from both versions. In the BSSE case, the stations' altitudes are derived from a digital elevation model of the area (Thessaly and neighboring areas) with 90 m resolution at the equator (SRTM; [43]), which is also used for the estimation of the average maximum rainfall at each point in space.
The parameters deriving from the optimization are mx = 3, my = 5, = 0.550, = 0.005 for the BSS model and mx = 4, my = 12, = 0.068, = 0.001, = 0.621, = 0.451 for the BSSE model. In order to compare the model fits we compute RMSE and MAE (a) for the fit of both models using all the data and (b) for the fit applying the leave-oneout cross validation method. Results are shown in Table 1.  To explore the suitability of elevation as an explanatory variable in an objective manner, we evaluate its performance within the BSS/BSSE model framework, comparing the results from both versions. In the BSSE case, the stations' altitudes are derived from a digital elevation model of the area (Thessaly and neighboring areas) with 90 m resolution at the equator (SRTM; [43]), which is also used for the estimation of the average maximum rainfall at each point in space.
The parameters deriving from the optimization are mx = 3, my = 5, τ λx = 0.550, τ λy = 0.005 for the BSS model and mx = 4, my = 12, τ λx = 0.068, τ λy = 0.001, τ µx = 0.621, τ µy = 0.451 for the BSSE model. In order to compare the model fits we compute RMSE and MAE (a) for the fit of both models using all the data and (b) for the fit applying the leave-one-out cross validation method. Results are shown in Table 1. It is seen that the BSSE model involving elevation as an additional explanatory variable is found superior in both comparisons, which is expected by hydrological experience in the area and also supported by previous applications for annual rainfall in Central Greece [19]. Accordingly, the BSSE model is applied to estimate the 24 h average annual maxima in the center points of a 2 × 2 km grid of the study region, as shown in Figure 4. It is observed that both the Thessaly plain and the surrounding mountain ranges are strongly identified in the resulting rainfall patterns.

Construction of the Regional Ombrian Curves
To estimate a common value of the timescale parameters that would be representative of all stations, we optimize the fit for all stations by simultaneously minimizing the sum of all 14 variances as estimated from each station by Equation (17). This optimization results to parameters = 0.03 and = 0.64 which are considered representative for the whole region.
To estimate the distribution parameters, we first divide each annual maxima value at the 24 h scale by its modeled mean value as estimated by the BSSE model (Figure 4). To take space dependence into account, as explained in Section 2.5, we compute the spatial correlation of the 55 standardized annual maxima series at the 24 h scale. This is estimated to be = 0.17 corresponding to = 0.61 (Equation (27))-a moderate value. The standardized series are then unified into one for the estimation of the ( ) parameters.

Construction of the Regional Ombrian Curves
To estimate a common value of the timescale parameters that would be representative of all stations, we optimize the fit for all stations by simultaneously minimizing the sum of all 14 variances as estimated from each station by Equation (17). This optimization results to parameters α = 0.03 and η = 0.64 which are considered representative for the whole region.
To estimate the distribution parameters, we first divide each annual maxima value at the 24 h scale by its modeled mean value as estimated by the BSSE model (Figure 4). To take space dependence into account, as explained in Section 2.5, we compute the spatial correlation of the 55 standardized annual maxima series at the 24 h scale. This is estimated to be ρ = 0.17 corresponding to H = 0.61 (Equation (27))-a moderate value. The standardized series are then unified into one for the estimation of the b(T) parameters.
As explained above, we determine n 1 as the mean record length of the stations (n 1 = 42), following the rationale outlined in Section 2.5. We recall that n 1 is equal to the maximum moment order, which need not be modified for spatial dependence bias. Higher moment orders (corresponding to higher return periods) are adapted for spatial dependence. In the approach we follow herein, we choose to fit the model only up to the moment order not impacted by dependence bias. This set of 42 moments is to be used as a calibration set, while we also use the remaining higher 2305 moments as a validation set ( Figure 5).  Table 2, whereas the spatial distribution of the re parameter is shown in Figure 6. Note that the values are analog maxima values predicted by the BSSE model (Figure 4), as implied by Table 2. Ombrian parameters , , , of Equations (6) and (12). Parame derived at any point in space through the BSSE model.  Using the method of the non-central K-moments, we fit the EV2 distribution to the unified sample of all standardized annual maxima at the 24 h scale minimizing the MAE between the empirical first 42 K-moments and the respective quantiles of the EV2 distribution. In Figure 5, it is seen that the fit is excellent for all 42 K-moments (MAE = 0.00489, RMSE = 0.00471), which constitute the calibration set, and there is also good agreement between the theoretical and empirical moments for higher orders, albeit some deviations in the area of higher return periods. Still, we have to note that very high orders are impacted by the spatial dependence structure of the data, whose estimation is in turn impacted by higher uncertainty. In any case, the use of higher return periods as a validation set proves that the fitting is robust.
Taking into account the results of the BSSE model and following the parameter transformations described in Section 2.5, the values of the four common parameters are derived as shown in Table 2, whereas the spatial distribution of the regionally varying λ parameter is shown in Figure 6. Note that the λ values are analogous to the average maxima values predicted by the BSSE model (Figure 4), as implied by Equation (30). Table 2. Ombrian parameters α, η, ξ, β of Equations (6) and (12). Parameter λ is analytically derived at any point in space through the BSSE model.
their theoretical values by the EV2 distribution and the corresponding return periods.
Taking into account the results of the BSSE model and following the parameter transformations described in Section 2.5, the values of the four common parameters are derived as shown in Table 2, whereas the spatial distribution of the regionally varying parameter is shown in Figure 6. Note that the values are analogous to the average maxima values predicted by the BSSE model (Figure 4), as implied by Equation (30). Table 2. Ombrian parameters , , , of Equations (6) and (12). Parameter is analytically derived at any point in space through the BSSE model.

At-Site Verification
To ensure that the spatial model is in agreement with the at-site empirical behavior, we compare the theoretical to the empirical curves as derived for various stations in characteristic locations of the WD. In Figure 7, we show the theoretical quantiles (as given by Equation (12)) of the rainfall intensities for the daily raingauges at six locations representative of the different rainfall regimes of the area, while in Figure 8, the same plots are shown for a wider range of scales that are available from the sub-daily rainfall stations. For the return period estimation of the empirical intensities, we also plot the estimates from the order statistics by Equation (26), in addition to the K-moments. Figures 7 and 8 show that the empirical distribution functions are generally in good agreement with the theoretical ones, with some notable yet non-systematic deviations in the area of higher return periods. The presence of measurement uncertainty is also evident in certain deviations, in the area of higher return periods, between the empirical intensities estimated from the daily raingauges and the sub-daily resolution gauges (Figure 8). The latter are, however, of shorter length compared to the daily gauges.
Taking into account the large sampling variability of rainfall and the spatial extent of the area, the results are deemed acceptable. Yet there are also a few cases in which the model does not capture well the single stations' behavior due to spatial differences between neighboring sites. Such an example is shown in Figure 9 for two stations in the Pertouli area, where it becomes apparent that the approach favors modeling the spatially average behavior between the two stations. It is obvious that in such cases of spatial uncertainty, a model perfectly capturing single stations' behaviors becomes less relevant; rather, the importance of robustness in the regional framework is highlighted. from the order statistics by Equation (26), in addition to the K-moments. Figures 7 and 8 show that the empirical distribution functions are generally in good agreement with the theoretical ones, with some notable yet non-systematic deviations in the area of higher return periods. The presence of measurement uncertainty is also evident in certain deviations, in the area of higher return periods, between the empirical intensities estimated from the daily raingauges and the sub-daily resolution gauges (Figure 8). The latter are, however, of shorter length compared to the daily gauges.  (e) (f) the area, the results are deemed acceptable. Yet there are also a few cases in which the model does not capture well the single stations' behavior due to spatial differences between neighboring sites. Such an example is shown in Figure 9 for two stations in the Pertouli area, where it becomes apparent that the approach favors modeling the spatially average behavior between the two stations. It is obvious that in such cases of spatial uncertainty, a model perfectly capturing single stations' behaviors becomes less relevant; rather, the importance of robustness in the regional framework is highlighted.
(a) (b) Figure 9. Example of the model fitting to two neighboring stations (from (a) meteo and (b) HMEE) at the Pertouli area with significant deviations in the stations' recordings.

Discussion and Conclusions
Ombrian curves have been around in hydrological engineering for approximately a century and are considered a standard task for at-site modeling. In this work, we address the more complex problem of constructing ombrian curves at regional scales that are of practical interest to the hydrologist in regional flood studies and in the common case that single-site data are either not available for the catchment of interest or the catchment is too large to characterize based on single-site data, e.g., [51]. Even more, constructing the curves by regional fitting is a powerful approach to limit estimation/sampling uncertainty resulting from short-length single records. As such, regional rainfall modeling has been an active research field in hydrological literature. The approach devised herein aims to create a framework for regional ombrian curves that incorporates recent advances in the field of regional frequency analysis and regionalization approaches within a theoretically consistent formulation of ombrian curves. The approach is tested in a challenging case study of the Thessaly WD in Greece, which shows a large variability of rainfall patterns stemming from its complex topography and large extent (~13,700 km 2 ).
The curves are constructed following the newly revisited mathematical formulation of single-site curves by Koutsoyiannis [2] coupled with a new regionalization approach that is developed herein. Four common parameters are identified, and one spatially varying scale parameter is employed. The site-specific scale parameter is explicitly modeled by a spatial smoothing model (BSSE) that employs elevation as an additional

Discussion and Conclusions
Ombrian curves have been around in hydrological engineering for approximately a century and are considered a standard task for at-site modeling. In this work, we address the more complex problem of constructing ombrian curves at regional scales that are of practical interest to the hydrologist in regional flood studies and in the common case that single-site data are either not available for the catchment of interest or the catchment is too large to characterize based on single-site data, e.g., [51]. Even more, constructing the curves by regional fitting is a powerful approach to limit estimation/sampling uncertainty resulting from short-length single records. As such, regional rainfall modeling has been an active research field in hydrological literature. The approach devised herein aims to create a framework for regional ombrian curves that incorporates recent advances in the field of regional frequency analysis and regionalization approaches within a theoretically consistent formulation of ombrian curves. The approach is tested in a challenging case study of the Thessaly WD in Greece, which shows a large variability of rainfall patterns stemming from its complex topography and large extent (~13,700 km 2 ).
The curves are constructed following the newly revisited mathematical formulation of single-site curves by Koutsoyiannis [2] coupled with a new regionalization approach that is developed herein. Four common parameters are identified, and one spatially varying scale parameter is employed. The site-specific scale parameter is explicitly modeled by a spatial smoothing model (BSSE) that employs elevation as an additional explanatory variable and produces a continuous 2D surface for the average 24 h annual maxima regime. This 2D surface suffices to model the spatial heterogeneity of the curves without involving cluster analysis for delineation of homogenous regions and avoiding related discontinuities and abrupt changes in the parameter space. The result is a model explicitly describing maximum rainfall at any point in a given space, which simplifies hydrological design. The BSSE model is selected for being more interpretable and less involved in parametric choices than common geostatistical software, while its robustness in cases of high spatial uncertainty has been documented [15]. We note that a map of the average maximum regime could also be used instead if already available. Still, the explicit incorporation of a surface smoothing model into the framework guarantees the consistency of the final spatial estimates to the point data.
The approach is also based on recent advances in the analysis of extremes, namely the use of reliable high-order moment estimators that account for the effect of the spatial dependence structure in assigning return periods (K-moments; [2]). This enables a rigorous fitting procedure aiming at minimizing the estimation uncertainty while respecting spatial dependence. To our knowledge, this is the first time that such a high number of moments (42 for estimation and 2305 for validation) is estimated from a sample of spatially correlated data with a provision for space dependence bias.
Results show that the model efficiently captures the spatial variability of extreme rainfall in the area covering scales from 5 min to 48 h, and its estimates are robust even under increased spatial uncertainty due to inconsistencies among the point data, which were present in a few cases.
A few modifications to the present approach would be required if one were to generalize the model over greater time scales, above the order of a few days [2]. While this task is not of direct use to flood estimation and typical applications, it is to be considered in view of a multi-purpose rainfall model at the regional scale. Further research is also required on the effect that spatial dependence exerts on the estimation of high return periods and the accompanying uncertainty bounds. This task is demanding as results are expected to depend on the assumed type and magnitude of the spatial dependence structure. Yet this research represents a first step toward significantly increasing the number of moments that can be justifiably employed in regional analyses of extremes.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/hydrology9050067/s1, Table S1: Properties of the daily raingauges (coordinates, elevation, source and record length) used in the analysis, Table S2: Properties (coordinates, elevation, source and record length) of the sub-daily raingauges (tipping-buckets) used in the analysis. Funding: T.I. and D.K. worked on, and were funded by, a related study for floods in Thessaly conducted by G.T.B. Anodos S.A., which included collection, preprocessing and preliminary analysis of data used in this paper.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The rainfall data analyzed in this study were obtained, in the frame of a study for floods in Thessaly, from the Public Power Corporation of Greece, the Hellenic Ministry of Environment and Energy, the Hellenic Ministry of Agricultural Development and Food, the Hellenic Ministry of Agriculture, the Hellenic National Meteorological Service and the meteo network of the National Observatory of Athens. Requests to access these datasets should be directed to the respective parties. The SRTM elevation data used are publicly available from http://srtm.csi.cgiar.org (accessed on 27 November 2021).