Evaluating the Performance of a Max-Stable Process for Estimating Intensity-Duration-Frequency Curves

: To explicitly account for asymptotic dependence between rainfall intensity maxima of different accumulation duration, a recent development for estimating Intensity-Duration-Frequency (IDF) curves involves the use of a max-stable process. In our study, we aimed to estimate the impact on the performance of the return levels resulting from an IDF model that accounts for such asymptotical dependence. To investigate this impact, we compared the performance of the return level estimates of two IDF models using the quantile skill index (QSI). One IDF model is based on a max-stable process assuming asymptotic dependence; the other is a simpliﬁed (or reduced) duration-dependent GEV model assuming asymptotic independence. The resulting QSI shows that the overall performance of the two models is very similar, with the max-stable model slightly outperforming the other model for short durations ( d ≤ 10 h). From a simulation study, we conclude that max-stable processes are worth considering for IDF curve estimation when focusing on short durations if the model’s asymptotic dependence can be assumed to be properly captured.


Introduction
Much research has been recently done on the application of multivariate methods to estimate Intensity-Duration-Frequency (IDF) curves. IDF curves are a popular tool among hydrologists to estimate exceedance probabilities of extreme rainfall events with different durations. In broad terms, IDF curves model a relationship between intensities of extreme rainfall events and their frequencies (i.e., return periods) as a function of event duration. A challenge in estimating IDF curves is how to deal with the simultaneous modeling of intensities for different durations, in particular, how to account for the possible dependence that could arise between intensities of different durations.
Initially, the conventional approach to model IDF curves was based on univariate extreme value theory (EVT) models. Early work on the topic estimates extreme value distributions individually for several fixed durations and subsequently fits an empirical relation to quantiles (return levels) as a function of duration [1][2][3][4]. This approach is prone to inconsistencies as the natural ordering of quantiles is not guaranteed to be preserved over all durations (in other words: quantiles cross). To address this problem, Koutsoyiannis et al. [5] suggested a consistent extreme value model for intensities as a function of duration with location and scale being functions of duration. Later on, a couple of studies implemented methods from Bayesian statistics for the univariate relationship between intensity and duration. Lehmann et al. [6] formulated a Bayesian Hierarchical Model (BHM) based on the model from [5], and Van de Vyver [7] presented a multi-scale model using Bayesian inference. More recently, Ritschel et al. [8] used this model to characterize stochastic precipitation models, and Ulrich et al. max-stable approach is able to capture the "strength" of such dependence for the estimated intensities, the model performance should be significantly higher than for a model assuming independence.
The present paper introduces a scheme to evaluate the impact of the asymptotic dependence between rainfall intensities aggregated over different durations on the performance of EVT-based IDF models. This involves a comparison of skill between two IDF models: one accounting for asymptotic dependence between durations, the other assuming independence. The comparison shows that accounting for the dependence between rainfall intensity aggregated over different durations slightly improves the point estimates for long return periods, and particularly for "short" durations (d ≤ 10 h) usually associated with convective phenomena. However, this comes at the price of increased complexity of modeling the asymptotic dependence.

Methods and Data
Our study involves numerical experiments to estimate the relative performance of IDF curves modeled with two approaches: one based on a max-stable process to account for asymptotic dependence (henceforth named as the MS-GEV approach), and another one based on the assumption of independence using the reduced d-GEV model (henceforth named as the rd-GEV approach). By doing so, we aimed to estimate how the performance of IDF curves is affected by considering (or ignoring) the asymptotic dependence between rainfall intensity maxima for different durations.
We performed the study in two broad steps. First, we conducted a case study using data from 6 rain-gauge stations. This data was used to estimate the respective IDF-model parameters from both the MS-GEV and rd-GEV approaches. We compared the performance of both approaches using a measure of skill. In the second step, we introduced synthetic data with known levels of dependence to estimate the effect that the level of dependence has on the resulting estimations. We used the synthetic data to estimate and compare their performance again. Finally, we compared the results from both steps to determine how the performance was affected by the asymptotic dependence between durations. The two different methods used for estimating IDF curves are explained in more detail in the following section. Subsequently, the methods used for verification are described, and, finally, the observation data and the synthetic data are presented.

Estimation of IDF Curves
Let ζ d (t) be the instantaneous rainfall intensity series integrated over a time window of length d, where d is an arbitrary time duration (which commonly ranges from the measurement interval to 72-120 h). Given ζ d (t), we obtain the series of the maximum annual average rainfall intensity for each value of d as where n is the total number of observation years, and y − , y + are the beginning and end of the yth year, respectively. As a rule, k durations d j , j = 1, ..., k are simultaneously used when constructing i(d), with values from the measurement interval to 120 h (depending on the application). The resulting k series can be thought of as a realization of a random variable I(d). Notice that, in Equation (1), d is not a random variable but a parameter for the intensity, as noted by Koutsoyiannis et al. [5]. The construction of i(d) from a single duration series (e.g., hourly precipitation sums) generates a statistical dependence between i(d 1 ) and i(d 2 ) corresponding to different aggregation durations d 1 = d 2 . For example, i(d = 2 h) and i(d = 3 h) show a very high dependence (that is, when one of them has what is considered to be a high value, the other intensity also has a high value). However, as the gap in aggregation duration between the values grows, this dependence diminishes: i(d = 2) and i(d = 24) show almost complete independence. Nadarajah et al. [20] proposed a scheme to work with this type of random variable, which they denoted as ordered random variables. Some authors have linked this concept to the different physical processes that result in different time scales for precipitation events. For example, Muller et al. [14] claimed that the independence between the 1 h and 24-h events were due to the 1-h events being a result of convective (local) motions, while the 24-h event was related to synoptic phenomena. In this paper, we take a closer look at how this dependence affects the estimation of IDF curves down the road.
Following the block-maxima approach, e.g., [21], a GEV can be fitted to each k-series of i(d j ). Then, the return levels z d,T associated with return periods T = 1/p can be calculated for each duration used for the fit d j , where p is denoting the non-exceedance probability with values usually in the range corresponding to upper extreme quantiles. The idea behind IDF curves is to describe the return level z d,T for arbitrary durations d based on the sample i(d j ) in a meaningful way. This typically involves a parametric form of z d,T as a function of d. The choice of the model used for estimating IDF curves depends on the choice of parametric form of z d,T .
For this study, we employed two different approaches for the parametric form of z d,T as a function of d. For the model that assumes asymptotical independence for i(d) for different d, we follow the duration dependent GEV model (d-GEV) of Ritschel et al. [8]. Based on Koutsoyiannis et al. [5], the d-GEV model of Reference Ritschel et al. [8] estimates a GEV simultaneously from annual maxima associated with various durations, thus conceiving the GEV as a function of duration. The d-GEV yields consistent quantiles z d,T , which cannot cross by definition. The other approach is based on a max-stable process for modeling the relationship of z d,T that accounts for the asymptotic dependence between durations.

Using the Duration-Dependent GEV
Following Ritschel et al. [8] and Ulrich et al. [9], we used the duration dependent GEV (d-GEV) to model i(d) with the distribution where σ(d) = σ 0 /(d + ν) η is the duration dependent scale parameter, andμ = µ(d)/σ(d) is the modified location parameter. Given the estimated (µ(d), σ(d), ξ) parameters, it is straightforward to calculate the return level z d,T for any arbitrary duration using To compare the resulting z d,T of this approach with those estimated using the MS-GEV approach, we set the parameter ν = 0. This parameter is related to sub-hourly duration values (0 < d < 1), which we do not consider here. Therefore, the dependence of location and scale parameter on duration follows This results in a model with four parameters to be estimated, namely {μ, σ 0 , ξ, η}. We call this modified distribution the reduced d-GEV or rd-GEV for short. The parameters of the d-GEV distribution are estimated by maximizing the likelihood as implemented in the R-package IDF [22]. Equation (3) is used to get intensities for arbitrary durations (d ≥ 1).

Using a Max-Stable Process
Max-stable processes are extensions to infinite dimensions of finite-dimensional extreme value theory models (i.e., extremes of random variables or vectors). They arise as "the pointwise maxima taken over an infinite number of (appropriately rescaled) stochastic processes" [23]. Let {X(x) : x ∈ χ} be a stochastic process, where χ is a compact subset of R d , d ≥ 1, and {Z(x) : x ∈ χ} be a max-stable stochastic process. Following de Haan [24], if there exist continuous functions a n (x) > 0 and b n (x) ∈ R, and provided that the limit is non-degenerate, the process Z(x) can be defined as The max-stable process Z(·) describes the limiting process of maxima from the X i IID random fields [25]. The use of this max-stable process for modeling spatial extremes is justified when, based on n independent replicates, and, if n is large enough, we assume that Z(x) is a good candidate for modeling the partial maxima process {max i=1,...,n X i (x) : x ∈ χ} [26].
One of the main advantages of using a max-stable process is that it provides a flexible way of modeling the dependence structure between the X i IID random fields. If we assume that χ ⊂ R 2 represents a geographical catchment, we can think that for multiple points (x ∈ χ), the marginal distributions are jointly modeled via the max-stable process, resulting in continuous functions of the GEV parameters µ(x), σ(x), ξ(x) for each margin.
In order to implement a max-stable model for estimating IDF curves, we followed the framework proposed by Tyralis and Langousis [19]. This approach (MS-GEV) employs the Brown-Resnick process, a frequently used parametric family of max-stable processes for modeling environmental extremes [27][28][29]. Previous studies have shown the applicability of the Brown-Resnick process for extreme rainfall applications [17,[30][31][32]. A central proposition of our current approach is to define a continuous variableī(d) in a one-dimensional space, where each "location" is one of the durations d. This in contrast to other applications of max-stable processes, where the variable of interest is commonly defined in a two-dimensional (e.g., latitude and longitude) space.
For any max-stable process, the X(s i ) marginals are generalized extreme value (GEV) distributed, with distribution function: where µ, σ, ξ are the location, scale and shape parameters, and x + = max(0, x). Following de Haan [24] (with the adaptation for d > 0), when the limiting process {ī(d) : d > 0} is non-degenerate, a simple max-stable process can be constructed by its so-called spectral characterization, which is a representation of the max-stable process in the frequency domain [23]. In the spectral characterization, the max-stable process is given by choice of the stochastic process (X i (d) in Equation (A1)). Such max-stable processes are called simple as the marginsz(d) are unit Fréchet distributed (i.e., µ = σ = ξ = 1). The use of unit Fréchet marginals is standard, as the max-stable process theory is based on the assumption that the marginals have a common, convenient max-stable distribution.
There is no loss of generality in assuming that the limiting process {ī(d) : d > 0} has unit Fréchet margins, as it is straightforward to transform such margins into arbitrarily GEV-distributed ones and vice versa. For this study, we used the bivariate form of the Brown-Resnick process given by Kabluchko et al. [33] (see Appendix A) as the stochastic process X i (d). In this form, the dependence is a function of the semivariogram γ, defined as where α and ρ are, respectively, the smooth and range parameters of the semivariogram, and h represents a measure of the distance between two durations. Tyralis and Langousis [19] calculated this distance as the euclidean distance where the indices i and j denote different durations d i in hours, and j = i. However, this measure does not account for the non-linearity of the distance between durations for events of increasing magnitude. For example, consider that an event of 4 h compared to one of 2 h (h e = 2) is already twice as long, while an event of 50 h compared to one of 48 h (h e = 2) is only 1.04 times the second one.
To address this issue, we explored the use of a distance measure based on a logarithm following Van de Vyver and Van den Bergh [34]. This distance is defined as where i and j denote different durations d i in hours, and j > i. We compare the resulting pairwise extremal coefficients from both distance measures to discern which one results in a more appropiate fit for the semivariogram of Equation (8).
The bivariate form of the Brown-Resnick max-stable process described in Equation (A2) is valid only for unit Fréchet marginals. Therefore, the series of yearly rainfall intensityī(d) requires an appropriate transformation (see [19] for details) to be unit Fréchet distributed, using the relationship To linkī(d) toz(d) for all durations, we used the response surfaces for the GEV parameters [19]: These response surfaces follow the constraints given by Equations (18)- (20), and describe a function Ψ(d) for the parameters of the marginals ofī(d). Equations (12) and (13) are equivalent to Equations (4) and (5) in the rd-GEV model. The response surfaces allow to use all durations simultaneously when estimating the max-stable process parameters.
Taking the response surfaces into account, the parameters that we need to estimate for calculating IDF curves using the Brown-Resnick model are six: [ρ, α, µ 0 , σ 0 , ξ 0 , c]. This is accomplished via the maximum likelihood estimate of the pairwise likelihood given in Equation (A3). The estimation of the parameters is done using the R package SpatialExtremes [35].
Of particular usefulness for our study, is a measure to summarize the strength of the asymptotical dependence modeled by the Brown-Resnick max-stable process. A well-known measure is the extremal coefficient θ, which for the bivariate case, can take values of 1 ≤ θ ≤ 2. The value of θ decreases when the dependence between the two margins increases. When θ = 1 the two margins are completely dependent, and when θ = 2 they are independent.
For the dependence structure of the Brown-Resnick max-stable process described by Equation (8), the extremal coefficient is a function only of the distance between durations: θ = θ(h). Given the semivariogram γ for a Brown-Resnick max-stable process, with Φ denoting the standard normal distribution function, the extremal coefficient is given by For comparison purposes, we also calculate the extremal coefficient nonparametrically (θ emp ). For our study, we used the method proposed by Marcon et al. [36], which was found by Vettori et al. [37] to generally perform better than other nonparametric estimators.
To summarize: To estimate IDF curves with a max-stable process, we (i) transform our block-maxima data i(d) into unit Fréchet using Equation (11) with the response surfaces given by Equations (12)-(13); then (ii) estimate the parameters via the maximum likelihood estimates of the pairwise likelihood given by Equation (A3); and, finally, (iii) calculate the intensity for any arbitrary duration d and return period T from Equation (3). We perform all the computations within the R language [38]. The data and code is available as supplementary information.

Verification and Model Comparison
As a performance measure, we use the quantile score (QS) [39]. This allows us to evaluate predictions of z d,T estimated from IDF-curves in terms of quantiles (i.e., return periods). For D durations, N years, and a given return period T the QS is defined as where i n (d) is the observed block maxima, z d,T is the corresponding intensity from the model (using Equation (3)), and ρ T (u) is the so-called check function thus, for this particular application, The QS is always positive and reaches an optimal value at zero. To compare the performance of a model with a reference, we use the Quantile Skill Index (QSI) [9] derived from the Quantile Skill Score [40] for more details on skill scores), defined as For our study, QS model is the score for the MS-GEV approach, and QS reference is the score for the rd-GEV approach. Positive (negative) values of the QSI indicate a gain (loss) of skill for the MS-GEV approach compared to the rd-GEV one. The advantage of using the QSI over QSS is that negative values have a more meaningful interpretation.
To get a robust estimation of the prediction error for the QSI, we applied 10-fold cross-validation to estimate the QS [41]. The QSI is then obtained using the mean cross-validated QS, averaged over all cross-validation folds, for each model. We used return periods T = (5, 10, 20, 40, 100) years. However, the results from the 100 year return period should be interpreted with caution, as the data used for parameter estimation consists of much shorter series of approximately 40 years. The QSI has to be interpreted with care for return periods much larger than the length of the time series (e.g., T > 40 years). The lack of observations for this region could result in a really high uncertainty of the value of the QS, and, therefore, of the QSI.

Data
Two different datasets are used for our study. The first one is a synthetic dataset generated for the simulation study. The second one is the block maxima from six rain gauge stations located in the Wupper Catchment (West Germany). We describe both datasets in the following section.

Synthetic Data
We generate synthetic datasets with varying levels of dependence to investigate the models' performance in estimating IDF curves. We designed three synthetic datasets that simulate rainfall block maxima aggregated over different durations with increasing levels of dependence. For each dataset, we simulate values from a Brown-Resnick simple max-stable process with known dependence parameters using the R-package SpatialExtremes. For the marginal d-GEV distribution, we used a set of parameters characteristic of those d-GEV distributions fitted from the stations in the observational dataset. Then, to fulfill the constraints of annual rainfall maxima averaged over durations d i , i = 1, ..., k, we transformed the initial simulated data from having unit-Fréchet margins to GEV margins that follow the constraints given by [5,20] This transformation uses the response surface described in Equations (12) and (13). We simulate 40 values for each dataset (representing 40 years) for d = (1, 3, ..., 119, 120) h. Following Zheng et al. [25], three sets of dependence parameters were used: (ρ = 1, α = 1) for weak dependence, (ρ = 0.5, α = 0.5) for moderate dependence, and (ρ = 0.5, α = 0.2) for strong dependence. For each parameter set, we generate 1000 realizations. An issue we encountered is that the nature of the simulated data allowed for rainfall intensity series that did not strictly follow the constrains given by Equations (18)-(20), however, we considered this to happen at a frequency that would not affect the final result of the study.

Observations
We used six rain gauge stations from the Wupper Catchment in Germany (Figure 1). All the stations have hourly values of accumulated precipitation height for the period 1979-2016. The stations are all within an elevation range of 250 m, and horizontally the shortest and longest distance between each station is 7 Km and 34 Km, respectively. We chose this dataset as it has a large number of years with high-frequency (hourly) measurements. Furthermore, the stations range from the Bergisches Land to the Upper Rhine Plain and therefore represent very well the different altitudes of the catchment. Figure 1 shows the distribution of rainfall maxima for this period. We obtain the annual block maxima i(d j ) of the averaged rainfall intensity ζ d (t) over the time window d j for each station using Equation (1). For estimation purposes, we used durations d = (1, 3, ..., 119, 120) h. The decision for the cut-off value of 120 h was based on previous studies on IDF curve estimation [18,19]. By visual inspection of the corresponding Quantile Quantile (QQ)-plot, we ensure good agreement of the resulting i(d j ) block maxima for all 6 stations with the GEV distribution. A small subset of the QQ-plots can be seen in Figure A2.

Results
We present the results for the case study in the Wupper region of Germany first, followed by the results of the simulation study. Dependence   Figures 2 and 3 show a comparison of the pairwise extremal coefficient θ derived from the parameters of the MS-GEV approach (Equation (14), red line) and from a nonparametric estimate (dots) to assess how well the MS-GEV approach captures the observed asymptotic dependence. We used different distance measures h for each plot. Figure 2 uses the euclidean distance h e (Equation (9)), and Figure 3 uses the log-distance h l (Equation (10)). Additionally, the different colors show the lower distance d i used for each duration pair (d i , d j ), where i < j, and (i > 0, j > 0). For the euclidean distance h e (Figure 2), the values of θ BR are close to the binned means ofθ emp (black circles) with respect to the scattering of the individual empirical estimates (color circles) for all stations. Nevertheless, the empirical point clouds show a remarkably high variability of θ around the mean value. To name one example, in station Buchenhofen, the distance of h e = 25 h has a range of very different values of θ, spanning from 1.2 to 1.8. Thus, using a model that approximates the empirical mean-binned values in this case does not appear to be a meaningful representation of the overall variability of the dependence between durations.

Structure of the Extremal
Furthermore, each set of duration pairs with a fixed lower duration d i in Figure 2 (represented by different colors) seems to follow a different path as the distance h e grows. In particular, for duration pairs with a short lower duration (d i ≤ 10 h), the value of θ grows much faster than duration pairs with longer lower durations (d i > 10 h). This suggests that several different regimes of dependence coexist, which are not only a function of the distance h e , but also of the magnitude of the durations used. This is a transgression of the assumption that the extremal coefficient, and by extension, the semivariogram model of Equation (8) is isotropic (i.e., γ should only be a function of h). It is thus not evident that using a dependence model for the MS-GEV approach based on the euclidean distance h e adequately captures the asymptotical dependence between durations seen on the data used for this study.
The resulting extremal coefficient using the log-distance h l is shown in Figure 3. The point cloud shows a remarkably lower variability around the binned means than those of Figure 2. Furthermore, the empirical values ofθ emp appear to follow the same regime, suggesting that θ is isotropic when using the log-distance. The shape of the point cloud seems to be appropriately captured by θ BR for duration ratios d j /d i 6 (i.e., when the upper duration d j is around six times the lower duration d i ). However, the parametric model deviates from the empirical estimates as h l increases. With the exception of station Leverkusen, the parametric model consistently overestimates the strength of the asymptotical dependence for the pairs with a duration ratio d j /d i > 6. In light of the above results shown by Figures 2 and 3, we decided to use the log-distance h l when estimating the semivariogram of the MS-GEV approach in all of the following calculations. Table 1 shows the parameter estimates for the MS-GEV approach for all Wupper catchment stations, estimated from durations d = (1, 3, ..., 119, 120) h. The estimates for the range parameter ρ are reasonably consistent across all stations. Their value of ∼ 2 suggests that, for all stations, the rainfall intensities of different durations become asymptotically independent when their ratio d j /d i is larger than (2 2 = 4).  Figure 4 shows the IDF curves following from Equation (3) for the MS-GEV approach (solid lines) and compares them to the IDF curves based on the rd-GEV approach (dashed lines). Return levels from the MS-GEV are for the most part consistently higher, which is in agreement with Tyralis and Langousis [19]. To compare the results of the IDF curves using the euclidean distance h e instead of the log-distance h l , a plot comparing the resulting 100-year return level of both distances is shown in Appendix A. Figure 5 shows the cross-validated QSI evaluating the performance of the MS-GEV approach compared to the rd-GEV one. Similar behavior can be seen for all stations. For short return periods, the QSI is close to zero (denoting that both models are equally good), increasing towards longer periods. Station Lindscheid profits most from the MS-GEV with a 20% increase in skill for the 100-year return level. Only for a few points is skill negative, mostly for the shorter return periods.

Performance for Individual Durations
For a detailed comparison, we show the QSI conditioned on duration in Figure 6. The QSI varies appreciably over different durations for a given return period. For short durations (d < 10 h), the QSI is mostly positive for all return periods; for long durations d > 100 h, it is mostly negative. Gauge Lindscheid exhibits positive skill for many more combinations of durations (d < 50 h) and return periods T > 40 years; station Bever is the only station not showing a positive skill for very short durations, showing a general loss of skill for all but the shortest durations.

Simulation Study
We studied the effect of the level of dependence on the performance of the MS-GEV approach. To this end, we used synthetic data with known dependence parameters (ρ and α in Equation (8)) and estimate the performance for various levels of dependence. Figure 7 shows the cross-validated QSI obtained using Equation (17), using the averaged QS values over all d = (1, 2, ..., 120) h durations for 1000 replications as box-whisker plots. Similar to the case study, the distance h used for the semivariogram of the Brown-Resnick process was the log-distance h l (Equation (10)).
The variation of skill among the replicates increases with increasing return period. The median is consistently positive (MS-GEV superior to rd-GEV) but below 0.05, suggesting that the dependence does not impact strongly on the performance results. The strong level of dependence leads to a slightly higher median skill but also to a considerable larger variance. Figure 8 shows the QSI calculated with the model and reference QS averaged over all replicates for individual durations. The results are similar to the pooled QSI over all durations of Figure 7: The QSI increases with return period and dependence, staying below QSI = 0.1 for a 100 year return period and the strongly dependent series. This again suggests that the level of dependence has little impact on the performance of the return levels from the IDF curves.

Discussion
In this study, we obtained a measure of relative performance for return level estimates of IDF curves for various durations involving a max-stable process that allows for asymptotic dependence between durations (MS-IDF), compared to a model that assumes independence (rd-GEV). To do so, we built upon the previous study of Tyralis and Langousis [19], who focused on the theoretical basis of using max-stable processes for modeling IDF-curves and did not investigate the consequences in terms of model performance.
To investigate the possible impact of the asymptotical dependence, we evaluated and compared the performance of estimating IDF curves with two different approaches: i) using a max-stable process to describe the dependence between rainfall intensity maxima for different durations and ii) assuming independent maxima. We evaluated individual performance based on a score that allows us to focus on the tail of the distribution, namely the quantile score (QS). The comparison between the MS-GEV and rd-GEV approach was carried out with the quantile score index (QSI), an index based on the QS skill score. The QSI enables us to quantify a gain/loss in performance when accounting for asymptotic dependence in return level estimation.
The results of Figure 2 showed that the resulting pairwise extremal coefficient was non-isotropic when using the euclidean distance h e as the measure h in the semivariogram of the Brown-Resnick process for the MS-GEV approach. Thus, in contrast to the approach of Tyralis and Langousis [19], we explored the use of a logarithmic distance measure instead of an euclidean one for the semivariogram of the Brown-Resnick process. Figure 3 shows that this was a reasonable choice, with the resulting parametric extremal coefficient properly capturing the variability of the empirical extremal coefficient around its binned means.
A simulation study suggests a minor advantage of the MS-GEV approach over the rd-GEV, particularly for long return periods (large quantiles). This advantage increases with the strength of the dependence, but remains low (QSI ≤ 0.1) even for the strongest level of dependence. A complementary case study for six gauges in the Wupper catchment (Germany) corroborates a general advantage for long return periods when averaging the performance measure over all durations. A detailed investigation of performance conditioned on durations shows, for our case study, that this advantage results mostly from short (d 10 h) and, in some cases, from intermediate (20 h d 50 h) durations, depending, however, on the specific station.
The presented findings support the idea of Tyralis and Langousis [19] that max-stable processes are valuable models for IDF curve estimation. The simulation and case study results indicate that an increase in skill with the MS-GEV approach is mostly found for large quantiles and short to medium durations. This effect might be related to the fact that shorter durations have a larger number of pairs than the longer durations for the pairwise likelihood. This means that the longer durations could be underrepresented in the current likelihood expression, leading to a better fit of the model for the shorter durations for the MS-GEV approach. However, extreme events of short durations have usually a higher impact on society than those of long durations. Therefore, we believe that this underrepresentation does not necessarily detract value from using the MS-GEV approach. When focusing on large quantiles and short durations, it might be worth taking the added computational expense and model complexity in exchange for increased skill in the return levels for such events.
In addition, the simulation study demonstrated that the level of dependence had a modest impact on the overall performance and variability of the MS-GEV approach. These findings contradict those of Zheng et al. [25], who found that the dependence strength did not influence the performance of the return level estimates. However, our study focused on a different dependence structure than that of Zheng et al. [25], who used a spatial approach, in contrast to our duration space. This contradiction may be associated with the particular form of the asymptotic dependence for the different durations of i(d), which stems from the nature of i(d) as random ordered variables. As previously studied by [20,42], when two random variables are intrinsically ordered, each margin's distribution is affected, something that has to be taken into account. While the response surfaces described in Equations (12) and (13) take this ordering into account, the dependence structure given in Equation (8) does not, a factor that could explain the contradiction with previous studies.
The parametric estimate of the extremal coefficient θ BR from the MS-GEV approach shown in Figure 3 appears to be a reasonable fit for the empirical values of θ when the ratio of the duration pair is around (d i /d j 6). However, as the upper duration gets around six times larger than the lower duration, the model consistently overestimates the strength of the dependence. Figure 3 also showed that duration pairs (d i , d j ) involving short lower durations d i ≤ 10 h had an extremal coefficient that approached independence (i.e., θ = 2) faster than those with longer lower durations as h l increased. This suggests that the dependence between rainfall maxima of different aggregation durations is short-ranged, in particular, for the shorter durations.
For an operational use of this approach, uncertainty estimates for the quantiles (IDF curves) need to be incorporated. In this regard, Ganguli and Coulibaly [12], Mélèse et al. [43] showed that a Bayesian Hierarchical Model approach resulted in reliable credibility intervals for IDF curves. For the rd-GEV approach, we investigate a bootstrap-based method for estimating the uncertainty of d-GEV based return levels in a different study [9]. We also limited our study to using data with hourly frequency, resulting in the value of the (sub-hourly) ν parameter of the d-GEV to be artificially set to zero (what we called the rd-GEV). Further studies would benefit from using sub-hourly frequencies, allowing ν to vary freely. Moreover, we assumed that the stations' data was stationary, ignoring the possible effects of climate change. Several studies have shown that accounting for nonstationarity has a measurable effect on the return levels estimates [11][12][13].
The method presented in this study is a straightforward and practical manner of estimating IDF return level performance based on a max-stable process. The results for the case study encourage to investigate its performance within a larger geographical setting. Furthermore, it seems worth implementing more flexible functions describing the variability of GEV parameters with duration as used for the response surface, e.g., an additional parameter accounting for different behavior, particularly for short durations, as suggested in Koutsoyiannis et al. [5]. Another critical issue for future studies is to explore how different dependence structures could impact the performance of the estimation. Furthermore, a comparison with the recent developments in the use of covariates for the rd-GEV approach could result in better skill for such an approach compared with our current MS-GEV one [9].

Conclusions
Our findings indicate that the use of models that allow for the asymptotic dependence between rainfall maxima of different durations when estimating IDF curves can lead to moderately better return level estimates, particularly for long return periods (100 years, generally of considerable interest) and short durations (d ≤ 10 h). However, the former comes at the expense of the added complexity of modeling the asymptotic dependence. Furthermore, this asymptotical dependence seems to be short-ranged for the short durations. We, therefore, recommend the use of the simpler univariate-EVT methods assuming independence between durations for a single station when the main goal is obtaining return levels for a wide range of short and long durations from IDF-curves. wherez follows a unit Fréchet distribution, Φ denotes the standard normal distribution function, h is a measure of the "distance" between duration pairs (d i , d j ) (given in this study by Equation 10), and the semivariogram γ is defined in Equation (8).
For inference purposes, we applied the commonly used pairwise likelihood proposed by Padoan et al. [45], given by where ψ = [µ, σ, ξ, ρ, α]represents the parameters to estimate, and each term f (i k (d j ), i k (d j )|ψ) is the (appropriately transformed) bivariate density function derived from Equation (A2) for observed maximaī(d) at durations d j and d j . Note that the first three parameters in ψ are the univariate parameters of the GEV distribution, unique for each duration, while the last two parameters of ψ are the parameters of the Brown-Resnick process, which model the asymptotic dependence. Figure A1 shows a comparison of the resulting 100-year return level intensity resulting from the MS-GEV approach using the euclidean distance h e and the log-distance h l . To facilitate the comparison, it shows the ratio q h l(0.99)/q h e(0.99), where q(0.99) is the quantile corresponding to the probability of 0.99, that is, the corresponding 100-year return level. Figure A1. Comparison of the 100-year return level intensity for the MS-GEV approach using the euclidean distance h e and the log-distance h l . Surprisingly, although the variability of θ around its mean is remarkably different when using h l instead of h e , the resulting 100-year return levels are very similar for both distances. The single exception was for station Leverkusen, which is the only station of the catchment located in the Upper Rhine Plain; the change observed for this stations was however still relatively small. However, this result is only accounting for the point estimates of the return levels. As seen in Figures 2 and 3, the variability of the extremal dependence is much higher when using the euclidean distance h e than the log-distance h l . Thus, we believe that the resulting uncertainty for the MS-GEV approach should be lower when using h l instead of h e . Nevertheless, as mentioned in the limitations of our study, we did not perform any estimation of the uncertainty. Figure A2 shows the QQ-plots for validation of the marginal fits of the GEV distribution for four intensity maxima seriesī(d), with d = (1, 3, 48, 72) h, for three stations (Bever, Leverkusen, and Neumuehle.). The closer that the points are aligned to the identity line, the better the fit of the GEV distribution. Figure A2. QQ plots for model checking of the marginal distributions for three stations: Bever (top row), Leverkusen (middle row), and Neumuehle (bottom row). The duration used for the accumulation of the rainfall maxima is indicated in each plot.