Advances in Deriving the Exact Distribution of Maximum Annual Daily Precipitation

Maximum annual daily precipitation does not attain asymptotic conditions. Consequently, the results of classical extreme value theory do not apply to this variable. This issue has raised concerns about the frequent use of asymptotic distributions to model the maximum annual daily precipitation and, at the same time, has rekindled interest in deriving and testing its exact (or non-asymptotic) distribution. In this review, we summarize and discuss results to date about the derivation of the exact distribution of maximum annual daily precipitation, with particular attention to compound/superstatistical distributions.


Introduction
Classical methods for modeling the probability distribution of maximum annual daily precipitation use an asymptotic extreme-value distribution. Fréchet [1] identified one possible asymptotic extreme value distribution, which was later named after him (Fréchet distribution). Fisher and Tippett [2] and, later, Gnedenko [3] showed that, asymptotically, there exist only three types of extreme value distributions, referred to by Gumbel [4] as the "extreme value asymptotes". These are EVI (or Gumbel distribution), EVII (or Fréchet distribution), and EVIII (or reversed Weibull distribution). These three asymptotic distributions were combined by von Mises [5] into the Generalized Extreme Value distribution (GEV). Sometimes, the GEV is also referred to as the von-Mises-type extreme value distribution, or the von Mises-Jenkinson-type distribution.
The first studies in which asymptotic extreme value distributions were used to model the occurrence of maximum annual daily precipitation were those by Jenkinson [6] and then Gumbel [4]. The main advantage of asymptotic laws is that they do not require knowledge of the parent distribution (that is, the distribution of the underlying variable). The tendency to reduce the dimensionality of the extreme-value models and to proceed on the basis of simplified models has also supported the use of asymptotic approaches [7]. In this context, the Gumbel (EVI) distribution has been the most popular model for extremes, especially in the analysis of precipitation maxima [8]. However, the Gumbel distribution was found to underestimate the extreme precipitation amounts [8,9]. It has been suggested that the GEV should always be used instead of the Gumbel distribution, unless enough information supporting the latter is available [7].
A (non-exhaustive) list of papers published in the period 2000-2015 that exclusively used the GEV, or its three asymptotic laws, to describe the behavior of maximum annual daily precipitation is reported in [10]. Some studies have also investigated which of these three asymptotic laws is the most appropriate to represent the statistical variability of maximum annual daily precipitation [8,11]. Using a worldwide database, it has been shown that the EVII is the most suitable asymptotic distribution to describe the maximum annual daily precipitation [12].
The applicability of the asymptotic extreme value theory to annual maxima of precipitation, both for daily and for shorter durations, was recently questioned by Veneziano et al. [13]. The motivation is the comparatively slow convergence of the distribution of annual maxima to the asymptotic laws (see also [8]). Consequently, the annual maxima depend on a range of the parent distribution that is well below its upper tail. For the case of maximum annual daily precipitation, for example, the number of precipitation days in any given year is theoretically bounded by 365, and so it is far from asymptotic conditions, which assume this number to be much greater than 365. The actual number of precipitation days is even smaller than 365, because precipitation is an intermittent process [14-16].
Even if the results by Veneziano et al. [13] prevent the asymptotic theory from being strictly applicable to the case of maximum annual daily precipitation, the GEV distribution can still be used to fit extreme precipitation amounts, in the same way as any other probability distribution. In this case, however, one should bear in mind that the estimated GEV parameters will be the result of a numerical fit and, as such, will have no direct link with the statistical properties of the parent distribution. In particular, the shape parameter of the GEV will not represent the shape parameter of the parent distribution. The violation of the asymptotic theory also raises doubts about the extrapolation capability of the fitted GEV model to values (quantiles) associated with high values of probability or return period [10,17]. From this standpoint, an increasing number of studies have recently considered a pool of distributions to model maximum precipitation, including several families of asymptotic- and standard-type distributions (see Table S2 of [10]). Results show that the asymptotic laws often do not yield the best fit according to goodness-of-fit tests and/or statistical indicators.
In order to overcome the asymptotic (or ultimate) assumption in extreme variables, penultimate approximations have been developed [18][19][20][21]. For example, Marani and Ignaccolo [22] proposed a penultimate approximation to model the maximum annual daily precipitation, using the Weibull distribution as parent distribution, according to [20,21]. Another promising option would be to analytically derive the exact distribution of annual maxima of daily precipitation, that is, the one that would theoretically be derived from the parent distribution as the distribution of the largest-order statistic (see the next section for details). The word "exact" is used here following the terminology reported in [4], to remark that it is a non-asymptotic distribution.
This option has received little attention so far, as noted by [8]. A first, recurring argument against the exact distribution is that it may not be strictly necessary, since a truncated approximation could be enough. A second, more substantial argument is that the parent distribution is either not known or difficult to determine in practical cases, as demonstrated by the fact that several distributions have been used in the literature to describe the statistical behavior of the non-zero daily precipitation amount. These include both canonical distributions (Exponential, Gamma, Lognormal, generalized Pareto) and more complex ones (mixtures of Exponentials or Lognormals; see [23] for a review). Among these candidate distributions, the Gamma has been the most used, especially in weather generators [24].
The argument that the distribution of daily non-zero precipitation is difficult to determine has been recently confuted by increasing evidence, which opened the door to a growing body of literature about the exact distribution. First, Wilson and Toumi [25] provided some physical arguments in favor of the probability distribution of daily non-zero precipitation being stretched exponential, also known as the Weibull distribution. This seminal result is coherent with later findings by [10,17,22,26-28] about both the distribution of the daily amount and the distribution of daily values above a given threshold. Also, Porporato et al. [29] obtained a distribution of daily non-zero precipitation with a stretched exponential tail by compounding an exponential distribution with a parameter having a Gamma distribution. Porporato et al. [29] further motivated the emergence of a Weibull-like tail of the daily non-zero precipitation amount based on the interannual variability of the parameter of daily precipitation. More recently, mixture distributions have been used to reproduce daily non-zero precipitation amounts when these amounts are the result of multiple sources/causes [30]. For example, in [31,32] a mixture of a Gamma distribution (for low and moderate values) and a generalized Pareto (for high values) was considered; in [33,34] and then in [25], a mixture of two exponential distributions was considered; a mixture of an exponential distribution for low to moderate values and a generalized Pareto for high values was used in [35]. Mixtures of two distributions (namely, two Gumbel, or a Gamma and a Gumbel) for the representation of the maximum annual value of daily amounts were used in [30].
In this review, we summarize recent advances in the field of extreme daily precipitation distribution, with a focus on recent efforts regarding exact distributions derived as compound-superstatistical distributions. We start by providing some general context about the exact distribution as the largest-order statistic in stationary and non-stationary conditions. We then discuss existing analytical solutions for this exact distribution. We conclude with some directions of future research.

Some General Results
The exact distribution of the maximum $M = \max\{X_1, X_2, \ldots, X_n\}$ of $n$ random variables $\{X_1, X_2, \ldots, X_n\}$ can be formally derived as the distribution of the largest-order statistic. If $\{X_1, X_2, \ldots, X_n\}$ are independent and identically distributed (i.i.d.) variables with a common distribution function $F(x)$, also known as the parent distribution, the exact distribution of $M$ is written as

$$F_M(x) = [F(x)]^n \tag{1}$$

In the case of maximum annual daily precipitation, $\{X_1, X_2, \ldots, X_n\}$ are non-zero daily precipitation variables, $n$ is the number of wet days in the year, and $M$ is the maximum annual value (see [4], p. 75, Equation (1)). A cornerstone of statistical literature, Equation (1) has, however, been obtained under the hypotheses that $n$ is fixed and that $F(x)$ is a-priori known. These hypotheses are not satisfied in the case of maximum annual daily precipitation. First, the distribution $F(x)$ is a-priori unknown. To solve this problem, statistical techniques are used to estimate $F(x)$ from data and substitute it into Equation (1); some examples of these fitting results for daily precipitation are mentioned in the Introduction above. Second, $n$ is not fixed, but it is in fact a realization of a counting random variable $N$. In this case, using the theorem of total probability, the exact distribution can be written as

$$F_M(x) = \sum_{n=0}^{+\infty} [F(x)]^n \, p_N(n) \tag{2}$$

where $p_N(n)$ is the probability distribution of $N$, also called the counting distribution (see [4], p. 78, Equation (9)). Equation (2) is the "compound" distribution of the extreme value distribution, $[F(x)]^n$, or compound extreme value distribution, because it is the "composition" of $[F(x)]^n$ and $p_N(n)$. In the literature, Equation (2) is frequently named by first reporting the name of the counting distribution after the term "compound", and then the name of the original distribution (here, the extreme value distribution, $[F(x)]^n$).
For example, if the counting distribution is a Poisson distribution, then the name of the compound will be Compound Poisson Extreme Value distribution (see also Equation (6) below). Usually, the name "compound distribution" is used when one or more parameters θ of the distribution are themselves random variables, indicated with Θ (see [36], Section 3.5.3; [39], p. 52, Equation (1.113); and [40], Section 3.4.4, p. 151). Equation (2) is also, occasionally, referred to as a "contagious" distribution. The notion of compound, or contagious, distribution was originally proposed by Feller [41]. The term contagious is due to the fields of its first applications, i.e., entomology and bacteriology.
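As a minimal numerical sketch of Equations (1) and (2), assuming an i.i.d. parent: the fixed-n case raises the parent distribution to the power n, and the random-N case averages that power over the counting distribution by total probability. The function names, the exponential parent, and the two-point counting distribution below are illustrative assumptions, not taken from the cited works.

```python
import math

def cdf_max_fixed_n(parent_cdf, x, n):
    """Equation (1): distribution of the maximum of n i.i.d. variables."""
    return parent_cdf(x) ** n

def cdf_max_random_n(parent_cdf, x, counting_pmf):
    """Equation (2): total-probability average of [F(x)]^n over N.

    counting_pmf maps each possible number of wet days n to p_N(n).
    """
    return sum(p * parent_cdf(x) ** n for n, p in counting_pmf.items())

# Hypothetical parent: exponential non-zero daily precipitation, mean 10 mm.
def exponential_cdf(x, mean=10.0):
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

# Hypothetical counting distribution: 80 or 120 wet days, equally likely.
pmf = {80: 0.5, 120: 0.5}

x = 60.0  # daily precipitation amount in mm
print(cdf_max_fixed_n(exponential_cdf, x, 100))
print(cdf_max_random_n(exponential_cdf, x, pmf))
```

Any counting distribution can be passed as a dictionary, so the same function covers Poisson or Binomial counts once their probabilities are tabulated.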
If the parameter(s) θ of $F(x)$ is (are) not constant but statistically variable, then $F(x)$ becomes $F(x|\theta)$, where the symbol | indicates the conditional distribution given a value of the parameter(s) Θ = θ. Equation (2) will thus provide $F_M(x|\theta)$, i.e., the conditional distribution of $M$ given Θ = θ. The unconditional distribution of $M$ is

$$F_M(x) = \sum_{\theta \in A_\Theta} F_M(x|\theta) \, p_\Theta(\theta) \tag{3a}$$

$$F_M(x) = \int_{A_\Theta} F_M(x|\theta) \, f_\Theta(\theta) \, d\theta \tag{3b}$$

depending on whether Θ is discrete (3a) or continuous (3b), where $p_\Theta(\theta)$ is the (joint) mass function of Θ, $f_\Theta(\theta)$ is the (joint) density function of Θ, and $A_\Theta$ is the domain of Θ. Equation (3) includes Equation (2) as a particular case, with $F_M(x|\theta)$ given by Equation (1) and Θ = N, as reported in [40], Example 3.61. Equation (3) can also be written in a compact form as

$$F_M(x) = E[F_M(x|\Theta)] \tag{4}$$

where E[.] is the expected value calculated with respect to the (joint) distribution of Θ [43].
In physics, the statistical variability of parameters has been referred to as superstatistics [44,45], meaning "statistics of statistics" or superposition of different statistics. From the physical point of view, this characterizes phenomena having variabilities at multiple scales (small and large). For example, the distribution of daily precipitation can have a day-by-day variability (small scales), while the parameters of this distribution can have a year-by-year variability (large scales). Recently, Equation (3) has been referred to as "Metastatistical Extreme Value distribution" [17,22,46]. Since a large amount of literature has referred to Equation (3) as "compound/superstatistical extreme value distributions", we adopt this nomenclature in this review.
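The two-scale picture above can be sketched numerically: if each year is treated as one equally likely realization of the parameters, the expectation over Θ reduces to a sample average of the conditional distributions. This is only an illustrative approximation under that assumption, not the procedure of any cited work; the Weibull parent, the function names, and the yearly parameter values are hypothetical.

```python
import math

def weibull_cdf(x, scale, shape):
    """Weibull (stretched exponential) parent distribution for x >= 0."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def compound_cdf_max(x, yearly_params):
    """Sample-average estimate of E[F_M(x | Theta)].

    yearly_params is a list of (n_wet_days, scale, shape) tuples, one per
    year; each tuple is treated as one equally likely realization of Theta.
    """
    total = 0.0
    for n, scale, shape in yearly_params:
        total += weibull_cdf(x, scale, shape) ** n
    return total / len(yearly_params)

# Hypothetical yearly parameter values (not taken from any dataset).
years = [(95, 8.0, 0.80), (110, 9.5, 0.75), (88, 7.2, 0.85), (102, 8.8, 0.78)]

for x in (50.0, 100.0, 150.0):
    print(x, round(compound_cdf_max(x, years), 4))
```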
Under non-stationary conditions, i.e., when the parameters vary with time t, Θ(t), Equation (3) can be written, in the compact notation of Equation (4), as

$$F_M(x, t) = E[F_M(x|\Theta(t))] \tag{5}$$

where the distribution of M depends on time. Equation (5) has recently been used by [47] for analyzing the non-stationary behavior of annual maxima of rainfall at different durations.

Some Analytical Solutions
The practical issue with using Equations (2)-(4), as well as Equation (5), is that analytical solutions of these equations are rarely available. Recently, [17,22,46] tackled this issue via a numerical approach. A downside of numerical approaches is that calculating the inverse distribution and, therefore, quantiles, is not straightforward, which makes analytical solutions particularly appealing.
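To illustrate why quantiles are less direct when the distribution is only available numerically, the sketch below inverts a numerically defined distribution of M by bisection. The bisection routine is a generic root-finding choice made here for illustration, and the two-year Weibull average used as the target distribution is hypothetical.

```python
import math

def quantile_by_bisection(cdf, prob, lower, upper, tol=1e-8, max_iter=200):
    """Solve cdf(x) = prob for x on [lower, upper] by bisection.

    Assumes cdf is non-decreasing and that cdf(lower) <= prob <= cdf(upper).
    """
    if not (cdf(lower) <= prob <= cdf(upper)):
        raise ValueError("prob is not bracketed by [lower, upper]")
    for _ in range(max_iter):
        mid = 0.5 * (lower + upper)
        if cdf(mid) < prob:
            lower = mid
        else:
            upper = mid
        if upper - lower < tol:
            break
    return 0.5 * (lower + upper)

# Hypothetical numerically defined distribution of the annual maximum:
# an equal-weight average over two years with different Weibull parents.
def cdf_max(x):
    years = [(95, 8.0, 0.80), (110, 9.5, 0.75)]  # (wet days, scale, shape)
    vals = [(1.0 - math.exp(-((x / s) ** k))) ** n for n, s, k in years]
    return sum(vals) / len(vals)

return_period = 100.0                 # years
prob = 1.0 - 1.0 / return_period      # non-exceedance probability
x_100 = quantile_by_bisection(cdf_max, prob, 0.0, 1000.0)
print(round(x_100, 2), round(cdf_max(x_100), 6))
```

Each quantile costs many evaluations of the distribution, whereas a closed-form inverse, when available, costs one.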

Analytical Solutions for Equation (2)
A first example of an analytical solution of Equation (2) was derived under the assumption that N is Poisson distributed with distribution $p_N(n) = \nu^n e^{-\nu} / n!$, where the parameter ν denotes the mean number of occurrences in a given time period (here, this period is one year). In this case, the exact distribution of M can be written as

$$F_M(x) = \exp\{-\nu \, [1 - F(x)]\} \tag{6}$$

Equation (6) was originally developed by [48] and then used for the first time in hydrology by [49]. Equation (6) was then reported in many textbooks, such as [40] (p. 440, Equation (7.2.69)), [39] (p. 51, Equation (1.108)), and [38] (p. 29, Equation (1.54)). According to the terminology reported above, Equation (6) is called the Compound Poisson Extreme Value distribution.
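The Poisson case can be checked numerically: summing $[F(x)]^n p_N(n)$ over a Poisson counting distribution gives $\exp\{-\nu[1 - F(x)]\}$, which is the standard compound Poisson result, and this closed form can be inverted directly for quantiles. The sketch below assumes that form; the function names, the truncation point of the sum, and the exponential parent are illustrative choices.

```python
import math

def poisson_pmf(n, nu):
    """Poisson probability of n occurrences with mean nu."""
    return math.exp(-nu + n * math.log(nu) - math.lgamma(n + 1))

def cdf_max_poisson_sum(F_x, nu, n_max=1000):
    """Total-probability sum with a Poisson counting law, truncated at n_max."""
    return sum(poisson_pmf(n, nu) * F_x ** n for n in range(n_max + 1))

def cdf_max_poisson_closed(F_x, nu):
    """Closed form exp(-nu * (1 - F(x))) of the compound Poisson case."""
    return math.exp(-nu * (1.0 - F_x))

def quantile_poisson(parent_quantile, prob, nu):
    """Invert the closed form: F(x) = 1 + ln(prob) / nu, then apply F^{-1}.

    Valid only for prob > exp(-nu), i.e. above the probability of a year
    with no events.
    """
    return parent_quantile(1.0 + math.log(prob) / nu)

nu = 100.0   # hypothetical mean number of wet days per year
F_x = 0.995  # hypothetical value of the parent distribution at some x
print(cdf_max_poisson_sum(F_x, nu), cdf_max_poisson_closed(F_x, nu))

# Hypothetical exponential parent with mean 10 mm: F^{-1}(u) = -10 ln(1 - u).
exp_quantile = lambda u: -10.0 * math.log(1.0 - u)
print(quantile_poisson(exp_quantile, 0.99, nu))
```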
If the distribution F(x) in Equation (6) is a shifted Exponential, then the distribution of M is a Gumbel (EVI); if F(x) is a Pareto, then the distribution of M is a Fréchet (EVII); finally, if F(x) is a generalized Pareto, then the distribution of M is a GEV (see [40], pp. 440-441, and [50]). This means that the Gumbel, Fréchet, and GEV distributions can be viewed as Compound Poisson Extreme Value distributions with shifted Exponential, Pareto, and generalized Pareto parent distributions, respectively.
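The generalized Pareto case can be verified with a short sketch. Assuming the compound Poisson form $\exp\{-\nu[1 - F(x)]\}$ and a generalized Pareto parent with threshold u, scale σ, and shape ξ > 0, a little algebra gives a GEV with the same shape, scale $\sigma \nu^{\xi}$, and location $u + \sigma(\nu^{\xi} - 1)/\xi$, valid for x above the threshold. The parameterization, function names, and parameter values below are assumptions made for the example.

```python
import math

def gpd_cdf(x, u, sigma, xi):
    """Generalized Pareto distribution: threshold u, scale sigma, shape xi > 0."""
    if x <= u:
        return 0.0
    return 1.0 - (1.0 + xi * (x - u) / sigma) ** (-1.0 / xi)

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution: location mu, scale sigma, shape xi > 0."""
    z = 1.0 + xi * (x - mu) / sigma
    if z <= 0.0:
        return 0.0
    return math.exp(-z ** (-1.0 / xi))

def compound_poisson_gpd_cdf(x, nu, u, sigma, xi):
    """Compound Poisson distribution of the maximum with a GPD parent."""
    return math.exp(-nu * (1.0 - gpd_cdf(x, u, sigma, xi)))

def gev_params_from_poisson_gpd(nu, u, sigma, xi):
    """GEV (location, scale, shape) equivalent to the compound, for x > u."""
    sigma_gev = sigma * nu ** xi
    mu_gev = u + sigma * (nu ** xi - 1.0) / xi
    return mu_gev, sigma_gev, xi

# Hypothetical parameter values.
nu, u, sigma, xi = 100.0, 1.0, 8.0, 0.1
mu_g, sigma_g, xi_g = gev_params_from_poisson_gpd(nu, u, sigma, xi)
for x in (20.0, 50.0, 100.0):
    print(x, compound_poisson_gpd_cdf(x, nu, u, sigma, xi),
          gev_cdf(x, mu_g, sigma_g, xi_g))
```

The two columns printed for each x agree to floating-point precision, which is the sense in which the GEV is itself a compound Poisson distribution.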
As a generalization of Equation (6), the product of two Compound Poisson Extreme Value distributions was proposed in [51,52] to characterize the ordinary and rare components of extremes (assumed independent of each other). $F_M(x)$ is then:

$$F_M(x) = \exp\{-\nu_1 [1 - F_1(x)] - \nu_2 [1 - F_2(x)]\} \tag{7}$$

The two components both have a Poisson chronology, with parameters $\nu_1$ and $\nu_2$, respectively. The parent distributions are $F_1(x)$ and $F_2(x)$, respectively. Shifted exponential distributions, with scale parameters $b_1$ and $b_2$, were used for the parent distributions. The result is that Equation (7) becomes the following:

$$F_M(x) = \exp\{-\exp[-(x - a_1)/b_1] - \exp[-(x - a_2)/b_2]\} \tag{8}$$

where each location parameter $a_i$ collects the shift of the i-th parent distribution and the term $b_i \ln \nu_i$. Equation (8) is the product of two Gumbel distributions and is known in the literature as the Two-Component Extreme Value distribution (TCEV), see [53], or [40], p. 443, Equation (7.2.77). The TCEV was formulated for the analysis of extremes of daily precipitation and daily flow (among other variables) and is widely used in Italy to describe the behavior of maximum annual daily precipitation in regionalization studies carried out under the VAPI (Valutazione delle Piene in Italia; Flood Evaluation in Italy) project, see e.g., [54,55].
In order to account for the seasonal variability or the non-identical distribution of the parent process in the distribution of annual maxima of precipitation, Revfeim [56] proposed another generalization of Equation (6), using Equation (6) to model the monthly maximum of daily precipitation, following the work of Todorovic and Rousselle [57]. In that case, the distribution of M becomes

$$F_M(x) = \prod_{i=1}^{12} \exp\{-\nu_i [1 - F_i(x)]\} \tag{9}$$

where $\nu_i$ and $F_i(x)$ are the Poisson parameter and the parent distribution for the i-th month, respectively. An exponential distribution was used for the parent, $F_i(x) = 1 - \exp(-x/b_i)$, so that each monthly factor is a Gumbel distribution, $\exp\{-\exp[-(x - a_i)/b_i]\}$, with $a_i = b_i \ln \nu_i$. Revfeim [56] considered a harmonic variability of both parameters ($\nu_i$, $b_i$) in order to reduce the number of parameters to determine from 2 × 12 to 2 × 2, which is the same number of parameters as the TCEV. Equation (9) is known as the Multi-Component Extreme Value distribution, also indicated as MCEV.
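The two-component and monthly generalizations share one structure: a product of compound Poisson factors, one per component. The sketch below assumes unshifted exponential parents with scales $b_i$, so that each factor is a Gumbel distribution with location $b_i \ln \nu_i$; the function names, the parameter values, and the specific harmonic used for the monthly case are hypothetical.

```python
import math

def multi_component_cdf(x, components):
    """exp(-sum_i nu_i * exp(-x / b_i)) for exponential parents, x >= 0.

    components is a list of (nu_i, b_i) pairs: Poisson rate and exponential
    scale of each component.
    """
    return math.exp(-sum(nu * math.exp(-x / b) for nu, b in components))

def gumbel_cdf(x, a, b):
    """Gumbel distribution with location a and scale b."""
    return math.exp(-math.exp(-(x - a) / b))

# Hypothetical two-component case: frequent ordinary events, rare intense ones.
tcev = [(40.0, 8.0), (0.5, 25.0)]

x = 90.0
direct = multi_component_cdf(x, tcev)
# Same value as a product of Gumbel factors with locations a_i = b_i ln(nu_i).
as_product = math.prod(gumbel_cdf(x, b * math.log(nu), b) for nu, b in tcev)
print(direct, as_product)

# Hypothetical monthly case: 12 components whose rate and scale each follow
# one harmonic (a mean and an amplitude per parameter).
monthly = [
    (8.0 + 3.0 * math.cos(2 * math.pi * (m - 1) / 12),
     9.0 + 2.0 * math.sin(2 * math.pi * (m - 1) / 12))
    for m in range(1, 13)
]
print(multi_component_cdf(x, monthly))
```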

Analytical Solutions for Equation (3)
The analytical solution of Equation (3) is even more challenging, and less investigated for extreme values, than that of Equation (2). We mention here the theoretical result obtained by Dubey [58], who considered $F_M(x|\theta)$ as a generalized Gumbel distribution with parameters a and b and an extra parameter θ:

$$F_M(x|\theta) = \exp\{-\theta \exp[-(x - a)/b]\} \tag{10}$$

In Equation (10), Dubey [58] assumed the parameter θ to be a random variable Θ having a Gamma distribution with density function $f_\Theta(\theta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{\alpha - 1} \exp(-\beta\theta)$, so the compound distribution of M is

$$F_M(x) = \left\{1 + \frac{1}{\beta} \exp[-(x - a)/b]\right\}^{-\alpha} \tag{11}$$

Equation (11) is known as the generalized Logistic distribution, or type I generalized Logistic distribution, see [59]. This shows how the generalized Logistic distribution can be viewed as an extreme-type distribution, explaining why it has been successfully used in modeling maximum annual daily precipitation, see Table S2 in [10].
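The Gamma compounding step can be checked numerically. The sketch below assumes the conditional form $\exp\{-\theta \exp[-(x - a)/b]\}$ and a Gamma density with shape α ≥ 1 and rate β, integrates their product over θ with a composite Simpson rule, and compares the result with the closed form $\{1 + \exp[-(x - a)/b]/\beta\}^{-\alpha}$. The function names, integration limits, and parameter values are illustrative.

```python
import math

def conditional_cdf(x, theta, a, b):
    """Assumed conditional form: exp(-theta * exp(-(x - a) / b))."""
    return math.exp(-theta * math.exp(-(x - a) / b))

def gamma_pdf(theta, alpha, beta):
    """Gamma density with shape alpha (>= 1 here) and rate beta."""
    return (beta ** alpha / math.gamma(alpha)) * theta ** (alpha - 1.0) \
        * math.exp(-beta * theta)

def compound_cdf_numeric(x, a, b, alpha, beta, upper=60.0, steps=20000):
    """Composite Simpson integration over theta on [0, upper]; steps is even."""
    h = upper / steps
    g = lambda t: conditional_cdf(x, t, a, b) * gamma_pdf(t, alpha, beta)
    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return total * h / 3.0

def generalized_logistic_cdf(x, a, b, alpha, beta):
    """Closed form of the Gamma-compounded distribution."""
    return (1.0 + math.exp(-(x - a) / b) / beta) ** (-alpha)

# Hypothetical parameter values.
a, b, alpha, beta = 50.0, 10.0, 2.0, 1.5
for x in (40.0, 60.0, 90.0):
    print(x, compound_cdf_numeric(x, a, b, alpha, beta),
          generalized_logistic_cdf(x, a, b, alpha, beta))
```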

Non-Poisson Analytical Solutions
Based on this literature review, we conclude that the exact distribution of annual maxima has mostly been developed around the Compound Poisson Extreme Value distribution (see Equation (6)). However, this distribution has an often-overlooked weak point: even if Equation (6) is an exact distribution, it still retains asymptotic characteristics because it is the combination of two distributions, a generic distribution F and a Poisson distribution. The latter is the limit case of a Binomial distribution for a number of trials → +∞ and a "small" probability of occurrence of wet days (or of a generic event). Thus, the Poisson distribution is unbounded above, which is theoretically not coherent with the upper-bounded behavior of the counting variable N under study, i.e., the number of wet days during a given year (≤365). The Poissonian chronology of wet days was also questioned in [7] because of the clustering of rain events.
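Recently, De Michele and Avanzi [10] derived the analytical solution of Equation (2) for a Binomial counting distribution. With N the number of wet days out of the 365 days of the year and p the probability that a day is wet, the exact distribution of M is

$$F_M(x) = \{1 - p \, [1 - F(x)]\}^{365} \tag{12}$$

In De Michele and Avanzi [10], Equation (12) was derived as a particular case (i.e., zero-order Markov chain) of a first-order Markov chain, but proofs of Equation (12) can also be obtained using the Binomial theorem [60]. By analogy with Equation (6), Equation (12) is called the Compound Binomial Extreme Value distribution.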
A shifted Weibull distribution, $F(x) = 1 - \exp\{-[(x - x_T)/\lambda]^{\beta}\}$, was used as parent distribution in [10], where λ and β are, respectively, the scale and shape parameters, and $x_T$ is a fixed threshold. Thus, Equation (12) becomes the following:

$$F_M(x) = \left\{1 - p \exp\left[-\left(\frac{x - x_T}{\lambda}\right)^{\beta}\right]\right\}^{365} \tag{13}$$

where p is the probability of occurrence of a day with precipitation above $x_T$. De Michele and Avanzi [10] tested the hypothesis that the daily precipitation amount is Weibull distributed over a large dataset (20,561 sites across the world), using threshold values in the range [0,16] mm and showing that the percentage of sites where the hypothesis is accepted rapidly rises to 100% as the threshold increases.
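The Binomial case can be sketched in the same way as the Poisson one. Assuming N follows a Binomial distribution over the days of the year with per-day occurrence probability p, the Binomial theorem collapses the total-probability sum to $\{1 - p[1 - F(x)]\}^{n}$, and with a shifted Weibull parent the quantile has a closed form. The function names and parameter values below are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """Binomial probability of k occurrences in n trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def cdf_max_binomial_sum(F_x, n_days, p_wet):
    """Total-probability sum with a Binomial counting law, term by term."""
    return sum(binomial_pmf(k, n_days, p_wet) * F_x ** k
               for k in range(n_days + 1))

def cdf_max_binomial_closed(F_x, n_days, p_wet):
    """Binomial-theorem closed form [1 - p (1 - F(x))]^n."""
    return (1.0 - p_wet * (1.0 - F_x)) ** n_days

def quantile_binomial_weibull(prob, n_days, p_wet, scale, shape, threshold=0.0):
    """Invert the closed form for a shifted Weibull parent.

    Valid for prob > (1 - p_wet)^n_days, the probability of a year with no
    precipitation above the threshold.
    """
    survival = (1.0 - prob ** (1.0 / n_days)) / p_wet   # 1 - F(x)
    return threshold + scale * (-math.log(survival)) ** (1.0 / shape)

# Hypothetical values: 365 days, 30% occurrence probability, Weibull parent.
n_days, p_wet, scale, shape = 365, 0.3, 8.0, 0.8
print(cdf_max_binomial_sum(0.999, n_days, p_wet),
      cdf_max_binomial_closed(0.999, n_days, p_wet))
print(quantile_binomial_weibull(0.99, n_days, p_wet, scale, shape))
```

Because the count is bounded by the number of days, the sum is finite and no truncation is needed, unlike the Poisson sum.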
If the parent distribution F(x) in Equation (12) were written as a mixture of two distributions, $F_1(x)$ and $F_2(x)$, in order to account for the different behavior of low and high values [31,32], then it would take the form $F(x) = a_1 F_1(x) + a_2 F_2(x)$, with $a_1$ and $a_2$ being the weights of the two components and $a_1 + a_2 = 1$ [43]. Substituting this mixture into Equation (12), the distribution of M could then be written as

$$F_M(x) = \{1 - p \, [1 - a_1 F_1(x) - a_2 F_2(x)]\}^{365} \tag{14}$$

with p the Binomial probability of a wet day. Equation (14) can be called the Two-Component Compound Binomial Extreme Value distribution. Equation (14) follows the same principle as the TCEV but uses a Binomial distribution instead of a Poisson distribution for the variable N, which, as mentioned, overcomes the asymptotic characteristics of the latter. The applicability and performance of Equation (14) are not known at this stage and should be the target of further research.

Conclusions and Outlook
Maximum annual daily precipitation is a fundamental hydrologic variable. The determination of its probability distribution is a key issue in the design and verification of many engineering works. This task has mainly been addressed using asymptotic distributions, and more recently, through selection from a broader pool of distributions. Less attention has been paid to the development and use of a more rigorous, exact distribution that does not assume asymptotic conditions. The violation of asymptotic conditions for the maximum annual daily precipitation has rekindled interest in this exact distribution, for which the compound/superstatistical distributions play a key role.
Here, we have reviewed existing results to date regarding the exact distribution of maximum annual daily precipitation. The results of our review show that non-asymptotic, exact results already exist, but several of them retain an asymptotic flavor because they are based on a Poisson-occurrence assumption. Some recent results based on a Binomial-occurrence law overcome this limitation and suggest that the way toward an analytical and non-asymptotic definition of maximum-precipitation probability distributions is open. While asymptotic distributions are appealing compared to the exact distribution because they do not require information about the underlying process, results by several authors show that their predictions beyond the observed values are less robust than those of the exact distribution, simply because the latter includes information about all precipitation events and not only about maxima [10,17].
We identify several possible directions of future research at this stage. First, we note that the model uncertainty associated with quantiles of the exact distribution has been generally overlooked compared to the broad body of literature about quantile uncertainty for the GEV and thus should be the target of future research. Second, the extrapolation power of the exact distribution compared to asymptotic distributions beyond observed values should be more systematically related to climatic conditions and event types. Third, the statistical variability of the parameters of the exact distribution should be better understood in view of potential future changes in climate. Fourth, more investigations of the parent distribution are still needed, possibly considering mixtures of distributions [30-35,61-64].
Funding: This research was partially funded by FONDAZIONE CARIPLO through FLORIMAP project, grant number 2017-0708.