Baseline Methods for Bayesian Inference in Gumbel Distribution

Usual estimation methods for the parameters of extreme value distributions employ only a small part of the observed values. When block maxima are considered, most of the data are discarded and a great deal of information is wasted. We develop a model that exploits all the available data in an extreme value framework. The key is to take advantage of the existing relation between the baseline parameters and the parameters of the block maxima distribution. We propose two methods to perform Bayesian estimation. The Baseline distribution method (BDM) consists of computing estimates of the baseline parameters from all the data, and then applying a transformation to obtain estimates of the block maxima parameters. The Improved baseline method (IBDM) is a refinement of this idea, with the aim of assigning more importance to the block maxima data than to the baseline values, achieved by applying BDM to build an improved prior distribution. Through a broad simulation study, we compare these new methods empirically with the standard Bayesian analysis with a non-informative prior, considering three baseline distributions that lead to a Gumbel extreme value distribution, namely the Gumbel, Exponential and Normal distributions.


Introduction
Extreme value theory (EVT) is a widely used statistical tool for modeling and forecasting the distributions that arise when we study events more extreme than any previously observed. Examples of such situations are rare natural events in climatology or hydrology, such as floods, earthquakes, climate changes, etc. Therefore, EVT is employed in several scientific fields to model and predict extreme events of temperature [1][2][3], precipitation [4][5][6][7][8][9][10] and solar climatology [11][12][13], as well as in engineering to study serious malfunctions (e.g., [14]), in finance (the study of financial crises), in insurance (very large claims due to catastrophic events) and in environmental science (the concentration of pollution in the air).
The overall fitting approach tries to fit all the historical data to several theoretical distributions and then chooses the best one according to certain criteria. However, since extreme observations are usually scarce, overall fitting works well in the central area of the distribution, but it can represent the tail area poorly.

Domains of Attraction of Gumbel Distribution
As is well known, the BM approach consists of dividing the observation period into non-overlapping periods of equal size and selecting the maximum observation in each period. Given a sequence of i.i.d. random variables Y_1, Y_2, . . . , Y_m with common distribution function F, and given a fixed k ∈ N (block size), we define the block maxima

X_i = max{Y_{(i−1)k+1}, . . . , Y_{ik}}, i = 1, . . . , n.

Hence, the total of m = k × n observations is divided into n blocks of size k. The extreme values depend on the full sample from which they have been drawn, through its shape and size; therefore, extremes vary according to the initial distribution and the sample size ([26]). The cumulative distribution function of X_i is then

P(X_i ≤ x) = F(x)^k.

This result depends on our knowledge of F, which may be lacking. Therefore, it is useful to consider the asymptotic distribution.
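The block maxima construction and the F(x)^k identity above can be checked by simulation; the following is a minimal sketch in Python, with an Exp(1) baseline chosen purely for convenience:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 100, 2000                             # block size and number of blocks
y = rng.exponential(scale=1.0, size=n * k)   # m = n*k baseline observations
x = y.reshape(n, k).max(axis=1)              # block maxima X_1, ..., X_n

# The cdf of each block maximum is F(x)^k; check it empirically at t = ln k
t = np.log(k)
emp = (x <= t).mean()                # empirical proportion of maxima below t
theo = (1 - np.exp(-t)) ** k         # F(t)^k for the Exp(1) baseline
assert abs(emp - theo) < 0.05
```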
According to the theorems of Gnedenko [27] and Fisher and Tippett [28], the asymptotic distribution of the block maxima of i.i.d. random variables can be approximated by a generalized extreme value distribution, with distribution function

G(x) = exp{ −[1 + ξ (x − µ)/σ]^{−1/ξ} },

with ξ, µ ∈ R, σ > 0 and 1 + ξ (x − µ)/σ > 0. When ξ = 0, the right-hand side of Equation (3) is interpreted as

G(x) = exp{ −exp[−(x − µ)/σ] },

and it is called the Gumbel distribution with parameters µ (location) and σ > 0 (scale).

Definition 1.
We say that the distribution function F is in the domain of attraction of an extreme value Gumbel distribution when there exist sequences {a_k > 0} and {b_k} such that

lim_{k→∞} F^k(a_k x + b_k) = exp(−e^{−x}) for all x ∈ R.

The sequences {a_k} and {b_k} are called normalizing constants. We usually call F the baseline or underlying distribution. The normalizing constants correspond to the scale and location parameters of the limit Gumbel distribution; therefore, they allow us to establish a relation between this distribution and the baseline distribution.
Moreover, Ferreira and de Haan [29] proved theoretical results that allow the normalizing constants to be determined for many baseline distributions in the domain of attraction of a Gumbel distribution: Theorem 1. When F belongs to the domain of attraction of a Gumbel distribution, there is a positive function h that verifies

lim_{t→x*} [1 − F(t + x h(t))] / [1 − F(t)] = e^{−x} for all x ∈ R,

where x* is the right endpoint of F. To determine the function h, the following condition is very useful.
Theorem 2 (von Mises condition). Suppose F''(x) and F'(x) exist, and F' is positive for all x belonging to a neighborhood to the left of x* (the right endpoint of F). If

lim_{t→x*} (d/dt) [(1 − F(t)) / F'(t)] = 0,

or, equivalently,

lim_{t→x*} F''(t)(1 − F(t)) / [F'(t)]² = −1,

then F belongs to the domain of attraction of the Gumbel distribution. In this case, the function h is determined by

h(t) = (1 − F(t)) / F'(t).

Distributions whose tails decrease exponentially produce a Gumbel distribution when taking block maxima. Besides the Exponential distribution, the class of distributions belonging to the domain of attraction of the Gumbel includes the Normal distribution and many others, such as the Log-normal, Gamma, Rayleigh and Gumbel itself. We also use the following result: Proposition 1. If X is a random variable belonging to the domain of attraction of a Gumbel distribution, then Y = µ + σX also belongs to the same domain of attraction. The normalizing constants are

a'_k = σ a_k, b'_k = µ + σ b_k,

where a_k and b_k are the normalizing constants of X.

Gumbel Baseline Distribution
If Y ∼ G(µ, σ), then the block maxima distribution for blocks of size k, denoted by X, is also a Gumbel distribution, because

F^k(x) = exp(−k e^{−(x−µ)/σ}) = exp(−e^{−(x−µ−σ ln k)/σ});

therefore X ∼ G(µ + σ ln k, σ).
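This closure property is easy to verify by simulation. The following sketch (with values chosen to match the G(0, 4), k = 1000 setting used later in the paper) compares the empirical mean of simulated block maxima with the mean of G(µ + σ ln k, σ):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, k, n = 0.0, 4.0, 1000, 5000

# n blocks of k Gumbel(mu, sigma) draws; take each block's maximum
x = rng.gumbel(loc=mu, scale=sigma, size=(n, k)).max(axis=1)

# Maxima should be exactly G(mu + sigma*ln k, sigma), whose mean is
# location + scale * gamma (Euler-Mascheroni constant)
gamma = 0.57721566
expected_mean = mu + sigma * np.log(k) + sigma * gamma
assert abs(x.mean() - expected_mean) < 0.3
```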

Exponential Baseline Distribution
The Exponential distribution belongs to the domain of attraction of the Gumbel, with h(t) = λ^{−1}. As F^{−1}(u) = −λ^{−1} ln(1 − u), the normalizing constants are

a_k = λ^{−1}, b_k = λ^{−1} ln k,

and they establish a relation that allows us to make estimations for the limit Gumbel distribution when there is an exponential baseline distribution, for k big enough.
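Under these constants, the block maxima of an Exp(λ) baseline are approximately G(λ^{−1} ln k, λ^{−1}) for large k. A small simulation sketch (λ = 2 and k = 1000 are assumed illustration values) confirms the location and scale:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, k, n = 2.0, 1000, 5000
x = rng.exponential(scale=1/lam, size=(n, k)).max(axis=1)

# a_k = 1/lam, b_k = ln(k)/lam; the mean of G(b_k, a_k) is b_k + a_k*gamma
a_k, b_k = 1/lam, np.log(k)/lam
gamma = 0.57721566
assert abs(x.mean() - (b_k + a_k * gamma)) < 0.05
```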

Normal Baseline Distribution
When the baseline distribution is a Standard Normal distribution, the normalizing constants can be computed by making use of the asymptotic limit and the results shown above.
Let Z ∼ N(0, 1), with distribution function F and density function f. It is easy to show that F verifies the von Mises condition (8), using L'Hôpital's rule and noticing that f'(t) = −t · f(t). Therefore, 1 − F(t) ≈ f(t) · t^{−1} and, consequently, the function h defined in (9) verifies

h(t) ≈ t^{−1} as t → x* = ∞.

Besides, by (6), b_k satisfies 1 − F(b_k) = k^{−1}, which leads to

b_k² + ln(2π) + 2 ln b_k − 2 ln k = 0.

Defining the function g(b_k) = b_k² + ln(2π) + 2 ln b_k − 2 ln k, and developing its Taylor series around (2 ln k)^{1/2}, we obtain

b_k ≈ (2 ln k)^{1/2} − [ln ln k + ln(4π)] / [2 (2 ln k)^{1/2}].

In addition, a_k can be taken as

a_k = h(b_k) ≈ (2 ln k)^{−1/2}.

Besides, as a consequence of Proposition 1, if Y ∼ N(µ, σ), for k big enough, the normalizing constants are, approximately,

a'_k = σ (2 ln k)^{−1/2}, b'_k = µ + σ b_k.
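The quality of these asymptotic constants can be checked numerically: b_k approximately solves 1 − Φ(b_k) = 1/k, so the scaled Normal tail at b_k should be close to one. A minimal sketch (k = 10^6 taken only for illustration; convergence is logarithmically slow):

```python
import math

k = 10**6
s = math.sqrt(2 * math.log(k))
# asymptotic normalizing constants for the standard Normal baseline
b_k = s - (math.log(math.log(k)) + math.log(4 * math.pi)) / (2 * s)
a_k = 1 / s

# b_k should approximately satisfy 1 - Phi(b_k) = 1/k
tail = 0.5 * math.erfc(b_k / math.sqrt(2))   # 1 - Phi(b_k)
assert abs(tail * k - 1) < 0.15              # slow (logarithmic) convergence
```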

Other Baseline Distributions
This way of working can be extended to other baseline distributions whose block maxima limit is also a Gumbel, by using the existing relations between baseline and limit parameters. Table 1 shows the normalizing constants computed for the distribution functions most employed within the domain of attraction of the Gumbel distribution. The constants a_N and b_N are the normalizing constants for the Standard Normal distribution, given by (15) and (17), respectively.

Classical Bayesian Estimation for the Gumbel Distribution
To make statistical inferences in the Bayesian framework, after assuming a prior density for the parameters, π(θ), and combining this distribution with the information brought by the data, quantified by the likelihood function L(θ|x), the posterior density function of the parameters is determined as

π(θ|x) ∝ L(θ|x) π(θ).

The rest of the inference process is carried out based on the obtained posterior distribution. The likelihood function for θ = (µ, σ), given a random sample x = (x_1, ..., x_n) from a Gumbel(µ, σ) distribution, is

L(µ, σ|x) = σ^{−n} exp{ −∑_{i=1}^{n} (x_i − µ)/σ − ∑_{i=1}^{n} exp[−(x_i − µ)/σ] }.

In the case of the Gumbel distribution, Rostami and Adam [21] selected eighteen pairs of priors based on the characteristics of the parameters, assumed independence, and compared the posterior estimations obtained by applying the Metropolis-Hastings (MH) algorithm, concluding that the combination of Gumbel and Rayleigh is the most productive pair of priors for this model. For fixed initial hyper-parameters µ_0, σ_0, λ_0, the posterior distribution is

π(µ, σ|x) ∝ L(µ, σ|x) π(µ) π(σ),

and the conditional posterior distributions are

π(µ|σ, x) ∝ L(µ, σ|x) π(µ), π(σ|µ, x) ∝ L(µ, σ|x) π(σ).

Then, an MCMC method is applied through the MH algorithm.
Iterating this procedure, we obtain a Markov chain that converges to the posterior distribution of the parameters µ and σ. We call this method the Classical Metropolis-Hastings method (MHM).
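The MHM scheme can be sketched as follows. For brevity, this illustration uses a flat prior and a symmetric random-walk proposal rather than the Gumbel-Rayleigh prior pair of [21], so the acceptance ratio reduces to a likelihood ratio; all tuning values are assumptions:

```python
import numpy as np

def gumbel_loglik(mu, sigma, x):
    """Log-likelihood of a Gumbel(mu, sigma) sample."""
    if sigma <= 0:
        return -np.inf
    z = (x - mu) / sigma
    return -x.size * np.log(sigma) - z.sum() - np.exp(-z).sum()

def mh_gumbel(x, n_iter=20000, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings for (mu, sigma) with a flat prior."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.median(x), x.std()        # crude starting values
    ll = gumbel_loglik(mu, sigma, x)
    chain = np.empty((n_iter, 2))
    for j in range(n_iter):
        mu_p = mu + step * rng.normal()
        sigma_p = sigma + step * rng.normal()
        ll_p = gumbel_loglik(mu_p, sigma_p, x)
        if np.log(rng.uniform()) < ll_p - ll:  # accept/reject step
            mu, sigma, ll = mu_p, sigma_p, ll_p
        chain[j] = mu, sigma
    return chain

# usage: simulated block maxima from G(27.63, 4), then burn-in and thinning
x = np.random.default_rng(3).gumbel(27.63, 4.0, size=200)
mu_hat, sigma_hat = mh_gumbel(x)[3000::50].mean(axis=0)
```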

Baseline Distribution Method
In the Baseline distribution method (BDM), we use all the available information to determine the posterior baseline distribution. We denote by B(θ) the baseline distribution with parameter vector θ.
Then, we can apply Bayesian inference procedures to estimate the posterior distribution of the baseline parameters, denoted by π_b(θ|y), and therefore to obtain estimates of the baseline parameter vector θ from all the data y.
Afterwards, applying the transformations given by the relations obtained in the previous section, we can obtain new estimates of the parameters of the block maxima distribution, which in this case is the Gumbel. We explain the procedure for the three baseline distributions considered in this paper: the Gumbel, Exponential and Normal distributions.

Gumbel Baseline Distribution
When the baseline distribution is Y ∼ G(µ_b, σ_b), the block maxima distribution is again a Gumbel, G(µ_b + σ_b ln k, σ_b). Therefore, the MH algorithm can be applied to the whole dataset, y, to find estimates of µ_b and σ_b. Afterwards, we make the adequate transformation to compute estimates of the parameters of X.

Exponential Baseline Distribution
When the baseline distribution is Y ∼ Exp(λ_b), we consider a Gamma distribution with parameters α_0 and β_0 as the prior distribution,

π(λ_b) ∝ λ_b^{α_0 − 1} e^{−β_0 λ_b}.

Therefore, the posterior distribution is

π(λ_b|y) ∝ λ_b^{α_0 + m − 1} e^{−(β_0 + ∑ y_i) λ_b},

that is, λ_b|y ∼ Gamma(α_0 + m, β_0 + ∑_{i=1}^{m} y_i); thus, the Gibbs algorithm can be employed to generate samples from the posterior distribution π(λ_b|y). Once the estimate of λ_b is obtained, the kth power of the distribution function gives the estimate of the block maxima distribution function for blocks of size k.
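Because the Gamma prior is conjugate to the Exponential likelihood, the posterior can in fact be sampled directly. A minimal sketch, with assumed hyper-parameters α_0 = β_0 = 1:

```python
import numpy as np

rng = np.random.default_rng(4)
lam_true = 2.0
y = rng.exponential(scale=1/lam_true, size=5000)   # simulated baseline data

# Gamma(alpha0, beta0) prior on lambda_b; conjugacy gives the posterior
# lambda_b | y ~ Gamma(alpha0 + m, beta0 + sum(y))
alpha0, beta0 = 1.0, 1.0
m = y.size
post = rng.gamma(shape=alpha0 + m, scale=1/(beta0 + y.sum()), size=10000)
lam_hat = post.mean()                 # posterior-mean estimate of lambda_b
```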

Normal Baseline Distribution
Finally, when the baseline distribution is Y ∼ N(µ_b, σ_b), we employ Normal and inverse Gamma prior distributions for µ_b and σ_b², respectively. The resulting full conditional posterior distributions are again Normal and inverse Gamma, so we can employ the Gibbs algorithm to generate samples from the posterior distribution and, afterwards, take the kth power of the distribution function, as in the previous case.
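A minimal Gibbs sketch under these conjugate priors follows; the hyper-parameter values are illustrative assumptions, with a Normal(µ_0, τ_0²) prior on µ_b and an inverse Gamma(a_0, b_0) prior on σ_b²:

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.normal(loc=0.0, scale=2.0, size=2000)   # simulated baseline data
m = y.size

# illustrative hyper-parameters: Normal(mu0, tau0^2) prior on mu_b,
# inverse Gamma(a0, b0) prior on sigma_b^2
mu0, tau0, a0, b0 = 0.0, 10.0, 2.0, 2.0

mu, var = y.mean(), y.var()
draws = []
for _ in range(2000):
    # mu_b | sigma_b^2, y is Normal (conjugate update)
    prec = 1/tau0**2 + m/var
    mu = rng.normal((mu0/tau0**2 + y.sum()/var) / prec, np.sqrt(1/prec))
    # sigma_b^2 | mu_b, y is inverse Gamma
    var = 1 / rng.gamma(shape=a0 + m/2, scale=1/(b0 + 0.5*((y - mu)**2).sum()))
    draws.append((mu, np.sqrt(var)))

mu_hat, sigma_hat = np.mean(draws[500:], axis=0)   # discard burn-in
```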

Improved Baseline Distribution Method
Finally, we propose a new method, called the Improved Baseline distribution method (IBDM), to incorporate the highly informative baseline parameters into the Bayesian inference procedure. Here, we take into account the spirit of classical EVT, which grants more weight to block maxima data than to baseline data.
The method consists of applying BDM to obtain the posterior distribution of the parameters of the baseline distribution, π(θ|y), and then using it to build a prior distribution for the parameters of the Gumbel. Therefore, we have a highly informative prior distribution.
As the priors are highly informative, π(θ*) = π(θ^{(j)}), and the ratio in the jth step of the MH algorithm reduces to the likelihood ratio

r = L(θ*|x) / L(θ^{(j)}|x).

In every iteration of the algorithm, we first make an iteration of the Baseline Distribution method, obtaining θ_b as an estimate from the posterior distribution π(θ_b|y). Afterwards, a candidate θ* is generated using a Normal distribution N(f(θ_b), ν_θ), with the adequate transformation f(θ_b) given by Equations (11), (13) and (18) in the case of the Gumbel, Exponential and Normal baseline distributions, respectively.
Obviously, this method is quite similar to BDM when the block size is big and, consequently, there are few maxima data. It approaches the classical Bayesian method as the block size gets smaller (more maxima data).

Simulation Study
We carried out a simulation study for the three distributions analyzed above, which belong to the domain of attraction of the Gumbel distribution: Gumbel, Exponential and Normal.
For each selected distribution (once its parameters are fixed), we generated m_ij = n_i × k_j values, where
• n_i is the number of block maxima, n_i = 2^i, i = 1, 2, . . . , 7; and
• k_j is the block size, k_j = 10^j, j = 1, 2, 3.
Therefore, the sample sizes vary from 20 to 128,000. Besides, each sequence is replicated 100 times. Consequently, 2100 sequences of random values were generated for each combination of parameters of each baseline distribution.
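The design grid described above can be enumerated directly; a small sketch confirming the range of sample sizes and the number of generated sequences:

```python
ns = [2**i for i in range(1, 8)]    # number of block maxima: 2, 4, ..., 128
ks = [10**j for j in range(1, 4)]   # block sizes: 10, 100, 1000
sizes = [n * k for n in ns for k in ks]

# sample sizes range from 2*10 = 20 to 128*1000 = 128,000
assert min(sizes) == 20 and max(sizes) == 128000
# 7 x 3 = 21 (n, k) combinations, each replicated 100 times -> 2100 sequences
assert len(sizes) * 100 == 2100
```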
To guarantee the convergence of the MCMC algorithm, we must make sure that the posterior distribution has been reached. Some procedures are advisable:

• Burn-in period: eliminate the first generated values.
• Take different initial values, selected for each sample.
• Thin the chain to assure the lack of autocorrelation.
These procedures were carried out using the coda library [30] for R, taking 3000 values for the burn-in period, 50 values for the thinning, and selecting initial values for each sample. Finally, to obtain the posterior distribution of each parameter, a Markov chain of length 10,000 was obtained. Therefore, 53,000 iterations were made for each sequence.
There are some common features of the baseline distributions considered when comparing the three methods, MHM, BDM and IBDM.

1. To choose an estimator for the parameters, we compared mean- and median-based estimates. They were reasonably similar, due to the high symmetry of the posterior distributions. Therefore, we chose the mean of the posterior distribution as the parameter estimate.
2. MHM usually provides highly skewed estimates of the posterior distributions. BDM is the method that shows the least skewness.
3. BDM is the method that offers posterior estimates with the least variability. IBDM provides higher variability, but we must keep in mind that this method stresses the importance of extreme values, so more variability is to be expected than with BDM. The method with the highest variability is MHM.
4. The choice of the most suitable method also depends on the characteristics of the problem. When the block maxima data are very similar to the baseline distribution, BDM provides the best estimates and the lowest measures of error. On the contrary, when the extreme data differ from the baseline data, IBDM offers the lowest errors. IBDM is the most stable method: regardless of the differences between extreme data and baseline data, it provides reasonably good measures of error.

Gumbel Baseline Distribution
We considered the baseline G(µ_b, σ_b) distribution. As the location parameter has no effect on the variability of the data, its value was fixed at µ_b = 0 for simplicity. The scale parameter does affect variability, so we considered the values σ_b = 2^j, j = −2, −1, 0, 1, 2.
One important point of the simulation study is to observe how the estimates of the parameters vary, for a fixed block size, as the number of block maxima n (the amount of information we have) changes. Regardless of the chosen method, as n increases, variability and skewness decrease. However, for small values of n, BDM and IBDM provide more concentrated and less skewed distributions than those offered by MHM. This can be appreciated in Figure 1, where the probability density functions (pdf) of the 100 estimates of the mean of the parameters µ (left) and σ (right) of the block maxima Gumbel distribution are shown for the three methods. The baseline distribution is G(0, 4) and the block size is fixed at k = 1000; therefore, the block maxima distribution is G(27.63, 4) (from (11)). The scales are very different in these charts, to allow a better visualization of the distributions. For example, for MHM, the highest value of the pdf of the estimate of σ is around 1.5 (for n = 128), but it is over 40 for BDM.
This behavior is observed qualitatively for all the values of the parameters employed in the simulation. To compute measures of error in the case of the Gumbel baseline distribution, we can employ the absolute errors AE_i = |θ̂_i − θ|, where θ̂_i is the estimate obtained from the ith sample and θ is the real value of the estimated parameter. We can then define:

• Mean error: ME = (1/M) ∑_{i=1}^{M} (θ̂_i − θ).
• Root mean square error: RMSE = [(1/M) ∑_{i=1}^{M} (θ̂_i − θ)²]^{1/2}.
• Mean absolute error: MAE = (1/M) ∑_{i=1}^{M} |θ̂_i − θ|,
where M is the number of samples. The three methods provide estimations with low absolute errors AE when the number of maxima n is high, and especially when the block size k is high. When both numbers are small, BDM and IBDM give close estimations and differ from MHM.
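These three measures can be computed in a few lines; a sketch with made-up estimates of a true σ = 4:

```python
import numpy as np

def error_measures(estimates, theta):
    """ME, RMSE and MAE of M estimates against the true parameter value."""
    e = np.asarray(estimates, dtype=float) - theta
    return e.mean(), np.sqrt((e**2).mean()), np.abs(e).mean()

# hypothetical estimates of a true sigma = 4.0
me, rmse, mae = error_measures([3.9, 4.2, 4.05, 3.85], 4.0)
```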
Tables 2 and 3 show the values of ME and RMSE for the estimates of the parameters µ and σ, respectively, for several values of k, n and σ_b, for a G(0, σ_b) baseline distribution. We can see that BDM is the method that offers the lowest values of both measures of error, followed by IBDM. The method that provides the highest errors is MHM.

Table 3. ME for σ, with RMSE in brackets, for a baseline distribution G(0, σ_b).

Exponential Baseline Distribution
Assume now we have another baseline distribution F, which is not a Gumbel. Notice that, for the MHM and IBDM methods, we approximate the block maxima distribution by a Gumbel. However, for BDM, we employ the kth power of the baseline distribution function. When the baseline distribution is a Gumbel, we know that the kth power is also a Gumbel; however, for other baseline distributions, this is not true.
For this reason, we have to define different measures of error to evaluate the quality of the estimates. We compared the estimated distribution functions (H) with the real ones (F^k) through their mean distance (D), mean absolute distance (AD) and root square distance (RSD). As analytical computation is not possible, we made a Monte-Carlo computation employing a sample size s = 10^4: for a sample z_1, ..., z_s drawn from the block maxima distribution,
• Mean distance: D = (1/s) ∑_{i=1}^{s} [H(z_i) − F^k(z_i)].
• Root square distance: RSD = [(1/s) ∑_{i=1}^{s} (H(z_i) − F^k(z_i))²]^{1/2}.
• Mean absolute distance: AD = (1/s) ∑_{i=1}^{s} |H(z_i) − F^k(z_i)|.
We considered the baseline Exp(λ_b) distribution. In this case, for k big enough, the block maxima distribution is approximately G(λ_b^{−1} ln k, λ_b^{−1}). We took λ_b = 2^j, with j = −2, −1, 0, 1, 2. As in the Gumbel baseline case, MHM shows high skewness when the number of blocks n is very small, compared to IBDM (see Figure 2). In addition, if we compute the measures of error, we can see that, for small block sizes, the three methods offer similar values (see Table 4). For bigger values of k, BDM and IBDM provide better results, and usually BDM is the best method.
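A sketch of the Monte-Carlo computation of these distances for an Exp(λ) baseline, comparing the limiting Gumbel cdf H with the exact F^k (λ = 1 and k = 100 are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
lam, k, s = 1.0, 100, 10**4

# draw s points from the true block-maxima distribution F^k via its inverse
u = rng.uniform(size=s)
z = -np.log(1 - u**(1/k)) / lam

F_k = (1 - np.exp(-lam * z)) ** k          # exact cdf of the block maxima
H = np.exp(-k * np.exp(-lam * z))          # limiting Gumbel cdf G(ln k/lam, 1/lam)

d = np.mean(H - F_k)                       # mean distance D
ad = np.mean(np.abs(H - F_k))              # mean absolute distance AD
rsd = np.sqrt(np.mean((H - F_k)**2))       # root square distance RSD
assert ad < 0.02 and rsd < 0.02            # the approximation is already close
```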

Normal Baseline Distribution
Finally, assume the baseline distribution is N(µ_b, σ_b). As the mean has no effect on the variability of the data, its value was fixed at µ_b = 0 for simplicity. The standard deviation was taken as σ_b = 2^j, j = −2, −1, 0, 1, 2.
The conclusions we obtain are quite similar to those of the previous case. We illustrate the skewness and variability of MHM with similar graphs (see Figure 3). Variability is especially pronounced when n is small and we are estimating σ. In addition, the errors are shown in Table 5.

Table 5. ME for σ, with RMSE in brackets, for a baseline distribution N(0, σ_b).

In practical situations, the data might not fit a specific distribution, and some perturbations (noise) could appear. To get a quick overview of how differences between the baseline distribution data and the block maxima data can affect the choice of the best method, we simulated a simple situation in which the data come from a mixture of normal variables. Concretely, Y = 0.9 · Z + 0.1 · Y_1, with Z ∼ N(0, 1) and Y_1 ∼ N(1, 1.5).
Figure 4 shows how the MAE varies for the three methods as the number of block maxima n changes, for a block size k = 100. In this case, IBDM offers the lowest errors, because it stresses the importance of the extreme data. When the extreme data are scarce, both new methods, BDM and IBDM, improve on MHM meaningfully.

1. One of the most common problems in EVT is estimating the parameters of the distribution, because the data are usually scarce. In this work, we considered the case where the block maxima distribution is a Gumbel, and we developed two Bayesian methods, BDM and IBDM, to estimate the posterior distribution, making use of all the available data from the baseline distribution, not only the block maxima values.
2.