Baseline Methods for the Parameter Estimation of the Generalized Pareto Distribution

In the parameter estimation of limit extreme value distributions, most employed methods only use some of the available data. Using the peaks-over-threshold method for Generalized Pareto Distribution (GPD), only the observations above a certain threshold are considered; therefore, a big amount of information is wasted. The aim of this work is to make the most of the information provided by the observations in order to improve the accuracy of Bayesian parameter estimation. We present two new Bayesian methods to estimate the parameters of the GPD, taking into account the whole data set from the baseline distribution and the existing relations between the baseline and the limit GPD parameters in order to define highly informative priors. We make a comparison between the Bayesian Metropolis–Hastings algorithm with data over the threshold and the new methods when the baseline distribution is a stable distribution, whose properties assure we can reduce the problem to study standard distributions and also allow us to propose new estimators for the parameters of the tail distribution. Specifically, three cases of stable distributions were considered: Normal, Lévy and Cauchy distributions, as main examples of the different behaviors of the tails of a distribution. Nevertheless, the methods would be applicable to many other baseline distributions through finding relations between baseline and GPD parameters via studies of simulations. To illustrate this situation, we study the application of the methods with real data of air pollution in Badajoz (Spain), whose baseline distribution fits a Gamma, and show that the baseline methods improve estimates compared to the Bayesian Metropolis–Hastings algorithm.

There are two approaches for modeling an extreme value problem. The first one is the block maxima method: dividing the sample space into blocks of equal size, the maxima values of each block follow, under a certain domain of attraction conditions, approximately a Generalized Extreme Value (GEV) distribution [22]. The second way to deal with an extreme value data set attempts to make use of the available information about the upper tail of the distribution than just the block maxima. The so-called peaks-over-threshold (POT) method is due to hydrologists trying to model floods. Loosely speaking, References [23,24] showed that when we consider the distribution of data above a certain threshold u, it can for 0 ≤ x ≤ x F − u, being x F the right endpoint of F, that is, x F := sup{x : F(x) < 1}.
Notice that X u is the random variable that we obtain when we consider the distribution of data above the threshold, which we usually call the tail distribution.
Given a random variable X, we say that it follows a Generalized Pareto Distribution (GPD) when its distribution function is where σ > 0 and ξ ∈ R are the scale and shape parameters, respectively. Equation (1) is valid when x ≥ 0 for ξ ≥ 0, and for 0 ≤ x ≤ −σ/ξ for ξ < 0. The fundamental result that connects EVT and GPD distribution belongs to [23,24], and it establishes that under certain mild conditions, for a random variable X, the distribution of X u for a sufficiently high threshold u follows a properly scaled Generalized Pareto distribution (GPD).
We will call the distribution function of X, F, the baseline distribution of the GPD. The parameters ξ and σ will depend on the value of the threshold u, and on the baseline distribution. For example, ξ is determined by the shape of the upper tail of the baseline distribution F. Positive values of the shape parameter correspond to heavy tails, while negative ones come from light tails. The special case ξ = 0 will appear when the upper tail of the distribution tends to an exponential distribution of parameter 1/σ.
In this work, we will consider several types of baseline distributions for the GPD. Traditionally, estimation for the parameters of the limit GPD in Extreme Value Theory has been made taking into account only values above the threshold, but this information might be scarce. We propose a new strategy consisting in seizing all the data from the baseline distribution. We will show how this strategy can produce accurate estimations for the parameters of the GPD.
In the case when the baseline distribution is an exponential distribution with parameter λ, with distribution function F(x) = 1 − e −λx , x ≥ 0, for every u ≥ 0, Consequently, the tail distribution X u is the same as the underlying distribution in the exponential case. Therefore, we must employ all the available data to estimate the parameter λ = 1/σ in the definition of the GPD (1).
The case when ξ = 0 is different. In this paper, we will consider some of the most employed distributions as underlying distributions: normal distribution, which has light tails (ξ < 0); and the Cauchy and Lévy distributions, which lead to heavy tails (ξ > 0). Those distributions are stable; therefore, they have additional properties that will be helpful to estimate the parameters of the GPD. We will study such properties below.
With respect to the threshold, it can be settled as a known value, which has a physical meaning, depending on the characteristics of the data, or it can be defined as an upper order statistic. It is generally defined as a p-quantile of the underlying distribution q p , for appropriate values of p.

Stable Distributions
Stable distributions are a rich class of probability distributions characterized by [36] and they have been proposed as a model for many types of physical and economic systems because of their interesting properties. They are the attractors of sums of independent, identically distributed distributions whether or not the mean or variance is finite. Good references to study them are [16] or [37].
Let Z be a random variable with parameters defined by its characteristic function: where the parameter α ∈ (0, 2] is called the index of stability, and β ∈ [−1, 1] is the skewness parameter. When β = 0, the distribution is symmetric. A random variable X is said to follow a stable distribution with parameters a > 0 and b ∈ R if it satisfies that Generally, densities can be expressed only by complicated special functions, but there are three special cases of stable distributions that have probability density functions, which can be expressed analytically: • When α = 1/2 and β = 1, we obtain Lévy distributions, Z ∼ L(0, 1). If X ∼ L(γ, δ), then a = δ and b = γ in (3). • When α = 1 and β = 0, we obtain the family of Cauchy distributions, Z ∼ C(0, 1). If X ∼ C(γ, δ), then a = δ and b = γ in (3). • When α = 2 and β = 0, we obtain the normal distribution, N(0, √ 2). If X ∼ N(µ, σ), then a = σ and b = µ in (3). As usual, in this case we will denote Z ∼ N(0, 1).
In particular, as we can standardize the stable distributions of a family, the p-quantiles q p for a stable distribution X, can be expressed in terms of the p-quantiles z p of the standard distribution Z, as Let us assume that X follows a stable distribution with parameters a and b for Equation (3), and fix u = q p as the threshold for the problem of extreme value. Then, for u large enough, z p from (4) is also large, and, consequently, denoting u Z = z p , Theorem 1 guarantees that Z u Z ∼ GPD(ξ Z , σ Z ). Therefore, as and then The parameter ξ Z will remain constant for all the random variables of the same stable family, whatever the parameters of the baseline variable are, while the scale parameter is obtained through the product of the standardization parameter a and the scale parameter In the case when the underlying distribution is a Cauchy or a Lévy, or any stable distribution X with the index 0 < α < 2, the tail of the distribution is considered to be "heavy", therefore it leads to a GPD where ξ > 0.
Stable distributions with index of stability α = 2, (all of them except normal distribution) also verify an interesting property. As we can see in [36], given the standard distribution Z, its survival functionF can be approximated by: From this approximation, we can infer that the shape of the tail of the distribution will only depend on the index of stability α. Therefore, if we consider the GPD that models Z u Z , the shape parameter ξ Z will be fixed. We will estimate it through simulation. Proposition 1. When the baseline distribution is a standard stable distribution Z with α < 2, the relation between the parameters of Z and the parameters of the GPD that models the distribution above the p-quantile of Z, u Z , is: Proof. From Theorem 1, for u Z that is big enough, and by Proposition 1 (9), also for big values of u Z Then, making equal both expressions, Therefore, we can takeξ Z = 1/α as an estimator for ξ Z (notice that the shape of the tail of the distribution depends only on the value of α).
In Section 5, we will assure the accuracy of these estimators through an extensive simulation study.

Metropolis-Hastings (MH) Method
In this section, we will explain how to apply the Markov chain Monte Carlo (MCMC) method through the Metropolis-Hastings (MH) algorithm to make the estimations for stable distributions. We have to distinguish between light tails (ξ < 0) and heavy tails (ξ > 0). Let us assume X u ∼ GPD(ξ, σ) and that we dispose of m values. Let x = (x 1 , . . . , x n ) be a sample of n values from X and x u = (x 1 u , . . . , x m u ) be a sample of m values from X u .
• Sample candidates k * , δ * from the proposal distributions • Calculate the ratios Iterate the former procedure.
Notice that Finally, we obtain estimations for ξ and σ from ξ = −k and σ = kδ.

Heavy Tails ξ > 0
In this case, the likelihood function is Taking a type I Pareto (a 0 , b 0 ) as prior distribution for ξ and InvΓ(a 1 , b 1 ) for σ, Then, MH algorithm is applied, as in the previous case. Notice that

Baseline MH Method (BMH)
In this section, we will introduce Baseline Metropolis-Hastings (BMH) method, designed according to the objectives of seizing all the available data from the baseline distribution and making use of the existing relations between the baseline parameters and the limit GPD parameters. The method consists of:

1.
Applying the MH algorithm to estimate the parameters for the baseline distribution θ.

2.
Making use of the relations between the baseline parameters θ and the GPD parameters ξ and σ to compute estimations for ξ and σ.
In the case of stable distributions, these relations have been explained in previous sections and are given by (8). We will detail below the application of the BMH method for the three selected stable distributions.
For the rest of the baseline distributions, the strategy to find such relations would be made thorough studies of simulation, in order to establish correspondences between the baseline parameters and the tail GPD parameters. At the moment, there are no studies in the literature about this subject; therefore, it would be interesting to perform them in future research.
Then, in the case of stable distributions, the application of BMH would be: 1.
Apply the MH algorithm to estimate scale parameter a from the stable baseline distribution.

2.
Make use of the relation (8) to compute estimations for ξ and σ, using estimators for ξ Z and σ Z that we detail below.

Estimations for ξ Z and σ Z
In order to provide good estimations for ξ Z and σ Z , we made a thorough simulation study for the three baseline distributions we have considered: Lévy, Cauchy, and Normal distribution. We took values for p ∈ [0.990, 0.995] with increments of 0.001, and set the threshold u Z = q p . For each distribution, and for each value of the threshold, m = 1000 values from Z u Z were generated. This sequence was repeated 100 times. Therefore, we obtained 100 point estimations for each p.
To guarantee the convergence of the MCMC algorithm, we must be sure that the posterior distribution has been reached. These proceedings were made using library coda [38] for R software, taking 10,000 values for the burn-in period, 25 values for the thinning, and selecting initial values for each sample. Finally, to obtain the posterior distribution for each parameter, a Markov chain of length 10,000 was obtained and we considered the mean as the estimator. The results of the simulation study are shown in Figure 1.

Lévy and Cauchy Baseline Distribution
By Proposition 1 (10), we had estimations for ξ Z and σ Z . Concretely, these arê for the Lévy distribution and,ξ for the Cauchy distribution.

Normal Baseline Distribution
In the case of the normal distribution, property (10) is not verified. In this case, the adjustment from the simulation study is: Now, we will estimate the scale parameter a of the baseline distribution X. Notice that parameter b for the stable distribution X defined in (3) does not have any influence on the estimation of the parameters of the GPD, as we have shown before. Consequently, we will assume b = 0 from now on.

Lévy Distribution
Likelihood function for Lévy distribution is: Taking a prior distribution Γ(a 0 , b 0 ) for a, and making use of (8) and (11), we obtain estimationsξ andσ.

Normal Distribution
In this case, we consider InvΓ(a 0 , b 0 ) as prior distribution for a 2 , and making use of (8) and (13), we obtain estimationsξ andσ.

Informative Priors Baseline MH (IPBMH)
Finally, we propose an MH method to estimate the parameters ξ and σ, where highly informative priors are employed, seizing the estimations computed before. Employing estimations of ξ Z , σ Z and a obtained in previous sections, we can settle priors for ξ and σ, which are very informative. Notice that this way of proceeding keeps the original idea of Extreme Value Theory, granting more weight to tail values because they are employed twice: to compute estimations for ξ Z and σ Z and also through the likelihood function. As we commented before, this method could also be implemented for other baseline distributions once we have found relations between baseline and GPD parameters.
For stable distributions, highly informative priors are Then, for the three distributions studied, a is estimated through BMH and ξ Z , σ Z are estimations computed through (11)- (13). In addition, • b 1 is constant, being 0.03, 0.065 and 0.1 for Normal, Cauchy and Lévy baseline distributions, respectively.
where values are given in Table 1. The Joint posterior distribution is and marginal distributions are Then, we apply the MH algorithm with

Simulation Study
Now, we will develop a thorough simulation study in order to compare the accuracy of the three MH methods of estimation: MH, BMH and IPBMH.
We fixed p = 0.9 and the threshold u = q p . Furthermore, we take b = 0. We sample n = 2 i , with i = 5, . . . , 10 values from the three baseline distributions considered, with scale a = 2 j , j = −2, −1, 0, 1, 2. We obtained an MCMC with length 10,000, taking 10,000 values for the burn-in period, 25 values for the thinning and selecting initial values for each sample. Finally, this sequence was repeated 100 times and we considered the mean as the estimator.
In Figure 2, we can see the posterior distribution of the parameters ξ Z and σ Z for the different sample sizes n, when the baseline distribution is L(0, 1) (left), C(0, 1) (middle) and N (0, 1) (right), for the methods MH and BMH. For the first one, the distribution is right skewed, although skewness becomes smaller as n increases. BMH offers a point estimation, plotted as a vertical red line. In Figure 3, we can see the posterior distribution for σ Z , for both methods. MH (upper charts) shows much more skewness, such as in the case of ξ Z , while estimations from BMH (lower charts) are less skewed and dispersed. Now, we will compare mean absolute errors (MAE) for MH and BMH when a = 0.25, 1, 4, and for sample sizes n = 2 5 , 2 7 , 2 9 . In Figures 4-6, we can see how BMH provides smaller errors than MH. Clearly, BMH provides more accurate estimations for the parameters of the GPD when the baseline distribution is stable and known. However, in practical situations, we might not know which is the baseline distribution, or data could better fit a mixture of distributions rather than a simple one. In these situations, the use of highly informative priors, built with the information available from all the data, could be advisable. To develop this idea, we simulated from different mixtures and computed values of MAE for the three methods, finding that IPBMH is the method that shows the best behavior when data differ from the simple distributions.
In Figure 7, we can see MAE for the three methods, in the case of the mixtures employed (α = 0.5): αF(0, 1/2) + (1 − α)F(0, 2) (left charts), αF(0, 1) + (1 − α)F(1, 1) (middle charts) and αF(0, 1) + (1 − α)F(1, 2) (right charts), for the three baseline distributions considered (Lévy, Cauchy, Normal). In general, IPBMH is the most advisable method, especially for the case when data are scarce. Notice that when data approaches one of the pure stable distributions, for example, in the case of Cauchy mixtures (which are still quite similar to a simple Cauchy) and the second mixture for the Normal distribution, BMH and IPBMH show very similar results. However, when the mixtures differ significantly from the simple distribution, the method IPBMH and MH are more advisable than BMH.

An Application: PM 2.5 in Badajoz (Spain) during the Period 2011-2020
As we mentioned before, both baseline methods can be generalized for other baseline distributions by studying the relations between the parameters of the baseline distributions and the parameters of the limit GPD. We show an example, employing real data whose baseline distribution can be fitted by a Gamma distribution.
Data from measurements of the levels for diverse air pollutants in many municipalities in Spain are publicly available on the website https://www.miteco.gob.es/es/calidad-yevaluacion-ambiental/temas/atmosfera-y-calidad-del-aire/, accessed on 15 January 2022. Particulate matter is a mixture of solid particles and liquid droplets that can be inhaled and cause serious health problems. In particular, we have selected particulate matter less than 2.5 micrometers in diameter, known as PM 2.5, because it is considered to be especially dangerous to human health. In this context, studying the tail distribution above a certain threshold is essential because the World Health Organization recommends not to exceed an average daily value of 25 µg/m 3 and not to exceed an average annual value of 10 µg/m 3 . We studied levels of PM 2.5 measured in µg/m 3 for the last ten years available, 2011-2020, from Badajoz. There are n = 1066 observations, and, as can be seen in Figure 8, data can be fitted by a Gamma distribution. As in the previous simulations, p = 0.90 and the threshold u = q p , resulting u = 15 µg/m 3 . Making a simulation study, similar to the previous ones made for Normal, Lévy and Cauchy, we obtained the following relation between the parameters of the baseline distribution Γ(α, β) and the parameters of limit GPD(ξ, σ): Then, we randomly selected 50 data and applied the three methods MHM, BDM and IPBDM to fit the tail data. This proceeding was repeated many times, and we found three possible behaviors, as shown in Figure 9. In the first case (left chart), there is an example of the most usual situation: IPBDM offers intermediate estimations, between MHM and BDM. When there are scarce tail data (middle chart), MH differs significantly from the real density, while BDM and IPBDM provide better estimations. Finally, in the right chart, a common situation is shown in which IPBDM clearly offers the best estimations.

1.
In the parameter estimation of GPD, usual EVT methods waste a lot of information.
We have proposed two MH methods that make the most of all the information available from the data set. They are based on making use of the existing relations between baseline and GPD parameters, through informative priors.

2.
When considering the GPD coming from stable baseline distributions, we employed singular properties of stable distributions to simplify the problem (reducing to standard cases) and to provide estimators for the parameters of the GPD.

3.
We have achieved very accurate estimations for the parameters of the GPD when the baseline distribution is Cauchy, Normal or Lévy, making use of the properties of stable distributions and MH methods. 4.
We have studied the goodness of the estimations for classical MH method and BMH when the baseline distribution is standard Cauchy, Normal or Lévy. Clearly, BMH provides more accurate estimations than MH. 5.
In most real situations, data do not fit a simple distribution. We simulated some examples of mixtures of stable distributions and showed that IPBMH provides more accurate estimations than the other methods. 6.
These proposals could be generalized for other baseline distributions by studying the relations between the parameters of the baseline distributions and the parameters of the limit GPD. We provide an application with real data to illustrate this situation.