A New Parameter Estimator for the Generalized Pareto Distribution under the Peaks over Threshold Framework

: Techniques used to analyze exceedances over a high threshold are in great demand for research in economics, environmental science, and other ﬁelds. The generalized Pareto distribution (GPD) has been widely used to ﬁt observations exceeding the tail threshold in the peaks over threshold (POT) framework. Parameter estimation and threshold selection are two critical issues for threshold-based GPD inference. In this work, we propose a new GPD-based estimation approach by combining the method of moments and likelihood moment techniques based on the least squares concept, in which the shape and scale parameters of the GPD can be simultaneously estimated. To analyze extreme data, the proposed approach estimates the parameters by minimizing the sum of squared deviations between the theoretical GPD function and its expectation. Additionally, we introduce a recently developed stopping rule to choose the suitable threshold above which the GPD asymptotically ﬁts the exceedances. Simulation studies show that the proposed approach performs better or similar to existing approaches, in terms of bias and the mean square error, in estimating the shape parameter. In addition, the performance of three threshold selection procedures is assessed by estimating the value-at-risk (VaR) of the GPD. Finally, we illustrate the utilization of the proposed method by analyzing air pollution data. In this analysis, we also provide a detailed guide regarding threshold selection.


Introduction
Currently, the appearance of rare events may have critical effects on economic and social development. Extreme event analysis is widely used in a variety of fields, such as economics, hydrology, engineering, and environmental studies. The most common application is modeling the risk of extreme events and estimating their probabilities. There are two fundamental methods in extreme value theory: the block maxima (BM) approach and the peaks-over-threshold (POT) approach [1]. Ferreira and de Haan (2015) [1] provided details of the differences between these two methods. Under the POT framework, the generalized Pareto distribution (GPD) is applied to fit the values exceeding a high threshold in many applications. Due to its importance in qualifying extreme tail risk, the statistical inference of the GPD has becoming a pressing and widely studied problem in various fields and in particular in actuarial statistics, in which it plays a prominent role in reinsurance. More details can be found in the excellent book by Rolski et al. (1999) [2]. For regulatory purposes, financial institutions are interested in estimating tail measures, such as the value at risk (VaR) or expectiles [3], using the extensive simulations and use a realistic example to evaluate the performance of the proposed GPD parameter estimator. This paper is organized as follows. In Section 2, we obtain two new GPD parameter estimators based on the POT framework. In Section 3, the results of the simulation studies are shown and the proposed methods are applied to model the daily PM2.5 concentration in Beijing using data from the China National Environment Monitoring Center. The conclusions are given in Section 4.
Denote the survival function of a continuous random variable X byF(x) = 1 − F(x), 0 < x < ∞. F(x) regularly varies for index−ξ < 0, or simply,F ∈ R −ξ if the following constraint is met.
Based on the famous Pickands-Balkema-de Haan theorem [39,40], forF ∈ R −ξ , the excess loss Y = X − u|X > u converges to the two-parameter GPD with ξ > 0 when a threshold u > 0 is sufficiently large. There is a function σ(u) such that [41] lim where x 0 ≤ ∞ is the right endpoint of F(x), F(x) is an arbitrary continuous distribution function when F ∈ R −ξ , and F u (y) is the distribution function of the excess loss Y. In the following, for convenience, we refer to σ(u) as σ.
By the conditional probability formula, the cdf of Y can be expressed as follows.
Thus, based on the relation given in (2), F(x) can be rewritten as follows.
Additionally, if X is a random variable distributed according to the GPD(ξ, σ), then the excess loss Y follows GPD(ξ, σ + ξu). This property of the GPD is important and reflects the fact that Y has the same shape parameter as X based on the "excess over threshold" operation, and only the scale parameters vary.

Existing Estimation Methods
In this subsection, we review some existing estimation algorithms related to our proposed method, and these approaches are later used in a simulation study.

Likelihood Moment Estimation
Let x 1 , · · · , x n be a sample from the GPD(ξ, σ). Zhang (2007) [20] proposed the likelihood moment (LM) method to solve the convergence problems of the ML estimation. The GPD log-likelihood function is as follows: is the solution of the following likelihood estimation equations: In addition, the rth moment of 1 + bX with 1 − rξ > 0 is given as so the moment estimation equation has the following expression: Substituting ξ given in (4) into (5), Zhang constructed the following likelihood moment estimation equation: where r * = rξ, and p = r * n/ ∑ n i=1 log(1 + bx i ) with r * < 1 being a constant to be given. Equation (6) has a unique solutionb, which is called the LM estimator of b. Then the shape and scale parameters ξ and σ can be estimated as follows:ξ log(1 +bx i ),σ =ξ/b.

Maximum Likelihood Estimation
Let x 1 , · · · , x n be a sample from the GPD(ξ, σ). Recently, Castillo and Serra (2015) [21] found a new and efficient estimation of the maximum likelihood estimation (MLE) for the scale and shape parameters of the GPD. Based on Equation (4), the shape parameter ξ can be regarded as a function of Consequently, the corresponding profile-likelihood function is given as follows.
The MLEk of k is obtained by maximizing the above profile-likelihood (8), so the shape and scale parameters can be estimated byξ = ξ(k) andσ =ξk.

Weighted Nonlinear Least Squares Estimation
Suppose we have a sample x 1 , · · · , x n of size n. Without loss of generality, let x 1 ≤ · · · ≤ x n . Based on the Pickands-Balkema-de Haan theorem [39,40], with a large threshold u, the distribution of excesses (x i − u|x i > u, i = n u + 1, · · · , n) can be approximated by the GPD under certain suitable conditions, where n u < n is the number of observations that are less than and equal to the chosen GPD threshold u. This condition means that the distribution is in the maximum domain of attraction [24]. Song and Song (2012) [24] pointed out most of the common continuous distributions that satisfy this condition.
Recently, Park and Kim (2016) [4] proposed the WNLS estimator, which consists of three steps. First, the following nonlinear objective function will be minimized with respect to (ξ, σ): where F n (x) is an empirical distribution function (EDF). The intention of the first step is to stabilize the shape parameter with an initial estimate. Second, using (ξ 1 ,σ 1 ) as the initial values, the following nonlinear objective function will be minimized with respect to (ξ, σ): In the third step, they observed that the estimator (ξ 2 ,σ 2 ) can perform better by drawing into the weight [Var(F(x i ))] −1 for F(x i ). Third, with (ξ 2 ,σ 2 ) as the initial values, the following nonlinear objective function will be minimized with respect to (ξ, σ): Then, the resultingξ 3 ,σ 3 are the final WNLS estimators of the GPD shape and scale parameters, respectively.

The Proposed Estimation Methods
In practice, the distribution type of a random variable is always unknown. Pickands (1975) [39] and Balkema and de Haan (1974) [40] showed that the tail distribution can be approximated by the GPD if the distribution is in the maximum domain of attraction [24]. Therefore, when the distribution of the sample is unknown but is in the maximum domain of attraction, we may apply the POT method to model the tail region of the dataset over an appropriate threshold separately using the GPD. This implies that we may choose an optimal threshold in advance, then model the tail region of the dataset over the threshold by the GPD under the POT framework. Using many threshold selection methods (for example [34,36]), the GPD parameters are estimated by the maximum likelihood method first. Then, once they are estimated, the threshold u is fixed by the threshold selection procedures, such as those in [34,36]. For the selected threshold u, we again estimate the parameters by our methods. The intention is to further improve the performance and obtain the final estimators of the GPD shape and scale parameters. The advantage of our estimators compared to the existing estimators will be illustrated in a later simulation.
Suppose we have a sample x 1 , · · · , x n of size n. Without loss of generality, let x 1 ≤ · · · ≤ x n . For the chosen threshold u, the distribution of excesses (x i − u|x i > u, i = n u + 1, · · · , n) can be approximated by the GPD with ξ > 0. Here, we introduce two new estimation approaches for the GPD by minimizing the loss function as follows. That is The goal of this method is to seek parameter estimation that minimizes the residual sum of squares between the theoretical distribution function and its expectation. One of the contributions of this paper is that we combine the method of moments and likelihood moment techniques based on the least squares concept under the POT framework. By these two classical estimation theories, we propose two nonlinear methods for determining the estimators of the GPD.

Weighted Nonlinear Least Squares Moments Estimation
Following Zhang (2007) [20], we [4], based on the MOM concept, we find the interim estimate (ξ 1 ,b 1 ) by minimizing the squared distance between E[− log(1 − F(x))] and [− log(1 − F(x))] as follows: where F(x) is given in (3), F 0 ≡ F n (u) = n u /n, and the expected value of the standard exponential order statistic is [29]. Furthermore, in order to improve the performance of the interim estimate (ξ 1 ,b 1 ), we revise the first step optimization (10) to minimize the squared distance between E[F(x)] and F(x): Note that this revised step is equivalent to using the least squares regression given by where ε i is random variable with E(ε i ) = 0. Therefore, combining with the weighted least squares regression, we choose [VarF(x i )] −1 as the weight for each response variable F(x i ). It is known that the distribution of F(x) follows a uniform distribution in (0,1). For the order statistics By the numerical characteristics of the Beta distribution, the weight for 2 given in [4], with (ξ 1 ,b 1 ) as the initial values, we extend the proposed estimator to the weighted version by minimizing the following equation: We will name (ξ 2 ,b 2 ) as the weighted nonlinear least squares moments (WNLSM) estimator under the POT framework. One advantage of the WNLSM over the unweighted version is that the weighted estimator is more stable because with increasing i when i ≥ n/2, the weight ω i becomes increasingly larger. This relation implies that larger observations play a greater role in parameter estimation than smaller values.

Weighted Nonlinear Least Squares Likelihood Moments Estimation
Based on the likelihood function, the shape parameter ξ can be written as the function ξ(b) of b given in (7), as follows.
By replacing ξ in our proposed two-step optimizations (10) and (11) with ξ(b) given in (12), these two minimizations become one-dimensional equations for b as follows: In the following second step, we also add the weights ω i to further improve results from the first optimization step (13). With b * 1 as the initial value, our proposed estimator for b is We will name b * 2 as the likelihood WNLSM (WNLLSM for short) estimator under the POT framework, and ξ can be estimated by ξ * = (n − n u ) −1 ∑ n i=n u +1 log(1 + b * 2 (x i − u)). Compared to (10) and (11), steps (13) and (14) are advantageous in that the term involving ξ has been eliminated. Thus, b can be separately estimated, independent of ξ, making this method more efficient.

Simulation Studies
In Section 3.1, for a GPD sample, we estimate the GPD parameters using alternative estimators and demonstrate that the proposed WNLLSM estimator performs best for the shape parameter in most cases. In Section 3.2, we estimate the VaR at quantile levels from 95% to 99% to study the performance of three threshold selection methods under various parametric mixture distributions. Last, we used daily Beijing PM2.5 (particulate matter with a diameter less than 2.5 µm) data to illustrate the utility of the proposed methods in detail and identify important knowledge for air pollution.

Simulation Study of the Parameter Estimation
In this section, extensive simulation studies are conducted to investigate the estimation of ξ and b. In the remainder of this article, we denote the proposed methods as WNLSM and WNLLSM. The methods presented for comparison are denoted as WNLS [4], L-moments [14], GPWME [18], Zhang [20], and MLE [21]. The R software code for the proposed WNLSM and WNLLSM methods can be seen in Appendix A.
All evaluations are based on the empirical bias and mean square error (MSE) estimated from 100,000 simulations. The simulations are conducted as follows.
The full simulation results for 3 different sample sizes are given in Table 1. The smallest values of MSE and bias are highlighted in bold format. Specifically, for the shape parameter ξ, in terms of both MSE and bias, the proposed WNLLSM and WNLSM yielded the best performance. The Zhang, MLE, and GPWME are about average. For b, Zhang and WNLS performed better in most cases. As shown in Table 1, all methods displayed acceptable performance when the shape parameter ξ was less than 1. However, the L-moments method was the worst when the shape parameter ξ ≥ 1. Among the two parameters, ξ determines the tail thickness of the GPD because 1/ξ is the tail index [4]. Parameter ξ characterizes the shape of the GPD, especially the shape of the tail, which is significant for fitting the extreme values. Table 1 only shows the performance of estimating ξ and b when the shape parameter ξ takes a certain value. To comprehensively evaluate the performance of different estimation methods, another simulation was conducted to evaluate the absolute bias (Abias) and MSE with respect to the shape parameter in a given range. In this simulation study, we vary ξ from 0 to 5. Figures 1 and 2 display the results for µ = 0, σ = 1, and sample size n = 100 based on 1000 repetitions. Figure 1 shows that for ξ, the WNLLSM method is the best estimator, followed by WNLSM, Zhang, and WNLS. In Figure 2, for b, Zhang and WNLS are the best estimators, followed by WNLLSM and WNLSM.

Simulation Study of the Threshold Selection
Suppose that we have a sample x 1 , · · · , x n of size n. Without loss of generality, let x 1 ≤ · · · ≤ x n . If we know the sample follows the GPD, it is unnecessary to search for the GPD threshold. While if the distribution of the sample is unknown but is in the maximum domain of attraction, we may apply the POT method to model the exceedances by the GPD. First, we can obtain the maximum likelihood estimations for the parameters of the GPD. Then, once they are estimated, the threshold can be fixed by many methods. Among them, we propose three existing procedures to find a proper threshold, as outlined below.
Method 1. ForwardStop [34]: (a) Fix u i = x i , where i ranges from 1 to m (m < n), and m is a fixed positive integer. (b) Test H i 0 proposed by [36], and compute the corresponding p-values p i . (c) Compute cutoffk such that where α is a significance level. In this case, the selected threshold is uˆk.

Method 2. Raw Up [34]:
Begin with the order statistics x i where i = 1, and continue with i = i + 1 until the threshold u * i = x i accepts that the data above the u * i follows the GPD. Method 3. Raw Down [34]: Begin with order statistics x i where i = n, and continue with i = i − 1 until the threshold u i = x i rejects that the data above the u i follows the GPD. Now we turn to studying the performance of threshold selection by estimating the VaR of the GPD, denoted by for a given threshold u 0 , where F 0 is the ratio of the observations less than and equal to u 0 . VaR p presents the 100p quantile to qualify tail risk in several fields, with p close to 1. We compare the performance of the ForwardStop method with two competing methods, Raw Up and Raw Down [34], in terms of tail risk measures VaR. Consider a sequence of order candidate thresholds. Raw Up begins at the first threshold and chooses the lowest threshold as the optimal threshold until the GPD fits the exceedances over this threshold. In contrast, Raw Down starts from the last threshold and select the biggest one until the GPD assumption is rejected.
The simulation samples are generated from a mixture distribution of the Weibull and GPD. A similar construction to that in [34], the mixture distribution can be obtained by defining the following hazard function where ε > 0 is the length of the transition period, h 1 (x) = κβ −κ x κ−1 is the hazard function of the Weibull distribution with parameters (κ, β), and h 2 (x) = (σ + ξ(x − u)) −1 , x ≥ u is the hazard function of the GPD(u, σ, ξ). The transition function defined as in [34] is This ensures that h(x) becomes a continuous hazard function. Therefore, the mixture distribution function F on [0, ∞) of the random samples generating mechanism is For x ≤ u, the mixture distribution is the Weibull distribution. Denote F 1 by the probability of data being less than and equal to u, so F 1 = F(u) and 1 − F 1 means the probability of the data being greater than u. Consequently, Then u can be expressed as For x ≥ u + ε, the mixture distribution is the GPD. Based on the construction mechanism, the GPD tail starts from u + ε. Therefore, u + ε is the real threshold for the GPD, but not u. 1 − F 0 implies the probability of exceedances over the real threshold. We can find the relation between F 0 and F 1 given as By substituting 1 − F 1 into (16), we can find the restriction between the threshold u, the transition interval length ε, and the Weibull and GPD parameters: For each sample, we estimate VaR 95%, 98%, and 99% quantiles of the mixture distribution. The optimal threshold can be selected by applying the R package "eva" [42]. The simulation results can be seen in Table 2. In term of the MSE and bias, overall we see in Table 2 that ForwardStop forms a more stable and less sensitive rule under all conditions. In addition, ForwardStop and Raw Down procedures perform better in most cases.

Application
In this section, we illustrate the utility of the proposed method by analyzing environmental data. Beijing's daily PM2.5 records were collected from the China National Environment Monitoring Center. The data contained 361 observations from 1 September 2016 to 28 February 2017 and 1 September 2017 to 28 February 2018. We only considered the data from autumn and winter because air pollution is more serious in these seasons in Beijing. The statistical analysis started with threshold selection, followed by obtaining the exceedances, and finally, parameter estimation. The details are listed below.
1. Use Raw Up, Raw Down, and ForwardStop rules to select the appropriate thresholds u * , u i , and uˆk, respectively. 2. Obtain exceedances above the chosen threshold. 3. Based on the exceedances, obtain parameter estimations of ξ and σ by different methods. Table 3 displays the results of threshold selection and parameter and VaR estimations for the real example. In estimation, we can find F 0 = 11.4% by Raw Up and ForwardStop rules and F 0 = 93.9% by Raw Down. The predicted results can serve as references for environmental agencies and Beijing residents. Combining the strengths of the three different threshold selection approaches, the potential thresholds are 10 and 188. For u = 10 and 188, Figure 3 displays the EDF and the estimated theoretical distribution function with corresponding parameter estimators, which are shown in Table 3. Comparison plots of the EDF and theoretical distribution function are displayed in Figure 3a for u = 10 and Figure 3b for u = 188, which obtained very similar results. A careful comparison of these two figures reveals that the threshold u = 10 fits the data better for the GPD, especially the extreme tail, which is an important index in POT analysis.

Conclusions
We propose two GPD-based modeling approaches for exceedances under the POT framework. Our new methods are adapted from the recent WNLS estimator of Park and Kim (2016), which is based on the nonlinear weighted optimization of the residual squares between the distribution function and the EDF. Combining the concepts of the weighted nonlinear least squares method, the MOM, and the likelihood moments technique, we introduce the WNLSM and WNLLSM methods, which minimize the sum of squared deviations between the theoretical distribution function and its expectation. Our proposed methods are stable and provide analytical solutions. The results of extensive simulation studies demonstrate that the performance of the new estimators is highly competitive with other methods in estimating the shape parameter ξ. We also investigate the performance of the threshold selection methods ForwardStop rule, Raw Up, and Raw Down. The simulation results show that there is not always a winner in all conditions. ForwardStop and Raw Down perform relatively better than the Raw Up procedure in most cases. However, ForwardStop is more stable to other two competing threshold selection tests. In practice, our estimations and the existing threshold selection methods are applied to daily PM2.5 data from Beijing, and the new methods produce satisfactory results in the environmental data analysis. In summary, our estimation methods are specifically designed for extreme data under the POT framework and are computationally efficient and straightforward.

Funding:
The authors gratefully acknowledge the support of National Natural Science Foundation of China through grant No. 11801019.