Extreme Value Index Estimation by Means of an Inequality Curve

A characterizing property of Zenga (1984) inequality curve is exploited in order to develop an estimator for the extreme value index of a distribution with regularly varying tail. The approach proposed here has a nice graphical interpretation which provides a powerful method for the analysis of the tail of a distribution. The properties of the proposed estimation strategy are analysed theoretically and by means of simulations. The usefulness of the method will be tested also on real data sets.


Introduction
A distribution function F with survivor functionF := 1 − F is regularly varying (RV) at infinity with index α, if there exists an α > 0 such that ∀x > 0 lim t→∞F (tx) F(t) = x −α ; (1) in this case we say thatF ∈ RV −α . In the extreme value (EV) literature it is typical to refer to the EV index γ > 0 with α = 1/γ. Informally, we will say that the distribution has a Pareto tail or that the distribution is of the power-law type. Note that the case 1 < α ≤ 2 (or 1/2 ≤ γ < 1) entails distributions with infinite variance and finite mean while the case α > 2 (or γ < 1/2) entails distributions with finite mean and variance. Precision in the analysis of the tail of a distribution allows to, for example, perform proper risk evaluation in finance, correcting empirical income distributions for various top-income measurement problems, or individuating a proper growth theory in economics or the biological sciences. For further examples of applications and deeper discussion see Clauset et al. [1], Jenkins [2] and Hlasny [3] with specific references to applications in income distributions and an overview of available models; see also Heyde and Kou [4] for a deep discussion of graphical methods for tail analysis.
The present paper will concentrate on estimation of the EV index γ. Probably the most well-known estimator of the EV index is the Hill [5] estimator, which exploits the k upper order statistics of a random sample through the formula where X (i) denotes the i-th order statistics from a sample of size n and k = k(n) ≤ n diverges to ∞ in an appropriate way. The Hill estimator has been thoroughly studied in the literature and several generalizations have been proposed. For a recent review of estimation procedures for the EV (or tail) index of a distribution see Gomes and Guillou [6]. Some recent approaches in tail or EV index estimation we would like to mention here are those of Brilhante et al. [7] which define a moment of order p estimator which reduces to the Hill estimator for p = 0 and Beran et al. [8] which define a harmonic moment tail index estimator. Recently Paulauskas and Vaičiulis [9] and Paulauskas and Vaičiulis [10] have connected in an interesting way some of the above approaches by defining parametric families of functions of the order statistics. Reduced bias (RB) versions of the above estimators have appeared in the literature, see for example Caeiro et al. [11], Gomes et al. [12] and Gomes et al. [13].
The main contribution of this paper consists in a new estimation procedure for EV index of a distribution satisfying (1) which relies on Zenga's inequality curve λ(p), p ∈ (0, 1) (Zenga [14]).
The curve λ has the property of being constant for the Pareto Type I distribution, it has an intuitive graphical interpretation, it does not depend on location and it shows a nice regular behaviour when estimated. These properties will be discussed, analysed and extended in order to define our inferential strategies. Also it is important to point out that an inequality curve is defined for positive observations and hence we will implicitly assume that the right tail of a distribution is analysed. This is not really a restriction since if one wishes to consider the left tail it is enough to change sign to the data. Also, if the distribution is over the real line, tails can be considered separately and, under the symmetry assumption, absolute values of the data could be considered. The approach to estimation proposed here, directly connected to the inequality curve λ, has a nice and effective graphical interpretation which greatly helps in the analysis. Other graph-based methods are to be found in Kratz and Resnick [15], which exploit properties of the QQ-plot, and Grahovac et al. [16] which discuss an approach based on the asymptotic properties of the partition function, a moment statistic generally employed in the analysis of multi-fractality; see also Jia et al. [17] which analyse graphically and analytically the real part of the characteristic function at the origin.
We would like to point out here that the λ curve discussed by Zenga [14] does not coincide with the Zenga [18] curve originally indicated by the author with I(p), p ∈ (0, 1) (more details in the next Section).
The paper is organized as follows: Section 2 introduces the curve λ and discusses its properties; Section 3 analyses the proposed estimation strategy and discusses some practical issues in applications. Finite sample performances are analysed in Sections 4 and 5 where applications with simulated and real data are considered. Proofs are postponed to the last Section.

The Proposed Estimator for the EV Index
Let X be a positive random variable with finite mean µ, distribution function F, and probability density f . The inequality curve λ(p) has been first defined in Zenga [14]; with original notation: where F −1 (p) = inf{x : F(x) ≥ p} is the generalized inverse of F and Q(x) = x 0 t f (t)dt/µ is the first incomplete moment. Q can be defined as a function of p via the Lorenz curve See further Zenga [19] Arcagni and Porro [20] for a general introduction and analysis of λ(p) for different distributions. It is worth to mention that the curve λ(p) should not be confused with the inequality curve defined in Zenga [18], originally indicated as , p ∈ (0, 1).
The curve I(p) has many nice properties and has been heavily studied in some recent literature; it is now commonly known as the Zenga curve Z(p). For the sake of completeness in Zenga [14] the notation Z(p) was originally used for another inequality curve based on quantiles, that is, where x p = F −1 (p) and x * p = Q −1 (p). As pointed out in Zenga [14] (without providing if and only if results) the curve λ is constant in p for type-I Pareto distributions, while the curve Z, as defined in Equation (6), is constant in p for Log-normal distributions. On the contrary, the curve I, as defined in (5), is not constant for any distribution, see Zenga [14] and Zenga [18] for further details. Turning back the attention to the curve λ, note that for a Pareto Type I distribution with under the condition that α > 1, the Lorenz curve has the form it follows that in this case λ(p) = γ, p ∈ (0, 1), that is λ(p) is constant in p. This is actually an if-and-only-if result, which we formalize in the following lemma (see Section 7 for its proof).
Lemma 1 could be exploited to derive a new approach to the estimation of the EV index γ = 1/α for the Pareto distribution. In order to define an estimator for the more general case whereF satisfies (1) it is worth to analyse in more detail what is the behaviour of the Lorenz curve under the framework defined by (1). This will be done by considering the truncated random variable Y = X|X > s with X ∼ F, F ∈ RV −1/γ . If G and g denote respectively the distribution function and the density of Y, note that G(y) = F(y)−F(s) F(s) and g(y) = f (y)/F(s). Furthermore, setting G(y) = p and inverting we have G −1 (p) = F −1 (F(s) + pF(s)). A formal result on the Lorenz curve for Y is given in the next lemma.

Lemma 2.
Consider the random variable X with distribution function F ∈ RV −1/γ and absolutely continuous density f ; define Y = X|X > s, s > 0, and let L Y (p) the Lorenz curve of Y. Then Remark 1. Lemma 2 implies that the curve λ(p), for the truncated random variable Y = X|X > s, with distribution satisfying (1), will be constant with value γ for all p ∈ (0, 1) if the truncation level s will be large enough. This fact can be exploited to derive a general estimator for the EV index for all distributions in the class (1) as long as γ < 1.
Before arriving at a formal definition of the estimator, some preliminary quantities need to be defined. Let X (1) , . . . , X (n) be the order statistics of a random sample of size n from a distribution satisfying (1). Let k = k(n) → ∞ and k(n)/n → 0 as n → ∞. Define the estimator of the conditional Lorenz curve asL After definingλ the proposed estimator of γ isγ Remark 2. The estimator defined in (12), based on a Lorenz curve computed on upper order statistics (defined by k), puts into practice the results of Lemma 1 and Lemma 2. Below we will discuss conditions under which (12) provides a consistent estimator of γ for the class of distributions satisfying (1). Guidance in the choice of k will be also discussed.
Letting I (A) denote the indicator function of the event A the above estimators are based on the non-parametric estimators Under the Glivenko-Cantelli theorem it holds that F n (x) → F(x) almost surely and uniformly in 0 < x < ∞; under the assumption that E(X) < ∞, it holds that Q n (x) → Q(x) almost surely and uniformly in 0 < x < ∞ (Goldie [21]). F n and Q n are both step functions with jumps at X (1) , . . . , X (n) . The jumps of F n are of size 1/n while the jumps of Q n are of size X (i) /T where T = ∑ n i=1 X (i) . Letting F −1 n (p) = inf{x : F n (x) ≥ p}, we note that since F −1 n n−k n = x (n−k) and that F −1 n F n (X (n−k) ) + pF n (X (n−k) ) = x (n−k+i) for i/k ≤ p < (i + 1)/k we have the representation Exploiting the above representation and the results of Goldie [21], uniform consistency ofL k (p) can be claimed. As far as uniform consistency ofλ k (p) we state the following lemma, which is proven in Section 7.
Following Lemma 2, graphical inspection of the tail of a distribution satisfying (1) can be carried out by analysing a graph with coordinates (p i ,λ i ), i = 1, . . . , n which will show a flat line with intercept around the value γ = 1/α. Apart from the case of the Pareto distribution, for distributions satisfying (1), to observe a constant line with intercept close to γ = 1/α it is necessary to truncate the sample, that is, using only the upper order statistics X (n−k+1) , . . . X (n) when estimating λ.
As an example, Figure 1 reports the empirical curveλ i as a function of p i for some cases of interest at different truncation thresholds. There appear two distributions with tail satisfying (1), namely Pareto as defined by (7) and Fréchet (more formally defined below), both with tail index α = 2. There appear also two distributions which do not satisfy (1), namely Log-normal with null location and standard deviation equal to 2 and Exponential with unit scale. Note that for Log-normal distribution the curve λ does not depend on location, while it does not depend on scale for the exponential distribution (Zenga [14]).
Inspection of the graphs reveals a remarkably regular behaviour of the curves; the Pareto case is constant (with some slight variations) for all level of truncation, while the Fréchet one becomes more and more constant with increasing levels of truncation. The Log-normal and Exponential cases show a slope in the curve at all levels of truncation.

Consistency
Exploiting some theoretical results given Section 7 (see the proof Lemma 2 for details), one can write where H U (s) is a slowly varying function, that is, it holds that lim s→∞ To analyze in more detail the second term on the r.h.s. of the above equation we assume a second order condition which is quite common in the EV theory (Caeiro et al. [11] and Gomes et al. [13]), that is, where n from which F(s) = k/n. Then, an asymptotic evaluation requires to evaluate the expression Note that the asymptotic behaviour of the sum in (18) is governed by p i → 0. Expanding in Taylor series the numerator and using log( For example if ρ = −1/2 then one can choose 0 < δ < 1/3 in order to have an asymptotically bias free estimator. There exists valid estimation methods for ρ and β implemented also in R (see Section 4 for details). (17)

Asymptotic Distribution
To provide an operational distributional result, we exploit a result of Csorgo and Mason [22].
Exponential with unit scale parameter (Exp(1)) random variables; then, for fixed k, Note that S j ∼ Γ(j, 1) and that if Y = S −γ j then Y has a Generalized Inverse Gamma (GIG) distribution with density Compare with Mead [23] setting λ = 0, α = j, β = γ, θ = 1. Using this general result, a simple and fast parametric bootstrap procedure can be implemented in order to obtain the full asymptotic distribution ofγ and from it estimates of the standard error and confidence intervals.

Remark 3.
Sinceγ k is consistent, the above procedure is consistent for the asymptotic distribution of the estimator (12).
Once the bootstrap distribution is available it can be used for variance and confidence intervals estimation. Simulations show that the approximation works quite well already for small sample sizes and for different k values. Clearly the precision depends on a good preliminary estimator of γ which is the only parameter needed in determining the distribution; this is however a typical feature of asymptotic results for estimators. Figure 2, considering different sample sizes (n = 100, 500, 1000, 2000) from a Fréchet (2)

Selecting k
The estimatorγ k , like many tail estimators, requires the choice of k, the number of upper order statistics to be used in estimation. Lemma 4 provides some indication on how to do so; estimation of the required parameters governing the second order conditions can be carried on quite straightforwardly (see discussion in the next section).
In order to arrive at a data-driven procedure to define the fraction of upper order statistics for estimation of the EV index, consider the linear equation From Lemmas 1 and 2, it follows that for a type-I Pareto distribution and for for a truncated random variable satisfying (1), as s → ∞ is large enough, distribution F satisfying (7) with γ = 1/α, in the above equation one has β 0 = γ and β 1 = 0. Considering a sample version of (23): given a random sample X 1 , . . . , X n , using the notation established in the previous Section, writê where ε i =λ i − λ i . Note that the proposed estimator (12) can be interpreted as the intercept estimate in model (24) exploiting the information that β 1 = 0. More formally, using ordinary least squares, define the estimatorŝ wherep is the mean of the p i 's and S 2 Following this reasoning, one can define a procedure based on the graph of (p i ,λ k,i ) for different levels of truncation: we observe the fraction of upper order statistics which gets the smallest, in absolute value, regression coefficientβ 1 from the regression (25). 3: Defineγ Opt as the estimateγ k obtained for the sub-sample which has the lowest value of |β 1 |.
In the next two Sections, the performance of the proposed estimation strategy is analysed on simulated and real data.

Numerical Comparisons
In this section we will evaluate the performance ofγ k with respect to some alternative estimators of the EV (or tail) index. As far as the estimator for γ is concerned, beyond considering the estimator γ Opt , the estimatorγ k with different levels of truncation of the data is considered. In the tables,γ 1−p indicates the estimatorγ k , with 1 − p indicating the fraction of upper order statistics used in estimation; the notationγ All indicates the case where all the sample data are used in estimation.
RB estimation of γ for the above mentioned alternative estimators is based on external estimation of additional parameters (ρ, β) (refer to Gomes et al. [26] and Gomes et al. [13] for further details). In our comparisons the following RB-versions are used: (1) RB-Hill estimator, outperforming H(k) (defined in (2)) for all k (2) RB-Moment estimator, denoted by MM in the tables, with and M (j) RB-Generalized Hill estimator,ḠH(k), denoted GH in the tables, with the same bias correction as in (27) applied to with UH(j) = X (n−j) H(k) 1 ≤ j ≤ k.
RB-MOP (moment of order p) estimator, for 0 < p < α (the case p = 0 reduces to the Hill estimator) defined byH with Denoted by MP p in the tables. In this case p is a tuning parameter which will be set, in our simulations, equal to 0.5 and 1. For an estimated optimal value of p based on a preliminary estimator of α see Gomes et al. [13].
Computations of the above estimators have been performed using the package evt0 (Manjunat and Caeiro [27]) in R. More precisely, GH(k) and M(k) are obtained using the function other.EVI() respectively with the options GH and MO. Estimation of the parameters (ρ, β) for the bias correction terms can be obtained from the function mop(). RB-Hill and RB-MOP estimates are directly obtained by the function mop() by appropriately specifying a value of p and the option RB-MOP. In order to optimize the choice of k we used the formula [13] k = min n − 1, ((1 − ϕ(ρ) −ρ) 2 n −2ρ /(−2ρβ 2 (1 − 2ϕ(ρ)))) 1/(1−2ρ) + 1 , where x is the integer part of x and ϕ(ρ) = 1 − (ρ + ρ 2 − 4ρ + 2)/2. For the comparisons, the following distributions are used: (1) Pareto distribution, as defined in (7). Random numbers from this distribution are simply generated in R using the function runif() and inversion of F.
. This distribution is simulated in R using the function rfrechet() from the package evd (Stephenson [28]) with shape parameter set equal to α.
Tables 1-8 contain the empirical RMSE (Root-MSE) and the relative RMSE, with respect toγ Opt , of the estimators, that is, for any of the evaluated estimators, sayγ, then Note that a Rel-RMSE greater than one implies a worse performance of the estimator with respect toγ Opt .Ê denotes the empirical expected value, that is, the mean over the Montecarlo experiments. For each sample size n = 50, 100, 200, 300, 500, and 1000; 1000 Montecarlo replicates were generated. Computations have been carried out with R version 3.5.1 and each experiment, that is, given a chosen distribution and a chosen n, has been initialized using set.seed (3). Numerical results representative for each distribution are reported in the tables. More tables with other choices of parameters can be found in the on-line Supplementary Materials accompanying this paper.       Trying to summarize the results we note the general good performance of the estimators based on the curve λ defined in this paper for which the gain in efficiency can be substantial. We note also the actual usefulness ofγ Opt for practical applications since it is able to individuate appropriate levels of truncation for different distributions although an actual knowledge of the optimal level of truncation would obtain higher efficiency.
Turning to the single cases, one can note that theγ Opt outperforms all the other estimators for the Pareto distribution where relative efficiency (see Table 2), is always greater than 4. For the case of the Pareto distribution,γ All would be the most efficient choice, as expected.
In the case of the Fréchet distributionγ Opt is always more efficient than all competitors test for smaller sample sizes (see Table 4); as sample size increases the gain in efficiency decreases and maybe slightly lower in some cases.
The performance ofγ Opt in the case of the Burr distribution is comparable to that of the competitors, with relative RMSE (see Table 6) slightly smaller or greater than one depending on the case considered.
In the case of the Symmetric stable distribution, the performance ofγ Opt is slightly better than all alternative estimators in all cases (see Table 8). The MM estimator turns out to be quite efficient for the stable distribution with α closer to 2 (see the on-line Supplementary Materials).
We note that the MM and GH estimators, computed with the package evt0, has shown some illogical results in some instances with extremely high values of the RMSE, typically for some specific sample sizes, after several checks, we could not figure out the reason of such results.

Examples
Here we concentrate on six real data examples that have been used in the literature to discuss methods to detect a power-law in the tail of the underlying distribution. These data have all been thoroughly analysed, for example, in Clauset et al. [1]. The following data sets are analysed here: 1.
The frequency of occurrence of unique words in the novel Moby Dick by Herman Melville (Newman [31]).

2.
The severity of terrorist attacks worldwide from February 1968 to June 2006, measured as the number of deaths directly resulting (Clauset et al. [32]).

3.
The sizes in acres of wildfires occurring on U.S. federal land between 1986 and 1996 (Newman [31]).

4.
The intensities of earthquakes occurring in California between 1910 and 1992, measured as the maximum amplitude of motion during the quake (Newman [31]). 5.
Peak gamma-ray intensity of solar flares between 1980 and 1989 (Newman [31]). On each of the data-set we apply Algorithm 2 in order to select the optimal number of k in computingγ Opt ; with the given estimate we apply Algorithm 1 in order to compute a 95% confidence interval for the estimate.
Next we apply a testing procedure to evaluate if the graphs in Figure 3, for the k chosen by Algorithm 1, can be considered "enough flat" in order to support the hypothesis that the data come from a distribution within the class (1). A bootstrap test setting H 0 : β 1 = 0 in model (24) has been developed in Taufer et al. [33].
For comparison we apply also the testing procedure for the power-law hypothesis developed by Clauset et al. [1]. Table 9 reports analytical results on estimated values, 95% confidence intervals, the fraction of upper order statistics used and the p-values of the testing procedures.
Trying to summarize briefly the results we would say that the conclusions about the presence of a Pareto-type tail in the distributions coincide fully with the conclusions of Clauset et al. [1], that is: clear evidence of a power law distribution fitting the data is for the Moby Dick and Terrorism data-sets.
For the others there is no convincing evidence. We point out that for the contrasting p-values for the Solar Flares data, Clauset et al. [1] suggest a power tail with an exponential cut-off at a certain point. Given the characteristics of the graphs based on the λ curve this feature cannot be noticed in our analysis.
As far as the estimated values of γ, the values of the estimators obtained here are substantially lower with respect to those obtained by Clauset et al. [1] (which uses the Hill estimator). Given the good performance in the simulations ofγ Opt in comparison to the Hill estimator, the values in Table 9, at least for the Moby Dick and Terrorism data-set can be considered reliable. For the other data-sets, since the null hypotheses of a power law has significant p-values, the estimated γ should be discarded and it becomes of interest to select an alternative model by using, for example a likelihood ratio test as discussed in Clauset et al. [1] to which the interested reader is referred.

Conclusions
An estimation strategy for the tail index of a distribution in class (1) has been defined starting from a characterizing property of Zenga's inequality curve λ. On the basis of the theoretical properties of the estimatorγ k two simple bootstrap procedures have been obtained: the first provides a general result for the asymptotic distribution ofγ k and the second gives a data-driven procedure to determine the optimal value of k. Simulations show the good performance ofγ k and the implementation algorithm.
The data-driven optimized estimator often outperforms optimized (with respect to bias) competing estimation strategies. The gain in efficiency is substantial in the case of Pareto distributions.
The graph of the λ curve associated with the estimator provides a valid support in the analysis of real data.
Proof of Lemma 2. Let G and g denote respectively the distribution function and the density of Y = X|X > s and note that G(y) = P(Y ≤ y) = F(y)−F(s) Proof of Lemma 3. Note that lim n→∞ sup p∈(0,1) |L n (p) − L(p)| = o P (1) and that p lies in a compact interval. λ n (p) is continuous transformation of L n (p); it follows that for fixed p ∈ (0, 1), lim n→∞ |λ n (p) − λ(p)| = o P (1).
To prove uniform consistency of λ n (p) we need to show it is equicontinuous. For this note that λ n (p) depends on p stochastically only through L n (p), which is uniformly continuous. Hence for any δ > 0 such that |p 1 − p 2 | < δ, it is possible to find an n 0 , not depending on p 1 , p 2 such that for n > n 0 , ε, η > 0, one has P(|λ n (p 1 ) − λ n (p 2 ))| > ε) < η.