1. Introduction
A distribution function $F$, with survivor function $\overline{F}:=1-F$, is regularly varying (RV) at infinity with index $\alpha$ if there exists an $\alpha>0$ such that, for all $x>0$,
$$\lim_{t\to\infty}\frac{\overline{F}(tx)}{\overline{F}(t)}=x^{-\alpha};\qquad (1)$$
in this case we write $\overline{F}\in RV_{-\alpha}$. In the extreme value (EV) literature it is customary to refer instead to the EV index $\gamma>0$, with $\alpha=1/\gamma$. Informally, we will say that such a distribution has a Pareto tail, or that it is of the power-law type. Note that the case $1<\alpha\le 2$ (or $1/2\le\gamma<1$) entails distributions with finite mean and infinite variance, while the case $\alpha>2$ (or $\gamma<1/2$) entails distributions with finite mean and variance.
Accurate analysis of the tail of a distribution makes it possible, for example, to perform proper risk evaluation in finance, to correct empirical income distributions for various top-income measurement problems, or to identify an appropriate growth theory in economics or the biological sciences. For further examples of applications and a deeper discussion, see Clauset et al. [1], Jenkins [2] and Hlasny [3], with specific reference to applications to income distributions and an overview of available models; see also Heyde and Kou [4] for a thorough discussion of graphical methods for tail analysis.
The present paper concentrates on the estimation of the EV index $\gamma$. Probably the best-known estimator of the EV index is the Hill [5] estimator, which exploits the $k$ upper order statistics of a random sample through the formula
$$H(k)=\frac{1}{k}\sum_{i=1}^{k}\left(\ln X_{(n-i+1)}-\ln X_{(n-k)}\right),\qquad (2)$$
where $X_{(i)}$ denotes the $i$-th order statistic from a sample of size $n$ and $k=k(n)<n$ diverges to $\infty$ in an appropriate way (typically with $k/n\to 0$). The Hill estimator has been thoroughly studied in the literature and several generalizations have been proposed. For a recent review of estimation procedures for the EV (or tail) index of a distribution, see Gomes and Guillou [6].
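As an illustration, formula (2) takes only a few lines of Python. This is a minimal sketch rather than the authors' code: the function name `hill` is ours, and we assume positive data without ties at the threshold $X_{(n-k)}$.

```python
import math


def hill(x, k):
    """Hill estimator H(k) of formula (2), using the k largest observations."""
    xs = sorted(x)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    log_threshold = math.log(xs[n - k - 1])  # ln X_(n-k), the (k+1)-th largest value
    # Average log-excess of X_(n-i+1), i = 1..k, over the threshold X_(n-k)
    return sum(math.log(v) - log_threshold for v in xs[n - k:]) / k


if __name__ == "__main__":
    # Exact Pareto quantiles with alpha = 2 (gamma = 0.5) as a deterministic "sample"
    n = 10000
    sample = [(1 - (i - 0.5) / n) ** -0.5 for i in range(1, n + 1)]
    print(hill(sample, 1000))  # about 0.5
```

With quantile data from a Pareto law the estimate is very close to $\gamma=0.5$, as expected, since the tail is exactly of the form (1) above any threshold.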
Among recent approaches to tail or EV index estimation, we would like to mention those of Brilhante et al. [7], who define a moment of order $p$ estimator that reduces to the Hill estimator for $p=0$, and of Beran et al. [8], who define a harmonic moment tail index estimator. Paulauskas and Vaičiulis [9,10] have recently connected some of these approaches in an interesting way by defining parametric families of functions of the order statistics. Reduced bias (RB) versions of the above estimators have also appeared in the literature; see, for example, Caeiro et al. [11] and Gomes et al. [12,13].
The main contribution of this paper is a new procedure for estimating the EV index of a distribution satisfying (1), which relies on Zenga's inequality curve $\lambda(p)$, $p\in(0,1)$ (Zenga [14]).
The curve $\lambda$ is constant for the Pareto Type I distribution, has an intuitive graphical interpretation, does not depend on scale and behaves quite regularly when estimated. These properties will be discussed, analysed and extended in order to define our inferential strategies. It is also important to point out that an inequality curve is defined for positive observations, and hence we implicitly assume that the right tail of a distribution is analysed. This is not really a restriction: to analyse the left tail it suffices to change the sign of the data. If the distribution is supported on the whole real line, the two tails can be considered separately and, under symmetry, the absolute values of the data can be used. The estimation approach proposed here, being directly connected to the inequality curve $\lambda$, has an effective graphical interpretation that greatly helps in the analysis. Other graph-based methods are found in Kratz and Resnick [15], who exploit properties of the QQ-plot, and Grahovac et al. [16], who discuss an approach based on the asymptotic properties of the partition function, a moment statistic generally employed in the analysis of multifractality; see also Jia et al. [17], who analyse graphically and analytically the real part of the characteristic function at the origin.
We point out that the $\lambda$ curve discussed by Zenga [14] does not coincide with the Zenga [18] curve, originally denoted by the author as $I(p)$, $p\in(0,1)$ (more details in the next section).
The paper is organized as follows: Section 2 introduces the curve $\lambda$ and discusses its properties; Section 3 analyses the proposed estimation strategy and discusses some practical issues in applications. Finite-sample performance is analysed in Section 4, with simulated data, while applications to real data are considered in Section 5. Proofs are collected in the last section.
2. The Proposed Estimator for the EV Index
Let $X$ be a positive random variable with finite mean $\mu$, distribution function $F$ and probability density $f$. The inequality curve $\lambda(p)$ was first defined in Zenga [14]; in the original notation,
$$\lambda(p)=\frac{\ln(1-p)-\ln\left[1-Q\left(F^{-1}(p)\right)\right]}{\ln(1-p)},\qquad p\in(0,1),\qquad (3)$$
where $F^{-1}(p)=\inf\{x:F(x)\ge p\}$ is the generalized inverse of $F$ and $Q(x)=\int_0^x tf(t)\,dt/\mu$ is the first incomplete moment. $Q$ can be expressed as a function of $p$ via the Lorenz curve
$$L(p)=Q\left(F^{-1}(p)\right)=\frac{1}{\mu}\int_0^p F^{-1}(t)\,dt,\qquad (4)$$
so that $\lambda(p)=1-\ln[1-L(p)]/\ln(1-p)$.
See also Zenga [19] and Arcagni and Porro [20] for a general introduction to $\lambda(p)$ and an analysis of its behaviour for different distributions. It is worth mentioning that the curve $\lambda(p)$ should not be confused with the inequality curve defined in Zenga [18], originally denoted as
$$I(p)=1-\frac{L(p)/p}{[1-L(p)]/(1-p)}=\frac{p-L(p)}{p\,[1-L(p)]},\qquad p\in(0,1).\qquad (5)$$
The curve $I(p)$ has many nice properties and has been extensively studied in recent literature; it is now commonly known as the Zenga curve $Z(p)$. For the sake of completeness, in Zenga [14] the notation $Z(p)$ was originally used for another inequality curve, based on quantiles, that is,
$$Z(p)=\frac{x_p^{*}-x_p}{x_p^{*}},\qquad p\in(0,1),\qquad (6)$$
where $x_p=F^{-1}(p)$ and $x_p^{*}=Q^{-1}(p)$. As pointed out in Zenga [14] (without providing if-and-only-if results), the curve $\lambda$ is constant in $p$ for type-I Pareto distributions, while the curve $Z$, as defined in Equation (6), is constant in $p$ for Log-normal distributions. By contrast, the curve $I$, as defined in (5), is not constant for any distribution; see Zenga [14] and Zenga [18] for further details.
Turning back to the curve $\lambda$, note that for a Pareto Type I distribution with
$$F(x)=1-\left(\frac{x}{x_0}\right)^{-\alpha},\qquad x\ge x_0>0,\qquad (7)$$
under the condition that $\alpha>1$ the Lorenz curve has the form
$$L(p)=1-(1-p)^{1-1/\alpha}=1-(1-p)^{1-\gamma}.\qquad (8)$$
Since $\ln[1-L(p)]=(1-\gamma)\ln(1-p)$, it follows that in this case $\lambda(p)=\gamma$, $p\in(0,1)$; that is, $\lambda(p)$ is constant in $p$. This is actually an if-and-only-if result, which we formalize in the following lemma (see Section 7 for its proof).
Lemma 1. The curve $\lambda(p)$ defined in (3) is constant in $p$, $p\in(0,1)$, and equals $\gamma=1/\alpha$ if, and only if, $F$ satisfies (7) with $\alpha>1$ or, equivalently, $\gamma<1$.
Lemma 1 could be exploited to derive a new approach to the estimation of the EV index $\gamma=1/\alpha$ for the Pareto distribution. In order to define an estimator for the more general case where $\overline{F}$ satisfies (1), it is worth analysing in more detail the behaviour of the Lorenz curve in the framework defined by (1). This will be done by considering the truncated random variable $Y=X|X>s$, with $X\sim F$ and $\overline{F}\in RV_{-1/\gamma}$. If $G$ and $g$ denote, respectively, the distribution function and the density of $Y$, note that, for $y>s$, $G(y)=\frac{F(y)-F(s)}{\overline{F}(s)}$ and $g(y)=f(y)/\overline{F}(s)$. Furthermore, setting $G(y)=p$ and inverting, we have $G^{-1}(p)=F^{-1}\left(F(s)+p\overline{F}(s)\right)$. A formal result on the Lorenz curve of $Y$ is given in the next lemma.
Lemma 2. Consider the random variable $X$ with absolutely continuous distribution function $F$, with $\overline{F}\in RV_{-1/\gamma}$, and density $f$; define $Y=X|X>s$, $s>0$, and let $L_Y(p)$ be the Lorenz curve of $Y$. Then
$$\lim_{s\to\infty}L_Y(p)=1-(1-p)^{1-\gamma},\qquad p\in(0,1).$$
Remark 1. Lemma 2 implies that the curve $\lambda(p)$ of the truncated random variable $Y=X|X>s$, with distribution satisfying (1), is approximately constant, with value $\gamma$, for all $p\in(0,1)$ when the truncation level $s$ is large enough. This fact can be exploited to derive a general estimator of the EV index for all distributions in the class (1), as long as $\gamma<1$.
Before arriving at a formal definition of the estimator, some preliminary quantities need to be defined. Let $X_{(1)},\dots,X_{(n)}$ be the order statistics of a random sample of size $n$ from a distribution satisfying (1). Let $k=k(n)\to\infty$ and $k(n)/n\to 0$ as $n\to\infty$. Define the estimator of the conditional Lorenz curve as
After defining
the proposed estimator of
$\gamma $ is
Remark 2. The estimator defined in (12), based on a Lorenz curve computed on the upper order statistics (selected by $k$), puts into practice the results of Lemma 1 and Lemma 2. Below we discuss conditions under which (12) provides a consistent estimator of $\gamma$ for the class of distributions satisfying (1); guidance on the choice of $k$ will also be given.
Letting $\mathbb{I}_{(A)}$ denote the indicator function of the event $A$, the above estimators are based on the non-parametric estimators
$$F_n(x)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}_{(X_i\le x)},\qquad Q_n(x)=\frac{\sum_{i=1}^{n}X_i\,\mathbb{I}_{(X_i\le x)}}{\sum_{i=1}^{n}X_i}.$$
By the Glivenko-Cantelli theorem, $F_n(x)\to F(x)$ almost surely and uniformly in $0<x<\infty$; under the assumption that $E(X)<\infty$, $Q_n(x)\to Q(x)$ almost surely and uniformly in $0<x<\infty$ (Goldie [21]).
${F}_{n}$ and
${Q}_{n}$ are both step functions with jumps at
${X}_{\left(1\right)},\cdots ,{X}_{\left(n\right)}$. The jumps of
${F}_{n}$ are of size
$1/n$ while the jumps of
${Q}_{n}$ are of size
${X}_{\left(i\right)}/T$ where
$T={\sum}_{i=1}^{n}{X}_{\left(i\right)}$.
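To make these quantities concrete, the sketch below computes $F_n$, $Q_n$ and the Lorenz curve of the $k$ largest observations at $p_i=i/k$, and from it a $\widehat{\lambda}$ curve. It is an illustration under stated assumptions, not the paper's definitions of (10) to (12): we assume $\widehat{\lambda}_k(p_i)=1-\ln[1-\widehat{L}_k(p_i)]/\ln(1-p_i)$, evaluated for $i=1,\dots,k-1$ to avoid $\ln(1-p)=0$, and `gamma_hat_mean` is a hypothetical aggregation that simply averages these values. All function names are ours.

```python
import math
from bisect import bisect_right


def ecdf(xs_sorted, x):
    """F_n(x): fraction of observations <= x (xs_sorted must be sorted)."""
    return bisect_right(xs_sorted, x) / len(xs_sorted)


def incomplete_moment(xs_sorted, x):
    """Q_n(x): share of the sample total T carried by observations <= x."""
    m = bisect_right(xs_sorted, x)
    return sum(xs_sorted[:m]) / sum(xs_sorted)


def conditional_lorenz(xs_sorted, k):
    """Lorenz curve of X_(n-k+1), ..., X_(n) at p_i = i/k, i = 1..k."""
    top = xs_sorted[-k:]
    total = sum(top)
    curve, partial = [], 0.0
    for v in top:
        partial += v
        curve.append(partial / total)
    return curve


def lambda_hat(x, k):
    """Assumed form: lambda_i = 1 - ln(1 - L_k(p_i)) / ln(1 - p_i), i = 1..k-1."""
    xs = sorted(x)
    if not 2 <= k < len(xs):
        raise ValueError("k must satisfy 2 <= k < n")
    lor = conditional_lorenz(xs, k)
    return [(i / k, 1.0 - math.log(1.0 - lor[i - 1]) / math.log(1.0 - i / k))
            for i in range(1, k)]


def gamma_hat_mean(x, k):
    """Hypothetical aggregation: average of lambda_hat over the p_i."""
    pts = lambda_hat(x, k)
    return sum(lam for _, lam in pts) / len(pts)
```

Without ties, the conditional Lorenz curve coincides with the normalized increment $[Q_n(X_{(n-k+i)})-Q_n(X_{(n-k)})]/[1-Q_n(X_{(n-k)})]$, which links it back to $Q_n$. Multiplying the data by a constant leaves the result unchanged, reflecting the scale invariance of the Lorenz curve.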
Letting $F_n^{-1}(p)=\inf\{x:F_n(x)\ge p\}$, we note that (in the absence of ties) $F_n^{-1}\left(\frac{n-k}{n}\right)=X_{(n-k)}$ and that $F_n^{-1}\left(F_n(X_{(n-k)})+p\,\overline{F}_n(X_{(n-k)})\right)=X_{(n-k+\lceil kp\rceil)}$ for $0<p\le 1$; hence, for $i/k\le p<(i+1)/k$, we have the representation
Exploiting the above representation and the results of Goldie [21], uniform consistency of $\widehat{L}_k(p)$ can be established. As for the uniform consistency of $\widehat{\lambda}_k(p)$, we state the following lemma, which is proven in Section 7.
Lemma 3. Let $X_1,\dots,X_n$ be i.i.d. from a distribution $F$ with $E(X)<\infty$. Then
Following Lemma 2, graphical inspection of the tail of a distribution satisfying (1) can be carried out by analysing a graph with coordinates $(p_i,\widehat{\lambda}_i)$, $i=1,\dots,n$, which should show a flat line with intercept close to $\gamma=1/\alpha$. Apart from the case of the Pareto distribution, for distributions satisfying (1) a constant line with intercept close to $\gamma=1/\alpha$ can be observed only after truncating the sample, that is, using only the upper order statistics $X_{(n-k+1)},\dots,X_{(n)}$ when estimating $\lambda$.
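The visual check described above can be complemented by a crude numerical summary: the least-squares slope of $\widehat{\lambda}_i$ on $p_i$, which should be near zero when the retained upper tail is Pareto-like. This is a sketch under assumptions of ours: $\widehat{\lambda}_i=1-\ln[1-\widehat{L}_k(p_i)]/\ln(1-p_i)$ on the $k$ largest values, a plain OLS slope as the flatness measure (in the spirit of the flatness test of Section 5, not its implementation), and deterministic quantile "samples" so that the behaviour is reproducible. All names are hypothetical.

```python
import math


def lambda_points(x, k):
    """(p_i, lambda_i), i = 1..k-1, computed on the k largest observations.
    Assumed form: lambda_i = 1 - ln(1 - L_k(p_i)) / ln(1 - p_i)."""
    if not 3 <= k <= len(x):
        raise ValueError("k must satisfy 3 <= k <= n")
    top = sorted(x)[-k:]
    total = sum(top)
    pts, partial = [], 0.0
    for i in range(1, k):
        partial += top[i - 1]
        p = i / k
        pts.append((p, 1.0 - math.log(1.0 - partial / total) / math.log(1.0 - p)))
    return pts


def ols_slope(pts):
    """Least-squares slope of lambda_i on p_i; close to 0 for a flat curve."""
    m = len(pts)
    p_bar = sum(p for p, _ in pts) / m
    l_bar = sum(lam for _, lam in pts) / m
    sxy = sum((p - p_bar) * (lam - l_bar) for p, lam in pts)
    sxx = sum((p - p_bar) ** 2 for p, _ in pts)
    return sxy / sxx


def quantile_sample(quantile, n):
    """Deterministic stand-in for a sample: quantile((i - 0.5) / n), i = 1..n."""
    return [quantile((i - 0.5) / n) for i in range(1, n + 1)]


def pareto2(u):        # Pareto with alpha = 2, x0 = 1 (gamma = 0.5)
    return (1.0 - u) ** -0.5


def frechet2(u):       # Frechet with alpha = 2 (gamma = 0.5)
    return (-math.log(u)) ** -0.5


def exponential(u):    # unit Exponential, not regularly varying
    return -math.log(1.0 - u)


if __name__ == "__main__":
    n = 10000
    for name, q in [("Pareto", pareto2), ("Frechet", frechet2), ("Exponential", exponential)]:
        x = quantile_sample(q, n)
        slopes = [round(ols_slope(lambda_points(x, k)), 3) for k in (n - 1, n // 20)]
        print(name, "slope with all data / top 5%:", slopes)
```

In this setting the Pareto slope stays near zero for all data, the Fréchet slope shrinks markedly when only the top 5% is kept, and the Exponential slope stays clearly negative.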
As an example, Figure 1 reports the empirical curve $\widehat{\lambda}_i$ as a function of $p_i$ for some cases of interest at different truncation thresholds. Two of the distributions shown have tails satisfying (1), namely the Pareto, as defined by (7), and the Fréchet (formally defined below), both with tail index $\alpha=2$. The other two do not satisfy (1), namely the Log-normal with zero location and standard deviation equal to 2, and the Exponential with unit scale. Note that for the Log-normal distribution the curve $\lambda$ does not depend on the location parameter, while for the Exponential distribution it does not depend on the scale (Zenga [14]).
Inspection of the graphs reveals a remarkably regular behaviour of the curves: the Pareto curve is constant (with some slight variations) at all levels of truncation, while the Fréchet curve becomes flatter and flatter as the truncation level increases. The Log-normal and Exponential curves show a slope at all levels of truncation.
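The same qualitative picture can be reproduced at the population level. The sketch below assumes $\lambda(p)=1-\ln[1-L(p)]/\ln(1-p)$ and uses the closed-form Lorenz curves $L(p)=1-(1-p)^{1-1/\alpha}$ (Pareto) and $L(p)=p+(1-p)\ln(1-p)$ (unit Exponential); for the Fréchet it integrates the quantile function $G^{-1}(t)=F^{-1}(F(s)+t\overline{F}(s))$ of $Y=X|X>s$ with a simple midpoint rule. The grid size and the truncation levels `s` are illustrative choices of ours.

```python
import math


def lam(p, lorenz):
    """Assumed form of Zenga's curve: 1 - ln(1 - L(p)) / ln(1 - p)."""
    return 1.0 - math.log(1.0 - lorenz) / math.log(1.0 - p)


def pareto_lorenz(p, alpha):
    return 1.0 - (1.0 - p) ** (1.0 - 1.0 / alpha)      # requires alpha > 1


def exponential_lorenz(p):
    return p + (1.0 - p) * math.log(1.0 - p)            # free of the scale


def truncated_frechet_lambda(alpha, s, ps, n_grid=20000):
    """lambda(p) of Y = X | X > s, X ~ Frechet(alpha), via a midpoint rule on G^{-1}."""
    fbar_s = -math.expm1(-s ** (-alpha))                # Fbar(s) = 1 - exp(-s^-alpha)
    h = 1.0 / n_grid
    cum = [0.0]                                         # cum[j] ~ integral of G^{-1} on [0, j*h]
    for j in range(n_grid):
        tail = (1.0 - (j + 0.5) * h) * fbar_s           # 1 - u at the cell midpoint
        q = (-math.log1p(-tail)) ** (-1.0 / alpha)      # F^{-1}(u) = (-ln u)^(-1/alpha)
        cum.append(cum[-1] + q * h)
    return [lam(p, cum[round(p * n_grid)] / cum[-1]) for p in ps]


if __name__ == "__main__":
    ps = [i / 10 for i in range(1, 10)]
    print("Pareto(2):  ", [round(lam(p, pareto_lorenz(p, 2.0)), 3) for p in ps])
    print("Exponential:", [round(lam(p, exponential_lorenz(p)), 3) for p in ps])
    for s in (0.5, 2.0, 10.0):
        print(f"Frechet(2), s={s}:", [round(v, 3) for v in truncated_frechet_lambda(2.0, s, ps)])
```

The Pareto curve is exactly flat at $\gamma=0.5$, the Exponential curve decreases in $p$, and the truncated Fréchet curve approaches the constant $0.5$ as $s$ grows, in line with Lemma 2.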
4. Numerical Comparisons
In this section we evaluate the performance of ${\widehat{\gamma}}_{k}$ against some alternative estimators of the EV (or tail) index. Besides the estimator ${\widehat{\gamma}}_{Opt}$, we consider the estimator ${\widehat{\gamma}}_{k}$ with different levels of truncation of the data. In the tables, ${\widehat{\gamma}}_{1-p}$ denotes the estimator ${\widehat{\gamma}}_{k}$, with $1-p$ indicating the fraction of upper order statistics used in estimation; ${\widehat{\gamma}}_{All}$ denotes the case where all the sample data are used.
Numerical comparisons are carried out with respect to some reduced bias (RB) competitors (Caeiro et al. [11], Gomes et al. [12]) based on the Hill (Hill [5]), generalized Hill (Beirlant et al. [24]), moment (Dekkers et al. [25]) and moment of order $p$ (Gomes et al. [13]) estimators, optimized with respect to the choice of $k$ as discussed in Gomes et al. [13].
RB estimation of $\gamma$ for the above alternative estimators is based on the external estimation of additional parameters $(\rho,\beta)$ (refer to Gomes et al. [26] and Gomes et al. [13] for further details). In our comparisons the following RB versions are used:
- (1) RB-Hill estimator, outperforming $H(k)$ (defined in (2)) for all $k$:
$$\overline{H}(k)=H(k)\left(1-\frac{\hat{\beta}}{1-\hat{\rho}}\left(\frac{n}{k}\right)^{\hat{\rho}}\right).$$
- (2) RB-Moment estimator, denoted by MM in the tables, a bias-corrected version of the moment estimator
$$M(k)=M_k^{(1)}+1-\frac{1}{2}\left(1-\frac{\left(M_k^{(1)}\right)^2}{M_k^{(2)}}\right)^{-1},$$
with $M_k^{(j)}=\frac{1}{k}\sum_{i=1}^{k}\left(\ln X_{(n-i+1)}-\ln X_{(n-k)}\right)^{j}$, $j\ge 1$.
- (3) RB-Generalized Hill estimator, $\overline{GH}(k)$, denoted GH in the tables, with the same bias correction as in (27) applied to the generalized Hill estimator $GH(k)$ of Beirlant et al. [24], which is built on $UH(j)=X_{(n-j)}H(j)$, $1\le j\le k$.
- (4) RB-MOP (moment of order $p$) estimator, for $0<p<\alpha$ (the case $p=0$ reduces to the Hill estimator), a bias-corrected version of $H_p(k)=\left(1-A_p^{-p}(k)\right)/p$, with $A_p(k)=\left(\frac{1}{k}\sum_{i=1}^{k}U_{ik}^{p}\right)^{1/p}$, $U_{ik}=X_{(n-i+1)}/X_{(n-k)}$, $1\le i\le k<n$; it is denoted by $\mathrm{MP}_p$ in the tables. Here $p$ is a tuning parameter, which in our simulations is set equal to 0.5 and 1. For an estimated optimal value of $p$ based on a preliminary estimator of $\alpha$, see Gomes et al. [13].
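For readers who want to experiment outside evt0, the classical building blocks listed above can be coded directly. This is a sketch of the textbook formulas, not of evt0's implementation: `mop` computes $H_p(k)$ from $A_p(k)$ and $U_{ik}$, and `rb_hill` applies the correction $H(k)\left(1-\hat\beta(n/k)^{\hat\rho}/(1-\hat\rho)\right)$, taken here to be the Caeiro et al. [11] form, with $(\hat\rho,\hat\beta)$ supplied by the user; estimating $(\rho,\beta)$ is not shown. Function names are ours.

```python
import math


def upper_ratios(x, k):
    """U_ik = X_(n-i+1) / X_(n-k), i = 1..k, together with the sample size n."""
    xs = sorted(x)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]                  # X_(n-k)
    return [v / threshold for v in xs[n - k:]], n


def hill(x, k):
    """H(k): mean of ln U_ik."""
    u, _ = upper_ratios(x, k)
    return sum(math.log(v) for v in u) / k


def mop(x, k, p):
    """Moment of order p: H_p(k) = (1 - A_p(k)^(-p)) / p, for 0 < p < alpha."""
    if p <= 0:
        raise ValueError("use hill() for p = 0")
    u, _ = upper_ratios(x, k)
    a_p = (sum(v ** p for v in u) / k) ** (1.0 / p)
    return (1.0 - a_p ** (-p)) / p


def rb_hill(x, k, beta, rho):
    """Bias-corrected Hill with externally estimated (beta, rho), rho < 0."""
    _, n = upper_ratios(x, k)
    return hill(x, k) * (1.0 - beta / (1.0 - rho) * (n / k) ** rho)
```

As $p\to 0$, `mop` approaches `hill`, matching the remark that the case $p=0$ reduces to the Hill estimator; with $\hat\beta=0$ the correction in `rb_hill` vanishes.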
Computations of the above estimators have been performed using the package evt0 (Manjunath and Caeiro [27]) in R. More precisely, $GH(k)$ and $M(k)$ are obtained using the function other.EVI() with the options GH and MO, respectively. Estimates of the parameters $(\rho,\beta)$ for the bias correction terms can be obtained from the function mop(). RB-Hill and RB-MOP estimates are obtained directly from the function mop(), by specifying an appropriate value of $p$ and the option RB-MOP. In order to optimize the choice of $k$ we used the formula [13]
where $\lfloor x\rfloor$ is the integer part of $x$ and $\varphi(\rho)=1-\left(\rho+\sqrt{\rho^2-4\rho+2}\right)/2$. For the comparisons, the following distributions are used:
- (1) Pareto distribution, as defined in (7). Random numbers from this distribution are generated in R using the function runif() and inversion of $F$.
- (2) Fréchet distribution with $F(x)=\exp(-x^{-\alpha})$, $x>0$, denoted by Fréchet$(\alpha)$. This distribution is simulated in R using the function rfrechet() from the package evd (Stephenson [28]), with shape parameter set equal to $\alpha$.
- (3) Burr distribution with $F(x)=1-(1+x^{\alpha})^{-1}$, $x>0$, denoted by Burr$(\alpha)$. This distribution is simulated in R using the function rburr() from the package actuar (Dutang et al. [29]), with the parameter shape1 set to 1 and shape2 set equal to $\alpha$.
- (4) Symmetric stable distribution with index of stability $\alpha$, $0<\alpha<2$, denoted by Stable$(\alpha)$ := Stable$(\alpha,\beta=0,\mu=0,\sigma=1)$, where $\beta$, $\mu$ and $\sigma$ indicate, respectively, asymmetry, location and scale. This distribution is simulated in R using the function rstable() from the package stabledist (Wuertz et al. [30]). For this distribution only the positive observations are used in estimation.
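The inversion approach mentioned for the Pareto case works equally well for the Fréchet and Burr distributions, whose quantile functions are available in closed form. The sketch below uses Python's `random` module instead of R, assumes $x_0=1$ in (7), and leaves out the stable case, which needs a dedicated generator such as rstable(); it illustrates the method and is not the code used for the tables. Function names are ours.

```python
import math
import random


def pareto_quantile(u, alpha):
    """Inverse of F(x) = 1 - x^(-alpha), x >= 1 (Pareto (7) with x0 = 1, assumed)."""
    return (1.0 - u) ** (-1.0 / alpha)


def frechet_quantile(u, alpha):
    """Inverse of F(x) = exp(-x^(-alpha)), x > 0."""
    return (-math.log(u)) ** (-1.0 / alpha)


def burr_quantile(u, alpha):
    """Inverse of F(x) = 1 - (1 + x^alpha)^(-1), x > 0."""
    return (u / (1.0 - u)) ** (1.0 / alpha)


def draw(quantile, n, alpha, rng):
    """n draws by inversion: X = F^{-1}(U), U uniform on (0, 1)."""
    out = []
    while len(out) < n:
        u = rng.random()          # in [0, 1); u == 0 is rejected below
        if u > 0.0:
            out.append(quantile(u, alpha))
    return out


if __name__ == "__main__":
    rng = random.Random(3)        # fixed seed; the stream differs from R's set.seed(3)
    x = draw(frechet_quantile, 500, 2.0, rng)
    print(min(x), max(x))
```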
Tables 1 to 8 contain the empirical RMSE (root mean squared error) of the estimators and their relative RMSE with respect to ${\widehat{\gamma}}_{Opt}$; that is, for any of the evaluated estimators, say $\widehat{\gamma}$,
$$\mathrm{RMSE}(\widehat{\gamma})=\sqrt{\widehat{\mathrm{E}}\left[(\widehat{\gamma}-\gamma)^2\right]},\qquad \text{Rel-RMSE}(\widehat{\gamma})=\frac{\mathrm{RMSE}(\widehat{\gamma})}{\mathrm{RMSE}({\widehat{\gamma}}_{Opt})}.$$
Note that a Rel-RMSE greater than one implies a worse performance of the estimator with respect to ${\widehat{\gamma}}_{Opt}$. Here $\widehat{\mathrm{E}}$ denotes the empirical expected value, that is, the mean over the Monte Carlo experiments. For each sample size $n=50,100,200,300,500$ and $1000$, 1000 Monte Carlo replicates were generated. Computations have been carried out with R version 3.5.1, and each experiment, that is, each combination of a chosen distribution and a chosen $n$, has been initialized using set.seed(3). Numerical results representative of each distribution are reported in the tables. More tables with other choices of parameters can be found in the online Supplementary Materials accompanying this paper.
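Once the Monte Carlo estimates are stored, the two summaries above are simple to compute. In the minimal sketch below the function names and the small harness are ours; the Hill estimator and a Pareto sampler serve only as placeholders for whichever estimators are being compared, and `random.Random(3)` mimics the idea of a fixed seed without reproducing R's stream.

```python
import math
import random


def rmse(estimates, true_value):
    """Empirical RMSE: square root of the Monte Carlo mean of (gamma_hat - gamma)^2."""
    return math.sqrt(sum((g - true_value) ** 2 for g in estimates) / len(estimates))


def rel_rmse(estimates, reference, true_value):
    """RMSE relative to a reference estimator (gamma_hat_Opt in the tables);
    values above one mean a worse performance than the reference."""
    return rmse(estimates, true_value) / rmse(reference, true_value)


def monte_carlo(estimator, sampler, n, reps, seed=3):
    """Apply `estimator` to `reps` samples of size n drawn by `sampler(n, rng)`."""
    rng = random.Random(seed)  # fixed seed; not the same stream as R's set.seed(3)
    return [estimator(sampler(n, rng)) for _ in range(reps)]


def hill(x, k):
    """Hill estimator H(k), used here only as a placeholder estimator."""
    xs = sorted(x)
    log_threshold = math.log(xs[-k - 1])  # ln X_(n-k)
    return sum(math.log(v) - log_threshold for v in xs[-k:]) / k


def pareto_sampler(n, rng, alpha=2.0):
    """Pareto (x0 = 1) draws by inversion; 1 - random() lies in (0, 1]."""
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


if __name__ == "__main__":
    gamma = 0.5
    est_k20 = monte_carlo(lambda x: hill(x, 20), pareto_sampler, 200, 1000)
    est_k100 = monte_carlo(lambda x: hill(x, 100), pareto_sampler, 200, 1000)
    print("RMSE, k = 20:", rmse(est_k20, gamma))
    print("Rel-RMSE of k = 20 vs k = 100:", rel_rmse(est_k20, est_k100, gamma))
```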
To summarize, the estimators based on the curve $\lambda$ defined in this paper show good overall performance, and the gain in efficiency can be substantial. The results also confirm the practical usefulness of ${\widehat{\gamma}}_{Opt}$, which is able to identify appropriate truncation levels for different distributions, although knowledge of the actual optimal truncation level would yield higher efficiency.
Turning to individual cases, ${\widehat{\gamma}}_{Opt}$ outperforms all the other estimators for the Pareto distribution, where the relative efficiency (see Table 2) is always greater than 4. For the Pareto distribution, ${\widehat{\gamma}}_{All}$ would be the most efficient choice, as expected, since by Lemma 1 the curve $\lambda$ is constant over the whole range of $p$ and no truncation is needed.
In the case of the Fréchet distribution, ${\widehat{\gamma}}_{Opt}$ is always more efficient than all competitors for smaller sample sizes (see Table 4); as the sample size increases, the gain in efficiency decreases and, in some cases, ${\widehat{\gamma}}_{Opt}$ may be slightly less efficient.
The performance of
${\widehat{\gamma}}_{Opt}$ in the case of the Burr distribution is comparable to that of the competitors, with relative RMSE (see
Table 6) slightly smaller or greater than one depending on the case considered.
In the case of the symmetric stable distribution, the performance of ${\widehat{\gamma}}_{Opt}$ is slightly better than that of all alternative estimators in all cases (see Table 8). The MM estimator turns out to be quite efficient for the stable distribution with $\alpha$ close to 2 (see the online Supplementary Materials).
We note that the MM and GH estimators, computed with the package evt0, produced anomalous results in some instances, with extremely high RMSE values, typically for specific sample sizes; despite several checks, we could not determine the reason for such results.
5. Examples
Here we concentrate on six real-data examples that have been used in the literature to discuss methods for detecting a power law in the tail of the underlying distribution. All these data sets have been thoroughly analysed, for example, in Clauset et al. [1]. The following data sets are analysed here:
- 1. The frequency of occurrence of unique words in the novel Moby Dick by Herman Melville (Newman [31]).
- 2. The severity of terrorist attacks worldwide from February 1968 to June 2006, measured as the number of deaths directly resulting from them (Clauset et al. [32]).
- 3. The sizes in acres of wildfires occurring on U.S. federal land between 1986 and 1996 (Newman [31]).
- 4. The intensities of earthquakes occurring in California between 1910 and 1992, measured as the maximum amplitude of motion during the quake (Newman [31]).
- 5. The frequencies of occurrence of U.S. family names in the 1990 U.S. Census (Clauset et al. [1]).
- 6. Peak gamma-ray intensity of solar flares between 1980 and 1989 (Newman [31]).
Figure 3 shows the estimated $\lambda$ curves for the six examples, computed both on the whole data set and on selected percentages of the upper order statistics. The range of $\lambda$ on the vertical axis varies across graphs to better show the path of the curves.
To each data set we apply Algorithm 2 to select the optimal $k$ for computing ${\widehat{\gamma}}_{Opt}$; given this estimate, we apply Algorithm 1 to compute a 95% confidence interval.
Next, we apply a testing procedure to evaluate whether the graphs in Figure 3, for the $k$ chosen by Algorithm 2, can be considered flat enough to support the hypothesis that the data come from a distribution within the class (1). A bootstrap test of ${H}_{0}:{\beta}_{1}=0$ in model (24) has been developed in Taufer et al. [33].
For comparison, we also apply the testing procedure for the power-law hypothesis developed by Clauset et al. [1]. Table 9 reports the estimated values, the 95% confidence intervals, the fraction of upper order statistics used and the p-values of the testing procedures.
In brief, our conclusions about the presence of a Pareto-type tail coincide fully with those of Clauset et al. [1]: there is clear evidence of a power-law distribution fitting the data for the Moby Dick and Terrorism data sets, while for the others there is no convincing evidence. Regarding the contrasting p-values for the Solar Flares data, we point out that Clauset et al. [1] suggest a power-law tail with an exponential cut-off beyond a certain point; given the characteristics of the graphs based on the $\lambda$ curve, this feature cannot be detected in our analysis.
As for the estimated values of $\gamma$, those obtained here are substantially lower than those obtained by Clauset et al. [1], who use the Hill estimator. Given the good simulation performance of ${\widehat{\gamma}}_{Opt}$ compared with the Hill estimator, the values in Table 9, at least for the Moby Dick and Terrorism data sets, can be considered reliable.
For the other data sets, since the tests of the power-law null hypothesis yield significant p-values, the estimated $\gamma$ should be discarded, and it becomes of interest to select an alternative model, for example by means of a likelihood ratio test as discussed in Clauset et al. [1], to which the interested reader is referred.