1. Introduction
While many concepts and approaches have been presented in the literature, none of them has proved universally suitable, and there is no single answer to the question of where the tail of a distribution begins. Searching for the tail of a distribution is always a trade-off between bias and variance. If the threshold is set too low, the estimates suffer from high bias: the further the threshold lies from the tail, the more the empirical distribution of extremes deviates from the theoretical model. On the other hand, too high a threshold results in high variance of the model estimates, since few observations exceed the threshold (Coles 2001). Numerous authors apply a fixed percentile of the total sample size as the threshold, usually 10%, 7.5%, 5% or 1% of the upper statistics (Karmakar and Shukla 2014; Jones 2003; Bee et al. 2016; Fernandez 2005; Totić and Božović 2016; Gençay et al. 2003; Gençay and Selçuk 2004; Echaust and Just 2020a; Daníelsson and Morimoto 2000; Echaust 2021; Longin 2000; McNeil and Frey 2000). More sophisticated approaches to threshold selection are based on graphical techniques. A frequently used procedure relies on the analysis of a mean excess plot, which represents the mean of the excesses over the threshold
u. This method was applied in Aboura (2014), Cifter (2011), Gilli and Këllezi (2006), Łuczak and Just (2020) and Omari et al. (2017). A change in the pattern is observed in this plot only at very high thresholds; therefore, the choice of a threshold is ambiguous. The second graphical procedure for threshold selection is to estimate the parameters of the Generalized Pareto Distribution over a range of thresholds. Above a threshold
u, at which the model is valid, the estimate of the shape parameter should be approximately constant, whereas the estimate of the scale parameter should be linear in the threshold. Another very popular threshold selection technique involves a graphical representation of the Hill (Beirlant et al. 2011; Hill 1975), Pickands (Pickands 1975) or Dekkers–Einmahl–de Haan estimators (Dekkers et al. 1989). However, graphical threshold choice procedures require the identification of stable regions in the plots; they are therefore highly subjective and difficult to apply in empirical studies. Finally, there are studies in which the choice of threshold is based on optimization procedures (
Caeiro and Gomes 2016; Danielsson et al. 2016; Echaust and Just 2020b; Scarrott and MacDonald 2012). An extensive overview of such methods is provided in Section 2.
This paper provides an empirical study of various methods of optimal tail selection in risk measurement. The results indicate which of them may be used in practice by investors and by financial and regulatory institutions. According to the Amendment to Basel I (1996), market risk should be calculated as Value at Risk at the 99% confidence level. A number of weaknesses of VaR as a basis for capital requirements were identified during the global financial crisis (GFC) of 2007–2009. Therefore, in the Basel III document (2013) the Committee proposed to replace VaR with ES at the 97.5% confidence level. This confidence level is meant to provide a level of risk similar to the 99% VaR while ensuring a number of benefits, including generally more stable risk estimates and often lower sensitivity to extreme outlier observations. Moreover, in contrast to VaR, ES is a subadditive risk measure and thus does not hinder the appropriate risk measurement of investment portfolios. In our empirical study, we analyze twelve automatic methods of threshold selection. Unfortunately, some of these methods set the threshold at too high a level to estimate these risk measures at the recommended confidence levels. The research objective of this paper is to compare these methods and to identify those that can be considered useful in risk measurement. Some models that perform well in simulation studies based on theoretical distributions may not perform well when real data are used (Danielsson et al. 2016). This paper sheds new light on the linkage between tail selection and financial applications.
This study contributes to the literature in two ways. Firstly, we provide a review of the tail selection methods most popular in the literature. Secondly, we compare the results of the tail choice computed with a variety of methods. All threshold estimates are calculated with the tea R package (Ossberger 2020). We are able to specify which of them tend to produce systematically too high threshold levels to calculate risk measures at commonly accepted confidence levels. Therefore, our results may be helpful for all investors, risk managers and academics who apply tail models in practice.
The remainder of the paper is organized as follows. Section 2 provides an overview of optimal tail selection methods. Section 3 describes the data used in the empirical study. Section 4 presents the empirical results and their discussion, while Section 5 concludes the study.
2. Optimization Approaches for Threshold Selection
In the empirical study we use twelve methods to choose the optimal number of data in the tail:
The Mean Absolute Deviation Distance metric (MAD-Distance metric) method,
The Kolmogorov–Smirnov Distance metric (KS-Distance metric) method,
The Reiss and Thomas (RT) procedures,
The Path Stability (PS) method,
The automated Eyeball (Eyeball) method,
The Guillou and Hall (GH) procedure,
The minimization of the Asymptotic Mean Squared Error (dAMSE) method,
The Hall and Welsh (HW) procedure,
The single bootstrap (Hall) procedure proposed by Hall,
The single bootstrap (Himp) procedure proposed by Caeiro and Gomes,
The double bootstrap (Gomes) procedure proposed by Gomes, Figueiredo and Neves,
The double bootstrap (Danielsson) procedure proposed by Danielsson, de Haan, Peng and de Vries.
2.1. Mean Absolute Deviation Distance Metric (MAD-Distance Metric) and Kolmogorov–Smirnov Distance Metric (KS-Distance Metric) Methods
The MAD-Distance metric (MAD Dis) and KS-Distance metric (KS Dis) methods presented by Danielsson et al. (2016) are based on minimizing the distance between the largest upper order statistics of the dataset, i.e., the distance between the empirical tail and the theoretical tail of a Pareto distribution. The tail index $\alpha$ of this distribution is estimated using the Hill estimator (Hill 1975):

$$\frac{1}{\hat{\alpha}(k)} = \frac{1}{k} \sum_{i=0}^{k-1} \log \frac{X_{(n-i)}}{X_{(n-k)}},$$

where $k$ is the number of upper order statistics used to estimate the parameter $\alpha$, and $X_{(1)} \le \cdots \le X_{(n)}$ denote the order statistics of the sample. The distance, which is measured in the quantile dimension, is minimized with respect to $k$. The optimal number of the upper order statistics is called $k^*$. Determining $k^*$ is equivalent to specifying the number of extreme values (the sample size in the tail). The MAD-Distance metric method uses the mean absolute deviation penalty function:

$$Q_{MAD}(k) = \frac{1}{T} \sum_{j=1}^{T} \left| \hat{q}(j, k) - X_{(n-j+1)} \right|,$$

and the KS-Distance metric takes the maximum absolute deviation penalty function:

$$Q_{KS}(k) = \max_{1 \le j \le T} \left| \hat{q}(j, k) - X_{(n-j+1)} \right|,$$

where $T$ is the size of the region over which the distance metric is measured, $X_{(n-j+1)}$ is the empirical quantile, and $\hat{q}(j, k) = X_{(n-k)} \left( k/j \right)^{1/\hat{\alpha}(k)}$ is the quantile estimated with the Hill estimator $\hat{\alpha}(k)$; the optimal size of the upper tail is $k^* = \arg\min_{k} Q(k)$. The motivation for the selection of these methods is that tail-related risk measures (e.g., VaR and ES) are concepts related to quantiles at a given probability level.
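To make the criterion concrete, the following minimal Python sketch implements the quantile-distance selection, assuming a one-dimensional array of tail data; the helper name hill_path, the grid of k values and the default tail region T are our illustrative choices, not constructs from Danielsson et al. (2016) or the tea package.

```python
import numpy as np

def hill_path(x):
    """Hill estimates xi_hat(k) = 1/alpha_hat(k), k = 1..n-1, computed
    from the positive observations sorted in descending order."""
    logs = np.log(np.sort(x[x > 0])[::-1])
    k = np.arange(1, logs.size)
    # mean of the k largest log-values minus log X_(n-k)
    return np.cumsum(logs)[:-1] / k - logs[1:]

def distance_threshold(x, T=None, metric="mad"):
    """Pick k minimizing the distance between the empirical quantiles and
    the fitted Pareto quantiles, with the MAD or KS (maximum) penalty."""
    xs = np.sort(x[x > 0])[::-1]
    n = xs.size
    T = T or int(0.2 * n)                      # tail region for the metric (assumed)
    j = np.arange(1, T + 1)
    xi = hill_path(x)
    best_k, best_q = None, np.inf
    for k in range(2, n - 1):
        q_fit = xs[k] * (k / j) ** xi[k - 1]   # X_(n-k) * (k/j)^(1/alpha_hat)
        dev = np.abs(q_fit - xs[:T])           # deviations from the empirical tail
        q = dev.mean() if metric == "mad" else dev.max()
        if q < best_q:
            best_k, best_q = k, q
    return best_k
```

The hill_path helper is reused in the sketches for the remaining methods below.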
2.2. The Reiss and Thomas (RT) Procedures
Reiss and Thomas (2007) proposed alternative procedures to select the optimal number of data in the tail, $k$. These approaches select the lowest upper order statistic by minimizing the expression (RT1):

$$\frac{1}{k} \sum_{i=1}^{k} i^{\beta} \left| \hat{\xi}_i - \mathrm{median}\left( \hat{\xi}_1, \ldots, \hat{\xi}_k \right) \right|,$$

or the expression (RT2):

$$\frac{1}{k} \sum_{i=1}^{k} i^{\beta} \left( \hat{\xi}_i - \hat{\xi}_k \right)^2,$$

where $\hat{\xi}_i$ is the Hill estimator based on the $i$ largest order statistics, and the tuning parameter satisfies $0 \le \beta \le 0.5$.
In practice, automated implementation of these procedures is unreliable for a small $k$; thus, a minimum value of $k$, $k_{min}$, is usually used (Scarrott and MacDonald 2012).
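A minimal sketch of the two criteria, reusing the hill_path helper from the sketch in Section 2.1; the centering of RT2 at $\hat{\xi}_k$ follows the form given above.

```python
import numpy as np

def reiss_thomas(x, beta=0.0, kmin=2, variant=1):
    """Choose k >= kmin minimizing RT1 (i^beta-weighted absolute deviations
    from the median) or RT2 (i^beta-weighted squared deviations from xi_hat(k))."""
    xi = hill_path(x)                  # Hill estimates for k = 1..n-1
    best_k, best_val = None, np.inf
    for k in range(kmin, xi.size + 1):
        i = np.arange(1, k + 1)
        head = xi[:k]
        if variant == 1:
            val = np.mean(i**beta * np.abs(head - np.median(head)))
        else:
            val = np.mean(i**beta * (head - xi[k - 1]) ** 2)
        if val < best_val:
            best_k, best_val = k, val
    return best_k
```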
2.3. The Path Stability (PS) Method
The Path Stability method is a heuristic algorithm proposed by Caeiro and Gomes (2016). The algorithm looks for a stable region of the sample path of the tail index estimates, i.e., the plot of the Hill estimates $\hat{\xi}(k)$ with respect to $k$. The optimal number of data in the tail, $\hat{k}_0$, is identified by the algorithm, which consists of six steps:
First step. Given an observed sample $(x_1, \ldots, x_n)$, compute the Hill estimates $\hat{\xi}(k)$, $k = 1, \ldots, n-1$.
Second step. Take $j_0$ as the minimum value of $j$, a non-negative integer, such that the rounded values, to $j$ decimal places, of the estimates in the first step are distinct. Define $a_k := \hat{\xi}(k; j_0)$, $k = 1, \ldots, n-1$, the rounded values of $\hat{\xi}(k)$ to $j_0$ decimal places.
Third step. Consider the sets of $k$ values associated with equal consecutive values of $a_k$ obtained in the second step. Set $k_{max}$ and $k_{min}$ as the maximum and minimum values, respectively, of the set with the largest range. The largest run size is $\ell := k_{max} - k_{min}$.
Fourth step. Consider all those estimates, $\hat{\xi}(k)$, $k_{min} \le k \le k_{max}$, now with two additional decimal places, i.e., compute $\hat{\xi}(k; j_0 + 2)$. Obtain the mode of $\hat{\xi}(k; j_0 + 2)$ and denote by $\mathcal{K}$ the set of $k$-values associated with this mode.
Fifth step. Obtain $\hat{k}_0$, the maximum value of $\mathcal{K}$.
Sixth step. Finally, calculate $\hat{\xi}^{PS} := \hat{\xi}(\hat{k}_0; j_0 + 2)$.
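The heuristic translates directly into code; the sketch below reuses hill_path and interprets "distinct" in the second step as "not all identical", which is our reading of the rounding rule.

```python
import numpy as np
from collections import Counter

def path_stability(x):
    """Path-stability choice of k: locate the longest run of equal rounded
    Hill estimates, then refine within it at two extra decimal places."""
    xi = hill_path(x)
    # Step 2: smallest precision j0 at which the rounded estimates differ
    j0 = 0
    while np.unique(np.round(xi, j0)).size == 1:
        j0 += 1
    a = np.round(xi, j0)
    # Step 3: longest run of equal consecutive rounded values
    best_lo = best_hi = lo = 0
    for k in range(1, a.size):
        if a[k] != a[lo]:
            lo = k
        if k - lo > best_hi - best_lo:
            best_lo, best_hi = lo, k
    # Step 4: mode of the estimates in the run, at j0 + 2 decimal places
    refined = np.round(xi[best_lo:best_hi + 1], j0 + 2)
    mode = Counter(refined).most_common(1)[0][0]
    # Step 5: largest k associated with the mode (k is 1-based)
    return best_lo + int(np.where(refined == mode)[0].max()) + 1
```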
2.4. The Automated Eyeball (Eyeball) Method
Another heuristic algorithm is the automated Eyeball method proposed by Resnick and Stărică (1997) and explored in Danielsson et al. (2016). The method finds a stable region in the Hill plot by defining a moving window; it looks for a substantial drop in the variance of the Hill plot as $k$ is increased. The following estimator is used to select the optimal number of data in the tail:

$$k_{eye} = \min\left\{ k \in \{2, \ldots, n^{+} - w\} \; : \; h < \frac{1}{w} \sum_{i=1}^{w} \mathbf{1}\left\{ \left| \hat{\alpha}(k + i) - \hat{\alpha}(k) \right| < \varepsilon \right\} \right\},$$

where $w$ is the size of the moving window (usually 1% of the full sample); $n^{+}$ is the number of positive observations; $\varepsilon$ is the size of the range in which the estimates may vary (e.g., 0.3); $\mathbf{1}\{\cdot\}$ denotes the indicator function; and $h$ is the percentage of the estimates inside the moving window that should fall within the tolerable range (usually around 90%).
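A compact sketch of the rule, again reusing hill_path; the defaults mirror the typical values quoted above, and $\hat{\alpha}(k)$ is taken as the reciprocal of the Hill estimate.

```python
import numpy as np

def eyeball(x, w_frac=0.01, eps=0.3, h=0.9):
    """Smallest k at which more than a fraction h of the next w Hill-plot
    values stay within eps of the current one (a 'stable region' test)."""
    alpha = 1.0 / hill_path(x)             # tail index path alpha_hat(k)
    n_pos = int(np.sum(x > 0))
    w = max(1, int(w_frac * x.size))       # moving-window size
    for k in range(2, n_pos - w):
        window = alpha[k:k + w]            # alpha_hat(k+1), ..., alpha_hat(k+w)
        if np.mean(np.abs(window - alpha[k - 1]) < eps) > h:
            return k
    return None                            # no stable region found
```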
2.5. The Guillou and Hall (GH) Procedure
The procedure for choosing the optimal threshold when fitting the Hill estimator of a tail exponent to extreme value data, proposed by Guillou and Hall (2001), is based on bias reduction. The optimal number of data in the tail, $\hat{k}_{GH}$, is identified by a procedure that consists of three steps (e.g., see Caeiro and Gomes 2016):

First step. Given an observed sample $(x_1, \ldots, x_n)$, compute:

$$T_n(k) := \sqrt{\frac{3}{k^3}}\, \frac{\sum_{i=1}^{k} (k - 2i + 1)\, U_i}{\hat{\xi}(k)} \quad \text{and} \quad Q_n(k) := \left( \frac{1}{2 \lfloor k/2 \rfloor + 1} \sum_{|j - k| \le \lfloor k/2 \rfloor} T_n^2(j) \right)^{1/2},$$

where $U_i := i \left( \ln X_{(n-i+1)} - \ln X_{(n-i)} \right)$, $1 \le i \le k$, are the scaled log-spacings and $\hat{\xi}(k)$ is the Hill estimator.

Second step. Given a critical value $c_{crit}$, consider the choice

$$\hat{k}_{GH} := \min\left\{ k : \left| Q_n(j) \right| \ge c_{crit} \text{ for all } j \ge k \right\}.$$

Third step. Next, obtain the Hill estimate at the selected level, $\hat{\xi}\left( \hat{k}_{GH} \right)$.

Guillou and Hall (2001) investigated a wider range of values of $c_{crit}$, but they suggest values between 1.25 and 1.5.
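A sketch of this diagnostic under the form of the statistics given above (our reconstruction); it reuses hill_path and returns the smallest k from which the smoothed diagnostic stays above the critical value.

```python
import numpy as np

def guillou_hall(x, c_crit=1.25):
    """GH stopping rule: signed log-spacing contrast T_n(k), smoothed into
    Q_n(k) over a window of width ~k, thresholded at c_crit."""
    logs = np.log(np.sort(x[x > 0])[::-1])
    n = logs.size
    i = np.arange(1, n)
    U = i * (logs[:-1] - logs[1:])          # scaled log-spacings U_i
    xi = hill_path(x)
    T = np.empty(n - 1)
    for k in range(1, n):
        w = k - 2 * np.arange(1, k + 1) + 1
        T[k - 1] = np.sqrt(3 / k**3) * np.sum(w * U[:k]) / xi[k - 1]
    Q = np.empty(n - 1)
    for k in range(1, n):
        half = k // 2
        lo, hi = max(0, k - 1 - half), min(n - 1, k + half)
        Q[k - 1] = np.sqrt(np.mean(T[lo:hi] ** 2))
    below = np.where(np.abs(Q) < c_crit)[0]
    # smallest k such that |Q_n(j)| >= c_crit for all j >= k
    return int(below.max()) + 2 if below.size else 1
```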
2.6. Minimization of the Asymptotic Mean Squared Error (dAMSE) Method
The minimization of the Asymptotic Mean Squared Error (dAMSE) method was presented by Caeiro and Gomes (2016). The optimal number of data in the tail, $\hat{k}_0$, is identified by minimizing the Asymptotic Mean Squared Error (AMSE) of the Hill estimator with respect to $k$. This algorithm consists of five steps:
First step. Given an observed sample $(x_1, \ldots, x_n)$, calculate, for the tuning parameters $\tau = 0$ and $\tau = 1$, the values of $\hat{\rho}_{\tau}(k)$, which have the form:

$$\hat{\rho}_{\tau}(k) := -\left| \frac{3\left( W_{k,n}^{(\tau)} - 1 \right)}{W_{k,n}^{(\tau)} - 3} \right|,$$

dependent on the statistics:

$$W_{k,n}^{(\tau)} := \begin{cases} \dfrac{\left( M_{k,n}^{(1)} \right)^{\tau} - \left( M_{k,n}^{(2)} / 2 \right)^{\tau/2}}{\left( M_{k,n}^{(2)} / 2 \right)^{\tau/2} - \left( M_{k,n}^{(3)} / 6 \right)^{\tau/3}}, & \tau \neq 0, \\[2ex] \dfrac{\ln\left( M_{k,n}^{(1)} \right) - \tfrac{1}{2} \ln\left( M_{k,n}^{(2)} / 2 \right)}{\tfrac{1}{2} \ln\left( M_{k,n}^{(2)} / 2 \right) - \tfrac{1}{3} \ln\left( M_{k,n}^{(3)} / 6 \right)}, & \tau = 0, \end{cases}$$

where

$$M_{k,n}^{(j)} := \frac{1}{k} \sum_{i=1}^{k} \left( \ln X_{(n-i+1)} - \ln X_{(n-k)} \right)^{j}, \quad j = 1, 2, 3.$$

Second step. Consider $\mathcal{K} = \left( \lfloor n^{0.995} \rfloor, \ldots, \lfloor n^{0.999} \rfloor \right)$. Calculate the median of $\left\{ \hat{\rho}_{\tau}(k) \right\}_{k \in \mathcal{K}}$, denoted $\chi_{\tau}$, and compute $I_{\tau} := \sum_{k \in \mathcal{K}} \left( \hat{\rho}_{\tau}(k) - \chi_{\tau} \right)^2$, $\tau = 0, 1$.
Then, select the tuning parameter $\tau^* = 0$ if $I_0 \le I_1$; otherwise, select $\tau^* = 1$.
Third step. Work with $\hat{\rho} \equiv \hat{\rho}_{\tau^*} = \hat{\rho}_{\tau^*}(k_1)$ and $\hat{\beta} \equiv \hat{\beta}_{\tau^*} := \hat{\beta}_{\hat{\rho}}(k_1)$, with $k_1 = \lfloor n^{0.999} \rfloor$ and $\hat{\beta}_{\hat{\rho}}(k)$ dependent on the estimator

$$\hat{\beta}_{\hat{\rho}}(k) := \left( \frac{k}{n} \right)^{\hat{\rho}} \frac{ \left( \frac{1}{k} \sum_{i=1}^{k} \left( \frac{i}{k} \right)^{-\hat{\rho}} \right) \left( \frac{1}{k} \sum_{i=1}^{k} U_i \right) - \frac{1}{k} \sum_{i=1}^{k} \left( \frac{i}{k} \right)^{-\hat{\rho}} U_i }{ \left( \frac{1}{k} \sum_{i=1}^{k} \left( \frac{i}{k} \right)^{-\hat{\rho}} \right) \left( \frac{1}{k} \sum_{i=1}^{k} \left( \frac{i}{k} \right)^{-\hat{\rho}} U_i \right) - \frac{1}{k} \sum_{i=1}^{k} \left( \frac{i}{k} \right)^{-2\hat{\rho}} U_i },$$

where, for any $k$,

$$U_i := i \left( \ln X_{(n-i+1)} - \ln X_{(n-i)} \right), \quad 1 \le i \le k,$$

are the scaled log-spacings.

Fourth step. On the basis of the estimators $\hat{\rho}$ and $\hat{\beta}$, calculate

$$\hat{k}_0 := \min\left( n - 1, \left\lfloor \left( \frac{(1 - \hat{\rho})^2\, n^{-2\hat{\rho}}}{-2\hat{\rho}\, \hat{\beta}^2} \right)^{1/(1 - 2\hat{\rho})} \right\rfloor + 1 \right).$$

Fifth step. Finally, compute the Hill estimate at the selected level, $\hat{\xi}\left( \hat{k}_0 \right)$.
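The algorithm is easy to follow in code. The sketch below condenses the five steps into a single Python function; the grid bounds and guards are ours, and for very short series the high-k range in the second step may need widening.

```python
import numpy as np

def damse(x):
    """dAMSE: estimate the second-order parameters (rho, beta) and plug them
    into the AMSE-optimal number of order statistics for the Hill estimator."""
    logs = np.log(np.sort(x[x > 0])[::-1])
    n = logs.size

    def M(j, k):                       # j-th moment of log-excesses over X_(n-k)
        return np.mean((logs[:k] - logs[k]) ** j)

    def rho_hat(k, tau):               # Step 1: rho estimates for tau = 0, 1
        m1, m2, m3 = M(1, k), M(2, k), M(3, k)
        if tau == 0:
            W = (np.log(m1) - 0.5 * np.log(m2 / 2)) / \
                (0.5 * np.log(m2 / 2) - np.log(m3 / 6) / 3)
        else:
            W = (m1**tau - (m2 / 2) ** (tau / 2)) / \
                ((m2 / 2) ** (tau / 2) - (m3 / 6) ** (tau / 3))
        return -abs(3 * (W - 1) / (W - 3))

    # Step 2: pick the tau whose rho path is more stable over a high-k range
    K = np.arange(max(3, int(n**0.995)), min(n - 2, int(n**0.999)) + 1)
    I = {}
    for tau in (0, 1):
        r = np.array([rho_hat(k, tau) for k in K])
        I[tau] = np.sum((r - np.median(r)) ** 2)
    tau_star = 0 if I[0] <= I[1] else 1

    # Step 3: rho and beta at k1 = floor(n^0.999)
    k1 = min(int(n**0.999), n - 2)
    rho = rho_hat(k1, tau_star)
    i = np.arange(1, k1 + 1)
    U = i * (logs[:k1] - logs[1:k1 + 1])    # scaled log-spacings
    d = (i / k1) ** (-rho)
    beta = (k1 / n) ** rho * \
        (d.mean() * U.mean() - np.mean(d * U)) / \
        (d.mean() * np.mean(d * U) - np.mean(d**2 * U))

    # Step 4: AMSE-optimal number of order statistics
    k0 = ((1 - rho) ** 2 * n ** (-2 * rho) / (-2 * rho * beta**2)) ** (1 / (1 - 2 * rho))
    return min(n - 1, int(k0) + 1)
```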
2.7. The Hall and Welsh (HW) Procedure
Hall and Welsh (1985) proposed a procedure to identify the optimal number of upper order statistics for the Hill estimator by minimizing the AMSE criterion. This method is based on an algorithm similar to that of the dAMSE method, but with a different estimation of the second-order parameters, dependent upon three tuning parameters (Caeiro and Gomes 2016). The Hall and Welsh procedure is quite sensitive to changes in the tuning parameters, but the parameter values proposed by Hall and Welsh seem to work well (Gomes and Oliveira 2001). The procedure may be presented in five steps (e.g., see Gomes and Oliveira 2001; Hall and Welsh 1985). First, preliminary Hill estimates are computed at sample fractions determined by the tuning parameters. These estimates are then used to estimate the second-order behaviour of the tail. If the resulting estimates fall outside their admissible range, the procedure fails and another method has to be selected; otherwise, the number of upper order statistics minimizing the estimated AMSE is computed. The consistency of this procedure holds only under additional restrictions on the underlying distribution.
2.8. The Hall Single Bootstrap (Hall) Procedure
In 1990, Hall introduced a bootstrap-based methodology for the estimation of the optimal tail fraction (see, e.g., Hall 1990; Gomes and Oliveira 2001). This method is applied to the Hall class of Pareto-type tails (Danielsson et al. 2016). Given a sample $\underline{X}_n = (X_1, \ldots, X_n)$ from an unknown model $F$, and the functional $\hat{\xi}(k)$, $k = 1, \ldots, n - 1$, consider the bootstrap sample $\underline{X}^*_{n_1} = (X^*_1, \ldots, X^*_{n_1})$, where $n_1 = o(n)$, drawn from the model $F^*_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{X_i \le x\}}$, the empirical d.f. associated with the original sample $\underline{X}_n$. Given an initial value $k_{aux}$, i.e., $k_{aux} = o(n)$, such that $\hat{\xi}(k_{aux})$ is a consistent estimator of $\xi$, Hall proposed the minimization of the bootstrap estimate of the Mean Squared Error (MSE) of $\hat{\xi}^*_{n_1}(k_1)$:

$$\mathrm{MSE}^*(n_1, k_1) = \frac{1}{B} \sum_{l=1}^{B} \left( \hat{\xi}^*_{n_1, l}(k_1) - \hat{\xi}(k_{aux}) \right)^2.$$

Next, the value $\hat{k}^*_0(n_1)$ is selected so that it minimizes $\mathrm{MSE}^*(n_1, k_1)$, and the tail fraction is determined from the formula:

$$\hat{k}_0 = \left\lfloor \hat{k}^*_0(n_1) \left( \frac{n}{n_1} \right)^{2/3} \right\rfloor,$$

where the exponent 2/3 corresponds to the second-order assumption ($\rho = -1$) underlying Hall's approach. In the algorithm, the number of bootstrap repetitions is denoted by $B$. Most often, $n_1$ and $k_{aux}$ are taken as $\lfloor n^{0.955} \rfloor$ and $2\sqrt{n}$, respectively (Caeiro and Gomes 2016).
Gomes and Oliveira (2001) noted that there is a disturbing sensitivity of the method to the initial value $k_{aux}$; in turn, it is almost independent of $n_1$. These facts contributed to the search for an alternative bootstrap methodology.
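A minimal sketch of the single bootstrap, reusing hill_path; the defaults (B, the exponent 0.955 and kaux = 2√n) follow the typical values quoted above, and the final rescaling uses the exponent 2/3 from the formula in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def hall_bootstrap(x, B=1000, epsilon=0.955, kaux=None):
    """Hall's single bootstrap: minimize the bootstrap MSE of the Hill
    estimator on sub-samples of size n1 = n^epsilon, then rescale k."""
    x = x[x > 0]
    n = x.size
    n1 = int(n ** epsilon)
    kaux = kaux or int(2 * np.sqrt(n))         # initial consistent level
    xi_ref = hill_path(x)[kaux - 1]            # xi_hat(kaux) on the full sample
    sq_err = np.zeros(n1 - 1)
    for _ in range(B):
        xb = rng.choice(x, size=n1, replace=True)
        sq_err += (hill_path(xb) - xi_ref) ** 2
    k1_star = int(np.argmin(sq_err / B)) + 1   # bootstrap-MSE minimizer
    return int(k1_star * (n / n1) ** (2 / 3))  # scale back to sample size n
```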
2.9. The Single Bootstrap (Himp) Procedure Proposed by Caeiro and Gomes
An improved version of Hall's bootstrap methodology was introduced by Caeiro and Gomes (2014). They proposed a single bootstrap procedure consisting of five steps:

First step. Given an observed sample $(x_1, \ldots, x_n)$, compute the estimates $\hat{\rho}$ and $\hat{\beta}$ of the second-order parameters $\rho$ and $\beta$, as described in the dAMSE algorithm.

Second step. Consider a sub-sample size $n_1 = \lfloor n^{0.955} \rfloor$. For $l$ from 1 until $B$, generate independently bootstrap samples $(x^*_1, \ldots, x^*_{n_1})$ of size $n_1$ from the empirical distribution $F^*_n$ associated with the observed data $(x_1, \ldots, x_n)$.

Third step. Denoting $T^*_{n_1}(k) := \hat{\xi}^*_{n_1}(\lfloor k/2 \rfloor) - \hat{\xi}^*_{n_1}(k)$ the bootstrap counterpart of the auxiliary statistic $T_n(k) := \hat{\xi}(\lfloor k/2 \rfloor) - \hat{\xi}(k)$, obtain $t^*_{n_1, l}(k)$, $1 \le l \le B$, the observed values of $T^*_{n_1}(k)$. For $k = 2, \ldots, n_1 - 1$, compute

$$\mathrm{MSE}^*(n_1, k) = \frac{1}{B} \sum_{l=1}^{B} \left( t^*_{n_1, l}(k) \right)^2,$$

and obtain

$$\hat{k}^*_{0|T}(n_1) := \arg\min_{2 \le k \le n_1 - 1} \mathrm{MSE}^*(n_1, k).$$

Fourth step. Compute the threshold estimate

$$\hat{k}^*_0 := \min\left( n - 1, \left\lfloor \left( 1 - 2^{\hat{\rho}} \right)^{\frac{2}{1 - 2\hat{\rho}}} \hat{k}^*_{0|T}(n_1) \left( \frac{n}{n_1} \right)^{\frac{-2\hat{\rho}}{1 - 2\hat{\rho}}} \right\rfloor + 1 \right).$$

If the bootstrap MSE does not attain a clear interior minimum, go back to the second step, generating different bootstrap samples.

Fifth step. Finally, obtain the Hill estimate at the selected level, $\hat{\xi}\left( \hat{k}^*_0 \right)$.
2.10. The Double Bootstrap (Gomes) Procedure Proposed by Gomes, Figueiredo and Neves
The double bootstrap procedure described by Gomes et al. (2012) leads to increased precision of the result with the same number $B$ of bootstrap samples generated (Caeiro and Gomes 2016). This algorithm consists of five steps; let $\hat{\xi}(k)$ denote the Hill estimator.

First step. Given an observed sample $(x_1, \ldots, x_n)$, compute the estimates $\hat{\rho}$ and $\hat{\beta}$ of the second-order parameters $\rho$ and $\beta$, as described in the dAMSE algorithm.

Second step. Consider the sub-sample sizes $n_1 = \lfloor n^{0.955} \rfloor$ and $n_2 = \lfloor n_1^2 / n \rfloor + 1$. For $l$ from 1 until $B$, generate independently bootstrap samples $(x^*_1, \ldots, x^*_{n_2})$ and $(x^*_1, \ldots, x^*_{n_2}, x^*_{n_2 + 1}, \ldots, x^*_{n_1})$ of sizes $n_2$ and $n_1$, respectively, from the empirical distribution $F^*_n$ associated with the observed data $(x_1, \ldots, x_n)$.

Third step. Denoting $T^*_{n_i}(k) := \hat{\xi}^*_{n_i}(\lfloor k/2 \rfloor) - \hat{\xi}^*_{n_i}(k)$ the bootstrap counterpart of the auxiliary statistic $T_n(k)$, obtain $t^*_{n_i, l}(k)$, $1 \le l \le B$, $i = 1, 2$, the observed values of $T^*_{n_i}(k)$. For $i = 1, 2$ and $k = 2, \ldots, n_i - 1$, compute

$$\mathrm{MSE}^*(n_i, k) = \frac{1}{B} \sum_{l=1}^{B} \left( t^*_{n_i, l}(k) \right)^2,$$

and obtain

$$\hat{k}^*_{0|T}(n_i) := \arg\min_{2 \le k \le n_i - 1} \mathrm{MSE}^*(n_i, k), \quad i = 1, 2.$$

Fourth step. Compute the threshold estimate

$$\hat{k}^*_0 := \min\left( n - 1, \left\lfloor \frac{\left( 1 - 2^{\hat{\rho}} \right)^{\frac{2}{1 - 2\hat{\rho}}} \left( \hat{k}^*_{0|T}(n_1) \right)^2}{\hat{k}^*_{0|T}(n_2)} \right\rfloor + 1 \right).$$

If the bootstrap MSE does not attain a clear interior minimum, go back to the second step, generating different bootstrap samples.

Fifth step. Finally, obtain the Hill estimate at the selected level, $\hat{\xi}\left( \hat{k}^*_0 \right)$.
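A sketch of the double bootstrap under the formulas above, reusing hill_path; the second-order parameter rho would come from the dAMSE first step (here it is passed in, defaulting to −1 purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

def t_stat_path(xb):
    """Auxiliary statistic T(k) = xi_hat(floor(k/2)) - xi_hat(k), k = 2..m-1."""
    xi = hill_path(xb)
    k = np.arange(2, xi.size + 1)
    return xi[k // 2 - 1] - xi[k - 1]

def gomes_double_bootstrap(x, B=1000, epsilon=0.955, rho=-1.0):
    """Minimize the bootstrap MSE of T(k) at two nested sub-sample sizes
    and combine the two minimizers into the optimal k for the full sample."""
    x = x[x > 0]
    n = x.size
    n1 = int(n ** epsilon)
    n2 = n1 ** 2 // n + 1
    mse1 = np.zeros(n1 - 2)
    mse2 = np.zeros(n2 - 2)
    for _ in range(B):
        xb = rng.choice(x, size=n1, replace=True)
        mse1 += t_stat_path(xb) ** 2
        mse2 += t_stat_path(xb[:n2]) ** 2      # nested smaller sub-sample
    k1 = int(np.argmin(mse1)) + 2              # k grids start at 2
    k2 = int(np.argmin(mse2)) + 2
    c = (1 - 2 ** rho) ** (2 / (1 - 2 * rho))
    return min(n - 1, int(c * k1 ** 2 / k2) + 1)
```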
2.11. The Double Bootstrap (Danielsson) Procedure Proposed by Danielsson, de Haan, Peng and de Vries
The procedure proposed by Danielsson et al. (2001) for selecting the optimal sample fraction in tail index estimation simulates the AMSE criterion of the Hill estimator using an auxiliary statistic. In the AMSE formula, the true value of ξ is unknown. To solve this problem, the theoretical ξ is replaced with a control variate. Since a simple bootstrap is inconsistent in the tail area, a sub-sample bootstrap is used. Moreover, to be able to scale the sub-sample MSE back to the sample size, a second, even smaller sub-sample bootstrap is needed. The AMSE of the control variate is

$$T(n_1, k_1) := \mathrm{E}\left[ \left( M^*_{n_1}(k_1) - 2 \left( \hat{\xi}^*_{n_1}(k_1) \right)^2 \right)^2 \right],$$

where

$$M^*_{n_1}(k_1) := \frac{1}{k_1} \sum_{i=0}^{k_1 - 1} \left( \log X^*_{(n_1 - i)} - \log X^*_{(n_1 - k_1)} \right)^2$$

and $\hat{\xi}^*_{n_1}(k_1)$ is the Hill estimator computed on the bootstrap sample. In this procedure, $n_1$ is the sub-sample size for the bootstrap (the number of bootstrap repetitions is denoted by $B$). In the first step, the $T$ function is minimized over two dimensions: $n_1$ and $k_1$. In the next step, given the optimal $n_1^*$ and $k_1^*$, a second bootstrap with a sub-sample size $n_2 = \lfloor n_1^{*2} / n \rfloor$ is made to find $k_2^*$. Finally, the optimal number of order statistics is determined as follows:

$$\hat{k}_0 = \frac{\left( k_1^* \right)^2}{k_2^*} \left( \frac{\left( \log k_1^* \right)^2}{\left( 2 \log n_1^* - \log k_1^* \right)^2} \right)^{\frac{\log n_1^* - \log k_1^*}{\log n_1^*}}.$$
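The following sketch mirrors the description above: the control-variate MSE is simulated by a sub-sample bootstrap and minimized jointly over (n1, k1) through a simple grid search over a few sub-sample sizes (our simplification), followed by the second, smaller bootstrap and the scaling formula.

```python
import numpy as np

rng = np.random.default_rng(0)

def control_variate_path(xb):
    """Values of M(k) - 2*xi_hat(k)^2 for k = 1..m-1 on a (positive) sample."""
    logs = np.log(np.sort(xb)[::-1])
    k = np.arange(1, logs.size)
    s1 = np.cumsum(logs)[:-1]          # sums of the k largest log-values
    s2 = np.cumsum(logs**2)[:-1]
    xk = logs[1:]                      # log X_(m-k)
    xi = s1 / k - xk                   # Hill estimates
    M = s2 / k - 2 * xk * s1 / k + xk**2
    return M - 2 * xi**2

def danielsson_double_bootstrap(x, B=500, eps_grid=(0.6, 0.7, 0.8, 0.9)):
    """Minimize the simulated MSE of the control variate over (n1, k1),
    repeat at n2 = n1^2/n, and scale back to the full sample size."""
    x = x[x > 0]
    n = x.size
    best = None
    for eps in eps_grid:               # candidate sub-sample sizes n1 = n^eps
        n1 = int(n ** eps)
        q = np.zeros(n1 - 1)
        for _ in range(B):
            q += control_variate_path(rng.choice(x, size=n1, replace=True)) ** 2
        k1 = int(np.argmin(q)) + 1
        if best is None or q[k1 - 1] / B < best[2]:
            best = (n1, k1, q[k1 - 1] / B)
    n1, k1, _ = best
    n2 = max(4, n1 ** 2 // n)
    q2 = np.zeros(n2 - 1)
    for _ in range(B):
        q2 += control_variate_path(rng.choice(x, size=n2, replace=True)) ** 2
    k2 = int(np.argmin(q2)) + 1
    expo = (np.log(n1) - np.log(k1)) / np.log(n1)
    k0 = (k1**2 / k2) * (np.log(k1) ** 2 / (2 * np.log(n1) - np.log(k1)) ** 2) ** expo
    return max(2, min(n - 1, int(k0)))
```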
4. Results of the Empirical Study
Table 2 presents the results of the optimal threshold selection for the left and right tails. More precisely, it shows the percentile ranks for the entire period. Table 3 and Table 4 show the results for the first and third sub-periods. For the sake of brevity, we do not report the results for the second and fourth sub-periods. Note that the results for these periods do not differ substantially from those presented in Table 3 and Table 4.
Not all methods perform well in finite samples. Some that perform well in simulation studies based on theoretical distributions may not be suitable in financial applications. We can distinguish several methods that produce very high threshold estimates and pick a small number of data in the tails. They include the MAD Dis, KS Dis, RT1, RT2, Eyeball, GH, Himp, Gomes and Danielsson approaches. These methods tend to produce thresholds above the 99th percentile (maximum above 0.99); thus, they prevent the estimation of VaR or ES at commonly accepted confidence levels. In particular, the KS Dis, the RT1 and RT2 with kmin = 2, the Eyeball with the tuning parameter w = 0.01, and the GH and Danielsson methods systematically produce high threshold estimates, since their median exceeds the 98th percentile. These findings support the results of Danielsson et al. (2016), who argued that the Eyeball and KS Dis methods tend to pick the threshold close to the maximum of the distribution. For shorter time-series (in the sub-periods), the RT1 and RT2 with kmin = 2 and the Danielsson methods are the most restrictive and systematically set the threshold at too high a level to be used in financial applications. This finding is in line with the observation by Scarrott and MacDonald (2012), who pointed out that the RT1 approach is unreliable for a small $k$ despite the weighting by $i^{\beta}$. In turn, Reiss and Thomas (2007) suggested using alternative distance metrics or weighting schemes when dealing with limited data. Methods based on minimizing the asymptotic MSE, especially the bootstrap-based methods, do not perform well in empirical studies (Danielsson et al. 2016). Similarly, Ferreira et al. (2003) noted that these methods do not give satisfactory results for samples of size under approximately 2000.
The other methods estimate the threshold more conservatively, below the 99th percentile. Such a choice of threshold guarantees enough data in the tail to calculate Value at Risk at the 99% confidence level. These are the HW, dAMSE, Hall (excluding the parametrization with B = 10,000, ε = 0.955 and the corresponding kaux), Eyeball (with appropriately chosen tuning parameters) and PS approaches. However, only two of them, i.e., the PS and HW algorithms, satisfy the Basel III requirements with reference to the Expected Shortfall, since their maxima are below the 97.5th percentile. Although HW performs well for long time-series, it fails in shorter series: in the sub-periods, the optimizing procedure does not converge for several assets, either for the upper or the lower tails. Two algorithms, i.e., PS and HW, exhibit a very high range and standard deviation. This means that the threshold is highly volatile and cannot be replaced by a fixed percentile of the dataset in practical applications. On the other hand, the Eyeball and dAMSE approaches produce less volatile estimates. The dAMSE method establishes the threshold close to the 95th percentile, while the Eyeball method is even more stable, but it indicates a much higher threshold, i.e., at the 98th percentile. Thus, we cannot decide which method is the optimal choice for financial applications; however, we can indicate those approaches which may be useful in practice.
A majority of the analyzed methods, i.e., MAD Dis, KS Dis, PS, GH, HW, Eyeball, dAMSE, Gomes and Himp, indicate a higher left threshold than the right one. Returns of a given magnitude might be perceived as extreme when they are positive, but they do not have to be recognized as extreme when they are negative. This result suggests asymmetry between the left and right tails. It appears to be general, since it holds in the entire period as well as in the sub-periods.
Having specified the distribution of the tails, the next step in our study comprises a comparison of the methods recognized as applicable in risk measurement. Two different distances between threshold estimates are compared. The results for the entire research period are presented in Table 5 and Table 6: average absolute differences are shown in Table 5 and root mean squared differences in Table 6. As can be seen, the methods may be divided into two groups. The Eyeball, dAMSE and Hall methods show a relatively large deviation from the two other methods. Note that all three of these methodologies utilize a low fraction (median above the 97th and below the 99th percentiles) of the total sample. The other group consists of the PS and HW methods. The distance between their threshold estimates is relatively small, and they both pick the threshold far away from the maximum (median below the 90th percentile).
Danielsson et al. (2016) documented the similarity in choosing optimal sample fractions between the KS Distance and Eyeball methods, and between the approach proposed by Drees and Kaufmann (1998) and the Danielsson method. However, they cast doubt on the applicability of the latter methods to real-world empirical estimations. The results for the right tails and the left tails do not indicate any significant differences, and approximately the same relative differences are preserved between the methodologies.
5. Conclusions
The selection of the threshold which separates the tail from the middle part of a return distribution is crucial in the estimation of tail-related risk measures. Unfortunately, the right threshold is unknown in empirical applications. This paper presents evidence for twelve different optimal tail selection methods in risk measurement. We selected the optimal tail fraction for daily return time-series for forty-eight world indices. We found that many methods tend to set the optimal tail above the 99th percentile; therefore, their applicability in risk management is very limited. The methods that perform well in long and relatively short time-series include the minimization of the Asymptotic Mean Squared Error (dAMSE), the single bootstrap procedure (Hall), the Eyeball method with carefully selected tuning parameters and the Path Stability (PS) method. These methods can be divided into two categories. The first three produce threshold estimates around the 98th percentile for the entire research period, and their estimates are close to one another. This level of the threshold allows the estimation of Value at Risk at the 99% confidence level, but not of the Expected Shortfall at the 97.5% confidence level recommended by the Basel Committee. In the sub-periods, the estimated thresholds are lower, and for the dAMSE and Hall (excluding the parametrization with B = 10,000, ε = 0.955 and the corresponding kaux) methods, their maximum is below the 97.5th percentile. The PS method sets the optimal threshold much lower than the other methodologies (below the 95th percentile for the entire research period and the sub-periods). The estimates of PS are relatively volatile, which suggests that the optimal threshold differs from the fixed percentage of the total sample size commonly used in the literature. The PS method is based on a rather simple algorithm; thus, it can be easily implemented in a risk management process.