An Optimal Tail Selection in Risk Measurement

Abstract: The appropriate choice of a threshold level, which separates the tails of the probability distribution of a random variable from its middle part, is considered a very complex and challenging task. This paper provides an empirical study of various methods of optimal tail selection in risk measurement. The results indicate which methods may be useful in practice for investors and financial and regulatory institutions. Some methods that perform well in simulation studies, based on theoretical distributions, may not perform well when real data are in use. We analyze twelve methods with different parameters for forty-eight world indices using returns from the period 2000–Q1 2020 and four sub-periods. The research objective is to compare the methods and to identify those which can be recognized as useful in risk measurement. The results suggest that only four tail selection methods, i.e., the Path Stability algorithm, the minimization of the Asymptotic Mean Squared Error approach, the automated Eyeball method with carefully selected tuning parameters and the Hall single bootstrap procedure, may be useful in practical applications.


Introduction
Extreme value theory (EVT) is a branch of statistics dealing with extreme values. It focuses on the asymptotic behavior of extreme values of random variables instead of the whole distribution. Such an approach enables us to model extreme values more accurately. Nowadays, EVT is widely used in many disciplines, such as hydrology, environmental sciences, engineering, insurance and finance (Akhundjanov and Chamberlain 2019; Embrechts et al. 1997; Embrechts and Schmidli 1994; Gilli and Këllezi 2006; Longin 2000; Loretan and Phillips 1994; McNeil 1999; McNeil and Frey 2000; Roth et al. 2016; Smith 2003; Vilasuso and Katz 2000; Wang et al. 2010). One of the key problems in EVT is related to the search for the optimal tail fraction. The appropriate choice of the threshold (the beginning of a tail of the distribution) is ambiguous but fundamental in the estimation of the tail index or the parameters of a Generalized Pareto Distribution (GPD) and, consequently, in the calculation of tail-related risk measures, e.g., Value at Risk (VaR) or Expected Shortfall (ES).
While there are many concepts and approaches presented in the literature, none of them has been shown to be universally suitable, and there is no single answer as to where the distribution tail begins. Searching for the tail of a distribution is always a trade-off between bias and variance. If the threshold is specified too low, the estimates exhibit a high bias: the further the threshold is from the tail, the more the empirical distribution of extrema deviates from the theoretical model. On the other hand, a threshold that is too high results in a high variance of the model estimates, since few data exceed the threshold (Coles 2001). Numerous authors apply a fixed percentile of the total sample size as the threshold, usually 10%, 7.5%, 5% or 1% of the upper statistics (Karmakar and Shukla 2014; Jones 2003; Bee et al. 2016; Fernandez 2005; Totić and Božović 2016; Gençay et al. 2003; Gençay and Selçuk 2004; Echaust and Just 2020a; Daníelsson and Morimoto 2000; Echaust 2021; Longin 2000; McNeil and Frey 2000). More sophisticated approaches to threshold selection include, among others:
• The single bootstrap (Hall) procedure proposed by Hall,
• The single bootstrap (Himp) procedure proposed by Caeiro and Gomes,
• The double bootstrap (Gomes) procedure proposed by Gomes, Figueiredo and Neves,
• The double bootstrap (Danielsson) procedure proposed by Danielsson, de Haan, Peng and de Vries.

Mean Absolute Deviation Distance Metric (MAD-Distance Metric) and Kolmogorov-Smirnov Distance Metric (KS-Distance Metric) Methods
The MAD-Distance metric (MAD Dis) and KS-Distance metric (KS Dis) methods presented by Danielsson et al. (2016) are based on minimizing the distance between the largest upper order statistics of the dataset, i.e., the distance between the empirical tail and the theoretical tail of a Pareto distribution. The tail index α of this distribution is estimated using the Hill estimator (Hill 1975):

ξ̂^H_{k,n} = 1/α̂_k = (1/k) Σ_{i=1}^{k} (log x_{n−i+1,n} − log x_{n−k,n}),

where k is the number of upper order statistics used to estimate the parameter α. The distance, which is measured in the quantile dimension, is minimized with respect to k. The optimal number of upper order statistics is called k_0. Determining k_0 is equivalent to specifying the number of extreme values (the sample size in the tail). The MAD-Distance metric method uses the mean absolute deviation penalty function

(1/T) Σ_{j=1}^{T} |x_{n−j,n} − q(j, k)|,

and the KS-Distance metric takes the maximum absolute deviation penalty function

max_{1≤j≤T} |x_{n−j,n} − q(j, k)|,

where T > k is the region over which the distance metric is measured, x_{n−j,n} is the empirical quantile, and q(j, k) = x_{n−k+1,n} (k/j)^{1/α̂_k} is the quantile estimated with the Hill estimator α̂_k. In these approaches, the size of the upper tail is denoted by ts. The motivation for the selection of these methods is that the tail-related risk measures (e.g., VaR and ES) are quantile-based concepts at a given probability level.
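The selection rule above can be sketched numerically. The following is a minimal illustration assuming NumPy; the function names and the region size T used in the test are our own choices, not the authors' implementation, and indexing conventions for the order statistics may differ slightly across implementations. Replacing the mean with the maximum of the absolute deviations gives the KS-Distance variant.

```python
import numpy as np

def hill_alpha(x_desc, k):
    """Hill estimator of the tail index alpha from the k largest observations.
    x_desc must be sorted in descending order."""
    logs = np.log(x_desc[:k]) - np.log(x_desc[k])
    return 1.0 / logs.mean()

def mad_distance_k(x, T):
    """MAD-Distance metric: pick k minimizing the mean absolute deviation
    between empirical quantiles x_{n-j,n} and Pareto quantiles q(j, k)
    over the region j = 1..T (with T > k)."""
    x_desc = np.sort(x)[::-1]
    emp = x_desc[1:T + 1]                 # empirical quantiles x_{n-j,n}, j = 1..T
    best_k, best_dist = None, np.inf
    for k in range(2, T):
        alpha = hill_alpha(x_desc, k)
        j = np.arange(1, T + 1)
        q = x_desc[k - 1] * (k / j) ** (1.0 / alpha)  # q(j,k) = x_{n-k+1,n} (k/j)^(1/alpha)
        dist = np.mean(np.abs(emp - q))
        if dist < best_dist:
            best_k, best_dist = k, dist
    return best_k
```

On a simulated Pareto sample, the routine returns the number of upper order statistics minimizing the quantile distance; `np.max` in place of `np.mean` yields the KS-Distance choice.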

The Reiss and Thomas (RT) Procedures
Reiss and Thomas (2007) proposed alternative procedures to select the optimal number of data in the tail, k_0. These approaches select the lowest upper order statistic by minimizing the expression (RT1):

(1/k) Σ_{i=1}^{k} i^β |ξ̂^H_i − median(ξ̂^H_1, …, ξ̂^H_k)|,

where ξ̂^H_i is the Hill estimator, and the tuning parameter satisfies 0 ≤ β ≤ 0.5. In practice, automated implementation of these procedures is unreliable for small k; thus, a minimum value of k, k_min, is usually used (Scarrott and MacDonald 2012).
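A minimal sketch of the RT1 minimization, assuming NumPy; the function names, the value β = 0.25 and the default search range are illustrative choices of ours, not the authors' code:

```python
import numpy as np

def hill_xi(x_desc, k):
    """Hill estimate of the extreme value index xi from the k largest values
    of data sorted in descending order."""
    return np.mean(np.log(x_desc[:k]) - np.log(x_desc[k]))

def rt1_k(x, beta=0.25, k_min=10, k_max=None):
    """Reiss-Thomas (RT1) choice of k: minimize the i^beta-weighted mean
    absolute deviation of the Hill estimates from their running median."""
    x_desc = np.sort(x)[::-1]
    if k_max is None:
        k_max = len(x) // 2
    xi = np.array([hill_xi(x_desc, k) for k in range(1, k_max + 1)])
    best_k, best_val = None, np.inf
    for k in range(k_min, k_max + 1):
        i = np.arange(1, k + 1)
        val = np.mean(i ** beta * np.abs(xi[:k] - np.median(xi[:k])))
        if val < best_val:
            best_k, best_val = k, val
    return best_k
```

The `k_min` argument implements the lower cut-off recommended by Scarrott and MacDonald (2012) for small k.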

The Path Stability (PS) Method
The Path Stability method is a heuristic algorithm proposed by Caeiro and Gomes (2016). The algorithm looks for a stable region of the sample path, i.e., the plot of the tail index estimates with respect to k. The optimal number of data in the tail, k_0, is identified by an algorithm consisting of six steps:
First step. Given an observed sample (x_1, …, x_n), compute T(k) = ξ̂^H_{k,n} (k = 1, …, n − 1) using the Hill estimator.
Second step. Take j_0 as the minimum value of j, a non-negative integer, such that the estimates from the first step, rounded to j decimal places, are distinct. Define a^{(T)}_k(j) = round(T(k), j) (k = 1, …, n − 1), the values of T(k) rounded to j decimal places.
Third step. Consider the sets of k values associated with equal consecutive values of a^{(T)}_k(j_0 − 1) and select the set with the largest range; denote by k_min and k_max the minimum and maximum values, respectively, of this set. The largest run size is k_max − k_min.
Fourth step. Consider the values T(k), k_min ≤ k ≤ k_max, now with two additional decimal places, i.e., compute T(k) = a^{(T)}_k(j_0 + 2). Obtain the mode of these values and denote by K_T the set of k-values associated with this mode.
Fifth step. Obtain k̂_T, the maximum value of K_T.
Sixth step. Finally, calculate ξ̂^{H|PS} = ξ̂^H_{k̂_T,n}.
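The six steps above can be sketched as follows. This is an illustrative reading of the algorithm (in particular of the rounding and run-detection steps, where the original wording admits more than one interpretation), assuming NumPy; all function names are ours:

```python
import numpy as np

def hill_path(x):
    """Hill estimates xi_hat(k) for k = 1..n-1 (the sample path T(k))."""
    x_desc = np.sort(x)[::-1]
    logs = np.log(x_desc)
    k = np.arange(1, len(x))
    return np.cumsum(logs[:-1]) / k - logs[1:]

def path_stability_k(x):
    """Heuristic Path Stability choice of k (sketch of the six steps)."""
    T = hill_path(x)
    # Step 2: smallest j for which the rounded estimates are not all equal
    # (one reading of "distinct")
    j0 = 0
    while len(np.unique(np.round(T, j0))) == 1:
        j0 += 1
    # Step 3: longest run of equal consecutive values at precision j0 - 1
    # (guarded at 0 so we never round to tens of units)
    a = np.round(T, max(j0 - 1, 0))
    runs, start = [], 0
    for i in range(1, len(a) + 1):
        if i == len(a) or a[i] != a[start]:
            runs.append((start, i - 1))
            start = i
    k_min_idx, k_max_idx = max(runs, key=lambda r: r[1] - r[0])
    # Step 4: within the largest run, round with two extra decimals, take the mode
    b = np.round(T[k_min_idx:k_max_idx + 1], j0 + 2)
    vals, counts = np.unique(b, return_counts=True)
    mode = vals[np.argmax(counts)]
    K_T = np.where(b == mode)[0] + k_min_idx + 1   # back to 1-based k
    # Steps 5-6: take the maximum k associated with the mode
    return int(K_T.max())
```

The returned k would then be plugged into the Hill estimator to obtain ξ̂^{H|PS}.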

The Automated Eyeball (Eyeball) Method
Another heuristic algorithm is the automated Eyeball method proposed by Resnick and Stărică (1997) and explored in Danielsson et al. (2016). The method looks for a substantial drop in the variance of the Hill plot as k increases, i.e., it finds a stable region in the Hill plot by means of a moving window. The following estimator is used to select the optimal number of data in the tail:

k̂^{eye}_0 = min{ k ∈ {2, …, n^+ − w} : (1/w) Σ_{i=1}^{w} I( |ξ̂^H_{k+i} − ξ̂^H_k| < ε ) > h },

where w is the size of the moving window (usually 1% of the full sample); n^+ is the number of positive data; ε is the size of the range in which the estimates may vary (e.g., 0.3); and h is the percentage of the estimates inside the moving window that should fall within this tolerable range (usually around 90%).
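A minimal sketch of this moving-window rule, assuming NumPy and strictly positive tail data; the function name and the exact window bookkeeping are our illustrative choices, with the tuning defaults (w = 1% of the sample, ε = 0.3, h = 0.9) taken from the text:

```python
import numpy as np

def eyeball_k(x, w_frac=0.01, eps=0.3, h=0.9):
    """Automated Eyeball choice of k: the first k whose moving window of
    Hill estimates stays within +/- eps of xi_hat(k) for a fraction > h."""
    pos = np.sort(x[x > 0])[::-1]            # positive observations, descending
    n_pos = len(pos)
    w = max(int(w_frac * n_pos), 2)          # window size, e.g. 1% of the sample
    logs = np.log(pos)
    k = np.arange(1, n_pos)
    xi = np.cumsum(logs[:-1]) / k - logs[1:]  # Hill path xi_hat(k), k = 1..n_pos-1
    for kk in range(2, n_pos - w - 1):
        window = xi[kk:kk + w]                # xi_hat(k+1) .. xi_hat(k+w)
        if np.mean(np.abs(window - xi[kk - 1]) < eps) > h:
            return kk
    return None                               # no stable region found
```

Returning `None` when no window qualifies mirrors the practical caveat that heuristic rules can fail on short or irregular series.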

The Guillou and Hall (GH) Procedure
The procedure for choosing the optimal threshold when fitting the Hill estimator of a tail exponent to extreme value data, proposed by Guillou and Hall (2001), is based on bias reduction. The optimal number of data in the tail, k_0, is identified by a procedure consisting of three steps (see, e.g., Caeiro and Gomes 2016): First step. Given an observed sample (x_1, …, x_n), compute U_i = i (log x_{n−i+1:n} − log x_{n−i:n}) (1 ≤ i ≤ k < n).
Second step. Given c_crit = 1.25, choose k̂^{GH}_0 as the first level k at which an asymptotically standard normal test statistic built from the U_i exceeds the critical value c_crit. Third step. Next, obtain ξ̂^{GH} = ξ̂^H_{k̂^{GH}_0,n}. Guillou and Hall (2001) investigated a wider range of c_crit values, but they suggest values between 1.25 and 1.5.

Minimization of the Asymptotic Mean Squared Error (dAMSE) Method
The minimization of the Asymptotic Mean Squared Error (dAMSE) method was presented by Caeiro and Gomes (2016). The optimal number of data in the tail, k_0, in this method is identified by minimizing the Asymptotic Mean Squared Error (AMSE) of the Hill estimator with respect to k. This algorithm consists of five steps: First step. Given an observed sample (x_1, …, x_n), calculate, for the tuning parameters τ = 0 and τ = 1, the second-order parameter estimates ρ̂_τ(k), which are functions of the statistics

M^{(j)}_n(k) = (1/k) Σ_{i=1}^{k} (log x_{n−i+1:n} − log x_{n−k:n})^j (j = 1, 2, 3).

Second step. Consider K = (⌊n^{0.995}⌋, …, ⌊n^{0.999}⌋). Calculate the median of {ρ̂_τ(k)}_{k∈K}, denoted χ_τ, and compute I_τ = Σ_{k∈K} (ρ̂_τ(k) − χ_τ)² (τ = 0, 1).
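The building block of the first step, the statistic M^{(j)}_n(k), can be computed directly; a minimal sketch assuming NumPy, with the function name ours (the subsequent mapping from M^{(j)} to ρ̂_τ(k) follows Caeiro and Gomes (2016) and is not reproduced here):

```python
import numpy as np

def m_stat(x, k, j):
    """Auxiliary statistic M_n^(j)(k) =
    (1/k) * sum_{i=1}^{k} (log x_{n-i+1:n} - log x_{n-k:n})^j."""
    x_desc = np.sort(x)[::-1]                      # descending order statistics
    d = np.log(x_desc[:k]) - np.log(x_desc[k])     # log-spacings above x_{n-k:n}
    return np.mean(d ** j)
```

Note that M^{(1)}_n(k) is exactly the Hill estimate ξ̂^H_{k,n}, which ties the dAMSE machinery back to the estimator being tuned.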

The Hall and Welsh (HW) Procedure
Hall and Welsh (1985) proposed a procedure to identify the optimal number of upper order statistics for the Hill estimator by minimizing the AMSE criterion. This method is based on an algorithm similar to that of the dAMSE method but with a different estimation of the second-order parameters, dependent upon three tuning parameters, σ < τ_1 < τ_2 (Caeiro and Gomes 2016). The Hall and Welsh procedure is quite sensitive to changes in the tuning parameters, but the parameter values proposed by Hall and Welsh (σ = 0.5, τ_1 = 0.9, τ_2 = 0.95) seem to work well (Gomes and Oliveira 2001). This method may be presented in five steps (see, e.g., Gomes and Oliveira 2001; Hall and Welsh 1985): First step. For σ = 0.5, τ_1 = 0.9, τ_2 = 0.95, obtain s = [n^σ], t_1 = [n^{τ_1}], t_2 = [n^{τ_2}]. Second and third steps. Obtain the estimate k̂^{HW}_0 from auxiliary statistics based on M^{(1)} computed at the levels s, t_1 and t_2 (see Gomes and Oliveira 2001 for the explicit formulas). Fourth step. If k̂^{HW}_0 ∉ [1, n), select another method; otherwise, continue. Fifth step. Finally, compute ξ̂^{HW} = ξ̂^H_{k̂^{HW}_0,n}.

The Hall Single Bootstrap (Hall) Procedure
In 1990, Hall introduced a bootstrap-based methodology for the estimation of the optimal tail fraction (see, e.g., Hall 1990; Gomes and Oliveira 2001). This method is applied to the Hall class of Pareto-type tails (Danielsson et al. 2016). Given a sample X_n = (X_1, …, X_n) from an unknown model F, and the functional ξ̂^H_{k,n} = φ_k(X_n), 1 ≤ k < n, consider the bootstrap sample X*_{n_1} = (X*_1, …, X*_{n_1}), where n_1 ≤ n, from the model F*_n(x) = (1/n) Σ_{i=1}^{n} I{X_i ≤ x}, the empirical d.f. associated with the original sample X_n. Given an initial value of k, i.e., k_aux, such that ξ̂^H_{k_aux,n} is a consistent estimator of ξ, Hall proposed the minimization of the bootstrap estimate of the Mean Squared Error (MSE) of ξ̂^H_{k,n_1}:

MSE*(n_1, k) = E*[ (ξ̂^{H*}_{k,n_1} − ξ̂^H_{k_aux,n})² ].

Next, the value k*_0(n_1) is selected so that it minimizes MSE*(n_1, k), and the tail fraction for the full sample size n is obtained by rescaling k*_0(n_1). In the algorithm, the number of bootstrap repetitions is denoted by B.
Most often, k_aux and n_1 = n^ε are taken as 2√n and n^{0.955}, respectively (Caeiro and Gomes 2016). Gomes and Oliveira (2001) noted a disturbing sensitivity of the method to the initial value k_aux, whereas the method is almost independent of the choice of n_1. These facts contributed to the search for an alternative bootstrap methodology.
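The single-bootstrap procedure can be sketched as follows, assuming NumPy and strictly positive data (e.g., tail losses). The function names are ours; k_aux = 2√n and n_1 = n^{0.955} follow the text, while the rescaling exponent 2/3 is an additional assumption (it corresponds to a second-order parameter ρ = −1 in the Hall class) rather than something stated above:

```python
import numpy as np

def hill_estimates(x_desc):
    """Hill estimates xi_hat(k), k = 1..len(x_desc)-1, for descending-sorted data."""
    logs = np.log(x_desc)
    k = np.arange(1, len(x_desc))
    return np.cumsum(logs[:-1]) / k - logs[1:]

def hall_bootstrap_k(x, B=200, eps=0.955, rng=None):
    """Hall-style single bootstrap (sketch): minimize the bootstrap MSE of the
    Hill estimator on sub-samples of size n1 = n**eps against a consistent
    reference estimate, then rescale the minimizer to the full sample size."""
    rng = np.random.default_rng(rng)
    x = np.sort(x)[::-1]
    n = len(x)
    n1 = int(n ** eps)
    k_aux = int(2 * np.sqrt(n))                   # common initial choice
    xi_ref = hill_estimates(x)[k_aux - 1]         # consistent reference estimate
    mse = np.zeros(n1 - 1)
    for _ in range(B):
        xb = np.sort(rng.choice(x, size=n1, replace=True))[::-1]
        mse += (hill_estimates(xb) - xi_ref) ** 2
    k_star = int(np.argmin(mse[1:])) + 2          # skip k = 1; 1-based k
    return int(k_star * (n / n1) ** (2.0 / 3.0))  # rescale: assumes rho = -1
```

Re-running with different `k_aux` values illustrates the sensitivity to the initial estimate reported by Gomes and Oliveira (2001).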

The Single Bootstrap (Himp) Procedure Proposed by Caeiro and Gomes
An improved version of the Hall bootstrap methodology was introduced in Caeiro and Gomes (2014). They proposed a single bootstrap procedure consisting of five steps: First step. Given an observed sample (x_1, …, x_n), compute the estimates ρ̂ and β̂ of the second-order parameters ρ and β as described in the dAMSE algorithm.
Second step. Consider a sub-sample size n_1 = o(n). For l from 1 to B, generate independently B bootstrap samples (x*_1, …, x*_{n_1}) of size n_1 from the empirical distribution function of the original sample.

The Double Bootstrap (Gomes) Procedure Proposed by Gomes, Figueiredo and Neves
The double bootstrap procedure described by Gomes et al. (2012) leads to an increased precision of the result with the same number B of bootstrap samples generated (Caeiro and Gomes 2016). This algorithm consists of five steps. Let ξ̂^H_{k,n} denote the Hill estimator. First step. Given an observed sample (x_1, …, x_n), compute the estimates ρ̂ and β̂ of the second-order parameters ρ and β as described in the dAMSE algorithm.
Third step. Denoting by T*_{k,n_i} the bootstrap counterpart of the observed values of T_{k,n}, compute these counterparts for k = 2, …, n_i − 1 and i = 1, 2.

The Double Bootstrap (Danielsson) Procedure Proposed by Danielsson, de Haan, Peng and de Vries
The procedure proposed in Danielsson et al. (2001) for selecting the optimal sample fraction in tail index estimation simulates the AMSE criterion of the Hill estimator using an auxiliary statistic. In the AMSE, ξ is unknown; to solve this problem, the theoretical ξ in the AMSE formula is replaced with a control variate. Since a simple bootstrap is inconsistent in the tail area, a sub-sample bootstrap is used. Moreover, to be able to scale the sub-sample MSE back to the full sample size, a second, even smaller sub-sample bootstrap is needed. In this procedure, n_1 = n^ε is the sub-sample size for the bootstrap (the number of bootstrap repetitions is denoted by B). In the first step, the AMSE of the control variate is minimized over two dimensions, n_1 and k_1. In the next step, given the optimal n_1 and k_1, a second bootstrap with a sub-sample size n_2 = n_1²/n is carried out to find k_2. Finally, the optimal number of order statistics is determined from k_1 and k_2 (see Danielsson et al. 2001 for the explicit formula).

Data
We employed several approaches and compared the results of the optimal tail selection for forty-eight world indices computed using twelve methods with different parameters (56 different variants of tail estimation for each asset). The source of the price data is the Polish financial service stooq.pl. The country sample encompasses both developed and emerging markets from different continents. We conducted the analysis for the period 2000–Q1 2020 and selected four sub-periods to demonstrate the robustness of our analysis with reference to the time period under consideration. The first sub-period covers the 750 initial index quotes, the second sub-period covers the next 750 quotes, the third sub-period covers 750 index quotes starting in 2008 during the subprime crisis, and the fourth sub-period covers another 750 index quotes ending at the beginning of the COVID-19 crisis. This means that for each sub-period there are 749 log-returns. In turn, the entire period contains from 4520 to 5176 log-returns, depending on the number of trading days in the particular market. The basic characteristics for the entire research period are given in Table 1. The average return ranges from −0.05% for Greece to 0.08% for Argentina. Higher volatility is found for the emerging markets, especially for Ukraine, Russia, Turkey and Argentina. Our research period covers two crises, i.e., the global financial crisis and the COVID-19 crisis. Table 2 presents the results of the optimal threshold selection for the left and right tails; more precisely, it shows the percentile ranks for the entire period. Tables 3 and 4 show the results for the first and third sub-periods. For the sake of brevity, we do not report the results for the second and fourth sub-periods; the results for these periods do not differ substantially from those presented in Tables 3 and 4. Not all methods perform well in finite samples.
Some methods that perform well in simulation studies based on theoretical distributions may not be suitable in financial applications. We can distinguish several methods that produce very high threshold estimates and pick a small number of data in the tails. They include the MAD Dis, KS Dis, RT1, RT2, Eyeball, GH, Himp, Gomes and Danielsson approaches. These methods tend to produce thresholds above the 99th percentile (max. above 0.99); thus, they prevent an estimation of VaR or ES at commonly accepted confidence levels. In particular, the KS Dis, RT1 and RT2 for k_min = 2, Eyeball with the tuning parameter w = 0.01, GH and Danielsson methods systematically produce high threshold estimates, since the median exceeds the 98th percentile. The findings support the results of Danielsson et al. (2016), who argued that the Eyeball and KS Dis methods tend to pick the threshold close to the maximum of the distribution. For shorter time series (in sub-periods), the RT1, RT2 for k_min = 2 and Danielsson methods are the most restrictive and systematically set the threshold at levels too high to be used in financial applications. This finding is in line with the observation by Scarrott and MacDonald (2012), who pointed out that the RT1 approach is unreliable for a small k despite the weighting by i^β. In turn, Reiss and Thomas (2007) suggested using alternative distance metrics or weighting schemes when dealing with limited data. Methods based on minimizing the asymptotic MSE, especially the bootstrap-based methods, do not perform well in empirical studies (Danielsson et al. 2016). Similarly, Ferreira et al. (2003) noted that these methods do not give satisfactory results for samples of size under approximately 2000.

Results of the Empirical Study
The other methods estimate the threshold in a more conservative way, below the 99th percentile. Such a choice of threshold guarantees enough data in the tail to calculate Value at Risk at the 99% confidence level. These are the HW, dAMSE, Hall (excluding parameters satisfying B = 10,000, ε = 0.955, k_aux = √n^+), Eyeball (w = 0.025, h = 0.9, ε = 0.3) and PS approaches. However, only two of them, i.e., the PS and HW algorithms, satisfy the Basel III requirements with reference to the Expected Shortfall, since their maxima are below the 97.5th percentile. Although HW performs well for long time series, it fails in shorter series: in sub-periods, the optimizing procedure does not converge for several assets for either the upper or lower tails. Two algorithms, i.e., PS and HW, exhibit a very high range and standard deviation. This means that the threshold is highly volatile and cannot be substituted by a fixed percentile of the dataset in practical applications. On the other hand, the Eyeball and dAMSE approaches produce less volatile estimates. The dAMSE method establishes the threshold close to the 95th percentile, while the Eyeball method is even more stable, but it indicates a much higher threshold, i.e., at the 98th percentile. Thus, we cannot decide which method is the optimal choice for financial applications; however, we can indicate those approaches which may be useful in practice. A majority of the analyzed methods, i.e., MAD Dis, KS Dis, PS, GH, HW, Eyeball, dAMSE, Gomes and Himp, indicate a higher left threshold than the right one. High returns might be perceived as extreme when they are positive, but they do not have to be recognized as extreme when they are negative. This result suggests asymmetry between the left and right tails; it seems to be general, since it holds in the entire period as well as in the sub-periods.
Having specified the distribution of tails, the next step in our study comprises a comparison of the methods recognized as applicable in risk measurement. Two different distances between threshold estimates are compared. The results for the entire research period are presented in Tables 5 and 6: average absolute differences are shown in Table 5 and root mean squared differences in Table 6. As can be seen, the methods may be divided into two groups. The Eyeball, dAMSE and Hall methods show a relatively large deviation from the two other methods. Note that all three above-mentioned methodologies utilize a low fraction (median above the 97th and below the 99th percentiles) of the total sample. The other group consists of the PS and HW methods. The distance between their threshold estimates is relatively small, and they both pick the threshold far away from the maximum (median below the 90th percentile). Danielsson et al. (2016) documented the similarity in choosing optimal sample fractions between the KS Distance and Eyeball methods and between the approach proposed by Drees and Kaufmann (1998) and the Danielsson method. However, they cast doubt on the applicability of the latter methods for real-world empirical estimations. The results for the right tails and the left tails do not indicate any significant differences, and approximately the same relative differences are preserved between the methodologies.

Conflicts of Interest:
The authors declare no conflict of interest.