A Method for Confidence Intervals of High Quantiles

High quantile estimation for heavy tailed distributions has many important applications. There are theoretical difficulties in studying heavy tailed distributions, since they often have infinite moments, and there are bias issues with the existing methods for confidence intervals (CIs) of high quantiles. This paper proposes a new estimator for high quantiles based on the geometric mean. The new estimator has good asymptotic properties, and it leads to a computational algorithm for estimating confidence intervals of high quantiles. The new estimator avoids these difficulties, improves efficiency and reduces bias. The efficiencies and biases of the new estimator relative to existing estimators are studied, and the theoretical results are confirmed through Monte Carlo simulations. Finally, applications to two real-world examples are provided.


Introduction
Extreme value analysis (EVA) was first introduced by Leonard Tippett (Fisher and Tippett, 1928 [1]). While working on how to make cotton thread stronger, Tippett realized that the strength of the weakest threads was the only factor that mattered in deciding the strength of the cotton thread. Nowadays, extreme value analysis is widely used in almost all fields, from engineering, social science, economics and traffic prediction to insurance and beyond. People are interested in extreme events in these fields, such as the shortest life span of a new engine, the maximum appreciation of the stock market, the longest driving time on a highway at rush hour, or the biggest medical claim to an insurance company. The distributions of these extreme events are usually unknown. In general, EVA involves the extrapolation of an unknown distribution and its high quantiles. Estimating high quantiles based on observations is very important in EVA, since it gives the corresponding value x for a very small exceedance probability p.
There are certain risks that are not decided by us and can barely be predicted until right before they happen, such as an earthquake, a terrorist attack, or a virus outbreak. For these events, we need risk management, which is in place to minimize, monitor, and control the impact of unfortunate events, or to maximize the realization of opportunities. Estimating the confidence interval of high quantiles plays an important role in risk management. Since a high quantile is located in the tail area, it depends heavily on the behaviour of the tail distribution, or, from the statistical point of view, on the k largest order statistics. This leads to the challenges of instability in the choice of k and of bias. There is much research in the literature on mathematical models and theory for estimating confidence intervals of high quantiles; we review it in Section 2.
This paper proposes a new method to estimate high quantile of a heavy-tailed distribution. The new method has interesting improvements compared with other existing methods. This paper makes three main contributions to methodology.
(1) This paper proposes a new estimation method based on a geometric mean with good asymptotic properties. It is consistent and stable relative to the existing methods. The paper provides a computational algorithm which overcomes the mathematical difficulties and bias problems of the estimation of confidence intervals of high quantiles of a heavy tailed distribution.
(2) This paper presents Monte Carlo simulation studies on three heavy tailed distribution models: Fréchet(0.25), GPD(0.5) and GPD(2) (GPD: generalized Pareto distribution). The simulation results confirm that the proposed method is more efficient than the existing quantile estimators.
(3) This paper uses the proposed estimation method to predict extreme values in the flu in Canada, and gamma ray from solar flare examples. It is interesting to see that these data sets fit the GPD model very well. We apply the proposed method to estimate the confidence intervals of high quantiles. The numerical results show that the proposed method gives more efficient results compared with other existing methods.
In this paper, we review several existing high quantile estimators with their behavior in Section 2. We propose a new estimator for the confidence interval of high quantiles based on the geometric mean and explore its asymptotic properties in Section 3. To compare the new estimator with the existing estimators, Section 4 presents Monte Carlo simulation results and the improvement of the proposed quantile estimator relative to existing methods. In Section 5 we apply the proposed new method to construct confidence intervals of high quantiles on flu in Canada and gamma ray examples. Finally, conclusions and discussions are given in Section 6.

Existing Estimator for High Quantiles
Heavy-tailed distributions (de Haan and Ferreira, 2006 [2]) are important to extreme value events.

Definition 1. A random variable X is said to have a heavy tail distribution if its distribution function F(x) satisfies
1 − F(x) = x^(−1/γ) L(x), x > 0,
where L(t) is a slowly varying function with lim_{t→∞} L(tx)/L(t) = 1 for all x > 0, and γ > 0 is the tail index.
Notice that we can have L(x) = (ln(x))^b, b ∈ R (de Haan and Ferreira, 2006, p. 362 [2]). Since L(t) behaves approximately as a constant c, for simplicity we assume that a heavy tailed distribution satisfies
1 − F(x) = C x^(−1/γ), C > 0, γ > 0. (1)
Heavy tailed distributions decay slower than exponential distributions and have longer tails. A tail function is defined as follows.
Definition 2. A tail function U(t) of any distribution function F(x) is defined as
U(t) = F^←(1 − 1/t) = inf{x : F(x) ≥ 1 − 1/t}, t > 1. (2)
For the heavy tailed distribution in (1), we can rewrite the tail function as U(t) = C^γ t^γ.
Definition 3. The quantile function Q(1 − p, γ) of a heavy tailed distribution F(x, γ) in (1) for a given probability 1 − p is defined by F(x_{1−p}) = 1 − p, i.e., x_{1−p} = Q(1 − p, γ).
Value at Risk (VaR) is widely used in risk management. When p is very small, x_{1−p} becomes a high quantile, the pth value at risk, and we define
VaR_{p,γ} = x_{1−p} = Q(1 − p, γ). (3)
Using the tail function in (2), we can also write VaR_{p,γ} = U(1/p). Heavy-tailed models have a compulsory infinite right endpoint. In the case of negative observations in the model, the sample size should be exclusively the number of positive observations, n_+, although some authors prefer a deterministic shift in the data so as to work only with positive values. In this paper, we work on the positive real line (0, ∞).
To estimate VaR_{p,γ}, let X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n} be the order statistics of a random sample X_1, X_2, ..., X_n. We review four high quantile estimation methods from the literature.

Quantile Function-Tail Index Method
For estimating high quantiles, we work on the log scale and estimate the tail index first (Dekkers and de Haan, 1989 [3]). Taking logarithms of the quantile in (3),
ln Q(1 − p, γ) = ln U(1/p) = γ ln(C/p). (4)
The Hill (1975) [4] estimator is a well-known consistent estimator of the tail index γ.
Definition 4. Consider the order statistics X_{n−k:n}, with k an intermediate sequence of integers. The Hill estimator is defined as
H(k) = (1/k) Σ_{i=1}^{k} [ln X_{n−i+1:n} − ln X_{n−k:n}], (5)
where k = k_n → ∞, k ∈ [1, n), k = o(n) as n → ∞. The Hill estimator γ_H = H(k) in (5) uses the k largest order statistics of a random sample. Substituting γ_H = H(k) from (5) into (4), we obtain the estimated ln (1 − p)th high quantile
ln q_H(k) = H(k) ln(C_hat/p), (6)
where C_hat is an estimator of C. This estimator depends on k: small values of k produce high volatility, whereas large values of k induce considerable bias. Hence, semi-parametric extensions may be considered to gain a degree of freedom in the trade-off between variance and bias. Note that the tail index γ is a parameter of a given distribution, and a quantile of a distribution is a function of γ.
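The Hill estimator in Definition 4 can be sketched in a few lines. This is an illustrative implementation (the function name and test sample are ours, not from the paper), checked on a Pareto sample whose true tail index is γ = 0.5:

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator H(k) of the tail index gamma, using the k largest
    order statistics: H(k) = (1/k) * sum_{i=1}^{k} [ln X_{n-i+1:n} - ln X_{n-k:n}]."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    # xs[n-k:] are the k largest order statistics; xs[n-k-1] is X_{n-k:n}
    return float(np.mean(np.log(xs[n - k:]) - np.log(xs[n - k - 1])))

# Pareto sample with true gamma = 0.5 (i.e., 1 - F(x) = x**-2), via inverse CDF
rng = np.random.default_rng(0)
sample = (1.0 - rng.uniform(size=5000)) ** -0.5
gamma_hat = hill_estimator(sample, k=200)   # should be close to 0.5
```

As the text notes, the estimate is sensitive to k: rerunning with very small or very large k illustrates the variance-bias trade-off.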

Weissman Method
Weissman (1978) [5] proposed the following semiparametric estimator of a high quantile:
Q_hat(1 − p; k) = X_{n−k:n} (k/(np))^γ.
Substituting γ_H = H(k) from (5) into the expression above, we have
ln Q_H(k) = ln X_{n−k:n} + H(k) ln(k/(np)). (7)
Without any prior indication on k, the Weissman estimator shows large volatility, since it depends on the sample fraction k. Although minimizing the bias and the MSE could serve as a criterion for selecting k, this is impractical because they are unknown. Other methods for the selection of the sample fraction k can be found in Beirlant et al. (1996) [6]; Drees and Kaufmann (1998) [7]; Guillou and Hall (2001) [8]; Gomes and Oliveira (2001) [9].
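A minimal sketch of the Weissman estimator on the log scale; the tail-index estimate is passed in as an argument (here the true γ is supplied for illustration, and the names are ours):

```python
import numpy as np

def weissman_ln_quantile(x, k, p, gamma_hat):
    """Weissman (1978) semiparametric high-quantile estimator on the log scale:
    ln Q(1-p) = ln X_{n-k:n} + gamma_hat * ln(k / (n*p))."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    return float(np.log(xs[n - k - 1]) + gamma_hat * np.log(k / (n * p)))

# Pareto sample with gamma = 0.5: the true (1-p)th quantile is p**-0.5,
# so the true ln-quantile for p = 0.001 is -0.5 * ln(0.001), about 3.45.
rng = np.random.default_rng(1)
sample = (1.0 - rng.uniform(size=5000)) ** -0.5
ln_q = weissman_ln_quantile(sample, k=200, p=0.001, gamma_hat=0.5)
```

In practice gamma_hat would itself come from an estimator such as H(k), so the volatility in k enters through both the order statistic and the index estimate.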
The optimal k value for the tail index Hill estimator H, denoted k_0, is given by formula (15) in Section 2.4, Optimal k Values.

Reduced-Bias Method
Hall and Welsh (1985) [10] proposed a second-order expansion of the tail function U in (2),
U(t) = C t^γ (1 + A(t)/ρ + o(t^ρ)), with A(t) = γ β t^ρ,
where C, γ > 0, ρ < 0 and β ≠ 0; β is the scale second-order parameter and ρ is the shape second-order parameter.
Caeiro et al. (2005, p. 122) [18] advise the use of a tuning parameter τ in the estimation of ρ. It provides higher stability, as a function of k (the number of top order statistics used), over a wide range of large k values under any stability criterion.

Optimal k Values
As discussed previously, the estimates vary as k varies, and they become very unreliable when k is large. Gomes and Pestana (2007) [17] suggested using numerically estimated optimal k values.
The optimal k for the tail index Hill estimator H(k) in (5) is k_0. The optimal k for the semiparametric quantile estimator ln Q_H(k) in (7) is k_0^{Q_H}. The optimal k for the second-order reduced-bias quantile estimator in (14), which should be larger than k_0, is k_01. By using these optimal k values, all the quantile estimators provide better results. However, with an unknown distribution and estimated second-order parameters, these numerically estimated k values are not always accurate. Since all the quantile estimators are so sensitive to the value of k, this paper proposes a new quantile estimator that does not depend on k.

New Estimator
Our goal is to improve the quantile estimators in Section 2. The existing estimation methods have bias issues and difficulty in determining k. To overcome these problems, Huang (2011) [21] proposed a new quantile estimator, which is the geometric mean of the reduced-bias quantile estimator in (14).
where X_{n−k:n} is the (k + 1)th top order statistic, γ is any consistent estimator of γ, and Q stands for the quantile function.
Based on (16), (20) can be rewritten with 0 < p < 1 and a constant α ∈ R. The term αC_p(k; β, ρ) is the adjustment term, where C_p(k; β, ρ), defined in (13), reduces bias using the second-order parameters. α is a key value that depends only on n and further reduces the bias by exploiting the behavior of the second-order parameters. We discuss the choice of α in Section 4.
Sections 3-5 will show that the new estimator ln Q^(p)_New,γ has the following good properties:
1. The new quantile estimator ln Q^(p)_New,γ has the least bias, the smallest MSE and the highest efficiency.
2. The new quantile estimator ln Q^(p)_New,γ is consistent and does not depend on k as the existing quantile estimators do.
3. The confidence interval based on the new quantile estimator ln Q^(p)_New,γ is the most efficient compared to the existing methods: it not only has the shortest interval length, but also the highest probability coverage of the true value in most cases.
Theorem 1 states that ln Q^(p)_New,H has an asymptotic normal distribution under the following condition.
Condition 1 (C1). For intermediate k, √k A(n/k) → λ, finite, as n → ∞.
The asymptotic mean, variance and efficiency of ln Q^(p)_New,H in (19) are given by the expressions in Theorem 1; see Appendix A for the proof of Theorem 1.

The C.I. for the New Estimator ln Q^(p)_New,H
where z_{1−α/2} is the (1 − α/2)th quantile of the standard normal distribution. See Appendix A for the proof of Theorem 2. In (23), the main term is ln Q^(p)_New,H.
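The interval in Theorem 2 has the usual normal-approximation form, centre ± z_{1−α/2} times a standard error. Below is a generic sketch of that construction, with the paper's specific variance expression abstracted into a single `se` argument (an assumption of ours; the exact standard error comes from Theorem 2):

```python
from statistics import NormalDist

def normal_ci(ln_q_hat, se, conf=0.95):
    """Two-sided normal-approximation confidence interval on the ln-quantile
    scale: ln_q_hat +/- z_{1-alpha/2} * se, with alpha = 1 - conf."""
    z = NormalDist().inv_cdf(0.5 + conf / 2.0)   # z_{0.975} ~ 1.96 for conf=0.95
    return ln_q_hat - z * se, ln_q_hat + z * se

# Hypothetical values for illustration only
lcl, ucl = normal_ci(3.45, 0.12)
```

Exponentiating the endpoints then gives a CI for the quantile itself.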

Computer Simulations of Quantile Estimators
To verify that the new estimator ln Q_new,H has good properties, we use simulations and compare the new estimator to the existing estimators using the following statistics:
1. The expected value E[·].
For γ > 0, an estimator of the pth ln-high quantile function is given by the new estimator in (19).
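The comparison statistics can be computed from simulated draws of each estimator as sketched below. The paper's exact REFF definition is not reproduced in this excerpt, so a plain MSE ratio is used here as an assumed stand-in (the literature sometimes uses a root-MSE ratio instead):

```python
import numpy as np

def sim_stats(draws, true_value):
    """Summary statistics for comparing estimators over simulation runs:
    the mean E[.], the bias, and the mean squared error (MSE)."""
    d = np.asarray(draws, dtype=float)
    bias = float(d.mean() - true_value)
    mse = float(np.mean((d - true_value) ** 2))
    return float(d.mean()), bias, mse

def reff(mse_new, mse_benchmark):
    """Relative efficiency of a new estimator vs a benchmark; values above 1
    mean the new estimator has the smaller MSE (ratio definition assumed)."""
    return mse_benchmark / mse_new
```

Applying sim_stats to each estimator's draws at the same true ln VaR value reproduces the kind of table compared in Section 4.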

The Choice of α
As mentioned in Section 3, α is a key value for reducing the bias of ln Q^(p)_New,γ defined in (19). We developed an algorithm to estimate α based on the results of m simulation runs. Step 1: For a fixed sample size n, α_i(n) in the ith iteration, i = 1, ..., m, m = 500, is the exact solution of the estimating equation, where ln VaR_p is the true ln VaR value. Note that α(n) depends on n.
Step 2: Obtain the estimator α(n) based on linear regression (LR) models in which α is related to n. We collect the data set (α_j, n_j), j = 1, ..., l, with sample size l.
Note that the estimate α(n) in (27) depends on the parameters of the models and on the LR relationship with the sample size n.
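Step 2 can be sketched as an ordinary least-squares fit. The specific regression model relating α to n is not reproduced in this excerpt, so a logarithmic form α(n) = a + b ln n is assumed purely for illustration:

```python
import numpy as np

def fit_alpha_model(n_values, alpha_values):
    """Least-squares fit of alpha(n) = a + b*ln(n) to the collected pairs
    (alpha_j, n_j).  The logarithmic form is an illustrative assumption."""
    X = np.column_stack([np.ones(len(n_values)),
                         np.log(np.asarray(n_values, dtype=float))])
    (a, b), *_ = np.linalg.lstsq(X, np.asarray(alpha_values, dtype=float),
                                 rcond=None)
    return float(a), float(b)

def alpha_hat(n, a, b):
    """Predicted alpha for a new sample size n."""
    return a + b * np.log(n)
```

Once (a, b) is fitted from the simulated (α_j, n_j) pairs, alpha_hat(n, a, b) plays the role of α(n) in (27) for a data set of size n.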

Remark 3.
If we assume α_j in (α_j, n_j), j = 1, ..., l, is normally distributed, then based on (Bickel and Doksum, 2015, pp. 286-388) [24], α(n) is a maximum likelihood estimator (MLE) and has an asymptotic normal distribution. Since the estimator α(n) depends only on n and is not related to the order statistics, it does not affect the asymptotic properties of the proposed estimator ln Q^(p)_New,γ.
Using (27), we compare the mean values, mean squared errors (MSE) and REFF of the four ln VaR estimators in Table 1, at the optimal level k = k_0 based on (15). The new estimator ln Q_New,γ has the best performance, with the least bias and the smallest RMSE, and it does not change as k varies. Figures 2 and 3 are for GPD(0.5) and GPD(2), with N = 500 iterations, sample size n = 1000, γ = 0.5 and 2, ρ = −γ, β = 1, and p = 1/(2n). We note that the new estimator ln Q_new,H is the best estimator here as well, with the least bias, stability as k varies, and the smallest RMSE.

Simulations of Confidence Intervals
By Gomes and Pestana (2007) [17], the 95% confidence interval of the true tail index using H is given in (28), and the 95% confidence interval of the true tail index using the reduced-bias estimator is given in (29). Next, we compute the confidence intervals for the true ln-quantile by using the quantile estimators. We use only three of the four quantile estimators in Table 1, excluding ln q_H, which has the worst results. Therefore, we compare CIs only using ln Q_H, ln Q̄_H and ln Q_new,H in (30), (31) and (23). Thus: (1) the 95% confidence interval for the true ln VaR_p using ln Q_H is given in (30), where LCL_H(k), UCL_H(k) are given in (28); (2) the 95% confidence interval for the true ln VaR_p using ln Q̄_H is given in (31), where the corresponding LCL(k), UCL(k) are given in (29); and (3) the 95% confidence interval for the true ln VaR_p using ln Q_new,H is given in (24).
To compare the newly proposed CI in (23) with the CIs in (30) and (31), we evaluate the length and the probability coverage of the CIs.
The length of a CI is given as length of CI = UCL − LCL for each quantile estimator, and the efficiency of the length of the 95% CI, EFF_length, is defined as the corresponding ratio of CI lengths. A confidence interval is also more efficient when it has a higher coverage of the true value in the simulations, where the probability coverage of the 95% CI is defined as P.C. = (number of 95% CIs containing the true value / number of 95% CIs simulated in total) × 100%, and the efficiency of the probability coverage of the 95% CI is EFF_P.C.; a bigger EFF_P.C. means a more efficient interval.
Figures 4-6 show the 95% confidence intervals of the three ln-quantile estimators under Fréchet(0.25), GPD(0.5) and GPD(2) with p = 0.0005. We compare the size of each confidence interval at its optimal k level, and the probability coverage of each confidence interval at its optimal k level. Recall that the optimal k level for ln Q_H is k_0^{Q_H} based on (16), and the optimal k level for ln Q̄_H and ln Q_new,H is k_01 based on (15). Table 5 compares the efficiencies of the 95% CIs of the three quantile estimators under Fréchet(0.25), GPD(0.5) and GPD(2). The efficiency of a 95% CI can be compared by the length of the CI and by the probability coverage of the CI, denoted by EFF_length and EFF_P.C.
The new estimator ln Q_new,H has the least bias, the smallest RMSE, and depends little on k. It also has the shortest length and the highest probability coverage among the 95% confidence intervals in most cases. The simulation results verify that ln Q_new,H is the best quantile estimator among the three methods. In the next section, we apply the new estimator ln Q_new,H to real data.
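The length-efficiency and probability-coverage computations above can be sketched as follows, assuming EFF_length is the ratio of mean lengths (reference method over candidate), which is one natural reading of the definition:

```python
def ci_compare(cis, ref_cis, true_value):
    """Length and probability-coverage comparison of simulated 95% CIs.
    Each CI is an (lcl, ucl) pair.  EFF_length > 1 means the candidate
    CIs are shorter on average than the reference method's CIs."""
    mean_len = sum(hi - lo for lo, hi in cis) / len(cis)
    ref_len = sum(hi - lo for lo, hi in ref_cis) / len(ref_cis)
    # P.C.: percentage of simulated CIs that contain the true value
    pc = 100.0 * sum(1 for lo, hi in cis if lo <= true_value <= hi) / len(cis)
    return ref_len / mean_len, pc
```

Running this over the N simulated intervals for each method gives the EFF_length and P.C. columns compared in Table 5.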

Applications
We will study two real-world examples in this section. We are interested in the population above the threshold in each example. The goal is to estimate the (1 − p)th high quantiles of each example, where 0 < p < 1 is very small. We use the four quantile estimators in Table 1, ln q_H, ln Q_H, ln Q̄_H and ln Q_new,γ in (21), and compare their performances.
A. Procedure: Step 1: Choose and collect data on examples of real-life extreme events.
Step 2: Run goodness-of-fit tests to check whether the data are heavy tailed.
Step 3: Estimate the high quantiles and construct the confidence intervals by using the new method and the existing methods.

We use α(n) in (27) for the new estimator ln Q^(p)_new,H in (19) under the GPD model.

Remark 4.
In applications, the GPD is used as a tail approximation to the population distribution from which a sample of excesses x − µ above some suitably high threshold µ is observed. The GPD is parameterized by location, scale and shape parameters µ, λ > 0 and γ, and can equivalently be specified in terms of threshold excesses x − µ or, as here, exceedances x > µ, as the three-parameter (γ, µ, λ) GPD in (34) (de Zea Bermudez and Kotz, 2010) [23]. Traditionally, the threshold was chosen before fitting, giving the so-called fixed threshold approach (Pickands, 1975 [25]; Balkema and de Haan, 1974 [26]). It is common for practitioners to assume a constant quantile level, determined by some assessment of fit across all or a subset of the datasets (Scarrott and McDonald, 2012, p. 36 [27]). In our applications, the threshold is pre-determined by physical considerations: the number of type A flu viruses detected weekly in Canada above the flu-season average, and the counts of gamma rays released from significant solar flares (M and X rated) during the Sun's active years. Although it is possible to adopt some arbitrary definition for the choice of the threshold, it is preferable not to become involved with such delicate questions. The application of the proposed method is presented in both examples for illustrative purposes.
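A minimal sketch of the GPD exceedance model used in both examples: the survival and quantile functions for parameters (γ, µ, λ), written here from the standard GPD form rather than copied from the paper's equation (34):

```python
import numpy as np

def gpd_sf(x, gamma, mu, lam):
    """GPD survival function for exceedances x > mu:
    P(X > x) = (1 + gamma*(x - mu)/lam)**(-1/gamma) for gamma != 0,
    and exp(-(x - mu)/lam) in the gamma = 0 limit."""
    y = (np.asarray(x, dtype=float) - mu) / lam
    if gamma == 0.0:
        return np.exp(-y)
    return (1.0 + gamma * y) ** (-1.0 / gamma)

def gpd_quantile(q, gamma, mu, lam):
    """qth quantile (q = 1 - p) of the GPD with parameters (gamma, mu, lam),
    obtained by inverting the survival function."""
    p = 1.0 - q
    if gamma == 0.0:
        return mu - lam * np.log(p)
    return mu + lam * (p ** -gamma - 1.0) / gamma
```

With fitted (γ, λ) and the physically chosen threshold µ, gpd_quantile(0.99, ...) gives the x_0.99 value sought in both applications.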

Flu in Canada Example
According to the WHO (World Health Organization, 2020 [28]), seasonal influenza is a common infection of the airways and lungs that can spread easily among humans. There are 37 million people in Canada, and flu season usually runs from November to April. Most people recover from the flu in about a week. However, influenza may be associated with serious complications such as pneumonia, especially in infants, the elderly and those with underlying medical conditions like diabetes, anemia, cancer, and immune suppression. On average, the flu and its complications send about 12,200 Canadians to the hospital every year, and around 3500 Canadians die. There are three types of flu viruses: A, B and C. The type A flu virus is the most harmful; it is constantly changing and is generally responsible for the large flu epidemics. The 1918 Spanish Flu, 1957 Asian Flu, 1968 Hong Kong Flu, 2009 Swine Flu, and the most recent 2014 H5N1 Bird Flu are all type A flu. In this paper, we study type A viruses in Canada.
We collected the number of type A flu viruses detected weekly in Canada, from 1 January 1997 to 31 December 2019, resulting in a sample size of n* = 994 weeks. According to the WHO, the average number of type A flu viruses detected per week in the flu season, November to April, has been 953 over the past 10 years. We set 953 viruses/week as the threshold, which reduced our sample size to n = 111 weeks. The full data set is available at http://apps.who.int/influenza/gisrs_laboratory/flunet/en. Figure 7a shows a flu chart of the n* = 994 weeks of type A flu viruses detected in Canada, and the n = 111 weeks remaining after applying the threshold of 953 flu viruses on average. Within each flu incubation period, a flu wave can last from one up to a few weeks, which is why some arches are narrow and some are more bell shaped in this figure. The top three weeks are circled in the plot. Figure 7b shows a histogram of the n* = 994 weeks of data. We are interested in the 99% quantile, x_0.99, such that there is a 99% chance that the viruses detected in a given week would be less than this value, or equivalently, a 1% possibility that the number of flu viruses detected in a given week would exceed this value. This information is useful for monitoring and studying the virus, and is helpful for medical organizations that deal with disease control and prevention, pharmaceutical availability, and hospital resource readiness, especially during a serious flu outbreak. x_0.99 is approximately located in the plot. In this paper, we propose a new high quantile estimation method and compare it with existing methods. Our interest is to find the 5% VaR and 1% VaR of the number of type A flu viruses detected in a week, and their 95% confidence intervals.

Goodness-of-Fit Test
Through the data transformation Y_i = (X_i − µ)/λ, i = 1, ..., n, n = 111, taking µ = 953 as the threshold, the maximum likelihood estimators (MLE) are λ_MLE = 1275.97287 and γ_MLE = 0.01345. Figure 8a is the log-log plot of the GPD curve, with the horizontal axis ln(x) against the vertical axis ln(P{x < X}). Visually, the transformed data fit the one-parameter GPD in (26) best using γ_MLE (red curve). Figure 8b shows that the GPD density curve (red curve) fits the histogram very well. Besides the visual inspection of Figure 8, we also carry out three goodness-of-fit tests: the Kolmogorov-Smirnov (K-S) test (Kolmogorov, 1933 [29]), the Anderson-Darling (A-D) test, and the Cramér-von Mises (C-v-M) test (Anderson and Darling, 1952 [30]). All three tests compare the empirical distribution function of the observations with the hypothesized parent distribution function, here the GPD.
The hypothesis for all three tests is H_0: F(x) = F*(x) for all x, versus H_1: F(x) ≠ F*(x) for at least one value of x, where F(x) is the true but unknown distribution of the sample and F*(x) is the theoretical distribution, in our case the parent distribution, the GPD. S_n(x) is the empirical distribution, a step function of the sample. Based on the goodness-of-fit test results in Table 6, we adopt the GPD model for the flu in Canada data. We define the absolute errors (AE) and the integrated errors (IE) as
AE = |S_n(x) − F*(x)|, (35)
IE = (1 / (X_{n:n} − X_{n−r+1:n})) ∫_{X_{n−r+1:n}}^{X_{n:n}} AE dx. (36)
Table 6. The goodness-of-fit tests under the GPD model for the flu in Canada data. For both AE and IE, we use three different r values, letting r = (n/10)th, r = (n/2)th, and r = nth top statistics. Table 7 lists the AE and IE errors, which are very small. Next, we estimate the high quantiles and their confidence intervals for this example.
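The K-S statistic described here, the maximum vertical distance between S_n(x) and F*(x), can be computed directly from the step-function definition (an illustrative sketch; the A-D and C-v-M tests mentioned above are not implemented here):

```python
import numpy as np

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic: the largest vertical distance between
    the empirical step function S_n(x) and the hypothesized CDF F*(x)."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = xs.size
    F = cdf(xs)
    d_plus = np.max(np.arange(1, n + 1) / n - F)   # S_n just after each jump
    d_minus = np.max(F - np.arange(0, n) / n)      # S_n just before each jump
    return float(max(d_plus, d_minus))
```

Passing the fitted GPD distribution function as `cdf` (after the threshold/scale transformation) yields the statistic compared against critical values in Table 6.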

Compare Four Estimation Methods
We use the four estimators in Table 1: ln q_H, ln Q_H, ln Q̄_H, and the new estimator ln Q_new,H.
We use ρ_τ(k) in (10) and β_ρ0(k) in (11). To decide whether the tuning parameter τ = 0 or 1, we consider {ρ_τ(k)}_{k∈K}, for k ∈ K = (n^0.995, n^0.999), and compute their medians. With n = 111, we get k ∈ K = (108, 110) and conclude that τ = 0; thus ρ_0(k_1) = −0.7101 and β_ρ0(k_1) = 1.026571, where k_1 is the optimal k value. Figure 9 shows the results. Figure 9a shows estimates of the second-order parameter ρ through ρ_τ(k), τ = 0; Figure 9b shows the estimates β_ρ0(k). Figure 9c shows the two estimated tail indexes: H = 0.4379 at its optimal level using k_0 = 21 based on (15), and the reduced-bias estimate 0.3736 at its optimal level using k_01 = 42 based on (17). Figure 9d shows the four quantile estimators for the flu in Canada example, with p = 0.01. The full circles "•" in the plot are the values of the quantile estimators at their optimal k level. We note that ln Q_new,H has a constant value, which does not depend on k. Figure 10 compares the confidence intervals of the three quantile estimators in (7), (12) and (19); this figure shows that the new quantile estimator ln Q_new,H has the smallest confidence interval. In Table 8, we compare the four ln-quantile estimators and their mean, median, VaR_0.05 and VaR_0.01. Table 9 compares the sizes of the confidence intervals at ln VaR_0.01 and VaR_0.01 for the three quantile estimators: comparing Q_H, Q̄_H and Q_new,H, the estimator Q_new,H has the shortest confidence interval, with the highest efficiency of 2.2462.

Summary
Based on Figure 10 and Table 9, we conclude that the new estimator ln Q_new,H in (19) is the best estimator for the flu in Canada example. We predict that at VaR_0.01 we expect 5500 type A flu viruses in a week during a flu outbreak, above the threshold of 953/week. This is shown in Figure 8b.

Gamma Ray of Solar Flare Example
Gamma rays have the most penetrating power of all radiation. Bursts of gamma rays, thought to be due to the collapse of stars called hypernovae, are the most powerful events so far discovered in the cosmos. Gamma rays are measured in counts: the number of atoms in a given quantity of radioactive material that an instrument detects to have decayed. We collected gamma ray data from solar flares, from November 2008 to September 2020, from NASA (National Aeronautics and Space Administration, 2020 [31]). The full data set is available at http://hesperia.gsfc.nasa.gov/fermi/gbm/qlook/fermi_gbm_flare_list.txt.
A solar flare travels hundreds of miles per second and can reach the Earth within hours. It can disrupt communication and navigational equipment, damage satellites, and even cause blackouts by damaging power plants. In 1989, a strong solar storm knocked out the power grid in Québec, Canada, causing 6 million people to lose power for more than 9 hours, and it cost millions of dollars to repair. It can bring additional radiation around the north and south poles, a risk that forces airlines to reroute flights. The Fermi Gamma-ray Space Telescope was launched in late 2008 to explore high-energy phenomena in the Universe. It is worth noting that more than one trigger may have occurred during a flare; the one nearest the peak of the flare is listed, resulting in a sample size of 5128. Solar flares are classified as A, B, C, M or X according to the peak flux (in watts per square meter, W/m^2) of 1 to 8 angstrom X-rays near the Earth, as measured on the GOES spacecraft (an angstrom is a unit of length equal to one ten-billionth of a meter). Gamma ray activity is correlated with X ray activity, as shown in Figure 11 (NOAA, 2020 [32]). When the amount of gamma ray released is over 5 million counts, it usually corresponds to an X rated flare or a significant M rated flare. Figure 11 is a two-week plot of gamma ray and X ray activity from July 2 to 16, 2012. Figure 12a shows a gamma ray chart of n* = 5128 flares, and the n = 104 flares remaining after the threshold of 86 million counts. The most powerful gamma ray burst was released on March 7, 2012, with nearly 1.5 billion counts; the Sun brightened by 1000 times and became the brightest object in the gamma ray sky. The top three events are circled in the chart. Figure 12b shows a histogram of the n* = 5128 flares.
We are interested in the 99% quantile, x_0.99, such that 99% of the gamma ray releases from solar flares are under this value, or equivalently, with a 1% possibility, the amount of gamma ray a solar flare releases would be in excess of this value. During the spring and fall, the satellites that are used to detect solar flares experience eclipses, in which the Earth or the Moon passes between the satellites and the Sun for a short period every day. Eclipse season lasts for about 45 to 60 days, with daily interruptions ranging from minutes to just over an hour. The quantile estimation would provide useful predictions for these times. x_0.99 is approximately located in the plot, since we do not yet know this value. We chose the threshold as the mean of the data from the peak period. The solar cycle is every 11.6 years, and the Sun's activity peaked from 2011 to 2014. In Figure 12a we can see that the top 3 flares, in fact almost 90% of the top 100 flares, are from the 2011 to 2014 period. Taking the average of all the X rated and significant M rated flares from this peak period, we obtained a mean of 86 million counts, resulting in a remaining sample size of n = 104.
For the Gamma ray of solar flare example, our goal is to find out the high quantiles, specifically, the 5% VaR and 1% VaR of the amount of gamma ray a solar flare would release, and their 95% confidence intervals.

Goodness-of-Fit Tests
Similar to the flu in Canada example, we set µ = 86 million, and obtain λ_MLE = 171.0708592 and γ_MLE = 0.2580384847. Figure 13a is a log-log plot of the gamma ray data under the GPD model, with the horizontal axis ln(x) against the vertical axis ln(P{x < X}). Figure 13b shows that the histogram fits the GPD model. Next, we perform the three goodness-of-fit tests: the Kolmogorov-Smirnov test, the Anderson-Darling test and the Cramér-von Mises test. The results are listed in Table 10; the data fit the GPD with γ_MLE best, at nearly 59%. Table 10 compares the goodness-of-fit tests under the GPD model for the gamma ray data. In Table 11, all the errors are less than 0.07 for AE and less than 0.01 for IE. Next, we compare the four high quantile estimators and their confidence intervals for this example.

Compare Four Estimation Methods
As in Example 1, we use the four quantile estimators in Table 1: ln q_H, ln Q_H, ln Q̄_H, and ln Q_new,H.
We use ρ_τ(k) and β_ρ0(k), with τ = 0; thus ρ_0(k_1) = −0.7269 and β_ρ0(k_1) = 1.0257, where k_1 is the optimal k value for the second-order parameters. The results are in Figure 14. Figure 14a shows the estimates of the second-order parameter ρ through ρ_τ(k), τ = 0. Figure 14b shows the estimates β_ρ0(k). Figure 14c shows the two tail index estimators: H = 0.5324 at its optimal level with k_0 = 21, and the reduced-bias estimate 0.6517 at its optimal level with k_01 = 41. Figure 14d shows all four quantile estimators for the gamma ray example, with p = 0.01. We note that ln Q_new,H has a constant value which does not depend on k. Figure 15 compares the confidence intervals of the ln-quantile estimators in (7), (12) and (19). This figure shows that the new quantile estimator ln Q_new,H has the smallest confidence interval, with length 1.4451, where we use α = ρ = −0.7269. The solid circles "•" in the plot are the values of the quantile estimators at their optimal k level. In Table 12, we compare all four quantile estimators under VaR_0.05 and VaR_0.01. Table 13 compares the sizes of the confidence intervals of ln VaR_0.01 and VaR_0.01 for the three quantile estimators; it shows that the new estimator has the shortest confidence interval, compared to ln Q_H and ln Q̄_H, with the highest efficiency of 1.6016.

Summary
Based on Figure 15 and Table 13, we conclude that the new estimator ln Q_new,H in (19) is the best estimator for the gamma ray example. We predict that VaR_0.01 is a gamma ray release of 1102.57 million counts, which most likely corresponds to an X rated solar flare. This is shown in Figure 13b.

Conclusions
Based on the studies in this paper, we conclude that: 1. High quantile and its CI estimation provides important information for risk management and for extreme event predictions.
2. Based on the theoretical and simulation results, the proposed new method for estimating confidence intervals of high quantiles has advantageous properties compared with the existing methods. The estimation is consistent and stable, with less error. The proposed method also provides a useful computational algorithm for readers.
3. The confidence interval of high quantile obtained by the new proposed method also has the highest efficiency compared to the existing methods, in terms of having the smallest size of confidence interval, and the highest probability coverage of the true quantile values in most cases.
4. Based on the analysis of the two real-world examples, flu in Canada and gamma ray from solar flares, we can see that the newly proposed method can be applied to many more fields, including other extreme events such as insurance claims, natural disasters, stock market predictions and pandemic disease monitoring.
Data Availability Statement: Publicly available datasets were analyzed in this study. The datasets can be found here: http://hesperia.gsfc.nasa.gov/fermi/gbm/qlook/fermi_gbm_flare_list.txt [31] and https://www.who.int/influenza/gisrs_laboratory/flunet/en [28].