Article

Bayesian Estimation of Extreme Quantiles and the Distribution of Exceedances for Measuring Tail Risk

by
Douglas E. Johnston
Department of Applied Mathematics, Farmingdale State College, Farmingdale, NY 11735, USA
J. Risk Financial Manag. 2025, 18(12), 659; https://doi.org/10.3390/jrfm18120659
Submission received: 17 October 2025 / Revised: 12 November 2025 / Accepted: 15 November 2025 / Published: 21 November 2025
(This article belongs to the Special Issue Tail Risk and Quantile Methods in Financial Econometrics)

Abstract

Estimating extreme quantiles and the number of future exceedances is an important task in financial risk management. More important than estimating the quantile itself is to ensure zero coverage error, which implies the quantile estimate should, on average, reflect the desired probability of exceedance. In this research, we show that for unconditional distributions isomorphic to the exponential, a Bayesian quantile estimate results in zero coverage error. This compares to the traditional maximum likelihood method, where the coverage error can be significant under small sample sizes even though the quantile estimate is unbiased. More generally, we prove a sufficient condition for an unbiased quantile estimator to result in coverage error, and we show our result holds by virtue of using a Jeffreys prior for the unknown parameters and is independent of the true prior. We derive a new predictive distribution, and its moments, for the number of quantile exceedances, and highlight its superior performance. We extend our results to the conditional tail of distributions with asymptotic Paretian tails and, in particular, those in the Fréchet maximum domain of attraction, which are typically encountered in finance. We illustrate our results using simulations for a variety of light- and heavy-tailed distributions.

1. Introduction

One of the key statistical tasks in many applications is the estimation of prediction intervals, or quantiles. In fields as diverse as target detection (Broadwater & Chellappa, 2010), communication systems (Resnick & Rootzén, 2000), image analysis (Roberts, 2000), power systems (Shenoy & Gorinevsky, 2015), and hydrology (Nerantzaki & Papalexiou, 2022), estimating extreme quantiles is vital. In the field of finance, quantile estimation (i.e., VaR analysis) is a standard tool used by risk managers, actuaries, and bank supervisors (Johnston & Djurić, 2011). In addition, quantile estimation is a key ingredient in other risk measures such as expected shortfall or the distribution of quantile exceedances (McNeil et al., 2005). The latter is particularly important since risk managers must assess not only the severity of extreme financial losses but also their frequency to ensure adequate capital and regulatory compliance.
In finance, the underlying data typically exhibit extreme, or heavy-tailed, behavior where the underlying probability distribution decays according to a power-law. The exponent, or tail-index, is usually estimated from past data using methods such as maximum likelihood or the method of moments (Smith, 1987; Hosking & Wallis, 1987), and the resulting quantile estimators have been analyzed (Buishand, 1989). Once an estimate for the tail-index is obtained, it is typically used to make inferences regarding prediction intervals, or quantile exceedances, of future events, and poor small-sample performance has been noted (S. G. Coles & Dixon, 1999). While Bayesian methods have been employed, the focus has primarily been on the adoption of prior information and the use of loss functions (S. G. Coles & Powell, 1996) or on quantile regression (Yu & Moyeed, 2001). Estimating the distribution of exceedances over an order statistic, or a record, has also been studied (Gumbel & von Schelling, 1950; Wesolowski & Ahsanullah, 1998), where distribution-free methods have been derived.
Our research is related to this body of work and addresses limitations in existing methods. First, we derive a quantile estimate that results in zero coverage error so that, regardless of sample size, the number of exceedances is unbiased. This is crucial for risk assessment and an improvement over existing methods, such as the maximum likelihood approach, which are shown to understate the actual risk of a quantile exceedance. Second, we derive, in analytic form, the discrete distribution for the number of exceedances. This is a novel result and is used to analyze the risk of multiple exceedances, which, in financial markets, can lead to significant drawdowns and impairments. Other methods, such as using the empirical distribution, are suboptimal as they do not exploit the inherent model structure. Lastly, we extend our results to heavy-tailed distributions that are typical of financial market data. By definition, this results in small sample sizes, for which our approach is ideally suited. Given that the prime users of quantile estimates are risk managers, our research is a significant step forward in more accurate risk assessment.
Our paper proceeds as follows. In Section 2.1, we derive a Bayesian quantile estimate by marginalizing the predictive distribution of future samples over the parameter posterior given past observations. In particular, we show that for unconditional distributions isomorphic to the exponential, a Bayesian quantile estimate results in zero coverage error (ZCE), which means that a quantile estimate at level $\alpha$ will be exceeded, on average, $100(1-\alpha)\%$ of the time. This compares to the traditional maximum likelihood method (MLM), where the coverage error can be significant under small sample sizes even though the maximum likelihood estimate is unbiased.
A similar quantile estimate was obtained in (Yu & Ally, 2009), although they derived their estimator by requiring zero coverage error and then solving for the appropriate estimator. Our result holds by virtue of using a Bayesian approach with the Jeffreys prior for the unknown parameters and is independent of the true prior for the model parameters. This illustrates the probability-matching property of the Jeffreys prior in computing Bayesian prediction intervals (Datta et al., 2000). In addition, we prove a sufficient condition for an unbiased quantile estimator to result in coverage error. Given that most distributions encountered in practice satisfy the condition, this implies that seeking an unbiased estimator is not ideal.
In Section 2.2, we derive an expression for the distribution, and moments, of future exceedances. We obtain a new discrete distribution, which we term BEG; it resembles the distribution for record exceedances found in (Wesolowski & Ahsanullah, 1998; Bairamov, 1996), although in our case we deal with the exceedance over a quantile estimate, which is not necessarily related to past records. We compare the BEG distribution to that obtained with the MLM and in (Gumbel & von Schelling, 1950), which is for exceedances over an order statistic. The latter is a distribution-free method but only readily applies for levels of $\alpha$ associated with empirical quantiles. As shown in (Hall & Rieck, 2001), such non-parametric methods can be improved on to allow extrapolation to extreme quantiles, but they do not result in ZCE. Our method not only results in ZCE but also has lower moments and variance. In Section 2.3, we show that our results can be extended to any distribution that can be transformed into the exponential, which includes, for example, the Rayleigh and standard Pareto distributions.
In Section 3, we apply our results to the conditional tail of distributions that are in the Fréchet maximum domain of attraction, which includes many heavy-tailed distributions encountered in finance and risk management. We model the point process of exceedances over a threshold u, as a Poisson process where the predictive posterior distribution for the number of future exceedances is a negative binomial and the exceedance level is a Lomax (Pareto 2) distribution. From this, we derive the Bayesian estimate with zero coverage error. Lastly, in Section 4, we illustrate the performance for a variety of distributions, both heavy-tailed and not.

2. Unconditional Quantile Estimation

In this section, we derive unconditional, predictive quantile estimates and the exceedance distribution for the exponential case, making use of all the available data. In particular, we assume there are n past observations from which we wish to make inferences about N future observations. We show that not only is the derived Bayesian predictive interval superior, but that it can also be applied to other distributions that are transformable to the exponential.

2.1. Bayesian and ML Quantile Estimation

Let $X_1, \dots, X_n \equiv X_{1:n}$ and $Y_1, \dots, Y_N \equiv Y_{1:N}$ be independent random variables with common distribution function $F(\tau \mid \lambda) = 1 - e^{-\lambda \tau}$. From the past observations, $x_{1:n}$, we estimate quantiles of future observations, $y_{1:N}$, based on a predictive cdf $\hat{F}(y \mid x_{1:n})$ that is inverted at the level $\alpha \in (0,1)$ so that $\hat{\eta}_\alpha = \hat{F}^{-1}(\alpha \mid x_{1:n})$. One traditional method to compute a predictive cdf is the maximum likelihood (ML) method (MLM), which uses the underlying cdf of the training data, $F(x \mid \lambda)$, with the ML estimates substituted for the unknown parameters. For the exponential case, this results in
$$\hat{F}(y \mid x_{1:n}) = 1 - e^{-\hat{\lambda} y}, \qquad \hat{\lambda} = \frac{n}{\sum_i x_i},$$
where the summation, $\sum_i x_i$, runs from $i = 1$ to $n$. The ML quantile estimate is then derived as
$$\hat{\eta}_{\alpha,ML} = -\frac{\log(1-\alpha)}{\hat{\lambda}} = -\frac{\log(1-\alpha)}{n} \sum_i x_i,$$
and the MLM estimate is unbiased,
$$E\{\hat{\eta}_{\alpha,ML}\} = -\frac{\log(1-\alpha)}{n} E\Big\{\sum_i x_i\Big\} = -\frac{\log(1-\alpha)}{\lambda} = \eta_\alpha,$$
and $\hat{\eta}_{\alpha,ML}$ converges to $\eta_\alpha$ in probability.
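As a minimal numeric sketch of the ML recipe above (the rate, sample size, and seed are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, alpha = 2.0, 50, 0.99              # illustrative true rate and settings

x = rng.exponential(scale=1.0 / lam, size=n)   # past observations x_{1:n}
lam_hat = n / x.sum()                          # ML estimate of the rate
eta_ml = -np.log(1.0 - alpha) / lam_hat        # ML quantile estimate
```

With a fixed dataset this is a single plug-in computation; the coverage behavior discussed below only appears under repeated sampling.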
A Bayesian approach is to compute the expected cdf under the posterior for $\lambda$, $P(\lambda \mid x_{1:n})$, which is then inverted to obtain quantile estimates. For $n$ i.i.d. samples from $\mathrm{Exp}(\lambda)$ and the Jeffreys prior ($P(\lambda) \propto 1/\lambda$), the posterior is a gamma distribution, $\Gamma(n, \sum_i x_i)$, and the predictive cdf is then defined as
$$\hat{F}(y \mid x_{1:n}) = E_{\lambda \mid x_{1:n}}[F(y \mid \lambda)] = \int F(y \mid \lambda)\, P(\lambda \mid x_{1:n})\, d\lambda,$$
which is the expected cdf marginalized over the posterior. Substituting in the distributions, we have
$$\hat{F}(y \mid x_{1:n}) = \frac{(\sum_i x_i)^n}{\Gamma(n)} \int_0^\infty (1 - e^{-\lambda y})\, \lambda^{n-1} e^{-\lambda \sum_i x_i}\, d\lambda,$$
and the predictive posterior is
$$\hat{F}(y \mid x_{1:n}) = 1 - \frac{1}{\big(1 + y / \sum_i x_i\big)^n}.$$
This is a Pareto II (Lomax) distribution, which we invert to obtain
$$\hat{\eta}_{\alpha,Bayes} = \Big[(1-\alpha)^{-1/n} - 1\Big] \sum_i x_i,$$
and we can compare to the unbiased ML
$$\hat{\eta}_{\alpha,ML} = \log\bigg[\frac{1}{(1-\alpha)^{1/n}}\bigg] \sum_i x_i.$$
Both estimators can be written as $\Psi \sum_i x_i$, with $\Psi > 0$, and, since $\log(x) < x - 1$ for $x \neq 1$, $\hat{\eta}_{\alpha,Bayes} > \hat{\eta}_{\alpha,ML}$ for $0 < \alpha < 1$ and, therefore, the Bayesian quantile estimate is biased. That said, $\hat{\eta}_{\alpha,Bayes} \to \hat{\eta}_{\alpha,ML}$ as $n \to \infty$, so it is asymptotically unbiased and converges in probability to the true quantile.
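The two closed-form estimates differ only through the scale factor $\Psi$, which the following sketch compares on an illustrative sample:

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 50, 0.99
x = rng.exponential(scale=0.5, size=n)     # illustrative training data
S = x.sum()                                # sufficient statistic

psi_bayes = (1.0 - alpha) ** (-1.0 / n) - 1.0   # from inverting the Lomax cdf
psi_ml = -np.log(1.0 - alpha) / n               # from the plug-in exponential cdf
eta_bayes, eta_ml = psi_bayes * S, psi_ml * S
# log(x) < x - 1 for x != 1 implies the Bayesian quantile is always larger
assert eta_bayes > eta_ml
```

Both estimators scale the same sufficient statistic, so the ordering holds for every dataset, not just this draw.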
The convergence of the quantile estimators implies that the conditional probability of a future sample, $Y$, exceeding $\hat{\eta}_\alpha$ is $e^{-\lambda \hat{\eta}_\alpha}$ and, since $\hat{\eta}_\alpha \to \eta_\alpha = -\log(1-\alpha)/\lambda$ as $n \to \infty$, we have $P[Y > \hat{\eta}_\alpha \mid x_{1:n}] \to 1 - \alpha$. By the law of iterated expectations, we can extend this to the unconditional probability of an exceedance. Given enough samples, either quantile estimate can be reliably used; however, for small or even modest sample sizes and high quantiles, the quantile estimates and their coverage error can be substantially different.
Given a quantile estimate that is a function of our data, $\hat{\eta}_\alpha(x_{1:n})$, we wish to compute $E[N_\alpha(y_{1:N})]$, the expected number of test samples that are greater than $\hat{\eta}_\alpha(x_{1:n})$. For convenience, we drop the explicit dependence of $\hat{\eta}_\alpha$ and $N_\alpha$ on the samples $x_{1:n}$ and $y_{1:N}$, respectively, and use $\Sigma \equiv \sum_i^n x_i$. The expectation is with respect to the training and test samples as well as the model parameter,
$$E[N_\alpha] = E_{\lambda, \Sigma, y_{1:N}}[N_\alpha] = E_{\lambda, \Sigma}\big[E_{y_{1:N} \mid \lambda, \Sigma}[N_\alpha]\big].$$
Conditioned on $\lambda, \Sigma$, the cdf of a test sample is $F(y \mid \lambda, \Sigma)$ and $N_\alpha$ is a binomial random variable with
$$E_{y_{1:N} \mid \lambda, \Sigma}[N_\alpha] = N \big(1 - F(\hat{\eta}_\alpha \mid \lambda, \Sigma)\big) = N\, P(Y > \hat{\eta}_\alpha \mid \lambda, \Sigma),$$
and therefore
$$E[N_\alpha] = N \int_{\lambda, \Sigma} \big(1 - F(\hat{\eta}_\alpha \mid \lambda, \Sigma)\big) f(\lambda, \Sigma)\, d\lambda\, d\Sigma = N\, P(Y > \hat{\eta}_\alpha),$$
where $f(\lambda, \Sigma)$ is the joint density of the training samples and model parameters. We integrate over $\Sigma$ first using $f(\lambda, \Sigma) = f(\Sigma \mid \lambda) f(\lambda)$, where $f(\lambda)$ is the true, unknown prior distribution for $\lambda$, not the prior used to form the quantile estimate. Therefore,
$$E[N_\alpha] = N \int_\lambda \bigg[\int_\Sigma \big(1 - F(\hat{\eta}_\alpha \mid \lambda, \Sigma)\big) f(\Sigma \mid \lambda)\, d\Sigma\bigg] f(\lambda)\, d\lambda,$$
and, since both quantile estimators are of the form $\Psi \Sigma$, we can write
$$E_{y_{1:N} \mid x_{1:n}, \lambda}[N_\alpha] = N e^{-\lambda \Psi \Sigma}.$$
We first marginalize over $\Sigma$ conditioned on $\lambda$, which has a gamma distribution, $f(\Sigma \mid \lambda) \sim \Gamma(n, \lambda)$, to solve the inner integral of Equation (12) as
$$\int_0^\infty e^{-\lambda \Psi \Sigma}\, \frac{\lambda^n}{\Gamma(n)} \Sigma^{n-1} e^{-\lambda \Sigma}\, d\Sigma = \frac{1}{(\Psi + 1)^n}.$$
Interestingly, this integral does not involve $\lambda$, which we have yet to marginalize over and, therefore, regardless of $f(\lambda)$, we can state
$$E[N_\alpha] = \frac{N}{(\Psi + 1)^n}.$$
Further, since the Bayesian quantile estimator has $\Psi = (1-\alpha)^{-1/n} - 1$, we obtain
$$E[N_\alpha] = N(1 - \alpha) = N\, P(y > \hat{\eta}_{\alpha,Bayes}),$$
which shows that the Bayesian quantile estimator has zero coverage error (ZCE), with $P(y > \hat{\eta}_{\alpha,Bayes}) = 1 - \alpha$, for all $n$. Zero coverage error is achieved by virtue of using the Jeffreys prior in forming the Bayesian estimate. The ML quantile estimator, while unbiased, results in $E[N_\alpha] = N\, P(y > \hat{\eta}_{\alpha,ML}) > N(1-\alpha)$. That is, under repeated sampling, the ML quantile estimate will produce too many exceedances.
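The coverage result can be checked by Monte Carlo: under repeated sampling the Bayesian estimate averages $N(1-\alpha)$ exceedances while the ML estimate averages more. The sample sizes, run count, and seed below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, alpha, runs = 20, 100, 0.99, 20000    # small n to expose the coverage error

psi_b = (1 - alpha) ** (-1 / n) - 1         # Bayesian scale factor
psi_m = -np.log(1 - alpha) / n              # ML scale factor

S = rng.exponential(1.0, size=(runs, n)).sum(axis=1)   # training sums, one per run
y = rng.exponential(1.0, size=(runs, N))               # future test samples
n_bayes = (y > (psi_b * S)[:, None]).sum(axis=1).mean()
n_ml = (y > (psi_m * S)[:, None]).sum(axis=1).mean()
# n_bayes is close to N(1 - alpha) = 1, while n_ml is noticeably larger
```

The Bayesian average matches the theory for any true rate, since the derivation above is independent of $f(\lambda)$.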
Figure 1 shows the expected exceedances using the MLM quantile estimator versus the number of training samples, $n$, for $\alpha = 99\%$, $99.9\%$, and $99.99\%$, where the number of test samples was chosen to be $N = 1/(1-\alpha)$. In this case, the Bayes estimator will have $E[N_\alpha] = 1$, so we may justifiably call the Bayesian quantile estimate a one in $N$-year event. We can see that the MLM's probability coverage error persists and that a large number of training samples ($n \geq 250$) are needed for effective high-quantile estimation.
A natural question to ask is under what conditions will $P(y > \hat{\eta}_\alpha) > 1 - \alpha$. A sufficient condition is that the quantile estimator is unbiased and the cdf $F(\cdot)$ is analytic and concave. Since the distribution tails we typically encounter have either exponential or power-law decay, such as the Pareto, the assumptions on $F$ are fairly benign.
Since $N_\alpha$ is a binomial RV, $E[N_\alpha \mid \hat{\eta}_\alpha] = N(1 - F(\hat{\eta}_\alpha))$. If we assume that $F$ is analytic and concave, then
$$F(\hat{\eta}_\alpha) < F(\eta_\alpha) + F'(\eta_\alpha)(\hat{\eta}_\alpha - \eta_\alpha) \quad \forall\, \hat{\eta}_\alpha \neq \eta_\alpha,$$
and
$$F(\hat{\eta}_\alpha) = F(\eta_\alpha) + F'(\eta_\alpha)(\hat{\eta}_\alpha - \eta_\alpha) + R(\hat{\eta}_\alpha; \eta_\alpha),$$
with the residual $R(\hat{\eta}_\alpha; \eta_\alpha) < 0$. Given
$$E[N_\alpha] = E_{\hat{\eta}_\alpha}\big[E[N_\alpha \mid \hat{\eta}_\alpha]\big] = N\, E_{\hat{\eta}_\alpha}[1 - F(\hat{\eta}_\alpha)],$$
we have
$$E[N_\alpha] = N\, E_{\hat{\eta}_\alpha}\Big[1 - \big(F(\eta_\alpha) + F'(\eta_\alpha)(\hat{\eta}_\alpha - \eta_\alpha) + R(\hat{\eta}_\alpha; \eta_\alpha)\big)\Big].$$
Since the estimate is assumed unbiased, this results in
$$E[N_\alpha] = N\big(1 - F(\eta_\alpha) - E[R(\hat{\eta}_\alpha; \eta_\alpha)]\big).$$
Since $R(\hat{\eta}_\alpha; \eta_\alpha) < 0$ for $\hat{\eta}_\alpha \neq \eta_\alpha$, then as long as $P(\hat{\eta}_\alpha \neq \eta_\alpha) > 0$, we have $E[R(\hat{\eta}_\alpha; \eta_\alpha)] < 0$ and
$$E[N_\alpha] > N(1 - F(\eta_\alpha)) = N(1 - \alpha).$$
Therefore, under mild conditions, an unbiased quantile estimator results in probability coverage error.

2.2. The Distribution of Exceedances

Similarly to computing $E[N_\alpha]$, we can calculate $P(N_\alpha = k)$ for $k = 0, 1, \dots, N$. Conditioned on $\Sigma$ and $\lambda$, $N_\alpha$ is a binomial RV, $\mathrm{Bin}(k; N, p)$, with $p = P(Y > \hat{\eta}_\alpha \mid \Sigma, \lambda) = e^{-\lambda \Psi \Sigma}$. Thus,
$$P(N_\alpha = k \mid \Sigma, \lambda, N) = \binom{N}{k} \big(e^{-\lambda \Psi \Sigma}\big)^k \big(1 - e^{-\lambda \Psi \Sigma}\big)^{N-k},$$
which, as before, we integrate over the joint density $f(\Sigma, \lambda) = f(\Sigma \mid \lambda) f(\lambda)$. First integrating over $f(\Sigma \mid \lambda)$ yields
$$\int_0^\infty \binom{N}{k} \big(e^{-\lambda \Psi \Sigma}\big)^k \big(1 - e^{-\lambda \Psi \Sigma}\big)^{N-k}\, \frac{\lambda^n}{\Gamma(n)} \Sigma^{n-1} e^{-\lambda \Sigma}\, d\Sigma,$$
and using the binomial expansion produces an expression independent of $\lambda$; therefore,
$$P(N_\alpha = k) = \binom{N}{k} \sum_{j=0}^{N-k} (-1)^{N-k-j} \binom{N-k}{j} \frac{1}{\big(\Psi(N-j) + 1\big)^n}.$$
We term this the Binomial-Exponential-Gamma (BEG) distribution, $\mathrm{BEG}(k; n, N, \alpha)$, where $\Psi$ depends on the quantile estimation method. The $k$'th moment for the BEG distribution can be derived as
$$E[N_\alpha^k] = \sum_{i=0}^{k} {k \brace i} \frac{N^{\underline{i}}}{(i\Psi + 1)^n},$$
where ${k \brace i}$ are Stirling numbers of the second kind and $N^{\underline{i}}$ is the $i$'th falling power of $N$. We note that all Bayes moments are lower than with the MLM. Also, given that the true binomial distribution of exceedances has moments
$$E[N_\alpha^k] = \sum_{i=0}^{k} {k \brace i} N^{\underline{i}} (1 - \alpha)^i,$$
the Bayesian choice for $\Psi$ can be viewed through the lens of the method of moments, where we match the mean of the exceedance distribution. We can derive the variance of the number of exceedances as
$$\mathrm{Var}[N_\alpha] = E[N_\alpha]\big(1 - E[N_\alpha]\big) + \frac{N(N-1)}{(2\Psi + 1)^n}.$$
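The BEG pmf can be evaluated directly from its closed form; a sketch with illustrative (and deliberately modest) $n$ and $N$, since the alternating sum becomes ill-conditioned in double precision for large $N$:

```python
from math import comb

def beg_pmf(k, n, N, psi):
    """P(N_alpha = k) for BEG(k; n, N, alpha), where psi is the scale
    factor of the quantile estimator (Bayes or ML)."""
    return comb(N, k) * sum(
        (-1) ** (N - k - j) * comb(N - k, j) / (psi * (N - j) + 1.0) ** n
        for j in range(N - k + 1)
    )

n, N, alpha = 10, 20, 0.9                  # illustrative settings
psi = (1 - alpha) ** (-1 / n) - 1          # Bayesian (ZCE) choice of psi
pmf = [beg_pmf(k, n, N, psi) for k in range(N + 1)]
# the probabilities sum to one and, for the Bayesian psi,
# the mean matches N * (1 - alpha)
```

For larger $N$, higher-precision arithmetic (or a recursive evaluation of the finite differences) would be needed to tame the cancellation in the alternating sum.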
On the left of Figure 2 are the theoretical distributions, $\mathrm{BEG}(k; 50, 100, 99\%)$, for both the Bayes and ML quantile estimators with $n = 50$ training samples, $N = 100$ test samples, and $\alpha = 99\%$. For all $k$, $P(N_\alpha > k)$ is higher for the ML-derived BEG distribution, with $E[N_\alpha] = 1.22$ and $\mathrm{Var}[N_\alpha] = 2.11$, compared to the Bayesian-derived BEG distribution, with $E[N_\alpha] = 1$ and $\mathrm{Var}[N_\alpha] = 1.46$. We note that for this case, since $n < N$ and $\alpha = 1 - 1/N$, an empirical quantile estimate using the highest-order statistic is not directly applicable.
On the right of Figure 2 are the $\mathrm{BEG}(k; 100, 100, 99\%)$ distributions along with the GvS distribution, $w(100, 1, 100, k)$, of rare exceedances (Gumbel & von Schelling, 1950), which is a distribution-free estimate using, in this case, the first-order statistic for the $\alpha$-quantile since $n = N = 1/(1-\alpha)$. While the GvS distribution has mean 1, its variance is 2 compared to the Bayesian-derived BEG distribution, which has variance 1.21. The GvS distribution has higher probabilities for a larger number of exceedances, which is not totally surprising given it is a distribution-free, empirical, quantile estimator. The ML-derived BEG distribution results in subpar performance, with $E[N_\alpha] = 1.11$ and $\mathrm{Var}[N_\alpha] = 1.48$.

2.3. Extending the Exponential Distribution

In general, if $Z$ is from a distribution that can be transformed by a one-to-one transformation to $X = h(Z) \sim \mathrm{Exp}(\lambda)$, then we can produce quantile estimates from the transformed data and $\hat{\eta}_\alpha^Z = h^{-1}\big(\Psi \sum_i h(z_i)\big)$. Because $h$ is one-to-one, $\hat{\eta}_\alpha^Z = h^{-1}(\hat{\eta}_\alpha^X)$ and $P(Z > \hat{\eta}_\alpha^Z) = P(X > \hat{\eta}_\alpha^X)$. In addition, since $Z > \hat{\eta}_\alpha^Z$ iff $X > \hat{\eta}_\alpha^X$, we have $N_\alpha^X = N_\alpha^Z$ and, therefore, $E[N_\alpha^X] = E[N_\alpha^Z]$. Therefore, our results apply to any distribution that is isomorphic to the exponential distribution, and the distribution of exceedances will be invariant.
The MLM estimate of the quantile itself is invariant under the one-to-one RV transformation as long as the transformation does not depend on the parameter. Given $f(x \mid \lambda) = f(z \mid \lambda)\, |dz/dx|$ evaluated at $z = h^{-1}(x)$, if $|dz/dx|$ is just a scale factor, independent of $\lambda$, then the MLM estimate is $\hat{\lambda}_{ML} = 1/\bar{X} = 1/\overline{h(Z)}$, so that
$$\hat{\eta}_{\alpha,ML}^Z = h^{-1}\bigg(-\frac{\log(1-\alpha)}{n} \sum_i h(z_i)\bigg),$$
and the MLM, like the Bayesian method, is invariant under $h$, although, in general, the MLM quantile estimate will no longer be unbiased.
More generally, if $Z$ is a random variable with cdf $F(z) = 1 - e^{-\lambda h(z)}$ with $\lambda > 0$, $h(z) \geq 0$, and $h$ strictly increasing, or $F(z) = e^{-\lambda h(z)}$ with $h$ strictly decreasing, and the support is $[u, \infty)$ with $h(u) = 0$ (or $(u, \infty)$ and $\lim_{w \downarrow u} h(w) = 0$), then a ZCE quantile estimate is constructed as $\hat{\eta}_\alpha^Z = h^{-1}\big(\Psi \sum_i h(z_i)\big)$ with $\Psi = (1-\alpha)^{-1/n} - 1$ or $\Psi = \alpha^{-1/n} - 1$, respectively.
In the first case, we can show that $X = -\log(1 - F_1(Z)) \sim \mathrm{Exp}(\lambda)$, where $F_1(\cdot)$ is $F(\cdot)$ with $\lambda = 1$. In this case, $X = -\log\big(1 - (1 - e^{-h(z)})\big) = h(z)$ and $dX/dZ = h'(z)$, so by the usual change of variables
$$f_X(x) = \frac{f_Z(z)}{|dX/dZ|}\bigg|_{z = h^{-1}(x)} = \frac{\lambda h'(z)\, e^{-\lambda h(z)}}{h'(z)}\bigg|_{z = h^{-1}(x)} = \lambda e^{-\lambda x},$$
and similarly for the second case. An example with numerous applications is the Rayleigh distribution, $f_Z(z) = \frac{z}{\sigma^2} e^{-z^2/2\sigma^2}$, where $h(Z) = Z^2$. Another example is the standard Pareto distribution, with tail-index $1/\lambda$,
$$F(z) = 1 - \Big(\frac{z}{u}\Big)^{-\lambda} = 1 - e^{-\lambda \log(z/u)}, \quad z \geq u,$$
so that $h(Z) = \log(Z/u)$ and a ZCE quantile estimate is
$$\hat{\eta}_\alpha^Z = u\, e^{\Psi \sum_i \log(z_i/u)}, \qquad \Psi = (1-\alpha)^{-1/n} - 1,$$
assuming u is fixed. In the next section, we use this result to model the conditional tail of distributions that have asymptotic Paretian tails such as those in the Fréchet maximum domain of attraction.
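The Pareto case can be sketched numerically as follows; the tail index, threshold, and seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha, u, xi = 50, 0.99, 1.0, 0.3       # illustrative settings

# standard Pareto(u, 1/xi) samples via inverse-cdf sampling: z = u * U^{-xi}
z = u * rng.uniform(size=n) ** (-xi)

# ZCE quantile through the log transform h(z) = log(z / u)
psi = (1 - alpha) ** (-1 / n) - 1
eta = u * np.exp(psi * np.log(z / u).sum())
assert eta > u                              # the estimate sits above the threshold
```

Since $\Psi > 0$ and each $\log(z_i/u) \geq 0$, the estimate always lies above $u$, consistent with it being a high quantile of the tail.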
It is worth noting that if $Z \sim F(Z)$ can be transformed to a uniform distribution, $U[0,1]$, then $-\log(F(Z)) \sim \mathrm{Exp}(1)$; hence a ZCE quantile estimate can be formed. This approach, suggested by (Yu & Ally, 2009), can be used when the parameters of the cdf, $F$, are known, such as if the scale parameter $\lambda = 1$.

3. ZCE Quantile Estimation for Paretian Tails

In this section, we extend the results from Section 2 to heavy-tailed distributions, with power-law decay, that are commonly used in financial risk-management. Using standard results from Extreme Value Theory, we model the exceedances above a high threshold as a marked Poisson process with Pareto exceedances and derive the ZCE quantile estimate, and exceedance distribution, using the Bayesian method of marginalizing over the Pareto and Poisson parameters.

3.1. Extreme Value Theory

The theory of extremes is a well-established branch of probability theory and statistical science (S. Coles, 2004; Gumbel, 1958). Consider the maximum, $M_n = \max\{X_1, X_2, \dots, X_n\}$, of i.i.d. random variables (RVs) with common distribution function (cdf) $X \sim G(x)$. If the cdf of $M_n$, $G^n(x)$, properly normalized, converges to a non-degenerate distribution, the Fisher–Tippett–Gnedenko theorem states that it is in the generalized extreme value (GEV) family with tail-index $\xi$ (Embrechts et al., 2003). This distribution has the Jenkinson–von Mises representation,
$$H_\xi(y) = \exp\big({-(1 + \xi y)^{-1/\xi}}\big),$$
and we say $G$ is in the maximum domain of attraction of $H_\xi$, denoted as $G \in MDA(H_\xi)$. Our interest lies with $\xi > 0$, $y > -1/\xi$, resulting in the standard Fréchet distribution, which is the limiting class for many underlying heavy-tailed distributions. By the convergence to types theorem, all normalizing sequences converge to the same limit so, more generally, we can state $G \in MDA(H_{\xi,\mu,\beta})$ with location $\mu$ and scale $\beta$.
The excess distribution, or conditional tail, is defined as $G_u(x) = P(X - u \leq x \mid X > u)$ and the Pickands–Balkema–de Haan theorem states that there exists a positive measurable function $\beta(u)$ such that
$$\lim_{u \to \infty}\ \sup_{x \geq 0}\ \big|G_u(x) - G_{\xi,0,\beta(u)}(x)\big| = 0$$
if and only if $G \in MDA(H_\xi)$, where $G_{\xi,\mu,\beta}$ is the generalized Pareto distribution (GPD) with the same tail-index as the GEV (Balkema & de Haan, 1974; Pickands, 1975).
Once again, more generally, if $G \in MDA(H_{\xi,\mu,\beta})$, the conditional tail distribution, for $x \geq u$, is
$$G_u(x) = P(X < x \mid X > u) \approx 1 - \bigg(1 + \frac{x - u}{\beta/\xi + u - \mu}\bigg)^{-1/\xi} \equiv G_{\xi,u,\beta+\xi(u-\mu)}(x),$$
and a simplifying assumption (Johnston & Djurić, 2020) is that the support of $M_n$ is $[0, \infty)$, or $\mu = \beta/\xi$, in which case the conditional tail reduces to a standard Pareto,
$$G_{\xi,u}(x) = 1 - \Big(\frac{x}{u}\Big)^{-1/\xi}, \quad x \geq u,$$
with scale parameter $u$. In addition to reducing the parameter set, the Pareto RV can be transformed to an exponential distribution, which is the basic assumption underlying traditional methods of tail-index and large-quantile estimation (Dekkers & de Haan, 1989; Hill, 1975; Pickands, 1975).
Given a high threshold, u, the point process of exceedances converges to a compound Poisson process and is commonly referred to as the ‘Peaks over Threshold’ (POT) model (Leadbetter, 1991). Methods for threshold selection (Northrop et al., 2017) or marginalization of u (Johnston & Djurić, 2021) have been investigated, although often the threshold is set to include a fixed number of high exceedances (Buishand, 1989).

3.2. Data Definition and Solution

The underlying data consist of $\tilde{n}$ "blocks" with the $i$'th block having length $m_i$ and, to be concrete, we assume each block is a year. The exceedances $\{x_1, \dots, x_n\}$ are culled from the data by choosing the threshold $u$ equal to the $(n+1)$'st largest order statistic. This results in exceedances that are in the top $100 \frac{n}{\tilde{n} \bar{m}}$ percent, with $\bar{m}$ the average number of observations per year. In some fields, such as hydrology, the resulting samples are referred to as the partial duration series, or the annual exceedance series when $n = \tilde{n}$. On average, there are $n/\tilde{n}$ exceedances per year and, if $\bar{m}$ is large enough, the POT model is justified.
Since a Pareto RV is isomorphic to an exponential RV, we denote the exceedances as $x_{1:n} = \{\ln(x_1/u), \dots, \ln(x_n/u)\}$, which are i.i.d. and have distribution $F(x \mid \lambda) \sim \mathrm{Exp}(\lambda)$ with $\lambda = 1/\xi$. Furthermore, the number of exceedances is Poisson, $\mathrm{Poi}(\lambda_u)$, so that our observations, or training data, $(n, x_{1:n})$, form a marked Poisson process with both parameters, $\lambda$ and $\lambda_u$, unknown. Given $N$ future years, we let $y_{1:N_u}$ denote the $N_u$ threshold exceedances of $u$ and $N_\alpha$ the number of those exceedances above $\hat{\eta}_\alpha$. Typical levels for $\alpha$ range from 99% to an extreme 99.99%. As before, we wish to determine the unconditional distribution of future exceedances of $\hat{\eta}_\alpha$, or $P(N_\alpha = k)$ for $k = 0, 1, \dots, N_u$ and, in particular, $E[N_\alpha]$.
Given our $u$ threshold, $N_u$ is unknown but follows a Poisson process with rate parameter $\lambda_u$. The maximum likelihood estimate for $E[N_u]$ would simply be $\hat{N}_u = \frac{n}{\tilde{n}} N$, based on a Poisson posterior with $\hat{\lambda}_u = \frac{n}{\tilde{n}}$. Given the partial duration series, with $n$ samples in $\tilde{n}$ years, the posterior, $P(\lambda_u \mid n)$, is a $\Gamma(n + a, \tilde{n} + b)$ distribution, using a $\Gamma(a, b)$ conjugate prior, and the predictive distribution is therefore a negative binomial,
$$N_u \sim NB\bigg(n + a,\ \frac{\tilde{n} + b}{\tilde{n} + N + b}\bigg).$$
Using a Jeffreys prior, with $a = 1/2$, $b = 0$, results in
$$E[N_u] = \frac{n}{\tilde{n}} N \Big(1 + \frac{1}{2n}\Big),$$
and if $n = \tilde{n} \gg 1$ then $E[N_u] \approx N$.
Given the results from Section 2.1, we can write the conditional expectation for the number of quantile estimate exceedances as
$$E[N_\alpha \mid N_u] = \frac{N_u}{(\Psi + 1)^n},$$
and, since we desire a ZCE quantile estimator with $P(y > \hat{\eta}_{\alpha,Bayes}) = 1 - \alpha$, our Bayesian estimate requires
$$\Psi = \Bigg[\frac{\frac{n}{\tilde{n}}\big(1 + \frac{1}{2n}\big)}{1 - \alpha}\Bigg]^{1/n} - 1,$$
so that the unconditional estimate $E[N_\alpha] = 1$ and, as before, $\alpha$ is the annual quantile. We note that if $n = \tilde{n} \gg 1$, then $\Psi \approx (1-\alpha)^{-1/n} - 1$, as for the unconditional estimate. If $n = \tilde{n}$ but small, then
$$\Psi \approx \Bigg[\frac{1 + \frac{1}{2n}}{1 - \alpha}\Bigg]^{1/n} - 1,$$
and $\hat{\eta}_\alpha$ is higher than the unconditional estimate to account for the added uncertainty in the number of $u$ exceedances, $N_u$. If $n < \tilde{n}$ but large, then
$$\Psi \approx \Bigg[\frac{n/\tilde{n}}{1 - \alpha}\Bigg]^{1/n} - 1,$$
and $\hat{\eta}_\alpha$ is lower than the unconditional estimate because the threshold $u$ is lower. In effect, we do not need as high of a conditional quantile to achieve the equivalent annual, unconditional, quantile. The ML estimate, which replaces $E[N_u]$ with the plug-in $\hat{N}_u = \frac{n}{\tilde{n}} N$, would result in a similar value of $\Psi$ as for the unconditional distribution,
$$\Psi_{ML} = \Bigg[\frac{\frac{n}{\tilde{n}} N}{N(1 - \alpha)}\Bigg]^{1/n} - 1 = \Bigg[\frac{n/\tilde{n}}{1 - \alpha}\Bigg]^{1/n} - 1.$$
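A small numeric sketch of the predictive quantities above (the sample sizes are illustrative; $a$ and $b$ follow the Jeffreys choice in the text):

```python
n, n_tilde, N, alpha = 50, 50, 100, 0.99
a, b = 0.5, 0.0                             # Jeffreys prior for the Poisson rate

r = n + a                                   # negative binomial shape
p = (n_tilde + b) / (n_tilde + N + b)       # negative binomial probability
E_Nu = r * (1 - p) / p                      # = (n / n_tilde) * N * (1 + 1/(2n))

# ZCE scale factor for the conditional (tail) quantile
psi = ((n / n_tilde) * (1 + 1 / (2 * n)) / (1 - alpha)) ** (1 / n) - 1

# dividing E[N_u] by (psi + 1)^n recovers the target number of exceedances,
# N * (1 - alpha), which equals one for these settings
E_Na = E_Nu / (psi + 1) ** n
```

The cancellation of the $(1 + \frac{1}{2n})$ factor between $E[N_u]$ and $(\Psi+1)^n$ is what makes the estimate exact for any $n$, not just asymptotically.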
The unconditional distribution for the exceedance count is
$$P(N_\alpha = k) = \sum_{N_u = k}^{N_m} \mathrm{BEG}(k; n, N_u, \alpha) \times NB\bigg(N_u;\ n + \frac{1}{2},\ \frac{\tilde{n}}{\tilde{n} + N}\bigg),$$
where the upper limit, $N_m$, is the total number of test samples. In practice, the summation can be truncated, and simulation results show that using $N_u = E[N_u] = N$ does not affect the exceedance distribution significantly.

4. Simulation Results

To illustrate the performance of the Bayesian and MLM quantile estimates, we began with a simulation of a standard Pareto RV with tail-index $\xi = 0.3$, an example of which is shown in Figure 3. We used $\tilde{n} = 50$ with $\bar{m} = 100$ samples per year. The threshold $u$ was chosen to select the $n = 50$ extreme events highlighted in the figure. Given that the underlying data are Pareto, we might expect the Bayesian method to perform well and, also, we could apply our results directly to all of the data if desired. We assumed $N = 100$ years and we performed 10,000 runs to gather statistics.
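This simulation setup can be sketched as follows (an illustrative run count and seed; the paper uses 10,000 runs):

```python
import numpy as np

rng = np.random.default_rng(4)
xi, n_tilde, m_bar, n, N, alpha = 0.3, 50, 100, 50, 100, 0.99

# ZCE scale factor from the POT derivation in Section 3.2
psi = ((n / n_tilde) * (1 + 1 / (2 * n)) / (1 - alpha)) ** (1 / n) - 1

counts = []
for _ in range(1000):
    data = rng.uniform(size=n_tilde * m_bar) ** (-xi)   # past years, Pareto(1, 1/xi)
    u = np.sort(data)[-(n + 1)]                         # (n+1)'st largest as threshold
    S = np.log(data[data > u] / u).sum()                # n exponential "marks"
    eta = u * np.exp(psi * S)                           # Bayesian ZCE quantile
    future = rng.uniform(size=N * m_bar) ** (-xi)       # N future years
    counts.append((future > eta).sum())
# the average count is close to one exceedance over the N years
```

The threshold here is the empirical order statistic rather than a fixed level, but for exactly Pareto data the exceedance ratios remain i.i.d. Pareto, so the average count stays close to the theoretical target.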
Figure 4 shows the theoretical $\mathrm{BEG}(k; 50, 100, 99\%)$ distribution along with the observed frequency distribution from the simulation using the Bayesian quantile estimator. The results agree and, while the expected number of exceedances is one, there is a significant probability of more than one exceedance. In some applications, where an exceedance implies a critical failure, this point may be moot, but the exceedance distribution is insightful for risk managers and policy makers to consider. In fact, a more desirable quantile estimator might be obtained by solving for $\Psi$ in Equation (25) such that $P(N_\alpha = 0)$ is a fixed value (e.g., 50%).
Figure 5 (left) shows the simulation exceedance count divided by the number of simulation runs for prediction intervals in steps of 1% from 90% to 99%, with $N = 100$, for both the Bayes and the MLM quantile estimators. The Bayesian estimator performs as advertised, with an average of 1 exceedance per simulation run, whereas the MLM underestimates the quantile, in terms of coverage error, resulting in excessive exceedances.
Figure 5 (right) shows the more stressful case, with prediction intervals in steps of 0.1% from 99% to 99.9% and $N = 1000$. Once again, the Bayesian estimator results in ZCE while the MLM performs rather poorly, particularly at the highest 99.9% quantile. While such a stressful scenario of trying to estimate a 99.9% quantile given only 50 samples may seem unlikely, we note that similar estimates have been required in response to natural disasters (van Dantzig, 1954).
As expected, our method works well when the underlying distribution has a Paretian tail. To explore how the estimator performs when the underlying distribution has an asymptotic Paretian tail, we ran the simulation using a symmetric $\alpha$-stable (S$\alpha$S) distribution for different values of the characteristic exponent $\alpha_s$. We do not necessarily advocate using our method if an S$\alpha$S distribution is appropriate; in that context, estimating $\alpha_s$ and quantiles directly is advised (DuMouchel, 1983). As in our first example, we used $\tilde{n} = 50$ with $\bar{m} = 100$ samples per year, resulting in 5000 total samples. We used $N = 100$ years with $\alpha = 99\%$, and we performed 10,000 runs.
As can be seen in Figure 6, the performance of our quantile estimate is sensitive to the number of tail samples chosen ($n$) as $\alpha_s \to 2$. This is not surprising, as the inflection point, beyond which a Paretian tail can be deemed appropriate, increases as $\alpha_s \to 2$ (Fofack & Nolan, 1999). This results in a need to reduce the number of tail samples (increase the threshold $u$) to 0.1% of total samples, although this comes at the cost of higher variability unless the total number of samples ($\tilde{n} \bar{m}$) can be increased.
For values of $\alpha_s < 1.5$, the method performs well for all values of $n$, with $n = 50$, or 1% of the total samples, resulting in the best performance with the lowest variability. Since $\alpha_s < 1.5$ implies a heavy tail with most of the distribution having power-law decay, there is less bias introduced into the quantile estimate. When $\alpha_s = 2$, a normal distribution not in the Fréchet domain, the sensitivity of the quantile to outliers is evident, as is documented in the literature (Weron, 2001).
To further investigate the performance of the Bayesian quantile estimator, we applied it to different underlying distributions, with statistics reported in Table 1. Similarly to the results when $\alpha_s = 2$, the estimator is too conservative for thinner, exponential-tailed distributions such as the exponential or lognormal. These distributions, which belong to the Gumbel maximum domain of attraction, require fewer tail samples, about 0.1% of total samples, to avoid outlier-bias issues. This may not be that surprising, as we are fitting a power-law tail to exponential decay or, effectively, using the wrong tool for the job.
The method works well for the standard Pareto, even when the tail index is close to 0 (i.e., ξ = 0.1 ), and also for the GEV, or Fréchet, distribution. We specifically chose μ = 0 ≠ β / ξ so that our condition for simplifying the conditional tail to a standard Pareto is violated (see Section 3.1). The variance–bias trade-off is evident as the average number of exceedances declines to 0.88 for 1% sample selection ( n = 50 ) while the variance in the number of exceedances also declines. Lastly, we show the simulation results for two Student t-distributions: a heavy-tailed version with ν = 2 degrees of freedom and a lighter-tailed version with ν = 10 .
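As a cross-check of the Table 1 pattern for the Student t, recall that a t-distribution with ν degrees of freedom has tail index ξ = 1 / ν . The sketch below is an illustration using the classical Hill estimator, not the paper's Bayesian procedure, and all names and settings are assumptions: the ν = 2 estimates sit near 0.5, while the ν = 10 estimates tend to be biased above 0.1 because the t tail is only asymptotically Paretian.

```python
import math
import random

def student_t(nu, rng):
    """Draw one Student-t variate with nu (a positive integer) degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(chi2 / nu)

def hill_xi(samples, n):
    """Hill estimator of the tail index from the n largest order statistics."""
    xs = sorted(samples, reverse=True)
    return sum(math.log(xs[i] / xs[n]) for i in range(n)) / n

def mean_hill(nu, n=50, total=5000, runs=20, seed=7):
    """Average Hill estimate over repeated simulations of t-distributed data."""
    rng = random.Random(seed)
    ests = []
    for _ in range(runs):
        data = [student_t(nu, rng) for _ in range(total)]
        ests.append(hill_xi(data, n))
    return sum(ests) / runs

print(mean_hill(2))   # near the true tail index 1/2
print(mean_hill(10))  # biased upward relative to 1/10, as in Table 1
```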

5. Discussion

In this research, we have derived a Bayesian quantile estimator and the distribution of exceedances that can be applied to distributions that are isomorphic to the exponential distribution or whose tails exhibit power-law decay. We show the derived quantile estimate has zero coverage error, implying that the probability of exceedance, from a frequentist perspective, agrees with the desired quantile. This is a critical requirement for proper risk management and ensures the usefulness of our approach, particularly for small sample sizes. Small-sample analysis is increasingly important given the inherent non-stationarity of financial markets.
In addition to deriving the ZCE quantile estimate, we derive an analytic expression for the distribution of the number of exceedances. This is a new result and is important in analyzing the risk of multiple extreme events, which can lead to significant losses and financial impairment. Given that traditional methods underestimate this risk, our approach is a clear benefit for analyzing financial markets and will allow financial risk managers to make more reliable risk assessments.
The theoretical performance of our approach exceeds that of the traditional maximum likelihood method, and simulation results demonstrate the efficacy of our methodology. Our method can be applied to a variety of underlying heavy-tailed distributions, in the Fréchet domain, that are typically encountered in financial market data. For example, historical analysis of stock market returns indicates the underlying process is in the Fréchet domain with tail-index values in the 0.25 to 0.5 range. This implies our method would produce superior VaR or expected shortfall analyses. While the usual trade-off between variance and bias exists for tails that are not strictly Pareto, the Bayesian method produces results that are an improved alternative to traditional risk-management metrics.
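To make the VaR remark concrete: for an exact standard Pareto tail with index ξ , both VaR and expected shortfall have closed forms, VaR α = ( 1 − α ) ^ ( − ξ ) and ES α = VaR α / ( 1 − ξ ) for ξ < 1 . The snippet below is an illustrative calculation for ξ = 0.3, the middle of the 0.25–0.5 range quoted above; it is not part of the paper's estimator, and the function name is ours.

```python
def pareto_var_es(alpha, xi):
    """VaR and expected shortfall for a standard Pareto tail, F(x) = 1 - x**(-1/xi).

    The VaR solves 1 - x**(-1/xi) = alpha; the expected shortfall is the mean
    of the distribution above that level, which requires xi < 1 to be finite.
    """
    if not 0.0 < xi < 1.0:
        raise ValueError("expected shortfall requires 0 < xi < 1")
    var = (1.0 - alpha) ** (-xi)
    es = var / (1.0 - xi)
    return var, es

var99, es99 = pareto_var_es(0.99, 0.3)
print(var99, es99)  # roughly 3.98 and 5.69
```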

Funding

This research was funded by Farmingdale State College’s Publishing Support Fund.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The author would like to acknowledge the support of the Department of Applied Mathematics and Farmingdale State College. In addition, the author would like to thank the editor and reviewers for their thoughtful and helpful comments and suggestions.

Conflicts of Interest

The author declares no conflicts of interest. The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
BEG   Binomial-Exponential-Gamma
GEV   Generalized extreme value
GPD   Generalized Pareto distribution
MDA   Maximum domain of attraction
ML    Maximum likelihood
MLM   Maximum likelihood method
POT   Peaks over threshold
RV    Random variable
VaR   Value at risk
ZCE   Zero coverage error

References

  1. Bairamov, I. G. (1996). Some distribution free properties of statistics based on record values and characterizations of the distributions through records. Journal of Applied Statistical Science, 5(1), 17–25. [Google Scholar]
  2. Balkema, A., & de Haan, L. (1974). Residual life time at great age. Annals of Probability, 2(5), 792–804. [Google Scholar] [CrossRef]
  3. Broadwater, J. B., & Chellappa, R. (2010). Adaptive threshold estimation via extreme value theory. IEEE Transactions on Signal Processing, 58(2), 490–500. [Google Scholar] [CrossRef]
  4. Buishand, T. A. (1989). The partial duration series method with a fixed number of peaks. Journal of Hydrology, 109(1–2), 1–9. [Google Scholar] [CrossRef]
  5. Coles, S. (2004). An introduction to statistical modeling of extreme values. Springer. [Google Scholar]
  6. Coles, S. G., & Dixon, M. J. (1999). Likelihood-based inference for extreme value models. Extremes, 2(1), 5–23. [Google Scholar] [CrossRef]
  7. Coles, S. G., & Powell, E. A. (1996). Bayesian methods in extreme value modelling: A review and new developments. International Statistical Review, 64(1), 119–136. [Google Scholar] [CrossRef]
  8. Datta, G. S., Mukerjee, R., Ghosh, M., & Sweeting, T. J. (2000). Bayesian prediction with approximate frequentist validity. The Annals of Statistics, 28(5), 1414–1426. [Google Scholar] [CrossRef]
  9. Dekkers, A. L. M., & de Haan, L. (1989). On the estimation of the extreme-value index and large quantile estimation. Annals of Statistics, 17(4), 1795–1832. [Google Scholar] [CrossRef]
  10. DuMouchel, W. H. (1983). Estimating the stable index α in order to measure tail thickness: A critique. The Annals of Statistics, 11(4), 1019–1031. [Google Scholar] [CrossRef]
  11. Embrechts, P., Klüppelberg, C., & Mikosch, T. (2003). Modelling extremal events. Springer. [Google Scholar]
  12. Fofack, H., & Nolan, J. P. (1999). Tail behavior, modes and other characteristics of stable distributions. Extremes, 2, 39–58. [Google Scholar] [CrossRef]
  13. Gumbel, E. J. (1958). Statistics of extremes. Columbia University Press. [Google Scholar]
  14. Gumbel, E. J., & von Schelling, H. (1950). The distribution of the number of exceedances. The Annals of Mathematical Statistics, 21(2), 247–262. [Google Scholar] [CrossRef]
  15. Hall, P., & Rieck, A. (2001). Improving coverage accuracy of non-parametric prediction intervals. Journal of the Royal Statistical Society B, 63(4), 717–725. [Google Scholar] [CrossRef]
  16. Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. Annals of Statistics, 3(5), 1163–1174. [Google Scholar] [CrossRef]
  17. Hosking, J. R. M., & Wallis, J. R. (1987). Parameter and quantile estimation for the generalized Pareto distribution. Technometrics, 29(3), 339–349. [Google Scholar] [CrossRef]
  18. Johnston, D. E., & Djurić, P. M. (2011). The science behind risk management: A signal processing perspective. IEEE Signal Processing Magazine, 28(5), 26–36. [Google Scholar]
  19. Johnston, D. E., & Djurić, P. M. (2020, May 4–8). A recursive Bayesian solution for the excess over threshold distribution with stochastic parameters. ICASSP 2020, Barcelona, Spain. [Google Scholar]
  20. Johnston, D. E., & Djurić, P. M. (2021, June 6–11). Bayesian estimation of a tail-index with marginalized threshold. ICASSP 2021 (pp. 5569–5573), Toronto, ON, Canada. [Google Scholar]
  21. Leadbetter, M. R. (1991). On a basis for ‘Peaks over Threshold’ modeling. Statistics and Probability Letters, 12(4), 357–362. [Google Scholar] [CrossRef]
  22. McNeil, A., Frey, R., & Embrechts, P. (2005). Quantitative risk management. Princeton University Press. [Google Scholar]
  23. Nerantzaki, S. D., & Papalexiou, S. M. (2022). Assessing extremes in hydroclimatology: A review on probabilistic methods. Journal of Hydrology, 605, 1–20. [Google Scholar] [CrossRef]
  24. Northrop, P. J., Attalides, N., & Jonathan, P. (2017). Cross-validatory extreme value threshold selection and uncertainty with application to ocean storm severity. Journal of the Royal Statistical Society: Applied Statistics, 66(1), 93–120. [Google Scholar] [CrossRef]
  25. Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of Statistics, 3(1), 119–131. [Google Scholar]
  26. Resnick, S. I., & Rootzén, H. (2000). Self-similar communication models and very heavy tails. Annals of Applied Probability, 10(3), 753–778. [Google Scholar] [CrossRef]
  27. Roberts, S. J. (2000). Extreme value statistics for novelty detection in biomedical data processing. IEE Proceedings-Science, Measurement and Technology, 147(6), 363–367. [Google Scholar]
  28. Shenoy, S., & Gorinevsky, D. (2015). Estimating long tail models for risk trends. IEEE Signal Processing Letters, 22(7), 968–972. [Google Scholar] [CrossRef]
  29. Smith, R. L. (1987). Estimating tails of probability distributions. The Annals of Statistics, 15(3), 1174–1207. [Google Scholar] [CrossRef]
  30. van Dantzig, D. (1954, September 2–9). Mathematical problems raised by the flood disaster of 1953. International Congress of Mathematicians (pp. 218–239), Amsterdam, The Netherlands. [Google Scholar]
  31. Weron, R. (2001). Levy-stable distributions revisited: Tail index > 2 does not exclude the Levy-stable regime. International Journal of Modern Physics, 12(2), 209–223. [Google Scholar] [CrossRef]
  32. Wesolowski, J., & Ahsanullah, M. (1998). Distributional properties of exceedance statistics. Annals of the Institute of Statistical Mathematics, 50(3), 543–565. [Google Scholar] [CrossRef]
  33. Yu, K., & Ally, A. (2009). Improving prediction intervals: Some elementary methods. The American Statistician, 63(1), 17–19. [Google Scholar] [CrossRef]
  34. Yu, K., & Moyeed, R. A. (2001). Bayesian quantile regression. Statistics and Probability Letters, 54(4), 437–447. [Google Scholar]
Figure 1. E [ N α ] versus n for MLM with N = 1 / ( 1 α ) and α = 99 % , 99.9 % , 99.99 % .
Figure 2. B E G ( k ; 50 , 100 , 99 % ) , left, and B E G ( k ; 100 , 100 , 99 % ) , right, along with GvS: w ( 100 , 1 , 100 , k ) exceedance distributions.
Figure 3. Simulated Pareto RV samples (lines) and threshold exceedances (dots) with ξ = 0.3 , n = n ˜ = 50 , m ¯ = 100 .
Figure 4. Theoretical B E G ( k ; 50 , 100 , 99 % ) and simulation frequency distribution for Bayesian quantile estimator.
Figure 5. Normalized simulation exceedance count for 1% prediction intervals with n = 50 and N = 100 (left) and 0.1% prediction intervals with n = 50 and N = 1000 (right).
Figure 6. Simulation sample mean (left) and standard deviation (right) of the number of exceedances ( N α ) as a function of S α S characteristic exponent α s for different tail-samples (n) with n ˜ m ¯ = 5000, N = 100, and α = 99 % .
Table 1. Simulation statistics for the tail-index, ξ , and the number of exceedances, N α , for various distributions and for different tail-samples (n) with n ˜ m ¯ = 5000, N = 100, and α = 99 % .
                Exp (μ = 1)                      Lognormal (μ = 0, σ = 1)
                n = 5   n = 10  n = 25  n = 50   n = 5   n = 10  n = 25  n = 50
μ̄[ξ]            0.13    0.14    0.16    0.18     0.28    0.29    0.32    0.34
σ̄[ξ]            0.05    0.04    0.03    0.02     0.12    0.09    0.06    0.04
μ̄[N_α]          0.95    0.78    0.48    0.25     0.98    0.90    0.71    0.53
σ̄[N_α]          1.67    1.37    0.93    0.60     1.60    1.45    1.12    0.88
P̄(N_α > 1)      0.21    0.18    0.10    0.04     0.23    0.22    0.17    0.11

                StdPar (ξ = 0.1)                 GEV (ξ = 0.5, β = 1, μ = 0)
                n = 5   n = 10  n = 25  n = 50   n = 5   n = 10  n = 25  n = 50
μ̄[ξ]            0.10    0.10    0.10    0.10     0.52    0.51    0.53    0.54
σ̄[ξ]            0.04    0.03    0.02    0.01     0.23    0.16    0.10    0.07
μ̄[N_α]          1.08    1.04    1.03    1.00     1.05    1.03    0.94    0.88
σ̄[N_α]          1.70    1.53    1.37    1.22     1.70    1.50    1.30    1.14
P̄(N_α > 1)      0.26    0.25    0.26    0.26     0.25    0.25    0.23    0.23

                StudentT (ν = 2)                 StudentT (ν = 10)
                n = 5   n = 10  n = 25  n = 50   n = 5   n = 10  n = 25  n = 50
μ̄[ξ]            0.50    0.50    0.50    0.51     0.14    0.15    0.17    0.18
σ̄[ξ]            0.23    0.16    0.10    0.07     0.06    0.04    0.03    0.02
μ̄[N_α]          1.08    1.04    1.00    0.95     1.02    0.86    0.64    0.40
σ̄[N_α]          1.71    1.57    1.34    1.19     1.69    1.39    1.07    0.73
P̄(N_α > 1)      0.25    0.25    0.25    0.24     0.24    0.20    0.14    0.08
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Johnston, D.E. Bayesian Estimation of Extreme Quantiles and the Distribution of Exceedances for Measuring Tail Risk. J. Risk Financial Manag. 2025, 18, 659. https://doi.org/10.3390/jrfm18120659
