Interval Estimation of Value-at-Risk Based on Nonparametric Models

Abstract: Value-at-Risk (VaR) has become the most important benchmark for measuring risk in portfolios of different types of financial instruments. However, as reported by many authors, estimating VaR is subject to a high level of uncertainty. One source of uncertainty stems from the dependence of the VaR estimate on the choice of the computation method: as we show in our experiments, the smaller the number of samples, the stronger this dependence. In this paper, we propose a new nonparametric approach called maxitive kernel estimation of the VaR. This estimation is based on a coherent extension of the kernel-based estimation of the cumulative distribution function to convex sets of kernels. We thus obtain a convex set of VaR estimates gathering all the conventional estimates based on a kernel belonging to the considered convex set. We illustrate this method in an empirical application to daily stock returns and compare the proposed approach to other parametric and nonparametric approaches. Our experiments show that the interval-valued estimate of the VaR we obtain is likely to lead to more careful decisions, i.e., decisions that cannot be biased by an arbitrary choice of the computation method. Indeed, the imprecision of the obtained interval-valued estimate is likely to be representative of the uncertainty in the VaR estimate.


Introduction
Controlling financial risk is an important issue for financial institutions, and the first task of risk management is to measure risk. Value-at-Risk (VaR) is probably the most widely used risk measure in financial institutions, and it has made its way into the Basel II capital-adequacy framework. VaR is an estimate of the largest loss over a specified time horizon at a particular probability level. For example, if the daily VaR of an investment portfolio is $10 million at a 95% confidence level, we can be 95% confident that the portfolio will not lose more than $10 million over the next day. More formally, the VaR of a portfolio at a probability level α can be defined as (McNeil et al. 2005, p. 38):

VaR_α(X) = −inf{x ∈ IR : F_X(x) ≥ α},    (1)
where X is a random variable representing the daily percent return of a total stock index, with Cumulative Distribution Function (CDF) F_X. As McNeil et al. (2005) point out, VaR is simply a quantile of the corresponding loss distribution, which makes its computation easy. Various methods have been proposed in the relevant literature to compute the VaR. These methods differ in the methodology used, the hypotheses made, and the way the models are implemented, and, problematically, they lead to different results. Thus the choice of computation method is likely to have a large impact on the way a financial institution manages its credit portfolio. The most commonly used approach is the nonparametric Historical Simulation (HS) method described in Linsmeier and Pearson (1997). It is based on the empirical CDF of the historically observed returns, attributing an equal probability weight to each day's return. The HS approach is easy to implement but suffers from two major drawbacks: first, its success depends on the ability to collect a large series of data; second, it is an unconditional model, so a number of extreme scenarios must be present in the historical record to provide informative estimates of the tail of the loss distribution (McNeil et al. 2005). Moreover, estimates of extreme quantiles are inefficient since extrapolation beyond past observations is impossible. To avoid these drawbacks, several parametric and nonparametric approaches have been developed. For example, Butler and Schachter (1998) propose kernel estimators in conjunction with the historical simulation method, and Charpentier and Oulidi (2010) compute the VaR using several nonparametric estimators based on the Beta kernel; their study shows that those estimators improve the efficiency of traditional ones, not only for light-tailed but also for heavy-tailed distributions.
Numerous recent VaR models are referred to as parametric. From the point of view of the risk manager, estimating VaR under the assumption of normally distributed asset returns is inappropriate and leads to underestimating the left tail at low probability levels. For this reason, most recent research papers go beyond the normal model and attempt to capture the related phenomena of heavy and long tails and the asymmetric form of the return series. EVT (Extreme Value Theory) and the so-called GHYP (Generalized HYPerbolic) distributions are among the most widely used tools. The main advantage of hypothesizing a GHYP distribution is its ability to account for the statistical properties of financial market data such as volatility clustering, asymmetry, and heavy tails (see McNeil et al. (2005) for an introduction and Paolella and Polak (2015) for a recent application). Kuester et al. (2006) use an EVT-based approach and focus on the long tails of the return distribution. Braione and Scholtes (2016) study the performance of VaR forecasting under different parametric distributional assumptions and show the predominance and predictive ability of skewed and heavy-tailed distributions in the univariate case. The main drawback of using a parametric model is the high dependence of the obtained estimate on the hypothesized distribution model. To lower this dependence, some authors have proposed a group of semiparametric methods based on extreme value theory; these methods have been successfully used by financial analysts to estimate VaR (Danielsson and De Vries 2000). However, all the above-mentioned attempts share the common weakness of a low robustness to modelling choices, leading to variations in the VaR estimate. To enable these methods to be used in a prudent manner, it would be of instrumental interest to estimate the variation of the computed VaR w.r.t. the modelling choices. Some attempts have been proposed to achieve such a goal.
For example, Butler and Schachter (1998) propose a method to measure the precision of a VaR estimate. Jorion (1996) suggests that VaR always be reported with confidence intervals and shows that it is possible to improve the efficiency of VaR estimates using their standard errors. Kupiec (1995) proposes a method for quantifying the uncertainty, in the estimated VaR, induced by the fact that the return distribution is unknown. Pritsker (1997) proposes to estimate a standard error by using a Monte Carlo VaR analysis.
In this paper, we propose a completely new approach to estimate the variations in the VaR induced by choosing a particular model in a kernel-based approach. The idea is that an estimate of the CDF of the daily percent return can be used to compute the VaR, as suggested by Equation (1). Such an estimate can be obtained by using a kernel-based approach (Silverman 1986). In Kernel Density Estimation (KDE), the role of the kernel is to achieve an interpolation that lowers the impact of sampling on the obtained estimation; roughly speaking, it smoothes the empirical CDF. However, as in the above-mentioned methods, there is a systematic bias induced by the choice of the kernel used to estimate the CDF. Our proposal is to build on the nonparametric approach developed by Loquin and Strauss (2008a) to estimate the CDF. This approach makes use of the ability of a new kind of interpolating kernel, called a maxitive kernel, to represent a convex set of conventional kernels, which we call summative kernels (Loquin and Strauss 2008b). Within this approach, the estimate of the CDF is interval-valued. Such an interval-valued estimate of the CDF, also called a p-box (Destercke et al. 2007), is the convex set of all the CDFs that would have been computed by using the convex set of conventional summative kernels represented by the maxitive kernel. We show that this approach can advantageously be used to compute exactly the corresponding convex set of kernel-based VaR estimates. It has advantages over Monte Carlo based approaches, since its complexity is comparable to that of classical kernel estimation; moreover, its bounds are exact, while Monte Carlo approaches only provide an inner estimation of those bounds. This paper is structured as follows. Section 2 reviews some nonparametric and parametric approaches and bootstrap methods used to compute the confidence interval of the VaR.
In Section 3, we introduce the empirical distribution function and the kernel cumulative estimator based on summative kernels. In Section 4, we define the maxitive kernel, which forms the basis of our approach, and show how an interval-valued estimation of the VaR, based on the maxitive kernel, can be derived with relevant properties in this context. Section 5 presents and discusses our empirical findings. First, we show how the choice of the kernel function affects the VaR estimates. Second, we investigate the performance of the interval-valued estimate proposed in this paper by comparing it to three very competitive approaches: the simple Normal VaR, the HS VaR, and the GHYP VaR. Finally, Section 6 concludes the paper with some further remarks.
Throughout this paper, we consider that the observations belong to a convex and compact subset (universe) Ω of IR, called the reference set. P (Ω) is the collection of all measurable subsets of Ω. It naturally contains the empty set and is closed under complementation and countable unions. Then (Ω, P (Ω)) is a measurable space. Let L(Ω) be the set of functions defined on Ω with values in IR.

Historical Simulation
The HS method calculates the Value-at-Risk using real historical data of asset returns and captures the non-normal distribution of the returns. HS is a nonparametric method because it does not make a specific assumption about the distribution of returns. However, the HS method assumes that the distribution of past returns is a good and complete representation of expected future returns.
The VaR at probability level α is calculated as the α-percentile of the sorted return values. For example, for a return series with 1000 observations, the 1% VaR estimate is simply the negative of the 10th sample order statistic. This HS VaR can be defined as follows:

VaR_α,n = −r_(⌈nα⌉),

where r_(k) denotes the kth order statistic of the sample of returns r_t, r_t being the asset return at time t.
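The order-statistic rule above can be sketched as follows (a minimal illustration; function and variable names are ours, not the authors'):

```python
import numpy as np

def hs_var(returns, alpha):
    """Historical-simulation VaR: the negative of the k-th order
    statistic of the returns, with k = ceil(n * alpha)."""
    r = np.sort(np.asarray(returns, dtype=float))
    # round guards against floating-point noise in n * alpha
    # (e.g. 1000 * 0.01 -> 10.000000000000002)
    k = int(np.ceil(round(len(r) * alpha, 10)))
    return -r[k - 1]

# n = 1000 observations, 1% VaR -> negative of the 10th order statistic
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=1000)
v = hs_var(returns, 0.01)
```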

GHYP Parametric Distribution
The generalized hyperbolic distribution (GHYP) was introduced by Barndorff-Nielsen (1978) to fit financial returns. The GHYP is an asymmetric heavy-tailed distribution that can account for the extreme events and cater for skewness embedded in the data. It has since been applied in diverse disciplines such as physics, biology, financial mathematics (see Eberlein and Keller (1995); Sørensen and Bibby (1997); Paolella (2007, chp. 9)).
The probability density function (pdf) of the univariate GHYP distribution, with the parameterization of Eberlein et al. (1998), is given for x ∈ R in terms of the following elements:
(a) K_λ denotes a modified Bessel function of the third kind with index λ.
(b) λ defines the subclasses of GHYP and is related to the tail flatness.
(c) χ and ψ determine the distribution shape; in general, the larger these parameters are, the closer the distribution is to the normal distribution.
(d) µ is the location parameter.
(e) σ is the dispersion parameter (standard deviation).
(f) γ is the skewness parameter (if γ = 0, the distribution reduces to the symmetric generalized hyperbolic distribution).
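As a hedged sketch (not the authors' implementation), a GHYP-based parametric VaR can be computed with SciPy's generalized hyperbolic distribution. Note that SciPy's (p, a, b) shape parameterization differs from the (λ, χ, ψ, µ, σ, γ) notation above and would have to be mapped before comparing parameters; the parameter values below are illustrative, not fitted:

```python
import numpy as np
from scipy.stats import genhyperbolic

# Illustrative (not fitted) generalized hyperbolic model for daily
# returns. NOTE: scipy uses the (p, a, b) shape parameterization,
# not the (lambda, chi, psi, mu, sigma, gamma) notation of the text.
alpha = 0.05
dist = genhyperbolic(p=-0.5, a=1.5, b=-0.3, loc=0.0, scale=0.01)

# parametric VaR: the negative alpha-quantile of the model
var_ghyp = -dist.ppf(alpha)
```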
The GHYP family contains many special cases known under special names, listed as follows:

• Hyperbolic Distribution (HYP)
If λ = 1, we get the hyperbolic distribution. The HYP distribution is characterized by a hyperbolic log-density, whereas the log-density of the normal distribution is a parabola. One may thus expect the HYP distribution to be a coherent alternative for heavy-tailed data.

• Normal Inverse Gaussian (NIG) Distribution
If λ = −1/2, then the distribution is known as the normal inverse Gaussian (NIG) distribution, which is also widely used in financial modeling.

• Variance Gamma (VG) Distribution
If λ > 0 and χ = 0, we obtain the limiting case known as the variance gamma (VG) distribution.

• The Skewed Student's t-Distribution (St)
When λ < 0 and ψ = 0, we get another limiting case, called the generalized hyperbolic skew Student's t distribution because it generalizes the usual Student's t distribution, which is recovered from the skewed t by setting the skewness parameter γ = 0.

Nonparametric Bootstrap Approach
The bootstrap method is used in a large number of statistical topics; it builds a sampling distribution for a statistic by resampling from the data at hand. The term bootstrap was coined by Efron (1979) and is an allusion to the expression "pulling oneself up by one's bootstraps": in this case, using the sample data as a population from which repeated samples are drawn (Fox 2002). The nonparametric bootstrap allows us to empirically estimate the sampling distribution of a statistic θ (such as a mean, median, standard deviation, or a quantile like the VaR) or of an estimator θ̂, without making assumptions about the form of the population and without deriving the sampling distribution explicitly. The basic idea of the nonparametric bootstrap is as follows. Assume a data set S = (x_1, . . . , x_n) is available. We draw a sample of size n from among the elements of S, sampling with replacement; the result is called a bootstrap sample S_1 = (x_11, . . . , x_1n). We repeat this procedure a large number of times B, selecting many bootstrap samples; the bth such bootstrap sample is denoted S_b = (x_b1, . . . , x_bn). Next, we compute the statistic θ for each of the bootstrap samples; that is, θ_b = t(S_b), with t denoting the function used to compute the statistic from the data. The distribution of the θ_b around the original estimate θ̂ is then analogous to the sampling distribution of the estimator θ̂ around the population parameter. The variance is estimated by the sample variance of the bootstrap replicates, and the bias is estimated by the difference between the average of the bootstrap replicates and the original θ̂. The bootstrap estimates of bias and variance are thus given approximately by

bias* = θ̄* − θ̂  and  var*(θ̂) = (1/(B − 1)) ∑_{b=1}^{B} (θ_b − θ̄*)²,  with θ̄* = (1/B) ∑_{b=1}^{B} θ_b.

There are various methods for constructing bootstrap confidence intervals.
The normal-theory interval assumes that the distribution of θ̂ is normal (which is often approximately the case for statistics computed from sufficiently large samples) and uses the bootstrap estimate of sampling variance, and perhaps of bias, to construct a 100(1 − α)-percent confidence interval of the form

θ̂ ± z_{1−α/2} se*(θ̂),

where se*(θ̂) = √var*(θ̂) is the bootstrap estimate of the standard error of θ̂, and z_{1−α/2} is the 1 − α/2 quantile of the standard normal distribution (e.g., 1.96 for a 95-percent confidence interval, where α = 0.05).
We can also estimate confidence intervals using percentiles of the bootstrap sampling distribution: the upper and lower bounds of the confidence interval are given by the percentile points (quantiles) of the distribution of the bootstrap parameter estimates. This alternative approach, called the bootstrap percentile interval, uses the empirical quantiles of the θ_b to form a confidence interval [θ_(lo), θ_(up)], where θ_(1), . . . , θ_(r) are the ordered bootstrap replicates of the statistic; lo = [(r + 1)α/2]; up = [(r + 1)(1 − α/2)]; and [ • ] is the nearest integer function. This basic percentile approach is itself limited, particularly if the parameter estimators are biased (Dowd 2005). It is therefore often better to use bias-corrected, accelerated (BC_a) percentile intervals. To find the BC_a interval, we first calculate

z = Φ^{−1}( #{θ_b ≤ θ̂} / r ),

the standard-normal quantile of the proportion of bootstrap replicates at or below the original-sample estimate θ̂. If the bootstrap sampling distribution is symmetric, and if θ̂ is unbiased, then this proportion will be close to 0.5, and z will be close to 0. Now, let θ_(−i) be the value of θ̂ produced when the ith observation is deleted from the sample (there are n of these quantities), and let θ̄ = (1/n) ∑_{i=1}^{n} θ_(−i) be their average. We then calculate the acceleration

β = ∑_{i=1}^{n} (θ̄ − θ_(−i))³ / [ 6 ( ∑_{i=1}^{n} (θ̄ − θ_(−i))² )^{3/2} ]

and the corrected quantile levels

β1 = Φ( z + (z − z_{1−α/2}) / (1 − β(z − z_{1−α/2})) ),  β2 = Φ( z + (z + z_{1−α/2}) / (1 − β(z + z_{1−α/2})) );

the BC_a interval is then bounded by the β1 and β2 empirical quantiles of the bootstrap replicates. When β and z are both 0, β1 = Φ(−z_{1−α/2}) = Φ(z_{α/2}) = α/2 and β2 = Φ(z_{1−α/2}) = 1 − α/2, which corresponds to the (uncorrected) percentile interval.
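A minimal sketch of the simple percentile variant for a VaR confidence interval (without the BC_a correction; function names are ours):

```python
import numpy as np

def bootstrap_var_ci(returns, alpha=0.05, level=0.95, B=2000, seed=0):
    """Percentile bootstrap CI for the historical-simulation VaR:
    resample with replacement, recompute the VaR on each resample,
    and read the interval off the empirical quantiles of the
    bootstrap replicates."""
    rng = np.random.default_rng(seed)
    r = np.asarray(returns, dtype=float)
    reps = np.empty(B)
    for b in range(B):
        sample = rng.choice(r, size=len(r), replace=True)
        reps[b] = -np.quantile(sample, alpha)   # VaR of the resample
    half = (1.0 - level) / 2.0
    return float(np.quantile(reps, half)), float(np.quantile(reps, 1.0 - half))

# illustration on simulated N(0, 1) "returns" (true 5% VaR is 1.645)
rng = np.random.default_rng(1)
data = rng.standard_normal(1000)
lo, up = bootstrap_var_ci(data)
```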
For more background on the bootstrap approach and a broader array of applications see Efron and Tibshirani (1993); Davison and Hinkley (1997).
The bootstrap approach can be used to estimate the Value-at-Risk (VaR) and confidence intervals for the VaR. A resampling method based on the bootstrap and a bias correction, designed to improve the VaR forecasting ability of the normal-GARCH model, has been developed in Hartz et al. (2006). As mentioned in Dowd (2005), if we have a data set of n observations, we create a new data set by taking n drawings, with replacement, from the whole of the original data set. Each new data set created in this way gives us a new VaR estimate. We then create a large number of such data sets and estimate the VaR of each. The resulting VaR distribution enables us to obtain estimates of the confidence interval for our VaR: we produce a bootstrapped histogram of resample-based VaR estimates and read the confidence interval from the quantiles of this histogram. A very good discussion of this and other improvements can be found in Dowd (2005, chp. 4). In Section 5.3, we use the bias-corrected method to estimate the confidence interval of the VaR obtained from the HS, Normal, and GHYP distributions. The purpose of this simulation exercise is to compare these methodologies with our maxitive kernel approach, in order to verify that our interval-valued estimation of the VaR performs properly.

Summative Kernels and Probability Distributions
Kernels are used in signal processing and nonparametric statistics to define a weighted neighborhood around a location u ∈ Ω. In kernel regression, a kernel is used to estimate the conditional expectation of a random variable (see e.g., Silverman 1986; Wand and Jones 1995).
From a basic summative kernel κ, we can define a summative kernel κ_x^∆, translated in x ∈ Ω and dilated with a bandwidth ∆ > 0, by

κ_x^∆(u) = (1/∆) κ((u − x)/∆).

By convention, κ(u) = κ_0^1(u). The summative kernels most used in functional estimation are unimodal, symmetric, and centered (i.e., they define a weighted neighborhood around the origin). When κ is a symmetric and unimodal function, the following conditions are fulfilled: ∫_Ω u κ(u) du = 0 and κ(u) = κ(−u) for all u ∈ Ω. For example, the Epanechnikov kernel, proposed by Epanechnikov (1969) and illustrated in Figure 1, is defined by

κ(u) = (3/4)(1 − u²) 1l_{[−1,1]}(u).    (2)

In the following, K(Ω) represents the set of unimodal, symmetric, and centered kernels. A summative kernel κ can be seen as a probability distribution inducing a probability measure P_κ : P(Ω) → [0, 1] on the measurable space (Ω, P(Ω)), defined by

P_κ(A) = ∫_A κ(u) du, ∀A ∈ P(Ω).

Considering the summative kernel κ, we can define what we call the summative expectation as follows.

Definition 2. Let s be a function of L(Ω) and let κ be a summative kernel of K(Ω). The summative expectation of s, in the neighborhood defined by κ, is the classical expectation of s w.r.t. P_κ:

E_κ(s) = ∫_Ω s(u) dP_κ(u) = ∫_Ω s(u) κ(u) du,

since dP_κ = κ(u) du.
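A minimal numerical sketch of the translation-dilation construction, using the Epanechnikov kernel of Equation (2) (function names are ours); the translated and dilated kernel still integrates to 1:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel kappa(u) = 3/4 * (1 - u^2) on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def translated_dilated(kappa, x, delta):
    """kappa_x^delta(u) = (1/delta) * kappa((u - x)/delta)."""
    return lambda u: kappa((np.asarray(u, dtype=float) - x) / delta) / delta

# numerical check that the translated/dilated kernel has unit mass
k = translated_dilated(epanechnikov, x=2.0, delta=0.5)
grid = np.linspace(0.0, 4.0, 100_001)
mass = float(np.sum(k(grid)) * (grid[1] - grid[0]))
```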

Empirical Distribution Function
Let (x_1, . . . , x_n) be a finite set of observations of n i.i.d. random variables (X_1, . . . , X_n) with unknown pdf f and CDF F. One can estimate F by the empirical CDF F_n defined by

F_n(x) = (1/n) ∑_{i=1}^{n} 1l(x_i ≤ x),    (4)

where 1l is the indicator function, namely 1l(x_i ≤ x) = 1 if x_i ≤ x and 0 otherwise. Then, ∀x ∈ Ω, E[F_n(x)] = F(x) and Var[F_n(x)] = F(x)(1 − F(x))/n. This implies that the empirical CDF is convergent in probability to the true CDF, and thus that the empirical estimate of the Value-at-Risk converges towards the true VaR_α.
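The empirical CDF of Equation (4) can be sketched directly (a minimal illustration; names are ours):

```python
import numpy as np

def empirical_cdf(sample):
    """F_n(x) = (1/n) * #{i : x_i <= x}."""
    xs = np.sort(np.asarray(sample, dtype=float))
    def F_n(x):
        # number of observations at or below x, via binary search
        return np.searchsorted(xs, x, side="right") / len(xs)
    return F_n

F = empirical_cdf([0.2, -0.1, 0.4, 0.0])
```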

Summative Kernel Cumulative Estimator
The empirical distribution function (Equation (4)) is not smooth, as it jumps by 1/n at each point x_i (i = 1, . . . , n). A kernel estimator of the CDF, introduced by authors such as Nadaraya (1964) and Watson and Leadbetter (1964), is a smoothed version of the empirical distribution estimator. Such an estimator arises as the integral of the Parzen-Rosenblatt kernel density estimator (see Rosenblatt 1956; Parzen 1962).
Definition 3. The summative-kernel cumulative estimator of the CDF (also called the Parzen-Rosenblatt kernel cumulative estimator), based on the summative kernel κ with bandwidth ∆, is given, in each point x ∈ Ω, by

F_n^κ(x) = (1/n) ∑_{i=1}^{n} Γ((x − x_i)/∆),    (3)

where Γ(u) = ∫_{−∞}^{u} κ(t) dt. Let κ ∈ K(Ω) be a summative kernel function of the second order with support [−1, 1]; the function Γ verifies the following properties: Γ(u) = 0 for u ≤ −1, Γ(u) = 1 for u ≥ 1, and Γ is non-decreasing. Note that if κ is the pdf of a probability measure P_κ, then Γ is the cumulative distribution function of P_κ. For example, κ being the Epanechnikov kernel defined by (2), the function Γ, depicted in Figure 2, has the following expression:

Γ(u) = 0 if u < −1,  Γ(u) = (2 + 3u − u³)/4 if −1 ≤ u ≤ 1,  Γ(u) = 1 if u > 1.

Let κ ∈ K(Ω) be a summative kernel with support [−1, 1]; for a fixed x, the bias and the variance of F_n^κ(x) are given by

Bias[F_n^κ(x)] = (∆²/2) f′(x) ∫_{−1}^{1} u² κ(u) du + o(∆²),
Var[F_n^κ(x)] = (1/n) F(x)(1 − F(x)) − (∆/n) f(x) ∫_{−1}^{1} 2u κ(u) Γ(u) du + o(∆/n).

Some theoretical properties of the estimator F_n^κ have been investigated, among others, by Winter (1973, 1979); Sarda (1993); Yamato (1973); and Jones (1990). Some properties have long been known, e.g., the uniform convergence of F_n^κ towards F when the pdf f = F′ is continuous (Nadaraya 1964; Yamato 1973) or without conditions on f (Singh et al. 1983). The asymptotic expression of the Mean Integrated Squared Error, MISE = E[∫_Ω (F_n^κ(x) − F(x))² dx], has been studied in Swanepoel (1988): for a continuous pdf f, it has been proved that the best kernel is the uniform kernel of bandwidth ∆ > 0, defined by κ(u) = (1/(2∆)) 1l_{[−∆,∆]}(u), while for a pdf f that is discontinuous in a finite number of points, the best kernel is the exponential kernel of bandwidth ∆ > 0, defined by κ(u) = (1/(2∆)) exp(−|u|/∆). The choice of the optimal bandwidth in kernel estimation of the cumulative distribution function is of crucial interest. Many procedures, such as plug-in and cross-validation, have been proposed in the relevant literature (see e.g., Sarda 1993; Polansky and Baker 2000; Altman and Leger 1995) to choose (estimate) the optimal bandwidth for estimating the CDF of the random process underlying a sample.
As mentioned in Quintela Del Rio and Estevez-Perez (2013), the Polansky and Baker plug-in bandwidth is defined through quantities V̂_r, where r ≥ 2 is an even integer; the kernel function H involved in these quantities is not necessarily equal to κ. An iterative method for calculating the plug-in bandwidth has been proposed by Polansky and Baker (2000) and, as described in Quintela Del Rio and Estevez-Perez (2013), proceeds as follows. Let b > 0 be an integer. First, we calculate V̂_{2b+2}, using an estimate of the standard deviation of the data given by σ̂ = min{ŝ, (Q_3 − Q_1)/1.349}, where ŝ is the sample standard deviation and Q_1, Q_3 denote the first and third quartiles, respectively. Second, going from j = b down to j = 1, we calculate V̂_{2j}(ĝ_{2j}). Third, we compute the plug-in bandwidth from these quantities. In practice, for most applications, we take b = 2 (Quintela Del Rio and Estevez-Perez 2013). Sarda (1993) introduced a cross-validation method that estimates the optimal bandwidth by minimizing the criterion

CV(∆) = (1/n) ∑_{i=1}^{n} ( F_n(x_i) − F_n^{κ,−i}(x_i) )²,

where F_n is the empirical distribution function (Equation (4)) and F_n^{κ,−i} is the kernel cumulative distribution estimator computed by leaving out x_i. Adrian et al. (1998) proposed a modified cross-validation method that minimizes a related leave-one-out criterion. For more details about bandwidth selection for summative kernel CDF estimation, we refer the reader to Quintela Del Rio and Estevez-Perez (2013). Now, if we suppose that all parameters have been chosen appropriately for F_n^κ to be a consistent estimate of the CDF, then a kernel estimator of the VaR, denoted VaR_{α,κ}, can easily be obtained by inverting the equation F_n^κ(x) = α. In that case, VaR_{α,κ} satisfies

VaR_{α,κ} = −(F_n^κ)^{−1}(α).

The kernel VaR estimator VaR_{α,κ} can be seen as a smoothed version of the empirical VaR estimator VaR_{α,n} defined by Equation (5). For further details about the properties of the kernel VaR estimator, see e.g., Gourieroux et al. (2000); Chen and Tang (2005); Sheather and Marron (1990).
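Definition 3 and the inversion step F_n^κ(x) = α can be sketched with the Epanechnikov kernel (the bandwidth is fixed by hand here rather than by plug-in or cross-validation; function names are ours):

```python
import numpy as np

def gamma_epan(u):
    """Integrated Epanechnikov kernel: Gamma(u) = (2 + 3u - u^3)/4
    on [-1, 1], 0 below and 1 above."""
    u = np.clip(np.asarray(u, dtype=float), -1.0, 1.0)
    return (2.0 + 3.0 * u - u**3) / 4.0

def kernel_cdf(sample, delta):
    """Summative-kernel cumulative estimator
    F_n^kappa(x) = (1/n) * sum_i Gamma((x - x_i)/delta)."""
    xs = np.asarray(sample, dtype=float)
    def F(x):
        return float(np.mean(gamma_epan((x - xs) / delta)))
    return F

def kernel_var(sample, delta, alpha, lo=-10.0, hi=10.0, tol=1e-8):
    """Kernel VaR estimate: invert F_n^kappa(x) = alpha by bisection
    and return the negative of the solution."""
    F = kernel_cdf(sample, delta)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)
```

For a sample symmetric around 0, the 50% kernel VaR is 0, which gives a quick sanity check of the inversion.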
However, the validity of such an estimate highly depends on the appropriateness of the chosen kernel and bandwidth. In the next section, we propose another approach that allows us to estimate the dependence of the obtained estimate on the parameters of the method, and therefore to increase the robustness of the decision process based on the data.

Maxitive Kernels and Possibility Distributions
In many applications, the summative kernel and its bandwidth are chosen in a very empirical way. As proposed in Loquin and Strauss (2008b), the empirical character of this choice can be taken into consideration by working with a family of kernels rather than a single kernel. Such a family of kernels can be represented by a maxitive kernel.
From a basic maxitive kernel π, we can define a maxitive kernel π_x^∆, translated in x ∈ Ω and dilated with a bandwidth ∆ > 0, by

π_x^∆(u) = π((u − x)/∆).

By convention, π(u) = π_0^1(u). For example, the triangular maxitive kernel proposed in Loquin and Strauss (2008b) and illustrated in Figure 3 is defined by

π(u) = max(0, 1 − |u|).

A maxitive kernel can be seen as a possibility distribution (Loquin and Strauss 2008b), inducing two dual non-additive confidence measures: a possibility measure Π_π and a necessity measure N_π (Dubois and Prade 1988; De Cooman 1997), defined, ∀A ∈ P(Ω), by

Π_π(A) = sup_{u∈A} π(u)  and  N_π(A) = 1 − Π_π(A^c),

with A^c being the complementary set of A in Ω.
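A numeric sketch of the triangular maxitive kernel and the induced possibility measure (the supremum is approximated on a grid; names are ours):

```python
import numpy as np

def tri_maxitive(x, delta):
    """Triangular maxitive kernel translated to x, bandwidth delta:
    pi_x^delta(u) = max(0, 1 - |u - x| / delta)."""
    return lambda u: np.maximum(0.0, 1.0 - np.abs(np.asarray(u, dtype=float) - x) / delta)

def possibility(pi, a, b, n=10001):
    """Possibility of the interval [a, b]:
    Pi([a, b]) = sup_{u in [a, b]} pi(u), approximated on a grid."""
    grid = np.linspace(a, b, n)
    return float(np.max(pi(grid)))

pi = tri_maxitive(0.0, 1.0)
# the necessity measure is the dual: N(A) = 1 - Pi(complement of A)
```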

Choquet Integrals and Maxitive Expectation
We start with the definition of a capacity, or non-additive measure, which generalizes the notion of an additive measure, i.e., a probability. The notion of capacity was introduced by Gustave Choquet in 1953 and has played an important role in game theory, fuzzy set theory, Dempster-Shafer theory, and many other fields (see Denneberg 1994; Shafer 1976; Choquet 1953).
Definition 5. Let (Ω, P(Ω)) be the measurable space. A capacity ν on (Ω, P(Ω)) is a set function ν : P(Ω) → [0, 1] satisfying: ν(∅) = 0, ν(Ω) = 1, and ν(A) ≤ ν(B) whenever A ⊆ B. One of the most important concepts closely related to additive measures is integration, and it has a natural generalization to non-additive measure theory. Historically, the first integral applied with respect to non-additive measures is the Choquet integral, formalized by Gustave Choquet.
Definition 6. Let (Ω, P(Ω)) be the measurable space and ν : P(Ω) → [0, 1] a capacity. Let s be a bounded function of L(Ω). The continuous Choquet integral of s w.r.t. ν is defined by

C_ν(s) = ∫_{−∞}^{0} ( ν({u ∈ Ω : s(u) ≥ t}) − 1 ) dt + ∫_{0}^{+∞} ν({u ∈ Ω : s(u) ≥ t}) dt,

where the integrals on the right-hand side are Riemann integrals.
Note that, a probability P being a special case of an (additive) capacity, the Choquet integral w.r.t. P coincides with the classical expected value w.r.t. P.
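This coincidence can be checked numerically with a discrete Choquet integral on a finite space (a sketch for nonnegative functions s; names are ours):

```python
def choquet_discrete(s, nu):
    """Discrete Choquet integral of a nonnegative function s,
    given as {omega: value}, w.r.t. a capacity nu on subsets:
    C(s) = sum_i (s_(i) - s_(i-1)) * nu({s >= s_(i)}), s_(0) = 0."""
    items = sorted(s.items(), key=lambda kv: kv[1])  # ascending values
    total, prev = 0.0, 0.0
    remaining = set(s)
    for omega, val in items:
        total += (val - prev) * nu(frozenset(remaining))
        prev = val
        remaining.discard(omega)
    return total

# sanity check: for an additive capacity (a probability), the Choquet
# integral reduces to the ordinary expectation
p = {"a": 0.2, "b": 0.3, "c": 0.5}
prob = lambda A: sum(p[w] for w in A)
s = {"a": 1.0, "b": 2.0, "c": 4.0}
ev = choquet_discrete(s, prob)   # ordinary expectation: 0.2*1 + 0.3*2 + 0.5*4
```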
Recall that, if X is a random variable defined on a probability space (Ω, P(Ω), P), then the classical expected value of X is given by

E_P(X) = ∫_Ω X dP,

where P : P(Ω) → [0, 1] is a probability measure on the measurable space (Ω, P(Ω)) and the integral is a Lebesgue-Stieltjes integral. The notion of expectation w.r.t. a summative kernel can be extended to a maxitive kernel, as defined in Loquin and Strauss (2008b).
Definition 8. Let π be a maxitive kernel and let s be a bounded function of L(Ω). The expectation of s w.r.t. π is the interval defined by

E_π(s) = [ C_{N_π}(s), C_{Π_π}(s) ],    (14)

where Π_π (resp. N_π) is the possibility (resp. necessity) measure induced by the maxitive kernel π and C is the Choquet integral.
An important property, which will be used in the sequel, is that the maxitive interval-valued expectation of s w.r.t. π is the convex set of all the summative precise-valued expectations w.r.t. the summative kernels dominated by π (Loquin and Strauss 2008b).

Property 1. Let π be a maxitive kernel and let M(π) be the set of the summative kernels it dominates (see Equation (10)). Let s be a bounded function of L(Ω). We have: ∀g ∈ E_π(s), ∃κ ∈ M(π) such that E_κ(s) = g, where E_κ(s) (resp. E_π(s)) is the summative (resp. maxitive) expectation of s w.r.t. κ (resp. π).

Maxitive Kernel Cumulative and VaR Estimator
As mentioned in Loquin and Strauss (2008a) (Theorem 4), the summative kernel cumulative estimator F_n^κ, defined by Equation (3), can be written, in each point x ∈ Ω, as the summative expectation of the empirical CDF F_n, defined by Equation (4), in the probabilistic neighborhood defined by the summative kernel κ translated in x:

F_n^κ(x) = E_{κ_x^∆}(F_n).

Based on this reformulation, Loquin and Strauss (2008a) propose an extension of the summative kernel estimator of the CDF, called the maxitive kernel estimator of the CDF, or interval-valued estimation of the CDF.

Definition 9. Let F_n be the empirical CDF based on the sample (x_1, . . . , x_n). Let π be a maxitive kernel and let E_π( • ) be the maxitive expectation w.r.t. π (Expression (14)). The maxitive kernel cumulative estimator is defined, ∀x ∈ Ω, by

F_n^π(x) = E_{π_x^∆}(F_n) = [ F̲_n^π(x), F̄_n^π(x) ].

The computation of F_n^π involves two Choquet integrals which, F_n being non-decreasing, reduce (Loquin and Strauss 2008a) to, for all x ∈ Ω:

F̄_n^π(x) = F_n(x) + (1/n) ∑_{x_i > x} π_x^∆(x_i),    F̲_n^π(x) = F_n(x) − (1/n) ∑_{x_i ≤ x} π_x^∆(x_i).

The estimation of the CDF is usually computed on p regularly spaced points of Ω. Let {y_j}_{j∈{1,...,p}} be those p points. The algorithm computing F_n^π(y_j) at each point y_j is given by Algorithm 1.
Notice that, due to Property 2, both F̲_n^π and F̄_n^π are also estimates of the sought-after CDF. F̲_n^π and F̄_n^π are non-decreasing continuous functions on Ω, and so each can be inverted at any level α ∈ (0, 1). We can then immediately infer a maxitive interval-valued estimation of the VaR, VaR_α,π = [VaR̲_α,π, VaR̄_α,π], with

VaR̲_α,π = −(F̲_n^π)^{−1}(α)  and  VaR̄_α,π = −(F̄_n^π)^{−1}(α),

which inherits the properties of the interval-valued CDF: VaR_α,π = { VaR_α,κ | κ ∈ M(π) }. The MATLAB program computing the interval-valued estimation of the VaR is outlined in Appendix A of the present paper.
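A sketch of the interval-valued estimate with a triangular maxitive kernel, under the closed-form reduction of the two Choquet integrals for a non-decreasing empirical CDF (the upper bound adds the possibility weight of the sample points above x; the lower bound subtracts it for those at or below x). Function names are ours, and this is an illustration, not the authors' MATLAB program:

```python
import numpy as np

def tri(u):
    """Triangular maxitive kernel pi(u) = max(0, 1 - |u|)."""
    return np.maximum(0.0, 1.0 - np.abs(u))

def maxitive_cdf_bounds(sample, delta):
    """Interval-valued CDF [F_lower, F_upper]: the upper bound adds to
    F_n(x) the possibility weight of the sample points above x, the
    lower bound subtracts the weight of those at or below x."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)
    def F_upper(x):
        w = tri((xs - x) / delta)
        return (np.sum(xs <= x) + np.sum(w[xs > x])) / n
    def F_lower(x):
        w = tri((xs - x) / delta)
        return (np.sum(xs <= x) - np.sum(w[xs <= x])) / n
    return F_lower, F_upper

def interval_var(sample, delta, alpha, lo=-100.0, hi=100.0, it=200):
    """[VaR_lower, VaR_upper]: invert both CDF bounds by bisection.
    The upper CDF bound crosses level alpha further left, hence it
    yields the upper VaR bound, and conversely."""
    F_lower, F_upper = maxitive_cdf_bounds(sample, delta)
    def invert(F):
        a, b = lo, hi
        for _ in range(it):
            m = 0.5 * (a + b)
            if F(m) < alpha:
                a = m
            else:
                b = m
        return 0.5 * (a + b)
    return -invert(F_lower), -invert(F_upper)

rng = np.random.default_rng(2)
data = rng.standard_normal(500)
F_lo, F_up = maxitive_cdf_bounds(data, 0.2)
v_lo, v_hi = interval_var(data, 0.2, 0.05)
```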

Data and Experimental Process
The data used in the present study consist of daily closing prices collected between January 2010 and December 2016 for four stock indexes from developed countries: S&P500, DJI, Nikkei225, and CAC40. The daily closing prices were converted to daily log-returns by taking the first differences of the natural logarithm of the daily prices: for an observed price P_t, the corresponding one-day log-return on day t is r_t = ln(P_t / P_{t−1}). Table 1 contains descriptive statistics for the sample returns of the four considered stock indexes. The returns are negatively skewed and characterized by heavy tails, since the kurtoses are significantly greater than 3; the series have distributions with tails significantly fatter than those of the normal distribution. The Jarque-Bera test also indicates that the hypothesis that the four return series are normally distributed can be rejected at a very low significance level. Due to these properties, the normal distribution would be an inappropriate model for calculating the daily VaR, as mentioned in the introduction. We thus rather envisage a nonparametric asymmetric approach based on kernel estimation of the CDF. However, as previously mentioned, estimating the VaR using a kernel-based approach can be highly biased, since the choice of the kernel has an important impact on the VaR estimate at a given probability level.
In this section, we propose to gauge this impact by estimating the VaR at several probability levels with different types of kernels, whose bandwidths have been adapted to the available datasets. We confirm that no kernel can be considered optimal in this context and show that the new maxitive interval-valued nonparametric approach we propose leads to more cautious behavior when applied to real data. In fact, the bounds of the interval-valued estimate of the VaR, for a given probability level, can be instrumental in applications such as reserve requirements for banks. We then show that this approach allows discarding some parametric approaches, since they are not adapted to heavy-tailed data such as those we consider.

Nonparametric Estimate of the VaR: Comparing Summative and Maxitive Approaches
In this experiment, we have considered estimating the VaR at low probability levels, namely levels lower than 0.1, for each of the four datasets. In the first experiment, we have focused on the DJI and Nikkei225 indexes. The CDFs have been estimated by considering four of the most used summative kernels, whose bandwidths have been adapted using the rule-of-thumb method: Epanechnikov, normal, biweight, and triweight; see Silverman (1986) for their analytical expressions. Figure 4 illustrates the impact of the kernel choice on the VaR estimation. This impact is more marked for high-volatility stocks (Nikkei225). For example, referring to Figure 4a, it can be seen that the VaR estimate based on the Epanechnikov kernel is greater in absolute value than the one based on the normal kernel at the probability level of 1%, while the opposite occurs at the probability level of 2.5%. Referring to each plot of Figure 4, it is obvious that no kernel always provides the upper or lower valued CDF. Since the CDF curves intersect each other at different probability levels, the risk of bias in the VaR estimate is relatively great, especially when the number of observations in the dataset is low (compare Figure 4b-d). In this context, no kernel function can be considered more or less conservative than the others. The other indexes produce similar results.
This first experiment shows that the high dependence of the CDF estimate on the chosen kernel shape can substantially bias the VaR estimate. The second experiment aims to show that none of the four considered kernels can be regarded as optimal in this context. Such an experiment requires a ground truth, which is not available, so a resampling methodology was carried out to estimate one. This methodology consists of five steps: • Step 1: Fit a GARCH model to the four stock returns data. The fitted model for the returns r_t is an AR(1)-GARCH(1,1), where AR(1) denotes an autoregressive model of order 1, given by: r_t = µ_t + ε_t, ε_t = σ_t z_t, σ_t^2 = η_0 + η_1 ε_{t-1}^2 + η_2 σ_{t-1}^2, ∀t = 1, ..., n, where r_t is the return value at time t, z_t is a standard normal random variable, and n is the sample size (n = 200 and 1500). The conditional mean µ_t is assumed to follow an AR(1) model given by µ_t = ϕ_0 + ϕ_1 r_{t-1}. By definition, ε_t is serially uncorrelated with mean zero but has a time-varying conditional variance equal to σ_t^2. The three positive parameters η_0, η_1, η_2 and the two parameters ϕ_0, ϕ_1 are, respectively, the parameters of the GARCH(1,1) and AR(1) models. The maximum likelihood (ML) method provides a systematic way to adjust the parameters of the model to give the best fit. Table 2 lists the fitted models for each daily stock returns dataset. (In the accompanying figure, the blue curve corresponds to the biweight distribution and the green curve to the normal distribution.)
• Step 2: Generate 1000 simulated samples from each stock's returns data using the coefficients obtained from the above fitted models. The main reason for using GARCH models to simulate our real data is the dependence properties and the volatility clustering of stock returns; see Engle (1982).
First, we generate an i.i.d. series z_t using the random generator in MATLAB (R2010a) and, from it, the series σ_t. The sampled random numbers are assumed to be normally distributed with zero mean and unit variance. The innovations of the GARCH series are then obtained via the equation ε_t = σ_t z_t. Finally, the generation of r_t from the AR(1) process is straightforward. We thus generate series of lengths 1500 and 200 for each of the 1000 simulated samples.
• Step 3: Calculate the VaR estimates for each sample generated in Step 2 using the four kernel functions. These VaRs are calculated at five chosen probability levels (α = 1%, 2.5%, 5%, 7.5%, 10%) with two time horizons (n = 200 and n = 1500). • Step 4: Assess the performance of each kernel function by comparing its VaRs with the empirical VaRs. The performance criterion we examine is the mean square error (MSE), i.e., MSE(α) = (1/1000) Σ_{i=1}^{1000} (VaR_i(α) − Q_i^(n)(α))^2, where Q^(n)(p) is the vector of empirical quantiles obtained from the simulated samples. • Step 5: Tabulate the results, which lead us to some important conclusions.
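The simulation and scoring steps above can be sketched in Python (rather than the authors' MATLAB). The AR(1)-GARCH(1,1) coefficients below are hypothetical placeholders for the fitted values of Table 2, and the Gaussian quantile estimator is merely an illustrative competitor showing how the MSE criterion compares an estimator with the empirical quantile on each simulated sample.

```python
import numpy as np
from statistics import NormalDist

def simulate_ar1_garch11(n, phi0, phi1, eta0, eta1, eta2, rng):
    """Simulate r_t = mu_t + eps_t with mu_t = phi0 + phi1*r_{t-1},
    eps_t = sigma_t * z_t and sigma_t^2 = eta0 + eta1*eps_{t-1}^2 + eta2*sigma_{t-1}^2."""
    z = rng.standard_normal(n)
    eps_prev, sig2_prev = 0.0, eta0 / (1.0 - eta1 - eta2)  # unconditional variance
    r_prev = phi0 / (1.0 - phi1)                           # unconditional mean
    r = np.empty(n)
    for t in range(n):
        sig2 = eta0 + eta1 * eps_prev**2 + eta2 * sig2_prev
        eps = np.sqrt(sig2) * z[t]
        r[t] = phi0 + phi1 * r_prev + eps
        eps_prev, sig2_prev, r_prev = eps, sig2, r[t]
    return r

def mse_vs_empirical(estimator, alpha, n_samples, n, params, rng):
    """Step 4's criterion: mean squared error between an estimator's
    alpha-quantile and the empirical alpha-quantile over simulated samples."""
    errs = []
    for _ in range(n_samples):
        r = simulate_ar1_garch11(n, *params, rng)
        errs.append((estimator(r, alpha) - np.quantile(r, alpha)) ** 2)
    return float(np.mean(errs))

def normal_var(r, alpha):
    """Illustrative parametric competitor: Gaussian alpha-quantile fitted to the sample."""
    return r.mean() + NormalDist().inv_cdf(alpha) * r.std()

rng = np.random.default_rng(1)
params = (0.0005, 0.05, 1e-5, 0.08, 0.90)  # hypothetical (phi0, phi1, eta0, eta1, eta2)
mse = mse_vs_empirical(normal_var, 0.05, 200, 200, params, rng)
```

In the paper's experiment the estimators plugged in here are the four kernel-based VaR estimators, run over 1000 samples for n = 200 and n = 1500.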
The MSE of the VaR estimates at each probability level α is reported in Tables 3 and 4. From Tables 3 and 4, for the S&P500, the best kernel function for estimating the VaR at the 10%, 7.5%, 5%, and 2.5% levels is the Epanechnikov kernel, while the normal kernel is best at the 1% level. For the Nikkei225, the normal kernel seems to perform better than the other kernel functions. For the DJI, the Epanechnikov kernel estimates the VaR more accurately, while for the CAC40 the normal kernel gives the closest match. From Table 4, when n = 200, the biweight and Epanechnikov kernels appear more likely to estimate the VaR accurately at several probability levels. However, the performance of the kernel functions rapidly declines as n and α get smaller. The simulation results showed that, for large sample sizes (here n = 1500), all kernel functions are consistent estimators, i.e., the MSE values are close to 0. On the other hand, the MSE values are larger for the smaller sample size (here n = 200). This demonstrates that choosing a particular kernel, when the sample size is low, is risky. Therefore, a coherent interval-valued estimate of the VaR, as we propose, is likely to lead to a more careful decision in this context.

Interval Estimation of Value-at-Risk and Some Numerical Comparisons
Here we apply the maxitive kernel estimator presented in Section 4.3 to obtain the lower and upper bounds of the VaR for each of the four stock indexes at three probability levels (α = 10%, 5%, 1%). To obtain the optimal bounds for the VaR, the bandwidth has been, once again, chosen using the most popular methods, namely the biased cross-validation and plug-in methods presented in Section 3.3. As shown in Table 5, the maximum available bandwidth is taken to ensure that all kernel estimators lie inside the interval given by the maxitive kernel estimator. In order to examine the performance of the maxitive kernel estimation method, we divide the data into three samples: the first corresponds to a six-year time horizon, the second to a three-year horizon, and the third to a one-year horizon. First, after choosing the best-fit distribution from the family of GHYP distributions using the AIC criterion (AIC = −2 ln L_GHYP + 2ς, where ς is the number of estimated parameters and L_GHYP is the likelihood of the GHYP model), we compute the VaR under the candidate distribution. Next, we compute VaR confidence intervals based on three approaches, namely GHYP, normal, and historical simulation (HS), using the bias-corrected and accelerated (BCa) bootstrap method. Finally, we compare our proposed maxitive interval-valued estimates with those based on the bootstrap technique under different sample sizes. The bootstrap confidence intervals of the VaR for the three approaches (GHYP, normal, and HS) are presented together in Tables 6-8. With these point VaR estimates at hand for comparison purposes, we evaluate the performance of our approach. In this section, we have chosen to illustrate our results and discuss the benefit of our approach in two ways. In Tables 6-8, we show the
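For the bootstrap side of this comparison, a minimal sketch in Python is the following; it applies SciPy's `bootstrap` with `method='BCa'` to the historical-simulation VaR statistic. The synthetic Student-t sample stands in for the real datasets, and the GHYP and normal variants are not reproduced here.

```python
import numpy as np
from scipy.stats import bootstrap

rng = np.random.default_rng(42)
# Synthetic stand-in for roughly three years of daily returns (heavy tailed).
returns = rng.standard_t(df=4, size=750) * 0.01

def var_1pct(sample, axis=-1):
    """Historical-simulation VaR at the 1% level: the empirical 0.01-quantile."""
    return np.quantile(sample, 0.01, axis=axis)

# Bias-corrected and accelerated (BCa) bootstrap confidence interval at 99%.
res = bootstrap((returns,), var_1pct, confidence_level=0.99,
                n_resamples=2000, method='BCa', random_state=rng)
lo, hi = res.confidence_interval.low, res.confidence_interval.high
```

The width `hi - lo` of such intervals is what Tables 6-8 compare against the width of the maxitive interval-valued estimate.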
explicit results for the minimum and maximum values, as well as the width, of the interval estimation of the VaR for several probability levels between 1% and 10%. Figure 5 shows the PP-plots, i.e., the fitted F(x_i) of each model plotted against the empirical F(x_i). Based on the numerical results we can formulate several conclusions. Evidently, the lower and upper bounds increase with the probability level α. However, it is important to note that the interval estimates of the VaR for the Nikkei225 and CAC40 indexes are larger than those for the DJI and S&P500 indexes. Note also that the influence of stock-market volatility is much more important than that of the sample size. Moreover, the maxitive VaR intervals are tighter than those derived from the GHYP, HS, and normal distributions. For example, for the short one-year horizon with the high-volatility stock (Nikkei225) and at the 1% probability level, the width of the interval-valued VaR is 1.073, while the widths of the BCa (99%) confidence intervals based on the HS, normal, and GHYP distributions are 4.236, 1.925, and 4.246, respectively (see Table 6). Furthermore, we can remark that the maxitive interval-valued estimate of the VaR lies to the left of the normal VaR; this shows the ability of our approach to model very dangerous financial risks, while the normal distribution is not consistent with tail thickness and right-tail risk. These results indicate that our approach is more accurate and informative, especially for the smaller sample size. Next, in order to inspect the goodness-of-fit of the models used, a graphical tool (the PP-plot) is constructed (Figure 5) to compare the empirical cumulative distribution function to the fitted ones. This plot confirms that the Epanechnikov kernel function and the GHYP distribution give a good global fit for the four returns datasets.
The points of the PP-plot are close to the 45-degree line, and they also lie within the interval estimation given by the maxitive kernel.
In contrast, from the same figure, the PP-plot of the normal cumulative function against the empirical cumulative function shows that the left-end pattern lies above the 45-degree line while the right end lies below it. Thus, the normal distribution underestimates the VaR at low probability levels. This is due to the fact that the normal distribution ignores the fat tails present in the actual distribution. Based on these results, we can conclude that the GHYP distribution and the maxitive kernel method provide a better fit than the normal distribution to market return data. Thus, the maxitive kernel method seems to be a good choice for estimating the risk measured by the VaR.
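The quantities behind such a PP-plot can be reproduced in outline with synthetic data. The following Python sketch (not the paper's code; a heavy-tailed Student-t sample stands in for the return data) computes the two coordinate sets of the normal-versus-empirical PP-plot and measures the left-tail discrepancy described above.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(7)
x = np.sort(rng.standard_t(df=3, size=2000))     # heavy-tailed synthetic "returns"

# Empirical CDF evaluated at the order statistics (one PP-plot axis).
emp = (np.arange(1, x.size + 1) - 0.5) / x.size

# Fitted normal CDF at the same points (the other PP-plot axis).
nd = NormalDist(mu=x.mean(), sigma=x.std())
fitted = np.array([nd.cdf(v) for v in x])

# Fraction of deep-left-tail points where the empirical CDF exceeds the fitted
# normal CDF, i.e., where the PP-plot point leaves the 45-degree line in the
# direction indicating that the normal model understates extreme losses.
left = emp < 0.01
share_above = float(np.mean(emp[left] > fitted[left]))
```

Plotting `fitted` against `emp` (or against the kernel-based CDF values) gives the panels of Figure 5.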

Conclusions
Using an estimate of the Value-at-Risk (VaR) based on a small sample may pose a risk in financial applications due to the high dependence of this VaR estimate on the computation method. In fact, a VaR estimate can be computed in many ways, including parametric and nonparametric approaches. We have shown, with experiments based on the daily closing prices of four stock indexes, that no method can be said to be optimal for this estimation. Moreover, the bias induced by choosing a particular method is particularly pronounced for estimates based on small samples. In this paper, we have presented a new method for computing the VaR which differs markedly from its competitors in that it is interval-valued. This interval-valued VaR estimate is the set of all estimates that could have been obtained, with the same data sample, by a set of kernel-based estimation methods. In our experiments, we noted that the output of some parametric methods, like the GHYP VaR estimate, always belongs to the interval-valued VaR we propose, while others, like the normal VaR estimate, do not. It appears that VaR estimates belonging to the proposed interval-valued VaR estimate are likely to be less risky than those that do not. Moreover, a wide interval-valued VaR estimate is a marker of high risk for a trader, since it reflects the thick tails, pronounced skewness, and excess kurtosis of financial asset price returns.
The interval-valued estimation, based on the maxitive kernel, that we propose in this paper is a convex envelope of the kernel VaR estimates. Although the VaR is probably one of the most popular tools in risk management, an alternative measure, which satisfies the conditions for a coherent risk measure, has been proposed (see, e.g., Rockafellar and Uryasev 2000; Artzner et al. 1999). This risk measure is called Expected Shortfall (ES; also known as conditional VaR or average VaR). Indeed, in January 2016 the Basel Committee published revised standards for minimum capital requirements for market risk (Basel Committee on Banking Supervision 2016), which include a shift from Value-at-Risk to Expected Shortfall as the preferred risk measure. Future research should therefore be conducted into finding an interval-valued estimation of the ES, based on the maxitive kernel, and comparing it with the interval-valued estimation of the VaR and with the ES under various distributions. In this regard, Broda and Paolella (2011) present easily computed expressions for evaluating the Expected Shortfall for numerous distributions which are now commonly used for modelling asset returns. Another interesting avenue of research would be to construct a CoVaR interval combining a GARCH model with the maxitive kernel.

Integrated Summative Kernels

% Integrated Epanechnikov kernel function
function [y] = KernelIntegralEpanechnikov(x, centre, delta)
x = (x - centre) / delta;
y = (abs(x) < 1) .* (-0.25*x.^3 + 0.75*x + 0.5) + (x >= 1);

% Integrated Cosine kernel function
function [y] = KernelIntegralCosine(x, centre, delta)
x = (x - centre) / delta;
y = (abs(x) < 1) .* (0.5*sin(0.5*pi*x) + 0.5) + (x >= 1);

% Integrated Triweight kernel function
function [y] = KernelIntegralTriweight(x, centre, delta)
x = (x - centre) / delta;
y = (abs(x) < 1) .* ((35/32)*x - (35/32)*x.^3 + (21/32)*x.^5 - (5/32)*x.^7 + 0.5) + (x >= 1);

% Integrated Biweight kernel function
function [y] = KernelIntegralBiweight(x, centre, delta)
x = (x - centre) / delta;
y = (abs(x) < 1) .* ((15/16)*x - (5/8)*x.^3 + (3/16)*x.^5 + 0.5) + (x >= 1);

Triangular Maxitive Kernel

% Triangular maxitive kernel function
function [y] = Max_kernel_Triangular(x, centre, delta)
x = abs((x - centre) / delta);
y = (1.0 - x) .* (x <= 1);