Article

A Sequential Importance Sampling for Estimating Multi-Period Tail Risk

Department of Statistics, University of Seoul, 163 Seoulsiripdaero, Dongdaemun-gu, Seoul 02504, Republic of Korea
*
Author to whom correspondence should be addressed.
Risks 2024, 12(12), 201; https://doi.org/10.3390/risks12120201
Submission received: 1 November 2024 / Revised: 7 December 2024 / Accepted: 10 December 2024 / Published: 13 December 2024
(This article belongs to the Special Issue Financial Derivatives: Market Risk, Pricing, and Hedging)

Abstract

Plain or crude Monte Carlo simulation (CMC) is commonly applied to estimate multi-period tail risk measures such as value-at-risk (VaR) and expected shortfall (ES). After fitting a volatility model to the past history of returns and estimating the conditional distribution of innovations, one can simulate the return process following the fitted volatility model with the estimated conditional distribution of innovations. Repeatedly generating return processes of the desired length gives a sufficient number of simulated multi-period returns, and the multi-period VaR and ES are then estimated directly from their empirical distribution. CMC is easily applicable, but it needs a huge number of simulated multi-period returns for the accurate estimation of a tail risk measure, especially when the confidence level of the measure is close to 1. To overcome this shortcoming, we propose a sequential importance sampling, which is a modification of CMC. In the proposed method, the sampling distribution of innovations is chosen differently from the estimated conditional distribution of innovations so that the simulated multi-period losses are more severe than in the case of CMC. In other words, simulated losses beyond the VaR to be estimated occur frequently in the proposed method, which greatly reduces the estimation error of ES and requires fewer simulated samples. We propose how to find a near-optimal sampling distribution. The multi-period VaR and ES are estimated from the weighted empirical distribution of the simulated multi-period returns, and we propose how to compute the weight of a simulated multi-period return. An empirical study is given to backtest the VaRs and ESs estimated by the proposed method, and to compare the performance of the proposed sequential importance sampling with that of CMC.

1. Introduction

Value at Risk (VaR) and Expected Shortfall (ES) are two risk measures widely used in financial institutions to assess the potential loss in value of an asset or a portfolio over a given time period. VaR represents the maximum potential loss over a specified time period at a given confidence level; it tells the worst-case loss that will not be exceeded at that level of confidence. Given the VaR of an asset or a portfolio over a specified time period, ES measures the expected loss in the case that the loss exceeds the VaR. ES thus provides additional information on the tail risk.
There is a vast literature on VaR and ES estimation (Hong et al. 2014; Nadarajah et al. 2014; Nieto and Ruiz 2016). In earlier studies, VaR and ES are estimated by forecasting the quantile of the distribution of returns directly. Such studies include historical simulation, the kernel smoothing method (Chen and Tang 2005), the conditional autoregressive value at risk (CAViaR) (Engle and Manganelli 2004), and the extreme value approach (Longin 2000). Monte Carlo simulation is also useful when a closed-form expression of the risk measures cannot be obtained analytically. However, a very large number of simulated samples are required to achieve the desired level of estimation error in this method.
In recent studies, a volatility model is assumed that adequately describes the dynamic behavior of the conditional variance of the return; the GARCH-type models are representative. The studies on VaR and ES estimation can be classified into three types, based on how the conditional distribution of the innovations in the volatility model is estimated from previous returns. Hull and White (1998) and Barone-Adesi et al. (1999) developed non-parametric methods for the estimation of the conditional distribution of innovations; the method is called filtered historical simulation (FHS). Gao and Song (2008) derived the limiting distribution of VaR and ES estimated by FHS. Although the FHS method offers the advantage of not relying on distributional assumptions, it needs a long time series of returns in order to reflect extreme losses or gains in the estimation of the distribution of innovations. In parametric approaches, the standard normal or the standardized Student's t-distribution is commonly assumed as the distribution of innovations. However, some more general distributions have been shown to give more accurate estimates of the risk measures (Bernardi et al. 2012; Broda and Paolella 2009; Simonato 2011). Applying a parametric method enables us to estimate the innovation distribution easily and efficiently; however, if the adopted parametric model differs significantly from the actual distribution of innovations, it cannot estimate VaR and ES accurately. McNeil and Frey (2000) proposed a semi-parametric method, in which the tail of the innovation distribution is fitted by the extreme value distribution. Jalal and Rockinger (2008) showed that the method of McNeil and Frey (2000) provides very good and robust VaR and ES forecasts when the threshold for determining which innovations are extreme is well chosen. However, choosing an appropriate threshold is not an easy task, and if the number of innovations determined to be extreme is too small, the parameter estimates of the extreme value distribution become inaccurate. Through an empirical study, Kuester et al. (2006) compared existing methods for VaR estimation, and showed that the hybrid method combining a GARCH filter with an extreme value theory-based approach performs best, closely followed by the one with FHS. Righi and Ceretta (2015) compared the performance of several ES estimation methods through an empirical study and a Monte Carlo experiment.
The estimation of VaR and ES over multiple days, such as two weeks (10 business days) or more, is essential according to the Basel Committee on Banking Supervision (2013). The simplest and most widely used method to estimate the multiday VaR is the square-root-of-time method, in which the k-day VaR is estimated by scaling up the one-day VaR estimate by the square root of k. However, it is well known that this method has serious flaws (Brummelhuis and Kaufmann 2007; Lönnbark 2016). Quantile regressions have been developed for the estimation of multiday risk measures by Ghysels et al. (2016), Le (2020), and Chen et al. (2021); in this approach, a linear function of the short-horizon returns is fitted to the quantile of the multiday returns. The direct and iterated estimations are the two main methods studied in the literature. In the direct approach, the k-day risk measures are estimated by the same method as the one-day risk measures, except that k-day returns instead of daily returns are used for fitting the volatility model and the distribution of innovations. Since the k-day returns are usually required to be non-overlapping in the estimation, the direct approach may suffer from a lack of data. In the iterated approach, the conditional variance of a k-day return is estimated by iterating the volatility model specified by the one-day returns. The moments of the standardized k-day returns are also estimated from the volatility model specified by the one-day returns. Applying the Cornish–Fisher expansion or the Gram–Charlier expansion to the estimated moments, the distribution of the standardized k-day returns is estimated, which gives the estimates of the k-day risk measures (Lönnbark 2016; Zhou et al. 2016). For a more detailed discussion of the direct and iterated approaches, we refer to Ghysels et al. (2019) and Ruiz and Nieto (2023).
In the estimation of the multiday risk measures, crude Monte Carlo simulation (CMC) is also useful (Christoffersen 2011). In this method, the parameters of the volatility model of daily returns and the distribution of innovations are estimated from past returns, and the daily returns over the next k days are simulated from the fitted volatility model and the estimated conditional distribution of innovations. By generating a large number of return processes, one can obtain a sufficient number of simulated k-day returns, and the k-day VaR and ES are then estimated directly from their empirical distribution. CMC is easily applicable. However, in order to estimate the risk measures accurately, the volatility model of daily returns should be correctly specified, as should the distribution of innovations. CMC also needs to generate a huge number of multi-period returns in order to obtain an accurate tail risk estimate, especially when the confidence level is very high. Since ES measures the expected loss beyond the VaR, even severe losses that occur with very low probability must be reproduced for the accurate estimation of ES. Instead of directly estimating the risk measures from the simulated k-day returns, the Cornish–Fisher expansion can be applied to approximate the distribution of the k-day returns (Zhang et al. 2023): the mean, variance, and some higher moments of the k-day returns are estimated from the simulated k-day returns, and then the expansion is applied. Glasserman et al. (2000) proposed an importance sampling to reduce the estimation error of VaR in a static model. Hoogerheide and van Dijk (2010) considered GARCH-type volatility models and proposed an adaptive importance sampling for multiday VaR and ES estimation in a Bayesian framework; they showed how to find an approximately optimal joint importance sampling density of the parameters of the volatility model and the returns over the multiple days.
To overcome this shortcoming of CMC, we propose a sequential importance sampling (SIS) for estimating the multi-period risk measures in GARCH-type volatility models. The proposed method is a modification of the crude Monte Carlo simulation. In the proposed method, we choose the sampling distribution of innovations differently from the distribution estimated from the past standardized returns; the latter is used as the sampling distribution of innovations in CMC. We call the sampling distribution the importance sampling distribution. In our proposal, the importance sampling distribution is chosen so that the simulated losses over k days are more severe than in the case of CMC, in which the daily log return processes are simulated from the fitted volatility model and the estimated distribution of innovations. Compared to CMC, SIS generates many more samples of k-day losses beyond the VaR, which reduces the estimation error of the risk measures (Rubinstein and Kroese 2016) and requires fewer simulated samples.
In the proposed method, we first fit a GARCH-type volatility model to the past daily returns and estimate the distribution of innovations. We choose the exponential twisting of the estimated distribution of innovations as the importance sampling distribution of innovations. The daily return process over k days is simulated from the fitted volatility model and the importance sampling distribution of innovations, and a large number of simulated k-day returns are obtained by generating return processes repeatedly. Since the sampling distribution of the innovations is chosen differently from the estimated distribution of innovations, the empirical distribution of the simulated k-day returns differs from the target distribution of the k-day return. After assigning a weight to each simulated k-day return, the k-day VaR and ES are estimated from the weighted empirical distribution of the simulated k-day returns, and we propose how to compute the weight of a simulated k-day return. We also prove that the optimal twisting parameter is unique, and show that an approximate value of it can be found by applying stochastic approximation.
We have performed an empirical study to compare the performance of the proposed sequential importance sampling with two CMCs, in which the distribution of innovations is assumed to follow the standard normal distribution and the standardized Student's t-distribution, respectively. Empirical results show that the proposed method does not reduce the estimation error in VaR estimation, but it does reduce the estimation error in ES estimation, even when the time required to obtain the estimate is taken into account. We have performed backtests to determine whether the proposed method and the CMCs give accurate VaR and ES estimates.
The outline of the paper is as follows: In Section 2, we briefly review the crude Monte Carlo simulation for multi-period VaR and ES estimation. We propose a sequential importance sampling for multi-period VaR and ES estimation in Section 3. In Section 4, empirical results are given. Finally, we conclude the paper in Section 5.

2. Crude Monte Carlo Simulation for Multi-Period Risk Estimation

Let $P_t$, $t = 1, 2, \ldots$, be the price of an asset or a portfolio at the end of the $t$-th time period, and let $R_t = \log P_t - \log P_{t-1}$. Then, $R_t$ is the log return during the $t$-th time period. We assume that $R_t$ is a continuous random variable. In this paper, we consider the case that the length of a time period is a day. We denote by $H_t = \{ R_t, R_{t-1}, \ldots, R_1 \}$ the history of log returns up to time $t$. Then, the conditional mean of $R_t$ given $H_{t-1}$ is defined as
$$ \mu_t = E[ R_t \mid H_{t-1} ], $$
and the conditional variance of $R_t$ given $H_{t-1}$ is defined as
$$ \sigma_t^2 = E[ ( R_t - \mu_t )^2 \mid H_{t-1} ]. $$
Since the mean of the daily log return is very small compared to $\sigma_t$, $\mu_t$ is usually assumed to be 0 (Christoffersen 2011). Then, the above equation reduces to
$$ \sigma_t^2 = E[ R_t^2 \mid H_{t-1} ]. $$
The generic model of the daily log return is as follows: given $H_{t-1}$,
$$ R_t = \sigma_t Z_t, \qquad Z_t \overset{\mathrm{i.i.d.}}{\sim} f(z), \tag{2} $$
where $f(z)$ is a probability density function (pdf) with mean 0 and variance 1. In this model, the innovations $Z_1, Z_2, \ldots$ are the conditionally standardized log returns. The usual choices of $f(z)$ are the standard normal and the standardized Student's t-distribution (Christoffersen 2011). However, we do not need to restrict $f(z)$ to these distributions.

2.1. Volatility Model

There is a large literature on models explaining the dynamics of $\{ \sigma_t^2, t \ge 1 \}$. Among them, GARCH-type models are representative (Francq and Zakoian 2019; Teräsvirta 2009). In this paper, we consider the GJR-GARCH model (Glosten et al. 1993) for $\{ \sigma_t^2, t \ge 1 \}$ for convenience. The GJR-GARCH model is capable of capturing most of the stylized facts about return time series: the non-normality of the marginal return distribution, very low autocorrelations, volatility clustering, and the leverage effect. However, other GARCH-type models such as EGARCH, threshold GARCH, and FIGARCH can also be used with our proposed method. In the GJR-GARCH$(1,1)$ model, $\sigma_t^2$ in Equation (2) evolves as
$$ \sigma_t^2 = \omega + \big( \alpha_1 + \gamma_1 I( R_{t-1} < 0 ) \big) R_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \tag{3} $$
where $\omega > 0$, $\alpha_1 > 0$, $\beta_1 > 0$, $\gamma_1 > 0$, and $I(A)$ is the indicator of event $A$, i.e.,
$$ I(A) = \begin{cases} 1, & \text{if } A \text{ occurs}, \\ 0, & \text{otherwise}. \end{cases} $$
If $\alpha_1 + \gamma_1/2 + \beta_1 < 1$ (assuming a symmetric innovation density), then $\{ \sigma_t^2, t \ge 1 \}$ is a weakly stationary process.
The GJR-GARCH model successfully explains the volatility clustering phenomenon. By introducing the term $\gamma_1 I( R_{t-1} < 0 )$ into the model equation, it can also capture the asymmetric response of volatility to positive and negative returns. The value of $\gamma_1$ indicates the additional response of $\sigma_t^2$ per unit of the squared log return of the previous time period when that return is negative.
The parameters of a volatility model can be easily estimated by maximum likelihood when $\{ Z_t, t \ge 1 \}$ follows the normal distribution. However, as many empirical analyses have shown, the conditional distribution of returns observed in financial markets does not follow a normal distribution. Even in this case, the parameters can be estimated as if the conditional returns were normally distributed. This method is called quasi-maximum likelihood estimation (QMLE). According to Bollerslev and Wooldridge (1992), the estimates obtained in this way are consistent.
In applying Model (2) with a GARCH-type volatility model, we assume that the log return process is stationary in the sense that the parameters of the volatility model, as well as those of $f(z)$, are constant. However, for various reasons, the assumption of a stationary log return process is not valid over a long time period, but only over a short one, such as two or three years (Akgiray 1989). The rolling window method is therefore generally applied when fitting a volatility model to an observed log return process. In this method, we assume that the parameters of the volatility model do not change during a time period of fixed length, i.e., for $m > 0$, $\{ R_{t-1}, \ldots, R_{t-m} \}$ follows a volatility model with fixed parameters. The parameters are estimated using $\{ R_{t-1}, \ldots, R_{t-m} \}$ instead of the full history of log returns.
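To make the rolling-window fit concrete, the following base-R sketch estimates the GJR-GARCH(1,1) parameters in (3) by QMLE on a window of $m$ log returns. The function names, the starting values, and the log-parameterization enforcing positivity are our own illustrative choices, not the authors' implementation.

```r
## A minimal QMLE sketch for GJR-GARCH(1,1); names and starting values are
## illustrative assumptions, not the authors' code.
gjr_filter <- function(r, theta) {
  ## theta = c(omega, alpha1, gamma1, beta1); returns the sigma_t^2 path
  m <- length(r)
  sig2 <- numeric(m)
  sig2[1] <- var(r)                       # initialize at the sample variance
  for (s in 2:m) {
    sig2[s] <- theta[1] + (theta[2] + theta[3] * (r[s - 1] < 0)) * r[s - 1]^2 +
      theta[4] * sig2[s - 1]
  }
  sig2
}

gjr_qmle <- function(r) {
  nll <- function(par) {                  # Gaussian negative log-likelihood
    sig2 <- gjr_filter(r, exp(par))       # exp() keeps all parameters positive
    0.5 * sum(log(2 * pi) + log(sig2) + r^2 / sig2)
  }
  fit <- optim(log(c(1e-6, 0.05, 0.05, 0.90)), nll, method = "BFGS")
  exp(fit$par)                            # c(omega, alpha1, gamma1, beta1)
}
```

Given a window `r` of m past returns, `theta_hat <- gjr_qmle(r)` gives the parameter estimates, and `r / sqrt(gjr_filter(r, theta_hat))` gives the residuals used in Section 2.2.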

2.2. Estimation of the Innovation Distribution

The distribution of the innovations must have mean zero and variance one. The typical choices for the distribution are the standard normal and the standardized Student's t-distribution. Filtered historical simulation (FHS) is a non-parametric approach to estimating the distribution of the innovations (Barone-Adesi et al. 1999, 2002). Suppose that we have fitted a volatility model to the log returns $\{ R_{t-1}, \ldots, R_{t-m} \}$ and obtained the estimate $\hat\sigma_s$ of $\sigma_s$, $s = t-1, \ldots, t-m$. Then, the latent variable $Z_s$ is estimated as
$$ \hat Z_s = \frac{ R_s }{ \hat\sigma_s }, \quad s = t-1, \ldots, t-m. \tag{4} $$
We call $\{ \hat Z_{t-1}, \ldots, \hat Z_{t-m} \}$ the residuals of the fitted model. The residuals are approximately independent and identically distributed (i.i.d.) due to Equation (2). We denote by $f_{t-1}(z)$ the pdf of the innovations estimated from $\{ \hat Z_{t-1}, \ldots, \hat Z_{t-m} \}$.

In FHS, the empirical probability mass function (pmf) of $\{ \hat Z_{t-1}, \ldots, \hat Z_{t-m} \}$ is used as the approximate pmf of the innovations, i.e.,
$$ f_{t-1}(z) = \frac{1}{m} \sum_{j=1}^m \delta( z - \hat Z_{t-j} ), \tag{5} $$
where $\delta(z)$ is the Dirac measure having probability mass 1 at the origin. The $p$ quantile of $f(z)$ at time $t$ is estimated as the $p$ quantile of the empirical pmf (5).
Butler and Schachter (1997) proposed carrying out Gaussian kernel smoothing of the past returns in the historical simulation. By exploiting their idea, we obtain a continuous approximation of the empirical pmf (5) as follows:
$$ f_{t-1}(z) = \frac{1}{m} \sum_{j=1}^m \phi_\delta( z - \hat Z_{t-j} ), \tag{6} $$
where $\phi_\delta(z)$ is the pdf of the normal distribution with mean 0 and variance $\delta^2$. Then, $f_{t-1}(z)$ in the above equation is also an approximate pdf of the innovations.
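Sampling from the kernel-smoothed density (6) is straightforward because it is a uniform mixture of normals centered at the residuals: pick a residual at random and add $N(0, \delta^2)$ noise. A short sketch with illustrative names:

```r
## Draw n innovations from the kernel-smoothed density (6); `zhat` is the
## vector of residuals and `delta` the smoothing standard deviation.
rinnov_fhs <- function(n, zhat, delta) {
  zhat[sample(length(zhat), n, replace = TRUE)] + rnorm(n, 0, delta)
}
```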

2.3. Multi-Period Risk Measures

Let $R_t(k) = \log( P_{t+k-1} / P_{t-1} )$. Then, it is the log return of the portfolio over $k$ time periods, and it is also represented as $R_t(k) = \sum_{i=0}^{k-1} R_{t+i}$. The value-at-risk of the portfolio over the $k$ time periods with confidence level $q$ is defined as
$$ \mathrm{VaR}_t^q(k) = -\sup\{ r \in \mathbb{R} \mid \Pr\{ R_t(k) \le r \mid H_{t-1} \} \le 1 - q \}. \tag{7} $$
The expected shortfall of the portfolio over $k$ time periods with confidence level $q$ is defined as
$$ \mathrm{ES}_t^q(k) = -E[ R_t(k) \mid R_t(k) \le -\mathrm{VaR}_t^q(k) ]. \tag{8} $$
Since the log returns are assumed to be continuous, $R_t(k)$ is less than $-\mathrm{VaR}_t^q(k)$ with probability $1 - q$. Let $p = 1 - q$. Then, we have that
$$ \mathrm{ES}_t^q(k) = -\frac{ E\big[ R_t(k)\, I( R_t(k) \le -\mathrm{VaR}_t^q(k) ) \big] }{ p }. \tag{9} $$
For the case of $k = 1$, we obtain from Equation (2) that
$$ \mathrm{VaR}_t^q(1) = -\sigma_t z_p \tag{10} $$
and
$$ \mathrm{ES}_t^q(1) = -\sigma_t E[ Z_t \mid Z_t < z_p ], \tag{11} $$
where $z_p$ is the $p$ quantile of $f(z)$. If we have obtained $\hat z_{p,t-1}$, the $p$ quantile of $f_{t-1}(z)$, then we have from Equation (10) that
$$ \widehat{\mathrm{VaR}}_t^q(1) = -\hat\sigma_t\, \hat z_{p,t-1}. \tag{12} $$
Suppose that $\widehat{\mathrm{ES}}_{f,t-1}^q$ is the conditional mean of $Z$ given $Z < \hat z_{p,t-1}$ when $Z$ follows the pdf $f_{t-1}(z)$. Then, the plug-in estimator of $\mathrm{ES}_t^q(1)$ from Equation (11) is as follows:
$$ \widehat{\mathrm{ES}}_t^q(1) = -\hat\sigma_t\, \widehat{\mathrm{ES}}_{f,t-1}^q. \tag{13} $$
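Concretely, under the FHS approximation (5), the one-day plug-in estimates can be computed in a few lines; `zhat`, `sigma_t`, and `p` are illustrative names for the residuals, the one-day volatility forecast, and $1 - q$.

```r
## One-day plug-in estimates (12)-(13) under the FHS approximation (5);
## `zhat` holds the residuals and `sigma_t` the volatility forecast.
zp   <- quantile(zhat, p, names = FALSE)   # p quantile of f_{t-1}(z)
var1 <- -sigma_t * zp                      # Equation (12)
es1  <- -sigma_t * mean(zhat[zhat < zp])   # Equation (13)
```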
For $k > 1$, we could also estimate $\mathrm{VaR}_t^q(k)$ and $\mathrm{ES}_t^q(k)$ from Equations (12) and (13) by treating the time period consisting of $k$ consecutive days as a single period. As mentioned above, the log return process can be assumed to be stationary for two or three years. For a time period longer than this, it might be difficult to assume that the process is stationary. If a single period consists of $k$ consecutive days, then there are $500/k$ periods in two years and $750/k$ periods in three years. If $k = 10$, then the numbers of periods are 50 and 75, respectively, which are too few to obtain reliable estimates of the parameters of the volatility model as well as those of the innovation distribution. Thus, instead of applying Equations (12) and (13), we rely on Monte Carlo simulation (Christoffersen 2011). In the simulation, a day constitutes a single period, and the stochastic process of daily log returns over the next $k$ periods from period $t$ is simulated using the fitted volatility model of daily log returns and $f_{t-1}(z)$.

2.4. Crude Monte-Carlo Simulation

A GARCH-type volatility model reflects the impact of past returns on current volatility, and it is a recursive function of volatilities. We can see from Equation (3) that, given the volatility at $t_0 - 1$, the volatility at time $t \ge t_0$ is determined by the returns $H_{(t_0-1):(t-1)} = \{ R_{t_0-1}, \ldots, R_{t-1} \}$. Let $\theta = ( \omega, \alpha_1, \beta_1, \gamma_1 )$ be the vector of the GJR-GARCH(1,1) parameters. We denote by $\psi( H_{t-1}; \theta )$ the volatility at time $t$ given $H_{t-1}$, i.e.,
$$ \sigma_t^2 = \psi( H_{t-1}; \theta ). \tag{14} $$
Suppose that the log return process is stationary in a time interval of length larger than $m + k$. Then, the parameter $\theta$ estimated from the past $m$ log returns at a time $t$ can be assumed to govern the volatility process during the next $k$ periods. We denote by $\hat\theta_{t-1}$ the estimate of $\theta$ obtained by fitting the models (2) and (3) to $\{ R_{t-1}, \ldots, R_{t-m} \}$. The residuals during the past $m$ periods are obtained from Equation (4). Let $f_{t-1}(z)$ be the approximation of $f(z)$ obtained by one of the methods described in Section 2.2.
Suppose that we have generated $k$ random samples independently from $f_{t-1}(z)$. We denote them by $\{ \tilde Z_{t+i}, i = 0, 1, \ldots, k-1 \}$. Then, the log returns over the next $k$ time periods are simulated as follows: for $i = 0, 1, \ldots, k-1$,
$$ \tilde\sigma_{t+i}^2 = \psi( \tilde H_{t+i-1}; \hat\theta_{t-1} ), \qquad \tilde R_{t+i} = \tilde\sigma_{t+i} \tilde Z_{t+i}, \tag{15} $$
where $\tilde H_{t+i-1} = \{ \tilde R_{t+i-1}, \ldots, \tilde R_t \} \cup H_{t-1}$ for $i > 0$, and $\tilde H_{t-1} = H_{t-1}$ for $i = 0$. Note that $\tilde\sigma_t^2$ is equal to $\hat\sigma_t^2$, which is computed by substituting $\hat\theta_{t-1}$ for $\theta$ in Equation (14). Then, $\tilde\sigma_{t+1}^2, \ldots, \tilde\sigma_{t+k-1}^2$ are the simulated volatilities during the period from $t+1$ to $t+k-1$, and $\tilde R_t, \ldots, \tilde R_{t+k-1}$ are the simulated log returns during the period from $t$ to $t+k-1$. In what follows, we refer to the above procedure of simulating daily log returns with $f_{t-1}(z)$ given in Equation (5) as the filtered historical simulation (FHS). If $f_{t-1}(z)$ given in Equation (6) is applied in the procedure instead of Equation (5), we call the procedure the FHS with Gaussian kernel smoothing.
The iterative generation of daily log returns gives a simulated log return over $k$ periods from period $t$ as follows:
$$ \tilde R_t(k) = \sum_{i=0}^{k-1} \tilde R_{t+i}. \tag{16} $$
Repeating the above procedure $N$ times independently, we obtain $N$ independent samples of $\tilde R_t(k)$. Let them be $\tilde R_t^{(1)}(k), \ldots, \tilde R_t^{(N)}(k)$. Then, their empirical cdf is an approximate cdf of $R_t(k)$, and $\mathrm{VaR}_t^q(k)$ is estimated as the $100q$-th percentile of the negated log returns over $k$ days, i.e.,
$$ \widehat{\mathrm{VaR}}_t^q(k) = \mathrm{Percentile}\big( \{ -\tilde R_t^{(1)}(k), \ldots, -\tilde R_t^{(N)}(k) \}, 100q \big), \tag{17} $$
and
$$ \widehat{\mathrm{ES}}_t^q(k) = -\frac{ \sum_{j=1}^N \tilde R_t^{(j)}(k)\, I\big( \tilde R_t^{(j)}(k) \le -\widehat{\mathrm{VaR}}_t^q(k) \big) }{ \sum_{j=1}^N I\big( \tilde R_t^{(j)}(k) \le -\widehat{\mathrm{VaR}}_t^q(k) \big) }. \tag{18} $$
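The following sketch puts the pieces together for one estimation date: it simulates $N$ paths of $k$ daily returns from the fitted GJR-GARCH(1,1) model and applies (17) and (18). The inputs `theta`, `sigma2_t` (the one-day-ahead variance $\psi( H_{t-1}; \hat\theta_{t-1} )$), `r_prev` ($R_{t-1}$), and the innovation sampler `rinnov` are assumed to come from the fitting step; all names are illustrative.

```r
## Crude Monte Carlo estimation of the k-day VaR and ES, Equations (17)-(18).
cmc_risk <- function(N, k, q, theta, sigma2_t, r_prev, rinnov) {
  Rk <- numeric(N)
  for (n in 1:N) {
    s2 <- sigma2_t; r_last <- r_prev; total <- 0
    for (i in 1:k) {
      if (i > 1)   # update the volatility from the simulated path, Equation (3)
        s2 <- theta[1] + (theta[2] + theta[3] * (r_last < 0)) * r_last^2 +
          theta[4] * s2
      r_last <- sqrt(s2) * rinnov(1)      # R_{t+i} = sigma_{t+i} Z_{t+i}
      total <- total + r_last
    }
    Rk[n] <- total                        # simulated k-day log return (16)
  }
  VaR <- -quantile(Rk, 1 - q, names = FALSE)   # Equation (17)
  ES  <- -mean(Rk[Rk <= -VaR])                 # Equation (18)
  c(VaR = VaR, ES = ES)
}

## e.g., with normal innovations (CMC-N):
## cmc_risk(1e4, 10, 0.975, theta_hat, sigma2_t, r_prev, function(n) rnorm(n))
```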

2.5. Back-Testing

In this subsection, we summarize the backtests of VaR proposed by Kupiec (1995) and Christoffersen (1998), and the backtest of ES proposed by Acerbi and Szekely (2014). We define $I_t$ as follows: for $t \in T$,
$$ I_t = \begin{cases} 1, & \text{if } R_t(k) < -\widehat{\mathrm{VaR}}_t^q(k), \\ 0, & \text{otherwise}. \end{cases} \tag{19} $$
If $R_t(k) < -\widehat{\mathrm{VaR}}_t^q(k)$ at time $t$, then we say that a violation occurs at time $t$. Thus, $I_t$ is the indicator of the occurrence of a violation at time $t$. We call $\{ I_t, t \in T \}$ the violation process. In what follows, we consider the violation process as a sequence of 0s and 1s listed according to the order of the day on which the estimation was performed. The exact value of $t$ is ignored in the process.
In the estimation of the $k$-period VaR at time $t$, only $H_{t-1}$ is used. Suppose that another piece of information, $A_{t-1}$, is available in the estimation at time $t$. If $\Pr\{ I_t = 1 \mid H_{t-1}, A_{t-1} \}$ is equal to $\Pr\{ I_t = 1 \mid H_{t-1} \}$ for any available information $A_{t-1}$ up to time $t$, then $H_{t-1}$ is sufficient for the current estimation of the $k$-day VaR. If $\Pr\{ I_t = 1 \mid H_{t-1}, A_{t-1} \}$ is not equal to $\Pr\{ I_t = 1 \mid H_{t-1} \}$ for some information $A_{t-1}$, then $A_{t-1}$ should be incorporated to construct a better VaR estimation (Berkowitz et al. 2011).

Suppose that $H_{t-1}$, $t \in T$, is a series of sufficient information. Then, the past history of violations $\{ I_s, s < t, s \in T \}$ is also not helpful for the current estimation of the $k$-day VaR, which implies that it is independent of the current violation $I_t$. Under the hypothesis
$$ H_0: \{ I_t, t \in T \} \text{ is a Bernoulli process with success probability } p, \tag{20} $$
the number of violations follows the binomial distribution with size $n$ (the size of $T$) and success probability $p$, which enables us to obtain a confidence interval for the number of excess losses (Kupiec 1995). Let $n_1$ be the number of violations, and let $n_0 = n - n_1$. Under $H_0$ in (20), the likelihood function of $p$ is given by
$$ L( p; I_t, t \in T ) = q^{n_0} p^{n_1}. \tag{21} $$
Kupiec (1995) considered the following alternative hypothesis:
$$ H_1: \{ I_t, t \in T \} \text{ is a Bernoulli process with success probability } \pi\ (\ne p). $$
Let $\hat\pi$ be the maximum likelihood estimator (m.l.e.) of $\pi$; it is computed to be $n_1 / ( n_0 + n_1 )$. The likelihood ratio test of $H_0$ vs. $H_1$ is performed with the test statistic
$$ LR_{uc} = -2 \log \frac{ L( p; I_t, t \in T ) }{ L( \hat\pi; I_t, t \in T ) }. \tag{22} $$
$LR_{uc}$ asymptotically follows the $\chi^2(1)$ distribution. We call the above test the unconditional coverage test.
The unconditional coverage test does not capture temporal dependence of the violation process, which lowers the power of the test (Berkowitz et al. 2011). To address this problem, Christoffersen (1998) considered a test of temporal independence. He assumed that $\{ I_t, t \in T \}$ is a Markov chain with transition probability matrix
$$ \Pi_1 = \begin{pmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{pmatrix}. $$
Temporal independence of $\{ I_t, t \in T \}$ implies that $\pi_{00} = \pi_{10}$ and $\pi_{01} = \pi_{11}$. The former and the latter equations are the same since $\pi_{00} = 1 - \pi_{01}$ and $\pi_{10} = 1 - \pi_{11}$. Christoffersen (1998) considered the following hypothesis test:
$$ H_0: \pi_{01} = \pi_{11} \quad \text{vs.} \quad H_1: \pi_{01} \ne \pi_{11}, \tag{23} $$
and proposed a likelihood-ratio test. We call the above hypothesis test the independence test. Let $n_{ij}$ be the number of transitions from $i$ to $j$ in $\{ I_t, t \in T \}$. Then, the likelihood function under $H_1$ is as follows:
$$ L( \Pi_1; I_t, t \in T ) = ( 1 - \pi_{01} )^{n_{00}} \pi_{01}^{n_{01}} ( 1 - \pi_{11} )^{n_{10}} \pi_{11}^{n_{11}}. $$
Let $\pi = \pi_{01}$ under $H_0$, and let $\Pi_0$ be the $2 \times 2$ matrix obtained by letting $\pi_{01} = \pi_{11} = \pi$ and $\pi_{00} = \pi_{10} = 1 - \pi$ in $\Pi_1$. Then, the likelihood function under $H_0$ is $L( \Pi_0; I_t, t \in T )$, i.e.,
$$ L( \Pi_0; I_t, t \in T ) = ( 1 - \pi )^{n_{00} + n_{10}}\, \pi^{n_{01} + n_{11}}. $$
The m.l.e. of $\pi$ is computed to be $\hat\pi = ( n_{01} + n_{11} ) / n$, while the m.l.e.s of $\pi_{01}$ and $\pi_{11}$ under $H_1$ are computed to be $\hat\pi_{01} = n_{01} / ( n_{00} + n_{01} )$ and $\hat\pi_{11} = n_{11} / ( n_{10} + n_{11} )$, respectively. By substituting $\hat\pi$ into $L( \Pi_0; I_t, t \in T )$, and substituting $\hat\pi_{01}$ and $\hat\pi_{11}$ into $L( \Pi_1; I_t, t \in T )$, we obtain the following log likelihood-ratio statistic:
$$ LR_{ind} = -2 \log \frac{ L( \hat\Pi_0; I_t, t \in T ) }{ L( \hat\Pi_1; I_t, t \in T ) }. \tag{24} $$
$LR_{ind}$ asymptotically follows the $\chi^2(1)$ distribution.
In order to test the unconditional coverage of the estimated VaRs and the temporal independence of $\{ I_t, t \in T \}$ simultaneously, Christoffersen (1998) defined the following statistic on $\{ I_t, t \in T \}$:
$$ LR_{cc} = LR_{uc} + LR_{ind}. \tag{25} $$
Under the assumption that $\{ I_t, t \in T \}$ is a Markov chain, $H_0$ in (20) is equivalent to $\pi_{01} = \pi_{11} = p$. The definition of $LR_{cc}$ then implies that $LR_{cc}$ is the log likelihood-ratio statistic corresponding to the following hypothesis test:
$$ H_0: \pi_{01} = \pi_{11} = p \quad \text{vs.} \quad H_1: \pi_{01} \ne \pi_{11}. \tag{26} $$
$LR_{cc}$ asymptotically follows the $\chi^2(2)$ distribution. We call the above hypothesis test the conditional coverage test.
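A compact R sketch of the three coverage statistics (22), (24), and (25) on a 0/1 violation series may be helpful; the function name is illustrative, and for brevity it does not guard against zero transition counts, which would make some log terms undefined.

```r
## Unconditional coverage, independence, and conditional coverage tests on a
## violation series `I` (0/1), with target violation rate p = 1 - q.
coverage_tests <- function(I, p) {
  n1 <- sum(I); n0 <- length(I) - n1
  pihat <- n1 / (n0 + n1)
  LRuc <- -2 * (n0 * log(1 - p) + n1 * log(p) -
                n0 * log(1 - pihat) - n1 * log(pihat))      # Equation (22)
  from <- I[-length(I)]; to <- I[-1]                        # transition counts
  n00 <- sum(from == 0 & to == 0); n01 <- sum(from == 0 & to == 1)
  n10 <- sum(from == 1 & to == 0); n11 <- sum(from == 1 & to == 1)
  p01 <- n01 / (n00 + n01); p11 <- n11 / (n10 + n11)
  pi2 <- (n01 + n11) / (n00 + n01 + n10 + n11)
  LRind <- -2 * ((n00 + n10) * log(1 - pi2) + (n01 + n11) * log(pi2) -
                 n00 * log(1 - p01) - n01 * log(p01) -
                 n10 * log(1 - p11) - n11 * log(p11))       # Equation (24)
  LRcc <- LRuc + LRind                                      # Equation (25)
  c(LRuc = LRuc, p_uc = pchisq(LRuc, 1, lower.tail = FALSE),
    LRind = LRind, p_ind = pchisq(LRind, 1, lower.tail = FALSE),
    LRcc = LRcc, p_cc = pchisq(LRcc, 2, lower.tail = FALSE))
}
```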
Acerbi and Szekely (2014) proposed three backtests for ES: testing ES after VaR estimation, testing ES directly, and estimating ES from realized ranks. Among them, we apply the first, called the $Z_1$ test, in this paper. The test considers the following hypotheses: for a positive integer $k$,
$$ H_0: \mathrm{ES}_t^q(k) = \widehat{\mathrm{ES}}_t^q(k),\ \mathrm{VaR}_t^q(k) = \widehat{\mathrm{VaR}}_t^q(k),\ t \in T; \qquad H_1: \mathrm{ES}_t^q(k) > \widehat{\mathrm{ES}}_t^q(k),\ \mathrm{VaR}_t^q(k) = \widehat{\mathrm{VaR}}_t^q(k),\ t \in T. \tag{27} $$
For the above hypothesis test, Acerbi and Szekely (2014) proposed the following test statistic of the observed $k$-day log returns $\{ R_t(k), t \in T \}$:
$$ Z_1 = \frac{1}{n_1} \sum_{t \in T} \frac{ R_t(k)\, I_t }{ \widehat{\mathrm{ES}}_t^q(k) } + 1. \tag{28} $$
In computing the above statistic, it is assumed that one or more violations have occurred, i.e., $n_1 > 0$.

Contrary to the assumption on $\{ R_t(k), t \in T \}$ in Acerbi and Szekely (2014), the $\{ R_t(k), t \in T \}$ are not independent. However, the distribution of $Z_1$ under $H_0$ can be estimated by bootstrapping as in Acerbi and Szekely (2014). In this method, we generate random copies of $Z_1$ under $H_0$, and their empirical distribution is used as the approximate distribution of $Z_1$ under $H_0$. Let $Z_1^{(1)}, \ldots, Z_1^{(m)}$ be the random copies. The $p$-value of $Z_1$ given in Equation (28) is computed as follows:
$$ p = \frac{1}{m} \sum_{i=1}^m I\big( Z_1^{(i)} < Z_1 \big). \tag{29} $$
If the $k$-period ESs are underestimated many times, then the value of $Z_1$ is smaller than in the case where the $k$-period ESs are mostly estimated accurately. This results in low values of $p$. If $p$ is less than the prespecified significance level, then we reject $H_0$.
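A short sketch of the $Z_1$ statistic (28) and its bootstrap $p$-value (29): `Rk_obs`, `VaR`, and `ES` are vectors over $t \in T$, and `Rk_sim` is a matrix whose columns are simulated return processes under $H_0$ (all names illustrative).

```r
## Z1 statistic (28); assumes at least one violation (n1 > 0).
z1_stat <- function(Rk, VaR, ES) {
  I <- as.numeric(Rk < -VaR)              # violations, Equation (19)
  sum(Rk * I / ES) / sum(I) + 1
}

## Bootstrap p-value (29) from simulated processes under H0.
z1_pvalue <- function(Rk_obs, VaR, ES, Rk_sim) {
  z1 <- z1_stat(Rk_obs, VaR, ES)
  z1_sim <- apply(Rk_sim, 2, z1_stat, VaR = VaR, ES = ES)
  mean(z1_sim < z1)
}
```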

3. The Proposed Method

3.1. Sequential Importance Sampling of Log Return Process

When $q$ is close to 1, most of the samples $\tilde R_t^{(1)}(k), \ldots, \tilde R_t^{(N)}(k)$ generated by the crude Monte Carlo simulation are larger than $-\mathrm{VaR}_t^q(k)$. In this case, $\widehat{\mathrm{VaR}}_t^q(k)$ is determined by a few large losses, and the estimation error of $\mathrm{VaR}_t^q(k)$ increases. For the same reason, the estimation error of $\mathrm{ES}_t^q(k)$ also increases. To address this problem, we propose a sequential importance sampling for the estimation of $\mathrm{VaR}_t^q(k)$ and $\mathrm{ES}_t^q(k)$. In our proposal, we choose the exponentially twisted distribution of $f_{t-1}(z)$ given in (6) as the sampling distribution of the innovations when we apply the Monte Carlo simulation (15). Suppose that the residuals $\{ \hat Z_{t-1}, \ldots, \hat Z_{t-m} \}$ are obtained at the end of period $t-1$ for a sufficiently large $m$. We define the importance sampling pdf of $\tilde Z_{t+i}$, $i = 0, \ldots, k-1$, as follows: for $\lambda \in ( -\infty, \infty )$,
$$ g_{t-1}( z; \lambda ) = \frac{ \exp\{ \lambda z \} }{ m\, c(\lambda) } \sum_{j=1}^m \phi_\delta( z - \hat Z_{t-j} ), \quad -\infty < z < \infty, \tag{30} $$
where
$$ c(\lambda) = \frac{1}{m} \sum_{j=1}^m \exp\Big\{ \lambda \hat Z_{t-j} + \frac{ \delta^2 \lambda^2 }{2} \Big\}. \tag{31} $$
Note that $c(\lambda)$ is the moment generating function of $f_{t-1}(z)$, i.e., $c(\lambda) = E_{f_{t-1}}[ e^{\lambda Z} ]$. Equation (30) can be rewritten as
$$ g_{t-1}( z; \lambda ) = \sum_{j=1}^m c_j\, \phi_\delta\big( z - ( \hat Z_{t-j} + \lambda \delta^2 ) \big), \quad -\infty < z < \infty, \tag{32} $$
where
$$ c_j = \frac{ \exp\{ \lambda \hat Z_{t-j} \} }{ \sum_{l=1}^m \exp\{ \lambda \hat Z_{t-l} \} }, \quad j = 1, \ldots, m. $$
Equation (32) says that the importance sampling distribution of $\tilde Z_{t+i}$, $i = 0, \ldots, k-1$, is a mixture of the normal distributions $N( \hat Z_{t-1} + \lambda \delta^2, \delta^2 ), \ldots, N( \hat Z_{t-m} + \lambda \delta^2, \delta^2 )$ with weights $c_1, \ldots, c_m$. Suppose that $J$ is a discrete random variable with pmf $\Pr\{ J = j \} = c_j$, $j = 1, \ldots, m$. Then, the random variable $Z$ generated from $N( \hat Z_{t-J} + \lambda \delta^2, \delta^2 )$ follows the pdf $g_{t-1}( z; \lambda )$.
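The mixture representation (32) gives a direct sampling recipe: draw a component index $J$ with weights $c_j$, then draw from $N( \hat Z_{t-J} + \lambda \delta^2, \delta^2 )$. A sketch with illustrative names:

```r
## Draw n innovations from the exponentially twisted density (32).
rinnov_twisted <- function(n, zhat, lambda, delta) {
  cj <- exp(lambda * zhat); cj <- cj / sum(cj)    # mixture weights c_j
  J <- sample(length(zhat), n, replace = TRUE, prob = cj)
  rnorm(n, mean = zhat[J] + lambda * delta^2, sd = delta)
}
```

For $\lambda < 0$, the weights favor residuals in the left tail and each component mean is shifted down by $|\lambda|\delta^2$, so severe simulated losses become frequent.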
In the proposed scheme, we generate $\tilde Z_t, \ldots, \tilde Z_{t+k-1}$ independently from $g_{t-1}( z; \lambda )$. For $\lambda = 0$, $g_{t-1}( z; \lambda )$ is the same as $f_{t-1}(z)$ given in Equation (6). We have that
$$ \frac{ f_{t-1}(z) }{ g_{t-1}( z; \lambda ) } \propto \exp\{ -\lambda z \}. \tag{33} $$
For negative values of $\lambda$, the above equation implies that negative values of the innovations occur more frequently in the proposed method than in the FHS with Gaussian kernel smoothing. This tendency becomes more pronounced at smaller values of $\lambda$. In order to obtain a sample process of log returns during the period from $t$ to $t+k-1$, we apply the iteration (15). We can see that the probability of the occurrence of a large loss over $k$ periods increases with small values of $\tilde Z_t, \ldots, \tilde Z_{t+k-1}$ during the periods. The smaller the value of $\lambda$, the higher the probability of the occurrence of large losses.
We denote by $\tilde Z_{t:(t+k-1)} = \{ \tilde Z_t, \ldots, \tilde Z_{t+k-1} \}$ the generated sequence of innovations, and by $\tilde R_{t:(t+k-1)} = \{ \tilde R_t, \ldots, \tilde R_{t+k-1} \}$ the simulated process of log returns. By abuse of notation, we denote by $g_{t-1}( \mathbf r )$, $\mathbf r \in \mathbb{R}^k$, the pdf of $\tilde R_{t:(t+k-1)}$, and by $f_{t-1}( \mathbf r )$, $\mathbf r \in \mathbb{R}^k$, the pdf that $\tilde R_{t:(t+k-1)}$ would have if the innovations were generated from $f_{t-1}(z)$. We assume that $f_{t-1}(z)$ is very close to $f(z)$, so that $f_{t-1}( \mathbf r )$ is a good approximation of the true pdf of $R_{t:(t+k-1)}$.

The likelihood ratio of a sample process $\tilde R_{t:(t+k-1)}$ following $f_{t-1}( \mathbf r )$ with respect to the importance sampling pdf $g_{t-1}( \mathbf r )$ is as follows:
$$ W( \tilde R_{t:(t+k-1)} ) = \frac{ f_{t-1}( \tilde R_{t:(t+k-1)} ) }{ g_{t-1}( \tilde R_{t:(t+k-1)} ) }. $$
Given $H_{t-1}$, a sample process of innovations from $t$ to $t+k-1$ uniquely determines the log return process during the time interval. It can be easily shown that the above likelihood ratio of $\tilde R_{t:(t+k-1)}$ is equal to that of $\tilde Z_{t:(t+k-1)}$. This gives
$$ W( \tilde R_{t:(t+k-1)} ) = \prod_{i=0}^{k-1} \frac{ f_{t-1}( \tilde Z_{t+i} ) }{ g_{t-1}( \tilde Z_{t+i}; \lambda ) }. \tag{34} $$
Since $\Pr_{f_{t-1}}\{ \tilde R_t(k) \le x \}$ is represented as $E_{f_{t-1}}[ I( \tilde R_t(k) \le x ) ]$, it follows from Rubinstein and Kroese (2016) that
$$ \Pr_{f_{t-1}}\{ \tilde R_t(k) \le x \} = E_{g_{t-1}}\big[ I( \tilde R_t(k) \le x )\, W( \tilde R_{t:(t+k-1)} ) \big], \tag{35} $$
where $\Pr_g\{ \cdot \}$ and $E_g[ \cdot ]$ are the probability and the expectation with respect to a pdf $g( \cdot )$, respectively.
Suppose that we have simulated $N$ processes of innovations $\tilde Z_{t:(t+k-1)}^{(j)} = \{ \tilde Z_t^{(j)}, \ldots, \tilde Z_{t+k-1}^{(j)} \}$ from $g_{t-1}( z; \lambda )$, $j = 1, \ldots, N$. Let $\tilde R_{t:(t+k-1)}^{(j)}$ be the log return process corresponding to $\tilde Z_{t:(t+k-1)}^{(j)}$. We obtain from Equations (33) and (34) that the unnormalized likelihood ratio of $\tilde R_{t:(t+k-1)}^{(j)}$ is given by
$$ w^{(j)} = \exp\Big\{ -\lambda \sum_{i=0}^{k-1} \tilde Z_{t+i}^{(j)} \Big\}, \quad j = 1, \ldots, N. $$
Then, the sample mean of $\{ w^{(1)}, \ldots, w^{(N)} \}$ gives a strongly consistent estimator of the normalizing constant of the likelihood ratio. Let $\tilde R_t^{(j)}(k)$ be the $k$-period log return corresponding to $\tilde R_{t:(t+k-1)}^{(j)}$. It follows from Equation (35) that a strongly consistent estimator of $\Pr_{f_{t-1}}\{ \tilde R_t(k) \le x \}$ is given by
$$ \widehat{\Pr}_{f_{t-1}}\{ \tilde R_t(k) \le x \} = \frac{ \sum_{j=1}^N I( \tilde R_t^{(j)}(k) \le x )\, w^{(j)} }{ \sum_{j=1}^N w^{(j)} }. $$
If we let $W^{(j)} = w^{(j)} / \sum_{j'=1}^N w^{(j')}$, then we have that
$$ \widehat{\Pr}_{f_{t-1}}\{ \tilde R_t(k) \le x \} = \sum_{j=1}^N I( \tilde R_t^{(j)}(k) \le x )\, W^{(j)}. \tag{36} $$
Let $\{ r_1, \ldots, r_N \}$ be the order statistics of $\{ \tilde R_t^{(1)}(k), \ldots, \tilde R_t^{(N)}(k) \}$, and let $W_j$ be the normalized likelihood ratio corresponding to $r_j$, $j = 1, \ldots, N$. Then, Equation (36) is rewritten as
$$ \widehat{\Pr}_{f_{t-1}}\{ \tilde R_t(k) \le x \} = \sum_{j=1}^N I( r_j \le x )\, W_j. \tag{37} $$
We define $j^*$ as follows:
$$ j^* = \max\Big\{ l : \sum_{j=1}^{l} W_j \le 1 - q \Big\}. $$
Since $\{ r_1, \ldots, r_N \}$ are in ascending order, it follows from Equation (37) that
$$ \widehat{\mathrm{VaR}}_t^q(k) = -\frac{ r_{j^*} + r_{j^*+1} }{2}, \tag{38} $$
and that
$$ \widehat{\mathrm{ES}}_t^q(k) = -\frac{ \sum_{j=1}^{j^*} r_j W_j }{ \sum_{j=1}^{j^*} W_j }. \tag{39} $$
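The weighted estimators (38) and (39) reduce to a sort and a cumulative sum of normalized weights. A sketch, assuming the simulated $k$-day returns `Rk`, the corresponding sums of innovations `Zsum`, and the twisting parameter `lambda` come from the sampling step above:

```r
## Weighted VaR and ES estimates (38)-(39) from importance samples.
sis_risk <- function(Rk, Zsum, lambda, q) {
  w <- exp(-lambda * Zsum)                # unnormalized likelihood ratios
  ord <- order(Rk)                        # ascending order statistics r_1..r_N
  r <- Rk[ord]; W <- (w / sum(w))[ord]
  jstar <- max(which(cumsum(W) <= 1 - q))
  VaR <- -(r[jstar] + r[jstar + 1]) / 2                     # Equation (38)
  ES  <- -sum(r[1:jstar] * W[1:jstar]) / sum(W[1:jstar])    # Equation (39)
  c(VaR = VaR, ES = ES)
}
```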
Suppose that we have observed $\{ R_1, \ldots, R_n \}$. Algorithm 1 shows the procedure for finding $\widehat{\mathrm{VaR}}_t^q(k)$ and $\widehat{\mathrm{ES}}_t^q(k)$, $t = m+1, \ldots, n$, by the proposed method.
Algorithm 1 Sequential importance sampling for multi-period VaR and ES estimation

Require: a volatility model $\psi$; $\{ R_i, 1 \le i \le n \}$ (log returns); $m$ (the size of the rolling window); $k$ (the number of periods over which the risk measures are estimated); $q$ (the confidence level); $\lambda$ (the twisting parameter); $\delta^2$ (the variance of the smoothing distribution); $N$ (the number of simulated processes)
Ensure: $\widehat{\mathrm{VaR}}_t^q(k)$, $\widehat{\mathrm{ES}}_t^q(k)$, $t = m+1, \ldots, n$
1: for $t \in \{ m+1, m+2, \ldots, n \}$ do
2:   Fit the volatility model $\psi$ to $\{ R_{t-j}, j = 1, \ldots, m \}$ and obtain $\hat\theta$
3:   Estimate the volatilities $\{ \hat\sigma_{t-j}^2, j = 1, \ldots, m \}$
4:   $\hat Z_{t-j} \leftarrow R_{t-j} / \hat\sigma_{t-j}$ for $j = 1, \ldots, m$
5:   $c_j \leftarrow \exp\{ \lambda \hat Z_{t-j} \} / \sum_{l=1}^m \exp\{ \lambda \hat Z_{t-l} \}$, $j = 1, \ldots, m$
6:   $g_{t-1}(z) \leftarrow \sum_{j=1}^m c_j\, \phi_\delta\big( z - ( \hat Z_{t-j} + \lambda \delta^2 ) \big)$
7:   for $j \in \{ 1, 2, \ldots, N \}$ do
8:     Generate random samples $\{ \tilde Z_t, \tilde Z_{t+1}, \ldots, \tilde Z_{t+k-1} \}$ independently from $g_{t-1}(z)$
9:     Generate a process of log returns using the iteration: for $i = 0, 1, \ldots, k-1$, $\tilde\sigma_{t+i}^2 \leftarrow \psi( \tilde H_{t+i-1}; \hat\theta )$ and $\tilde R_{t+i} \leftarrow \tilde\sigma_{t+i} \tilde Z_{t+i}$
10:    Compute the unnormalized likelihood ratio of the log return process $\tilde R_{t:(t+k-1)}$: $w^{(j)} \leftarrow \exp\{ -\lambda \sum_{i=0}^{k-1} \tilde Z_{t+i} \}$
11:    $\tilde R_t^{(j)}(k) \leftarrow \sum_{i=0}^{k-1} \tilde R_{t+i}$
12:  end for
13:  Normalize the likelihood ratios: $W^{(j)} \leftarrow w^{(j)} / \sum_{j'=1}^N w^{(j')}$, $j = 1, 2, \ldots, N$
14:  Find $\{ r_1, \ldots, r_N \}$, the order statistics of $\{ \tilde R_t^{(1)}(k), \ldots, \tilde R_t^{(N)}(k) \}$
15:  $W_j \leftarrow$ the normalized likelihood ratio corresponding to $r_j$, $j = 1, \ldots, N$
16:  $j^* \leftarrow \max\{ l : \sum_{j=1}^{l} W_j \le 1 - q \}$
17:  $\widehat{\mathrm{VaR}}_t^q(k) \leftarrow -( r_{j^*} + r_{j^*+1} ) / 2$
18:  $\widehat{\mathrm{ES}}_t^q(k) \leftarrow -\sum_{j=1}^{j^*} r_j W_j \big/ \sum_{j=1}^{j^*} W_j$
19: end for

3.2. Finding the Optimal Twisting Parameter

Given $H_{t-1}$ and $\theta$, $R_{t:(t+k-1)}$ is uniquely determined by $Z_{t:(t+k-1)}$. Thus, we can see that $\tilde R_t(k)$ in Equation (16) is a function of $\tilde Z_{t:(t+k-1)}$ given $H_{t-1}$ and $\hat\theta_{t-1}$. We denote it by $r( \mathbf z; H_{t-1}, \hat\theta_{t-1} )$, $\mathbf z \in \mathbb{R}^k$, i.e.,
$$ \tilde R_t(k) = r( \tilde Z_{t:(t+k-1)}; H_{t-1}, \hat\theta_{t-1} ). $$
By abuse of notation, we denote by $f_{t-1}( \mathbf z ) = \prod_{i=1}^k f_{t-1}( z_i )$ the joint pdf of $\tilde Z_{t:(t+k-1)}$. In what follows, we call $f_{t-1}( \mathbf z )$ the nominal pdf of $\tilde Z_{t:(t+k-1)}$. We have from Equation (9) that an approximate form of $\mathrm{ES}_t^q(k)$ is obtained as
$$ \mathrm{ES}_t^q(k) \approx E_{f_{t-1}}\big[ \tau_t( \tilde Z_{t:(t+k-1)} ) \big], $$
where
$$ \tau_t( \mathbf z ) = -\frac{ r( \mathbf z; H_{t-1}, \hat\theta_{t-1} )\, I\big( r( \mathbf z; H_{t-1}, \hat\theta_{t-1} ) \le -\mathrm{VaR}_t^q(k) \big) }{ p }, \quad \mathbf z \in \mathbb{R}^k. $$
Since $\tau_t( \tilde Z_{t:(t+k-1)} )$ is nonnegative, the optimal importance sampling pdf of $\tilde Z_{t:(t+k-1)}$ for the estimation of $E_{f_{t-1}}[ \tau_t( \tilde Z_{t:(t+k-1)} ) ]$ is as follows (Rubinstein and Kroese 2016):
$$ g_{t-1}^*( \mathbf z ) = \frac{ \tau_t( \mathbf z )\, f_{t-1}( \mathbf z ) }{ E_{f_{t-1}}[ \tau_t( \tilde Z_{t:(t+k-1)} ) ] }, \quad \mathbf z \in \mathbb{R}^k. $$
In the proposed sequential importance sampling, the pdf of an importance sample $\tilde Z_{t:(t+k-1)}$ has the following form:
$$ g_{t-1}( \mathbf z; \lambda ) = \prod_{i=1}^k g_{t-1}( z_i; \lambda ), \quad \mathbf z \in \mathbb{R}^k. $$
We want to determine the value of the twisting parameter $\lambda$ so that the cross entropy of $g_{t-1}( \mathbf z; \lambda )$ relative to $g_{t-1}^*( \mathbf z )$ is minimized. The desired value of $\lambda$ is then given by
$$ \lambda_{t-1} = \operatorname*{argmin}_{\lambda}\, E_{g_{t-1}^*}\bigg[ \log \frac{ g_{t-1}^*( \mathbf Z ) }{ g_{t-1}( \mathbf Z; \lambda ) } \bigg]. \tag{41} $$
It can be easily shown that
$$ \lambda_{t-1} = \operatorname*{argmin}_{\lambda} \big( -E_{f_{t-1}}[ \tau_t( \mathbf Z ) \log g_{t-1}( \mathbf Z; \lambda ) ] \big). $$
We can see from Equation (30) that
$$ \log g_{t-1}( \mathbf Z; \lambda ) = \lambda \sum_{i=1}^k Z_i + \sum_{i=1}^k \log \sum_{j=1}^m \phi_\delta( Z_i - \hat Z_{t-j} ) - k \log c(\lambda) - k \log m. \tag{42} $$
In the above equation, $\log c(\lambda)$ is the cumulant generating function of $f_{t-1}(z)$, which is infinitely differentiable and convex. Thus, for $\mathbf z \in \mathbb{R}^k$, $\log g_{t-1}( \mathbf z; \lambda )$ is concave with respect to $\lambda$, which implies that the term $-E_{f_{t-1}}[ \tau_t( \mathbf Z ) \log g_{t-1}( \mathbf Z; \lambda ) ]$ in Equation (41) is convex with respect to $\lambda$. Thus, $\lambda_{t-1}$ is unique if it exists.
Since $\log g_{t-1}( \mathbf z; \lambda )$ is differentiable with respect to $\lambda$, we can apply the stochastic gradient descent method to find the solution of the optimization problem (41). In applying the method, we generate samples of $\mathbf Z$ from $f_{t-1}( \mathbf z )$ in order to estimate the derivative of $-E_{f_{t-1}}[ \tau_t( \mathbf Z ) \log g_{t-1}( \mathbf Z; \lambda ) ]$ with respect to $\lambda$. However, only about a proportion $p$ of the simulated $\tilde R_t(k)$s are less than $-\mathrm{VaR}_t^q(k)$. In other words, most of the generated samples make $\tau_t( \mathbf Z )$ zero and do not contribute to estimating the derivative for a given $\lambda$, which results in a large estimation error. Thus, we need to resort to another sampling pdf of $\mathbf Z$.
Equation (41) is rewritten as follows: for a $\tilde\lambda \in \mathbb{R}$,
$$ \lambda_{t-1} = \operatorname*{argmin}_{\lambda} \Big( -E_{\tilde\lambda}\big[ \tau_t( \mathbf Z ) \log g_{t-1}( \mathbf Z; \lambda )\, W_{t-1}( \mathbf Z; \tilde\lambda ) \big] \Big), \tag{43} $$
where $E_{\tilde\lambda}[ \cdot ]$ denotes the expectation with respect to the pdf $g_{t-1}( \mathbf z; \tilde\lambda )$, and $W_{t-1}( \mathbf Z; \tilde\lambda )$ denotes the likelihood ratio $f_{t-1}( \mathbf Z ) / g_{t-1}( \mathbf Z; \tilde\lambda )$. We need to choose a $\tilde\lambda$ much less than 0 so that a fairly large number of the samples of $\tilde R_t(k)$ are smaller than $-\mathrm{VaR}_t^q(k)$. It follows from Equation (42) that
$$ \frac{d}{d\lambda}\, E_{\tilde\lambda}\big[ \tau_t( \mathbf Z ) \log g_{t-1}( \mathbf Z; \lambda )\, W_{t-1}( \mathbf Z; \tilde\lambda ) \big] = E_{\tilde\lambda}\bigg[ \tau_t( \mathbf Z )\, W_{t-1}( \mathbf Z; \tilde\lambda ) \bigg( \sum_{i=1}^k Z_i - k \frac{ c'(\lambda) }{ c(\lambda) } \bigg) \bigg]. \tag{44} $$
Now, we can apply the stochastic gradient descent method to find $\lambda_{t-1}$ in an iterative manner. Suppose that we have obtained an estimate of $\lambda_{t-1}$ at the $j$-th iteration, denoted by $\lambda_j$. We generate $\mathbf Z^{(1)}, \ldots, \mathbf Z^{(L)}$ from $g_{t-1}( \mathbf z; \lambda_j )$ at the $(j+1)$-st iteration. We define $\tilde R^{(l)}(k) = r( \mathbf Z^{(l)}; H_{t-1}, \hat\theta_{t-1} )$ and $S = \{ l : \tilde R^{(l)}(k) \le -\mathrm{VaR}_t^q(k) \}$. Then, for $l \in S$, $\tau_t( \mathbf Z^{(l)} )$ is equal to $-\tilde R^{(l)}(k) / p$, and for $l \notin S$, it is equal to 0. It follows from Equation (44) that
$$ \frac{d}{d\lambda}\, E_{\lambda_j}\big[ \tau_t( \mathbf Z ) \log g_{t-1}( \mathbf Z; \lambda )\, W_{t-1}( \mathbf Z; \lambda_j ) \big] \approx -\frac{1}{pL} \sum_{l \in S} \tilde R^{(l)}(k)\, W_{t-1}( \mathbf Z^{(l)}; \lambda_j ) \bigg( \sum_{i=1}^k Z_i^{(l)} - k \frac{ c'(\lambda) }{ c(\lambda) } \bigg). \tag{45} $$
Note that $W_{t-1}( \mathbf Z; \lambda_j )$ is proportional to $\exp\{ -\lambda_j \sum_{i=1}^k Z_i \}$. Let $w^{(l)} = \exp\{ -\lambda_j \sum_{i=1}^k Z_i^{(l)} \}$, $l = 1, \ldots, L$. Then, the next estimate of $\lambda_{t-1}$ is set to be
$$ \lambda_{j+1} = \lambda_j - \frac{ a(j) }{ pL } \sum_{l \in S} \tilde R^{(l)}(k)\, w^{(l)} \bigg( \sum_{i=1}^k Z_i^{(l)} - k \frac{ c'(\lambda_j) }{ c(\lambda_j) } \bigg), \tag{46} $$
where $a(j)$ is the learning rate. By the above iteration with an appropriate initial value $\lambda_1$, we obtain a sequence $\{ \lambda_1, \lambda_2, \ldots \}$ converging to $\lambda_{t-1}$. We terminate the iteration when the difference between $\lambda_j$ and $\lambda_{j+1}$ is sufficiently small. The final estimate of $\lambda_{t-1}$ may serve as the initial value $\lambda_1$ when finding $\lambda_t$.
We adopt a learning rate of the form
$$ a(j) = \frac{a}{j + b}, \quad j = 1, 2, \ldots, $$
for appropriate constants $a$ and $b$. In the method described above, the value of $\mathrm{VaR}_t^q(k)$ needs to be known in advance. Suppose that we estimate the $k$-period VaR and ES each day. Then, we may use $\widehat{\mathrm{VaR}}_{t-1}^q(k)$ in place of $\mathrm{VaR}_t^q(k)$.
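The following sketch implements the stochastic approximation iteration (46). The helper `sim_Rk`, which maps a vector of $k$ innovations to a $k$-day log return through the fitted volatility model, and `rinnov_twisted` from Section 3.1 are assumed; the constants are illustrative defaults, not the values used in the paper.

```r
## Stochastic gradient iteration (46) for the twisting parameter lambda.
find_lambda <- function(zhat, delta, k, p, VaR, sim_Rk, lambda0 = -1,
                        L = 5000, a = 0.5, b = 10, tol = 1e-4, maxit = 100) {
  lam <- lambda0
  for (j in 1:maxit) {
    Z <- matrix(rinnov_twisted(L * k, zhat, lam, delta), nrow = L)
    Rk <- apply(Z, 1, sim_Rk)             # k-day returns of the L samples
    w <- exp(-lam * rowSums(Z))           # unnormalized likelihood ratios
    ## c'(lambda)/c(lambda), the mean of the twisted innovation density
    cl <- exp(lam * zhat + delta^2 * lam^2 / 2)
    dc <- sum((zhat + delta^2 * lam) * cl) / sum(cl)
    S <- Rk <= -VaR                       # paths with losses beyond the VaR
    grad <- sum(Rk[S] * w[S] * (rowSums(Z)[S] - k * dc)) / (p * L)
    lam_new <- lam - a / (j + b) * grad   # Equation (46)
    if (abs(lam_new - lam) < tol) { lam <- lam_new; break }
    lam <- lam_new
  }
  lam
}
```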

4. Illustration

We have estimated the 10-day VaR and the 10-day ES of the S&P 500 Index every 10 days from 21 December 1973 to 29 December 2023 by the proposed method, the crude Monte Carlo simulation with normal innovations, and the crude Monte Carlo simulation with standardized Student's t innovations. We call these Monte Carlo methods SIS, CMC-N, and CMC-t, respectively. We denote by $T$ the set of every 10th day from 21 December 1973 to 29 December 2023; the size of $T$ is 1261. In order to test statistically whether the estimates are correct, we have performed backtests: the conditional coverage test (Christoffersen 1998) for the 10-day VaR estimates and the $Z_1$ test (Acerbi and Szekely 2014) for the 10-day ES estimates.
Table 1 shows summary statistics of $\{ R_t(10), t \in T \}$, the 10-day log returns of the S&P 500 Index observed every 10 days from 21 December 1973 to 29 December 2023. The histogram of the 10-day log returns is shown in Figure 1. We can see from Table 1 and Figure 1 that the 10-day log returns of the S&P 500 Index have a sample mean close to 0 and are skewed to the left. Large losses are more common than comparably large profits.

4.1. Estimation of Risk Measures

By applying CMC-N, CMC-t, and SIS, we obtained $\{ \widehat{\mathrm{VaR}}_t^q(10), t \in T \}$, the 10-day VaR estimates every 10 days from 21 December 1973 to 29 December 2023, and $\{ \widehat{\mathrm{ES}}_t^q(10), t \in T \}$, the 10-day ES estimates every 10 days during the same period. The confidence levels of the estimates were set to 0.95, 0.975, and 0.99. In each method, the rolling window method with a window size of 750 was applied. At the beginning of each day $t$, $t \in T$, we fitted the GJR-GARCH(1,1) model to the daily log returns of the previous 750 days, and obtained the estimated daily volatilities and the residuals for those days. When we fitted the GJR-GARCH(1,1) model in CMC-N and SIS, we assumed that the innovations follow the standard normal distribution. In CMC-t, the innovations were assumed to follow the standardized Student's t-distribution. In CMC-N and CMC-t, we applied the crude Monte Carlo simulation described in Section 2.4 to generate $10^4$ sample processes of $\tilde R_{t:t+9}$, the daily log returns over the next 10 days, where the approximate distribution of the innovations is the standard normal distribution in CMC-N and the standardized Student's t-distribution in CMC-t. In SIS, we also generated $10^4$ importance sample processes of $\tilde R_{t:t+9}$ as described in Section 3. The value of $\delta$ in Equation (30) was set to 0.25. We implemented CMC-N, CMC-t, and SIS in R (R Core Team 2024). The estimation of the risk measures described above was performed on a desktop computer with six cores and 8 GB RAM.
For t, t T , VaR ^ t q ( 10 ) and ES ^ t q ( 10 ) were estimated simultaneously from the simulated 10 4 sample processes of R ˜ t : t + 9 . A total of 1261 number of 10-day VaR and 10-day ES estimations were performed for each method and confidence level. Table 2 shows the elapsed time to obtain the estimates for each estimation method and confidence level.
For all confidence levels, CMC-t took more time to obtain the estimates than CMC-N and SIS. This is because fitting the GJR-GARCH model with standardized Student's t innovations to the observed log returns takes more time than fitting it with normal innovations. In SIS, we find the optimal value of the twisting parameter $\lambda$ and compute the unnormalized likelihood ratio for each of the importance samples of $\tilde R_{t:t+9}$, $t \in T$, which accounts for the time difference between CMC-N and SIS in Table 2. However, the difference is modest: SIS took approximately 20% more time than CMC-N. We can see that the computational burden incurred by using SIS instead of CMC-N is not that large, and that SIS is more efficient than CMC-t in terms of the computational burden.
In estimating $\mathrm{VaR}_t^q(10)$, $t \in T$, we obtained the simulated values of the 10-day log return from the simulated $\tilde R_{t:t+9}$. We denote them by $\tilde R_t^{(1)}(10), \ldots, \tilde R_t^{(N)}(10)$, where $N$ is equal to $10^4$. We also computed the unnormalized likelihood ratio for each $\tilde R_{t:t+9}$ in SIS. We divided the set $\{ \tilde R_t^{(1)}(10), \ldots, \tilde R_t^{(N)}(10) \}$ into 10 subsets of equal size. For each subset, we computed the 10-day VaR estimate from Equation (17) in CMC-N and CMC-t, and from Equation (38) in SIS. The sample mean of these 10-day VaR estimates becomes $\widehat{\mathrm{VaR}}_t^q(10)$ in each method, and the standard error of the sample mean becomes the standard error (S.E.) of $\widehat{\mathrm{VaR}}_t^q(10)$. We also computed the relative error (R.E.) of $\widehat{\mathrm{VaR}}_t^q(10)$, which is the S.E. of $\widehat{\mathrm{VaR}}_t^q(10)$ divided by $\widehat{\mathrm{VaR}}_t^q(10)$. Figure 2 shows the negated 10-day VaR estimates obtained by SIS at confidence level 0.975, together with the 10-day log returns computed at day $t$, $t \in T$. The behavior of the VaR estimates by CMC-N (and CMC-t) looks similar.
If $R_t(10) < -\widehat{\mathrm{VaR}}_t^q(10)$ at day $t$, then we say that a violation occurs at day $t$, and we call such a loss an excessive loss, i.e., a 10-day excessive loss means a negated 10-day log return larger than the corresponding 10-day VaR estimate. We computed $\{ I_t, t \in T \}$, the violation process defined in Equation (19) with $k$ equal to 10, and counted the number of violations. If a method estimates $\widehat{\mathrm{VaR}}_t^q(10)$ correctly for $t \in T$, then a violation occurs at day $t$ with probability $p$. Since a total of 1261 10-day VaR estimations were performed, the expected number of violations is $1261 \times p$. Table 3 shows the expected number of violations and the number of observed violations for each method and confidence level. The table also shows the 95% confidence interval of the number of violations under the null hypothesis (20). In the table, the numbers of observed violations are within their confidence intervals for all confidence levels and methods. It seems that all methods estimated the 10-day VaR appropriately. We discuss this point in more detail in the next subsection.
Table 4 shows the average S.E. and R.E. of $\{ \widehat{\mathrm{VaR}}_t^q(10), t \in T \}$ for each method when the confidence levels are 0.95, 0.975, and 0.99. We can see that CMC-N gives the lowest average S.E. and average R.E. for all confidence levels. However, when the confidence level is 0.99, the average S.E. (and also the average R.E.) of CMC-N and SIS are similar. Thus, SIS was not effective in reducing the average S.E. of the 10-day VaR estimates. This is because the twisting parameter of SIS is chosen to minimize the estimation error of the 10-day ES, not that of the 10-day VaR.
When we estimate a value by Monte Carlo simulation, the efficiency of the simulation is inversely proportional to the product of the variance of the estimate and the simulation time taken to obtain it (Glynn and Whitt 1992; Sak and Hörmann 2012). If an estimate obtained by one Monte Carlo simulation has the same variance as an estimate obtained by another, then the ratio of these products represents the ratio of the simulation times the two simulations need to reach the same accuracy. Recalling that the variance of an estimate is inversely proportional to the sample size, and that the simulation time increases almost linearly with the sample size, the ratio indicates how much simulation time one Monte Carlo simulation needs to match the accuracy of the other.
We have obtained a series of estimates of the 10-day VaRs, not a single estimate. Thus, the efficiency of a method is inversely proportional to the product of the square of the average S.E. and the simulation time taken to obtain the estimates. We call this product the time-variance; the lower the time-variance of a method, the more efficient the method. Table 4 shows the time-variance of the 10-day VaR estimation for each method. It can be seen from the table that CMC-N is the most efficient, regardless of the confidence level. Both CMC-t and SIS took more time to obtain the estimates and have larger S.E.s than CMC-N, which results in their higher time-variances compared to CMC-N. When we computed the time-variance for each method and confidence level in Table 4, we used the elapsed time in Table 2 as the simulation time taken to obtain $\{ \widehat{\mathrm{VaR}}_t^q(10), t \in T \}$. Computing $\widehat{\mathrm{ES}}_t^q(10)$ requires only one more step than computing $\widehat{\mathrm{VaR}}_t^q(10)$ in all methods, and this additional step takes very little time compared to the overall process. Thus, the elapsed time shown in Table 2 can also be used as the approximate simulation time for the ES estimates.
After obtaining $\widehat{\mathrm{VaR}}_t^q(10)$, $t \in T$, we also computed the 10-day ES estimate for each of the 10 subsets of $\tilde R_t(10)$ obtained from $\{ \tilde R_t^{(1)}(10), \ldots, \tilde R_t^{(N)}(10) \}$. In the computation, we applied Equation (18) in both CMC-N and CMC-t, and Equation (39) in SIS. The sample mean of these 10-day ES estimates becomes $\widehat{\mathrm{ES}}_t^q(10)$, and the standard error of the sample mean becomes the standard error of $\widehat{\mathrm{ES}}_t^q(10)$. The relative error of $\widehat{\mathrm{ES}}_t^q(10)$ is the S.E. of $\widehat{\mathrm{ES}}_t^q(10)$ divided by $\widehat{\mathrm{ES}}_t^q(10)$.
Table 5 shows the average S.E. and the average R.E. of $\{ \widehat{\mathrm{ES}}_t^q(10), t \in T \}$ for each method and confidence level. We can see that SIS gives the lowest average S.E. and R.E. for all confidence levels. Using SIS, the average S.E. (and also the average R.E.) of the 10-day ES estimates is reduced by a factor of about 3 to 5 compared to CMC-N, and by a factor of about 3 to 7 compared to CMC-t. We can see that our proposed scheme for finding the optimal twisting parameter works well in reducing the estimation error of the 10-day ES.
Table 5 also shows the time-variance of the 10-day ES estimation for each method. We can see from the table that SIS is the most efficient for all confidence levels. Using SIS, the time-variance of the estimates is reduced by a factor of about 5 to 20 compared to CMC-N, which means that SIS will take about 5 to 20 times less time than CMC-N to obtain a 10-day ES estimate of the same accuracy. Compared to CMC-t, the time-variance is reduced by a factor of about 12 to 60. Thus, SIS will take significantly less time than CMC-t to obtain a 10-day ES estimate of the same accuracy.

4.2. Backtesting on 10-Day VaR Estimates

In this subsection, we apply the backtests described in Section 2.5 to statistically test whether our proposed method, as well as CMC-N and CMC-t, estimates the 10-day VaRs appropriately. For each method and confidence level, we obtained the violation process $\{ I_t, t \in T \}$ from $\{ \widehat{\mathrm{VaR}}_t^q(10), t \in T \}$ and the observed 10-day log returns $\{ R_t(10), t \in T \}$. In order to test whether $\Pr\{ R_t(10) < -\widehat{\mathrm{VaR}}_t^q(10) \} = p$, $t \in T$, i.e., whether the violation rate is $p$, we computed $LR_{uc}$ in Equation (22). Table 6 shows the value of $LR_{uc}$ and its $p$-value. We can see that for all methods and confidence levels, the estimates of the 10-day VaRs are appropriate in the sense that we cannot reject the null hypothesis (20) at the 5% significance level. In other words, if the violations are independent, then there is no reason to deny that the violation rate is $p$.
We have also tested the temporal independence of the violation process $\{ I_t, t \in T \}$. Table 6 shows the value of $LR_{ind}$ in Equation (24) and its $p$-value for each method and confidence level. The table says that the null hypothesis $H_0$ in (23) cannot be rejected at the 5% significance level for any method or confidence level, and that the violation process follows a Bernoulli process rather than a Markov chain, i.e., there is no temporal dependence in the violation process.
In order to test whether $\{ I_t, t \in T \}$ is temporally independent with the desired violation rate $p$, i.e., whether the violation process follows a Bernoulli process with success probability $p$, we performed the conditional coverage test on $\{ I_t, t \in T \}$. Table 6 shows the value of $LR_{cc}$ and its $p$-value for each method and confidence level. We can see from the table that for all methods and confidence levels, the null hypothesis $H_0$ in (26) cannot be rejected at the 5% significance level. Thus, we accept that, for all estimation methods and confidence levels, the violation process follows a Bernoulli process with success probability $p$, or equivalently, $\Pr\{ R_t(10) < -\widehat{\mathrm{VaR}}_t^q(10) \mid H_{t-1} \} = p$, $t \in T$. The past information of violations is not helpful in the current estimation of the 10-day VaR. We conclude that SIS, as well as CMC-N and CMC-t, estimated the 10-day VaR accurately in this sense.

4.3. Backtesting on 10-Day ES Estimates

From the conclusion on the 10-day VaR estimates, we can assume that $\mathrm{VaR}_t^q(10) = \widehat{\mathrm{VaR}}_t^q(10)$, $t \in T$, for all methods and confidence levels. In order to test statistically whether CMC-N, CMC-t, and SIS estimate the 10-day ESs accurately, we applied the $Z_1$ test described in Section 2.5. When $\{ \widehat{\mathrm{VaR}}_t^q(10), t \in T \}$ and $\{ \widehat{\mathrm{ES}}_t^q(10), t \in T \}$ were obtained by CMC-N, we first computed the $Z_1$ statistic of the observed 10-day log returns $\{ R_t(10), t \in T \}$ from Equation (28). To obtain the $p$-value of $Z_1$ under $H_0$, we generated $10^4$ sample processes of $\{ \tilde R_t(10), t \in T \}$ in the same manner as in CMC-N. To obtain a sample process of $\{ \tilde R_t(10), t \in T \}$, we applied the rolling window method with window size 750 to estimate the parameters of the GJR-GARCH(1,1) model with normal innovations for every $t \in T$, and generated the daily log returns $R_{t:t+9}$ by applying the crude Monte Carlo method described in Section 2.4. In the generation of the daily log returns, the innovations were assumed to follow the standard normal distribution. We repeated the procedure $10^4$ times, and denote by $\{ R_t^{(i)}(10), t \in T \}$ the $i$-th sample process of the 10-day log returns. By substituting $R_t^{(i)}(10)$ for $R_t(k)$ in Equation (28), we obtained the simulated $Z_1$ statistic of the sample process $\{ R_t^{(i)}(10), t \in T \}$. We call it $Z_1^{(i)}$, $i = 1, \ldots, 10^4$. Then, the $p$-value of $Z_1$ under $H_0$ is obtained from Equation (29).
When $\{\widehat{\mathrm{VaR}}_t^q(10),\, t \in T\}$ and $\{\widehat{\mathrm{ES}}_t^q(10),\, t \in T\}$ were obtained by CMC-t, we performed the hypothesis test (27) in the same manner as described in the previous paragraph, except that, in applying the rolling window method, we fitted the GJR-GARCH(1,1) model with standardized Student's t innovations to the observed log return process, and that the innovations were assumed to follow the standardized Student's t-distribution when applying the crude Monte Carlo method to generate the daily log returns $R_{t:t+9}$.
When $\{\widehat{\mathrm{VaR}}_t^q(10),\, t \in T\}$ and $\{\widehat{\mathrm{ES}}_t^q(10),\, t \in T\}$ were obtained by SIS, they are estimates of the risk measures of the 10-day log returns under the assumption that the log return process follows the GJR-GARCH(1,1) model with innovations having time-varying distributions, and that the distribution at each time is well approximated by the Gaussian kernel smoothing (6) of previous innovations. Thus, to compute the p-value of $Z_1$ in this case, we need sample processes of $\{\tilde{R}_t(10),\, t \in T\}$ generated under the same assumption. In generating a sample process, we fitted the GJR-GARCH(1,1) model with standard normal innovations to the observed log return process when applying the rolling window method, and applied the crude Monte Carlo method with innovations drawn from the pdf (6), as described in Section 2.4.
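Drawing innovations from a Gaussian kernel density such as (6) is straightforward: a draw from the kernel estimate is a uniformly chosen past innovation perturbed by Gaussian noise whose standard deviation is the kernel bandwidth. The following is a minimal sketch under that interpretation; the bandwidth choice and any rescaling used in (6) are not reproduced here, and the function name is illustrative.

import numpy as np

def sample_kde_innovations(past_innovations, size, bandwidth, rng=None):
    """Sample from the Gaussian kernel density estimate built on past
    standardized residuals: each draw is a uniformly selected past
    innovation plus N(0, bandwidth^2) noise."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(past_innovations, dtype=float)
    centers = rng.choice(z, size=size, replace=True)  # pick kernel centers uniformly
    return centers + bandwidth * rng.standard_normal(size)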
Table 7 shows the value of $Z_1$ and its p-value for each method at the confidence levels 0.95, 0.975, and 0.99. At the 5% significance level, $\{\widehat{\mathrm{ES}}_t^{0.95}(10),\, t \in T\}$ obtained by CMC-N appears to be underestimated. In all other cases, $H_0$ could not be rejected, and we conclude that, with this single exception, all methods estimated the 10-day ES adequately.

5. Conclusions

In this paper, a sequential importance sampling (SIS) scheme for the estimation of multi-period VaR and ES has been proposed to overcome the shortcomings of crude Monte Carlo simulation. By choosing the sampling distribution of innovations differently from that of crude Monte Carlo simulation, we can substantially reduce the estimation error of the multi-period tail risk measures, as well as the number of simulations required to estimate them accurately. In our proposal, we adopt the exponential twisting of the Gaussian kernel smoothing of past innovations as the importance sampling distribution. We have shown that the optimal twisting parameter is unique and that an approximate value of it can be found by stochastic approximation. We have also proposed how to compute the weight of each simulated multi-period return, which enables the estimation of the tail risk measures from the weighted empirical distribution of the simulated returns. Empirical results showed that the proposed method outperforms crude Monte Carlo simulation in terms of variance reduction for multi-period ES estimation. Through backtesting, we have seen that the proposed method gives accurate VaR and ES estimates.
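Once the simulated multi-period returns and their importance weights are in hand, the risk measures follow from the weighted empirical distribution. The sketch below shows one standard way to carry out this final step, assuming likelihood-ratio weights and VaR/ES reported as positive losses; it is an illustration under those assumptions, not the exact estimators derived in the paper.

import numpy as np

def weighted_var_es(returns, weights, q):
    """Estimate the q-level VaR and ES (as positive losses) from
    importance-sampled multi-period returns and their weights."""
    r = np.asarray(returns, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                         # normalize the weights to sum to 1
    order = np.argsort(r)                   # sort returns from worst to best
    r, w = r[order], w[order]
    cum = np.cumsum(w)
    k = int(np.searchsorted(cum, 1.0 - q))  # first index with cumulative weight >= 1 - q
    var = -r[k]                             # the (1 - q)-quantile of returns, negated
    tail_w = w[: k + 1]
    es = -(r[: k + 1] * tail_w).sum() / tail_w.sum()  # weighted tail average, negated
    return var, es

With equal weights, this reduces to the usual empirical estimators used by crude Monte Carlo; under SIS, the weights are the likelihood ratios accumulated over the simulated 10-day paths.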
The proposed method is a Monte Carlo simulation, and the computing power of modern computers makes the estimation of a multi-period VaR and ES by the proposed method possible within seconds. By applying the proposed SIS method, practitioners can estimate the multi-period VaR and ES more accurately, and thus develop more efficient risk management strategies. Moreover, the proposed method enables extreme losses to be managed more effectively when designing long-term investments. It could therefore contribute significantly to improving risk management strategies and enhancing the stability of investment portfolios.
Our proposed method is limited to the case of a univariate return process. To construct a portfolio optimally over a time period based on risk measures such as VaR and ES, we need to estimate the multi-period VaR or ES under varying allocations of the assets making up the portfolio. In this case, it is efficient to consider the multivariate return process of those assets. An interesting direction for future research is to extend the proposed sequential importance sampling to the case where the return process is modeled as a multivariate GARCH process, such as the constant conditional correlation model or the dynamic conditional correlation model.

Author Contributions

Conceptualization, S.K.; methodology, Y.-J.S. and S.K.; software, Y.-J.S.; validation, Y.-J.S.; formal analysis, Y.-J.S. and S.K.; investigation, Y.-J.S.; resources, Y.-J.S.; data curation, Y.-J.S.; writing—original draft preparation, Y.-J.S. and S.K.; writing—review and editing, S.K.; visualization, Y.-J.S.; supervision, S.K.; project administration, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available in Yahoo Finance.

Acknowledgments

The authors would like to thank the anonymous reviewers for their comments and suggestions on the first draft of this paper. Their suggestions have greatly improved the quality of the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Acerbi, Carlo, and Balazs Szekely. 2014. Back-testing expected shortfall. Risk 27: 76–81. [Google Scholar]
  2. Akgiray, Vedat. 1989. Conditional heteroscedasticity in time series of stock returns: Evidence and forecasts. Journal of Business 62: 55–80. [Google Scholar] [CrossRef]
3. Barone-Adesi, Giovanni, Kostas Giannopoulos, and Les Vosper. 1999. VaR without correlations for portfolios of derivative securities. Journal of Futures Markets 19: 583–602. [Google Scholar] [CrossRef]
4. Barone-Adesi, Giovanni, Kostas Giannopoulos, and Les Vosper. 2002. Backtesting derivative portfolios with filtered historical simulation (FHS). European Financial Management 8: 31–58. [Google Scholar] [CrossRef]
  5. Basel Committee on Banking Supervision. 2013. Fundamental Review of the Trading Book: A Revised Market Risk Framework. Consultative Document. Available online: https://www.bis.org/publ/bcbs265.pdf (accessed on 6 December 2024).
  6. Berkowitz, Jeremy, Peter Christoffersen, and Denis Pelletier. 2011. Evaluating value-at-risk models with desk-level data. Management Science 57: 2213–27. [Google Scholar] [CrossRef]
7. Bernardi, Mauro, Antonello Maruotti, and Lea Petrella. 2012. Skew mixture models for loss distributions: A Bayesian approach. Insurance: Mathematics and Economics 51: 617–23. [Google Scholar] [CrossRef]
8. Bollerslev, Tim, and Jeffrey M. Wooldridge. 1992. Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances. Econometric Reviews 11: 143–72. [Google Scholar] [CrossRef]
9. Broda, Simon A., and Marc S. Paolella. 2009. CHICAGO: A fast and accurate method for portfolio risk calculation. Journal of Financial Econometrics 7: 412–36. [Google Scholar] [CrossRef]
10. Brummelhuis, Raymond, and Roger Kaufmann. 2007. Time-scaling of value-at-risk in GARCH(1,1) and AR(1)-GARCH(1,1) processes. The Journal of Risk 9: 39. [Google Scholar] [CrossRef]
  11. Butler, J. S., and Barry Schachter. 1997. Estimating value-at-risk with a precision measure by combining kernel estimation with historical simulation. Review of Derivatives Research 1: 371–90. [Google Scholar]
12. Chen, Qian, Xiang Gao, Xiaoxuan Huang, and Xi Li. 2021. Multiple-step value-at-risk forecasts based on volatility-filtered MIDAS quantile regression: Evidence from major investment assets. Investment Management and Financial Innovations 18: 372–84. [Google Scholar] [CrossRef]
  13. Chen, Song Xi, and Cheng Yong Tang. 2005. Nonparametric inference of value-at-risk for dependent financial returns. Journal of Financial Econometrics 3: 227–55. [Google Scholar] [CrossRef]
  14. Christoffersen, Peter. 2011. Elements of Financial Risk Management. Cambridge: Academic Press. [Google Scholar]
  15. Christoffersen, Peter F. 1998. Evaluating interval forecasts. International Economic Review 39: 841–62. [Google Scholar] [CrossRef]
16. Engle, Robert F., and Simone Manganelli. 2004. CAViaR: Conditional autoregressive value at risk by regression quantiles. Journal of Business & Economic Statistics 22: 367–81. [Google Scholar]
  17. Francq, Christian, and Jean-Michel Zakoian. 2019. GARCH Models: Structure, Statistical Inference and Financial Applications. Hoboken: John Wiley & Sons. [Google Scholar]
18. Gao, Feng, and Fengming Song. 2008. Estimation risk in GARCH VaR and ES estimates. Econometric Theory 24: 1404–24. [Google Scholar] [CrossRef]
19. Ghysels, Eric, Alberto Plazzi, and Rossen Valkanov. 2016. Why invest in emerging markets? The role of conditional return asymmetry. The Journal of Finance 71: 2145–92. [Google Scholar] [CrossRef]
  20. Ghysels, Eric, Alberto Plazzi, Rossen Valkanov, Antonio Rubia, and Asad Dossani. 2019. Direct versus iterated multiperiod volatility forecasts. Annual Review of Financial Economics 11: 173–95. [Google Scholar] [CrossRef]
  21. Glasserman, Paul, Philip Heidelberger, and Perwez Shahabuddin. 2000. Variance reduction techniques for estimating value-at-risk. Management Science 46: 1349–64. [Google Scholar] [CrossRef]
  22. Glosten, Lawrence R., Ravi Jagannathan, and David E. Runkle. 1993. On the relation between the expected value and the volatility of the nominal excess return on stocks. The Journal of Finance 48: 1779–801. [Google Scholar] [CrossRef]
  23. Glynn, Peter W., and Ward Whitt. 1992. The asymptotic efficiency of simulation estimators. Operations Research 40: 505–20. [Google Scholar] [CrossRef]
24. Hong, Jeff L., Zhaolin Hu, and Guangwu Liu. 2014. Monte Carlo methods for value-at-risk and conditional value-at-risk: A review. ACM Transactions on Modeling and Computer Simulation (TOMACS) 24: 1–37. [Google Scholar] [CrossRef]
  25. Hoogerheide, Lennart, and Herman K. van Dijk. 2010. Bayesian forecasting of value at risk and expected shortfall using adaptive importance sampling. International Journal of Forecasting 26: 231–47. [Google Scholar] [CrossRef]
  26. Hull, John, and Alan White. 1998. Incorporating volatility updating into the historical simulation method for value-at-risk. Journal of Risk 1: 5–19. [Google Scholar] [CrossRef]
27. Jalal, Amine, and Michael Rockinger. 2008. Predicting tail-related risk measures: The consequences of using GARCH filters for non-GARCH data. Journal of Empirical Finance 15: 868–77. [Google Scholar] [CrossRef]
  28. Kuester, Keith, Stefan Mittnik, and Marc S. Paolella. 2006. Value-at-risk prediction: A comparison of alternative strategies. Journal of Financial Econometrics 4: 53–89. [Google Scholar] [CrossRef]
  29. Kupiec, Paul H. 1995. Techniques for verifying the accuracy of risk measurement models. Journal of Derivatives 3: 73–84. [Google Scholar] [CrossRef]
  30. Le, Trung H. 2020. Forecasting value at risk and expected shortfall with mixed data sampling. International Journal of Forecasting 36: 1362–79. [Google Scholar] [CrossRef]
  31. Longin, Francois M. 2000. From value at risk to stress testing: The extreme value approach. Journal of Banking & Finance 24: 1097–130. [Google Scholar]
  32. Lönnbark, Carl. 2016. Approximation methods for multiple period value at risk and expected shortfall prediction. Quantitative Finance 16: 947–68. [Google Scholar] [CrossRef]
  33. McNeil, Alexander J., and Rüdiger Frey. 2000. Estimation of tail-related risk measures for heteroscedastic financial time series: An extreme value approach. Journal of Empirical Finance 7: 271–300. [Google Scholar] [CrossRef]
  34. Nadarajah, Saralees, Bo Zhang, and Stephen Chan. 2014. Estimation methods for expected shortfall. Quantitative Finance 14: 271–91. [Google Scholar] [CrossRef]
35. Nieto, Maria Rosa, and Esther Ruiz. 2016. Frontiers in VaR forecasting and backtesting. International Journal of Forecasting 32: 475–501. [Google Scholar] [CrossRef]
  36. R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. [Google Scholar]
  37. Righi, Marcelo Brutti, and Paulo Sergio Ceretta. 2015. A comparison of expected shortfall estimation models. Journal of Economics and Business 78: 14–47. [Google Scholar] [CrossRef]
  38. Rubinstein, Reuven Y., and Dirk P. Kroese. 2016. Simulation and the Monte Carlo Method. Hoboken: John Wiley & Sons. [Google Scholar]
  39. Ruiz, Esther, and María Rosa Nieto. 2023. Direct versus iterated multiperiod value-at-risk forecasts. Journal of Economic Surveys 37: 915–49. [Google Scholar] [CrossRef]
  40. Sak, Halis, and Wolfgang Hörmann. 2012. Fast simulations in credit risk. Quantitative Finance 12: 1557–69. [Google Scholar] [CrossRef]
41. Simonato, Jean-Guy. 2011. The performance of Johnson distributions for computing value at risk and expected shortfall. Journal of Derivatives 19: 7. [Google Scholar] [CrossRef]
42. Teräsvirta, Timo. 2009. An introduction to univariate GARCH models. In Handbook of Financial Time Series. Berlin/Heidelberg: Springer, pp. 17–42. [Google Scholar]
  43. Zhang, Ning, Xiaoman Su, and Shuyuan Qi. 2023. An empirical investigation of multiperiod tail risk forecasting models. International Review of Financial Analysis 86: 102498. [Google Scholar] [CrossRef]
  44. Zhou, Chunyang, Xiao Qin, Xundi Diao, and Yingchen He. 2016. Estimating multi-period value at risk of oil futures prices. Applied Economics 48: 2994–3004. [Google Scholar] [CrossRef]
Figure 1. Histogram of the 10-day log returns of the S&P 500 Index observed every 10 days from 21 December 1973 to 29 December 2023.
Figure 2. The 10-day log returns of the S&P 500 Index observed every 10 days from 21 December 1973 to 29 December 2023, and the corresponding negated 10-day VaR estimates obtained by the SIS method at the confidence level of 0.975.
Table 1. Summary statistics of the 10-day log returns of the S&P 500 Index observed every 10 days from 21 December 1973 to 29 December 2023.

Min.      1st Qu.   Median   Mean     3rd Qu.   Max.
−0.2688   −0.0137   0.0053   0.0031   0.0216    0.1360
Table 2. Elapsed time (in seconds) to estimate the 10-day VaR and ES every 10 days from 21 December 1973 to 29 December 2023.

           Confidence Level
Method     0.95      0.975     0.99
CMC-N      419.7     399.9     401.0
CMC-t      647.5     661.4     683.0
SIS        497.3     486.6     497.2
Table 3. The number of violations, its expected value, and the 95% confidence interval of the number of violations under the assumption that the violations are independent with rate 1 − q.

                         Confidence Level (q)
Method                   0.95            0.975           0.99
Expected number (C.I.)   63.1 (48, 79)   31.5 (21, 43)   12.6 (6, 20)
CMC-N                    54              35              15
CMC-t                    56              37              16
SIS                      57              32              13
Table 4. Average S.E., average R.E., and the time-variance of the 10-day VaR estimates.

Confidence Level   Method   Average S.E.   Average R.E.   Time-Variance
0.95               CMC-N    8.436 × 10⁻⁴   1.533 × 10⁻²   2.987 × 10⁻⁴
                   CMC-t    8.737 × 10⁻⁴   1.618 × 10⁻²   4.943 × 10⁻⁴
                   SIS      2.178 × 10⁻³   4.101 × 10⁻²   2.359 × 10⁻³
0.975              CMC-N    1.197 × 10⁻³   1.724 × 10⁻²   5.730 × 10⁻⁴
                   CMC-t    1.253 × 10⁻³   1.821 × 10⁻²   1.038 × 10⁻³
                   SIS      2.060 × 10⁻³   2.996 × 10⁻²   2.065 × 10⁻³
0.99               CMC-N    1.918 × 10⁻³   2.153 × 10⁻²   1.475 × 10⁻³
                   CMC-t    2.129 × 10⁻³   2.351 × 10⁻²   3.096 × 10⁻³
                   SIS      2.122 × 10⁻³   2.317 × 10⁻²   2.239 × 10⁻³
Table 5. Average S.E., average R.E., and the time-variance of the 10-day ES estimates.

Confidence Level   Method   Average S.E.   Average R.E.   Time-Variance
0.95               CMC-N    9.064 × 10⁻⁴   1.172 × 10⁻²   3.448 × 10⁻⁴
                   CMC-t    1.104 × 10⁻³   1.378 × 10⁻²   7.892 × 10⁻⁴
                   SIS      3.622 × 10⁻⁴   4.401 × 10⁻³   6.525 × 10⁻⁵
0.975              CMC-N    1.360 × 10⁻³   1.444 × 10⁻²   7.400 × 10⁻⁴
                   CMC-t    1.656 × 10⁻³   1.714 × 10⁻²   1.814 × 10⁻³
                   SIS      3.858 × 10⁻⁴   3.807 × 10⁻³   7.243 × 10⁻⁵
0.99               CMC-N    2.235 × 10⁻³   1.930 × 10⁻²   2.004 × 10⁻³
                   CMC-t    2.986 × 10⁻³   2.404 × 10⁻²   6.090 × 10⁻³
                   SIS      4.532 × 10⁻⁴   3.392 × 10⁻³   1.021 × 10⁻⁴
Table 6. Test statistics and significance probabilities (in parentheses) for the unconditional coverage, the independence, and the conditional coverage tests.

                      Confidence Level
Test      Method   0.95            0.975            0.99
LR_uc     CMC-N    1.434 (0.231)   0.3795 (0.538)   0.431 (0.511)
          CMC-t    0.861 (0.353)   0.925 (0.336)    0.848 (0.357)
          SIS      0.631 (0.427)   0.007 (0.932)    0.012 (0.913)
LR_ind    CMC-N    0.204 (0.651)   0.0008 (0.977)   0.361 (0.548)
          CMC-t    0.108 (0.742)   0.008 (0.931)    0.412 (0.521)
          SIS      0.072 (0.788)   1.668 (0.197)    0.271 (0.603)
LR_cc     CMC-N    1.639 (0.441)   0.3804 (0.827)   0.793 (0.673)
          CMC-t    0.969 (0.616)   0.932 (0.627)    1.260 (0.533)
          SIS      0.703 (0.704)   1.675 (0.433)    0.283 (0.868)
Table 7. The value of $Z_1$ and its significance probability (in parentheses).

          Confidence Level
Method    0.95               0.975              0.99
CMC-N     −0.0821 (0.0189)   −0.0392 (0.1741)   −0.0540 (0.1689)
CMC-t     −0.0508 (0.1131)   −0.0157 (0.3473)   −0.0353 (0.2736)
SIS       −0.0389 (0.1804)   −0.0077 (0.3955)   −0.0505 (0.2089)