*Econometrics* **2016**, *4*(1), 3; doi:10.3390/econometrics4010003

Article

Forecasting Value-at-Risk under Different Distributional Assumptions

^{1}

Center for Operations Research and Econometrics (CORE), Université catholique de Louvain, 34 Voie du Romans Pays, B-1348 Louvain-la-Neuve, Belgium

^{2}

Center for Research in Finance and Management (CeReFiM), Université de Namur, 62 Rue de Bruxelles, B-5000 Namur, Belgium

*

Correspondence: Tel.: +32-10-474-321

^{†}

These authors contributed equally to this work.

Academic Editors: Fredj Jawadi, Tony S. Wirjanto and Nuttanan Wichitaksorn

Received: 15 July 2015 / Accepted: 21 December 2015 / Published: 11 January 2016

## Abstract

Financial asset returns are known to be conditionally heteroskedastic and generally non-normally distributed, fat-tailed and often skewed. These features must be taken into account to produce accurate forecasts of Value-at-Risk (VaR). We provide a comprehensive look at the problem by considering the impact that different distributional assumptions have on the accuracy of both univariate and multivariate GARCH models in out-of-sample VaR prediction. The set of analyzed distributions comprises the normal, Student, Multivariate Exponential Power and their corresponding skewed counterparts. The accuracy of the VaR forecasts is assessed by implementing standard statistical backtesting procedures used to rank the different specifications. The results show the importance of allowing for heavy-tails and skewness in the distributional assumption with the skew-Student outperforming the others across all tests and confidence levels.

Keywords: Value-at-Risk; forecast accuracy; distributions; backtesting

JEL classification: C01; C22; C52; C58

## 1. Introduction

Value-at-Risk (VaR) is a quantitative tool used to measure the maximum potential loss in value of a portfolio of assets over a defined period for a given probability. Specifically, VaR construction requires a quantile estimate of the far-left tail of the unconditional returns distribution. Though widely-used as a risk measure in the past, standard methods of VaR construction assuming iid-ness and normality have come under criticism due to their failure to incorporate three stylized facts of financial returns, namely $\left(i\right)$ the presence of volatility clustering, indicated by high autocorrelation of absolute and squared returns, $\left(ii\right)$ excess kurtosis (fat tails) and $\left(iii\right)$ skewness in the density of the unconditional returns distribution.

The ability to account for volatility clustering is one of the key strengths of the ARCH modelling approach developed in Engle [1] and extended in Bollerslev [2]. By combining this approach with a non-normal conditional distribution assumption for the returns, several papers have shown that univariate GARCH models can produce reliable out-of-sample volatility forecasts. For example, Angelidis et al. [3] combine three GARCH specifications with the univariate skew-Student and skew-GED (Generalized Error) distributions to show that these are able to produce superior VaR forecasts compared to the normal. Specifically, they apply the exponential GARCH (EGARCH) model of Nelson [4] and the threshold ARCH (TARCH) model to five univariate returns series and find that while the choice of a skewed, heavy-tailed distribution significantly improves the forecasting performance, the choice of the volatility model appears to be irrelevant. These findings are echoed in Mittnik and Paolella [5] who combine the Asymmetric Power ARCH (APARCH) model of Ding et al. [6] with an asymmetric generalised Student distribution. Within the univariate framework the most complete study of VaR prediction methods is provided by Kuester et al. [7] who compare fully parametric models with VaR constructed using historical simulation, extreme-value theory and quantile regression. Their results show that considerable improvement over normality is achieved when using innovation distributions that allow for skewness and fat tails.

Another salient feature of financial returns series is the fact that comovements between markets increase during periods of high volatility, as shown for example by Longin and Solnik [8] and Brooks et al. [9]. In light of this, a number of studies predict the VaR of a portfolio using multivariate models for the system of individual asset returns in order to achieve forecast improvements due to the use of more information. However, as mentioned in Bauwens et al. [10], in high dimensional frameworks these models can suffer from the “curse of dimensionality” problem thus being more computationally intensive.

In order to shed some light on this issue, several papers focus on direct comparison of the predictive performance of univariate and multivariate GARCH (MGARCH) models under various distributional assumptions with the aim of providing evidence in favour of one of the two approaches. Key studies in this literature include Giot and Laurent [11] and Santos et al. [12]. In the first, the authors compare the univariate APARCH model with a multivariate TVC-APARCH combined with the normal, Student and skew-Student distributions, showing that in both the univariate and multivariate settings the latter produced superior VaR forecasts. In the second, the authors study a number of univariate and multivariate volatility models with normal and Student distributions, finding that the multivariate models combined with the Student offer superior out-of-sample performances.

This paper builds on their approach by widening the set of distributions used to model the error term in both the univariate and multivariate frameworks while maintaining a generic specification for the conditional volatility. Specifically, we consider three symmetric distributions, i.e., normal, Student and Multivariate Exponential Power (MEP), and their corresponding skewed counterparts obtained by applying the transformation of Bauwens and Laurent [13] 1. By incorporating skewness in the corresponding symmetric densities by means of new parameters, we can explicitly analyse its marginal contribution as well as its joint effect with heavy-tails in the model forecasting performance.

As for the choice of the volatility models, within the MGARCH literature we employ the Rotated BEKK (RBEKK) model of Noureldin et al. [15] as it is easy to estimate using covariance targeting even for moderately large cross sections, while in the univariate setting GARCH(1,1) specifications are used to model the conditional portfolio variance. Our choice of a relatively simple volatility model allows us to focus entirely on whether the chosen distribution adequately captures the features of the return data. Within the univariate scenario we also consider the NCT-APARCH model proposed by Krause and Paolella [16], as the authors developed an extremely fast method for parameter estimation that is found to outperform highly competitive models and can be easily compared to ours 2.

Both univariate and multivariate models are estimated employing the aforementioned set of distributional assumptions and their accuracy in producing out-of-sample VaR forecasts is assessed by means of statistical backtesting procedures. The selected tests include the Unconditional Coverage (UC), Independence (IND) and Conditional Coverage (CC) tests of Christoffersen [17], the Duration-Based Test of Independence (DBI) of Dumitrescu et al. [18], the Time Until First Failure (TUFF) test of Kupiec [19] and the Dynamic Quantile (DQ) test of Engle and Manganelli [20]. The results of the tests are summarized using a grading scheme based on the number of acceptances of the null hypothesis which determines the distributional assumption providing the most accurate VaR forecasts.

Results from VaR backtesting show that in the multivariate setup the skew-Student clearly outperforms all other distributions. Moreover, its univariate version produces more accurate VaR forecasts than the NCT-GARCH and is able to compete with the NCT-APARCH which incorporates asymmetry into the conditional volatility specification. Overall, our results show that allowing for heavy-tails and skewness produces the most accurate VaR forecasts in both the univariate and multivariate setups. By comparison, specifications including only heavy tails underperform relative to their skewed counterparts with the difference being more pronounced at 5% VaR. Interestingly, the univariate skew-normal distribution produces VaR forecasts comparable to the more heavily-parametrized skew-Student and skew-MEP. As regards comparing the performance of univariate and multivariate models in general, our results do not allow us to clearly advocate the use of one methodology over the other. However, given that the hierarchy of distributions according to VaR forecast accuracy is preserved under both frameworks, the univariate approach may be preferred due to its lower computational burden.

The paper is organized as follows: Section 2 reviews the GARCH modelling framework, the theoretical methodology used for constructing the skewed distributions and the Maximum Likelihood (ML) estimation of the models with the selected distributional assumptions. Section 3 introduces the empirical methodology, comprising the portfolio construction, VaR estimation and backtesting procedures. Section 4 provides estimation results and outcomes of the VaR tests and Section 5 concludes with some final remarks.

## 2. Theoretical Framework

This section illustrates the key points of our theoretical framework. Namely, we outline the alternative approaches to obtain portfolio VaR forecasts using univariate and multivariate models, we describe the procedure used to construct skewed distributions from the corresponding symmetric counterparts and finally we provide an overview of the set of employed distributions, comprising their likelihood derivation.

Two things are worth mentioning. First, we focus on the portfolio VaR for a long position, implying that the predictive power of the models is linked to their ability in modelling large negative returns. Second, we define the asset allocation scheme via an N-dimensional vector of equal weights ${w}_{t}$, where N denotes the number of assets in the portfolio and ${w}_{t}=({w}_{1,t},\dots {w}_{N,t})$, with ${w}_{i}=1/N$, ${\sum}_{i=1}^{N}{w}_{i}=1$. As shown in DeMiguel et al. [21], Tu and Zhou [22], Brown et al. [23] and Fugazza et al. [24], this “naive” diversification rule is able to consistently outperform more sophisticated methods. Moreover, it has the advantage of not being affected by the specified target return as in the Markowitz framework, being only driven by the number of assets.

#### 2.1. VaR Estimation

Let ${y}_{t}={({y}_{1,t},...,{y}_{N,t})}^{\prime}$ denote the N-dimensional discrete time vector of de-meaned daily returns at time t and ${w}_{t}$ the vector of equal weights known at time $t-1$. The portfolio return is obtained as ${r}_{p,t}={w}_{t-1}^{\prime}{y}_{t}$, and the portfolio VaR at time t is equal to

$$Va{R}_{t,\alpha}={\mu}_{p,t}+{\sigma}_{p,t}{q}_{\alpha},$$

where ${\mu}_{p,t}$ and ${\sigma}_{p,t}$ are respectively the portfolio mean and standard deviation and ${q}_{\alpha}$ is the left quantile of the assumed conditional distribution at α%. As usually done in practical applications, the considered VaR confidence levels are $\alpha =5\%$ and $1\%$. Note that in the remainder of the paper we will set ${\mu}_{p,t}=0$, thus considering a simplified analytical formula for the computation of the VaR which only accounts for the portfolio conditional variance. Alternative approaches, as done for example in Bauwens et al. [25], fit an ARMA-type structure to the portfolio conditional mean or just assume it to be constant over time (see Santos et al. [12]). Ultimately, when present, the dynamic dependence in the conditional means of portfolio returns is known to be weak and quite difficult to predict, thus assuming a zero mean will have a negligible effect on the VaR forecasts.
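As a minimal sketch of the VaR formula, the snippet below evaluates $\mu_{p,t}+\sigma_{p,t}q_{\alpha}$ under a standard normal assumption at the 5% and 1% levels. The function name `var_forecast` and the value of `sigma_p` are illustrative assumptions, not quantities taken from the paper.

```python
from statistics import NormalDist


def var_forecast(sigma_p: float, q_alpha: float, mu_p: float = 0.0) -> float:
    """VaR = mu_p + sigma_p * q_alpha, with q_alpha a left-tail quantile."""
    return mu_p + sigma_p * q_alpha


# Left-tail quantiles of the standard normal at the two levels considered.
q_05 = NormalDist().inv_cdf(0.05)
q_01 = NormalDist().inv_cdf(0.01)

# Hypothetical one-day portfolio standard deviation (in percent).
sigma_p = 1.5

var_05 = var_forecast(sigma_p, q_05)
var_01 = var_forecast(sigma_p, q_01)
print(f"5% VaR: {var_05:.3f}   1% VaR: {var_01:.3f}")
```

Because $q_{\alpha}$ is a left-tail quantile, the resulting VaR is a negative return threshold, and the 1% VaR lies further into the tail than the 5% VaR.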

The specification of the portfolio standard deviation ${\sigma}_{p,t}$ depends on whether we consider a univariate or multivariate approach, with the difference between the two occurring in the conditioning set used.

In the univariate case, the portfolio standard deviation is obtained as the standard deviation of the portfolio returns conditional on past portfolio returns, i.e., as the square root of

$${\sigma}_{p,t}^{2}=E\left[{r}_{p,t}^{2}|{r}_{p,1},\dots {r}_{p,t-1}\right],$$

where the conditional variance ${\sigma}_{p,t}^{2}$ is estimated using a model chosen from the univariate GARCH class. More precisely, we use a simple GARCH(1,1) specification with variance targeting, which is written as

$${\sigma}_{p,t}^{2}=\overline{\sigma}+{a}_{1}{r}_{p,t-1}^{2}+{b}_{1}{\sigma}_{p,t-1}^{2},$$

where $\overline{\sigma}=(1-{a}_{1}-{b}_{1})\overline{\omega}$ and $\overline{\omega}$ equals the unconditional variance of returns. Covariance stationarity requires that ${a}_{1}+{b}_{1}<1$, with $\{{a}_{1},{b}_{1}\}\ge 0$ scalar parameters to be estimated.
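The GARCH(1,1) recursion with variance targeting can be sketched as follows. This is an illustrative filter for given parameter values, not the estimation routine used in the paper; the function name, the initialisation at the sample variance and the example inputs are assumptions.

```python
def garch11_variance_targeting(returns, a1, b1):
    """Filter conditional variances with a variance-targeted GARCH(1,1).

    Returns the in-sample variance path (one value per return) and the
    one-step-ahead variance forecast for the period after the sample.
    """
    if a1 < 0 or b1 < 0 or a1 + b1 >= 1:
        raise ValueError("need a1, b1 >= 0 and a1 + b1 < 1")
    n = len(returns)
    # Sample unconditional variance of the de-meaned returns.
    omega_bar = sum(r * r for r in returns) / n
    # Variance targeting: the intercept is implied by a1, b1 and omega_bar.
    intercept = (1.0 - a1 - b1) * omega_bar
    # Assumption: start the recursion at the unconditional variance.
    sigma2 = [omega_bar]
    for r in returns:
        sigma2.append(intercept + a1 * r * r + b1 * sigma2[-1])
    return sigma2[:-1], sigma2[-1]


# Hypothetical de-meaned portfolio returns (in percent).
example_returns = [0.5, -1.2, 0.3, 2.0, -0.7]
path, forecast = garch11_variance_targeting(example_returns, a1=0.05, b1=0.90)
print(path, forecast)
```

With variance targeting only $a_1$ and $b_1$ remain to be estimated, since the intercept is pinned down by the sample variance.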

In the multivariate setup, the conditioning set is made up of the entire vector of past returns:

$$\begin{array}{ccc}\hfill {\sigma}_{p,t}^{2}& =& E\left[{r}_{p,t}^{2}|{Y}_{1},\dots {Y}_{t-1}\right]\hfill \\ & =& {w}_{t-1}^{\prime}{H}_{t|t-1}{w}_{t-1}\hfill \end{array}$$

such that ${H}_{t|t-1}=E({H}_{t}|{\Im}_{t-1})$ is the conditional covariance matrix of returns given the information set available at time $t-1$. In this case, a multivariate model for ${H}_{t}$ needs to be specified. Given its computational ease in practical application, we implement the rotated BEKK (RBEKK) model of Noureldin et al. [15], as it can be estimated for relatively large cross sections even with rich dynamics. The basic idea underlying the model is to transform the original data by performing a rotation and then to fit to the rotated returns the popular BEKK specification of Engle and Kroner [26]. Consequently, the model can be easily estimated with covariance targeting, where the long-run covariance is given by the identity matrix. More precisely, let $\overline{H}$ define the unconditional covariance of ${y}_{t}$ and let the latter be rewritten as ${y}_{t}={\overline{H}}^{1/2}{\epsilon}_{t}$. Using the spectral decomposition of $\overline{H}$, i.e., $\overline{H}=P\Lambda {P}^{\prime}$, where P is a square matrix of eigenvectors and the eigenvalue matrix Λ is diagonal with nonnegative elements, we can get the symmetric square root of $\overline{H}$ as ${\overline{H}}^{1/2}=P{\Lambda}^{1/2}{P}^{\prime}$. The series of rotated returns is thus defined as

$${\epsilon}_{t}={\overline{H}}^{-1/2}{y}_{t}=P{\Lambda}^{-1/2}{P}^{\prime}{y}_{t},$$

with $Var\left({\epsilon}_{t}\right)={I}_{N}$, the N-dimensional identity matrix. Note that the conditional covariance of the rotated returns is $Var({\epsilon}_{t}|{\Im}_{t-1})={G}_{t}$, given that $Var({y}_{t}|{\Im}_{t-1})={H}_{t}={\overline{H}}^{1/2}{G}_{t}{\overline{H}}^{1/2}$. Hence, any dynamic covariance model for ${H}_{t}$ can also be applied to model the dynamics of ${G}_{t}$. Since the rotated returns are orthogonal, a suitable parameterization is represented by the scalar BEKK, which ensures the positive definiteness of the matrix ${G}_{t}$ being a function of only two scalar parameters, under the assumption of covariance stationarity. Its dynamic equation is expressed as

$${G}_{t}=(1-{a}_{2}-{b}_{2}){I}_{N}+{a}_{2}{\epsilon}_{t-1}{\epsilon}_{t-1}^{\prime}+{b}_{2}{G}_{t-1},$$

where, as in the univariate setup, $\{{a}_{2},{b}_{2}\}\ge 0$ and ${a}_{2}+{b}_{2}<1$.
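The scalar BEKK recursion for ${G}_{t}$ can be sketched as below. The snippet assumes the returns have already been rotated (the eigen-decomposition step is omitted), starts the recursion at the identity matrix, and uses hypothetical parameter values and data; none of these choices is taken from the paper.

```python
def scalar_bekk_step(G_prev, eps_prev, a2, b2):
    """One update: G_t = (1 - a2 - b2) I + a2 * eps eps' + b2 * G_{t-1}."""
    n = len(eps_prev)
    c = 1.0 - a2 - b2
    return [
        [
            c * (1.0 if i == j else 0.0)
            + a2 * eps_prev[i] * eps_prev[j]
            + b2 * G_prev[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]


def scalar_bekk_path(rotated_returns, a2, b2):
    """Filter G_t over a sample of already-rotated return vectors."""
    if a2 < 0 or b2 < 0 or a2 + b2 >= 1:
        raise ValueError("need a2, b2 >= 0 and a2 + b2 < 1")
    n = len(rotated_returns[0])
    # Assumption: initialise at the long-run target, the identity matrix.
    G = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    path = [G]
    for eps in rotated_returns:
        G = scalar_bekk_step(G, eps, a2, b2)
        path.append(G)
    return path


# Hypothetical rotated returns for N = 2 assets.
eps_sample = [[0.3, -1.1], [1.4, 0.2], [-0.5, 0.9]]
G_path = scalar_bekk_path(eps_sample, a2=0.05, b2=0.90)
print(G_path[-1])
```

The last element of the path is the one-step-ahead forecast of ${G}_{t}$; mapping it back to ${H}_{t}$ would additionally require ${\overline{H}}^{1/2}$.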

#### 2.2. Constructing Skew Densities

As far as financial applications are concerned, modelling and inference based on the normal distribution have often been proven to be of limited usefulness, as it is possible to gain statistical efficiency by allowing for more involved distributions featuring heavy tails and skewness. As a way to capture higher moments, the literature offers several alternatives. For example, the multivariate noncentral t distribution has fat tails and is skewed; however, the skewness is linked directly to the location parameter, making it somewhat inflexible. The lognormal distribution has also been used to model asset returns, but its skewness is a function of its mean and variance, not a separate parameter. Others, such as the generalized hyperbolic (GH) or mixtures of distributions have also been employed in financial applications, despite being more computationally demanding (see Barndorff-Nielsen [27] for an introduction and Paolella and Polak [28] for a recent application).

In this respect, Fernández and Steel [29] in the univariate case and Bauwens and Laurent [13] in the multivariate one developed a practical procedure for constructing skewed densities from their symmetric unimodal counterparts. These densities can be defined by introducing skewness in the corresponding symmetric densities by means of new parameters, such that the symmetric density results as a particular case. We build on their findings in order to enlarge the set of distributions to be used for VaR forecasting. In the following, we briefly recall the main steps of the procedure, restricting the discussion to the more general multivariate framework for the sake of space.

We begin by defining the notion of symmetry of a standardized density used hereafter. In the univariate case, symmetry corresponds to $g\left(x\right)=g(-x)$, where $g\left(x\right)$ is a unimodal probability density function with zero mean. In the multivariate case, we rely on the general notion of M-symmetry stated in Definition 1 of Bauwens and Laurent [13], which encompasses the class of spherically symmetric densities. These can be obtained as a special case of the general family of multivariate elliptical distributions, denoted as

$$g(x;\mu ,\Sigma ,\eta )\propto h({(x-\mu )}^{\prime}{\Sigma}^{-1}(x-\mu ),\eta ),$$

where x is a random vector with an integrable, positive function $h(\cdot):{\mathbb{R}}^{+}\to {\mathbb{R}}^{+}$, and η captures the shape parameter of the distribution, when present. The spherically symmetric set of distributions, comprising the standard normal, Student and MEP, are obtained by setting μ and Σ equal to zero and ${I}_{N}$, respectively.

The idea of introducing skewness into an M-symmetric standardized distribution revolves around scaling it differently for negative (positive) values by multiplying (dividing) by a positive constant. The value of this scaling parameter (hereafter referred to as ξ) determines whether the resulting distribution is skewed to the left ($0<\xi <1$) or to the right ($\xi >1$).

**Definition 1.** Given a random vector $z={({z}_{1},\dots ,{z}_{N})}^{\prime}$ with multivariate symmetric standardized distribution $g(z;\eta )$ following Equation (6), the standardized skewed density $f(z|\mathit{\xi},\eta )$ with vector of asymmetry parameters $\mathit{\xi}={({\xi}_{1},\dots {\xi}_{N})}^{\prime}$, is obtained as:

$$f(z|\mathit{\xi},\eta )={2}^{N}\left({\displaystyle \prod _{i=1}^{N}\frac{{\xi}_{i}}{1+{\xi}_{i}^{2}}}\right)g({z}^{\star};\eta ),$$

where

$${z}^{\star}={({z}_{1}^{\star},...,{z}_{N}^{\star})}^{\prime},$$

$${z}_{i}^{\star}={z}_{i}{\xi}_{i}^{{I}_{i}},$$

and

$${I}_{i}=\left\{\begin{array}{cl}-1& \text{if }{z}_{i}\ge 0\\ 1& \text{if }{z}_{i}<0\end{array}\right..$$
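For intuition, the univariate case ($N=1$) of Definition 1 can be sketched with a standard normal $g$. The helper names and the value $\xi =0.8$ are illustrative assumptions; the numerical integration is only a sanity check that the construction yields a proper density.

```python
import math


def std_normal_pdf(x):
    """Symmetric standardized density g (standard normal)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def skew_pdf(z, xi, g=std_normal_pdf):
    """Definition 1 with N = 1: f(z | xi) = 2 xi / (1 + xi^2) * g(z * xi^I)."""
    exponent = -1 if z >= 0 else 1  # the indicator I of Definition 1
    return 2.0 * xi / (1.0 + xi * xi) * g(z * xi ** exponent)


def integrate(f, lo=-12.0, hi=12.0, steps=24000):
    """Plain trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


xi = 0.8  # hypothetical asymmetry parameter (xi < 1)
mass_total = integrate(lambda z: skew_pdf(z, xi))
mass_left = integrate(lambda z: skew_pdf(z, xi), hi=0.0, steps=12000)
print(mass_total, mass_left)
```

For $\xi <1$ the mass to the left of zero equals $1/(1+{\xi}^{2})>1/2$, consistent with a density skewed to the left, and $\xi =1$ recovers the symmetric $g$.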

The marginal ${r}^{th}$-order moment of the obtained skewed distribution can be computed directly from the standardized ${r}^{th}$ moment of the symmetric density $g(\cdot)$. This is accomplished by applying the following transformation function:

$$E({z}_{i}^{\star r}|\mathit{\xi})={M}_{i,r}\frac{{\xi}_{i}^{r+1}+\frac{{(-1)}^{r}}{{\xi}_{i}^{r+1}}}{{\xi}_{i}+\frac{1}{{\xi}_{i}}},$$

where the ${r}^{th}$-order moment of the marginal ${g}_{i}(\cdot)$, truncated to the positive real values, is given by

$${M}_{i,r}={\int}_{0}^{\infty}2{u}^{r}{g}_{i}\left(u\right)du.$$

Since only the first two moments are required in the transformation process, their analytical expression for $r=1,2$ in Equation (11) is reported below:

$$\begin{array}{ccc}\hfill {m}_{i}& =& \mathrm{E}({z}_{i}^{\star}|{\xi}_{i})={M}_{i,1}\left({\xi}_{i}-\frac{1}{{\xi}_{i}}\right)\hfill \end{array}$$

$$\begin{array}{ccc}\hfill {s}_{i}^{2}& =& \text{Var}({z}_{i}^{\star}|{\xi}_{i})=\left({M}_{i,2}-{M}_{i,1}^{2}\right)\left({\xi}_{i}^{2}+\frac{1}{{\xi}_{i}^{2}}\right)+2{M}_{i,1}^{2}-{M}_{i,2}.\hfill \end{array}$$
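The following sketch checks that the closed forms for ${m}_{i}$ and ${s}_{i}^{2}$ agree with the general ${r}^{th}$-moment formula when $r=1,2$. It uses the standard normal case, for which ${M}_{i,1}=\sqrt{2/\pi}$ and ${M}_{i,2}=1$; the function names and the value of $\xi$ are illustrative.

```python
import math


def skew_moment(M_r, xi, r):
    """E(z*^r | xi) = M_r * (xi^(r+1) + (-1)^r / xi^(r+1)) / (xi + 1/xi)."""
    return M_r * (xi ** (r + 1) + (-1) ** r / xi ** (r + 1)) / (xi + 1.0 / xi)


def skew_mean_var(M1, M2, xi):
    """Closed-form mean m and variance s^2 of the skewed variable."""
    m = M1 * (xi - 1.0 / xi)
    s2 = (M2 - M1 ** 2) * (xi ** 2 + 1.0 / xi ** 2) + 2.0 * M1 ** 2 - M2
    return m, s2


# Truncated moments of the standard normal: M_1 = sqrt(2/pi), M_2 = 1.
M1, M2 = math.sqrt(2.0 / math.pi), 1.0
xi = 0.8  # hypothetical asymmetry parameter

m, s2 = skew_mean_var(M1, M2, xi)
print(f"m = {m:.4f}, s^2 = {s2:.4f}")
```

Setting $\xi =1$ returns $m=0$ and ${s}^{2}=1$, i.e., the symmetric standardized case.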

As shown in Appendix A, ${s}_{i}^{2}$ can also be expressed directly as a function of ${m}_{i}^{2}$. Note that the resulting skewed distribution, $f(z|\mathit{\xi},\eta )$ from Definition 1, is not centered at 0 and the variance is a function of $\mathit{\xi}$ (and, where applicable, of the shape parameter η). Given that the elements of ${z}^{\star}$ are uncorrelated (since those of x are uncorrelated by assumption), standardization of ${z}^{\star}$ is achieved by the following transformation:

$$z=({z}^{\star}-m)./s,$$

where $m=({m}_{1},...,{m}_{N})$ and $s=({s}_{1},...,{s}_{N})$ are the vectors of unconditional means and standard deviations of ${z}^{\star}$ computed in Equations (13) and (14) respectively and “./” denotes element-by-element division. Consequently, the standardized form of Definition 1 requires replacing Equation (9) with the following one:

$${z}_{i}^{\star}=({s}_{i}{z}_{i}+{m}_{i}){\xi}_{i}^{{I}_{i}},$$

where

$${I}_{i}=\left\{\begin{array}{cl}-1& \text{if }{z}_{i}\ge -\frac{{m}_{i}}{{s}_{i}}\\ 1& \text{if }{z}_{i}<-\frac{{m}_{i}}{{s}_{i}}\end{array}\right.$$
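A univariate sketch of the standardized skew-normal density is given below, together with a numerical check that it has unit mass, zero mean and unit variance. The factor `s` in the density is the Jacobian of the standardization (it corresponds to the $\log {s}_{i}$ term in the skewed log-likelihoods); function names and the values of $\xi$ are assumptions.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def standardized_skew_normal_pdf(z, xi):
    """Skew-normal density rescaled to have zero mean and unit variance."""
    M1 = math.sqrt(2.0 / math.pi)  # first truncated moment of N(0, 1)
    m = M1 * (xi - 1.0 / xi)
    s = math.sqrt((1.0 - M1 ** 2) * (xi ** 2 + 1.0 / xi ** 2) + 2.0 * M1 ** 2 - 1.0)
    exponent = -1 if z >= -m / s else 1       # indicator I switches at -m/s
    z_star = (s * z + m) * xi ** exponent     # argument passed to g
    return 2.0 * xi / (1.0 + xi * xi) * s * std_normal_pdf(z_star)


def raw_moment(xi, r, lo=-15.0, hi=15.0, steps=30000):
    """Trapezoidal approximation of E[z^r] under the standardized density."""
    h = (hi - lo) / steps

    def f(z):
        return z ** r * standardized_skew_normal_pdf(z, xi)

    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


for xi in (0.8, 1.3):  # hypothetical left- and right-skewed cases
    print(xi, raw_moment(xi, 0), raw_moment(xi, 1), raw_moment(xi, 2))
```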

#### 2.3. Distributions

As already mentioned, three symmetric and three asymmetric distributions are considered in the univariate and multivariate framework. Again, for the sake of brevity, we only report the multivariate log-likelihood functions as the univariate can be obtained as special cases 3. The algebraic derivations of the formulas of the moments can be found in Appendix A.2. In all cases, estimation of the parameters is performed in one step by Maximum Likelihood (ML). Namely, the log-likelihood function for T observations is expressed as

$${\ell}_{T}\left(\mathit{\psi}\right)=\sum _{t=1}^{T}logf(\cdot|\mathit{\psi},{\mathcal{F}}_{t-1}),$$

where $\mathit{\psi}$ is the finite-dimensional vector of model parameters and $f(\cdot|\mathit{\psi},{\mathcal{F}}_{t-1})$ denotes the assumed conditional density function of the portfolio returns, in the univariate case, or of the asset return vector in the multivariate one.

**Multivariate normal distribution** This is the most commonly employed distribution in the literature as it is uniquely identified by its conditional first and second moments, which renders ML estimation much simpler from a computational point of view. In addition, given that the score of the normal log-likelihood function has the martingale difference property when the first two conditional moments are correctly specified, the Quasi Maximum Likelihood (QML) estimates are still consistent and asymptotically normal even if the true DGP is not normally distributed (see Bollerslev and Wooldridge [30]). The log-likelihood function for T observations is expressed as follows

$${\ell}_{T}\left(\mathit{\psi}\right)=-\frac{1}{2}\sum _{t=1}^{T}\left[Nlog\left(2\pi \right)+log|{H}_{t}|+{y}_{t}^{\prime}{H}_{t}^{-1}{y}_{t}\right].$$

**Multivariate Student distribution** The Student distribution is a symmetric and bell-shaped distribution, with heavier tails than the normal. Under the multivariate Student assumption, the log-likelihood function is obtained as

$$\begin{array}{ccc}\hfill {\ell}_{T}\left(\mathit{\psi}\right)& =& {\displaystyle -\frac{1}{2}\sum _{t=1}^{T}\left[log|{H}_{t}|+(N+\nu )log\left(1+\frac{{y}_{t}^{\prime}{H}_{t}^{-1}{y}_{t}}{\nu -2}\right)\right]}\hfill \\ & +& T\left[log\Gamma \left(\frac{\nu +N}{2}\right)-log\Gamma \left(\frac{\nu}{2}\right)-\frac{N}{2}log\left(\pi \right)-\frac{N}{2}log(\nu -2)\right],\hfill \end{array}$$
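As an illustration of the univariate special case ($N=1$, so that ${H}_{t}$ reduces to a scalar variance ${h}_{t}$), the normal and Student log-likelihoods above can be sketched as follows. The function names and the example data are assumptions; the variance path would normally come from the GARCH recursion.

```python
import math


def normal_loglik(y, h):
    """Normal log-likelihood with N = 1: y are returns, h conditional variances."""
    return -0.5 * sum(
        math.log(2.0 * math.pi) + math.log(ht) + yt * yt / ht
        for yt, ht in zip(y, h)
    )


def student_loglik(y, h, nu):
    """Standardized Student log-likelihood with N = 1 and nu > 2."""
    if nu <= 2:
        raise ValueError("nu must exceed 2 for a finite variance")
    T, N = len(y), 1
    const = (
        math.lgamma((nu + N) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * N * math.log(math.pi)
        - 0.5 * N * math.log(nu - 2.0)
    )
    kernel = sum(
        math.log(ht) + (N + nu) * math.log(1.0 + yt * yt / (ht * (nu - 2.0)))
        for yt, ht in zip(y, h)
    )
    return T * const - 0.5 * kernel


# Hypothetical returns and conditional variances.
y_obs = [0.5, -1.2, 0.3, 2.0, -0.7]
h_obs = [1.0, 1.1, 1.2, 1.0, 1.5]
print(normal_loglik(y_obs, h_obs), student_loglik(y_obs, h_obs, nu=6.0))
```

As the degrees of freedom $\nu$ grow large, the Student log-likelihood approaches the normal one, which offers a quick consistency check.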

**Multivariate Exponential Power (MEP) distribution** This distribution belongs to the Kotz family of distributions (a particular class of symmetric and elliptical distributions discussed extensively in Fang et al. [31]) and is known to have several equivalent definitions in the literature. It can also include both the normal and the Laplace as special cases, as a function of the value of the non-normality parameter β dictating the tail-behaviour of the distribution. Given its simple implementation, in this paper we consider the pdf given in Solaro [32], which gives rise to the following log-likelihood function:

$$\begin{array}{ccc}\hfill {\ell}_{T}\left(\mathit{\psi}\right)& =& {\displaystyle -\frac{1}{2}\sum _{t=1}^{T}\left[log|{H}_{t}|+{\left({y}_{t}^{\prime}{H}_{t}^{-1}{y}_{t}\right)}^{\frac{\beta}{2}}\right]}\hfill \\ & +& T\left[log\left(N\right)+log\Gamma \left(\frac{N}{2}\right)-\frac{N}{2}log\left(\pi \right)-log\Gamma \left(1+\frac{N}{\beta}\right)-\left(1+\frac{N}{\beta}\right)log\left(2\right)\right],\hfill \end{array}$$

**Multivariate skew-normal distribution** The multivariate skew-normal is the first non-symmetric distribution we consider herein; it accounts for the skewness of the return distribution without taking into account its kurtosis (as it does not involve a tail parameter). Applying Definition 1 we derive the skew-normal density function, with corresponding log-likelihood function equal to

$$\begin{array}{ccc}\hfill {\ell}_{T}\left(\mathit{\psi}\right)& =& {\displaystyle -\frac{1}{2}\sum _{t=1}^{T}\left[{\displaystyle log|{H}_{t}|+\sum _{i=1}^{N}{\left({\displaystyle {s}_{i}\sum _{j=1}^{N}{p}_{ij,t}{y}_{j,t}+{m}_{i}}\right)}^{2}{\xi}_{i}^{2{I}_{i}}}\right]}\hfill \\ & +& T\left[\sum _{i=1}^{N}(log{\xi}_{i}+log{s}_{i})-log(1+{\xi}_{i}^{2})\right]+\frac{TN}{2}\left[log\left(2\right)-log\left(\pi \right)\right],\hfill \end{array}$$

**Multivariate skew-Student distribution** Applying the same procedure as for the skew-normal, the log-likelihood function of the skew-Student distribution is given by the following expression:

$$\begin{array}{ccc}\hfill {\ell}_{T}\left(\mathit{\psi}\right)& =& {\displaystyle -\frac{1}{2}\sum _{t=1}^{T}\left[log|{H}_{t}|+(\nu +N)log\left(1+\frac{{\displaystyle \sum _{i=1}^{N}{\left({\displaystyle {s}_{i}\sum _{j=1}^{N}{p}_{ij,t}{y}_{j,t}+{m}_{i}}\right)}^{2}{\xi}_{i}^{2{I}_{i}}}}{\nu -2}\right)\right]}\hfill \\ \hfill & +& T\left[Nlog\left(2\right)-\frac{N}{2}log\left(\pi \right)+log\Gamma \left(\frac{\nu +N}{2}\right)-log\Gamma \left(\frac{\nu}{2}\right)-\frac{N}{2}log(\nu -2)\right]\hfill \\ & +& T\left[\sum _{i=1}^{N}(log{\xi}_{i}+log{s}_{i})-log(1+{\xi}_{i}^{2})\right],\hfill \end{array}$$

**Multivariate skew-MEP distribution** The log-likelihood function to be maximized is given by

$$\begin{array}{ccc}\hfill {\ell}_{T}\left(\mathit{\psi}\right)& =& {\displaystyle -\frac{1}{2}\sum _{t=1}^{T}\left[log|{H}_{t}|+{\left({\displaystyle \sum _{i=1}^{N}{\left({\displaystyle {s}_{i}\sum _{j=1}^{N}{p}_{ij,t}{y}_{j,t}+{m}_{i}}\right)}^{2}{\xi}_{i}^{2{I}_{i}}}\right)}^{\frac{\beta}{2}}\right]}\hfill \\ \hfill & +& T\left[\sum _{i=1}^{N}(log{\xi}_{i}+log{s}_{i})-log(1+{\xi}_{i}^{2})\right]\hfill \\ \hfill & +& T\left[log\left(N\right)+log\Gamma \left(\frac{N}{2}\right)-\frac{N}{2}log\left(\pi \right)\right.\hfill \\ & -& \left.log\Gamma \left(1+\frac{N}{\beta}\right)-\left(1+\frac{N}{\beta}\right)log\left(2\right)\right],\hfill \end{array}$$

## 3. Empirical Application

#### 3.1. Data and Forecasting Scheme

Our dataset (used in the paper of Noureldin et al. [33]) 5 comprises daily open-to-close returns of 10 stocks from the Dow Jones Industrial Average: Bank of America (BAC), JP Morgan (JPM), International Business Machines (IBM), Microsoft (MSFT), Exxon Mobil (XOM), Alcoa (AA), American Express (AXP), Du Pont (DD), General Electric (GE) and Coca Cola (KO). Each univariate vector of returns is calculated as ${y}_{t}=100\times (log{p}_{t}-log{p}_{t-1})$ and covers a period of 2200 days, from February 2001 to November 2009. Univariate descriptive statistics over the period of interest are provided in Table 1.

Stock | Mean | Std.dev. | Skewness | Kurtosis | KS Test | JB Test |
---|---|---|---|---|---|---|
Estimation sample: 1 February 2001 to 23 January 2007 (1500 observations) | | | | | | |
BAC | 0.09 | 1.09 | −0.18 | 7.45 | 0.00 | 0.00 |
JPM | 0.00 | 1.68 | 0.90 | 31.02 | 0.00 | 0.00 |
IBM | −0.04 | 1.24 | 0.01 | 5.96 | 0.01 | 0.00 |
MSFT | −0.01 | 1.37 | 0.37 | 6.01 | 0.00 | 0.00 |
XOM | −0.01 | 1.13 | 0.05 | 8.27 | 0.82 | 0.00 |
AA | 0.01 | 1.59 | 0.14 | 4.74 | 0.00 | 0.00 |
AXP | −0.02 | 1.44 | 0.33 | 7.73 | 0.00 | 0.00 |
DD | 0.02 | 1.21 | 0.37 | 6.76 | 0.21 | 0.00 |
GE | −0.01 | 1.34 | 0.13 | 7.90 | 0.02 | 0.00 |
KO | 0.01 | 0.99 | 0.16 | 5.53 | 0.00 | 0.00 |
Forecasting sample: 24 January 2007 to 30 October 2009 (700 observations) | | | | | | |
BAC | −0.18 | 3.95 | 0.37 | 9.36 | 0.00 | 0.00 |
JPM | 0.01 | 3.06 | 0.36 | 8.53 | 0.00 | 0.00 |
IBM | 0.08 | 1.45 | −0.02 | 6.31 | 0.00 | 0.00 |
MSFT | 0.02 | 1.60 | 0.08 | 5.90 | 0.00 | 0.00 |
XOM | 0.03 | 1.61 | −0.39 | 11.31 | 0.00 | 0.00 |
AA | −0.04 | 2.93 | −0.83 | 7.50 | 0.00 | 0.00 |
AXP | 0.04 | 3.06 | 0.22 | 6.96 | 0.00 | 0.00 |
DD | −0.04 | 1.89 | −0.12 | 5.70 | 0.00 | 0.00 |
GE | 0.02 | 2.17 | 0.21 | 8.96 | 0.00 | 0.00 |
KO | −0.03 | 1.22 | 0.07 | 7.68 | 0.06 | 0.00 |
Full sample: 1 February 2001 to 30 October 2009 (2200 observations) | | | | | | |
BAC | 0.01 | 2.40 | 0.33 | 21.72 | 0.00 | 0.00 |
JPM | 0.00 | 2.21 | 0.57 | 16.90 | 0.00 | 0.00 |
IBM | 0.00 | 1.31 | 0.02 | 6.24 | 0.02 | 0.00 |
MSFT | 0.00 | 1.45 | 0.25 | 6.08 | 0.00 | 0.00 |
XOM | 0.00 | 1.30 | −0.20 | 11.56 | 0.04 | 0.00 |
AA | 0.00 | 2.11 | −0.69 | 9.95 | 0.00 | 0.00 |
AXP | 0.00 | 2.09 | 0.32 | 11.23 | 0.00 | 0.00 |
DD | 0.00 | 1.46 | 0.03 | 7.25 | 0.00 | 0.00 |
GE | 0.00 | 1.65 | 0.22 | 10.85 | 0.00 | 0.00 |
KO | 0.00 | 1.07 | 0.11 | 6.89 | 0.00 | 0.00 |

Table 1. Descriptive statistics of the stock return time series used in the empirical application. The three panels report the statistics for the in-sample period, the out-of-sample period and the full sample period, respectively. “KS Test” and “JB Test” denote the Kolmogorov-Smirnov test and the Jarque-Bera test, with the corresponding p-values reported in each column.

Across the three panels, the values of skewness and kurtosis show that the assets are far from being unconditionally normally distributed, thus supporting the conjecture that more flexible distributional assumptions can be conducive to enhanced model performance.

To this end, one-step ahead forecasts of the conditional portfolio variance (in the univariate case) and of the conditional covariance matrix of returns (in the multivariate one) are recursively obtained as:

$$\begin{array}{ccc}\hfill {\widehat{\sigma}}_{p,t+1}^{2}& =& E({\sigma}_{p,t+1}^{2}|{\Im}_{t}),\hfill \end{array}$$

$$\begin{array}{ccc}\hfill {\widehat{H}}_{t+1}& =& E({H}_{t+1}|{\Im}_{t}),\hfill \end{array}$$

where ${\Im}_{t}$ is the information set at time t and ${\sigma}_{p,t}^{2},{H}_{t}$ are defined as in Equations (2) and (5), respectively. Using a rolling-fixed-window scheme, the parameters are estimated over a window length of 1500 observations and used to predict the conditional variance process for the following 20 days. Each time the window is shifted forward by 20 observations and the parameters are re-estimated over the new period in order to compute the next set of forecasts. We iterate this process until the end of the dataset for a total of 35 parameter estimates and 700 one-step ahead forecasts. Table A1 in Appendix B reports the complete list of windows and forecast horizons along with their corresponding calendar dates.
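The rolling-fixed-window scheme can be sketched as an index schedule. The function below is illustrative (its name and the half-open index convention are assumptions); with 2200 observations, a window of 1500 and a step of 20 it reproduces the 35 re-estimations and 700 forecasts stated in the text.

```python
def rolling_windows(n_obs=2200, window=1500, step=20):
    """Return a list of (estimation_range, forecast_range) index pairs.

    Ranges are half-open [start, end) positions into the return series.
    """
    schedule = []
    start = 0
    while start + window < n_obs:
        estimation = (start, start + window)
        forecast_end = min(start + window + step, n_obs)
        schedule.append((estimation, (start + window, forecast_end)))
        start += step  # shift the whole window forward by `step` days
    return schedule


schedule = rolling_windows()
n_forecasts = sum(end - begin for _, (begin, end) in schedule)
print(len(schedule), n_forecasts)
```

Within each forecast range the parameters stay fixed at their latest estimates, while the one-step-ahead variance is updated daily with the newly observed return.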

For each model, the portfolio VaR forecast at α% confidence level is then obtained as

$$Va{R}_{t+1,\alpha}={\widehat{\sigma}}_{p,t+1}{q}_{\alpha}.$$

For the symmetric distributions in our analysis (normal, Student and MEP), one can easily compute the long VaR of the portfolio by applying Equation (27) and the inverse of each CDF at α%. However, for the non-symmetric distributions this is not straightforward. In order to bypass this complication, for each non-symmetric distribution we apply a simple Monte-Carlo simulation approach, as widely used in VaR computations. Namely, we draw 10,000 random vectors (numbers) from each symmetric multivariate (univariate) standardized distribution ${z}_{t}$ and then we use the estimated skewness parameters to construct the corresponding skewed distribution ${z}_{t}^{\star}$. By assuming ${r}_{j}={\widehat{H}}_{t|t-1}^{1/2}{z}_{j}^{\star}$ (${r}_{j}={\widehat{\sigma}}_{t|t-1}{z}_{j}^{\star}$) as the true DGP, we obtain a set of 10,000 simulated returns over the period of interest. Finally, the simulated return distribution is used to derive the 5% and 1% quantiles for the one-step-ahead VaR.
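A univariate version of this simulation step might look as follows. This is only a sketch under stated assumptions: the symmetric base distribution is the standard normal, skewness is introduced through a two-piece (Fernández-Steel type) rescaling with parameter `xi`, and the draws are re-standardized with sample moments rather than analytical ones. The paper's own skewing mechanism and the function name `simulate_skewed_var` are not confirmed by the text.

```python
import math
import random
import statistics


def simulate_skewed_var(sigma, xi, alpha, n_draws=10_000, seed=0):
    """Monte Carlo alpha-quantile of sigma * z_star for a standardized skewed normal."""
    rng = random.Random(seed)
    p_right = xi * xi / (1.0 + xi * xi)  # probability mass on the positive side
    draws = []
    for _ in range(n_draws):
        z = abs(rng.gauss(0.0, 1.0))
        # Stretch the right half by xi and the left half by 1/xi (assumed skewing rule).
        draws.append(xi * z if rng.random() < p_right else -z / xi)
    # Re-standardize with sample moments so that z_star has mean 0 and variance 1.
    mean = statistics.fmean(draws)
    std = statistics.pstdev(draws)
    returns = sorted(sigma * (d - mean) / std for d in draws)
    # Empirical quantile: order statistic at position floor(alpha * n_draws).
    return returns[int(math.floor(alpha * n_draws))]


var_5pct = simulate_skewed_var(sigma=1.0, xi=0.9, alpha=0.05)
print(var_5pct)
```

Setting `xi = 1` recovers the symmetric case, so the simulated 5% quantile should then be close to the normal quantile of about −1.645 times `sigma`. Values of `xi` below one lengthen the left tail and push the VaR further out.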

#### 3.2. Testing the Accuracy of VaR Forecasts

The models' accuracy in predicting VaR is assessed using multiple statistical backtesting methods. A common starting point for this procedure is the so-called hit function, or indicator function, which is equal to

$${I}_{t}\left(\alpha \right)=\left\{\begin{array}{ll}1& \text{if }{r}_{t}\le VaR\left(\alpha \right)\\ 0& \text{if }{r}_{t}>VaR\left(\alpha \right)\end{array}\right.$$

i.e., it takes the value one if the ex-post portfolio loss exceeds the VaR predicted at time $t-1$ and the value zero otherwise. According to Christoffersen [34], in order to be accurate, the hit sequence has to satisfy two properties: correct failure rate and independence of exceptions. The former implies that the probability of realizing a VaR violation should be equal to $\alpha \times 100\%$, while the latter further requires the violations to be independent of each other. These properties can be combined into a single statement, namely that the hit function has to be an i.i.d. Bernoulli random variable with probability p, i.e., ${I}_{t}\left(p\right)\stackrel{i.i.d.}{\sim}B\left(p\right)$.

This represents the key foundation to many of the backtesting procedures developed in recent years and particularly to the accuracy tests being used in this paper. We focus on tests included in the following three categories:

- Evaluation of the Frequency of Violations
- Evaluation of the Independence of Violations
- Evaluation of the Duration between Violations.

**Frequency of Violations** The first way of testing the VaR accuracy is to test the number or the frequency of margin exceedances. A test designed for this purpose is the Kupiec test (Kupiec [35]), also known as the Unconditional Coverage (UC) test. Its null hypothesis is simply that the percentage of violated VaR forecasts, or failure rate p, is consistent with the given confidence level α, i.e., ${H}_{0}:p=\alpha $.

Denoting by F the length of the forecasting period and by v the number of violations that occurred throughout this period, the log-likelihood ratio test statistic is defined as

$$UC=-2\left(ln\left(\frac{{p}^{v}{(1-p)}^{F-v}}{{\widehat{p}}^{v}{(1-\widehat{p})}^{F-v}}\right)\right),$$

where $\widehat{p}=v/F$ is the maximum likelihood estimator under the alternative hypothesis. This ratio test statistic is asymptotically ${\chi}^{2}\left(1\right)$ distributed and the null hypothesis is rejected if the critical value at the $\alpha \%$ confidence level is exceeded.
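The hit sequence and the UC statistic can be computed in a few lines. The sketch below uses only the standard library; the helper names (`hit_sequence`, `kupiec_uc`, `chi2_1_pvalue`) are illustrative choices, and the ${\chi}^{2}\left(1\right)$ tail probability is obtained from the complementary error function.

```python
import math


def hit_sequence(returns, var_forecasts):
    """I_t = 1 when the realized return falls at or below the VaR forecast."""
    return [1 if r <= v else 0 for r, v in zip(returns, var_forecasts)]


def _xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def kupiec_uc(n_violations, n_forecasts, p):
    """Kupiec unconditional coverage LR statistic and its p-value."""
    v, F = n_violations, n_forecasts
    p_hat = v / F
    log_l0 = _xlogy(v, p) + _xlogy(F - v, 1.0 - p)          # likelihood under H0
    log_l1 = _xlogy(v, p_hat) + _xlogy(F - v, 1.0 - p_hat)  # likelihood at the MLE
    stat = -2.0 * (log_l0 - log_l1)
    return stat, chi2_1_pvalue(stat)


stat, pval = kupiec_uc(n_violations=52, n_forecasts=700, p=0.05)
print(stat, pval)  # about 7.61 with a p-value of about 0.006
```

With F = 700, v = 52 and p = 0.05 the function returns a statistic of about 7.61, which agrees with the UC value reported for 52 violations at the 5% level in the backtesting tables.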

A similarly useful test is the TUFF (Time Until First Failure) test (Kupiec [35]). Under the null, the probability of an exception is equal to the inverse of the time until the first failure, namely ${H}_{0}:p=\widehat{p}=1/v$, where v here denotes the number of days until the first violation. Its basic assumptions are similar to those of the Kupiec test and the test statistic under the null is obtained as

$$TUFF=-2\left(ln\left(\frac{p{(1-p)}^{v-1}}{\frac{1}{v}{\left(1-\frac{1}{v}\right)}^{(v-1)}}\right)\right).$$

The TUFF statistic is also asymptotically ${\chi}^{2}\left(1\right)$ distributed.
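As a small numerical check of the TUFF formula, the sketch below evaluates it for a given first-failure day. It assumes that v is the number of days until the first violation, as in Kupiec's time-until-first-failure setup; the function name `tuff_statistic` is hypothetical.

```python
import math


def tuff_statistic(first_failure_day, p):
    """Kupiec TUFF likelihood ratio for a first violation on day v (v >= 1)."""
    v = first_failure_day
    log_l0 = math.log(p) + (v - 1) * math.log(1.0 - p)
    # Under the alternative p_hat = 1/v; for v = 1 the second factor is 0^0 = 1.
    tail = (v - 1) * math.log(1.0 - 1.0 / v) if v > 1 else 0.0
    log_l1 = -math.log(v) + tail
    stat = -2.0 * (log_l0 - log_l1)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))  # chi-square(1) tail
    return stat, p_value


tuff_stat, tuff_pval = tuff_statistic(first_failure_day=1, p=0.05)
print(tuff_stat, tuff_pval)  # about 5.991 and 0.014
```

For v = 1 and p = 0.05 the statistic reduces to $-2ln(0.05)\approx 5.991$, which coincides with the TUFF value that appears for most models at the 5% level in the results tables. When 1/v equals p exactly, for instance v = 20 at p = 0.05, the statistic is zero.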

**Independence of Violations** A limitation of the Kupiec test is that it is only concerned with the coverage of the VaR estimates without accounting for any clustering of the violations. This aspect is crucial for VaR practitioners, as large losses occurring in rapid succession are more likely to lead to disastrous events than individual exceptions.

The Independence test (IND) of Christoffersen [34] uses the same likelihood ratio framework as the previous tests but is designed to explicitly detect clustering in the VaR violations. Under the null hypothesis of independence, the IND test assumes that the probability of an exceedance on a given day t is not influenced by what happened the day before. Formally, ${H}_{0}:{p}_{10}={p}_{11}$, where ${p}_{ij}$ denotes the probability of an i event on day t conditional on a j event on day $t-1$.

The relevant IND test statistic can be derived as

$$\begin{array}{c}\hfill IND=-2\left(ln\left(\frac{{\widehat{p}}^{v}{(1-\widehat{p})}^{F-v}}{{\widehat{p}}_{11}^{{v}_{11}}{(1-{\widehat{p}}_{11})}^{{v}_{01}}{\widehat{p}}_{10}^{{v}_{10}}{(1-{\widehat{p}}_{10})}^{{v}_{00}}}\right)\right),\end{array}$$

where ${v}_{ij}$ is the number of observations with value i at time t preceded by value j at time $t-1$. Under the null, the IND statistic is also asymptotically distributed as a ${\chi}^{2}\left(1\right)$ random variable.
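A first-order Markov independence statistic of this kind can be sketched as follows. To avoid any ambiguity in index order, the transition counts carry descriptive names of the form "yesterday to today". One assumption differs slightly from the displayed formula: the pooled violation probability is computed from the F − 1 observed transitions rather than from all F observations, a common convention that keeps the statistic non-negative.

```python
import math


def _xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def christoffersen_ind(hits):
    """Independence LR statistic and chi-square(1) p-value from a 0/1 hit sequence."""
    if len(hits) < 2:
        raise ValueError("at least two observations are needed")
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for prev, curr in zip(hits[:-1], hits[1:]):
        counts[(prev, curr)] += 1
    calm_to_calm, calm_to_hit = counts[(0, 0)], counts[(0, 1)]
    hit_to_calm, hit_to_hit = counts[(1, 0)], counts[(1, 1)]
    n_after_calm = calm_to_calm + calm_to_hit
    n_after_hit = hit_to_calm + hit_to_hit
    n = n_after_calm + n_after_hit
    # Violation probabilities: pooled, after a calm day, after a violation day.
    p_pooled = (calm_to_hit + hit_to_hit) / n
    p_after_calm = calm_to_hit / n_after_calm if n_after_calm else 0.0
    p_after_hit = hit_to_hit / n_after_hit if n_after_hit else 0.0
    log_l0 = (_xlogy(calm_to_hit + hit_to_hit, p_pooled)
              + _xlogy(calm_to_calm + hit_to_calm, 1.0 - p_pooled))
    log_l1 = (_xlogy(calm_to_hit, p_after_calm) + _xlogy(calm_to_calm, 1.0 - p_after_calm)
              + _xlogy(hit_to_hit, p_after_hit) + _xlogy(hit_to_calm, 1.0 - p_after_hit))
    stat = -2.0 * (log_l0 - log_l1)
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


clustered = [0] * 10 + [1] * 5 + [0] * 10  # illustrative data, not from the paper
ind_stat, ind_pval = christoffersen_ind(clustered)
print(ind_stat, ind_pval)
```

A sequence in which violations arrive in a single block, as in the illustrative `clustered` list, yields a large statistic, whereas a sequence whose violation frequency is the same after calm days and after violation days yields a statistic of zero.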

Although the aforementioned test has received support in the literature, Christoffersen [34] noted that it was not complete on its own. For this reason, he proposed a joint test, the Conditional Coverage (CC) test, which combines the properties of both UC and IND tests.

Formally, the CC ratio statistic can be proven to be the sum of the UC and the IND statistics:

$$\begin{array}{ccc}\hfill CC& =& -2(ln\left({L}_{0}^{UC}\right)-ln\left({L}_{1}^{IND}\right))\hfill \\ \hfill & =& -2(ln\left({L}_{0}^{UC}\right)-ln\left({L}_{1}^{UC}\right)+ln\left({L}_{1}^{UC}\right)-ln\left({L}_{1}^{IND}\right))\hfill \\ \hfill & =& -2(ln\left({L}_{0}^{UC}\right)-ln\left({L}_{1}^{UC}\right)+ln\left({L}_{0}^{IND}\right)-ln\left({L}_{1}^{IND}\right))\hfill \\ & =& -2\underset{UC}{\underbrace{(ln\left({L}_{0}^{UC}\right)-ln\left({L}_{1}^{UC}\right))}}-2\underset{IND}{\underbrace{(ln\left({L}_{0}^{IND}\right)-ln\left({L}_{1}^{IND}\right))}},\hfill \end{array}$$

where we added and subtracted the quantity $ln\left({L}_{1}^{UC}\right)$ and substituted $ln\left({L}_{0}^{IND}\right)$ for $ln\left({L}_{1}^{UC}\right)$. CC is also ${\chi}^{2}$ distributed, but with two degrees of freedom since there are two separate statistics in the test. According to Campbell [36], in some cases it is possible that a VaR model passes the joint test while still failing either the independence test or the unconditional coverage test. Thus it is advisable to run them separately even when the joint test yields a positive result.
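The additive structure makes the joint test trivial to compute once the two components are available. The sketch below uses the fact that the survival function of a ${\chi}^{2}$ variable with two degrees of freedom is $exp(-x/2)$; the function name `conditional_coverage` is illustrative. The example inputs are the UC and IND statistics reported for the univariate skew-normal model at 5% VaR in the results section.

```python
import math


def conditional_coverage(uc_stat, ind_stat):
    """CC statistic as UC + IND together with its chi-square(2) p-value."""
    cc_stat = uc_stat + ind_stat
    # With two degrees of freedom the chi-square survival function is exp(-x/2).
    return cc_stat, math.exp(-cc_stat / 2.0)


cc_stat, cc_pval = conditional_coverage(uc_stat=1.389, ind_stat=2.646)
print(round(cc_stat, 3), round(cc_pval, 3))  # 4.035 0.133
```

The output agrees with the CC statistic of 4.035 and p-value of 0.133 reported for that model.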

A second test belonging to this class is the regression-based test of Engle and Manganelli [37], also known as the Dynamic Quantile (DQ) test. Instead of directly considering the hit sequence, the test is based on its associated quantile process ${H}_{t}\left(\alpha \right)={I}_{t}\left(\alpha \right)-\alpha $, which assumes the following values:

$${H}_{t}\left(\alpha \right)=\left\{\begin{array}{ll}1-\alpha & \text{if }{I}_{t}=1\\ -\alpha & \text{if }{I}_{t}=0\end{array}\right.$$

The idea of this approach is to regress current violations on past violations in order to test different restrictions on the parameters of the model. That is, we estimate the linear regression model ${H}_{t}\left(\alpha \right)=\delta +{\sum}_{k=1}^{K}{\beta}_{k}{H}_{t-k}\left(\alpha \right)+{\epsilon}_{t}$ and then test the joint hypothesis ${H}_{0}\left(D{Q}_{cc}\right):\delta ={\beta}_{1}=...={\beta}_{K}=0$. This hypothesis coincides with the null of Christoffersen’s CC test. It is also possible to split the test and separately test the independence hypothesis and the unconditional coverage hypothesis, as ${H}_{0}\left(D{Q}_{ind}\right):{\beta}_{1}=...={\beta}_{K}=0$ and ${H}_{0}\left(D{Q}_{uc}\right):\delta =0$, respectively. The $D{Q}_{cc}$, $D{Q}_{ind}$ and $D{Q}_{uc}$ statistics are asymptotically ${\chi}^{2}$ distributed with $K+1$, K and one degrees of freedom, respectively.
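For a single lag (K = 1) the joint DQ statistic can be computed with closed-form OLS. The sketch below assumes the usual form of the statistic, $\widehat{\beta}'X'X\widehat{\beta}/(\alpha (1-\alpha ))$ with $X$ containing a constant and the lagged hit process, which the text does not spell out; the function name `dq_cc_one_lag` is hypothetical.

```python
import math


def dq_cc_one_lag(hits, alpha):
    """Joint DQ statistic for K = 1 and its chi-square(2) p-value."""
    h = [i - alpha for i in hits]  # demeaned hit process H_t
    y, x = h[1:], h[:-1]           # regress H_t on a constant and H_{t-1}
    n = len(y)
    # Entries of X'X and X'y for the regressor matrix X = [1, H_{t-1}].
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; the hit sequence has no variation")
    delta = (sxx * sy - sx * sxy) / det  # OLS intercept
    beta = (n * sxy - sx * sy) / det     # OLS slope on the lagged hit
    # By the normal equations, beta_hat' X'X beta_hat equals beta_hat' X'y.
    quad = delta * sy + beta * sxy
    stat = quad / (alpha * (1.0 - alpha))
    return stat, math.exp(-stat / 2.0)


alternating = [1, 0] * 5 + [1]  # illustrative data with perfect negative dependence
dq_stat, dq_pval = dq_cc_one_lag(alternating, alpha=0.5)
print(dq_stat, dq_pval)  # 10.0 and about 0.0067
```

In the illustrative alternating sequence the lagged hit predicts the current hit perfectly, so the fitted slope is −1 and the statistic equals the number of regression observations, here 10.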

**Duration between Violations** One of the drawbacks of Christoffersen’s CC test is that it is not capable of capturing dependence in all forms, since it only considers the dependence of observations between two successive days. To address this, Christoffersen and Pelletier [38] introduced the Duration-Based test of independence (DBI), which is an improved test for both independence and coverage. Its basic intuition is that if exceptions are completely independent of each other, then the upcoming VaR violations should be independent of the time that has elapsed since the occurrence of the last exceedance (Campbell [36]). The duration (in days) between two exceptions is defined via the no-hit duration ${D}_{i}={t}_{i}-{t}_{i-1}$, where ${t}_{i}$ is the day of the $i$-th violation.

A correctly specified model should have an expected conditional duration of $1/p$ days and the no-hit duration should have no memory. The authors construct the ratio statistic considering different distributions for the null and the alternative hypotheses, namely the exponential, since it is the only memoryless continuous distribution, and the Weibull, which allows for duration dependence. The likelihood ratio statistic is derived as

$$DBI=-2\left(ln\left(\frac{{L}_{0}}{{L}_{1}}\right)\right)=-2\left(ln\left(\frac{pexp\{-pD\}}{{a}^{b}b{D}^{b-1}exp\{-{\left(aD\right)}^{b}\}}\right)\right),$$

which has a ${\chi}^{2}$ distribution with one degree of freedom.

Under the null hypothesis of independent violations, $b=1$ and a is estimated via numerical maximization of $ln\left({L}_{1}\right)$. Whenever $b<1$, the Weibull hazard function is decreasing, which corresponds to an excessive number of very long durations (very calm periods), while $b>1$ corresponds to an excessive number of very short durations, namely very volatile periods.
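A simplified version of the duration test can be sketched by concentrating the scale parameter a out of the Weibull likelihood and searching over b on a grid. This is only an illustration under stated assumptions: all durations are treated as fully observed (the censoring of the first and last spells handled in the original test is ignored), a is estimated freely under both hypotheses, and the grid bounds are arbitrary. The function names are hypothetical.

```python
import math


def weibull_loglik(durations, b):
    """Weibull log-likelihood at shape b, with the scale a set to its MLE given b."""
    n = len(durations)
    a = (n / sum(d ** b for d in durations)) ** (1.0 / b)
    loglik = sum(b * math.log(a) + math.log(b) + (b - 1.0) * math.log(d) - (a * d) ** b
                 for d in durations)
    return loglik, a


def duration_test(durations):
    """LR statistic for H0: b = 1 (exponential) against a Weibull alternative."""
    log_l0, _ = weibull_loglik(durations, 1.0)
    # Grid over b in [0.05, 20]; it contains b = 1 exactly, so the LR is never negative.
    grid = [k / 100.0 for k in range(5, 2001)]
    b_hat = max(grid, key=lambda b: weibull_loglik(durations, b)[0])
    log_l1, a_hat = weibull_loglik(durations, b_hat)
    stat = -2.0 * (log_l0 - log_l1)
    return stat, b_hat, a_hat


bursty = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]  # illustrative durations, not from the paper
dbi_stat, b_hat, a_hat = duration_test(bursty)
print(dbi_stat, b_hat, a_hat)
```

For the illustrative `bursty` durations, with many one-day spells and a single very long one, the estimated shape falls below one, which signals a decreasing hazard. Nearly constant durations would instead give an estimate well above one.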

## 4. Results

#### 4.1. Parameter Estimates

Before turning to the out-of-sample analysis, it is worth first looking at some parameter estimates obtained by fitting the models on the data. Table 2 reports full sample estimation results over the period January 2001 to February 2009 for a total of 2200 daily observations.

| | Normal | Student | MEP | Skew-Normal | Skew-Student | Skew-MEP | NCT-APARCH | NCT-GARCH |
|---|---|---|---|---|---|---|---|---|
| Univariate models | | | | | | | | |
| $\overline{\sigma}$ | 0.005 | 0.004 | 0.002 | 0.005 | 0.004 | 0.010 | 0.010 | 0.010 |
| ${a}_{1}$ | $\underset{(0.016)}{0.07}$ | $\underset{(0.013)}{0.06}$ | $\underset{(0.016)}{0.07}$ | $\underset{(0.029)}{0.07}$ | $\underset{(0.013)}{0.06}$ | $\underset{(0.015)}{0.07}$ | 0.05 | 0.05 |
| ${b}_{1}$ | $\underset{(0.017)}{0.92}$ | $\underset{(0.013)}{0.93}$ | $\underset{(0.017)}{0.92}$ | $\underset{(0.031)}{0.92}$ | $\underset{(0.014)}{0.93}$ | $\underset{(0.017)}{0.93}$ | 0.90 | 0.90 |
| ν | | $\underset{(1.358)}{8.44}$ | | | $\underset{(1.366)}{8.42}$ | | 7.20 | 5.20 |
| γ | | | | | | | −0.360 | −0.120 |
| β | | | $\underset{(0.041)}{1.875}$ | | | $\underset{(0.038)}{1.873}$ | | |
| $\overline{\xi}$ | | | | $\underset{(0.026)}{0.938}$ | $\underset{(0.023)}{0.935}$ | $\underset{(0.02)}{0.946}$ | | |
| LogLik | −2971 | −2946 | −2965 | −2968 | −2943 | −2963 | −2944 | −2963 |
| AIC | 2.703 | 2.681 | 2.698 | 2.701 | 2.679 | 2.697 | 2.680 | 2.696 |
| Multivariate models | | | | | | | | |
| ${a}_{2}$ | $\underset{(0.001)}{0.021}$ | $\underset{(0.001)}{0.015}$ | $\underset{(0.001)}{0.022}$ | $\underset{(0.002)}{0.032}$ | $\underset{(0.001)}{0.015}$ | $\underset{(0.001)}{0.026}$ | | |
| ${b}_{2}$ | $\underset{(0.001)}{0.976}$ | $\underset{(0.001)}{0.983}$ | $\underset{(0.001)}{0.976}$ | $\underset{(0.002)}{0.966}$ | $\underset{(0.001)}{0.983}$ | $\underset{(0.002)}{0.972}$ | | |
| ν | | $\underset{(0.370)}{8.42}$ | | | $\underset{(0.37)}{8.30}$ | | | |
| $\overline{\xi}$ | | | | $\underset{(0.035)}{1.016}$ | $\underset{(0.022)}{1.018}$ | $\underset{(0.034)}{1.018}$ | | |
| β | | | $\underset{(0.013)}{1.919}$ | | | $\underset{(0.031)}{2.036}$ | | |
| LogLik | −32,154 | −31,352 | −32,193 | −32,121 | −31,330 | −32,067 | | |
| AIC | 29.232 | 28.504 | 29.278 | 29.211 | 28.449 | 29.154 | | |

Note: The table reports parameter estimates and robust standard errors obtained from full sample parameter estimation, for T = 2200. $\overline{\xi}$ denotes the value of the skewness parameters averaged across univariate series, with Mean Asymptotic Square Errors (MASE) reported in brackets. For both the NCT-GARCH and NCT-APARCH models $\mathit{\psi}={(\mu ,\nu ,\gamma )}^{\prime}$, see Appendix D. The AIC is rescaled by T.

A common feature emerging from both univariate and multivariate panels is that the use of skewed distributional assumptions seems to be justified, as all asymmetry coefficients are significant at standard levels. Moreover, the direct comparison of model fit via the Akaike Information Criterion (AIC) highlights that the models incorporating skewed distributions consistently outperform their symmetric counterparts, with the skew-Student achieving the best fit. In both cases the Student has the lowest AIC among the symmetric densities, while the normal possesses the highest.

In the univariate setting, the NCT-APARCH model substantially improves over the NCT-GARCH due to the introduction of skewness into the volatility model, but it performs slightly worse than its closest competitor, the skew-Student GARCH model. This performance is even more impressive considering that the AIC penalizes the NCT-based models for only three parameters, given that the remaining ones are fixed prior to maximization of the likelihood function according to the fast procedure of Krause and Paolella [16] (see Appendix D).

The evolution of the parameters across re-estimations $\tau =1,...,35$ allows us to discern their sensitivity to the financial crisis by comparing their dynamics against the key dates outlined in Table A1 in Appendix B. From Figure 1, which looks at the tail parameter estimates for the MEP and Student distributions along with their skewed counterparts, a number of commonalities are immediately apparent. Panel (a) reveals a small increase followed by a sharp drop in β between $\tau =18$ and $\tau =22$ for both the univariate and multivariate MEP, indicating a thickening of the tails. Since the time frame corresponds to the windows including the events leading up to the collapse of Lehman Brothers on 15 September 2008, we interpret our results as β adjusting to incorporate the extreme negative events associated with this period.

Panel (b) exhibits the same dynamics as before, albeit at a reduced magnitude. This occurs because the introduction of skewness captures some of the negative returns associated with the crisis, thereby reducing the need for such a large increase in tail thickness. Evidence of such compensating effects between skewness and heavy tails is given in Figure 2, which provides the evolution of the skewness parameter ξ across re-estimations (given as an average over the assets in the multivariate case for ease of exposition). Specifically, the dynamics of ξ for the univariate skew-MEP in Panel (a) appear to track the tail dynamics: as β decreases on average between $\tau =1$ and $\tau =18$, ξ exhibits a gradual increase, indicating a reduction of negative skewness. Moreover, the aforementioned drop is accompanied by ξ attaining its maximum value.

Turning to the Student and skew-Student distributions, the dynamics are somewhat different along a number of dimensions. First, there is now a marked difference between the univariate and multivariate cases. In the former, the dynamics are similar to the MEP and skew-MEP results, wherein the tails thicken at around $\tau =18$ as evidenced by the sharp drop in ν. This is then followed by a short-lived increase, after which the tails thicken gradually over the remaining estimation period. The reasoning for this, whereby the tails thicken to accommodate the negative returns associated with the crisis, is the same as in the MEP case.

Unlike the MEP and skew-MEP cases, where the multivariate specification exhibits thinner tails than the univariate but both exhibit congruent dynamics, the Student and skew-Student exhibit fewer similarities between specifications. In contrast, moving from the Student to skew-Student setting reveals almost identical tail parameter dynamics. Combined with the positive skew exhibited in Panel (b) of Figure 2, this indicates that in this case tail thickness does not adjust to incorporate the addition of skewness. The effect of these features on VaR forecast accuracy will be seen in Section 4.2.

#### 4.2. VaR Backtesting Results

In the multivariate setting, the out-of-sample covariance matrix predictions are used to construct equally-weighted portfolios for the computation of the one-step-ahead VaR. Table 3 compares the portfolio standard deviations obtained using the univariate and multivariate approaches over both the in- and out-of-sample periods.

| | Normal | Student | MEP | Skew-Normal | Skew-Student | Skew-MEP | NCT-APARCH | NCT-GARCH |
|---|---|---|---|---|---|---|---|---|
| Univariate Models | | | | | | | | |
| In-sample: 1 February 2001 to 23 January 2007 (1500 observations) | | | | | | | | |
| ${\overline{\sigma}}_{p}$ | 0.8716 | 0.8722 | 0.8662 | 0.8709 | 0.8718 | 0.8780 | 0.8222 | 0.8720 |
| $min\left\{{\sigma}_{p}\right\}$ | 0.4159 | 0.4155 | 0.3899 | 0.4134 | 0.4141 | 0.4457 | 0.3727 | 0.3852 |
| $max\left\{{\sigma}_{p}\right\}$ | 2.7805 | 2.6827 | 2.8234 | 2.8147 | 2.6965 | 2.7095 | 2.5242 | 2.5826 |
| Forecasting sample: 24 January 2007 to 30 October 2009 (700 observations) | | | | | | | | |
| ${\overline{\sigma}}_{p}$ | 1.4659 | 1.4601 | 1.4800 | 1.4656 | 1.4550 | 1.4573 | 1.4569 | 1.4644 |
| $min\left\{{\sigma}_{p}\right\}$ | 0.4398 | 0.4448 | 0.4139 | 0.4334 | 0.4427 | 0.4718 | 0.3745 | 0.3899 |
| $max\left\{{\sigma}_{p}\right\}$ | 3.9589 | 3.8967 | 3.9743 | 3.9656 | 3.8033 | 3.9281 | 4.2310 | 3.1567 |
| Multivariate Models | | | | | | | | |
| In-sample: 1 February 2001 to 23 January 2007 (1500 observations) | | | | | | | | |
| ${\overline{\sigma}}_{p}$ | 0.9021 | 0.9115 | 0.8983 | 0.9018 | 0.9115 | 0.8924 | | |
| $min\left\{{\sigma}_{p}\right\}$ | 0.5226 | 0.5474 | 0.5077 | 0.5212 | 0.5478 | 0.4840 | | |
| $max\left\{{\sigma}_{p}\right\}$ | 1.9067 | 1.7923 | 1.9391 | 1.9097 | 1.7935 | 1.9972 | | |
| Forecasting sample: 24 January 2007 to 30 October 2009 (700 observations) | | | | | | | | |
| ${\overline{\sigma}}_{p}$ | 1.4721 | 1.4646 | 1.4875 | 1.4740 | 1.4549 | 1.4691 | | |
| $min\left\{{\sigma}_{p}\right\}$ | 0.5226 | 0.5474 | 0.5105 | 0.5233 | 0.5479 | 0.4908 | | |
| $max\left\{{\sigma}_{p}\right\}$ | 3.1733 | 3.0657 | 3.2108 | 3.1728 | 3.0674 | 3.2683 | | |

Note: The table reports average, minimum and maximum value of portfolio standard deviation over the in- and out-of-sample periods.
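In the multivariate case, the portfolio standard deviation follows from the forecast covariance matrix and the portfolio weights through the quadratic form $w'Hw$. A minimal sketch for equal weights is given below; the function name and the example matrix are purely illustrative and are not taken from the paper.

```python
import math


def equal_weight_portfolio_std(cov):
    """Standard deviation of the equally weighted portfolio, sqrt(w' H w)."""
    n = len(cov)
    w = 1.0 / n
    variance = sum(w * cov[i][j] * w for i in range(n) for j in range(n))
    return math.sqrt(variance)


H = [[1.0, 0.5], [0.5, 4.0]]  # illustrative 2x2 covariance matrix, not from the paper
portfolio_std = equal_weight_portfolio_std(H)
print(portfolio_std)  # sqrt(1.5), about 1.2247
```

With equal weights the portfolio variance is simply the average of all entries of the covariance matrix, so the example gives (1 + 0.5 + 0.5 + 4)/4 = 1.5.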

As already noted, the financial crisis features heavily in the summary statistics. Since this period is included in the forecasting sample (starting from observation 1921 according to Table A1), we notice a sharp increase in the portfolio standard deviation of all the models (see also the figures reported in Appendix C). In general, heavy-tailed or skewed distributions do not lead to remarkable gains over the in-sample period, as all the models report very similar values on average. This pattern is partly reversed in the forecasting period: while the normal and skew-normal remain very close, the skew-Student and skew-MEP exhibit a lower portfolio standard deviation than their symmetric counterparts. The models featuring the NCT distribution perform remarkably well: the NCT-APARCH achieves the lowest standard deviation among the univariate alternatives and also compares favorably to many multivariate models, being outperformed only by the skew-Student RBEKK. It appears that the uncertainty due to the larger number of parameters to be estimated is not fully compensated by the gain coming from a better representation of the volatility dynamics in multivariate models.

Overall, according to this table, the univariate and multivariate approaches deliver quite similar portfolio summary statistics. If the focus were on the predicted portfolio variance alone, then the ideal choice would be to use a univariate volatility model coupled with either the skew-Student or the NCT distribution, as they are easy to estimate and computationally faster than the multivariate specifications. Ultimately, we are interested in the models' accuracy in forecasting the one-step-ahead portfolio VaR, so we move to analyse the outcomes of the statistical backtesting procedures.

Table 4 and Table 5 report the results from the TUFF, UC, IND, CC and DBI tests in the univariate and multivariate cases, respectively.

All statistical tests are computed for the 5% and 1% VaR confidence levels. For each portfolio we report test statistics along with their corresponding p-values in brackets. Since the applied tests measure the models' accuracy in forecasting VaR along several dimensions (as detailed in Section 3.2), the overall results are summarized using a performance measure which considers the percentage of acceptances of the null hypothesis across the different tests at the standard $5\%$ significance level. To this end, rejections of the null are highlighted in bold with the total grade across distributions reported in Table 6.
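The grade described above is a simple acceptance rate. As an illustration, the sketch below computes it from a set of p-values; the function name `acceptance_grade` is hypothetical, and the two example inputs are the p-values reported for the univariate normal and skew-normal models at 5% VaR.

```python
def acceptance_grade(p_values, level=0.05):
    """Share of tests (in percent) whose null is not rejected at the given level."""
    accepted = sum(1 for p in p_values.values() if p >= level)
    return 100.0 * accepted / len(p_values)


normal_5pct = {"TUFF": 0.014, "UC": 0.006, "IND": 0.003, "CC": 0.000, "DBI": 0.880}
skew_normal_5pct = {"TUFF": 0.014, "UC": 0.239, "IND": 0.104, "CC": 0.133, "DBI": 0.778}
print(acceptance_grade(normal_5pct), acceptance_grade(skew_normal_5pct))  # 20.0 80.0
```

These two values agree with the univariate grades of 20% and 80% listed for the same models in the summary table.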

| | Norm | Skew-Norm | Student | Skew-Student | MEP | Skew-MEP | NCT-APARCH | NCT-GARCH |
|---|---|---|---|---|---|---|---|---|
| 5% VaR | | | | | | | | |
| violation/frequency | $\underset{(0.074)}{52}$ | $\underset{(0.060)}{42}$ | $\underset{(0.076)}{53}$ | $\underset{(0.041)}{29}$ | $\underset{(0.074)}{52}$ | $\underset{(0.057)}{40}$ | $\underset{(0.076)}{53}$ | $\underset{(0.036)}{25}$ |
| TUFF | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ | $\underset{(0.883)}{0.022}$ | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ |
| UC | $\underset{\left(\mathbf{0}.\mathbf{006}\right)}{7.611}$ | $\underset{(0.239)}{1.389}$ | $\underset{\left(\mathbf{0}.\mathbf{004}\right)}{8.476}$ | $\underset{(0.284)}{1.147}$ | $\underset{\left(\mathbf{0}.\mathbf{006}\right)}{7.611}$ | $\underset{(0.396)}{0.720}$ | $\underset{(0.068)}{3.326}$ | $\underset{\left(\mathbf{0}.\mathbf{004}\right)}{8.476}$ |
| IND | $\underset{\left(\mathbf{0}.\mathbf{003}\right)}{8.829}$ | $\underset{(0.104)}{2.646}$ | $\underset{\left(\mathbf{0}.\mathbf{003}\right)}{8.815}$ | $\underset{(0.283)}{1.155}$ | $\underset{\left(\mathbf{0}.\mathbf{005}\right)}{7.870}$ | $\underset{(0.195)}{1.676}$ | $\underset{\left(\mathbf{0}.\mathbf{023}\right)}{5.151}$ | $\underset{\left(\mathbf{0}.\mathbf{002}\right)}{9.861}$ |
| CC | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{16.440}$ | $\underset{(0.133)}{4.035}$ | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{17.290}$ | $\underset{(0.316)}{2.302}$ | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{15.481}$ | $\underset{(0.302)}{2.397}$ | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{8.477}$ | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{18.337}$ |
| DBI | $\underset{(0.880)}{0.023}$ | $\underset{(0.778)}{0.079}$ | $\underset{(0.768)}{0.087}$ | $\underset{(0.362)}{0.830}$ | $\underset{(0.831)}{0.046}$ | $\underset{(0.751)}{0.100}$ | $\underset{(0.372)}{0.797}$ | $\underset{(0.582)}{0.303}$ |
| 1% VaR | | | | | | | | |
| violation/frequency | $\underset{(0.021)}{15}$ | $\underset{(0.017)}{12}$ | $\underset{(0.017)}{12}$ | $\underset{(0.007)}{5}$ | $\underset{(0.020)}{14}$ | $\underset{(0.016)}{11}$ | $\underset{(0.016)}{11}$ | $\underset{(0.011)}{8}$ |
| TUFF | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ |
| UC | $\underset{\left(\mathbf{0}.\mathbf{008}\right)}{6.957}$ | $\underset{(0.085)}{2.972}$ | $\underset{(0.085)}{2.972}$ | $\underset{(0.423)}{0.641}$ | $\underset{(0.019)}{5.479}$ | $\underset{(0.161)}{1.967}$ | $\underset{(0.710)}{0.138}$ | $\underset{(0.161)}{1.967}$ |
| IND | $\underset{\left(\mathbf{0}.\mathbf{006}\right)}{7.638}$ | $\underset{(0.065)}{3.406}$ | $\underset{(0.065)}{3.406}$ | $\underset{(0.400)}{0.707}$ | $\underset{(0.014)}{6.072}$ | $\underset{(0.127)}{2.330}$ | $\underset{(0.568)}{0.326}$ | $\underset{(0.127)}{2.330}$ |
| CC | $\underset{\left(\mathbf{0}.\mathbf{001}\right)}{14.595}$ | $\underset{\left(\mathbf{0}.\mathbf{041}\right)}{6.378}$ | $\underset{\left(\mathbf{0}.\mathbf{041}\right)}{6.378}$ | $\underset{(0.510)}{1.348}$ | $\underset{\left(\mathbf{0}.\mathbf{003}\right)}{11.551}$ | $\underset{(0.117)}{4.297}$ | $\underset{(0.793)}{0.464}$ | $\underset{(0.117)}{4.297}$ |
| DBI | $\underset{(0.472)}{0.518}$ | $\underset{(0.723)}{0.126}$ | $\underset{(0.969)}{0.002}$ | $\underset{(0.985)}{0.000}$ | $\underset{(0.459)}{0.547}$ | $\underset{(0.596)}{0.281}$ | $\underset{(0.479)}{0.501}$ | $\underset{(0.596)}{0.281}$ |

| | Norm | Skew-Norm | Student | Skew-Student | MEP | Skew-MEP |
|---|---|---|---|---|---|---|
| 5% VaR | | | | | | |
| violation/frequency | $\underset{(0.076)}{53}$ | $\underset{(0.073)}{51}$ | $\underset{(0.080)}{56}$ | $\underset{(0.054)}{38}$ | $\underset{(0.074)}{52}$ | $\underset{(0.063)}{44}$ |
| TUFF | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ | $\underset{(0.883)}{0.022}$ | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ | $\underset{\left(\mathbf{0}.\mathbf{014}\right)}{5.991}$ |
| UC | $\underset{\left(\mathbf{0}.\mathbf{004}\right)}{8.476}$ | $\underset{\left(\mathbf{0}.\mathbf{009}\right)}{6.789}$ | $\underset{\left(\mathbf{0}.\mathbf{001}\right)}{11.311}$ | $\underset{(0.608)}{0.264}$ | $\underset{\left(\mathbf{0}.\mathbf{006}\right)}{7.611}$ | $\underset{(0.133)}{2.260}$ |
| IND | $\underset{\left(\mathbf{0}.\mathbf{002}\right)}{9.861}$ | $\underset{\left(\mathbf{0}.\mathbf{005}\right)}{7.849}$ | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{13.261}$ | $\underset{(0.601)}{0.273}$ | $\underset{\left(\mathbf{0}.\mathbf{003}\right)}{8.829}$ | $\underset{(0.113)}{2.516}$ |
| CC | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{18.337}$ | $\underset{\left(\mathbf{0}.\mathbf{001}\right)}{14.637}$ | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{24.572}$ | $\underset{(0.765)}{0.536}$ | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{16.440}$ | $\underset{(0.092)}{4.776}$ |
| DBI | $\underset{(0.354)}{0.858}$ | $\underset{(0.366)}{0.818}$ | $\underset{(0.305)}{1.052}$ | $\underset{(0.221)}{1.501}$ | $\underset{(0.372)}{0.797}$ | $\underset{(0.207)}{1.593}$ |
| 1% VaR | | | | | | |
| violation/frequency | $\underset{(0.027)}{19}$ | $\underset{(0.026)}{18}$ | $\underset{(0.024)}{17}$ | $\underset{(0.011)}{8}$ | $\underset{(0.024)}{17}$ | $\underset{(0.021)}{15}$ |
| TUFF | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ | $\underset{(0.232)}{1.426}$ |
| UC | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{14.153}$ | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{12.176}$ | $\underset{\left(\mathbf{0}.\mathbf{001}\right)}{10.313}$ | $\underset{(0.710)}{0.138}$ | $\underset{\left(\mathbf{0}.\mathbf{001}\right)}{10.313}$ | $\underset{\left(\mathbf{0}.\mathbf{044}\right)}{4.051}$ |
| IND | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{14.568}$ | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{13.106}$ | $\underset{\left(\mathbf{0}.\mathbf{001}\right)}{11.190}$ | $\underset{(0.568)}{0.326}$ | $\underset{\left(\mathbf{0}.\mathbf{001}\right)}{10.978}$ | $\underset{\left(0.199\right)}{1.652}$ |
| CC | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{28.721}$ | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{25.282}$ | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{21.503}$ | $\underset{(0.793)}{0.464}$ | $\underset{\left(\mathbf{0}.\mathbf{000}\right)}{21.292}$ | $\underset{\left(0.057\right)}{5.702}$ |
| DBI | $\underset{(0.078)}{3.109}$ | $\underset{(0.342)}{0.902}$ | $\underset{(0.055)}{3.798}$ | $\underset{(0.595)}{0.282}$ | $\underset{(0.210)}{1.573}$ | $\underset{(0.379)}{0.774}$ |

| | Norm | Skew-Norm | Student | Skew-Student | MEP | Skew-MEP | NCT-APARCH | NCT-GARCH |
|---|---|---|---|---|---|---|---|---|
| 5% VaR | | | | | | | | |
| Univariate grade | 20% | 80% | 20% | 80% | 20% | 80% | 60% | 20% |
| Multivariate grade | 20% | 20% | 20% | 100% | 20% | 80% | | |
| 1% VaR | | | | | | | | |
| Univariate grade | 40% | 80% | 80% | 100% | 80% | 100% | 100% | 100% |
| Multivariate grade | 40% | 40% | 40% | 100% | 40% | 80% | | |

The first distinguishing feature of the VaR backtesting results is the clear predominance of the skew-Student distribution. This holds for both the univariate and multivariate frameworks, with the former producing VaR forecasts that outperform the NCT-APARCH at 5% VaR. Similarly, the skew-MEP distribution produces highly accurate VaR forecasts across the board. These findings exemplify a second feature of the results, namely the impact on VaR forecast accuracy of introducing skewness into a heavy-tailed distribution. Clearly, the performance of both the skew-Student and skew-MEP distributions improves compared to their symmetric counterparts, but this effect is less pronounced at 1% VaR. Indeed, the performance of the Student and MEP distributions improves when moving from the 5% to the 1% VaR scenario, suggesting that despite the improvement arising from the introduction of skewness, heavy tails remain useful in capturing larger swings in returns, i.e., those events located further out in the tails. As a final remark on the effect of skewness, we observe the large difference in performance between the NCT-APARCH and NCT-GARCH at 5% VaR. Though both feature a skewed distributional specification, recall that the NCT-APARCH also features asymmetry in the conditional volatility model, which explains its superior performance. An analogous result is obtained in Giot and Laurent [11].

Analysis of the differences between the univariate and multivariate models reveals two key points. First, while the skew-Student and skew-MEP retain their dominance under both frameworks, the performance of their symmetric counterparts at 1% VaR is worse under the multivariate specification. Second, within the univariate framework, the skew-normal is capable of producing VaR forecasts comparable to the high performance skewed and heavy-tailed distributions at both VaR confidence levels. This does not hold in the multivariate setup where the skew-normal offers no improvements in VaR accuracy over the normal. Besides, we observe that in both frameworks the empirical failure rate (i.e., the frequency of violations) of the skewed distributions is closer to the nominal value than their symmetric counterparts, which are oversized. The NCT-GARCH represents the only exception, being considerably more conservative than the NCT-APARCH at both 5% and 1% VaR confidence levels.

Compared with the previous backtesting methods, the DQ test takes into account a more general temporal dependence between the series of violations and is considered the most reliable in assessing VaR accuracy^{6}. The DQ test results are reported in Table 7 and Table 8. In order to compare the different distributional assumptions, Table 9 summarises the percentage of null hypothesis acceptances over the two lag-lengths for each confidence level.
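To make the regression idea behind the DQ test concrete, the following is a minimal sketch of a joint (conditional coverage type) DQ statistic in the spirit of Engle and Manganelli [20]. It regresses the demeaned violation series on a constant and its own K lags and tests that all coefficients are zero. This is an assumed simplified specification: the exact regressors behind the UC, IND and CC variants reported in the tables are those described in the paper's Section 4.2, and the helper names `solve_linear` and `dq_cc_statistic` are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def dq_cc_statistic(violations: Sequence[int], alpha: float,
                    lags: int) -> Tuple[float, int]:
    """Joint DQ statistic and its chi-square degrees of freedom (lags + 1).

    Hit_t = I_t - alpha is regressed on a constant and Hit_{t-1..t-lags}.
    The statistic is beta' X'X beta / (alpha (1 - alpha)).
    Note: a series without any violation makes X'X singular.
    """
    hit = [v - alpha for v in violations]
    rows = [[1.0] + [hit[t - k] for k in range(1, lags + 1)]
            for t in range(lags, len(hit))]
    y = hit[lags:]
    p = lags + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    # Since X'X beta = X'y, the quadratic form equals beta' X'y.
    stat = sum(bi * vi for bi, vi in zip(beta, xty)) / (alpha * (1.0 - alpha))
    return stat, p


# Two artificial series with the same 5% failure rate over 400 days.
spread = [1 if t % 20 == 10 else 0 for t in range(400)]
clustered = [1 if 100 <= t < 120 else 0 for t in range(400)]
print(dq_cc_statistic(spread, 0.05, 1))
print(dq_cc_statistic(clustered, 0.05, 1))
```

Both artificial series have the correct unconditional coverage, but the clustered one produces a far larger statistic, which is the kind of temporal dependence the DQ test is designed to detect.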

The DQ test results tell a similar story to the VaR backtesting procedures. As before, the skew-Student outperforms its competitors at both 5% and 1% VaR under both univariate and multivariate setups. The univariate skew-normal continues to produce VaR forecasts with an accuracy comparable to the top-performers. Again, this does not extend to the multivariate case.

Finally, forecast accuracy at 5% VaR again reveals a more pronounced improvement when moving from symmetric to skewed distributions than in the 1% case. Overall, introducing skewness into heavy-tailed distributions continues to offer the highest VaR forecast accuracy.

| | Norm | Skew-Norm | Student | Skew-Student | MEP | Skew-MEP | NCT-APARCH | NCT-GARCH |
|---|---|---|---|---|---|---|---|---|
| 5% VaR | | | | | | | | |
| K = 1 | | | | | | | | |
| $DQ_{UC}$ | $\underset{(\mathbf{0.002})}{9.319}$ | $\underset{(0.207)}{1.594}$ | $\underset{(\mathbf{0.001})}{10.092}$ | $\underset{(0.299)}{1.079}$ | $\underset{(\mathbf{0.003})}{8.965}$ | $\underset{(0.368)}{0.809}$ | $\underset{(0.074)}{3.188}$ | $\underset{(\mathbf{0.001})}{10.485}$ |
| $DQ_{IND}$ | $\underset{(0.231)}{1.433}$ | $\underset{(0.280)}{1.168}$ | $\underset{(0.533)}{0.389}$ | $\underset{(0.886)}{0.020}$ | $\underset{(0.596)}{0.281}$ | $\underset{(0.352)}{0.867}$ | $\underset{(0.403)}{0.698}$ | $\underset{(0.199)}{1.651}$ |
| $DQ_{CC}$ | $\underset{(\mathbf{0.006})}{10.189}$ | $\underset{(0.264)}{2.665}$ | $\underset{(\mathbf{0.006})}{10.201}$ | $\underset{(0.581)}{1.087}$ | $\underset{(\mathbf{0.011})}{9.036}$ | $\underset{(0.441)}{1.636}$ | $\underset{(0.159)}{3.680}$ | $\underset{(\mathbf{0.003})}{11.463}$ |
| K = 3 | | | | | | | | |
| $DQ_{UC}$ | $\underset{(\mathbf{0.008})}{7.028}$ | $\underset{(0.274)}{1.196}$ | $\underset{(\mathbf{0.009})}{6.821}$ | $\underset{(0.265)}{1.240}$ | $\underset{(\mathbf{0.013})}{6.193}$ | $\underset{(0.469)}{0.525}$ | $\underset{(0.082)}{3.024}$ | $\underset{(\mathbf{0.005})}{8.068}$ |
| $DQ_{IND}$ | $\underset{(0.426)}{0.633}$ | $\underset{(0.616)}{0.251}$ | $\underset{(\mathbf{0.033})}{4.536}$ | $\underset{(0.399)}{0.712}$ | $\underset{(0.053)}{3.738}$ | $\underset{(0.838)}{0.042}$ | $\underset{(0.749)}{0.102}$ | $\underset{(0.537)}{0.382}$ |
| $DQ_{CC}$ | $\underset{(\mathbf{0.015})}{12.280}$ | $\underset{(0.663)}{2.398}$ | $\underset{(\mathbf{0.001})}{18.210}$ | $\underset{(0.621)}{2.633}$ | $\underset{(\mathbf{0.004})}{15.467}$ | $\underset{(0.805)}{1.624}$ | $\underset{(0.265)}{5.220}$ | $\underset{(\mathbf{0.011})}{12.955}$ |
| 1% VaR | | | | | | | | |
| K = 1 | | | | | | | | |
| $DQ_{UC}$ | $\underset{(\mathbf{0.002})}{9.622}$ | $\underset{(0.053)}{3.744}$ | $\underset{(0.055)}{3.744}$ | $\underset{(0.446)}{0.580}$ | $\underset{(\mathbf{0.007})}{7.357}$ | $\underset{(0.122)}{2.393}$ | $\underset{(0.698)}{0.151}$ | $\underset{(0.122)}{2.393}$ |
| $DQ_{IND}$ | $\underset{(0.398)}{0.713}$ | $\underset{(0.547)}{0.363}$ | $\underset{(0.557)}{0.363}$ | $\underset{(0.872)}{0.026}$ | $\underset{(0.447)}{0.579}$ | $\underset{(0.597)}{0.280}$ | $\underset{(0.744)}{0.107}$ | $\underset{(0.597)}{0.280}$ |
| $DQ_{CC}$ | $\underset{(\mathbf{0.007})}{9.985}$ | $\underset{(0.136)}{3.991}$ | $\underset{(0.136)}{3.991}$ | $\underset{(0.741)}{0.598}$ | $\underset{(\mathbf{0.021})}{7.680}$ | $\underset{(0.272)}{2.603}$ | $\underset{(0.881)}{0.254}$ | $\underset{(0.272)}{2.603}$ |
| K = 3 | | | | | | | | |
| $DQ_{UC}$ | $\underset{(\mathbf{0.003})}{9.078}$ | $\underset{(0.066)}{3.376}$ | $\underset{(0.066)}{3.376}$ | $\underset{(0.541)}{0.373}$ | $\underset{(\mathbf{0.009})}{6.848}$ | $\underset{(0.146)}{2.118}$ | $\underset{(0.723)}{0.126}$ | $\underset{(0.146)}{2.118}$ |
| $DQ_{IND}$ | $\underset{(0.910)}{0.013}$ | $\underset{(0.468)}{0.528}$ | $\underset{(0.468)}{0.528}$ | $\underset{(0.432)}{0.617}$ | $\underset{(0.751)}{0.100}$ | $\underset{(0.349)}{0.876}$ | $\underset{(0.106)}{2.605}$ | $\underset{(0.349)}{0.876}$ |
| $DQ_{CC}$ | $\underset{(\mathbf{0.008})}{13.749}$ | $\underset{(\mathbf{0.047})}{9.654}$ | $\underset{(\mathbf{0.047})}{9.654}$ | $\underset{(\mathbf{0.001})}{19.505}$ | $\underset{(\mathbf{0.018})}{11.961}$ | $\underset{(0.057)}{9.171}$ | $\underset{(\mathbf{0.028})}{10.840}$ | $\underset{(0.057)}{9.171}$ |

| | Norm | Skew-Norm | Student | Skew-Student | MEP | Skew-MEP |
|---|---|---|---|---|---|---|
| 5% VaR | | | | | | |
| K = 1 | | | | | | |
| $DQ_{UC}$ | $\underset{(\mathbf{0.001})}{10.485}$ | $\underset{(\mathbf{0.004})}{8.226}$ | $\underset{(\mathbf{0.000})}{14.416}$ | $\underset{(0.597)}{0.280}$ | $\underset{(\mathbf{0.002})}{9.319}$ | $\underset{(0.111)}{2.538}$ |
| $DQ_{IND}$ | $\underset{(0.199)}{1.651}$ | $\underset{(0.267)}{1.232}$ | $\underset{(0.121)}{2.406}$ | $\underset{(0.993)}{0.000}$ | $\underset{(0.231)}{1.433}$ | $\underset{(0.610)}{0.261}$ |
| $DQ_{CC}$ | $\underset{(\mathbf{0.003})}{11.463}$ | $\underset{(\mathbf{0.011})}{8.990}$ | $\underset{(\mathbf{0.000})}{15.751}$ | $\underset{(0.869)}{0.280}$ | $\underset{(\mathbf{0.006})}{10.189}$ | $\underset{(0.256)}{2.727}$ |
| K = 3 | | | | | | |
| $DQ_{UC}$ | $\underset{(\mathbf{0.005})}{7.733}$ | $\underset{(\mathbf{0.014})}{6.096}$ | $\underset{(\mathbf{0.001})}{10.767}$ | $\underset{(0.599)}{0.277}$ | $\underset{(\mathbf{0.008})}{7.028}$ | $\underset{(0.205)}{1.607}$ |
| $DQ_{IND}$ | $\underset{(0.316)}{1.006}$ | $\underset{(0.339)}{0.914}$ | $\underset{(0.406)}{0.690}$ | $\underset{(0.374)}{0.789}$ | $\underset{(0.426)}{0.633}$ | $\underset{(0.076)}{3.148}$ |
| $DQ_{CC}$ | $\underset{(\mathbf{0.006})}{14.578}$ | $\underset{(\mathbf{0.019})}{11.839}$ | $\underset{(\mathbf{0.001})}{18.650}$ | $\underset{(0.616)}{2.662}$ | $\underset{(\mathbf{0.015})}{12.280}$ | $\underset{(0.056)}{9.201}$ |
| 1% VaR | | | | | | |
| K = 1 | | | | | | |
| $DQ_{UC}$ | $\underset{(\mathbf{0.000})}{19.549}$ | $\underset{(\mathbf{0.000})}{18.210}$ | $\underset{(\mathbf{0.000})}{15.080}$ | $\underset{(0.698)}{0.151}$ | $\underset{(\mathbf{0.000})}{13.358}$ | $\underset{(\mathbf{0.002})}{9.622}$ |
| $DQ_{IND}$ | $\underset{(0.258)}{1.278}$ | $\underset{(0.280)}{1.167}$ | $\underset{(0.308)}{1.041}$ | $\underset{(0.744)}{0.107}$ | $\underset{(0.148)}{2.095}$ | $\underset{(0.398)}{0.713}$ |
| $DQ_{CC}$ | $\underset{(\mathbf{0.000})}{22.121}$ | $\underset{(\mathbf{0.000})}{18.684}$ | $\underset{(\mathbf{0.000})}{15.521}$ | $\underset{(0.881)}{0.254}$ | $\underset{(\mathbf{0.000})}{16.575}$ | $\underset{(\mathbf{0.007})}{9.985}$ |
| K = 3 | | | | | | |
| $DQ_{UC}$ | $\underset{(\mathbf{0.000})}{13.935}$ | $\underset{(\mathbf{0.000})}{14.056}$ | $\underset{(\mathbf{0.002})}{9.885}$ | $\underset{(0.723)}{0.126}$ | $\underset{(\mathbf{0.001})}{10.157}$ | $\underset{(\mathbf{0.009})}{6.809}$ |
| $DQ_{IND}$ | $\underset{(\mathbf{0.000})}{20.104}$ | $\underset{(0.094)}{6.375}$ | $\underset{(\mathbf{0.000})}{15.843}$ | $\underset{(0.106)}{2.605}$ | $\underset{(\mathbf{0.000})}{14.404}$ | $\underset{(0.255)}{4.056}$ |
| $DQ_{CC}$ | $\underset{(\mathbf{0.000})}{44.966}$ | $\underset{(\mathbf{0.000})}{35.949}$ | $\underset{(\mathbf{0.000})}{47.702}$ | $\underset{(0.065)}{8.842}$ | $\underset{(\mathbf{0.000})}{33.192}$ | $\underset{(0.054)}{9.271}$ |

| | Norm | Skew-Norm | Student | Skew-Student | MEP | Skew-MEP | NCT-APARCH | NCT-GARCH |
|---|---|---|---|---|---|---|---|---|
| 5% VaR | | | | | | | | |
| Univariate grade | 33% | 100% | 17% | 100% | 33% | 100% | 100% | 33% |
| Multivariate grade | 33% | 33% | 33% | 100% | 33% | 100% | | |
| 1% VaR | | | | | | | | |
| Univariate grade | 33% | 83% | 83% | 83% | 33% | 100% | 83% | 100% |
| Multivariate grade | 17% | 33% | 17% | 100% | 17% | 50% | | |
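The grades in the summary table can be reproduced from the DQ p-values with a few lines of code. The sketch below assumes that a null hypothesis counts as accepted when its p-value is at least 5%, and that each grade pools the six tests (UC, IND and CC at K = 1 and K = 3) for a given model and VaR level. The function name `acceptance_share` is an illustrative choice.

```python
from typing import Sequence


def acceptance_share(p_values: Sequence[float], level: float = 0.05) -> float:
    """Percentage of tests whose null is not rejected at the given level."""
    accepted = sum(1 for p in p_values if p >= level)
    return 100.0 * accepted / len(p_values)


# Univariate models at 5% VaR: p-values of UC, IND, CC for K = 1, then K = 3.
normal_5pct = [0.002, 0.231, 0.006, 0.008, 0.426, 0.015]
student_5pct = [0.001, 0.533, 0.006, 0.009, 0.033, 0.001]

print(round(acceptance_share(normal_5pct)))   # 33
print(round(acceptance_share(student_5pct)))  # 17
```

These two values match the univariate grades reported for the normal and Student distributions at 5% VaR.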

## 5. Conclusions

Given the importance of Value-at-Risk in risk management, practitioners must be capable of forecasting the VaR of their asset portfolios to a high degree of accuracy. This requires taking into account a number of properties of financial returns, namely non-normality, heavy tails, skewness and the possibility of comovements between assets. In this article, we focused primarily on the effect of varying the distributional assumption used to forecast VaR. Moreover, we addressed the still open question of whether univariate or multivariate models are more appropriate for the problem of portfolio VaR forecasting.

The distributions treated in the paper comprised three symmetric and three skewed distributional assumptions (i.e., normal, Student, MEP and their skewed counterparts), which were coupled with the RBEKK model in the multivariate framework and GARCH in the univariate. In addition, we compared our specification with a novel method for fast estimation of the (univariate) NCT-GARCH and NCT-APARCH models. We then proceeded to assess the models’ accuracy in predicting equally-weighted portfolio VaR.

Employing a series of standard backtesting methods to compare the distribution-based model performance, the results reveal that the skew-Student specification produces the most accurate one-step-ahead VaR forecasts across all multivariate specifications and is able to compete with the high-performance NCT-APARCH in the univariate setup. This finding is echoed in the univariate MEP results, wherein the skewed version outperforms the symmetric across all tests and confidence levels. By contrast, the multivariate skew-MEP exhibits thinner tails than the normal and thus performs poorly at the 1% confidence level. More generally, the test results reveal a clear hierarchy of distributional assumptions within the univariate and multivariate setups, with that hierarchy being preserved when moving from one to the other. However, attempting to compare the performance of univariate to multivariate distributions in general does not reveal any key differences, with the exception of certain cases in the skew-normal and skew-MEP specifications. Consequently, the additional computational burden of estimating multivariate models does not seem to be justified.

There are several possible avenues of research extending from this work. Given the focus on distributions, we limited our attention to relatively simple parametric models, namely the RBEKK and GARCH models. As the NCT-GARCH/APARCH results showed, capturing asymmetries in returns volatility improves forecast accuracy. Considering their multivariate versions would provide a useful contribution and allow for a detailed study of the optimal model-distribution-dimension combination. Another possibility would be to consider longer forecast horizons for the VaR in order to check whether the inclusion of skewness and asymmetric forms of dependence can lead to significant improvements in the long run.

## Acknowledgments

Manuela Braione and Nicolas K. Scholtes acknowledge support of the “Communauté française de Belgique” through contracts “Projet d’Actions de Recherche Concertées” “12/17-045” and “13/17-055”, respectively granted by the Académie universitaire Louvain. We thank seminar participants at the 13th Journée d’économétrie at the Université Paris Ouest Nanterre La Défense along with Luc Bauwens and Florian Ielpo for useful comments. We also thank two anonymous referees and the academic editors for their comments and advice, all of which led to considerable improvements in the paper.

## Author Contributions

Manuela Braione and Nicolas K. Scholtes contributed equally to the writing of the MATLAB code as well as the preparation of the current manuscript.

## Conflicts of Interest

The authors declare no conflict of interest.

## Appendixes

## A. Derivations

#### A.1. Transformation

The transformation ${z}_{t}={H}_{t}^{-1/2}{y}_{t}$ is incorporated into the symmetric, standardised pdfs as follows:

$$\begin{aligned}
{\kappa}^{\star\prime}{\kappa}^{\star} &= {({\kappa}_{1}^{\star},\dots,{\kappa}_{N}^{\star})}^{\prime}({\kappa}_{1}^{\star},\dots,{\kappa}_{N}^{\star}) \\
&= {\left(\dots ({s}_{i}{z}_{i}+{m}_{i}){\xi}_{i}^{{I}_{i}}\dots \right)}^{\prime}\left(\dots ({s}_{i}{z}_{i}+{m}_{i}){\xi}_{i}^{{I}_{i}}\dots \right) \\
&= {\left(\dots \Big({s}_{i}\sum_{j=1}^{N}{p}_{ij}{y}_{j}+{m}_{i}\Big){\xi}_{i}^{{I}_{i}}\dots \right)}^{\prime}\left(\dots \Big({s}_{i}\sum_{j=1}^{N}{p}_{ij}{y}_{j}+{m}_{i}\Big){\xi}_{i}^{{I}_{i}}\dots \right) \\
&= \sum_{i=1}^{N}{\left({s}_{i}\sum_{j=1}^{N}{p}_{ij}{y}_{j}+{m}_{i}\right)}^{2}{\xi}_{i}^{2{I}_{i}},
\end{aligned}$$

where ${p}_{ij}$ corresponds to the ${j}^{th}$ element of the ${i}^{th}$ row of ${H}_{t}^{-1/2}$. Note that the $t$ subscript is dropped for simplicity. The matrix square root operation is carried out by applying the Cholesky decomposition of ${H}_{t}$ such that $B{B}^{\prime}={H}_{t}$. As a result, each ${z}_{i}$ is obtained by multiplying the row vector of ${H}_{t}^{-1/2}$ corresponding to asset $i$ with the demeaned return vector (giving us the inner summation above), which is then multiplied by the univariate standard deviation and added to the univariate mean. The presence of skewness is factored in by the term ${\xi}_{i}^{{I}_{i}}$, where the factor ${I}_{i}$ is defined as in Equation (17).
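The standardisation step described in this subsection can be sketched in a few lines. The example assumes that ${H}_{t}^{-1/2}$ is taken to be the inverse of the lower-triangular Cholesky factor $B$ (with $BB^{\prime}=H_t$), so that $z = B^{-1}y$ is obtained by forward substitution. The function names and the 2×2 numbers are illustrative assumptions and do not come from the paper's MATLAB code.

```python
import math
from typing import List, Sequence


def cholesky_lower(h: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular B with B B' = h, for a symmetric positive definite h."""
    n = len(h)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(b[i][k] * b[j][k] for k in range(j))
            if i == j:
                b[i][j] = math.sqrt(h[i][i] - s)
            else:
                b[i][j] = (h[i][j] - s) / b[j][j]
    return b


def standardise(h: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Solve B z = y by forward substitution, i.e. z = B^{-1} y."""
    b = cholesky_lower(h)
    z: List[float] = []
    for i in range(len(y)):
        acc = y[i] - sum(b[i][k] * z[k] for k in range(i))
        z.append(acc / b[i][i])
    return z


# Illustrative conditional covariance matrix and demeaned return vector.
H = [[4.0, 2.0], [2.0, 3.0]]
y = [2.0, 1.0]
print(standardise(H, y))
```

By construction $z^{\prime}z = y^{\prime}H^{-1}y$, which is the quadratic form entering the elliptical densities.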

#### A.2. Distributions Moments

We report the first two moments of the univariate symmetric normal, Student and MEP distributions used to compute the log-likelihood function as given in Section 2.3.

Note that the following relation holds:

$${m}_{i}={M}_{i,1}\left({\xi}_{i}-\frac{1}{{\xi}_{i}}\right)\Rightarrow {M}_{i,1}^{2}={m}_{i}^{2}\left(\frac{{\xi}_{i}^{2}}{{({\xi}_{i}^{2}-1)}^{2}}\right).$$

Hence, substituting this result into Equation (14), the 2nd order moment of the skewed distributions can be obtained as a function of the first:

$$\begin{aligned}
{s}_{i}^{2} &= {M}_{i,1}^{2}\left(-{\xi}_{i}^{2}-\frac{1}{{\xi}_{i}^{2}}+2\right)+{M}_{i,2}\left({\xi}_{i}^{2}+\frac{1}{{\xi}_{i}^{2}}-1\right) \\
&= \frac{{\xi}_{i}^{2}}{{({\xi}_{i}^{2}-1)}^{2}}\left(\frac{-{\xi}_{i}^{4}+2{\xi}_{i}^{2}-1}{{\xi}_{i}^{2}}\right){m}_{i}^{2}+{M}_{i,2}\left(\frac{{\xi}_{i}^{4}-{\xi}_{i}^{2}+1}{{\xi}_{i}^{2}}\right) \\
&= \frac{{\xi}_{i}^{2}}{{({\xi}_{i}^{2}-1)}^{2}}\left(\frac{(-{\xi}_{i}^{2}+1)({\xi}_{i}^{2}-1)}{{\xi}_{i}^{2}}\right){m}_{i}^{2}+{M}_{i,2}\frac{{\xi}_{i}^{2}\left({\xi}_{i}^{2}-1+\frac{1}{{\xi}_{i}^{2}}\right)}{{\xi}_{i}^{2}} \\
&= {m}_{i}^{2}\left(\frac{1-{\xi}_{i}^{2}}{{\xi}_{i}^{2}-1}\right)+{M}_{i,2}\left({\xi}_{i}^{2}+\frac{1}{{\xi}_{i}^{2}}-1\right).
\end{aligned}$$

**Skew-Normal**

$$\begin{aligned}
{M}_{i,1} &= \int_{0}^{\infty}\frac{2}{\sqrt{2\pi}}\,u\exp\left\{-\frac{1}{2}{u}^{2}\right\}du=\sqrt{\frac{2}{\pi}}\int_{0}^{\infty}u\exp\left\{-\frac{1}{2}{u}^{2}\right\}du=\sqrt{\frac{2}{\pi}} \\
{M}_{i,2} &= \int_{0}^{\infty}\frac{2}{\sqrt{2\pi}}\,{u}^{2}\exp\left\{-\frac{1}{2}{u}^{2}\right\}du=\sqrt{\frac{2}{\pi}}\int_{0}^{\infty}{u}^{2}\exp\left\{-\frac{1}{2}{u}^{2}\right\}du=1
\end{aligned}$$

**Skew-Student**

$$\begin{aligned}
{M}_{i,1} &= \frac{2\Gamma \left(\frac{\nu +1}{2}\right)}{\Gamma \left(\frac{\nu}{2}\right)\sqrt{\pi (\nu -2)}}\int_{0}^{\infty}u{\left(1+\frac{{u}^{2}}{\nu -2}\right)}^{-\frac{1+\nu}{2}}du \\
&= \frac{2\sqrt{\nu -2}\,\Gamma \left(\frac{\nu +1}{2}\right)}{\sqrt{\pi}(\nu -1)\Gamma \left(\frac{\nu}{2}\right)} \\
&= \frac{2\sqrt{\nu -2}\left(\frac{\nu -1}{2}\right)\Gamma \left(\frac{\nu -1}{2}\right)}{\sqrt{\pi}(\nu -1)\Gamma \left(\frac{\nu}{2}\right)} \\
&= \frac{\Gamma \left(\frac{\nu -1}{2}\right)\sqrt{\nu -2}}{\sqrt{\pi}\,\Gamma \left(\frac{\nu}{2}\right)} \\
{M}_{i,2} &= \frac{2\Gamma \left(\frac{\nu +1}{2}\right)}{\Gamma \left(\frac{\nu}{2}\right)\sqrt{\pi (\nu -2)}}\int_{0}^{\infty}{u}^{2}{\left(1+\frac{{u}^{2}}{\nu -2}\right)}^{-\frac{1+\nu}{2}}du \\
&= \frac{(\nu -2)\Gamma \left(\frac{\nu}{2}-1\right)}{2\Gamma \left(\frac{\nu}{2}\right)} \\
&= \frac{(\nu -2)\Gamma \left(\frac{\nu}{2}-1\right)}{2\left(\frac{\nu -2}{2}\right)\Gamma \left(\frac{\nu}{2}-1\right)}=1
\end{aligned}$$

**Skew-MEP**

$$\begin{aligned}
{M}_{i,1} &= \frac{{2}^{-\frac{1}{\beta}}}{\Gamma \left(1+\frac{1}{\beta}\right)}\int_{0}^{\infty}u\exp\left\{-\frac{1}{2}{u}^{\beta}\right\}du \\
&= \frac{{2}^{-1+\frac{1}{\beta}}\Gamma \left(\frac{2+\beta}{\beta}\right)}{\Gamma \left(1+\frac{1}{\beta}\right)} \\
{M}_{i,2} &= \frac{{2}^{-\frac{1}{\beta}}}{\Gamma \left(1+\frac{1}{\beta}\right)}\int_{0}^{\infty}{u}^{2}\exp\left\{-\frac{1}{2}{u}^{\beta}\right\}du \\
&= \frac{{4}^{\frac{1}{\beta}}\Gamma \left(\frac{3}{\beta}\right)}{\beta\, \Gamma \left(1+\frac{1}{\beta}\right)}
\end{aligned}$$
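As a numerical cross-check of the closed forms in this appendix, the following sketch evaluates $M_{i,1}$ and $M_{i,2}$ for the three families and then computes $m_i$ and $s_i^2$ for a given skewness parameter $\xi_i$. The function names are illustrative assumptions; the formulas are those stated above.

```python
import math
from typing import Tuple


def moments_normal() -> Tuple[float, float]:
    """(M1, M2) for the standard normal."""
    return math.sqrt(2.0 / math.pi), 1.0


def moments_student(nu: float) -> Tuple[float, float]:
    """(M1, M2) for the standardised Student with nu > 2 degrees of freedom."""
    if nu <= 2.0:
        raise ValueError("nu must exceed 2")
    log_ratio = math.lgamma((nu - 1.0) / 2.0) - math.lgamma(nu / 2.0)
    m1 = math.exp(log_ratio) * math.sqrt(nu - 2.0) / math.sqrt(math.pi)
    return m1, 1.0


def moments_mep(beta: float) -> Tuple[float, float]:
    """(M1, M2) for the MEP with tail parameter beta > 0."""
    g = math.gamma(1.0 + 1.0 / beta)
    m1 = 2.0 ** (-1.0 + 1.0 / beta) * math.gamma((2.0 + beta) / beta) / g
    m2 = 4.0 ** (1.0 / beta) * math.gamma(3.0 / beta) / (beta * g)
    return m1, m2


def skew_location_scale(m1: float, m2: float, xi: float) -> Tuple[float, float]:
    """Return (m_i, s_i^2) for skewness parameter xi > 0."""
    m = m1 * (xi - 1.0 / xi)
    s2 = (m1 ** 2 * (-xi ** 2 - 1.0 / xi ** 2 + 2.0)
          + m2 * (xi ** 2 + 1.0 / xi ** 2 - 1.0))
    return m, s2


# With beta = 2 the MEP moments coincide with the normal ones.
print(moments_normal(), moments_mep(2.0))
# With xi = 1 (no skewness) the location is 0 and the variance equals M2.
print(skew_location_scale(*moments_student(8.0), xi=1.0))
```

The last line of the derivation for $s_i^2$ simplifies to $-m_i^2 + M_{i,2}(\xi_i^2 + \xi_i^{-2} - 1)$, which the function reproduces for any $\xi_i \neq 1$.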

## B. Tables

| It. | Rolling fixed-window: observations | Rolling fixed-window: days | Forecast horizon: observations | Forecast horizon: days |
|---|---|---|---|---|
| 1 | 1–1500 | 1 February 2001–23 January 2007 | 1501–1520 | 24 January 2007–21 February 2007 |
| 2 | 21–1520 | 2 March 2001–21 February 2007 | 1521–1540 | 22 February 2007–21 March 2007 |
| 3 | 41–1540 | 30 March 2001–21 March 2007 | 1541–1560 | 22 March 2007–19 April 2007 |
| 4 | 61–1560 | 30 April 2001–19 April 2007 | 1561–1580 | 20 April 2007–17 May 2007 |
| 5 | 81–1580 | 29 May 2001–17 May 2007 | 1581–1600 | 18 May 2007–15 June 2007 |
| 6 | 101–1600 | 26 June 2001–15 June 2007 | 1601–1620 | 18 June 2007–16 July 2007 |
| 7 | 121–1620 | 25 July 2001–16 July 2007 | 1621–1640 | 17 July 2007–13 August 2007 |
| 8 | 141–1640 | 22 August 2001–13 August 2007 | 1641–1660 | 14 August 2007–11 September 2007 |
| 9 | 161–1660 | 26 September 2001–11 September 2007 | 1661–1680 | 12 September 2007–9 October 2007 |
| 10 | 181–1680 | 24 October 2001–9 October 2007 | 1681–1700 | 10 October 2007–6 November 2007 |
| 11 | 201–1700 | 21 November 2001–6 November 2007 | 1701–1720 | 7 November 2007–5 December 2007 |
| 12 | 221–1720 | 20 December 2001–5 December 2007 | 1721–1740 | 6 December 2007–4 January 2008 |
| 13 | 241–1740 | 22 January 2002–4 January 2008 | 1741–1760 | 7 January 2008–4 February 2008 |
| 14 | 261–1760 | 20 February 2002–4 February 2008 | 1761–1780 | 5 February 2008–4 March 2008 |
| 15 | 281–1780 | 20 March 2002–4 March 2008 | 1781–1800 | 5 March 2008–2 April 2008 |
| 16 | 301–1800 | 18 April 2002–2 April 2008 | 1801–1820 | 3 April 2008–30 April 2008 |
| 17 | 321–1820 | 16 May 2002–30 April 2008 | 1821–1840 | 1 May 2008–29 May 2008 |
| 18 | 341–1840 | 14 June 2002–29 May 2008 | 1841–1860 | 30 May 2008–26 June 2008 |
| 19 | 361–1860 | 15 July 2002–26 June 2008 | 1861–1880 | 27 June 2008–25 July 2008 |
| 20 | 381–1880 | 12 August 2002–25 July 2008 | 1881–1900 | 28 July 2008–22 August 2008 |
| 21 | 401–1900 | 10 September 2002–22 August 2008 | 1901–1920 | 25 August 2008–22 September 2008 |
| 22 | 421–1920 | 8 October 2002–22 September 2008 | 1921–1940 | 23 September 2008–20 October 2008 |
| 23 | 441–1940 | 5 November 2002–20 October 2008 | 1941–1960 | 21 October 2008–17 November 2008 |
| 24 | 461–1960 | 4 December 2002–17 November 2008 | 1961–1980 | 18 November 2008–16 December 2008 |
| 25 | 481–1980 | 3 January 2003–16 December 2008 | 1981–2000 | 17 December 2008–15 January 2009 |
| 26 | 501–2000 | 3 February 2003–15 January 2009 | 2001–2020 | 16 January 2009–13 February 2009 |
| 27 | 521–2020 | 4 March 2003–13 February 2009 | 2021–2040 | 17 February 2009–16 March 2009 |
| 28 | 541–2040 | 1 April 2003–16 March 2009 | 2041–2060 | 17 March 2009–14 April 2009 |
| 29 | 561–2060 | 30 April 2003–14 April 2009 | 2061–2080 | 15 April 2009–12 May 2009 |
| 30 | 581–2080 | 29 May 2003–12 May 2009 | 2081–2100 | 13 May 2009–10 June 2009 |
| 31 | 601–2100 | 26 June 2003–10 June 2009 | 2101–2120 | 11 June 2009–9 July 2009 |
| 32 | 621–2120 | 25 July 2003–9 July 2009 | 2121–2140 | 10 July 2009–6 August 2009 |
| 33 | 641–2140 | 22 August 2003–6 August 2009 | 2141–2160 | 7 August 2009–3 September 2009 |
| 34 | 661–2160 | 22 September 2003–3 September 2009 | 2161–2180 | 4 September 2009–2 October 2009 |
| 35 | 681–2180 | 20 October 2003–2 October 2009 | 2181–2200 | 5 October 2009–30 October 2009 |
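The observation indices in the table follow a simple pattern: a 1500-observation estimation window that moves forward by 20 observations at each iteration, followed by a 20-observation forecast horizon. A minimal sketch that regenerates these indices is given below; the function name and argument names are illustrative assumptions.

```python
from typing import List, Tuple


def rolling_windows(window: int = 1500, horizon: int = 20,
                    iterations: int = 35,
                    first_obs: int = 1) -> List[Tuple[int, int, int, int, int]]:
    """Return (iteration, window start, window end, forecast start, forecast end)."""
    schedule = []
    for it in range(1, iterations + 1):
        start = first_obs + (it - 1) * horizon
        end = start + window - 1
        schedule.append((it, start, end, end + 1, end + horizon))
    return schedule


schedule = rolling_windows()
print(schedule[0])   # (1, 1, 1500, 1501, 1520)
print(schedule[-1])  # (35, 681, 2180, 2181, 2200)
```

The 35 iterations together cover 700 out-of-sample observations (1501 to 2200).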

## C. Figures

## D. The Univariate NCT-APARCH Model

The main ingredients of the NCT-APARCH model discussed in the paper by Krause and Paolella [16] are, as the name may suggest, the use of a univariate noncentral t-distribution assumption for the error term and of an APARCH specification for the conditional variance equation.

The NCT density function is given as follows:

$$\begin{aligned}
{f}_{Z}\left(t;\nu ,\gamma \right) = {} & {e}^{-{\gamma}^{2}/2}\,\frac{\Gamma ((\nu +1)/2)\,{\nu}^{\nu /2}}{\sqrt{\pi}\,\Gamma (\nu /2)}{\left(\frac{1}{\nu +{t}^{2}}\right)}^{\frac{\nu +1}{2}} \\
& \times \left[\sum_{i=0}^{\infty}\frac{{\left(t\gamma \right)}^{i}}{i!}{\left(\frac{2}{{t}^{2}+\nu}\right)}^{i/2}\frac{\Gamma \left\{(\nu +i+1)/2\right\}}{\Gamma \left\{(\nu +1)/2\right\}}\right],
\end{aligned}$$

where $\nu >0$ denotes the degrees of freedom parameter and $\gamma \in \mathbb{R}$ is the noncentrality parameter dictating the degree of asymmetry (with the Student distribution recovered when $\gamma =0$).
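The series representation of the NCT density can be evaluated directly by truncating the infinite sum. The sketch below is a plain illustration of that formula and not the fast approximation method of Krause and Paolella [16]; the truncation rule (`max_terms`, `tol`) and the function names are assumptions made for the example.

```python
import math


def nct_pdf(t: float, nu: float, gamma: float,
            max_terms: int = 500, tol: float = 1e-16) -> float:
    """Noncentral t density via the truncated series, computed in logs."""
    log_c = (-0.5 * gamma ** 2
             + math.lgamma((nu + 1.0) / 2.0) + 0.5 * nu * math.log(nu)
             - 0.5 * math.log(math.pi) - math.lgamma(nu / 2.0)
             - 0.5 * (nu + 1.0) * math.log(nu + t * t))
    x = t * gamma * math.sqrt(2.0 / (t * t + nu))
    total = 1.0  # the i = 0 term of the series
    if x != 0.0:
        log_abs_x = math.log(abs(x))
        base = math.lgamma((nu + 1.0) / 2.0)
        for i in range(1, max_terms):
            log_term = (i * log_abs_x - math.lgamma(i + 1.0)
                        + math.lgamma((nu + i + 1.0) / 2.0) - base)
            term = math.exp(log_term)
            if x < 0.0 and i % 2 == 1:
                term = -term  # odd powers of a negative x are negative
            total += term
            if abs(term) < tol * abs(total):
                break
    return math.exp(log_c) * total


def student_pdf(t: float, nu: float) -> float:
    """Central Student t density, used as the gamma = 0 benchmark."""
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1.0 + t * t / nu) ** (-(nu + 1.0) / 2.0)


print(nct_pdf(0.7, 5.0, 0.0), student_pdf(0.7, 5.0))
print(nct_pdf(-1.0, 5.0, -0.5), nct_pdf(1.0, 5.0, -0.5))
```

With $\gamma = 0$ only the first term of the series survives and the Student density is recovered, while a negative $\gamma$ shifts probability mass to the left tail.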

The evolution of the conditional variance is modeled according to the APARCH model, which allows for both heavy tails and asymmetry. It is defined as follows:

$${\sigma}_{t}^{\delta}=\overline{\sigma}+{a}_{1}{\left(|{\epsilon}_{t-1}|-{\gamma}_{1}{\epsilon}_{t-1}\right)}^{\delta}+{b}_{1}{\sigma}_{t-1}^{\delta}$$

where ${a}_{1}>0$, ${b}_{1}\ge 0$, $\delta >0$ and $|{\gamma}_{1}|<1$. In order to speed up the estimation procedure, the authors suggest fixing some parameters before the maximization of the likelihood function. In this paper, we calibrated the model parameters on our dataset in order to achieve the highest in-sample likelihood value (this also applies to the model incorporating the GARCH specification). Specifically, we set $\overline{\sigma}=0.01$, ${a}_{1}=0.05$, ${b}_{1}=0.90$, $\delta =2$ and ${\gamma}_{1}=0.4$. The vector of parameters to be estimated then reduces to $\mathit{\psi}={(\mu ,\nu ,\gamma )}^{\prime}$, where $E\left({y}_{t}\right)=\mu $ denotes the location coefficient, which is quickly estimated using the method of trimmed mean. We refer to the original paper by Krause and Paolella [16] for further details on the implemented estimation procedure.
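The APARCH recursion with the fixed parameter values quoted in this appendix can be sketched as follows. The default arguments reproduce those values; the starting value $\sigma_1^{\delta} = \overline{\sigma}/(1-b_1)$ and the function name `aparch_sigma` are assumptions made for this illustration, since the initialisation is not stated in the text.

```python
from typing import List, Optional, Sequence


def aparch_sigma(eps: Sequence[float], sigma_bar: float = 0.01,
                 a1: float = 0.05, b1: float = 0.90, delta: float = 2.0,
                 gamma1: float = 0.4,
                 sigma0_delta: Optional[float] = None) -> List[float]:
    """Return sigma_1, ..., sigma_{T+1} given shocks eps_1, ..., eps_T.

    The last element is the one-step-ahead volatility forecast.
    """
    if sigma0_delta is None:
        sigma0_delta = sigma_bar / (1.0 - b1)  # assumed starting value
    s = sigma0_delta
    path = [s ** (1.0 / delta)]
    for e in eps:
        s = sigma_bar + a1 * (abs(e) - gamma1 * e) ** delta + b1 * s
        path.append(s ** (1.0 / delta))
    return path


# A negative shock raises next-period volatility more than a positive one.
print(aparch_sigma([-1.0])[-1], aparch_sigma([1.0])[-1])
```

With $\gamma_1 = 0.4 > 0$, a negative shock of a given size enters the recursion as $(1.4\,|\epsilon|)^{\delta}$ and a positive one as $(0.6\,|\epsilon|)^{\delta}$, which is the asymmetry in volatility discussed in the main text.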

## E. Backtesting VaR: Augmented Independence and Conditional Coverage Tests

Table A2 and Table A3 report test-statistics and corresponding p-values obtained from the Independence (IND) and Conditional Coverage (CC) tests of Christoffersen (see Section 4.2) computed with a number of lagged observations $K>1$.

The tests are built in an extended Markov framework that allows for higher, or K-th order, dependence in the VaR observations, thus highlighting possible clustering in the series of violations. The distribution of the generalized IND and CC tests is asymptotically ${\chi}_{\left(1\right)}^{2}$ and ${\chi}_{\left(2\right)}^{2}$, respectively. For further details on the construction of the tests, we refer the interested reader to the theoretical framework developed in Pajhede [39].

In order to make these results directly comparable with those given in Table 7 and Table 8, we set the number of lags to $K=3$ and run the tests for the 5% and 1% VaR confidence levels.

Results for both univariate and multivariate specifications are reported below.

| | Norm | Skew-Norm | Student | Skew-Student | MEP | Skew-MEP | NCT-APARCH | NCT-GARCH |
|---|---|---|---|---|---|---|---|---|
| 5% VaR | | | | | | | | |
| IND (K = 3) | $\underset{(0.358)}{0.843}$ | $\underset{(0.699)}{0.1494}$ | $\underset{(0.130)}{2.294}$ | $\underset{(0.299)}{1.0807}$ | $\underset{(0.179)}{1.802}$ | $\underset{(0.924)}{0.0091}$ | $\underset{(0.670)}{0.182}$ | $\underset{(0.448)}{0.576}$ |
| CC (K = 3) | $\underset{(\mathbf{0.020})}{7.779}$ | $\underset{(0.540)}{1.233}$ | $\underset{(\mathbf{0.007})}{10.062}$ | $\underset{(0.273)}{2.596}$ | $\underset{(\mathbf{0.013})}{8.739}$ | $\underset{(0.775)}{0.511}$ | $\underset{(0.181)}{3.419}$ | $\underset{(\mathbf{0.015})}{8.344}$ |
| 1% VaR | | | | | | | | |
| IND (K = 3) | $\underset{(0.955)}{0.003}$ | $\underset{(0.626)}{0.2371}$ | $\underset{(0.626)}{0.237}$ | $\underset{(0.082)}{3.0345}$ | $\underset{(0.844)}{0.038}$ | $\underset{(0.523)}{0.408}$ | $\underset{(0.256)}{1.288}$ | $\underset{(0.523)}{0.408}$ |
| CC (K = 3) | $\underset{(\mathbf{0.030})}{7.030}$ | $\underset{(0.197)}{3.253}$ | $\underset{(0.197)}{3.253}$ | $\underset{(0.161)}{3.658}$ | $\underset{(0.061)}{5.579}$ | $\underset{(0.300)}{2.410}$ | $\underset{(0.488)}{1.436}$ | $\underset{(0.300)}{2.410}$ |

| | Norm | Skew-Norm | Student | Skew-Student | MEP | Skew-MEP |
|---|---|---|---|---|---|---|
| 5% VaR | | | | | | |
| IND (K = 3) | $\underset{(0.272)}{1.204}$ | $\underset{(0.293)}{1.105}$ | $\underset{(0.332)}{0.940}$ | $\underset{(0.285)}{1.142}$ | $\underset{(0.358)}{0.843}$ | $\underset{(0.063)}{3.453}$ |
| CC (K = 3) | $\underset{(\mathbf{0.011})}{8.972}$ | $\underset{(\mathbf{0.027})}{7.251}$ | $\underset{(\mathbf{0.003})}{11.451}$ | $\underset{(0.488)}{1.434}$ | $\underset{(\mathbf{0.020})}{7.779}$ | $\underset{(0.070)}{5.327}$ |
| 1% VaR | | | | | | |
| IND (K = 3) | $\underset{(0.012)}{6.349}$ | $\underset{(0.161)}{1.965}$ | $\underset{(0.029)}{4.765}$ | $\underset{(0.256)}{1.288}$ | $\underset{(0.027)}{4.896}$ | $\underset{(0.072)}{3.234}$ |
| CC (K = 3) | $\underset{(\mathbf{0.000})}{20.608}$ | $\underset{(\mathbf{0.001})}{14.237}$ | $\underset{(\mathbf{0.001})}{15.166}$ | $\underset{(0.488)}{1.436}$ | $\underset{(\mathbf{0.000})}{15.298}$ | $\underset{(0.058)}{5.671}$ |

Overall, adding more lags in the regression equation does not seem to crucially affect the outcome of the tests, as the new results are still in line with those obtained under $K=1$ in Table 4 and Table 5. The main difference registered for both univariate and multivariate approaches is in the outcome of the IND test, which all models in the tables now pass at the standard $5\%$ level for both the 5% and 1% VaR. Apparently, violations are not clustered in time for more than one or two lags.

As for the CC test, results concerning the multivariate models are essentially the same as those obtained by setting $K=1$, while in the univariate case we observe an improvement in the performance of the models (excluding the normal) for the $1\%$ VaR.

Ultimately, univariate distributions featuring skewness and heavy tails confirm their predominance over the remaining alternatives in terms of predictive ability. In the multivariate case this is particularly true for the skew-Student and the skew-MEP, as the skew-normal does not lead to remarkable improvements over the corresponding normal assumption.

## References

- R. Engle. “Autoregressive Conditional Heteroskedasticity with Estimates of United Kindgom Heteroskedasticity.” Econometrica 50 (1982): 987–1007. [Google Scholar] [CrossRef]
- T. Bollerslev. “Generalized Autoregressive Conditional Heteroskedasticity.” J. Econom. 31 (1986): 307–327. [Google Scholar] [CrossRef]
- T. Angelidis, A. Benos, and S. Degiannakis. “The Use of GARCH Models in VaR Estimation.” Stat. Methodol. 1 (2004): 105–128. [Google Scholar] [CrossRef]
- D.B. Nelson. “Conditional heteroskedasticity in asset returns: A new approach.” Econometrica 59 (1991): 347–370. [Google Scholar] [CrossRef]
- S. Mittnik, and M. Paolella. “Conditional density and value-at-risk prediction of Asian currency exchange rates.” J. Forecast. 19 (2000): 313–333. [Google Scholar] [CrossRef]
- Z. Ding, C.W. Granger, and R.F. Engle. “A long memory property of stock market returns and a new model.” J. Empir. Financ. 1 (1993): 83–106. [Google Scholar] [CrossRef]
- K. Kuester, S. Mittnik, and M. Paolella. “Value-at-risk prediction: A comparison of alternative strategies.” J. Financ. Econom. 4 (2006): 53–89. [Google Scholar] [CrossRef]
- F. Longin, and B. Solnik. “Is the correlation in international equity returns constant: 1960–1990? ” J. Int. Money Financ. 14 (1995): 3–26. [Google Scholar] [CrossRef]
- C. Brooks, and G. Persand. “Value-at-risk and market crashes.” J. Risk 2 (2000): 5–26. [Google Scholar]
- L. Bauwens, S. Laurent, and J. Rombouts. “Multivariate GARCH models: A Survey.” J. Appl. Econom. 21 (2006): 79–109. [Google Scholar] [CrossRef]
- P. Giot, and S. Laurent. “Value-at-risk for long and short trading positions.” J. Appl. Econom. 18 (2003): 641–663. [Google Scholar] [CrossRef]
- A.A. Santos, F.J. Nogales, and E. Ruiz. “Comparing univariate and multivariate models to forecast portfolio value-at-risk.” J. Financ. Econom. 11 (2013): 400–441. [Google Scholar] [CrossRef]
- L. Bauwens, and S. Laurent. “A New Class of Multivariate Skew Densities, with Application to GARCH Models.” J. Bus. Econ. Stat. 23 (2005): 346–354. [Google Scholar] [CrossRef]
- C. Ley, and D. Paindaveine. “Multivariate skewing mechanisms: A unified perspective based on the transformation approach.” Stat. Probab. Lett. 80 (2010): 1685–1694. [Google Scholar] [CrossRef]
- D. Noureldin, N. Shephard, and K. Sheppard. “Multivariate rotated ARCH models.” J. Econom. 179 (2014): 16–30. [Google Scholar] [CrossRef]
- J. Krause, and M.S. Paolella. “A fast, accurate method for value-at-risk and expected shortfall.” Econometrics 2 (2014): 98–122. [Google Scholar] [CrossRef]
- P.F. Christoffersen. “Evaluating interval forecasts.” Int. Econ. Rev. 39 (1998): 841–862. [Google Scholar] [CrossRef]
- E. Dumitrescu, C. Hurlin, and V. Pham. “Backtesting Value-at-Risk: From Dynamic Quantile to Dynamic Binary Tests.” Finance 33 (1995): 79–111. [Google Scholar]
- P.H. Kupiec. “Techniques for verifying the accuracy of risk measurement models.” J. Deriv. 3 (1995): 73–84. [Google Scholar] [CrossRef]
- R.F. Engle, and S. Manganelli. “CAViaR: Conditional autoregressive value at risk by regression quantiles.” J. Bus. Econ. Stat. 22 (2004): 367–381. [Google Scholar] [CrossRef]
- V. DeMiguel, L. Garlappi, and R. Uppal. “Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? ” Rev. Financ. Stud. 22 (2009): 1915–1953. [Google Scholar] [CrossRef]
- J. Tu, and G. Zhou. “Markowitz meets Talmud: A combination of sophisticated and naive diversification strategies.” J. Financ. Econ. 99 (2011): 204–215. [Google Scholar] [CrossRef]
- S.J. Brown, I. Hwang, and F. In. “Why Optimal Diversification Cannot Outperform Naive Diversification: Evidence from Tail Risk Exposure.” 2013.
- C. Fugazza, M. Guidolin, and G. Nicodano. “Equally Weighted vs. Long-Run Optimal Portfolios.” Eur. Financ. Manag. 21 (2014): 742–789.
- L. Bauwens, W. Omrane, and E. Rengifo. Intra-Daily FX Optimal Portfolio Allocation. CORE Discussion Paper 2006/10; Louvain-la-Neuve, Belgium: CORE, 2006.
- R.F. Engle, and K.F. Kroner. “Multivariate simultaneous generalized ARCH.” Econometr. Theory 11 (1995): 122–150.
- O. Barndorff-Nielsen. “Exponentially decreasing distributions for the logarithm of particle size.” Proc. R. Soc. Lond. A Math. Phys. Eng. Sci. 353 (1977): 401–419.
- M.S. Paolella, and P. Polak. “COMFORT: A Common Market Factor Non-Gaussian Returns Model.” J. Econom. 187 (2015): 593–605.
- C. Fernández, and M.F. Steel. “On Bayesian modeling of fat tails and skewness.” J. Am. Stat. Assoc. 93 (1998): 359–371.
- T. Bollerslev, and J. Wooldridge. “Quasi-Maximum Likelihood Estimation and Inference in Dynamic Models with Time-Varying Covariances.” Econom. Rev. 11 (1992): 143–172.
- K. Fang, S. Kotz, and K. Ng. Symmetric Multivariate and Related Distributions. London, UK: Chapman and Hall, 1990.
- N. Solaro. “Random variate generation from multivariate exponential power distribution.” Stat. Appl. II 2 (2004): 25–44.
- D. Noureldin, N. Shephard, and K. Sheppard. “Multivariate high-frequency-based volatility (HEAVY) models.” J. Appl. Econom. 27 (2012): 907–933.
- P. Christoffersen. “Evaluating Interval Forecasts.” Int. Econ. Rev. 39 (1998): 841–862.
- P. Kupiec. “Techniques for Verifying the Accuracy of Risk Measurement Models.” J. Deriv. 3 (1995): 73–84.
- S. Campbell. “A Review of Backtesting and Backtesting Procedures.” J. Risk 9 (2006): 1–17.
- R. Engle, and S. Manganelli. “CAViaR: Conditional Autoregressive Value at Risk by Regression Quantiles.” J. Bus. Econ. Stat. 22 (2004): 367–381.
- P. Christoffersen, and D. Pelletier. “Backtesting Value-at-Risk: A Duration-Based Approach.” J. Financ. Econom. 2 (2004): 84–108.
- T. Pajhede. Backtesting Value-at-Risk: The Generalized Markov Test. U. Copenhagen Economics Discussion Paper 15-18; Copenhagen, Denmark: University of Copenhagen, 2015.

^{1.} The interested reader is referred to Ley and Paindaveine [14], who provide a detailed overview of this approach.

^{2.} We thank the authors for kindly providing us with their MATLAB codes.

^{3.} In Appendix D we provide a brief overview of the noncentral t distribution and its likelihood derivation.

^{4.} In the univariate case it corresponds to $\mathit{\psi}={({a}_{1},{b}_{1})}^{\prime}$.

^{5.} Downloaded from `http://realized.oxford-man.ox.ac.uk/data/download`.

^{6.} We thank an anonymous referee who pointed out the possibility of better comparing the outcomes of the DQ test and Christoffersen’s CC and IND tests by computing the latter in a more extended framework than the standard one described in Section 3.2. As the results did not lead to significant improvements, this issue is only briefly covered in Appendix E.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).