Value-at-Risk for South-East Asian Stock Markets: Stochastic Volatility vs. GARCH

Abstract: This study compares the performance of several methods to calculate the Value-at-Risk of the six main ASEAN stock markets. We use filtered historical simulations, GARCH models, and stochastic volatility models. The out-of-sample performance is analyzed by various backtesting procedures. We find that simpler models fail to produce sufficient Value-at-Risk forecasts, which appears to stem from several econometric properties of the return distributions. With stochastic volatility models, we obtain better Value-at-Risk forecasts compared to GARCH. The quality varies over forecasting horizons and across markets. This indicates that, despite a regional proximity and homogeneity of the markets, index volatilities are driven by different factors.


Introduction
The members of the Association of Southeast Asian Nations (ASEAN), which consists of ten countries (Brunei, Cambodia, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, and Vietnam), produced 3.43% of the worldwide Gross Domestic Product (GDP) in 2016, and even the economically smaller countries such as Vietnam are on the rise. The ASEAN-6 have an annual GDP growth from 2016 to 2017 of 4.91% and share 4.48% of the world's average annual GDP growth of 3.6%. However, the crisis of Asian markets in 1997 shows that investors have to accept risks other than those in industrialized and developed western economies. The Asian crisis is characterized by an unparalleled contagion throughout the markets and extreme market and currency movements. While this crisis has almost only regional macroeconomic effects, consequences and lessons from it are drawn globally (Hunter et al. 1999). Another example of very high contagion in these markets is the disruption in the wake of the global financial crisis beginning in 2007, as Asian emerging markets do not offer diversification potential (Kenourgios and Dimitriou 2015).
During times of crises as well as during more moderate times of daily business, investors and portfolio managers face the challenge of properly estimating and modeling dispersion in market prices, formalized in its volatility or variance. Depending on the trading position, financial risk has to be determined for the long and short position. While the long trading position (e.g., having bought an asset to sell it at a later point) is concerned with falling prices or negative returns, the short trading position (e.g., short-selling an asset, i.e., borrowing an asset and selling it directly in order to re-buy it at a later point and return it to the owner) faces rising prices or positive returns (Giot and Laurent 2003). This is of particular importance if asymmetric distributions, such as the Skewed Student's-t distribution, are found to provide a better resemblance of the empirical price return distribution than symmetric distributions like the Normal or Student's-t distribution. In this work, we use the Value-at-Risk (VaR) as a measure for financial risk, which is determined by the volatility of an investment. Although VaR has been replaced by Expected Shortfall as the main tool to determine the minimum capital requirements for banks under the Basel framework, VaR is still in place for backtesting the internally used risk models (Basel Committee on Banking Supervision 2016). Here, we incorporate two competing classes of volatility models. Within the framework of Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models (Bollerslev 1986; Engle 1982), we model volatility conditional on its past. This allows for including volatility clusters with periods of high and low market movements. In the next step, the Asymmetric Power ARCH (Ding et al. 1993) is applied to account for asymmetric news impact on volatility. Another so-called stylized fact is the long-lasting dependence of shocks in a time series known as long memory.
The Fractionally Integrated GARCH (FIGARCH) model is able to depict this pattern. The Fractionally Integrated Asymmetric Power ARCH (FIAPARCH) of Tse (1998) combines both long memory and asymmetry. The second class of models is based on the stochastic volatility (SV) model introduced by Taylor (1986). In addition to the standard SV model, we implement specifications that are able to depict the leverage effect as well as heavy tails. We use both classes to forecast the volatility over specific horizons based on estimates of a training window. With these variance forecasts, we then predict the VaR. These VaR predictions are evaluated and compared over different markets against standard approaches such as the non-parametric Historical Simulation (HS).
In this work, we focus on six major ASEAN stock market indices. Given the regional proximity and general similarity of these markets, we aim to understand if this also yields comparable variance properties. This would imply that methods of modeling and forecasting volatility as well as the VaR have comparable performances across the markets and that these markets could be grouped. While there is a plethora of literature on variance modeling for commodities, stock markets, and exchange rates of developed countries, academic advances on Asian stock indices and their comparison are relatively sparse. Walther (2017) identifies a sufficient performance of GARCH with a symmetric Student's-t distribution as well as FIAPARCH with a skewed Student's-t distribution in terms of variance and VaR forecasting for Vietnamese stock indices. Brooks and Persand (2003) show that asymmetric approaches work well to forecast the VaR for the Singapore and Thailand equity indices. So and Yu (2006) use different GARCH models to estimate the VaR in twelve different stock markets including Indonesia, Malaysia, Thailand, and Singapore. Su and Knowles (2006) perform a VaR analysis of Mixture Normal models on stock indices including Malaysia, Singapore, and Thailand. Lastly, McMillan and Kambouroudis (2009) and Sharma and Vipul (2015) provide large studies of different equity indices (including many ASEAN countries) for VaR forecasting.
Our results suggest that the volatility structures in the ASEAN markets are heterogeneous and include various so-called stylized facts. Hence, we observe that more sophisticated models provide better forecasts than standard approaches. However, given the different dynamics in the markets, we cannot settle on one explicit model choice for all ASEAN equity markets.
The remainder of the paper is structured as follows: Section 2 presents methods of estimation of the volatility and forecasting and assessing the VaR. Section 3 provides the data basis. Section 4 presents the results and their discussion. Section 5 provides the conclusions.

Estimating Volatility
We incorporate two alternatives to calculate the daily volatility. The first model belongs to the family of Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models originating from Engle (1982). Within the GARCH framework, the process for returns $r_t$ is formulated as
$$r_t = \mu + \varepsilon_t = \mu + \sqrt{h_t}\, z_t,$$
where $\mu$ denotes the mean, $\varepsilon_t$ is the return shock, the conditional variance is defined as $h_t = \mathbb{V}(r_t \mid \mathcal{F}_{t-1})$, and the random variable $z_t$ follows a Skewed Student's-t distribution with $z_t \sim \mathrm{SkSt}_{\nu,\xi}(0, 1)$ i.i.d. for all $t = 1, \dots, n$ (Hansen 1994). Here, $\mathcal{F}_{t-1}$ is a sigma algebra containing all past information of returns and conditional volatilities up to time $t - 1$.
The distributional parameters of the Skewed Student's-t distribution, the degrees of freedom $\nu$ and the skewness $\xi$, are estimated along with the model parameters. For the conditional variance $h_t$, we consider the GARCH(1,1) specification of Bollerslev (1986), which reads:
$$h_t = \omega + \alpha\, \varepsilon_{t-1}^2 + \beta\, h_{t-1}, \tag{1}$$
where $\varepsilon_t = r_t - \mu$ is the return shock. A well-known characteristic of volatility is the negative correlation with returns, also known as the leverage effect (Black 1976; Christie 1982). In order to cope with this stylized fact, we implement the Asymmetric Power ARCH (APARCH; Ding et al. 1993), which is defined as:
$$h_t^{\delta/2} = \omega + \alpha \left( |\varepsilon_{t-1}| - \gamma\, \varepsilon_{t-1} \right)^{\delta} + \beta\, h_{t-1}^{\delta/2}, \tag{2}$$
where $\delta > 0$ is the power parameter and $\gamma \in (-1, 1)$ refers to the leverage parameter indicating whether negative or positive shocks have a larger impact on the daily volatility. For example, an estimated $\gamma > 0$ reveals that negative residuals increase the conditional volatility more than their positive equivalents, which is of particular interest for shocks. We include the Fractionally Integrated GARCH (FIGARCH) to cover the long memory effect. The standard FIGARCH(1,d,1) reads:
$$h_t = \frac{\omega}{1 - \beta} + \lambda(L)\, \varepsilon_t^2, \tag{3}$$
where
$$\lambda(L) = 1 - (1 - \beta L)^{-1} (1 - \phi L) (1 - L)^{d} \tag{4}$$
with the short memory parameters $\phi$ and $\beta$, the long memory parameter $d$, and the lag operator $L$. To combine the leverage and the long memory effect, Tse (1998) proposed the Fractionally Integrated APARCH (FIAPARCH):
$$h_t^{\delta/2} = \frac{\omega}{1 - \beta} + \lambda(L) \left( |\varepsilon_t| - \gamma\, \varepsilon_t \right)^{\delta}. \tag{5}$$
Note that the ARCH(∞) representation in Equations (3) and (5) is carried out using the fast fractional differencing method of Klein and Walther (2017) with truncation lag 5000. All GARCH-type models introduced above are estimated by maximum likelihood estimation (MLE), ensuring that non-negativity and stationarity conditions, if applicable, hold for each model. All parameter estimates and robust standard errors following Bollerslev and Wooldridge (1992) are available upon request.
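The GARCH(1,1) and APARCH(1,1) variance recursions described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimation code of the study: the function names are hypothetical, the parameters are taken as given rather than estimated, and starting the recursion at the sample variance of the shocks is an assumption.

```python
def garch11_variance(returns, mu, omega, alpha, beta):
    """Conditional variances h_t from the GARCH(1,1) recursion."""
    eps = [r - mu for r in returns]
    # Assumption: initialise the recursion at the sample variance of the shocks.
    h = [sum(e * e for e in eps) / len(eps)]
    for t in range(1, len(eps)):
        h.append(omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1])
    return h


def aparch11_variance(returns, mu, omega, alpha, gamma, beta, delta):
    """Conditional variances h_t from the APARCH(1,1) recursion on h_t^(delta/2)."""
    eps = [r - mu for r in returns]
    start = sum(e * e for e in eps) / len(eps)
    s = [start ** (delta / 2.0)]  # s_t = h_t^(delta/2)
    for t in range(1, len(eps)):
        # |eps| - gamma * eps is non-negative as long as |gamma| < 1.
        shock = abs(eps[t - 1]) - gamma * eps[t - 1]
        s.append(omega + alpha * shock ** delta + beta * s[t - 1])
    return [x ** (2.0 / delta) for x in s]
```

With gamma = 0 and delta = 2 the APARCH recursion collapses to GARCH(1,1), and with gamma > 0 a negative shock raises the next variance more than a positive shock of the same size.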
As an alternative to the GARCH framework, we also consider the stochastic volatility framework. Stochastic volatility models belong to the family of state-space models (Sarkka 2013, ch. 4). The standard stochastic volatility (SV) model is introduced by Taylor (1986) as
$$r_t = \mu + \exp(h_t / 2)\, z_t, \tag{6}$$
$$h_t = \mu_h + \phi_h (h_{t-1} - \mu_h) + \sigma_h\, \eta_t, \tag{7}$$
$$z_t \sim N(0, 1), \qquad \eta_t \sim N(0, 1), \tag{8}$$
where $h_t$ now denotes the latent log-variance with level $\mu_h$, persistence $\phi_h$, and volatility of volatility $\sigma_h$. The SV model contains two noise processes, $\{z_t\}_t$ and $\{\eta_t\}_t$, respectively accounting for the return shocks and the volatility shocks. In the SV model above, $\{z_t\}_t$ and $\{\eta_t\}_t$ are independent. Harvey and Shephard (1996) introduce a more general setting where the noise processes $\{z_t\}_t$ and $\{\eta_t\}_t$ are correlated as
$$\begin{pmatrix} z_t \\ \eta_{t+1} \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$$
instead of being independent as in Equation (8). The correlation coefficient $\rho$ accounts for the leverage effect, defined as the negative correlation between shocks on return and volatility (i.e., $\rho < 0$). This model is called the asymmetric SV model or SV-leverage (SV-L) model. We consider a third stochastic volatility model where the return shocks $\{z_t\}_t$ follow Student's t-distribution with $\nu$ degrees of freedom. It allows for more extreme observations than Gaussian return shocks, as the Student's t-distribution has heavier tails. The volatility shocks $\{\eta_t\}_t$ follow the standard Gaussian distribution. In this model, $z_t$ and $\eta_t$ are independent. It is referred to as the SV-t model.
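To make the state-space structure concrete, the following sketch simulates returns from an SV model with optional leverage. It is an illustration only: the function name, the parameter names (mu_h, phi, sigma), and the draw of the initial state from the stationary distribution are assumptions, and the sketch has no connection to the MCMC estimation used in the study.

```python
import math
import random


def simulate_sv(n, mu, mu_h, phi, sigma, rho=0.0, seed=0):
    """Simulate n returns and latent log-variances from an SV model.

    rho = 0 gives the standard SV model; rho < 0 gives the leverage
    variant, where the return shock at t is correlated with the
    volatility shock at t + 1.
    """
    rng = random.Random(seed)
    # Assumption: initial state from the stationary AR(1) distribution.
    h = mu_h + sigma / math.sqrt(1.0 - phi ** 2) * rng.gauss(0.0, 1.0)
    returns, log_vars = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        returns.append(mu + math.exp(h / 2.0) * z)
        log_vars.append(h)
        # Volatility shock for t + 1, correlated with z through rho.
        eta_next = rho * z + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        h = mu_h + phi * (h - mu_h) + sigma * eta_next
    return returns, log_vars
```

A call such as `simulate_sv(1000, 0.0, -9.0, 0.97, 0.2, rho=-0.5)` produces a return path with volatility clustering and a negative return-volatility correlation.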
We end up with three different stochastic volatility models: the SV model (Gaussian and independent shocks), the SV-L model (Gaussian and correlated shocks), and the SV-t model (t-distributed return shock, Gaussian volatility shock, independent shocks). In the three models, the parameters are estimated by Bayesian inference using the Markov Chain Monte Carlo (MCMC) sampling algorithms from Chan and Grant (2016). Lastly, we consider the RiskMetrics approach, the historical simulation, as well as the semi-parametric filtered historical simulation, which we explain in detail in the next subsection. We are thankful to Joshua Chan for providing, on his personal webpage joshuachan.org, the MATLAB (MathWorks, Natick, Massachusetts, United States) code for estimating the stochastic volatility models.

Value-at-Risk Forecasting and Backtesting
One of the most important financial risk measures is the Value-at-Risk. The VaR is usually defined as the loss that is not exceeded with a given probability (e.g., 1 − a).
When using GARCH models, the k-days ahead VaR forecast is simply derived by a k-steps forecast of the variance based on the estimated parameters at time $T$, $\hat{h}_{T+k} = \mathbb{E}(h_{T+k} \mid \mathcal{F}_T)$, which is then applied in the general VaR calculation scheme, yielding
$$\mathrm{VaR}_{T+k} = \hat{\mu}_{T+k} + \sqrt{\hat{h}_{T+k}}\; Q_a(\hat{\nu}, \hat{\xi}),$$
where $\hat{\mu}_{T+k}$ is the estimated mean, $\hat{h}_{T+k}$ is the estimated conditional variance, and $\hat{\nu}$ and $\hat{\xi}$ are the estimated distributional parameters from the training set $1, \dots, T$. $Q_a(\nu, \xi)$ denotes the a-quantile function of the Skewed Student's-t distribution with degrees-of-freedom parameter $\nu$ and skewness $\xi$. Note that we only forecast the variance. Hence, in the VaR forecast, the quantile function depends on the estimated in-sample parameters, which are not forecasted separately. The calculation of the forecasted variance depends on the specific GARCH model. In this sense, there exists a closed-form solution for the GARCH(1,1) k-days ahead forecast while, for the APARCH, FIGARCH, and FIAPARCH, the forecasts are calculated iteratively.
With stochastic volatility (SV, SV-L, SV-t) models, we approximate the conditional distribution of the returns at the forecast horizon given the observed returns non-parametrically, i.e., the conditional distribution of $r_{T+k}$ given $r_1, \dots, r_T$. We use particle filtering, a sequential Monte Carlo algorithm for state-space models (Sarkka 2013, ch. 7). The particle filter approximates the conditional distribution of the volatility $h_T$ given the returns $r_1, \dots, r_T$ in the form of a sample of so-called "particles". This sample is propagated k times according to the volatility dynamics (Equation (7)), which is the same in the three stochastic volatility models. Then, from the volatility sample at time $T + k$, a return sample is generated according to the return model (Equation (6)). We compute the VaR by taking the empirical quantiles of this return sample.
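For the GARCH(1,1) case, the closed-form k-days ahead variance forecast and the resulting VaR can be sketched as below. The function names are hypothetical, and the quantile of the standardized innovation distribution is passed in as a plain number because computing the Skewed Student's-t quantile is outside the scope of this sketch.

```python
import math


def garch11_forecast(h_T, eps_T, omega, alpha, beta, k):
    """k-step-ahead variance forecast E(h_{T+k} | F_T) for GARCH(1,1), k >= 1."""
    h_next = omega + alpha * eps_T ** 2 + beta * h_T  # one-step forecast
    persistence = alpha + beta
    long_run = omega / (1.0 - persistence)  # requires alpha + beta < 1
    # The forecast decays geometrically towards the long-run variance.
    return long_run + persistence ** (k - 1) * (h_next - long_run)


def parametric_var(mu, variance_forecast, quantile):
    """VaR = mean + forecast standard deviation * innovation quantile.

    `quantile` is the a-quantile (long position) or (1 - a)-quantile
    (short position) of the standardized innovation distribution.
    """
    return mu + math.sqrt(variance_forecast) * quantile
```

For k = 1 the forecast equals the usual one-step recursion; for larger k it matches iterating h ← ω + (α + β) h, which is how the models without a closed form are typically handled.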
For the RiskMetrics approach (Longerstaey and Spencer 1996), also known as Exponentially Weighted Moving Average (EWMA), we use the standard GARCH-like case with fixed parameters ω = 0.00, α = 0.06, and β = 0.94 for Equation (1). Since the RiskMetrics model is not stationary, we use the estimate $\hat{h}_T$ for all k-days ahead forecasts.
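The RiskMetrics recursion with the fixed parameters stated above amounts to the following sketch. The function name, the zero default for the mean, and the initialisation at the sample variance are assumptions made for illustration.

```python
def riskmetrics_variance(returns, mu=0.0, alpha=0.06, beta=0.94, h0=None):
    """EWMA variances: h_t = alpha * eps_{t-1}^2 + beta * h_{t-1} (omega = 0)."""
    eps = [r - mu for r in returns]
    if h0 is None:
        # Assumption: start at the sample variance of the shocks.
        h0 = sum(e * e for e in eps) / len(eps)
    h = [h0]
    for t in range(1, len(eps)):
        h.append(alpha * eps[t - 1] ** 2 + beta * h[t - 1])
    return h
```

Because α + β = 1, the recursion has no long-run variance to revert to, which is why the last filtered value is carried forward unchanged to every forecast horizon.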
Lastly, we derive VaR forecasts non-parametrically by the Historical Simulation (HS) and semi-parametrically by the Filtered Historical Simulation (FHS). The former method takes the past 250 returns as possible scenarios of a future return distribution, and the VaR is calculated from the empirical a-quantile of the past returns, i.e.,
$$\mathrm{VaR}_{T+k} = Q_a\!\left( \{ r_t \}_{t=T-249}^{T} \right).$$
For the FHS, we follow Barone-Adesi et al. (1999). The technique combines the aforementioned GARCH and HS. To calculate the VaR, we estimate the parameters for a GARCH model with Skewed Student's-t innovations over the whole in-sample period. From the parameters, we derive a k-days ahead volatility forecast (an outline of forecasting conditional variance can be found in Klein and Walther (2016) and Walther (2017), for example). Moreover, we calculate the empirical a-quantile from the most recent 250 standardized and centered GARCH residuals. The volatility forecast is then multiplied with the empirical quantile to estimate the VaR:
$$\mathrm{VaR}_{T+k} = \sqrt{\hat{h}_{T+k}}\; Q_a\!\left( \{ \hat{z}_t \}_{t=T-249}^{T} \right),$$
where $\hat{z}_t$ is the standardized and centered GARCH residual.
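Both simulation-based forecasts reduce to an empirical quantile over a 250-day window, as the following sketch shows for the long trading position. The function names and the quantile convention (the smallest observation with at least a share a of the window at or below it) are assumptions; other interpolation rules are equally defensible.

```python
import math


def empirical_quantile(sample, a):
    """Lower empirical a-quantile of a sample (no interpolation)."""
    ordered = sorted(sample)
    index = max(0, math.ceil(a * len(ordered)) - 1)
    return ordered[index]


def hs_var(returns, a, window=250):
    """Historical Simulation: quantile of the most recent raw returns."""
    return empirical_quantile(returns[-window:], a)


def fhs_var(std_residuals, variance_forecast, a, window=250):
    """Filtered HS: volatility forecast times quantile of recent residuals.

    `std_residuals` are assumed to be already standardized and centered.
    """
    quantile = empirical_quantile(std_residuals[-window:], a)
    return math.sqrt(variance_forecast) * quantile
```

For the short trading position the same functions can be called with 1 − a in place of a.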
To evaluate the quality of the VaR forecasts for the different models and classes, we use four different VaR tests: the regulatory traffic light test, the conditional coverage test, the multi-level unconditional coverage test, and the loss function based comparison. In what follows, we refer to VaR violations or exceptions for the cases where $r_{T+k} < \mathrm{VaR}_{T+k}$ for the long trading position and where $r_{T+k} > \mathrm{VaR}_{T+k}$ for the short trading position.
The Basel traffic light backtest (Basel Committee on Banking Supervision 2016) sorts VaR test results into three different zones. The test uses the 1-day ahead a = 1% VaR for the last 250 trading days. A model is considered to be in the green zone if four or fewer VaR violations occurred in that period. The yellow zone includes models yielding between five and nine exceptions. Lastly, the red zone covers all models with more than nine violations. The idea behind this color scheme is that the yellow zone is a buffer area for models that violate the VaR too often due to "bad luck" (type I error). Thus, banks only have to adjust their calculated minimum capital requirements by a fixed factor. However, models in the red zone are not allowed to be used; instead, the standard approach of the Basel framework has to be employed. Here, we calculate the traffic light test on a rolling time frame over the whole out-of-sample period and report how many days each model appears in the green, the yellow, or the red zone, respectively. Doing so, we gain a regulatory perspective on the results of the VaR forecasts.
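The rolling zone count described above can be sketched as follows for the long trading position. The function names are hypothetical; the zone thresholds are the ones stated in the text.

```python
def basel_zone(violations):
    """Map the number of violations in a 250-day window to a zone."""
    if violations <= 4:
        return "green"
    if violations <= 9:
        return "yellow"
    return "red"


def rolling_zone_days(returns, var_forecasts, window=250):
    """Count how many days a model spends in each zone (long position)."""
    hits = [1 if r < v else 0 for r, v in zip(returns, var_forecasts)]
    days = {"green": 0, "yellow": 0, "red": 0}
    count = sum(hits[:window])
    for end in range(window, len(hits) + 1):
        if end > window:
            # Slide the window: add the newest day, drop the oldest one.
            count += hits[end - 1] - hits[end - 1 - window]
        days[basel_zone(count)] += 1
    return days
```

For the short trading position, the comparison `r < v` would be replaced by `r > v`.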
In order to account for possible clustering of VaR violations, we include the conditional coverage test proposed by Christoffersen (1998). The test combines the unconditional coverage test with a test for the independence of VaR exceptions. Independence is assumed if the VaR violations do not follow a first-order Markov chain. Thus, the test procedure penalizes models not only for an undesirable number of violations, but also for not adjusting quickly after an exception has occurred. Unfortunately, the test only evaluates a certain quantile and not the whole tail of the distribution.
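A compact version of the conditional coverage likelihood ratio can be written with the standard library alone, since the chi-square survival function with two degrees of freedom is exp(−x/2). The sketch below is an illustration under assumptions: the function names are hypothetical, and both likelihoods are evaluated on the transitions of the hit sequence, so the first observation only serves as a starting state.

```python
import math


def _xlog(count, prob):
    """count * log(prob), with the convention 0 * log(0) = 0."""
    return count * math.log(prob) if count > 0 else 0.0


def conditional_coverage_test(hits, a):
    """Likelihood ratio and p-value of a conditional coverage test.

    `hits` is a 0/1 sequence of VaR violations, `a` the VaR level.
    """
    n00 = n01 = n10 = n11 = 0
    for prev, curr in zip(hits[:-1], hits[1:]):
        if prev == 0 and curr == 0:
            n00 += 1
        elif prev == 0 and curr == 1:
            n01 += 1
        elif prev == 1 and curr == 0:
            n10 += 1
        else:
            n11 += 1
    # Estimated violation probabilities after a calm day / after a violation.
    pi01 = n01 / (n00 + n01) if n00 + n01 > 0 else 0.0
    pi11 = n11 / (n10 + n11) if n10 + n11 > 0 else 0.0
    # Null: independent violations with probability a.
    log_l0 = _xlog(n00 + n10, 1.0 - a) + _xlog(n01 + n11, a)
    # Alternative: first-order Markov chain.
    log_l1 = (_xlog(n00, 1.0 - pi01) + _xlog(n01, pi01)
              + _xlog(n10, 1.0 - pi11) + _xlog(n11, pi11))
    lr = max(0.0, -2.0 * (log_l0 - log_l1))
    p_value = math.exp(-lr / 2.0)  # chi-square survival function, 2 d.o.f.
    return lr, p_value
```

Ten isolated violations in 1000 days at a = 1% are not rejected, whereas ten consecutive violations are rejected decisively, which is the clustering penalty discussed above.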
The multi-level coverage test of Pérignon and Smith (2008) resembles a joint unconditional coverage test for three VaR levels at a = 1%, 2.5%, and 5%. Thus, the test is able to evaluate the whole tail in one single test. The test compares the actual coverage ratio (number of VaR violations to length of observation period) with the preferred one (i.e., the VaR level a) based on a likelihood ratio test. Hence, only the absolute number of VaR violations is important to that test and it penalizes too conservative models as well as too optimistic models. However, the test is not designed to cope with clustering of VaR violations.
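The joint test over the three VaR levels can be sketched as a multinomial likelihood ratio on the cells between consecutive VaR forecasts. This is a hedged illustration for the long trading position: the function names are hypothetical, the VaR forecasts are assumed to be ordered (the 1% VaR below the 2.5% VaR below the 5% VaR), and the statistic is compared with a chi-square distribution with three degrees of freedom.

```python
import math


def chi2_sf_3df(x):
    """Survival function of the chi-square distribution with 3 d.o.f."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def multilevel_coverage_test(returns, var_by_level):
    """Joint unconditional coverage test for three VaR levels.

    `var_by_level` maps each level a (e.g. 0.01, 0.025, 0.05) to its
    list of VaR forecasts.
    """
    levels = sorted(var_by_level)
    if len(levels) != 3:
        raise ValueError("this sketch supports exactly three VaR levels")
    n = len(returns)
    counts = [0] * 4  # last cell: no violation at any level
    for t, r in enumerate(returns):
        cell = 3
        for i, a in enumerate(levels):
            if r < var_by_level[a][t]:
                cell = i  # most extreme level that is violated
                break
        counts[cell] += 1
    probs = [levels[0], levels[1] - levels[0], levels[2] - levels[1], 1.0 - levels[2]]
    lr = 2.0 * sum(c * math.log((c / n) / p) for c, p in zip(counts, probs) if c > 0)
    lr = max(lr, 0.0)  # guard against tiny negative rounding errors
    return lr, chi2_sf_3df(lr)
```

A model with no violations at all is rejected just like one with too many, which reflects the penalty for overly conservative forecasts.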
The backtests presented above can only decide whether a particular model passes the requirements of being in the desired VaR coverage zone and not having clustered violations. They cannot be used to compare the VaR forecasts among a given set of models. Therefore, we incorporate a loss function based comparison. Here, we follow the idea of Angelidis and Degiannakis (2007). The authors suggest a two-stage approach: (1) all models are tested with a backtest such as the conditional coverage test; and (2) for the models that pass this test, the VaR loss function suggested by Lopez (1998) is used. The results of the loss functions for each model are compared with the Superior Predictive Ability test by Hansen (2005). We deviate from this procedure by using the Model Confidence Set (MCS; Hansen et al. 2011) in place of the Superior Predictive Ability test. The MCS yields a set of models of equal predictive ability. Thus, this procedure allows us to directly compare those models that pass the first-stage backtests.
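As an illustration of a violation-based loss, the sketch below implements the magnitude loss commonly attributed to Lopez, which charges one plus the squared exceedance on each violation and zero otherwise. Whether the study uses exactly this variant cannot be recovered from the text, so the functional form and the function name should be read as assumptions.

```python
def lopez_loss(returns, var_forecasts, short=False):
    """Total magnitude loss: 1 + (r - VaR)^2 on violations, 0 otherwise."""
    total = 0.0
    for r, v in zip(returns, var_forecasts):
        # Long position: violation if the return falls below the VaR.
        # Short position: violation if the return rises above the VaR.
        violated = r > v if short else r < v
        if violated:
            total += 1.0 + (r - v) ** 2
    return total
```

Lower totals indicate fewer and smaller violations, so the per-model losses can be fed into a comparison procedure such as the Model Confidence Set.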

Data
For our analysis of the main ASEAN financial markets, we include six country stock market indices. We choose the Indonesian Jakarta Stock Exchange Composite Index (JCI), the Kuala Lumpur Stock Exchange (KLSE) of Malaysia, the Philippines Stock Exchange PSEI Index (PCOMP), the Stock Exchange of Thailand (SET), the Singapore Straits Times Index (STI), and the Vietnam Ho Chi Minh Stock Index (VNI). Hence, we exclude the smaller stock markets of Myanmar, Cambodia, Laos, and Brunei from our analysis. The data is retrieved from Bloomberg in USD denominations for the period from 1 July 2006 to 30 June 2017. The period is chosen such that we obtain an equal number of observations of around n = 2700 for all indices, accounting for individual holidays. We note that the VNI has some zero volume trading days before our chosen period. We calculate the daily returns of the stock indices by logarithmic price differences. For the forecasting exercise, we use the in-sample data from 1 July 2006 to 30 June 2011. This leaves us with an out-of-sample period from 1 July 2011 to 30 June 2017 and six years of 1-, 5-, and 20-days ahead forecasts for each index.
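The return calculation by logarithmic price differences corresponds to the following one-liner; the function name is hypothetical.

```python
import math


def log_returns(prices):
    """Daily log returns: r_t = log(P_t) - log(P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices[:-1], prices[1:])]
```

Log returns are additive over time, so the sum of the daily values over a period equals the log return over the whole period.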
Descriptive statistics, provided in Table 1, show evidence that the empirical distributions of the index returns are of leptokurtic shape, indicated by an increased excess kurtosis. The JCI has the highest kurtosis of 12.54 while the VNI features the lowest at 4.46. Moreover, all return series are skewed to the left; the series' distributions have large negative returns with a higher probability compared to their positive counterpart. In comparison to indices of developed countries and global benchmarks, the empirical moments are quite extreme for indices, in particular the kurtosis, suggesting less diversification effects within each index. This highlights the relatively high risks associated with investing in these markets. In addition to the non-normal appearance of moments, we test for autocorrelation in the return series. The Ljung-Box (LB) test and the ARCH test both reject the hypothesis of no autocorrelation in the returns. The Augmented Dickey-Fuller (ADF) test rejects the hypothesis of a unit root in the return series. Based on these results, we assume the underlying distribution for the GARCH framework as Skewed Student's-t. This distribution choice over alternatives such as the Normal or symmetric Student's-t distribution ensures that we cover heavy tails and skewness found in the series, which impacts the parameter estimation and forecasting exercise. The series are depicted in Figure 1.

Results and Discussion
In this section, we compare the VaR forecast results for the six ASEAN equity indices. Therefore, we first present the results for each index individually and compare the findings afterwards. For each model, we estimate 1-, 5-, and 20-days ahead predictions that correspond to forecasts a day, a week, and a month ahead. Note that we do not forecast the VaR for the whole period, but for a certain point in the future. The results are presented in Tables 2-7.
We start our analysis with the results from the Indonesian JCI (Table 2). The traffic light test does not find any of the models to be in the red zone. Moreover, we see that the GARCH is only present in the green zone, i.e., it never has more than four violations in the whole out-of-sample period for the long and the short trading position. However, the GARCH model does not pass the conditional coverage test by Christoffersen (1998) or the multilevel Pérignon and Smith (2008) test at any forecast horizon. There are a number of models that are able to depict the VaR at all quantiles under consideration for the 1-day ahead forecast on the long trading position: FHS, FIGARCH, SV, and SV-L, but only FIGARCH also shows the same ability on the short trading position. Its loss functions are satisfactory, but FIGARCH only belongs to the best performing models at the 2.5% quantile. The generally good performance of this model hints toward an elevated shock persistence in volatility. Regarding longer forecast horizons, only HS performs well for all horizons on both trading sides with respect to the multilevel coverage test. The fact that HS is not able to pass the conditional coverage test may indicate that the model tends to build clustered violations, which is not covered by the multilevel coverage test.
The second equity index we analyze is the Malaysian KLSE (Table 3). Here, three models fail the regulatory traffic light test. While RiskMetrics has several days in the red zone of the long trading position, the HS and FHS models are included in the red zone for the short trading position. Moreover, we observe some asymmetric behavior. RiskMetrics also completely fails to meet the criteria from the coverage tests for the long trading position. However, it passes all tests for the short trading position. Almost the same behavior is observed for FIGARCH, with the only exception being the 2.5% VaR of the conditional coverage test of Christoffersen (1998). On the long trading position, APARCH achieves good results, especially for 1-day ahead predictions. This suggests that both asymmetric news impact and long memory are present in this market's volatility, which is further underlined by the performance of FIAPARCH. All stochastic volatility models pass the multilevel coverage test for the long trading position.
Next, we compare the results from the Philippine PCOMP index (Table 4). No model appears in the red zone of the traffic light test and thus they could be used without being replaced by the regulator. Here, we find five complete failures of models regarding the two statistical coverage tests. Neither RiskMetrics for the long trading position, nor GARCH or any stochastic volatility specification for the short trading position pass any of the tests. In addition, the two asymmetric GARCH models seem to have problems with the specific dynamics of the PCOMP index. Both APARCH and FIAPARCH perform very poorly with only a few passed tests. Generally, our model set does not include a clear candidate to be preferred in terms of VaR prediction performance. However, the HS and the FHS deliver the most promising results with respect to the multilevel test and the corresponding loss function results.
Numbers under the Basel traffic light test indicate the number of days in the green/yellow/red zone for a 250 rolling trading day window with 1-day ahead VaR forecasts at a = 1%. For the Christoffersen (1998) test, the 1-, 5-, and 20-days ahead forecast results are reported for the a = 1%, 2.5%, and 5% VaR. If a specific test does not reject the null hypothesis, we present the corresponding VaR loss function result. − indicates that the null hypothesis is rejected at least at the 10% level of significance. Bold-faced loss functions represent the inclusion in the Model Confidence Set (Hansen et al. 2011) with a significance level of 10% and 10,000 bootstraps. The test by Pérignon and Smith (2008) is reported in a similar manner, except for the fact that the three VaR levels (1%, 2.5%, and 5%) are tested jointly.
Table 5 presents the results from the VaR backtests for the SET, traded in Bangkok. From the traffic light test, it becomes apparent that RiskMetrics and FHS lead to several days in the red zone for the long trading position. The general result for SET is that most models can cope with the long trading position to some extent, but completely fail to depict the dynamics on the short trading position. The results from HS suggest that it can be used for 1-day ahead predictions for the long trading position. Even though it is not rejected by the unconditional coverage tests, it has problems avoiding clustering of the VaR violations. This behavior is illustrated in Figure 2. It shows the slow reaction to VaR violations on the short trading position and the overall good coverage for 1% VaR forecasts. Additionally, SV-t shows somewhat good performance on the long trading position, which is reflected in the fact that it passes all multilevel tests and belongs to the set of the best models for 5- and 20-days ahead. For both trading positions, however, only FIAPARCH and HS have good results, at least with respect to the multilevel test. Hence, we conclude that both asymmetry and long memory play an important role in the variance dynamics of SET, indicating that variance shocks have an extended, yet asymmetric, persistence.
The STI from Singapore provides interesting results. From Table 6, we find that RiskMetrics (long) and FHS (short) are included in the red zone of the Basel traffic light test. Consequently, the models would be replaced by the regulatory standard approach and the institution would be penalized with a higher factor on the minimum capital requirements accounting for the bad model choice. Interestingly, RiskMetrics shows a good performance on the short trading side, where it passes most of the tests. The worst results are achieved by the GARCH model, which fails all tests, even though it stays in the green zone over the whole out-of-sample period. The stochastic volatility models show good performance on the long trading position but cannot provide equally good results on the short trading position. APARCH provides very good results for both trading positions regarding 1-day ahead forecasts, which indicates that asymmetries play an important role in the STI return structure.
Lastly, we compare the forecasting results for the Vietnamese equity index VNI (Table 7). Within our model set, which includes very widely used VaR estimation procedures, only the SV model provides an average to good performance. All other models are either in the red zone (RiskMetrics, FHS, FIGARCH) or fail most of the statistical coverage tests (GARCH, APARCH, FIAPARCH). The models in the red zone, however, are only included for one trading side. For the long trading position, FIGARCH and FHS provide good results from the coverage tests. The stochastic volatility models show very good performance for 1- and 5-days ahead predictions on the long trading position and belong to the model confidence set at every test they pass.
Finally, we compare all results from the six different equity indices of Indonesia, Malaysia, Philippines, Singapore, Thailand, and Vietnam. GARCH seems to be the regulator's darling with respect to the traffic light characterization. Although popular, it fails almost all statistical coverage tests, while, for most indices, it is 100% of the time in the green zone of the traffic light test. This indicates that the model yields too conservative VaR forecasts, which would result in particularly high minimum capital requirements. In addition, the very popular RiskMetrics model shows poor performance. McMillan and Kambouroudis (2009) conclude that the model only performs well in small markets and high VaR quantiles. Our findings suggest that the selected six markets in this paper may already be too big for the RiskMetrics approach. Interestingly, the HS is rarely rejected by the multilevel coverage test, i.e., regardless of the specific forecast horizon, it provides sufficient coverage ratios. However, it is not able to provide satisfying results for the conditional coverage test in all indices. This might be due to the slow adaptation to shocks resulting in clustering of violations. The SV model specifications provide a framework with a good overall performance in all markets, but only on the long trading position. However, especially for shorter forecast horizons, the SV models belong to the model confidence sets.
Comparing the model performance for each index, we find evidence that the markets in our analyzed group are heterogeneous with respect to their volatility properties. For example, STI is dominated by asymmetric effects and long memory models are rejected by our coverage tests while, for the VNI, long memory models show a good forecasting performance and asymmetric models are rejected.
Numbers under the Basel traffic light test indicate the number of days in the green/yellow/red zone for a rolling window of 250 trading days with 1-day-ahead VaR forecasts at α = 1%. For the Christoffersen (1998) test, the results of the 1-, 5-, and 20-day-ahead forecasts are reported for the α = 1%, 2.5%, and 5% VaR. If a specific test does not reject the null hypothesis, we present the corresponding VaR loss function result. − indicates that the null hypothesis is rejected at least at the 10% level of significance. Bold-faced loss functions represent inclusion in the Model Confidence Set (Hansen et al. 2011) with a significance level of 10% and 10,000 bootstraps. The test by Pérignon and Smith (2008) is reported in a similar manner, except that the three VaR levels (1%, 2.5%, and 5%) are tested jointly.
Table 7. Value-at-Risk backtest results for VNI returns.
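The clustering of violations that the conditional coverage test is designed to detect can be sketched as follows. The sketch assumes a first-order Markov alternative for the violation indicator, in the spirit of the independence component of the Christoffersen (1998) test. The function name and the two artificial violation series are hypothetical, and the code is an illustration rather than the implementation used for the tables.

```python
import math

# Chi-square(1) critical value at the 10% significance level
CHI2_1DF_10PCT = 2.706


def independence_lr(hits):
    """LR statistic for first-order independence of a 0/1 violation series.

    Requires at least two observations; hits must contain the integers 0 or 1.
    """
    # n[i][j] counts transitions from state i (previous day) to j (current day)
    n = [[0, 0], [0, 0]]
    for prev, cur in zip(hits[:-1], hits[1:]):
        n[prev][cur] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]

    # Violation probabilities conditional on yesterday's state
    pi01 = n01 / (n00 + n01) if (n00 + n01) else 0.0
    pi11 = n11 / (n10 + n11) if (n10 + n11) else 0.0
    # Unconditional violation probability under independence
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)

    def term(count, p):
        # Log-likelihood contribution; zero counts contribute nothing
        return count * math.log(p) if count > 0 else 0.0

    ll_null = term(n00 + n10, 1.0 - pi) + term(n01 + n11, pi)
    ll_alt = (term(n00, 1.0 - pi01) + term(n01, pi01)
              + term(n10, 1.0 - pi11) + term(n11, pi11))
    return -2.0 * (ll_null - ll_alt)


# Two artificial series with the same number of violations (5 in 250 days)
clustered = [0] * 120 + [1] * 5 + [0] * 125
spread = [1 if i % 50 == 25 else 0 for i in range(250)]

lr_clustered = independence_lr(clustered)
lr_spread = independence_lr(spread)

print(sum(clustered), sum(spread))
print(lr_clustered > CHI2_1DF_10PCT, lr_spread > CHI2_1DF_10PCT)
```

Both artificial series have identical unconditional coverage, so a pure coverage test cannot distinguish them. Only the series with consecutive violations produces a large independence statistic, which is the pattern we associate with a slow reaction of a volatility model to shocks.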

Conclusions
We compare the forecasting performance of different GARCH-type and Stochastic Volatility models as well as non- and semi-parametric approaches in terms of the widely used Value-at-Risk measure. We obtain results that are not consistent across markets or trading positions. The results imply that different forecasting methods should be implemented for the long and short trading positions. Adding to this inconsistency, we find that the model performances vary across the ASEAN stock indices, indicating that the markets' volatilities might be driven by different factors. The simple GARCH and the RiskMetrics framework provide insufficient forecasts in terms of coverage and clustering. With only a few exceptions, the two models fail for all forecasting horizons and for all markets. This is a clear indication that the index volatilities should not be modeled by short memory and symmetric processes. Models with long memory, asymmetric news impact, or both, such as the FIGARCH, APARCH, or FIAPARCH, are potent alternatives.
Given the significant skewness in the empirical returns, we suggest skewed distributions for driving the volatility processes. The Historical Simulation appears to be superior to its filtered extension and provides reasonably good results for the multilevel unconditional coverage test. With Stochastic Volatility models, we improve the quality of some forecasts. In general, we obtain better results for shorter horizons. In addition, there is no clear pattern in the failure rate of the unconditional and conditional coverage tests. Interestingly, for the stochastic volatility framework, we achieve a good overall VaR coverage; the violations, however, are clustered for most markets across the forecasting horizons. The clustering might be caused by periods of extreme market movements paired with only a minor reaction of the volatility models.
In summary, the results show that simple volatility models do not provide VaR forecasts of practical value and that more sophisticated models, which cover different stylized facts, are needed to properly quantify financial risk on the long and short sides for ASEAN stock market indices. Moreover, we conclude that, despite the regional proximity and homogeneity of the markets, the stock index volatilities of the biggest ASEAN markets are driven by different factors. This needs to be addressed in further research.