Volatility Forecast in Crises and Expansions

We build a discrete-time non-linear model for volatility forecasting. The model belongs to the class of threshold autoregressive models, in which changes in regime are governed by past returns. Its ability to capture changes in volatility regimes, combined with more accurate volatility measures, allows it to outperform benchmark models such as the linear heterogeneous autoregressive model and GARCH specifications. Finally, we show how to derive a closed-form expression for multiple-step-ahead forecasts by exploiting information about the conditional distribution of returns.


Introduction
Volatility plays an important role in financial econometrics. Measuring, modelling and forecasting financial volatility are essential for risk management, portfolio allocation and option pricing. Although returns remain unpredictable, their second moment can be forecast quite accurately, a fact that has generated a large body of research over the last thirty years, motivated by Engle's seminal paper [1]. The existing literature on modelling and forecasting financial volatility can be divided into two distinct groups: parametric and non-parametric models. The former assume a specific functional form for volatility and model it either as a function of observable variables, as in ARCH or GARCH models [1][2][3], or as a known function of latent variables, resulting in stochastic volatility models [4,5].
The second class defines financial volatility without imposing any parametric assumptions and is known as realized volatility models [6]. The main idea of these models is to construct consistent estimators of the unobserved integrated volatility by summing squared returns over very short intervals within a fixed time span, typically one day. The availability of high-frequency data allows high-precision estimation of continuous-time pure diffusion processes from large datasets of discrete observations. As a result, volatility essentially becomes observable and, in the absence of microstructure noise, can be consistently estimated by a realized volatility measure. This approach has two main benefits compared with GARCH and stochastic volatility models. First, researchers can treat volatility as observable and model it with time series techniques, for example autoregressive fractionally integrated moving average (ARFIMA) models [6]. Second, realized volatility models significantly outperform models based on lower-frequency (daily) data in terms of forecasting power; see, e.g., [7][8][9]. Indeed, the latter models absorb new information and update the volatility forecast at a slower daily frequency, while the former can incorporate changes in volatility faster owing to the more frequent arrival of intraday information.
Although the literature proposes many different approaches for modelling volatility, there is still no single model that explains all of the stylized facts simultaneously. In particular, there is no consensus on how to model long memory, since there are at least four approaches: the non-linear model with regime switching [9]; the linear fractionally-integrated process [10]; the mixture of heterogeneous run information arrivals [11]; and the aggregation of short memory stationary series [12]. Numerous methods have been developed, since it is hard to distinguish between unit root and structural break data generating processes [13,14]. [15] show that structural break models can outperform the long memory model if the timing and sizes of future breaks are known. Although few academics and practitioners accurately predicted the timing of the recent financial crises and the European sovereign debt turmoil, a model with structural breaks seems more economically plausible than a fractionally-integrated long memory model. In addition, [15] recommend relying on economic intuition to choose between smooth transition autoregressive (STAR) models and abrupt structural break models.
In this paper, we extend the heterogeneous autoregressive model proposed by [16] to take into account different regimes of volatility. The resulting model is a non-linear threshold autoregressive model, in which regimes are governed by an exogenous trigger variable. This model provides a better fit of the robust measure of realized volatility, both in-sample and in out-of-sample forecasting. In addition to improved performance in particular samples, the non-linear model also produces superior multiple-step-ahead forecasts in population according to the Giacomini and White test [17]. We also show that the superior performance of the non-linear model is achieved during periods of high volatility. This is especially important during financial crises, when investors are in particular need of more accurate forecasts. Finally, we derive a closed-form expression for the multiple-step-ahead forecast, where past returns govern changes in volatility regimes.
We find that changes in volatility regimes occur when returns cross a threshold of about −1%, which is in line with previous findings [9,18]. However, our work differs in the estimation procedure and in the use of a more recent dataset that includes financial crises. In fact, the superior performance of the non-linear model becomes particularly significant during periods of elevated volatility, such as the recent financial crises. More importantly, we derive a closed-form expression for multiple-step-ahead forecasts, whereas other authors either focus on one-step-ahead forecasts [9] or use conditional simulations [18].
The remainder of this paper is organized as follows. The non-linear threshold model for realized volatility is defined in Section 2. Section 3 describes preliminary data analysis and estimation results for the S&P 500 index. Section 4 describes one- and multiple-step-ahead forecasts. Finally, Section 5 concludes and provides directions for future work.

Model
In this section, we introduce two building blocks: the heterogeneous autoregressive model and the regime switching model. Then, we describe the econometric framework designed for the estimation and inference of our threshold autoregressive model. Finally, we discuss forecasting with our model and how to derive a closed-form expression for its multiple-days-ahead forecasts.

HAR-RV Model with Regime Switching
In this section, we discuss extensions of the heterogeneous autoregressive (HAR) model of realized volatility proposed in [16]. First, let us assume that returns follow a continuous diffusion process:

$$dp(t) = \mu(t)\,dt + \sigma(t)\,dW(t), \tag{1}$$

where $p(t)$ is the logarithm of the instantaneous price, $\mu(t)$ is a continuous mean process of finite variation, $\sigma(t)$ is the instantaneous volatility and $W(t)$ is a standard Brownian motion. Given the process in (1), the integrated variance corresponding to day $t$ is defined as:

$$IV^d_t = \int_{t-1}^{t} \sigma^2(\omega)\,d\omega. \tag{2}$$

Several authors show that, as the sampling frequency increases, the integrated variance $IV^d_t$ can be approximated by realized variance, defined as the sum of intraday squared returns [6,19,20]. In essence, volatility becomes observable and can be forecast using time series techniques.
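As a small illustration of the realized variance idea, the sketch below sums squared intraday log returns for one day. The price list and the function name `realized_variance` are hypothetical and only serve the example; the sampling interval is assumed to be fixed.

```python
import math

def realized_variance(prices):
    """Sum of squared intraday log returns for one trading day."""
    log_prices = [math.log(p) for p in prices]
    returns = [b - a for a, b in zip(log_prices, log_prices[1:])]
    return sum(r * r for r in returns)

# Hypothetical intraday prices sampled at a fixed interval.
prices = [100.0, 100.5, 100.2, 100.9, 100.4]
rv = realized_variance(prices)
print(rv, math.sqrt(rv))  # realized variance and realized volatility
```

The square root of the result is the realized volatility for that day; with noisy tick data a robust estimator would normally replace this plain sum.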
The presence of market microstructure noise makes realized variance an inconsistent and biased estimator of true volatility. Therefore, we use the realized kernel estimator developed in [21], which remains consistent in the presence of market microstructure noise. The realized kernel $RK_{K,\delta}$ is an estimator of latent realized variance, defined in Equation (3) as a weighted sum of the autocovariances of intraday returns, where the weights are given by a weight function and $p_{i,t}$ is the $i$-th intra-daily log price sampled at frequency $\delta$ and recorded on day $t$. In other words, $i = 1, \ldots, n(\delta)$ and $n(\delta) = n_{seconds}/\delta$, where $n_{seconds}$ is the number of seconds in the trading day. Thus, the realized kernel is similar to the heteroskedasticity and autocorrelation consistent (HAC) estimator of the variance-covariance matrix of a stationary time series. Throughout this paper, realized variance refers to the realized kernel measure defined in Equation (3).
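To make the kernel-weighted construction concrete, here is a minimal sketch of a realized kernel. It assumes a Parzen weight function and a user-supplied bandwidth; both are assumptions made for illustration (the weight function and bandwidth rule used in the paper are not reproduced here), and the function names are hypothetical.

```python
def parzen(x):
    """Parzen weight function on [0, 1]; zero outside."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def autocovariance(returns, h):
    """h-th realized autocovariance: sum of r_j * r_{j-h}."""
    return sum(returns[j] * returns[j - h] for j in range(h, len(returns)))

def realized_kernel(log_prices, bandwidth):
    """Kernel-weighted sum of realized autocovariances of intraday returns."""
    r = [b - a for a, b in zip(log_prices, log_prices[1:])]
    rk = autocovariance(r, 0)
    for h in range(1, bandwidth + 1):
        # gamma_h equals gamma_{-h}, so each positive lag is counted twice.
        rk += 2.0 * parzen(h / (bandwidth + 1)) * autocovariance(r, h)
    return rk
```

With a bandwidth of zero the estimator collapses to the plain realized variance; larger bandwidths add down-weighted autocovariance terms, which is what gives the estimator its robustness to noise.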
The realized kernel has several advantages over other high-frequency proxies of latent volatility. First, [22] show that the realized kernel performs better (in terms of forecasting value-at-risk) than other high-frequency measures, including realized volatility, bi-power realized volatility, two-scales realized volatility and daily range. Second, the realized kernel is a consistent estimator of latent variance that is robust to market microstructure noise.
The heterogeneous autoregressive model is able to replicate the majority of stylized facts observed in the data: fat tails, volatility clustering and long memory. In particular, HAR is able to generate hyperbolic decay in the autocorrelation function in a parsimonious way owing to the volatility cascade property, despite the fact that it does not belong to the class of long memory models. The model is based on the heterogeneous market hypothesis [23], which implies that lower frequency volatility (weekly) affects higher frequency volatility (daily), but not vice versa:

$$RV^d_{t+1} = \alpha + \beta^d RV^d_t + \beta^w RV^w_t + \beta^m RV^m_t + \epsilon_{t+1}, \tag{4}$$

where $RV^d_t$, $RV^w_t$ and $RV^m_t$ are daily, weekly and monthly realized variance, respectively, at period $t$. The lower frequency realized variance, for example weekly, is computed as:

$$RV^w_t = \frac{1}{5}\left(RV^d_t + RV^d_{t-1} + \ldots + RV^d_{t-4}\right). \tag{5}$$

Similarly, the monthly realized variance is computed as the average of daily variances over 22 days. Although the HAR model is able to capture long memory and volatility clustering, it cannot explain abrupt changes in regimes. Indeed, the recent subprime mortgage crisis, the European debt turmoil and a number of other financial calamities led to significantly different behaviour in the dynamics of realized volatility during "good" and "bad" times, as we will discuss in Section 3. Therefore, we propose to extend the benchmark HAR model and allow for multiple regimes, governed by either endogenous or exogenous variables. We define the threshold HAR model with two regimes as follows:

$$RV^d_{t+1} = \left(\alpha_1 + \beta^d_1 RV^d_t + \beta^w_1 RV^w_t + \beta^m_1 RV^m_t\right) 1(T_{t-l} < \tau) + \left(\alpha_2 + \beta^d_2 RV^d_t + \beta^w_2 RV^w_t + \beta^m_2 RV^m_t\right) 1(T_{t-l} \ge \tau) + \epsilon_{t+1}, \tag{6}$$

where $T_{t-l}$ is a trigger variable with some lag $l$ and $\tau$ is the value of the threshold. In this paper, we consider only observable triggers, including returns and the realized kernel.
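The regressors of the HAR model and the regime split of its threshold version can be sketched as follows. The helper names, the list-based data layout and the convention that the first block of the row belongs to the regime with the trigger below the threshold are assumptions made for this illustration.

```python
def har_regressors(rv, t):
    """Intercept plus daily, weekly (5-day) and monthly (22-day) averages of rv ending at index t."""
    daily = rv[t]
    weekly = sum(rv[t - 4 : t + 1]) / 5.0
    monthly = sum(rv[t - 21 : t + 1]) / 22.0
    return [1.0, daily, weekly, monthly]

def threshold_har_row(rv, trigger, t, tau, lag=0):
    """Row of the two-regime design matrix: the HAR regressors fill the block of the active regime."""
    x = har_regressors(rv, t)
    zeros = [0.0] * len(x)
    below = trigger[t - lag] < tau  # regime 1: lagged trigger below the threshold
    return x + zeros if below else zeros + x
```

Stacking such rows for all `t` gives a matrix with eight columns in the two-regime case, so that a single least squares regression returns a separate set of HAR coefficients for each regime.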

Econometric Framework for the Non-Linear Model

Estimation
Next, we present the econometric techniques designed to model non-linear dynamics of time series: the self-exciting threshold autoregressive (SETAR) model and the threshold autoregressive (TAR) model introduced by [24] and [25]. The main difference between these models is that the trigger variable is exogenous in the TAR model and endogenous in the SETAR model. The TAR(m) model, where $m$ denotes the number of regimes, is defined as follows:

$$Y_{t+1} = \sum_{j=1}^{m} \theta_j' X_t\, 1_{j,t}(\tau, l) + \epsilon_{t+1}, \tag{7}$$

where $Y_{t+1}$ is a univariate time series, $X_t = (1, Y_t, \ldots, Y_{t-p})'$ is a $(p + 1) \times 1$ vector, $\tau = (\tau_1, \ldots, \tau_{m-1})$ with $\tau_1 < \tau_2 < \ldots < \tau_{m-1}$, $1_{j,t}(\tau, l) = 1(\tau_{j-1} \le T_{t-l} < \tau_j)$, $1(\cdot)$ is an indicator function and $T_{t-l}$ is a threshold variable. Let us assume that $\tau_0 = -\infty$ and $\tau_m = \infty$, while the error term $\epsilon_{t+1}$ is conditionally independent of the information set $I_t$ and has a finite second moment (Assumption (8)). In particular, if $Y_{t+1}$ follows the TAR(2) process, then Model (7) becomes:

$$Y_{t+1} = \theta_1' X_t\, 1(T_{t-l} < \tau) + \theta_2' X_t\, 1(T_{t-l} \ge \tau) + \epsilon_{t+1}. \tag{9}$$

Recall that Model (9) nests the non-linear HAR specification (6) if we put constraints on the corresponding AR(22) model in each regime. Now, define the vector of all parameters of Model (9) as $\theta = (\theta_1, \theta_2, \ldots, \theta_m, \tau, l)$. Under Assumption (8), the estimation of the TAR(m) model is performed using non-linear least squares:

$$\hat{\theta} = \arg\min_{\theta} \sum_{t} \Big( Y_{t+1} - \sum_{j=1}^{m} \theta_j' X_t\, 1_{j,t}(\tau, l) \Big)^2.$$

Here, the minimization can be done sequentially. In particular, $\hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_m)$ can be computed through an OLS regression of $Y$ on $X(\tau, l)$ for fixed parameters $l$ and $\tau$:

$$\hat{\theta}(\tau, l) = \big(X(\tau, l)' X(\tau, l)\big)^{-1} X(\tau, l)' Y,$$

where $Y$ is the $T \times 1$ vector of observations of $Y_{t+1}$, while $X(\tau, l)$ is the $T \times 4m$ matrix with $t$-th row $X_t(\tau, l)$:

$$X_t(\tau, l) = \big(X_t'\, 1_{1,t}(\tau, l), \ldots, X_t'\, 1_{m,t}(\tau, l)\big).$$

Now, let us assume for simplicity that the non-linear model has only two regimes, i.e., $m = 2$.
Thus, the two parameters $\tau$ and $l$ can be estimated by minimizing the residual sum of squares $S(\tau, l)$:

$$(\hat{\tau}, \hat{l}) = \arg\min_{\tau, l} S(\tau, l),$$

where $S(\tau, l) = \big(Y - X(\tau, l)\hat{\theta}(\tau, l)\big)'\big(Y - X(\tau, l)\hat{\theta}(\tau, l)\big)$. The minimization can be performed through a grid search, noting that $l$ is discrete. We follow the approach of [26], which speeds up the minimization algorithm. In particular, [26] recommends eliminating the smallest and largest quantiles of the threshold variable from the grid search. This elimination not only reduces the computational time, but also serves as a necessary condition for having enough observations in each regime. Indeed, asymptotic theory requires the share of observations in each regime, $n_j/T$, to stay above a fixed trimming fraction as the sample size grows. Although there is no clear procedure for choosing this fraction optimally, [26] recommends a 10% quantile for the cut-off.
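The grid search with trimming described above can be sketched as follows. To keep the example short and free of dependencies, each regime is a simple regression on an intercept and one lagged value rather than the full HAR specification, the lag of the trigger is held fixed, and the data are simulated; all names and numbers are hypothetical.

```python
import random

def ols_ssr(x, y):
    """Sum of squared residuals from regressing y on an intercept and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))

def grid_search_threshold(y_next, y_lag, trigger, trim=0.10):
    """Return (tau_hat, ssr) minimizing the two-regime SSR over a trimmed grid."""
    ordered = sorted(trigger)
    n = len(ordered)
    lo, hi = int(trim * n), int((1.0 - trim) * n)
    best = (None, float("inf"))
    for tau in ordered[lo:hi]:  # candidate thresholds between the trimmed quantiles
        low = [(x, y) for x, y, q in zip(y_lag, y_next, trigger) if q < tau]
        high = [(x, y) for x, y, q in zip(y_lag, y_next, trigger) if q >= tau]
        if len(low) < 3 or len(high) < 3:
            continue
        ssr = ols_ssr(*zip(*low)) + ols_ssr(*zip(*high))
        if ssr < best[1]:
            best = (tau, ssr)
    return best

# Simulated data with a true threshold of -0.5 on the trigger variable.
rng = random.Random(0)
trigger = [rng.uniform(-2.0, 2.0) for _ in range(300)]
y_lag = [rng.gauss(0.0, 1.0) for _ in range(300)]
y_next = [(1.0 + 0.2 * x if q < -0.5 else 3.0 + 0.8 * x) + rng.gauss(0.0, 0.1)
          for x, q in zip(y_lag, trigger)]
tau_hat, ssr = grid_search_threshold(y_next, y_lag, trigger)
print(tau_hat)
```

Fitting each regime separately gives the same residual sum of squares as one regression on the stacked regime-specific regressors, which is why the loop can evaluate each candidate threshold with two small regressions.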

Testing for Non-Linearity
We start by discussing the testing of the linear model, or TAR(1), against the non-linear model TAR(m), where $m > 1$. Under the null hypothesis, all parameters $\theta_1, \ldots, \theta_m$ should be the same:

$$H_0: \theta_1 = \theta_2 = \ldots = \theta_m. \tag{13}$$

Since the threshold parameter is not identified under the null hypothesis, the classical tests have a non-standard distribution. This problem is called "Davies' problem" due to [27,28]. [26,29] overcome this problem by using empirical process theory and derive the limiting distribution of the main statistic of interest, $F_{jk}$:

$$F_{jk} = T\,\frac{S_j - S_k}{S_k},$$

where $S_j$ and $S_k$ are the sums of squared residuals of the TAR(j) and TAR(k) models and $k > j$. Computation of the asymptotic distribution is not straightforward, but might be faster than a bootstrap calculation. Although the literature does not assess the performance of the asymptotic against the bootstrap distribution in the context of SETAR models, [30] show that the bootstrap technique performs better in the AR(1) context with the Andrews structural change test [31]. Thus, we use the following bootstrap algorithm for testing the linear model against the non-linear TAR(2) model:
1. Draw residuals with replacement from the linear TAR(1) model.
3. Estimate the TAR(1) and TAR(2) models on the "fake" dataset.
The algorithm in (1)-(7) can be used to evaluate the distribution of $F_{12}$ under the assumption of either homoscedastic or heteroscedastic errors. We compute the bootstrap p-value under the latter assumption, since the residuals of Model (4) are heteroscedastic. This is in line with the literature [32]. These diagnostic tests are available upon request.
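A residual bootstrap for the $F_{12}$ statistic can be sketched as below. This is a generic illustration under simplifying assumptions, not a reproduction of the authors' algorithm: it treats the regressor and the trigger as fixed, resamples residuals as if they were homoscedastic, uses one regressor per regime, and takes $F_{12} = T(S_1 - S_2)/S_2$. All function names are hypothetical.

```python
import random

def fit_line(x, y):
    """OLS of y on an intercept and x; returns (intercept, slope, ssr)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, ssr

def ssr_tar2(x, y, q, trim=0.10):
    """Smallest two-regime SSR over a trimmed grid of thresholds on q."""
    grid = sorted(q)[int(trim * len(q)) : int((1 - trim) * len(q))]
    best = float("inf")
    for tau in grid:
        lo = [(a, b) for a, b, c in zip(x, y, q) if c < tau]
        hi = [(a, b) for a, b, c in zip(x, y, q) if c >= tau]
        if len(lo) < 3 or len(hi) < 3:
            continue
        best = min(best, fit_line(*zip(*lo))[2] + fit_line(*zip(*hi))[2])
    return best

def f12(x, y, q):
    """F statistic comparing the linear fit (S1) with the two-regime fit (S2)."""
    s1, s2 = fit_line(x, y)[2], ssr_tar2(x, y, q)
    return len(y) * (s1 - s2) / s2

def bootstrap_pvalue(x, y, q, n_boot=99, seed=0):
    """Observed F12 and the share of bootstrap statistics that reach it."""
    rng = random.Random(seed)
    a, b, _ = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    observed = f12(x, y, q)
    exceed = 0
    for _ in range(n_boot):
        # "Fake" data generated under the null of a single linear regime.
        y_star = [fi + rng.choice(resid) for fi in fitted]
        if f12(x, y_star, q) >= observed:
            exceed += 1
    return observed, exceed / n_boot
```

A small p-value indicates that the improvement in fit from the second regime is larger than what resampling under the linear model typically produces. A version that respects heteroscedastic errors would need a different resampling scheme, for example a wild bootstrap.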

Testing for Remaining Non-Linearity
Testing for remaining non-linearity is an important diagnostic check for the TAR(m) model. One way to address this question is to test whether an additional regime is statistically significant. This test relies on the aforementioned algorithm, with the bootstrap p-value computed for the statistic $F_{j,j+1}$, where $j > 1$.

Asymptotic Distribution of the Threshold Parameter
The existing literature documents that the distribution of the estimate of $\tau$ is non-standard if the threshold effect is significant [26,33]. [29,34] derive the asymptotic distribution of the likelihood ratio statistic:

$$LR_1(\tau) = \frac{S_1(\tau) - S_1(\hat{\tau})}{\hat{\sigma}^2},$$

where $S_1(\tau)$ is the residual sum of squares given parameter $\tau$ and $\hat{\sigma}^2$ is the variance of the residuals of the TAR(2) model, equal to $S_1(\hat{\tau})/(T - 4)$. Moreover, [29,34] show that the confidence interval for the threshold parameter is obtained by inverting the distribution function of a limiting random variable. In other words, the null hypothesis $H_0: \tau = \tau_0$ is rejected if $LR_1(\tau_0)$ exceeds the critical value $c(\alpha)$. Alternatively, the confidence interval for the threshold parameter is formed as the region where $LR_1(\tau) \le c(\alpha)$ and is called the "no-rejection region". The confidence interval for the threshold parameter $\tau$ has to be interpreted with caution, since it is typically conservative [26,29]. However, the ultimate test of our non-linear model is its ability to produce superior out-of-sample forecasts, which requires a tight confidence interval for the threshold parameter. We return to this point when discussing the estimation results.
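The "no-rejection region" can be computed directly from the residual sums of squares stored during the grid search. The sketch below assumes the critical value takes the closed form $-2\ln(1 - \sqrt{1 - \alpha})$ known from Hansen's threshold results, with $1 - \alpha$ the confidence level; the function names and the dictionary layout are hypothetical.

```python
import math

def hansen_critical_value(confidence=0.95):
    """Critical value from inverting the limiting distribution: -2 ln(1 - sqrt(confidence))."""
    return -2.0 * math.log(1.0 - math.sqrt(confidence))

def threshold_confidence_set(ssr_by_tau, n_obs, n_params, confidence=0.95):
    """Thresholds whose LR statistic does not exceed the critical value."""
    ssr_min = min(ssr_by_tau.values())
    sigma2 = ssr_min / (n_obs - n_params)  # residual variance at the estimated threshold
    c = hansen_critical_value(confidence)
    return sorted(tau for tau, ssr in ssr_by_tau.items()
                  if (ssr - ssr_min) / sigma2 <= c)
```

Because the set collects every grid point that passes the test, it need not be a single interval, which is consistent with confidence regions made of disjoint pieces.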
Although the estimates $\hat{\theta}_1, \ldots, \hat{\theta}_m$ depend on the threshold parameter $\tau$, their asymptotic distribution remains the same as in the linear model, since the estimate $\hat{\tau}$ is super-consistent [35]. [33] and [26] prove that the dependence on the threshold parameter is not of first-order asymptotic importance; thus, the confidence interval for $\theta$ can be constructed as if $\tau$ were known.

Stationarity
The stationarity conditions for our TAR(2) model are not easily derived, and in general, not much is known about this property for non-linear models with heteroskedastic errors; see the discussion in [35] (pp. 79-80). The literature does propose sufficient conditions for a restricted class of non-linear models, typically for models with homoscedastic errors. In particular, [36] consider the SETAR(2) specification with an AR(1) model in both regimes, while [37] establish necessary and sufficient conditions for the existence of a stationary distribution for TAR(2) and SETAR(2) models with an AR(1) process.
In contrast, our model has a richer structure within each regime, since the HAR model is a restricted version of the AR(22) process. Because of this richer structure within each regime and because neither self-exciting nor exogenous thresholds are used, it is not possible to use the results from [36] and [37] to prove stationarity. In addition, our residuals exhibit volatility clustering, and because of the heteroscedastic errors, it is not possible to exploit the necessary and sufficient conditions for strict stationarity, even for the simple HAR model derived by [9]. Diagnostic checks show that the assumption of homoscedastic errors does not hold.
In conclusion, as is the case in much empirical work, we have to make a trade-off between the flexibility of the model and the analytical tractability of stationarity conditions.In this paper, we choose to design a model aiming at providing more accurate volatility forecasts, and we leave the question of stationarity for future work.

One-Step-Ahead Forecast
We assess the forecasting performance of various models by computing one-step-ahead forecasts of realized volatility, measured by the square root of the realized kernel. These forecasts are computed through rolling window estimation. First, the parameters of the model are estimated on the in-sample set, and the one-step-ahead forecast is computed. Second, the rolling window is moved one period ahead: the most distant observation is dropped and the parameters of the model are re-estimated, while the threshold parameter $\tau$ and the optimal lag $l$ are kept time invariant. Finally, the one-step-ahead forecast is computed again.
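The rolling window scheme can be sketched as a loop that re-estimates the model on a fixed-length window and forecasts one period ahead. For brevity the model here is a plain AR(1) fitted by least squares, which is an assumption of the example and stands in for any of the models compared; the function names are hypothetical.

```python
def fit_ar1(y):
    """OLS of y[t+1] on an intercept and y[t]; returns (intercept, slope)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / sxx
    return mz - slope * mx, slope

def rolling_one_step_forecasts(y, window):
    """One-step-ahead forecasts from a fixed-length window moved one period at a time."""
    forecasts = []
    for end in range(window, len(y)):
        sample = y[end - window : end]  # the oldest observation drops out at each step
        intercept, slope = fit_ar1(sample)
        forecasts.append(intercept + slope * sample[-1])  # forecast of y[end]
    return forecasts
```

The forecast at position `k` of the returned list targets `y[window + k]`, so it can be compared directly with `y[window:]` when computing loss measures.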
We use the root mean square error (RMSE) and the mean absolute error (MAE) to compare the forecast performance of four models:

$$RMSE = \sqrt{\frac{1}{N}\sum_{t} \big(Y_{t+1} - \hat{Y}_{t+1|t}\big)^2}, \qquad MAE = \frac{1}{N}\sum_{t} \big|Y_{t+1} - \hat{Y}_{t+1|t}\big|,$$

where $\hat{Y}_{t+1|t}$ is the one-step-ahead conditional forecast of the daily realized volatility computed from the rolling window for one of the four models, $Y_{t+1}$ is the daily realized volatility at period $t + 1$ and $N$ is the number of out-of-sample forecasts. In addition, we compute the $R^2$ of the following Mincer-Zarnowitz regression:

$$Y_{t+1} = b_0 + b_1 \hat{Y}_{t+1|t} + u_{t+1}.$$

Finally, we investigate the forecasting performance of different models in population using the Giacomini and White (GW) test [17]. The GW test fits well in our framework for the following reasons. First, it does not favour models that overfit in-sample but have high estimation errors. Second, the test is designed to compare not only unconditional, but also conditional forecasts. Finally, the GW test works with rolling window forecasts, where the in-sample size is fixed, while the out-of-sample size is growing.
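The three loss measures can be computed in a few lines. The sketch below uses the fact that, in a regression on an intercept and a single regressor, $R^2$ equals the squared sample correlation between the realized value and the forecast; the function names are hypothetical.

```python
import math

def rmse(actual, forecast):
    """Root mean square forecast error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute forecast error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mincer_zarnowitz_r2(actual, forecast):
    """R^2 from regressing the realized value on an intercept and the forecast."""
    n = len(actual)
    ma, mf = sum(actual) / n, sum(forecast) / n
    sff = sum((f - mf) ** 2 for f in forecast)
    saa = sum((a - ma) ** 2 for a in actual)
    saf = sum((a - ma) * (f - mf) for a, f in zip(actual, forecast))
    return saf * saf / (sff * saa)  # squared correlation equals R^2 with one regressor
```

Note that the Mincer-Zarnowitz $R^2$ is invariant to a linear rescaling of the forecast, so it complements rather than replaces RMSE and MAE, which penalize bias directly.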

Conditional Distribution of Returns
In this section, we discuss multiple-step-ahead forecasts for aggregate volatility over periods of five and 10 days. The extension of the multiple-step-ahead forecast to the linear model is straightforward, while the non-linear model has one important problem. We describe the formulas used to compute the multiple-step-ahead forecast for the HAR, GARCH(1,1) and GJR-GARCH(1,1) (proposed by [38]) models in Appendix A. In particular, the one-step-ahead forecast remains the same for both the non-linear and linear cases, while the two-step-ahead forecast is different:

$$E[Y_{t+2} \mid I_t] = E\big[F(Y_{t+1}; \hat{\theta}) \mid I_t\big] \ne F\big(E[Y_{t+1} \mid I_t]; \hat{\theta}\big), \tag{19}$$

where $I_t$ is the information set available at period $t$, $F$ is a non-linear function, $\hat{\theta}$ is a vector of estimates and $Y_t$ is the realized volatility at period $t$. Equation (19) illustrates the main problem related to the non-linear model: the expected value of a non-linear function differs from the value of the non-linear function evaluated at the expected value. In the literature, several methods have been proposed for the computation of the multiple-step-ahead forecast, including conditional simulations in [18]. However, we choose a different strategy and derive a closed-form solution for the multiple-step forecast. Specifically, we follow an approach similar to [39] and [40] to derive the conditional distribution of returns. Given the diffusion process (1), the standardized returns should follow a normal distribution (Equation (20)) conditional on $I_t = F(r_t, r_{t-1}, \ldots)$, the information set at period $t$ generated by the history of returns, with mean $\mu_N$ and variance $\sigma^2_N$ that should be close to zero and one, respectively. See Table B1 in Appendix B for details. Meanwhile, the conditional distribution of realized volatility is closely approximated by the inverse Gaussian distribution with the following density function:

$$f(x; \sigma_{t+1}, \lambda_{IG}) = \sqrt{\frac{\lambda_{IG}}{2\pi x^3}} \exp\left(-\frac{\lambda_{IG}\,(x - \sigma_{t+1})^2}{2\sigma_{t+1}^2\, x}\right), \tag{21}$$

where $\sigma_{t+1}$ is the conditional mean and $\lambda_{IG}$ is the shape parameter of the inverse Gaussian distribution. The conditional mean is assumed to be filtered from the non-linear TAR(2) model (Equation (22)). Combining Equations (20) and (21), the conditional distribution of returns becomes a normal-inverse Gaussian (NIG) distribution, whose probability density function is given in Equation (23). The NIG distribution provides a relatively accurate fit of the unconditional distribution of returns (see Appendix B for details). Given the distributional assumption for returns, Theorem 1 demonstrates how to obtain the closed-form expression for the multiple-step-ahead forecast of the realized volatility.
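The gap between the expectation of a non-linear function and the function evaluated at the expectation can be seen in a two-point example. The forecasting rule, its coefficients and the two outcomes below are hypothetical numbers chosen only to make the arithmetic transparent.

```python
def regime_forecast(y, threshold=1.0):
    """Hypothetical two-regime rule: intercept and slope depend on the side of the threshold."""
    return 0.2 + 0.9 * y if y >= threshold else 0.1 + 0.5 * y

# Two equally likely values for next period's volatility (illustrative numbers).
outcomes = [0.6, 1.6]
mean_outcome = sum(outcomes) / len(outcomes)
forecast_at_mean = regime_forecast(mean_outcome)  # F(E[Y])
mean_of_forecasts = sum(regime_forecast(y) for y in outcomes) / len(outcomes)  # E[F(Y)]
print(forecast_at_mean, mean_of_forecasts)
```

Plugging the one-step forecast into the rule gives about 1.19, while averaging the rule over the two outcomes gives about 1.02. This is why a naive plug-in recursion is biased for threshold models and why either simulation or a distributional assumption is needed beyond one step.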
In essence, Formula (24) is similar to the multiple-step-ahead forecast of the GJR-GARCH(1,1) model; see Appendix A for details. However, the TAR model has additional flexibility, since the probability $\pi_t$ is time varying, while GJR-GARCH assumes that the corresponding probability equals 0.5. To facilitate comparison between these two models, we compute the unconditional probability of the high-volatility regime occurring, both from the NIG distribution (23) and from returns data. In the latter case, the probability is the frequency of returns that fall below the threshold value. The two methods agree closely: 11.3% (NIG) vs. 13.2% (historical returns) for in-sample data.
Finally, we describe the multiple-step-ahead forecast using the rolling window approach. First, the parameters of the model are estimated using in-sample data, and the probability $\pi_t$ is computed. Second, multiple-step-ahead forecasts for the TAR model are calculated based on Expression (24), while $\pi_t$ remains constant. The probability $\pi_t$ could also be recomputed at each forecast step, but this would add computational burden, while the results should change only marginally. In other words, we assume that $\pi_{t+h|t} = \pi_t \ \forall h$, where $\pi_{t+h|t} = \Pr[r_{t+h} < \tau \mid I_t]$. We compute h-step-ahead forecasts for the HAR, GARCH(1,1) and GJR-GARCH(1,1) models based on the formulas presented in Appendix A. Finally, the rolling window is moved one period ahead; the first observation is dropped, and the parameters of the model, including $\pi_{t+1}$, are re-estimated.
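The role of a constant regime probability in the multiple-step forecast can be illustrated with a simplified recursion. This is a sketch under strong assumptions and not the closed-form expression of the paper: each regime is an AR(1), the first step uses the regime that is already known, and later steps weight the two regimes by a fixed probability, treating the future regime as independent of the current volatility level. The function names and all numbers are hypothetical.

```python
def regime_probability(returns, tau):
    """Share of returns below the threshold tau (empirical probability of the high-volatility regime)."""
    return sum(1 for r in returns if r < tau) / len(returns)

def multi_step_forecast(y_t, high_now, theta_high, theta_low, pi_high, horizon):
    """Forecast path: the first step uses the known regime, later steps use the probability-weighted mix."""
    a0, b0 = theta_high if high_now else theta_low
    path = [a0 + b0 * y_t]
    a = pi_high * theta_high[0] + (1.0 - pi_high) * theta_low[0]
    b = pi_high * theta_high[1] + (1.0 - pi_high) * theta_low[1]
    for _ in range(horizon - 1):
        path.append(a + b * path[-1])
    return path

# Illustrative inputs: daily returns, a threshold, and (intercept, slope) pairs per regime.
returns = [-0.02, 0.01, -0.005, 0.003, -0.015, 0.0, 0.004, 0.002, -0.001, 0.006]
pi_high = regime_probability(returns, tau=-0.013)
path = multi_step_forecast(1.0, True, (0.2, 0.9), (0.1, 0.5), pi_high, horizon=5)
aggregate_5_day = sum(path)
print(pi_high, aggregate_5_day)
```

Setting the probability to 0.5 reproduces the kind of equal weighting that a symmetric-shock assumption implies, which is the sense in which a data-driven probability adds flexibility.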

Data
The empirical analysis is based on high-frequency data for the S&P 500 index obtained through the Realized Library of the Oxford-Man Institute of Quantitative Finance (Library Version 0.2), which is freely available: "Researchers may use this library freely without restrictions so long as they quote in any work which uses it: Heber, Gerd, Asger Lunde, Neil Shephard and Kevin Sheppard (2009) "Oxford-Man Institute's realized library", Oxford-Man Institute, University of Oxford." The sample covers the period from 3 January 2000 to 12 June 2014, 3603 trading days overall. We exclude all days on which the market was closed. [41] have created the Realized Library database, which provides daily data for about 11 realized measures for 21 assets. The authors clean the raw data obtained through Reuters Data Scope Tick History and compute high-frequency estimators from the cleaned data. We use the realized kernel [21] as a proxy for integrated variance.

Preliminary Data Analysis
We start with data analysis of five main time series of interest: standardized returns, returns, realized variance, realized volatility and the logarithm of realized variance. Table 1 presents the descriptive statistics, while Figure 1 illustrates the time series dynamics of these variables.

Table 1. Diagnostic tests (p-values, one column per series).
Stationarity test (ADF)      p = 0.00   p = 0.00   p = 0.00   p = 0.00   p = 0.06
Normality test (J-B test)    p = 0.00   p = 0.00   p = 0.00   p = 0.00   p = 0.00
L-B test 5 lags              p = 0.01   p = 0.00   p = 0.00   p = 0.00   p = 0.00
L-B test 10 lags             p = 0.08   p = 0.00   p = 0.00   p = 0.00   p = 0.00
L-B test 15 lags             p = 0.07   p = 0.00   p = 0.00   p = 0.00   p = 0.00
ARCH effect                  p = 0.00   p = 0.00   p = 0.00   p = 0.00   p = 0.00

Four of the variables are stationary at 5% according to the augmented Dickey-Fuller test, while $\log(\sqrt{RV_t})$ is stationary at 6%. The recent financial crises and the European sovereign debt turmoil affected the volatility pattern and led to several spikes in the realized variance series. Although these spikes look less pronounced in the logarithm of realized variance, they remain very distinct from the volatility behaviour observed during calm times. This observation motivates the introduction of a regime switching model for the volatility process.
Daily returns are weakly correlated and follow a leptokurtic and negatively skewed distribution. By contrast, the distribution of the standardized returns is much closer to Gaussian, which is in line with previous empirical findings [10,42]. Figure 2 documents the long memory observed in realized volatility, as the autocorrelation function decays at a hyperbolic rate. This result is also consistent with the literature [6,15,32].

Figure 2. Sample autocorrelations and partial autocorrelations of returns and realized volatility.

Benchmark HAR Model
We start with the estimation of the benchmark linear Model (4) for three specifications of the dependent variable: $RV$, $\sqrt{RV}$ and $\log(\sqrt{RV})$. Table 2 presents the estimation results, with standard errors computed from the HAC variance-covariance matrix. In particular, benchmark Model (4) underestimates volatility by around 40% during the financial crises of 2007-2009. A similar pattern is observed during the spikes in volatility in 2010 and 2011. One explanation for the poor performance of the HAR model during turbulent periods is that it fails to take into account changes in volatility regimes. Indeed, if volatility reacts to negative returns more than to positive returns, then the arrival of successive negative shocks, together with volatility persistence, can substantially increase the future volatility level. On the other hand, different economic regimes might affect volatility differently. We choose the TAR over the SETAR model based on the higher value of the $F_{12}$ statistic or, alternatively, the lower bootstrap p-value defined in Subsection 2.2.2. These results are available upon request.

The TAR(2) Model
Next, we estimate the TAR(2) model (Tables 3 and 4), where past returns govern changes in the volatility regimes.
Table 3 shows that the regression $R^2$ improves substantially if regimes are driven by past returns. As a result, high values of the $F_{12}$ statistic lead to the rejection of the null hypothesis (13) for all specifications at the 5% significance level. In addition, the optimal value of the threshold parameter remains the same for two specifications: $RV_t$ and $\sqrt{RV_t}$. The $\tau$ that corresponds to the logarithmic specification is closely related to the second threshold of the TAR(3) model. However, the confidence interval for this parameter is very wide, which leads to an imprecise estimate of the threshold parameter. Not surprisingly, this model produces a less accurate one-step forecast than TAR(2). In particular, [43] document that an imprecise estimate of the threshold parameter leads to poor forecasting performance of the simple switching model compared with the random walk model. In both cases, changes in regimes are driven not merely by negative returns (leverage effect), but by significantly negative returns: −1.3% on a daily scale. [9] also show that the transition between volatility regimes is governed not by negative past returns, but by "very bad news" or very negative past returns. The fact that changes in regimes are triggered by "very negative returns" can be explained by volatility persistence and a higher intensity of shocks during bad times. Although the value of the threshold is not very large (it corresponds to the 11th percentile of the returns distribution), an increasing number of negative returns can generate a spike in volatility. This explanation is similar to the option pricing literature, where researchers model volatility by adding infinite activity jumps to the return process [44]. Even though the appearance of one small or medium jump is not enough to generate a significant surge in volatility, high volatility persistence can lead to pronounced spikes in future volatility. Indeed, Figure 4 shows that the frequency of returns below the threshold (red line) increased dramatically during the recent financial crises. By contrast, returns that exceed the threshold (blue line) completely dominated "very negative returns" during the period of low volatility in 2003-2007.
Table 4 shows that the parameters $\beta^d$, $\beta^w$ and $\beta^m$ are very different in the high- and low-volatility regimes. In particular, $\beta^w_1$ is twice as large as the corresponding estimate in the low-volatility regime for the $\sqrt{RV_t}$ specification. Although some estimates have negative signs, they are not statistically significant at 10% for either the realized volatility or the variance model. By contrast, the intercepts in both regimes are significantly negative for the logarithmic specification. Overall, the corresponding estimates differ substantially across regimes, which highlights the importance of using the regime switching model. Next, Figure 5 shows that the 95% confidence interval for the threshold parameter is quite narrow ($\tau_{opt} \in [-0.014, -0.012]$), although it includes two disjoint sets. Finally, we compare the in-sample performance of the SETAR(2) and TAR(2) models for different indices, including both developing and developed countries: Bovespa (Brazil), DAX (Germany) and IPC Mexico (Mexico). The main findings remain robust across these indices: the non-linear model with an exogenous trigger is preferred over the corresponding specification with the endogenous variable. These results are available upon request.

Forecast
In this section, we discuss one- and multiple-step-ahead forecasts of realized volatility based on the TAR(2) model and several competing benchmarks. We assess their forecasting performance using low- and high-volatility periods.

One-Day-Ahead Forecast
We start with the one-day-ahead forecast of the realized volatility, which is measured as the square root of the realized kernel. The in-sample period covers 1968 days from January 2000 to January 2008. In addition to the HAR model, we choose several GARCH specifications as benchmarks, including symmetric GARCH(1,1) and asymmetric GJR-GARCH(1,1). [45] show that it is extremely hard to outperform a simple GARCH(1,1) model in terms of forecasting ability. Meanwhile, TAR(2) is a non-linear model; therefore, we need to add an asymmetric GARCH specification to guarantee a "fair" model comparison. Figure 6 and Table 5 assess the forecasting performance of high- and low-frequency models.

Table 5. One-day-ahead out-of-sample forecast (although realized volatility ignores overnight returns, the superior performance of the high-frequency models is unlikely to be affected).

Next, we investigate whether the TAR forecast remains superior in population using the Giacomini and White test. Recall that the GW test is designed for the situation where the in-sample size is fixed, while the out-of-sample size is growing. Thus, we assess the forecasting performance of different models using the GW test only for the period from January 2008 to June 2014 and not for the U.S. and Eurozone financial crises separately. In the latter cases, the GW test is likely to perform poorly, since the samples are relatively short: 247 and 123 observations, respectively.
The main results of this comparison are the following. First, high-frequency models significantly outperform lower-frequency symmetric (GARCH) or asymmetric (GJR-GARCH) daily models. This result highlights the importance of measuring volatility more accurately using intra-daily data. Second, according to the first three metrics, the non-linear TAR(2) specification dominates the linear HAR model thanks to its additional flexibility to capture changes in regimes. Surprisingly, TAR(2) does not outperform the HAR model according to the GW test.
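A simplified version of such a predictive-ability comparison can be sketched as follows. This is not the full conditional GW test: it is the unconditional variant, which takes the familiar t-statistic form on the difference in absolute forecast errors, and it uses the plain sample variance instead of a HAC estimator (a simplification that is only reasonable for one-step-ahead forecasts). The function name `equal_mae_test` is hypothetical.

```python
import math


def equal_mae_test(actual, forecast_tar, forecast_bench):
    """Simplified unconditional test of equal expected absolute error.

    Returns (t_stat, p_value). The loss differential is
    d_t = |e_TAR,t| - |e_bench,t|, so a negative t_stat favours TAR.
    The p-value is one-sided: small values reject equal accuracy in
    favour of TAR. Assumption: no HAC correction is applied.
    """
    d = [abs(a - x) - abs(a - y)
         for a, x, y in zip(actual, forecast_tar, forecast_bench)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    t_stat = mean_d / math.sqrt(var_d / n)
    # Standard normal CDF evaluated at t_stat.
    p_value = 0.5 * (1.0 + math.erf(t_stat / math.sqrt(2.0)))
    return t_stat, p_value
```

Because the statistic relies on a growing out-of-sample size, applying it to very short windows such as the 123-observation Eurozone period would give unreliable p-values, which matches the restriction to the full out-of-sample period described above.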
Finally, we assess the performance of volatility forecasts during times of financial turmoil: the U.S. financial crisis in 2008 and the Eurozone crisis in 2011. Although high-frequency models continue to dominate GARCH specifications, the benefits of using the non-linear TAR(2) model become substantial compared to the linear specification: the latter's MAE is higher by 3% (U.S. crisis) and 6% (Eurozone crisis). By contrast, the MAE of the HAR model is only 1% higher over the whole out-of-sample period. Figure 7 shows that TAR(2) captures spikes in volatility better than the linear specification during the recent U.S. financial crisis. Finally, both RMSE and MAE are lower for the Eurozone crisis and the whole out-of-sample period than for the recent U.S. financial crisis, which reflects the learning process of the model, where recent volatility spikes help to improve the models' performance.
To sum up, the benefits of using the non-linear TAR(2) model are most evident during periods of elevated volatility. In addition, the model is able to predict spikes in volatility, even when we use a relatively calm period for in-sample estimation, since changes in regimes are driven by moderately low returns. As a result, we do not rely on extreme market events to forecast volatility. Our non-linear model outperforms its competitors thanks to its ability to capture different regimes in volatility and to measure volatility much more accurately than daily models. Moreover, our model achieves approximately the same rate of improvement over the HAR model as much more complicated non-linear models, but with lower computational costs, since the TAR(2) model has only two regimes. This feature is essential for practical applications. For example, [18] modelled realized volatility with five regimes and achieved an improvement in forecasting performance over the HAR model of around 3%.

Conclusions
This paper develops a non-linear threshold model for realized volatility (RV), allowing us to obtain a more accurate volatility forecast, especially during periods of financial crisis. The changes in volatility regimes are driven by negative past returns, where the threshold equals approximately −1%. This finding remains robust to different functional forms of volatility and different sets of indices from both developing and developed countries. The additional flexibility of the model allows one to produce a more accurate one-day-ahead forecast compared to the linear HAR specification and GARCH family models. More importantly, the superior multiple-step-ahead forecasting performance of TAR is achieved not only in particular samples, but also in population according to the GW test for the out-of-sample period from 2008 to 2014. Finally, we derive a closed-form solution for the multiple-step-ahead forecast, which is based on the NIG conditional distribution of returns. The non-linear threshold model primarily outperforms its competitors during periods of financial crisis.

and θ = (θ 1, θ 2)′. The one-step-ahead forecast is obtained as: Next, consider the two-step-ahead forecast from Equation (9): Simplifying the first summand S 1, we obtain: where Ŷ t (s) = Y t+s if s < 0. Finally, the formula for the multiple-step-ahead forecast Ŷ t (h) with h > 2 is extended recursively from Result (C5).

4. Compute S b 1 and S b 2 on the fake dataset, where b refers to the specific bootstrap replication.
5. Compute the statistic F b 12 from (15).
6. Repeat Steps (1)-(5) a large number of times.
7. The bootstrap p-value (p bootstrap) equals the percentage of times that F b 12 exceeds the actual statistic F 12.
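The final steps of this procedure can be sketched in a few lines. The p-value computation follows Step 7 directly. The form of the statistic, however, is an assumption: Equation (15) is not reproduced here, so the sketch uses the common threshold-test form n(S1 − S2)/S2, where S1 and S2 are the residual sums of squares of the one-regime and two-regime models. The function names are hypothetical.

```python
def f_statistic(n, s1, s2):
    """Threshold-test statistic from two residual sums of squares.

    Assumption: F12 = n * (S1 - S2) / S2, with S1 from the one-regime
    model and S2 from the two-regime model; the paper's Equation (15)
    may differ in detail.
    """
    return n * (s1 - s2) / s2


def bootstrap_p_value(f_actual, f_bootstrap):
    """Share of bootstrap statistics that exceed the actual statistic."""
    exceed = sum(1 for f_b in f_bootstrap if f_b > f_actual)
    return exceed / len(f_bootstrap)
```

In practice `f_bootstrap` would hold one statistic per replication (500 in Table 3), each computed from a dataset simulated under the null of a single regime.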

Figure 1. Daily standardized returns, returns, realized variance, realized volatility and the logarithm of the realized variance of the S&P500 index. The sample period runs from January 2000 to June 2014 (3603 observations).

Figure 2. Sample autocorrelations and partial autocorrelations of returns and realized volatility.

Figure 3. In-sample comparison of actual realized volatility (blue line) and volatility recovered from the HAR model (red line). The in-sample period runs from February 2000 to June 2014 (3582 observations).

Figure 4. Daily returns in high (red line) and low (blue line) volatility regimes. The high (low) volatility regime occurs when the return is lower (higher) than the threshold. The sample period runs from February 2000 to June 2014 (3603 observations).

Figure 5. Ninety-five percent confidence interval for the threshold parameter of the TAR(2) model with the √RV t specification. The red line corresponds to c(0.05) ≈ 7, while the blue points represent LR.

Figure 6. Comparison of actual and one-day-ahead forecasts based on the TAR(2), HAR, GARCH(1,1) and GJR-GARCH(1,1) models from January 2008 to June 2014 (1614 observations). The red line indicates the one-step forecast, while the blue line shows the actual data.

Figure 7. Comparison of actual and one-day-ahead forecasts based on the TAR(2) and HAR models during the U.S. financial crisis from January 2008 to January 2009 (247 observations). Red and green lines indicate one-step forecasts based on the TAR(2) and HAR models, respectively, while the blue line shows the actual data.

Figure 8. Comparison of aggregate volatility over five days and corresponding forecasts based on the TAR(2), HAR, GARCH(1,1) and GJR-GARCH(1,1) models from January 2008 to June 2014 (1604 observations). The red line indicates the aggregate five-step forecast, while the blue line shows the actual data.

Table 2. Heterogeneous autoregressive model (HAR) estimation. Reported are in-sample estimation results of the linear HAR model and corresponding standard errors computed based on the HAC variance-covariance matrix. The in-sample period runs from February 2000 to June 2014 (3582 observations). Here, *** means that the corresponding p-value is lower than 0.01.

The benchmark model fails to model spikes in volatility during turbulent times on financial markets. Figure 3 illustrates this point and depicts a comparison between the in-sample forecast and the actual realized kernel.

Table 3. Comparison of the TAR(1) (or HAR) and TAR(2) models. Reported are in-sample estimation results of the linear HAR model and the non-linear TAR(2) model. The in-sample period runs from February 2000 to June 2014 (3582 observations). p bootstrap is computed based on 500 replications using the heteroscedastic bootstrap method. We set the maximum number of lags equal to 10 in the TAR estimation.

Table 4. TAR(2) estimation. Reported are in-sample estimation results of the non-linear TAR(2) model and corresponding standard errors computed based on the HAC variance-covariance matrix. The in-sample period runs from February 2000 to June 2014 (3582 observations). The first four rows correspond to the high-volatility regime, while the last four rows correspond to the low-volatility regime. Here, *** and * mean that the corresponding p-values are lower than 0.01 and 0.1, respectively.

Notes to Table 5. The first four columns correspond to the period of the recent financial crisis in the U.S. from January 2008 to January 2009 (247 observations). The next four columns correspond to the Eurozone crisis from July 2011 to December 2011 (123 observations). The last four columns correspond to the period from January 2008 to June 2014 (1614 observations). The performance metrics are root mean square error (RMSE), mean absolute error (MAE), the R 2 of the Mincer-Zarnowitz regression and the p-value of the Giacomini and White test based on the MAE metric. Two forecasts are identical in population under the null hypothesis, while TAR beats its competitors under the alternative. We compare TAR against all other models, while NA corresponds to the TAR vs. TAR case. The TAR column reports the actual values of RMSE and MAE, while the entries in the HAR, GARCH and GJR columns for the RMSE and MAE rows equal the ratio of the TAR model's value to that of the corresponding benchmark. Thus, a number below one indicates an improvement of the TAR model over its competitor. Observations for RMSE and MAE of the TAR model are standardized by 1000.
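The performance metrics named above can be computed as in the following minimal sketch. It assumes plain lists of actual and forecast values. It relies on the fact that the R² of the Mincer-Zarnowitz regression (actual values regressed on a constant and the forecast) equals the squared sample correlation between the two series. The function names are hypothetical.

```python
import math


def rmse(actual, forecast):
    """Root mean square forecast error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mae(actual, forecast):
    """Mean absolute forecast error."""
    n = len(actual)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n


def mincer_zarnowitz_r2(actual, forecast):
    """R^2 of the regression actual = a + b * forecast + u.

    For a single regressor with an intercept, this equals the squared
    sample correlation between actual and forecast.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_f = sum(forecast) / n
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    return cov * cov / (var_a * var_f)


def ratio_to_benchmark(tar_value, benchmark_value):
    """Ratio of a TAR error metric to a benchmark's; below one favours TAR."""
    return tar_value / benchmark_value
```

For instance, a ratio of 0.97 in an MAE row would mean that the TAR model's MAE is 3% below that of the benchmark in that column.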