Modeling Realized Variance with Realized Quarticity

This paper proposes a model for realized variance that exploits information in realized quarticity. The realized variance and quarticity measures are both highly persistent and highly correlated with each other. The proposed model incorporates information from the observed realized quarticity process via autoregressive conditional variance dynamics. It exploits conditional dependence in higher order (fourth) moments in analogy to the way the class of GARCH models exploits conditional dependence in second moments.


Introduction
The conditional variance of an asset return, the commonly used measure of financial risk, is not directly observable. The large class of observation-driven GARCH-type and parameter-driven stochastic volatility models uses asset return data to estimate the conditional variance process. A more recent alternative approach exploits the availability of intraday high-frequency data and treats the realized variance (RV) process as observable data. This approach was popularized by the HAR (heterogeneous autoregressive) specification of Corsi [1].
Understanding the time variation in financial risk and the ability to predict its future course is of primary interest for financial practitioners and regulators who need to manage financial risk. In the extensive literature that examines the conditional dynamics of asset return second moments, some have considered utilizing information contained in higher order moments such as skewness and kurtosis [2][3][4]. This paper contributes to this literature by examining the information in the observed fourth-moment process to predict the realized variance RV.
The proposed model exploits two empirical features of the RV and the realized quarticity (RQ) processes. First, both processes are highly persistent with slowly decaying autocorrelations. The early literature modeled the slow decay in volatility as a long memory process with Hurst exponent H > 1/2 [5][6][7]. Recently, Gatheral et al. [8] have suggested a rough volatility model with H < 1/2. As they note, though a rough volatility model is not strictly a long memory process, its autocorrelation is statistically difficult to distinguish from that of a long memory process. For the fourth-moment process, Huang et al. [9] find positive but weak dependence in the volatility of volatility measure from options on VIX. Da Fonseca and Zhang [10] document roughness in volatility of volatility with H < 1/2.
The second empirical feature the proposed model exploits is the high correlation between RV and RQ processes. A positive correlation between volatility and volatility of volatility is also documented in Da Fonseca and Zhang [10]. This correlation motivates the use of information in RQ to model RV. As the variance of variance is the fourth moment and this fourth moment is persistent (first empirical feature), the proposal is to model the conditional variance of the realized variance with GARCH-type dynamics that can capture the persistence. The proposed model is a higher moment extension of the realized GARCH model of Hansen et al. [11].
The common approach to model the persistence in RV is to use the restricted distributed lag model of Corsi [1]. Bollerslev et al. [12] extend the HAR specification to account for possible measurement error in RV as an estimator of the integrated variance IV. Their HARQ specification allows the parameters of the HAR model to be time-varying and depend on √RQ. Buccheri and Corsi [13] consider a HAR model with time-varying parameters in a parameter-driven state-space model. The variance of the observation equation for ln RV is set to RQ and the state equation parameters are driven by the scores as in Creal et al. [14]. To capture long memory type persistence in RV, the state vector has dimension 22. To economize on the number of parameters to estimate, the system is sparse, with score dynamics restricted to follow a diagonal random walk. Ding [15] extends the GARCH model to capture dynamics in the variance of variance.
Section 3 formally describes the proposed model specifications and their properties. Unlike the HAR type models, the dependence in RV is indirectly modeled through dependence in RQ. More specifically, RV is modeled as a function of the (unobservable) conditional variance of RV. The proposed model, therefore, extends Hansen et al. [11] to a higher order moment with an RQ-in-mean type model. The GARCH-in-mean models with asset returns on the left-hand side have a persistence mismatch, as returns have little persistence while conditional variances are highly persistent [16]. For the higher-order specification in this paper, the left-hand side RV is persistent, as is the right-hand side conditional variance of RV.
Section 3 also discusses the somewhat neglected issue of the use of nonlinear transformations. The three most commonly used transformations in the literature are RV (no transformation), √RV [1], and ln RV [17]. The first two forms are the variable of interest in financial applications. However, in the majority of the models that are estimated by least squares, the parameters are not restricted to ensure RV ≥ 0, which may result in negative out-of-sample forecasts. The log transformation ensures non-negative forecasts without parameter restrictions. Another important advantage of the log transformation, as documented in the empirical analysis of Section 4, is the removal of the excess kurtosis in RV and RQ. The proposed model with the log transformation can, therefore, be estimated by maximizing the Gaussian likelihood.
The empirical performance of the proposed model is considered in Section 4 using data for the 27 stocks analyzed in Bollerslev et al. [12]. This section first documents the claimed two empirical features of RV and RQ for the sample of 27 stocks. The proposed model is then fitted using a rolling estimation window for each stock. The rolling pseudo-out-of-sample predictions from the proposed model are then compared to those from the baseline HARQ model of Bollerslev et al. [12]. The HARQ model is chosen as the baseline for comparison as it also exploits information from RQ to predict RV.
The rest of the paper is organized as follows. Section 2 provides a summary of related literature, Section 3 formally describes the proposed model specification, Section 4 examines the empirical performance of the proposed model specification, and Section 5 provides some concluding comments.

Related Literature
An early parametric model with a long memory for the RV process was suggested in Andersen et al. [18]. They specify a fractionally integrated VAR where the fractional difference parameter d is estimated by the log-periodogram regression of Geweke and Porter-Hudak [19]. The long memory in RV is modeled by a restricted AR(22) specification in Corsi [1]. This long AR specification, known as the HAR (Heterogeneous AutoRegressive) model, has become the standard model to capture long memory like persistence in RV since it can be estimated by simple least squares.
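As an illustration of the log-periodogram approach, the sketch below (our own code, not from the paper) implements a minimal GPH regression: the log periodogram at the first m Fourier frequencies is regressed on −ln(4 sin²(ω_j/2)), and the slope estimates the fractional difference parameter d. On white noise the estimate should be near zero.

```python
import numpy as np

def gph_estimate(x, m=None):
    """Geweke--Porter-Hudak log-periodogram estimate of the
    fractional difference parameter d (minimal sketch)."""
    n = len(x)
    if m is None:
        m = int(np.sqrt(n))                  # a common bandwidth choice
    freqs = 2 * np.pi * np.arange(1, m + 1) / n
    # periodogram at the first m Fourier frequencies
    dft = np.fft.fft(x - np.mean(x))
    I = (np.abs(dft[1:m + 1]) ** 2) / (2 * np.pi * n)
    # regress log I(w_j) on -log(4 sin^2(w_j / 2)); the slope is d
    reg = -np.log(4 * np.sin(freqs / 2) ** 2)
    slope = np.polyfit(reg, np.log(I), 1)[0]
    return slope

rng = np.random.default_rng(0)
d_wn = gph_estimate(rng.standard_normal(4096))   # white noise: d near 0
```

A long memory series would instead produce a slope estimate near its true d; the bandwidth m trades off bias against variance.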
Corsi et al. [17] extend the HAR specification by modeling conditional dependence in the second moment of the residuals. As the outcome variable in a HAR model is the second moment RV, the residual second moment is the variance of variance. This model of conditional fourth-moment dynamics is the higher-order extension of the GARCH class of conditional second-moment dynamic models. Like the GARCH model, the conditional fourth moment is a latent variable that is not directly observable in the model of Corsi et al. [17].
The proposal in this paper is to use the observed fourth-moment variable, the realized quarticity RQ. RQ was used in the HAR extension of Bollerslev et al. [12], who motivate its use as a correction for potential measurement error in the lagged RV on the right-hand side of the regression. Their HARQ model can be estimated by simple least squares. Buccheri and Corsi [13] deal with the RV measurement error in a state-space model where the observed RV is modeled as the latent signal IV plus a measurement error noise. The unobserved latent signal IV is the state variable that needs to be filtered. To capture long memory type persistence, IV has a HAR type long AR specification, resulting in a latent state vector of dimension 22. Their HARK model uses the observed RQ process as the time-varying variance of the observation equation for RV. Although the state-space model can be estimated by maximum likelihood with the use of the Kalman filter, the likelihood is nonlinear in the parameters and estimation is computationally more expensive than the least squares based HARQ model of Bollerslev et al. [12].
Bollerslev et al. [12] and Buccheri and Corsi [13] both use the observed fourth moment variable RQ in the HAR class model to capture long memory type persistence in RV. Rather than use long lags of RV to capture persistence, the proposal in this paper is to exploit similar long memory type persistence in RQ to model the RV dynamics. To exploit potential dependence in the residual second moments as in Corsi et al. [17], the proposed model is a higher moment extension of the realized GARCH model of Hansen et al. [11] and Hansen and Huang [20]. Compared to their realized GARCH specification, the proposed model is an RQ-in-mean specification as described below. The likelihood function of the proposed model is nonlinear in the parameters and requires numerical optimization for maximum likelihood estimation.

Realized Variance Model
Online Appendix A.1 provides a brief summary of asymptotic distributions of realized variances.

Model Specification
The proposed model extends the realized GARCH model of Hansen et al. [11] in two ways. First, instead of the asset return (first moment), the conditional mean equation is for the realized variance (second moment) of asset returns. Second, as the realized variance is persistent, the conditional mean equation has a conditional variance of variance term analogous to the GARCH-in-mean specification to capture the persistence in realized variance.
The proposed specification is

y_t = c_0 + c_1 ln κ_t + √κ_t ε_t,   ε_t ~ iid(0, 1),   (1a)
ln κ_t = ω + β ln κ_{t−1} + α x_{t−1},   (1b)
x_t = ξ + φ ln κ_t + τ(ε_t) + σ_u u_t,   u_t ~ iid(0, 1),   (1c)

where ε_t and u_t are assumed independent. The observed outcome variable y_t is some function of the realized variance RV_t. The choice of transformation for y_t is discussed below in Section 3.2. κ_t is the (unobserved) conditional variance of y_t. As κ_t is the variance of variance, the parameter c_1 is the variance-in-mean parameter analogous to the GARCH-in-mean specification [21]. This term is included in the conditional mean Equation (1a) to capture the dependence in y_t.
The conditional dynamics for κ_t are specified in log form in (1b) to ensure κ_t is non-negative without restricting the parameters in (1b). This specification is analogous to the exponential GARCH of Hansen and Huang [20] and Nelson [22]. Corsi et al. [17] considered a GARCH analogue for κ_t where x_{t−1} = κ_{t−1} ε²_{t−1} in Equation (1b) with κ_t on the left-hand side. For the proposed model, x_t is an observed variable, analogous to Hansen et al. [11] and Hansen and Huang [20]. Hansen et al. [11] 'augmented' the GARCH model with information from realized variance by setting x_t = RV_t. For this application, κ_t is the conditional fourth moment and x_t is set to some transformation of the realized quarticity RQ_t.
In Equation (1c) for the observed variable x_t, the term τ(ε_t) captures a potential asymmetric response to negative and positive shocks ε_t. As in Hansen et al. [11], the quadratic specification τ(z) = τ_1 z + τ_2 (z² − 1) is used in the empirical application. τ(ε_t) is then a zero mean process where the parameter τ_1 captures the asymmetric response. Equation (1b) can include additional lags of x_t and ln κ_t, but only one lag is used for parsimony.
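For a standardized shock z with E[z] = 0 and E[z²] = 1, the quadratic leverage function τ(z) = τ_1 z + τ_2(z² − 1) is mean zero by construction. A quick Monte Carlo check (illustrative parameter values only, Gaussian shocks assumed):

```python
import numpy as np

def tau(z, tau1=-0.1, tau2=0.05):
    """Quadratic leverage function tau(z) = tau1*z + tau2*(z**2 - 1);
    parameter values here are arbitrary illustrations."""
    return tau1 * z + tau2 * (z ** 2 - 1)

rng = np.random.default_rng(1)
z = rng.standard_normal(1_000_000)
mc_mean = tau(z).mean()      # should be close to zero
```

A negative τ_1 makes τ(ε_t) larger after negative shocks, the usual leverage-type asymmetry.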
The model of Section 3.1 implies that ln κ_t follows an AR(1) process and x_t an ARMA(1,1) process [11]:

ln κ_t = (ω + αξ) + (β + αφ) ln κ_{t−1} + α e_{t−1},
x_t = ((1 − β)ξ + φω) + (β + αφ) x_{t−1} + e_t − β e_{t−1},   (2)

where e_t ≡ τ(ε_t) + σ_u u_t is a zero mean iid process. The stationarity condition for ln κ_t and x_t is β + αφ < 1, with unconditional means

E[ln κ_t] = (ω + αξ)/(1 − ρ),   E[x_t] = ξ + φ E[ln κ_t].   (3)

ρ ≡ β + αφ is a measure of the persistence of the model-implied series ln κ_t and x_t, as their autocorrelations decay with powers of ρ.
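These time series implications can be checked by simulation. The sketch below (our own code; parameter values are arbitrary illustrations, Gaussian shocks assumed) simulates the system and verifies that the first autocorrelation of ln κ_t is close to ρ = β + αφ:

```python
import numpy as np

def simulate_model(T, c0=0.0, c1=0.1, omega=-0.1, beta=0.7, alpha=0.04,
                   xi=0.0, phi=6.0, tau1=0.1, tau2=0.05, sigma_u=0.4,
                   seed=0):
    """Simulate y_t, x_t, ln kappa_t from the system (1a)-(1c)
    with illustrative parameter values (not estimates)."""
    rng = np.random.default_rng(seed)
    eps, u = rng.standard_normal((2, T))
    y, x, lnk = np.zeros(T), np.zeros(T), np.zeros(T)
    rho = beta + alpha * phi                      # persistence, < 1
    lnk[0] = (omega + alpha * xi) / (1 - rho)     # unconditional mean
    for t in range(T):
        if t > 0:
            lnk[t] = omega + beta * lnk[t - 1] + alpha * x[t - 1]
        k = np.exp(lnk[t])
        y[t] = c0 + c1 * lnk[t] + np.sqrt(k) * eps[t]
        x[t] = (xi + phi * lnk[t] + tau1 * eps[t]
                + tau2 * (eps[t] ** 2 - 1) + sigma_u * u[t])
    return y, x, lnk

y, x, lnk = simulate_model(200_000)
rho = 0.7 + 0.04 * 6.0                            # = 0.94
# implied AR(1): first autocorrelation of ln kappa_t equals rho
acf1 = np.corrcoef(lnk[1:], lnk[:-1])[0, 1]
```

The simulated x_t series can likewise be checked against the implied ARMA(1,1) autocorrelations.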

Variable Transformations
The proposed model of Section 3.1 can be considered a class of models depending on the choice of transformed variables y_t and x_t. The literature that models realized variance RV_t has used alternative transformations of RV_t for y_t. The HAR model of Corsi [1] uses y_t = √RV_t, Corsi et al. [17] consider y_t = √RV_t and y_t = ln RV_t, and Bollerslev et al. [12] consider y_t = RV_t. Strictly speaking, the non-negativity of y_t = RV_t and y_t = √RV_t requires restrictions on the mean Equation (1a) parameters to ensure out-of-sample forecasts remain non-negative. Such restrictions, however, do not appear to be imposed for the least squares based estimators in Corsi et al. [17] and Bollerslev et al. [12]. As a consequence, out-of-sample predictions may produce negative values of y_t = RV_t or y_t = √RV_t. For this reason, the empirical application below uses the log transformation y_t = ln RV_t, which ensures non-negative predictions of RV_t without restrictions on the mean Equation (1a) parameters. A problem with the use of the log transformation is that the variable of interest in financial applications is often the variance itself RV_t or the volatility √RV_t, not the log transformation. If we use the log transformation in the model of Section 3.1, the predicted values need to go through a nonlinear transformation to obtain predictions of the variable of interest. The nonlinear transformation results in approximate inference for the variable of interest even under the rather strong assumption of a correctly specified distribution for the transformed variable y_t.
The variable x_t is an observed measure of the conditional variance of y_t. As y_t is a second moment variable, it is natural to use a realized fourth moment, or quarticity RQ_t, for x_t. For the transformations y_t = √RV_t or y_t = ln RV_t, the asymptotic variance expressions in online Appendix A.1 have IV_t in the denominator. This may cause problems for values of RV_t close to zero, the so-called inlier problem. As the model of Section 3.1 restricts x_t to follow an ARMA(1,1) process, an alternative approach is to choose a transformation of RQ_t that, in the sample, has dependence similar to an ARMA(1,1) process. A further consideration is that the model-implied process for x_t does not ensure positive x_t without restrictions on the parameters. The log transformation x_t = ln RQ_t ensures positive RQ_t without imposing additional restrictions on the parameters.
For these reasons, the empirical application below uses the log transformation x_t = ln RQ_t. Alternative transformations of the conditional variance have been considered for the (G)ARCH-in-mean term. The RQ-in-mean term with coefficient c_1 in (1a) could also use κ_t itself or transformations such as √κ_t. However, (1a) uses the log transformation, as the dynamics of κ_t in logs in (1b) ensure non-negative κ_t without restricting the parameters. Furthermore, the log transformation of κ_t matches the log transformation of RV_t for y_t and of RQ_t for x_t.

Maximum Likelihood Estimation
The parameters of the model can be estimated by maximum likelihood assuming the error terms have a Gaussian distribution. While one can interpret the Gaussian assumption as a quasi-likelihood, for finite sample performance it is desirable that the transformations discussed in the previous section are chosen so that the distributions of ε_t and u_t are not 'too' different from the Gaussian. The levels of RV_t and RQ_t are well known to have very high kurtosis, and an important reason to prefer the log transformation for y_t and x_t is to remove the excess kurtosis. Corsi et al. [17] and Barndorff-Nielsen and Shephard [23] find that the finite sample distribution of ln RV_t is closer to the Gaussian than that of RV_t.
For the model of Section 3.1, the contribution to the Gaussian log-likelihood from the t-th observation is

ℓ_t = −(1/2)[ln(2π) + ln κ_t + ε_t²] − (1/2)[ln(2π) + ln σ_u² + u_t²],

where ε_t = (y_t − c_0 − c_1 ln κ_t)/√κ_t and u_t = (x_t − ξ − φ ln κ_t − τ(ε_t))/σ_u. For the quadratic leverage function τ(z) = τ_1 z + τ_2 (z² − 1) used in Hansen et al. [11], the parameter vector is

θ = (c_0, c_1, ω, β, α, ξ, φ, τ_1, τ_2, σ_u).

The parameter restrictions are σ_u > 0 and, for stationarity of κ_t and x_t, β + αφ < 1.
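A minimal sketch of this likelihood evaluation (our own function and variable names; the toy data and parameter values below are placeholders, not estimates from the paper):

```python
import numpy as np

def gaussian_loglik(theta, y, x, lnk0):
    """Gaussian log-likelihood of the model (sketch): recursion (1b)
    builds ln kappa_t from lagged x_t, then the y_t and x_t densities
    are evaluated observation by observation."""
    c0, c1, omega, beta, alpha, xi, phi, tau1, tau2, sigma_u = theta
    T = len(y)
    lnk = np.empty(T)
    lnk[0] = lnk0                                 # presample value
    for t in range(1, T):
        lnk[t] = omega + beta * lnk[t - 1] + alpha * x[t - 1]
    eps = (y - c0 - c1 * lnk) / np.exp(lnk / 2)   # standardized shocks
    v = x - xi - phi * lnk - tau1 * eps - tau2 * (eps ** 2 - 1)
    ll_y = -0.5 * (np.log(2 * np.pi) + lnk + eps ** 2)
    ll_x = -0.5 * (np.log(2 * np.pi) + 2 * np.log(sigma_u)
                   + (v / sigma_u) ** 2)
    return np.sum(ll_y + ll_x)

# evaluate at illustrative parameter values on toy data
theta0 = (0.0, 0.1, -0.1, 0.7, 0.04, 0.0, 6.0, 0.1, 0.05, 0.4)
rng = np.random.default_rng(0)
y_toy = rng.standard_normal(100) - 1.5
x_toy = rng.standard_normal(100)
ll = gaussian_loglik(theta0, y_toy, x_toy, np.log(np.var(y_toy)))
```

In practice this function would be passed to a numerical optimizer, with σ_u > 0 and β + αφ < 1 enforced.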
To start the recursion in (1b), we need to specify presample values of x_t and κ_t. If stationarity of x_t and κ_t is imposed, we can set these to the unconditional means (3). A simple alternative, used in the empirical application below, is to use the sample variance of y_t.
For statistical inference, the 'sandwich' QML (quasi-maximum likelihood) parameter covariance matrix can be used by evaluating the first and second derivatives of the contributions to the Gaussian log-likelihood. Analytical expressions to recursively evaluate the first derivatives are given in the online Appendix A.2. The second derivatives can be evaluated by numerically differentiating the analytical first derivatives.
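The sandwich estimator H⁻¹(S′S)H⁻¹, with H the Hessian of the summed log-likelihood and S the matrix of per-observation scores, can be sketched generically as below (our own code; the toy Gaussian location model is only an illustration, for which the sandwich variance should be close to the usual 1/n):

```python
import numpy as np

def sandwich_cov(loglik_t, theta, step=1e-5):
    """QML 'sandwich' covariance (sketch): loglik_t(theta) returns
    the vector of per-observation log-likelihood contributions."""
    theta = np.asarray(theta, float)
    p = theta.size

    def score_matrix(th):
        # numerical first derivatives of each contribution
        cols = []
        for j in range(p):
            e = np.zeros(p); e[j] = step
            cols.append((loglik_t(th + e) - loglik_t(th - e)) / (2 * step))
        return np.column_stack(cols)              # T x p

    S = score_matrix(theta)
    # Hessian of the summed likelihood by differencing the summed score
    H = np.empty((p, p))
    for j in range(p):
        e = np.zeros(p); e[j] = step
        H[:, j] = (score_matrix(theta + e).sum(0)
                   - score_matrix(theta - e).sum(0)) / (2 * step)
    Hinv = np.linalg.inv(H)
    return Hinv @ (S.T @ S) @ Hinv                # sandwich estimator

# toy check: Gaussian location model with known unit variance
rng = np.random.default_rng(0)
data = rng.standard_normal(5000)
ll_t = lambda th: -0.5 * (data - th[0]) ** 2
cov = sandwich_cov(ll_t, [data.mean()])
```

With analytical first derivatives, as in the paper's Appendix A.2, only the Hessian step would be done numerically.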

Model Evaluation by Pseudo Out-of-Sample Forecasting
A standard approach to evaluating model performance is to compare the accuracy of pseudo-out-of-sample forecasts. For alternative models with the same outcome variable y_t, and provided the actual outcome y_t is observable, the comparison can be made by specifying a loss, or scoring, function.
There are two additional issues to consider in this application, where the outcome of interest is (some function of) the daily realized variance RV_t. First, if the statistic of interest is the integrated variance IV_t, the outcome y_t based on RV_t is likely measured with error. Patton [24] suggested scoring functions robust to additive error in the outcome variable.
The second issue is the comparison of models where the outcome variable of interest is some transformation of the modeled variable y_t. For example, in finance applications, rather than y_t = ln RV_t we are often more interested in √RV_t = exp(y_t/2). To compare models with outcomes y_{1t} = RV_t, y_{2t} = √RV_t, y_{3t} = ln RV_t, we need to specify a common outcome variable of interest. In this case, setting RV_t as the variable of interest is the least problematic, as obtaining forecasts of √RV_t from RV_t under additive error is mathematically intractable. Furthermore, RV_t is the observed data variable constructed from intraday data.
A commonly used approach to obtain forecasts for RV_t is to assume Gaussianity. For y_t = ln RV_t and ε_t ~ N(0, 1), the h-step forecast of y is Gaussian, y_{t+h} | F_t ~ N(ŷ_{t+h|t}, κ_t) with ŷ_{t+h|t} = c_0 + c_1 ln κ_t, and the forecasts for the transformed variables are

E_t[RV_{t+h}] = exp(ŷ_{t+h|t} + κ_t/2),   E_t[√RV_{t+h}] = exp(ŷ_{t+h|t}/2 + κ_t/8).

An alternative approach that does not assume Gaussianity is the 'smearing' estimate of Duan [25] and Wooldridge [26]. Let ê_t = y_t − ŷ_t, t = 1, . . . , T, denote the in-sample residuals. The smearing forecasts are

Ê_t[RV_{t+h}] = T^{−1} Σ_{s=1}^{T} exp(ŷ_{t+h|t} + ê_s),   Ê_t[√RV_{t+h}] = T^{−1} Σ_{s=1}^{T} exp((ŷ_{t+h|t} + ê_s)/2).
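The two back-transformations can be illustrated numerically (our own code; m and k stand for a forecast mean and variance of y = ln RV, and the residuals are simulated stand-ins). With Gaussian residuals the smearing estimate should agree closely with the log-normal formula:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = -1.0, 0.5                        # forecast mean and variance of ln RV

# Gaussian (log-normal) transformation formulas
rv_gauss = np.exp(m + k / 2)            # E[RV] = exp(m + k/2)
sqrt_gauss = np.exp(m / 2 + k / 8)      # E[sqrt(RV)] = exp(m/2 + k/8)

# smearing: average the transformation over in-sample residuals
resid = rng.normal(0.0, np.sqrt(k), 100_000)    # stand-in residuals
rv_smear = np.exp(m) * np.mean(np.exp(resid))
sqrt_smear = np.exp(m / 2) * np.mean(np.exp(resid / 2))
```

With non-Gaussian residuals the smearing averages would instead capture the empirical residual distribution.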

Empirical Application
This section uses the 5-minute return data from Bollerslev et al. [12] to evaluate the performance of the proposed specification of Section 3.1 for the realized variance. The data, made publicly available by the authors, is a daily sample of 27 Dow Jones constituent stocks from 22 April 1997 to 31 December 2013 (4200 trading days). A list of the 27 stocks and their ticker symbols is provided in online Appendix A.3.

Preliminary Analysis
Before we fit the model to the data, we examine the log transformations chosen for the observed variables y_t = ln RV_t and x_t = ln RQ_t. As explained above, the log transformations ensure non-negative RV_t and RQ_t without restricting the model parameters. Another important reason for using the log transformation for y_t is to closely approximate the Gaussian distribution assumed for the likelihood function. Table A2 in the online Appendix A.3 shows the sample third (skewness) and fourth (kurtosis) moments of RV_t, √RV_t, and ln RV_t for the 27 stocks in the sample. Table A2 shows that all three transformations are positively skewed. The untransformed RV has the largest positive skewness, followed by √RV, whose skewness values are all above one. The log transformation ln RV has the smallest skewness, all below one, but all significantly different (at size 0.05) from the Gaussian value of zero. The untransformed RV also has the largest kurtosis, all well above the Gaussian value of three. √RV has somewhat smaller kurtosis, but the smallest value, 7.42 (for INTC), is still well above three. The log transformation ln RV has kurtosis much closer to Gaussian, with the largest value at 4.15 (for CVX).
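As a self-contained illustration of why the log transformation removes the excess kurtosis (on simulated log-normal data, not the paper's sample; the function names are ours):

```python
import numpy as np

def skew_kurt(z):
    """Sample skewness and (non-excess) kurtosis."""
    z = (z - z.mean()) / z.std()
    return (z ** 3).mean(), (z ** 4).mean()

rng = np.random.default_rng(0)
ln_rv = rng.normal(-1.0, 0.9, 50_000)    # stand-in: Gaussian ln RV
rv = np.exp(ln_rv)

sk_rv, ku_rv = skew_kurt(rv)             # heavy right tail, high kurtosis
sk_sq, ku_sq = skew_kurt(np.sqrt(rv))    # intermediate
sk_ln, ku_ln = skew_kurt(ln_rv)          # near Gaussian: skew 0, kurtosis 3
```

The ordering of the kurtosis values across the three transformations mirrors the pattern reported in Table A2.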
As mentioned in the introduction, the proposed specification (Section 3.1) is based on two empirical features of RV and RQ: their common persistence and their correlation. Figure A1 in the online Appendix A.3 shows the sample autocorrelations of ln RV and ln RQ for the 27 stocks in the sample. For all stocks, ln RV is somewhat more persistent than ln RQ. The autocorrelations for ln RV slowly decay from about 0.8 while those for ln RQ slowly decay from about 0.6. Both autocorrelations die out slowly with the lag, a feature of long memory series. The two series are highly correlated with each other with all pairwise sample correlations above 0.95.
The model of Section 3.1 implies that the observed realized process x_t should follow the restricted ARMA(1,1) process (2). The slowly decaying autocorrelations for x_t = ln RQ_t in Figure A1 are not inconsistent with an ARMA(1,1) process with a large AR(1) coefficient. As a further check, Table A5 in the online Appendix A.3 reports estimates of an unrestricted ARMA(1,1) model fitted to x_t for the 27 stocks in the sample. Both the AR and MA coefficients are statistically significant, with a large positive AR(1) coefficient and a large negative MA(1) coefficient. The estimated AR(1) coefficient ranges from 0.976 (XOM) to 0.992 (MCD) and the MA(1) coefficient from −0.709 (XOM) to −0.844 (NKE). Although the portmanteau test for residual correlation up to lag 5 rejects the white noise residual null (at size 0.05), the residual serial correlations are small in magnitude. The first order residual serial correlation ranges from 0.036 (CVX) to 0.103 (NKE).

In-Sample Estimates
To evaluate the performance of the model out of sample, pseudo-out-of-sample rolling forecasts were obtained. As parameter estimation by maximum likelihood is computationally expensive compared to least squares, a rolling forecast window of one (calendar) month was moved forward each month starting from January 2006. The first set of parameter estimates was obtained for the estimation sample from the beginning of the data sample in April 1997 to December 2005 (104 months). Using these parameter estimates, h-step forecasts for the trading days in January 2006 are obtained. These forecasts use the same parameter estimates, but the conditioning information set, i.e., the lagged variables on the right-hand side, is updated as we move forward within the forecast window.
There are a large number of estimated parameters (10 parameters for each of the 96 estimation windows for each of the 27 stocks). Figure 1 shows a summary of the estimated parameters for stepsizes h = 1, 5, 22 days. Following Bollerslev et al. [12], for the h-step forecast the outcome variable on the left-hand side of (1a) is

y_t^{(h)} = ln( h^{−1} Σ_{i=1}^{h} RV_{t+i} )   or   y_t^{(h)} = ln( h^{−1} Σ_{i=1}^{h} √RV_{t+i} ).   (4)

The h-step ahead variable for the log transformation is defined so that the forecasts from y_t^{(h)} can be compared to those from the other transformations. For h = 1, taking the log of RV or √RV just results in a different scaling. For h > 1, the log of RV averages and the log of √RV averages are considered separately, since one cannot be recovered from the other just by rescaling. Strictly speaking, the model parameters change with the forecast stepsize h and should be written as a function of h.
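Constructing the h-step outcome variable from a daily RV series can be sketched as follows (our own helper function; the transform names are ours):

```python
import numpy as np

def hstep_outcome(rv, h, transform="log_rv"):
    """h-step outcome variable: log of the forward h-day average of
    RV (or of sqrt RV), following the averaging in Bollerslev et al."""
    T = len(rv) - h
    base = np.sqrt(rv) if transform == "log_sqrt_rv" else rv
    avg = np.array([base[t + 1 : t + 1 + h].mean() for t in range(T)])
    return np.log(avg)

rv = np.array([1.0, 2.0, 4.0, 2.0, 1.0, 3.0])
y1 = hstep_outcome(rv, 1)    # h = 1: just ln RV_{t+1}
y2 = hstep_outcome(rv, 2)    # h = 2: ln of the two-day forward average
```

For h > 1, consecutive outcomes share h − 1 days, which produces the overlapping-sample correlation in the forecast errors noted below.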
Each panel in Figure 1 corresponds to a parameter, and the shaded area is the interquartile range (from the 0.25 to 0.75 quantile) across the 27 stocks. The thick solid line is the median estimate across the 27 stocks. The intercept c_0 and the RQ-in-mean parameter c_1 of Equation (1a) are both positive. A positive RQ-in-mean parameter c_1 is to be expected given the similarity of the ln RV and ln RQ dynamics documented above. c_1 declines with stepsize h, and for h > 1, c_1 is smaller for the log average of √RV than for the log average of RV. The parameters α, β, φ that determine the persistence ρ ≡ β + αφ of the ln κ_t and x_t processes are all positive and imply ρ close to but below the stationarity boundary of one (Figure 2). As a function of stepsize h, α does not change much for the log average of RV but somewhat increases for the log average of √RV. β increases with h from about 0.7 for h = 1 to above 0.8 for h = 22, and φ decreases with h from about 6 for h = 1 to less than 4 for h = 22. Figure 2 shows that the rolling estimates of the persistence parameter ρ increase with the stepsize h. This is to be expected, as the h-step outcome variable y_t^{(h)} is an average over h days and is therefore smoother and more persistent. The asymmetric response parameter τ_1 is positive and decreases with stepsize h from about 1.3 for h = 1 to below 0.5 for h = 22. The coefficient on the quadratic term τ_2 switches sign from positive for h = 1 to negative for h = 5, 22. The volatility parameter σ_u for the x_t = ln RQ_t Equation (1c) increases with stepsize h.

Pseudo Out-of-Sample Forecasts
This section evaluates the performance of the proposed model in terms of the (pseudo) out-of-sample rolling forecasts described above in Section 4.2. If the model captures the time series dependence of the outcome variable, the forecast errors should not be serially correlated. A formal test of this needs to account for sampling error in generating the forecasts based on the estimated parameters. Figures A2-A4 in the online Appendix A.4 provide an informal check by plotting the autocorrelation functions of the rolling forecast errors. In addition to the forecast errors from the proposed model of Section 3.1, these figures also compare the forecast error correlations from the HAR model of Corsi [1] with outcome y = √RV and the HARQ model of Bollerslev et al. [12] with outcome y = RV. For stepsize h = 1, the forecast error serial correlations are small in magnitude, but most of them fall outside the asymptotic interval for a white noise series. (A portmanteau test for serial correlation up to lag 5 rejects the null hypothesis of white noise at the conventional size of 0.05 for all stocks. However, these are asymptotic p-values that ignore parameter estimation sampling error.) For stepsizes h > 1, the forecast errors show a large positive correlation up to lag h for all models. This dependence is due to the overlapping sample used in generating the multistep outcome variable (4).
To evaluate the relative forecast performance of the proposed model, its forecast accuracy is compared against a baseline model. The baseline model is the HARQ model of Bollerslev et al. [12], which extends the HAR model of Corsi [1] with an additional interaction term involving √RQ. The model proposed in this paper also uses the realized quarticity RQ but without the lagged RV terms of HAR(Q). Following Bollerslev et al. [12], the outcome variable of the baseline HARQ model is the untransformed y = RV. The model parameters are estimated by least squares without restricting them to ensure the predicted values are non-negative. Bollerslev et al. ([12], footnote 17) apply an 'insanity' filter and replace predicted values that are outside the in-sample range with the in-sample mean value. The baseline predictions in this paper do not apply this somewhat ad hoc insanity filter.
An alternative natural baseline that ensures non-negative predicted RV values is the HARQ model with y = ln(RV) and an interaction term with ln(RQ). This log version of the HARQ specification, however, does not perform as well as the untransformed specification of Bollerslev et al. [12]. Table A3 in the online Appendix A.3 reports estimated coefficients on the interaction term for the full sample for stepsizes h = 1, 5, 22. For the level (untransformed) specification, these coefficients are all negative with t-ratios above two in absolute value (with two exceptions for h > 1), as reported in Bollerslev et al. [12]. For the log specification, many coefficients are positive and insignificant. For h > 1, all t-ratios (except one) are below two in absolute value. Bollerslev et al. [12] motivate the use of √RQ as a correction for possible measurement error in RV. The negative coefficient on the interaction term may be correcting for this error mainly in the tails of the high excess kurtosis distribution of RV. This may explain the insignificant interaction coefficients for the log specification, as the excess kurtosis largely disappears for ln(RV). The issue of the appropriate HARQ specification is not the focus of this study and is left for further research.
To evaluate the relative accuracy of predictions from alternative models, we need to specify a loss or scoring function. For this study, two members from the Bregman family of consistent scoring functions that are robust to additive noise in RV [24] are used.
The two scoring functions are

S_ms(RV, F) = (RV − F)²,   S_ql(RV, F) = RV/F − ln(RV/F) − 1,

where RV is the actual realized value and F its forecast. S_ms is the mean squared and S_ql the QLIKE scoring function. These two scoring functions were also used in Bollerslev et al. [12]. S_ms is defined for all values of F, while S_ql is undefined for F ≤ 0. As the baseline HARQ predictions are not filtered, the small number of cases of non-positive predictions are removed when evaluating S_ql. The difference in scores between the baseline and comparison model is tested with the t-ratio of equal forecast accuracy of Diebold and Mariano [27]. As mentioned above, the forecast errors are correlated for h > 1. To account for this correlation, the denominator of the t-ratio, the standard error of the difference in average scores, is computed using the Bartlett kernel with bandwidth set to the forecast stepsize h. Table 1 reports these t-ratios for the full forecast sample. A positive τ value indicates a better forecast (smaller score) from the comparison model than the baseline HARQ.

Table 1. Diebold and Mariano [27] test of equal predictive accuracy of h-step RV rolling forecasts. The baseline forecast from the HARQ model (no transformation) is compared against the forecast from the exponential transform of the log forecast from the model of Section 3.1 assuming log-normality. A positive τ value indicates a better forecast from the comparison model against the baseline HARQ forecasts. τ_MS is for equality of mean squared error loss and τ_QL for the QLIKE loss of Patton [24]. RV < 0 is the number of negative forecasts (in the forecast sample) produced by the HARQ model. The bottom four rows are the fraction out of 27 stocks that satisfy the inequalities; for RV < 0, it is the fraction of 27 stocks that produced at least one negative forecast. The rolling forecasts are generated with a rolling estimation window of 8.67 years (104 months) followed by a one-month prediction window shifted each month over 96 windows. The forecast sample is from 3 January 2006 to 31 December 2013 (2013 trading days).
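The scoring functions and the Diebold-Mariano t-ratio with a Bartlett-kernel standard error can be sketched as below (our own code on simulated forecasts, not the paper's data):

```python
import numpy as np

def s_ms(rv, f):
    """Mean squared error scoring function."""
    return (rv - f) ** 2

def s_ql(rv, f):
    """QLIKE scoring function (requires f > 0)."""
    return rv / f - np.log(rv / f) - 1

def dm_tratio(loss_base, loss_comp, bandwidth):
    """Diebold-Mariano t-ratio with Bartlett-kernel HAC standard
    error; positive when the comparison model has the smaller loss."""
    d = loss_base - loss_comp
    T = len(d)
    dc = d - d.mean()
    var = dc @ dc / T
    for lag in range(1, bandwidth + 1):
        w = 1 - lag / (bandwidth + 1)            # Bartlett weight
        var += 2 * w * (dc[lag:] @ dc[:-lag]) / T
    return d.mean() / np.sqrt(var / T)

# illustration: a 'good' forecast with smaller multiplicative error
# should beat a noisier 'bad' baseline under both scores
rng = np.random.default_rng(0)
rv = np.exp(rng.normal(-1.0, 0.5, 2000))
f_good = rv * np.exp(rng.normal(0.0, 0.1, 2000))
f_bad = rv * np.exp(rng.normal(0.0, 0.5, 2000))
t_ms = dm_tratio(s_ms(rv, f_bad), s_ms(rv, f_good), bandwidth=5)
t_ql = dm_tratio(s_ql(rv, f_bad), s_ql(rv, f_good), bandwidth=5)
```

Setting the bandwidth to the forecast stepsize h mirrors the HAC correction described above for the overlapping multistep forecast errors.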
Table 1 shows that for about half of the 27 stocks, a negative RV forecast is produced by the baseline HARQ model. As mentioned above, τ_QL is computed for the forecast sample excluding these negative predictions. The sign switch between τ_MS and τ_QL may be due to the difference in the forecast evaluation sample for the two score functions. With this caveat, the test result is somewhat sensitive to the choice of scoring function and forecast stepsize h. For h = 1, both τ_MS and τ_QL indicate a lower average score, i.e., better forecast accuracy, for the baseline HARQ model than for the proposed model of Section 3.1. The forecast accuracy of the proposed model relative to the baseline generally improves with the forecast stepsize h. For h = 22, τ_MS is inconclusive in the sense that all the values are less than two in absolute value. None of the τ_QL values are less than −2, but about a quarter of the stocks have values above +2, indicating more accurate forecasts from the model of Section 3.1 than from the baseline.
The tests in Table 1 are based on the full forecast sample. The test results could be specific to the choice of forecast sample, and a particular sample could be cherry-picked to obtain certain results. As a guard against such potential cherry-picking, Figures A5-A7 in the online Appendix A.5 show running t-ratios of the Diebold and Mariano [27] equal forecast accuracy tests for all possible end-of-forecast samples. These are the values of the t-ratios when the test is applied to an expanding evaluation sample, from the beginning of the forecast sample (2007-01-03) up to each date. (The three figures use alternative (un)transformations to obtain forecasts for RV from ln RV.) One common feature of the running t-ratio results is the change in performance shortly after the financial crisis period of 2008-2009. As expected, the test results up to the financial crisis period are quite noisy, but the majority of the t-ratios for both MS and QL are positive for h > 1, indicating better forecast accuracy for model (Section 3.1) than for the baseline. The problem of the HARQ model in levels producing negative forecasts mostly occurs during the financial crisis period; Table A4 in the online Appendix A.3 lists all dates with negative predictions from the HARQ model, the majority of which occur during 2008. The t-ratios remain quite stable after the financial crisis period, indicating robustness to the choice of forecast sample post-crisis. When judging the significance of these running t-ratios, one should be aware of the multiple comparisons problem: the usual critical values are likely too small.
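For concreteness, the per-observation losses and the Diebold-Mariano t-ratio with a Bartlett-kernel long-run variance (bandwidth equal to the forecast step size h, as described above) can be sketched as follows. This is a minimal illustration, not the paper's code: the function names are ours, and the loss definitions are the standard MSE and QLIKE forms of Patton [24].

```python
import numpy as np

def mse(f, rv):
    """Mean squared error loss: defined for all forecasts f."""
    return (rv - f) ** 2

def qlike(f, rv):
    """QLIKE loss of Patton: requires f > 0 (undefined for f <= 0)."""
    return rv / f - np.log(rv / f) - 1.0

def dm_tstat(loss_base, loss_comp, h):
    """Diebold-Mariano t-ratio for equal forecast accuracy.

    The long-run variance of the loss differential is estimated with a
    Bartlett (Newey-West) kernel with bandwidth h, matching the serial
    correlation of h-step forecast errors. A positive value indicates a
    smaller (better) average loss for the comparison model.
    """
    d = np.asarray(loss_base) - np.asarray(loss_comp)
    n = d.size
    dbar = d.mean()
    u = d - dbar
    # Bartlett-weighted long-run variance: gamma_0 + 2 * sum_k w_k * gamma_k
    lrv = np.dot(u, u) / n
    for k in range(1, h + 1):
        w = 1.0 - k / (h + 1.0)
        lrv += 2.0 * w * np.dot(u[k:], u[:-k]) / n
    return dbar / np.sqrt(lrv / n)
```

For h = 1 the Bartlett correction uses only the first-order autocovariance; for h = 22 up to 22 lags enter, which is why the multistep t-ratios in Table 1 are less noisy than naive i.i.d. standard errors would suggest.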

Concluding Remarks
This paper proposed a model for RV dynamics that exploits information in the higher-order moment measure RQ. In contrast to the HAR(Q) models that exploit dependence in the RV variable itself, the idea is to exploit dependence in the RQ series. The empirical analysis using 27 stocks suggests that the proposed model may perform better than the HAR(Q) type specifications for multistep predictions and during periods of market turmoil such as the financial crisis of 2008.
As the empirical analysis was based on large-cap (blue chip) US stocks, it remains to be seen how the proposed model performs for other markets or asset classes. These include small-cap stocks, which are known to be volatile and fat-tailed, and cryptocurrencies. Performance during other market turmoil periods, such as the recent pandemic, or for commodities during the recent conflict in Ukraine, could also be analyzed.
For policymakers, the performance of the proposed model for risk management purposes could be of interest. The Fundamental Review of the Trading Book under the latest Basel regulation has introduced the use of expected shortfall for market risk capital requirements. The forecast performance analysis in this paper could be extended to evaluate the accuracy of risk measures such as expected shortfall. Additional tools or models for managing tail risks for regulatory purposes could prove beneficial for large financial institutions.
A somewhat puzzling result is the better accuracy of multistep forecasts from the proposed model compared to the baseline HARQ model. The HARQ specification uses lags up to RV_{t−22}, while the proposed specification uses only one lag (though, in principle, additional lags could be considered). Whether the implied ARMA(1,1) process for x_t = ln RQ_t can better capture long-term dependence than the AR(22) term could be further investigated.
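One way to probe this question directly is through the theoretical autocorrelation function of an ARMA(1,1): a φ close to one combined with a partially offsetting θ produces autocorrelations that start low but decay very slowly, mimicking long-range dependence. The sketch below uses the textbook ARMA(1,1) autocorrelation formula, where ρ_k = φ ρ_{k−1} for k ≥ 2; it is an illustration, not the paper's code.

```python
import numpy as np

def arma11_acf(phi, theta, nlags):
    """Theoretical autocorrelations of x_t = phi*x_{t-1} + e_t + theta*e_{t-1}.

    rho_1 follows the standard closed form; higher lags decay
    geometrically at rate phi.
    """
    rho = np.empty(nlags + 1)
    rho[0] = 1.0
    if nlags >= 1:
        rho[1] = ((1 + phi * theta) * (phi + theta)
                  / (1 + 2 * phi * theta + theta ** 2))
    for k in range(2, nlags + 1):
        rho[k] = phi * rho[k - 1]
    return rho
```

Comparing this implied ACF (at the estimated φ, θ from Table A5) with the sample ACF of ln RQ out to a month of lags would show how much of the slow decay an ARMA(1,1) actually captures relative to an AR(22).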
Several extensions of the proposed model could be considered. Bollerslev et al. [12] consider alternative estimators of RV, such as the realized kernel of Barndorff-Nielsen et al. [28], and of RQ, such as the QP and TP estimators mentioned in Appendix A.1. Alternatively, option-based fourth-moment measures from VVIX, as used in Huang et al. [9], could also be considered. Rather than selecting one of these alternative measures, one can try to incorporate information from all of them, as in Hansen and Huang [20].
The challenge in further generalizing the model is the increase in the number of parameters to estimate. With ten parameters, the model is already somewhat over-parameterized as a model of a single outcome variable. To reduce the number of parameters, one can either impose a priori 'reasonable' restrictions or impose sparsity consistent with the data. Alternatively, a penalty term can be added to the likelihood function for regularization [29,30].

Appendix A.1. Realized Measures

Assume the log asset price p_t follows a semimartingale process with drift µ(t) and volatility σ(t), where W(t) is the standard Wiener process. The realized variance over day t is RV_t = ∑_{j=1}^{m} r²_{t,j}, where r_{t,j} is the log return over the j-th subinterval in day t and m is the number of subintervals over one trading day. In the diffusion limit m → ∞ [31,32],

√m (RV_t − IV_t) →_d N(0, 2 IQ_t),

where IV_t = ∫_{t−1}^{t} σ²(s) ds is the integrated variance and IQ_t = ∫_{t−1}^{t} σ⁴(s) ds is the integrated quarticity. The (approximate) asymptotic distribution of nonlinear transformations of RV_t can be obtained by the delta method [17,23]. To estimate the (asymptotic) variance of RV_t, we need an estimate of the integrated quarticity IQ_t. Commonly used consistent estimators (under no microstructure noise) are

RQ_t = (m/3) ∑_{j=1}^{m} r⁴_{t,j},
QP_t = m µ₁⁻⁴ ∑_{j=4}^{m} |r_{t,j}| |r_{t,j−1}| |r_{t,j−2}| |r_{t,j−3}|,
TP_t = m µ_{4/3}⁻³ ∑_{j=3}^{m} |r_{t,j}|^{4/3} |r_{t,j−1}|^{4/3} |r_{t,j−2}|^{4/3},

where µ_p = E|Z|^p for Z ∼ N(0,1). RQ_t is the realized quarticity, QP_t is the realized quad-power quarticity [33,34], and TP_t is the realized tri-power quarticity [35].
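As a concrete illustration, the daily realized measures above can be computed from a vector of intraday log returns. This is a minimal sketch using the standard RV and RQ estimators (the function name is illustrative, not from the paper).

```python
import numpy as np

def realized_measures(r):
    """Realized variance and realized quarticity for one trading day.

    r: array of the m intraday log returns r_{t,1}, ..., r_{t,m}.
    Returns (RV_t, RQ_t) with RV_t = sum of squared returns and
    RQ_t = (m/3) * sum of fourth powers of returns.
    """
    r = np.asarray(r, dtype=float)
    m = r.size
    rv = np.sum(r ** 2)
    rq = (m / 3.0) * np.sum(r ** 4)
    return rv, rq
```

With 5-minute returns (m = 78 for a 6.5-hour US trading day), a day of quiet trading yields a small RQ, while a day with one large return spikes RQ far more than RV, which is the fourth-moment information the proposed model exploits.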

Appendix A.2. Derivatives of Log-Likelihood
For model (Section 3.1), the contribution to the Gaussian log-likelihood from the t-th observation and its contribution to the score are given below. For t = 1, the initial values x_0 and ln κ_0 are either fixed, e.g., set to the estimation-sample variance of y_t, or set to their unconditional means.

Ticker Symbols
Twenty-seven constituents from the Dow Jones (Table 2, [12]).

Table A3. Estimates of the interaction term in HARQ models. 'level' is HARQ with y = RV, q = √RQ and 'log' is with y = ln(RV), q = ln(RQ). κ is the sample kurtosis of y, where κ = 3 for a Gaussian. τ_h are the HAR t-ratios for the interaction term q_{t−1} y_{t−1} for forecast step size h days. The daily sample is from 22 April 1997 to 31 December 2013 (4200 trading days).

Table A5. ARMA(1,1) estimates for daily log realized quarticity. The ARMA(1,1) model is parametrized as x_t = φ x_{t−1} + e_t + θ e_{t−1}, where x_t = ln(RQ_t) − µ and µ is the mean parameter. t are the t-ratios of the estimated parameters, ρ_1 is the first-order autocorrelation of the residuals e_t, and p_5 is the p-value from the Ljung-Box portmanteau test of residual correlation up to lag 5. The daily sample is from 22 April 1997 to 31 December 2013 (4200 trading days).

Diebold and Mariano [27] test of equal predictive accuracy of h-step RV rolling forecasts. Each panel has 27 lines for the 27 stocks in the sample. The baseline forecast is from the HARQ model (no transformation) and the comparison forecast is the exponential transform of the log forecast from model (Section 3.1), adjusted with a smearing factor [25]. A positive value indicates a better forecast from the comparison model against the baseline HARQ forecasts. MS is for equality of mean squared error loss and QL for the QLIKE loss of Patton [24]. The rolling forecasts are generated with a rolling estimation window of 8.67 years (104 months) followed by a one-month prediction window shifted each month over 96 windows. The forecast sample is from 3 January 2006 to 31 December 2013 (2013 trading days).