A Note on the Asymptotic Normality Theory of the Least Squares Estimates in Multivariate HAR-RV Models

: In this work, multivariate heterogeneous autoregressive-realized volatility (HAR-RV) models are discussed with their least squares estimations. We consider multivariate HAR models of order p with q multiple assets to explore the relationships between two or more assets’ volatility. The strictly stationary solution of the HAR( p , q ) model is investigated as well as the asymptotic normality theories of the least squares estimates are established in the cases of i.i.d. and correlated errors. In addition, an exponentially weighted multivariate HAR model with a common decay rate on the coefﬁcients is discussed together with the common rate estimation. A Monte Carlo simulation is conducted to validate the estimations: sample mean and standard error of the estimates as well as empirical coverage and average length of conﬁdence intervals are calculated. Lastly, real data of volatility of Gold spot price and S&P index are applied to the model and it is shown that the bivariate HAR model ﬁtted by selected optimal lags and estimated coefﬁcients is well matched with the volatility of the ﬁnancial data.


Introduction
Volatility in financial asset returns is one of the most important components in the financial market for optimization decisions such as portfolio selection, risk management and asset pricing. Over recent decades, extensive research works by statisticians and econometricians have analyzed volatility alongside time series modelings. In particular, because multiple assets are correlated with each other in the financial markets, the cross-correlation of two or more asset returns and the spillover effect of the volatility have been represented in multivariate time series models rather than the univariate models. Probabilistic properties such as long-memory of the volatility, which means that historical volatility has a persistent impact on future volatility, have been imposed efficiently to time series models through various statistical techniques.
In order to capture the long-memory persistence of the volatility, Corsi [1] suggested a simple but remarkably efficient time series model with additive cascades defined on different time periods by incorporating the daily, weekly and monthly volatility components. Corsi's model [1] called a heterogeneous autoregressive-realized volatility model of order 3, (HAR-RV(3) or simply HAR(3)), is a type of linear autoregression with heterogeneous regressors, which are previous lag, lagged weekly moving average, and lagged monthly moving average. Since its introduction, various adaptive versions of the HAR model are used to analyze the volatility along with empirical data analysis. Here, we refer to [2][3][4][5] for univariate data and [6][7][8][9][10][11] for multivariate data.
In particular, Bubak et al. [6] used a multivariate extension of the HAR model to uncover volatility transmission between Central European currencies and the EUR/USA foreign exchange rate, whereas Soucek and Todorova [8] employ a bivariate HAR model to explore the relationship between equity and oil market volatility. Cubadda et al. [9] proposed a vector HAR index model for detecting the presence of co-movements and analyzing the joint behavior in a set of daily realized volatility measures. Cech and Barunik [10] proposed a generalized HAR model for dynamic covariance matrix modelling and forecasting.
All of the references above empirically only dealt with the multivariate HAR models, but they have not presented theoretical results of the HAR model. Indeed, there is only a few of theoretical analysis of the HAR model, even in the univariate case. Furthermore, the least squares estimate (LSE) method has been popularly used as a tool of estimation in time series data analysis. The LSEs are used often in the literature to analyze the multivariate data in the multivariate HAR models and good performances on the volatility forecasting are obtained with forecast accuracy, for instance, see [9,12]. However, in the multivariate HAR models, rigorous formulation of asymptotic multivariate normality theory of the LSE has not yet been established in spite of the frequent uses of the LSEs. The earlier studies motivate us to consider multivariate HAR models of order p with q multiple assets, and look into theoretical analysis of the LSEs. In this work we provide, as a main result, strictly stationarity of the HAR(p, q) model and asymptotic normality theories of LSEs for the parameters in the multivariate HAR models. The multivariate normalities of the LSEs are established in two cases of i.i.d. and correlated errors of the models. In addition, we consider an exponentially weighted multivariate HAR model and discuss the estimation of a common decay rate on coefficients in the exponentially weighted HAR model.
A Monte Carlo experiment is conducted to illustrate performance of the estimations. Sample mean and standard error of the LSEs as well as empirical coverage and average length of confidence intervals are calculated. In addition, we model volatility of Gold spot price and S&P500 index using a bivariate HAR model and estimate the coefficient parameters. The parameter estimates and confidence intervals are computed according to the asymptotic normality we derived. Finally, the bivariate HAR fitting is demonstrated to the market data with optimally selected lags under some criterion rules.
The remainder of the paper is organized as follows: In Section 2, the multivariate HAR model is presented with discussion on strictly stationary solution and in Section 3 the LSEs and their multivariate normalities are derived. Section 4 deals with the exponentially weighted multivariate HAR model. A Monte-Carlo experiment is performed and an application with market data is given in Sections 5 and 6, respectively. We conclude in Section 7, where the main conclusion and discussion on future works are stated.

Multivariate HAR(p, q) Models
We consider the following HAR-RV model {Y 1,t : t ∈ Z} with multivariate data {Y j,t : t ∈ Z, j = 2, . . . , q} which has been discussed in [8] as an extension of the HAR-RV(3) model in [1].
Soucek and Todorova [8] examined the volatility linkages of oil with three considered equity markets in three separate bivariate models in the bivariate case of assets j = 1, 2, with q = 2 in model (1). Motivated from the above, we consider a general multivariate HAR-RV(p, q) model {Y j,t : t ∈ Z, j = 1, . . . , q}: where { j,t , t ∈ Z, j = 1, . . . , q} are random variables for errors with mean zero and finite variance, {h i , i = 1, 2, . . . , p } are assumed to satisfy 1 = h 1 < h 2 < · · · < h p < ∞, . . , p} } are parameters to be estimated. Volatility components are given in the same as above. For i = 1, . . . , p and j = 1, . . . , q, Y Model (2) is more general than model (1) and thus we consider model (2) in this work. As main results we investigate the strict stationarity of the model and develop the estimation of parameters.
j,t , for j = 1, . . . , q. Model (2) can be written as follows: where B 0 = (β 10 , β 20 , . . . , β q0 ) , E t = ( 1,t , 2,t , . . . , q,t ) and Note that H = (φ j,k ), j = 1, . . . , q; k = 1, 2, . . . , h p , is a matrix in R q×qh p and Y * t−1 is a column vector in R qh p . Furthermore note that HY * t−1 is written as where, for k = 1, 2, . . . , h p , Therefore we have the following vector-autoregression of order h p : which is written as with lag operator L and the identity matrix I q ∈ R q×q . We make the following assumption: (A1) The roots of det(I q − Φ 1 z − Φ 2 z 2 − · · · − Φ h p z h p ) = 0 lie outside the complex unit circle (have modulus greater than one), or equivalently, the eigenvalues of the companion matrix (of dimension qh p × qh p ) have modulus less than one. Denote [A] qq by the q × q submatrix of a square matrix A, consisting of the first q rows and the first q columns of A, that is [A] qq = I q AI q where I q = (I q , O) ∈ R q×qh p . I is the identity matrix with compatible dimension.
Proof of Theorem 1. From (3) we write Let λ i (F ) be the ith eigenvalue of the matrix F and ρ := max i |λ i (F )|. Under (A1), we have exists. Thus {Y * t } is strictly stationary and satisfies (5). It can be given straightforwardly. Let [A] q denote q × 1 column vector consisting of the first q components of a column vector A.
that is the desired result in (4). Now we show the uniqueness of the strictly stationarity solution (4). Assume that {Y t } is another strictly stationary solution of the vector autoregression (3) with E Y t < ∞. Then we haveY * t = B * 0 + FY * t−1 + E * t in the same way as above in (5). For any positive integer m where C is a constant independent of t and m.
Since m is arbitrary, we see and thus Y t =Y t almost surely. This completes the uniqueness.

The Least Squares Estimation
We consider two cases on the error processes to discuss the least squares estimates : In the first case, we assume that { j,t } are i.i.d. random variables with mean zero and variance σ 2 for all j = 1, 2, . . . , q, that is, E[E t E t ] = σ 2 I q . As a second but more general case, we adopt correlated error First, the case of i.i.d. error processes are stated on assumption as follows and the ordinary least squares estimator (OLSE) of the parameters in model (2) is investigated.
The following theorem states the asymptotic normality of the OLSE. We need the notations for the theorem: Under the strictly stationary condition in (A1), for j, l ∈ {1, . . . , q}, let Furthermore let J p be p × p matrix with all components as ones and vec(·) the usual operator, stacking the columns of a given matrix.

Theorem 2.
Assume that (A1) and (A2) holds, and the matrix X X is of full rank. (a) For each j = 1, 2, . . . , q, where Σ is given by where ⊗ is the Kronecker product.
Proof of Theorem 2. For (a),β j is written aŝ For the desired result, it suffices to show that as n → ∞, First, we prove the convergence of Σ x .
It can be shown straightforwardly that the right-hand side of (9) converges to Σ as n → ∞ by the WLLN of the stationary sequences.
Second, we verify the convergence of for any η ∈ R 1+pq . Fix j and let Ση as n → ∞. Hence, by the central limit theorem for martingale difference sequences (see [13]), the desired convergence of M n to N(0, σ 2 η Ση) in (10) in distribution holds. We complete the proof of (a).
. . , q} are independent. Thus the covariance matrix is zero, and therefore the desired asymptotic multivariate normality of vec(B) is obtained with the covariance matrix I q ⊗ Σ −1 .
Now we adopt correlated error processes by assuming the following: (A3) { j,t , j = 1, 2, . . . , q, t ∈ Z} are correlated random variables with mean zero and Cov( j,t , l,t ) = σ jl = 0, that is, the covariance matrix Ψ is assumed to be Ψ : The covariance matrix Ψ is nonsingular and positive definite so there exists a nonsingular symmetric matrix Ψ 1 2 ∈ R q×q and Ψ 1 2 Ψ 1 2 = Ψ. Under (A1) and (A3), note that lag-autocovariance matrix functions Γ( ) of Y t are given by In this case the generalized least squares estimator (GLSE) is computed by minimizing the sum of squared standardized errors. The GLSEB GLS = (β 1,GLS , . . . ,β q,GLS ) of B is given as follows: From (7) we have The GLSE is obtained bŷ The matrix form of (11) with t = 1, 2, . . . , n is given as The least squares estimator of B Ψ is of the form B Ψ = X X −1 X Y Ψ , which is the estimator of (Ψ − 1 2 B) = B Ψ − 1 2 . Furthermore note that Y Ψ = YΨ − 1 2 . Hence the GLSE of B is given aŝ that is of the same form as the OLSE. The following theorem presents the asymptotic normality of the GLSE.
Theorem 3. Assume that (A1) and (A3) hold, and the matrix X X is of full rank.
Theorem 3 includes the following case of the uncorrelated but heterogeneous variance of the error processes with Cov( j,t , l,t ) = σ 2 j if j = l, and zero if j = l.

Corollary 1. Assume (A1) and if E[E t ] = O and E[E t
Proof of Theorem 3. The proof is similar to that of the Theorem 2, except for the limit of the covariance matrix Cov √ n(β j,GLS − β j ), √ n(β l,GLS − β l ) for j = l. Following what we've done in the proof of Theorem 2, we get nE (β j, Thus the desired asymptotic multivariate normality of vec(B GLS ) is completed with the covariance matrix Ψ ⊗ Σ −1 .
We have discussed parameter estimation by means of the least squared method and have established the multivariate normalities in multivariate HAR(p, q) models. For a univariate HAR model, Hwang and Shin [3] proposed infinite-order, long-memory HAR model to capture the long-memory property and studied the asymptotic theories of the LSE. In the work of [3,14], it was assumed that HAR coefficients decrease exponentially, and it was shown that, under the exponential decay condition, the autocorrelation function of the HAR model is algebraically decreasing and thus the model is of long-memory. For this reason, additionally we consider the exponentially weighted multivariate HAR model with exponential decay rate, and develop its asymptotic normality of the rate estimator. As a simple case we assume that multiple assets have a common rate on coefficients, and the rate estimation problem is investigated in the following section. A general case of the exponentially weighted multivariate HAR model will be dealt in the future study.

Exponentially Weighted Multivariate HAR Model
In this section, as a special case of model (2), we consider an exponentially weighted multivariate HAR model with common rate on the parameters of common regressors as follows: In model (2), for j, k ∈ {1, 2, . . . , q}, and for i ∈ {1, 2, . . . , p}, β (i) jk = c jk λ i−1 for some c jk and |λ| < 1. The parameters c jk and λ are estimated using the LSEβ (i) jk in Section 3 as follows: , which is given in the asymptotic variances in Theorems 2 and 3, for l 1 , l 2 ∈ {1, 2, . . . , 1 + pq}, that is, jk which is the corresponding component of the matrix σ 2 Σ −1 in Theorem 2. Indeed, q j=1 c jk and Z * jk following normal distribution with mean zero and variance Proof of Theorem 4. By Theorem 2 for each i, j, k, we have √ n(β where Z (i) jk s are normal random variables with mean zero and variance (i) k for each j and we havê Note that Z jk − λZ (1) jk has asymptotically normal distribution with mean zero and variance (k−1)p+2,(k−1)p+3 . Therefore, the desired asymptotic normality of √ n(λ n − λ) is obtained.

Remark 1.
As pointed out by [3], the exponential decay condition β (i) jk = c jk λ i−1 is a condition for the long-memory property of HAR models. Ref. [3] discussed the HAR model of infinity order and its approximation of finite orders, where it has been shown that the exponential decay condition is equivalent to algebraically decay autocorrelation functions along with a mild lag condition. We are interested in testing whether or not the model is the exponentially weighted multivariate HAR model with common rate λ. For example, we construct the following hypothesis and test statistic: the null hypothesis H 0 : β (i) jk = c jk λ i−1 for some c jk and common rate |λ| < 1, for each j, k, versus. the alternative hypothesis H A : the model does not have the common rate nor the exponentially weighted multivariate HAR model. For i = 1, 2, . . . , p − 1, let (i) jk and note that asymptotic normality ofλ (i),n holds like Theorem 4 by that of the OLSEs. A collection of the differences λ (i 1 ),n −λ (i 2 ),n : for all pairs (i 1 , i 2 ), i 1 = i 2 is considered. In the collection, all distinct elements in absolute values are at most (p − 1)(p − 2)/2 ( =: p * ). These are relabelled as {Λ : = 1, 2, . . . , p * }. LetΛ n = ∑ p * =1 Λ /p * or some related statistics of {Λ }. Similarly to Theorem 4, we can find the limiting distribution ofΛ n under the null hypothesis H 0 . The null H 0 might be rejected if |Λ n | is large or if max Λ is large. In a future work, a specific test statistic related to Λ will be constructed and the limiting distribution of the test will be investigated.
We compute the LSEs for the parameters with replication numbers 500 and give the sample means and standard errors of the 500 estimates in Tables 1 and 2  Confidence intervals using the normal approximations in Theorems 2 and 3 with confidence level 95% are constructed: 0.95 = P(β − z 0.975 se ≤ β ≤β + z 0.975 se), for each parameter component β, with its LSEβ and standard-error estimate se =σ/ √ n, whereσ 2 is the corresponding (diagonal) component of estimate of asymptotic covariance matrix in Theorems 2 and 3. The confidence intervals of 500 samples as well as the empirical coverage probabilities and average lengths in the i.i.d. error case by the normal approximations are demonstrated in Figure 3, where sample size n = 1000 and replication number 500 are used. To illustrate the multivariate asymptotic normality, plots of the normal approximations for estimates of some parameters are depicted in Figure 4, and the bivariate normalities of some pairs of two chosen estimates also can be seen in Figure 5. The three figures support normality results established in the theory.    As for the exponentially weighted multivariate HAR(3,2) model with β (i) jk = c jk λ i−1 in Section 4, the estimates for the common rate are given in Table 3, from which we see that sample means of estimated values are close to the true ones with reasonable standard errors. Table 3. Sample mean and standard error of 500 estimates in exponentially weighted HAR model.

Application
This section addresses empirical data analysis on the Gold spot price and S&P500 index. Their volatility is modeled by bivariate HAR model, using three years of daily closing price, from 18 September 2017 to 17 September 2020, of Gold and S&P500. The three years of Gold price and S&P500 index movement and their log return are shown in Figure 6 while their volatility and autocorrelation coefficients functions (ACFs) in Figure 7. We list some critera of MSE, R 2 , AIC, BIC in Table 4 for the OLSEs (ordinary least squares estimators) of the bivariate HAR model of order p = 3, 4 by examining the volatility of Gold and S&P500. Conventionally lags of order p = 3, 4 are used for a day (h 1 = 1), a week (h 2 = 5), a month (h 3 = 22) and a quarter (h 4 = 66) in a HAR model. Nevertheless, in this work, we consider optimal lags in the sense of minimizing MSE of OLSEs for Gold and S&P500 volatility simultaneously. For order p = 3, 4, we choose lags h 2 , . . . , h p to satisfy the condition, 1 = h 1 < h 2 < · · · < h p <h for some largeh, so that the sum of the two mean squared residuals of the OLSEs for the volatility of Gold and S&P500 is minimized. As a result, we found optimal lags (h 1 , h 2 , h 3 ) = (1, 5, 6) for p = 3 with (MSE Gold , MSE S&P ) = (0.0461, 0.0387) and (h 1 , h 2 , h 3 , h 4 ) = (1, 3, 6, 7) for p = 4 with (MSE Gold , MSE S&P ) = (0.0488, 0.0410). In Table 4, we compare our optimal lags with the conventional lags in different criteria. In all cases considered, our selection of optimal lags (h 1 , h 2 , h 3 ) = (1, 5, 6) with p = 3 turns out to be the best. Therefore we report estimation results of the HAR(3,2) model with (h 1 , h 2 , h 3 ) = (1, 5, 6) for the volatilities of Gold and S&P500 in Table 5, where coefficients estimates, standard errors and 95% confidence intervals are provided. In Figure 8, two plots of the bivariate HAR(3,2) fitted model by the OLSEs for this optimal case on the datasets of Gold and S&P volatilities are depicted along with residuals. We see that the bivariate HAR (3,2) fittings are similar to the real volatility plots of both datasets as reported upon the criterions.

Conclusions
Due to the cross-correlation of multiple assets and spillover effect of volatility in the financial market, a multivariate heterogeneous autoregressive-realized volatility (HAR-RV) model has attained much attention recently. In the multivariate HAR model, its stationarity is discussed and estimation problems are studied. We first investigate the strictly stationarity solution of the multivariate HAR model and second develop the asymptotic normality theory for the least squares estimates (LSEs) with i.i.d. and correlated errors, respectively. Third, we propose an exponentially weighted multivariate HAR model and estimate its common exponential decay rate. In a Monte-Carlo experiment, performances of the LSEs are numerically illustrated with sample mean and standard error of the estimates as well as empirical coverage and average length of confidence intervals by using the normal approximation. In addition, as a real data example, volatilities of Gold spot price and S&P500 index during recent three years are used to analyze in a bivariate HAR model. The coefficient estimates and confidence intervals are found in the bivariate HAR model of volatility of Gold and S&P500, along with choosing optimal lags, and it is shown that the bivariate HAR model with the proposed optimal lags is well matched with the volatility of the financial data.
We suggest some problems on the multivariate HAR model. As we proposed before, the exponentially weighted HAR models with decay rates are of interest owing to reduced numbers of parameters as well as the long-memory property. In modeling the multivariate HAR model, testing whether the HAR coefficients have exponentially weighted decay rates or not and furthermore whether multiple assets have a common decay rate or not might provide statistically useful tools to analyze the time series model. In follow-up studies we will deal with the hypotheses tests by constructing the test statistics and establishing the null limiting distribution. Finally, we mention that in a multivariate HAR model with heteroscedasticity errors, asymptotic properties of the estimates differ from the existing results, and thus we will derive the asymptotic theory on the HAR models in the presence of dynamic heteroscedasticity. In this case, financial market data can be represented more remarkably and hence forecasting volatility with forecast accuracy will be carried out in the further research.

Conflicts of Interest:
The authors declare no conflict of interest.