1. Introduction
Volatility in financial asset returns is one of the most important inputs to optimization decisions in financial markets, such as portfolio selection, risk management and asset pricing. Over recent decades, statisticians and econometricians have studied volatility extensively through time series modeling. In particular, because assets in financial markets are correlated with each other, the cross-correlation of two or more asset returns and the spillover effect of volatility have been represented in multivariate rather than univariate time series models. Probabilistic properties such as the long memory of volatility, meaning that historical volatility has a persistent impact on future volatility, have been incorporated into time series models through various statistical techniques.
In order to capture the long-memory persistence of volatility, Corsi [1] suggested a simple but remarkably efficient time series model with additive cascades defined on different time periods, incorporating daily, weekly and monthly volatility components. Corsi’s model [1], called the heterogeneous autoregressive realized volatility model of order 3 (HAR-RV(3) or simply HAR(3)), is a linear autoregression with heterogeneous regressors: the previous lag, the lagged weekly moving average, and the lagged monthly moving average. Since its introduction, various adapted versions of the HAR model have been used to analyze volatility in empirical data. Here, we refer to [2,3,4,5] for univariate data and [6,7,8,9,10,11] for multivariate data.
In particular, Bubak et al. [6] used a multivariate extension of the HAR model to uncover volatility transmission between Central European currencies and the EUR/USD foreign exchange rate, whereas Soucek and Todorova [8] employed a bivariate HAR model to explore the relationship between equity and oil market volatility. Cubadda et al. [9] proposed a vector HAR index model for detecting the presence of co-movements and analyzing the joint behavior in a set of daily realized volatility measures. Cech and Barunik [10] proposed a generalized HAR model for dynamic covariance matrix modelling and forecasting.
All of the references above dealt with multivariate HAR models only empirically and did not present theoretical results for the HAR model. Indeed, there are only a few theoretical analyses of the HAR model, even in the univariate case. Furthermore, the least squares estimate (LSE) has been widely used as an estimation tool in time series data analysis. LSEs are often used in the literature to fit multivariate HAR models, and good volatility forecasting accuracy has been reported; see, for instance, [9,12]. However, despite the frequent use of LSEs in multivariate HAR models, a rigorous asymptotic multivariate normality theory for the LSE has not yet been established. These earlier studies motivate us to consider multivariate HAR models of order p with q assets, and to study the LSEs theoretically. In this work we establish, as main results, the strict stationarity of the HAR(p, q) model and the asymptotic normality of the LSEs for the parameters in multivariate HAR models. The multivariate normality of the LSEs is established in two cases, i.i.d. errors and correlated errors. In addition, we consider an exponentially weighted multivariate HAR model and discuss the estimation of a common decay rate on the coefficients in that model.
A Monte Carlo experiment is conducted to illustrate the performance of the estimators. Sample means and standard errors of the LSEs, as well as empirical coverage and average length of confidence intervals, are calculated. In addition, we model the volatility of the Gold spot price and the S&P500 index using a bivariate HAR model and estimate the coefficient parameters. The parameter estimates and confidence intervals are computed according to the asymptotic normality we derived. Finally, the bivariate HAR model is fitted to the market data with lags selected optimally under several criteria.
The remainder of the paper is organized as follows: In Section 2, the multivariate HAR model is presented with a discussion of its strictly stationary solution, and in Section 3 the LSEs and their multivariate normality are derived. Section 4 deals with the exponentially weighted multivariate HAR model. A Monte-Carlo experiment is performed in Section 5 and an application to market data is given in Section 6. We conclude in Section 7, where the main conclusions and a discussion of future work are stated.
2. Multivariate HAR(p, q) Models
We consider the following HAR-RV model
with multivariate data
which has been discussed in [
8] as an extension of the HAR-RV(3) model in [
1].
where
are random variables for error terms and
are coefficient parameters. For each
jth-asset,
, three volatility components
,
corresponding to time horizons of one day (
d), one week (
w), one month (
m) are given by the realized volatility
on day
if
, and
weekly and monthly averages of realized volatility if
; and
. Soucek and Todorova [
8] examined the volatility linkages of oil with three considered equity markets in three separate bivariate models in the bivariate case of assets
, with
in model (
1).
Motivated by the above, we consider a general multivariate HAR-RV(p, q) model
:
where
are random variables for errors with mean zero and finite variance,
are assumed to satisfy
, and
are parameters to be estimated. Volatility components are defined in the same way as above. For
and
,
.
Model (2) is more general than model (1), and thus we consider model (2) in this work. As main results, we investigate the strict stationarity of the model and develop the estimation of its parameters.
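The heterogeneous volatility components can be illustrated with a short sketch. It assumes that each component is the average of the realized volatilities over the preceding h days, and it uses the conventional horizons (1, 5, 22) purely as an example; the function name `har_components` and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def har_components(Y, horizons=(1, 5, 22)):
    """Return (targets, comps) for a (T x q) array of realized volatilities.

    comps[t, i, j] is the mean of the horizons[i] values of asset j that
    immediately precede the target row targets[t].
    """
    Y = np.asarray(Y, dtype=float)
    h_max = max(horizons)
    n = Y.shape[0] - h_max
    comps = np.empty((n, len(horizons), Y.shape[1]))
    for i, h in enumerate(horizons):
        for t in range(n):
            s = t + h_max                      # row index of the target
            comps[t, i] = Y[s - h:s].mean(axis=0)
    return Y[h_max:], comps

# Toy data: two "assets" with values 1..30 and 2..60.
base = np.arange(1.0, 31.0)
Y = np.column_stack([base, 2.0 * base])
targets, comps = har_components(Y)
```

Dropping the first `max(horizons)` observations keeps every regressor fully observed, at the cost of a slightly shorter sample.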
Let
with
, for
. Model (
2) can be written as follows:
where, for
, (with
),
,
We write, for each
,
where
and thus
where
,
and
Note that
, is a matrix in
and
is a column vector in
. Furthermore note that
is written as
where, for
,
Therefore we have the following vector-autoregression of order
:
which is written as
with lag operator
L and the identity matrix
.
We make the following assumption:
(A1) The roots of
lie outside the complex unit circle (have modulus greater than one), or equivalently, the eigenvalues of the companion matrix (of dimension
)
have modulus less than one.
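Condition (A1) can be checked numerically once coefficient values are given. The sketch below assumes the HAR regressors are lagged h-day averages, so that a coefficient matrix attached to horizon h contributes weight 1/h to each of the first h lags of the vector autoregression; the coefficient values and the function names are hypothetical.

```python
import numpy as np

def var_coefficients(B, horizons):
    """Expand HAR coefficient matrices B[i] (q x q) into VAR lag matrices.

    Returns A with shape (h_max, q, q), where A[k - 1] multiplies Y_{t-k}.
    """
    q = B[0].shape[0]
    h_max = max(horizons)
    A = np.zeros((h_max, q, q))
    for Bi, h in zip(B, horizons):
        A[:h] += Bi / h            # lags 1..h each receive weight 1/h
    return A

def companion(A):
    """Companion matrix (q*h_max x q*h_max) of the VAR lag matrices A."""
    h_max, q, _ = A.shape
    C = np.zeros((q * h_max, q * h_max))
    C[:q, :] = np.hstack(list(A))              # first block row: A_1 ... A_h
    C[q:, :-q] = np.eye(q * (h_max - 1))       # identity blocks below it
    return C

horizons = (1, 5, 22)
B = [np.array([[0.30, 0.05], [0.05, 0.30]]),
     np.array([[0.25, 0.05], [0.05, 0.25]]),
     np.array([[0.20, 0.00], [0.00, 0.20]])]

A = var_coefficients(B, horizons)
rho = np.max(np.abs(np.linalg.eigvals(companion(A))))
print("spectral radius:", rho, "-> (A1) holds:", rho < 1)
```

For nonnegative coefficients such as these, the check agrees with the simpler criterion that the sum of the HAR coefficient matrices has spectral radius below one.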
Denote by the submatrix of a square matrix A, consisting of the first q rows and the first q columns of A, that is where . I is the identity matrix with compatible dimension.
Theorem 1. Assume (A1) holds. Then there exists a unique strictly stationary solution with a finite first-order moment to the model and the solution has the form of
Proof of Theorem 1. From (3) we write
which is denoted by
Let
be the
ith eigenvalue of the matrix
and
. Under (A1), we have
. Furthermore it follows that
exists and
exists. Thus
is strictly stationary and satisfies (
5). It can be given straightforwardly.
Let
denote
column vector consisting of the first
q components of a column vector
A. Since
,
and
,
is also strictly stationary and
that is the desired result in (
4). Now we show the uniqueness of the strictly stationary solution (
4). Assume that
is another strictly stationary solution of the vector autoregression (
3) with
. Then we have
in the same way as above in (
5). For any positive integer
m we have
, and we observe
where
C is a constant independent of
t and
m. Since
m is arbitrary, we see
and thus
almost surely. This completes the proof of uniqueness. □
3. The Least Squares Estimation
We consider two cases on the error processes to discuss the least squares estimates: In the first case, we assume that are i.i.d. random variables with mean zero and variance for all , that is, . As a second but more general case, we adopt correlated error processes with where .
First, the case of i.i.d. error processes is stated as the following assumption, and the ordinary least squares estimator (OLSE) of the parameters in model (2) is investigated.
(A2)
are i.i.d. random variables with mean zero and variance
for all
, that is,
. Under (A1) and (A2), let
with
and
, lag-
ℓ autocovariance matrix function. Note that, by using (
4),
Suppose that data are observed where with for .
Then Equation (
2) is, for each
j, given by
and its matrix form is
where
,
matrix, and
. The matrix form with
is given by
where
,
and
. The ordinary least squares estimator (OLSE) of
is obtained by
Its transpose is given by
Equivalently, from (
6) with
, whose matrix form with
, is given by
where
, we obtain OLSE of
for each
j as follows.
The following theorem states the asymptotic normality of the OLSE. We need the notations for the theorem:
Under the strictly stationary condition in (A1), for
, let
and
Furthermore let be matrix with all components as ones and the usual operator, stacking the columns of a given matrix.
Theorem 2. Assume that (A1) and (A2) hold, and the matrix is of full rank.
(a) For each , as we have
where Σ is given by
with and
(b) As we have
where ⊗ is the Kronecker product.
Proof of Theorem 2. For (a),
is written as
where
For the desired result, it suffices to show that as
,
First, we prove the convergence of
.
where
It can be shown straightforwardly that the right-hand side of (
9) converges to
as
by the weak law of large numbers (WLLN) for stationary sequences.
Second, we verify the convergence of
in distribution. For this purpose, it is enough to show that
for any
. Fix
j and let
and
. Let
. Note that
and thus
is a martingale sequence with respect to
.
Similar to the proof of the convergence of
, we can show that
, which implies that for any
for some constant
. Furthermore
as
. Hence, by the central limit theorem for martingale difference sequences (see [
13]), the desired convergence of
to
in (
10) in distribution holds. This completes the proof of (a).
For (b), we observe covariance matrix
for
, which is equal to
By (
8),
which is zero because
are independent. Thus the covariance matrix is zero, and therefore the desired asymptotic multivariate normality of
is obtained with the covariance matrix
. □
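The least squares estimation discussed above can be sketched numerically. The example assumes a bivariate model with an intercept and lagged 1-, 5- and 22-day averages of both assets as regressors, with independent standard normal errors; the coefficient values, the function names `simulate_har` and `har_design`, and the sample size are hypothetical choices made for illustration only.

```python
import numpy as np

def simulate_har(b0, B, horizons, n, rng, burn=500):
    """Simulate n + max(horizons) rows of a q-variate HAR process, N(0, I) errors."""
    q, h_max = len(b0), max(horizons)
    Y = np.zeros((n + burn + h_max, q))
    for t in range(h_max, len(Y)):
        Y[t] = b0 + rng.standard_normal(q)
        for Bi, h in zip(B, horizons):
            Y[t] += Bi @ Y[t - h:t].mean(axis=0)   # lagged h-day average
    return Y[burn:]

def har_design(Y, horizons):
    """Targets Y_t and regressors [1, lagged averages of every asset per horizon]."""
    h_max = max(horizons)
    n = Y.shape[0] - h_max
    # Cumulative sums give all trailing h-day means without an inner loop.
    csum = np.vstack([np.zeros((1, Y.shape[1])), np.cumsum(Y, axis=0)])
    cols = [np.ones((n, 1))]
    for h in horizons:
        cols.append((csum[h_max:h_max + n] - csum[h_max - h:h_max - h + n]) / h)
    return Y[h_max:], np.hstack(cols)

horizons = (1, 5, 22)
b0 = np.array([0.1, 0.1])
B = [np.array([[0.30, 0.10], [0.05, 0.30]]),
     np.array([[0.20, 0.05], [0.05, 0.20]]),
     np.array([[0.10, 0.00], [0.00, 0.10]])]
# True parameters arranged to match the design: column j belongs to asset j.
theta = np.vstack([b0[None, :]] + [Bi.T for Bi in B])

rng = np.random.default_rng(0)
Y = simulate_har(b0, B, horizons, n=20000, rng=rng)
targets, X = har_design(Y, horizons)
theta_hat, *_ = np.linalg.lstsq(X, targets, rcond=None)   # OLS for all assets at once
print(np.round(theta_hat - theta, 3))                     # estimation errors
```

Because every asset shares the same regressor matrix, a single call to `np.linalg.lstsq` with a two-column target returns the equation-by-equation OLS estimates.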
Now we adopt correlated error processes by assuming the following:
(A3) are correlated random variables with mean zero and , that is, the covariance matrix is assumed to be .
The covariance matrix is nonsingular and positive definite so there exists a nonsingular symmetric matrix and .
Under (A1) and (A3), note that lag-
ℓ autocovariance matrix functions
of
are given by
and
In this case the generalized least squares estimator (GLSE) is computed by minimizing the sum of squared standardized errors. The GLSE
of
is given as follows: From (
7) we have
Let
,
and
Note that
and we have
The matrix form of (
11) with
is given as
where
The least squares estimator of
is of the form
, which is the estimator of
. Furthermore note that
. Hence the GLSE of
is given as
that is of the same form as the OLSE. The following theorem presents the asymptotic normality of the GLSE.
Theorem 3. Assume that (A1) and (A3) hold, and the matrix is of full rank.
(a) For each , as we have
where Σ is given as in Theorem 2 and .
Theorem 3 includes the following case of uncorrelated error processes with heterogeneous variances, with if , and zero if .
Corollary 1. Assume (A1) and if and , then as , and where is given by
Proof of Theorem 3. The proof is similar to that of Theorem 2, except for the limit of the covariance matrix
for
. Following the argument in the proof of Theorem 2, we get
which is equal to
. Thus the desired asymptotic multivariate normality of
is completed with the covariance matrix
. □
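The observation that the GLSE has the same form as the OLSE can be checked numerically on a small system. The sketch below uses the standard fact that, when all equations share the same regressors, generalized least squares on the stacked system reproduces equation-by-equation OLS. It applies the inverse error covariance directly instead of the symmetric square-root standardization used in the derivation, and the dimensions and covariance values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, q = 200, 4, 2                      # observations, regressors, equations

X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])            # error covariance across equations
E = rng.multivariate_normal(np.zeros(q), Sigma, size=n)
B_true = rng.standard_normal((k, q))
Y = X @ B_true + E

# OLS: every equation uses the same regressor matrix X.
B_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# GLS on the stacked system vec(Y) = (I_q kron X) vec(B) + vec(E),
# whose error covariance is Sigma kron I_n under column stacking.
W = np.kron(np.linalg.inv(Sigma), np.eye(n))          # inverse error covariance
Z = np.kron(np.eye(q), X)
y = Y.reshape(-1, order="F")                          # column-stacking vec
b_gls = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
B_gls = b_gls.reshape(k, q, order="F")

print(np.max(np.abs(B_ols - B_gls)))                  # numerically negligible
```

The estimates coincide, while their asymptotic covariance still depends on the error covariance, which is where Theorem 3 differs from Theorem 2.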
We have discussed parameter estimation by the least squares method and have established multivariate normality in multivariate HAR(p, q) models. For a univariate HAR model, Hwang and Shin [3] proposed an infinite-order, long-memory HAR model to capture the long-memory property and studied the asymptotic theory of the LSE. In [3,14], it was assumed that the HAR coefficients decrease exponentially, and it was shown that, under this exponential decay condition, the autocorrelation function of the HAR model decreases algebraically and thus the model has long memory. For this reason, we additionally consider the exponentially weighted multivariate HAR model with an exponential decay rate, and develop the asymptotic normality of the rate estimator. As a simple case we assume that multiple assets have a common rate on the coefficients, and the rate estimation problem is investigated in the following section. The general case of the exponentially weighted multivariate HAR model will be dealt with in a future study.
4. Exponentially Weighted Multivariate HAR Model
In this section, as a special case of model (
2), we consider an exponentially weighted multivariate HAR model with common rate on the parameters of common regressors as follows: In model (
2), for
, and for
,
for some
and
. The parameters
and
are estimated using the LSE
in
Section 3 as follows:
We write , which is given in the asymptotic variances in Theorems 2 and 3, for , that is, is -component of . By Theorem 2 for each , we have for some which is the corresponding component of the matrix in Theorem 2. Indeed, is the same variance for all j, say, , and we can easily represent with . The following theorem states the asymptotic normality of the estimates for the common rate.
Theorem 4. Assume (A1) and (A2) hold. In model (2) with as , we have
where with and following a normal distribution with mean zero and variance
Proof of Theorem 4. By Theorem 2 for each
, we have
for some
which is the corresponding component of the matrix
in Theorem 2, where
. We may write
where
s are normal random variables with mean zero and variance
for each
j and we have
Note that has asymptotically normal distribution with mean zero and variance . Therefore, the desired asymptotic normality of is obtained. □
Remark 1. As pointed out by [3], the exponential decay condition is a condition for the long-memory property of HAR models. Ref. [3] discussed the HAR model of infinite order and its approximation by finite orders, where it was shown that the exponential decay condition is equivalent to algebraically decaying autocorrelation functions along with a mild lag condition. We are interested in testing whether or not the model is the exponentially weighted multivariate HAR model with common rate λ. For example, we construct the following hypothesis and test statistic: the null hypothesis for some and common rate , for each , versus the alternative hypothesis : the model does not have the common rate or is not the exponentially weighted multivariate HAR model. For , let and note that asymptotic normality of holds like Theorem 4 by that of the OLSEs. A collection of the differences is considered. In the collection, all distinct elements in absolute values are at most . These are relabelled as . Let or some related statistics of . Similarly to Theorem 4, we can find the limiting distribution of under the null hypothesis . The null might be rejected if is large or if is large. In a future work, a specific test statistic related to will be constructed and the limiting distribution of the test will be investigated.
5. A Monte-Carlo Study
We present simulation results for model (
2) with
and
, which is a bivariate case of two assets, using parameters
. We consider both independent errors and correlated ones for
,
: (i) i.i.d. normal distribution
with
and (ii) correlated normal distribution with
with
with
. The bivariate HAR(3,2) processes are generated according to (
2) with sample size
.
We compute the LSEs for the parameters with 500 replications and give the sample means and standard errors of the 500 estimates in Table 1 and Table 2 with
. The tables show that the sample means move closer to the true parameter values and that the standard errors decrease as n increases. Additionally,
Figure 1 and Figure 2 illustrate plots of sample means (a), (b) and standard errors (c), (d) of the LSEs of the parameters as n increases on the horizontal axis: Figure 1a,c and Figure 2a,c for the estimates of
and Figure 1b,d and Figure 2b,d for the estimates of
.
Confidence intervals using the normal approximations in Theorems 2 and 3 with confidence level
are constructed:
, for each parameter component
, with its LSE
and standard-error estimate
, where
is the corresponding (diagonal) component of the estimate of the asymptotic covariance matrix in Theorems 2 and 3. The confidence intervals from the 500 samples, together with the empirical coverage probabilities and average lengths under the normal approximation in the i.i.d. error case, are shown in Figure 3, where sample size
and 500 replications are used. To illustrate the multivariate asymptotic normality, plots of the normal approximations for estimates of some parameters are depicted in Figure 4, and the bivariate normality of some pairs of estimates can also be seen in Figure 5. The three figures support the normality results established in the theory.
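The coverage experiment can be sketched in a few lines. This is an illustration under stated assumptions, not the configuration used for the reported figures: the coefficient values, the sample size n = 300 and the 200 replications are hypothetical, and the standard errors use the textbook OLS plug-in, the residual variance times the diagonal of the inverse Gram matrix, as a stand-in for the estimated asymptotic covariance.

```python
import numpy as np

horizons = (1, 5, 22)
b0 = np.array([0.1, 0.1])
B = [np.array([[0.30, 0.10], [0.05, 0.30]]),
     np.array([[0.20, 0.05], [0.05, 0.20]]),
     np.array([[0.10, 0.00], [0.00, 0.10]])]
theta = np.vstack([b0[None, :]] + [Bi.T for Bi in B])   # true parameters, (1 + p*q) x q

def simulate(n, rng, burn=100):
    """Bivariate HAR path with i.i.d. N(0, I) errors; returns n + 22 rows."""
    q, h_max = len(b0), max(horizons)
    Y = np.zeros((n + burn + h_max, q))
    for t in range(h_max, len(Y)):
        Y[t] = b0 + rng.standard_normal(q)
        for Bi, h in zip(B, horizons):
            Y[t] += Bi @ Y[t - h:t].mean(axis=0)
    return Y[burn:]

def design(Y):
    """Targets and regressors [1, lagged h-day averages of both assets]."""
    h_max = max(horizons)
    n = Y.shape[0] - h_max
    csum = np.vstack([np.zeros((1, Y.shape[1])), np.cumsum(Y, axis=0)])
    cols = [np.ones((n, 1))]
    for h in horizons:
        cols.append((csum[h_max:h_max + n] - csum[h_max - h:h_max - h + n]) / h)
    return Y[h_max:], np.hstack(cols)

def ols_with_se(Y):
    """OLS estimates and plug-in standard errors, one column per asset."""
    Yt, X = design(Y)
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ Yt
    resid = Yt - X @ coef
    s2 = (resid ** 2).sum(axis=0) / (X.shape[0] - X.shape[1])   # residual variance per asset
    se = np.sqrt(np.outer(np.diag(XtX_inv), s2))
    return coef, se

rng = np.random.default_rng(1)
R, n, z = 200, 300, 1.96                 # replications, sample size, 95% normal quantile
hits = np.zeros_like(theta)
length = np.zeros_like(theta)
for _ in range(R):
    coef, se = ols_with_se(simulate(n, rng))
    hits += np.abs(coef - theta) <= z * se    # does the interval cover the truth?
    length += 2 * z * se
coverage = hits / R
avg_length = length / R
print(np.round(coverage, 3))
print(np.round(avg_length, 3))
```

Each entry of `coverage` is the empirical coverage probability of the corresponding parameter, and `avg_length` gives the matching average interval length.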
As for the exponentially weighted multivariate HAR(3,2) model with
in Section 4, the estimates for the common rate are given in Table 3, from which we see that the sample means of the estimates are close to the true values with reasonable standard errors.
6. Application
This section presents an empirical analysis of the Gold spot price and the S&P500 index. Their volatility is modeled by a bivariate HAR model, using three years of daily closing prices of Gold and the S&P500, from 18 September 2017 to 17 September 2020. The three years of Gold price and S&P500 index movements and their log returns are shown in Figure 6, while their volatility and autocorrelation functions (ACFs) are shown in Figure 7.
We list the criteria MSE, R², AIC and BIC in Table 4 for the OLSEs (ordinary least squares estimators) of the bivariate HAR model of order
by examining the volatility of Gold and S&P500. Conventionally lags of order
are used for a day (
), a week
, a month (
) and a quarter (
) in a HAR model. Nevertheless, in this work, we consider optimal lags in the sense of minimizing MSE of OLSEs for Gold and S&P500 volatility simultaneously. For order
, we choose lags
to satisfy the condition,
for some large
, so that the sum of the two mean squared residuals of the OLSEs for the volatility of Gold and S&P500 is minimized. As a result, we found optimal lags
for
with (MSE
, MSE
and
for
with (MSE
, MSE
. In Table 4, we compare our optimal lags with the conventional lags under different criteria. In all cases considered, our selection of optimal lags
with
turns out to be the best.
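The lag selection described above can be sketched as a grid search. The sketch assumes that the first lag is fixed at one day, that the remaining two lags are strictly increasing and bounded by a maximum horizon H, and that all candidates are fitted on the same sample so their MSEs are comparable; the placeholder series, the bound H = 30 and the function names are hypothetical and do not reproduce the Gold and S&P500 analysis.

```python
import numpy as np
from itertools import combinations

def design(Y, horizons, start):
    """Targets Y[start:] and regressors [1, lagged h-day averages of all assets]."""
    n = Y.shape[0] - start
    csum = np.vstack([np.zeros((1, Y.shape[1])), np.cumsum(Y, axis=0)])
    cols = [np.ones((n, 1))]
    for h in horizons:
        cols.append((csum[start:start + n] - csum[start - h:start - h + n]) / h)
    return Y[start:], np.hstack(cols)

def total_mse(Y, horizons, start):
    """Sum over assets of the mean squared OLS residuals."""
    Yt, X = design(Y, horizons, start)
    coef, *_ = np.linalg.lstsq(X, Yt, rcond=None)
    return ((Yt - X @ coef) ** 2).mean(axis=0).sum()

def select_lags(Y, H):
    """Search lags (1, h2, h3) with 1 < h2 < h3 <= H minimizing the summed MSE."""
    candidates = [(1,) + c for c in combinations(range(2, H + 1), 2)]
    # Every candidate starts at row H, so all fits use the same observations.
    return min(candidates, key=lambda hz: total_mse(Y, hz, start=H))

# Placeholder bivariate series standing in for two volatility series.
rng = np.random.default_rng(0)
Y = np.zeros((600, 2))
for t in range(1, len(Y)):
    Y[t] = 0.8 * Y[t - 1] + rng.standard_normal(2)

best = select_lags(Y, H=30)
print(best, total_mse(Y, best, start=30), total_mse(Y, (1, 5, 22), start=30))
```

Since the conventional lags (1, 5, 22) are among the candidates, the selected lags can never have a larger in-sample MSE; whether they also win on AIC or BIC has to be checked separately.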
Therefore we report estimation results of the HAR(3,2) model with
for the volatilities of Gold and S&P500 in Table 5, where coefficient estimates, standard errors and 95% confidence intervals are provided. In Figure 8, two plots of the bivariate HAR(3,2) model fitted by the OLSEs for this optimal case on the Gold and S&P500 volatility datasets are depicted along with the residuals. We see that the bivariate HAR(3,2) fits are similar to the actual volatility plots of both datasets, in line with the criteria reported above.
7. Conclusions
Owing to the cross-correlation of multiple assets and the spillover effect of volatility in financial markets, the multivariate heterogeneous autoregressive realized volatility (HAR-RV) model has recently attracted much attention. In this work, the stationarity of the multivariate HAR model is discussed and its estimation problems are studied. We first establish the strictly stationary solution of the multivariate HAR model, and second develop the asymptotic normality theory for the least squares estimates (LSEs) with i.i.d. and correlated errors, respectively. Third, we propose an exponentially weighted multivariate HAR model and estimate its common exponential decay rate. In a Monte-Carlo experiment, the performance of the LSEs is illustrated numerically with sample means and standard errors of the estimates, as well as empirical coverage and average length of confidence intervals based on the normal approximation. In addition, as a real data example, the volatilities of the Gold spot price and the S&P500 index over three recent years are analyzed with a bivariate HAR model. The coefficient estimates and confidence intervals are obtained for the bivariate HAR model of Gold and S&P500 volatility, together with a choice of optimal lags, and it is shown that the bivariate HAR model with the proposed optimal lags matches the volatility of the financial data well.
We suggest some open problems on the multivariate HAR model. As proposed above, exponentially weighted HAR models with decay rates are of interest owing to their reduced number of parameters as well as their long-memory property. In multivariate HAR modeling, testing whether the HAR coefficients have exponentially weighted decay rates, and furthermore whether multiple assets share a common decay rate, might provide statistically useful tools for analyzing the time series model. In follow-up studies we will address these hypothesis tests by constructing test statistics and establishing their null limiting distributions. Finally, we mention that in a multivariate HAR model with heteroscedastic errors, the asymptotic properties of the estimates differ from the existing results, and thus we will derive the asymptotic theory for HAR models in the presence of dynamic heteroscedasticity. In that setting, financial market data may be represented more accurately, and the resulting volatility forecasts and their accuracy will be examined in further research.