Exponentially Weighted Multivariate HAR Model with Applications in the Stock Market

This paper considers a multivariate time series model for stock prices in the stock market. A multivariate heterogeneous autoregressive (HAR) model is adopted with exponentially decaying coefficients. This model is not only suitable for multivariate data with strong cross-correlation and long memory, but also represents a common structure of the joint data in terms of decay rates. Tests are proposed to identify the existence of the decay rates in the multivariate HAR model. The null limiting distributions are established as the standard Brownian bridge and are proven by means of a modified martingale central limit theorem. Simulation studies are conducted to assess the performance of tests and estimates. Empirical analysis with joint datasets of U.S. stock prices illustrates that the proposed model outperforms the conventional HAR models via OLSE and LASSO with respect to residual errors.


Introduction
Financial market data are often correlated with each other and need to be analyzed together. If two or more financial series belong to the same category or reveal a similar pattern with strong correlation, the bivariate or multivariate datasets should be modeled jointly by an appropriate multivariate time series model to obtain good performance. Multivariate financial data with characteristics such as strong correlation and long memory have attracted much attention from econometricians and statisticians. Among the various time series models, one of the most popular and powerful models capturing such financial features is the heterogeneous autoregressive realized-volatility (HAR-RV) model [1]. Based on the HAR model, this paper considers a multivariate model to analyze joint time series data with strong cross-correlation.
The HAR-RV model was originally proposed, and is widely used, to explore the predictability of realized volatility [2][3][4][5]. In particular, Anderson et al. [2] used HAR models for volatility prediction of stock prices, foreign exchange rates, and bond prices. Corsi et al. [3] discussed the volatility of the realized volatility based on HAR models with non-Gaussianity and volatility clustering. McAleer and Medeiros [4] proposed an extension of the HAR model with a multiple-regime smooth transition, which contains long memory and nonlinearity and incorporates sign and size asymmetries. Hillebrand and Medeiros [5] considered log-linear and neural network HAR models of realized volatility. Tang and Chi [6] found that the HAR model showed better predictive ability than the ARFIMA-RV model. Clements et al. [7], Bollerslev et al. [8], Bianco et al. [9], and Asai et al. [10] investigated successful uses of HAR models for risk management with VaR measures, the risk-return tradeoff, serial correlation, implied volatility, and realized volatility errors. Luo et al. [11] incorporated jumps, leverage effects, and speculation effects into realized volatility modeling and showed that the portfolio using an infinite hidden Markov regime-switching HAR model achieves higher portfolio returns than the benchmark HAR model. Meanwhile, in applications of the HAR-RV model to various financial data such as oil, gold, and bitcoin realized volatility [12][13][14][15], we considered extensions of the model incorporating an associated uncertainty index to obtain high forecasting gains.
Along with the success of the univariate HAR models above, multivariate HAR models have been adopted for financial analysis due to their usefulness with multivariate data, and many researchers have discussed their superiority. Busch et al. [16] used a vector HAR model to control possible endogeneity issues. Taylor [17] demonstrated that the multivariate HAR-RV model improved forecast accuracy of realized volatility in the international stock market. The claim in [17] was also verified by Hwang and Hong [18], who dealt with the multivariate HAR-RV model with heteroskedastic errors. Cech and Barunik [19] showed that the generalized HAR model offers better predictability than univariate models in commodity markets. Tang et al. [20,21] showed that the multivariate HAR-RV model is more accurate in out-of-sample forecasting and outperforms the univariate models.
In addition, the multivariate HAR model represents strong correlations between the multiple asset data and captures cross-market spillover effects. For instance, Bubak et al. [22] used a multivariate extension of the HAR model to analyze volatility transmission between currencies and foreign exchange rates, whereas Bauer and Vorkink [23] adopted a multivariate setting of the HAR model and showed how to ensure a positive definite covariance matrix without parameter restrictions. Soucek and Todorova [24] found instantaneous correlation between equity and energy futures by proposing a vector HAR model. Cubadda et al. [25] studied a vector HAR index model for detecting the presence of commonalities in a set of realized volatility measures, whereas Bollerslev et al. [26] proposed a scalar version of the vectorized HAR model for the variances and correlations separately. Luo and Ji [27] combined the HAR model with other models to identify time-varying volatility connectedness. Luo and Chen [28] employed a matrix log transformation to ensure the positive definiteness of covariance matrices and developed a Bayesian random compressed multivariate HAR model to forecast the realized covariance matrices of stock returns. Wilms et al. [29] showed that cross-market spillover effects embedded in the multivariate HAR models have long-term forecasting power.
Although the HAR model is widely used for volatility forecasting based on the realized volatility of intraday prices, it is not restricted to realized volatility: because the HAR model is theoretically a linear AR model, it can be applied to various time series such as stock prices themselves or other economic indices. Stock market forecasting techniques were surveyed in [30][31][32], covering stock returns, stock prices, and volatility via conventional time series methods and soft computing methods. Stock price modeling is mostly based on the efficient market hypothesis (EMH), random walk theory, and machine learning techniques, as in [33][34][35][36]. According to the EMH, the only relevant information on a stock is its current value. A promising application of the HAR model is therefore stock price movement: the HAR model is expected to perform well on stock prices because, in the model, the future value is determined by the current value itself together with the current averages.
In this work, we propose a multivariate time series model for strongly correlated data and study its statistical inference, both hypothesis testing and estimation, with an empirical analysis of joint data on financial assets. More specifically, we focus on the multivariate HAR model with exponentially decaying coefficients, applied to stock prices in the stock market. Because two or more financial series may exhibit a similar pattern with strong correlation, a multivariate model should be adopted for the multiple assets instead of univariate models for each asset. However, when the multivariate HAR model is employed to analyze the multiple series, there are many parameters to estimate: even with two assets, a bivariate HAR(3) model has 14 parameters, including two intercept terms. For better performance, we need to reduce errors while using fewer parameters in the model. To this end, we consider the exponentially weighted multivariate HAR model, which has exponentially decaying coefficients. If decay rates can be imposed in the multivariate HAR model, the number of parameters decreases substantially, and the proposed model may outperform existing models while also reducing the errors. This is one of the motivations for this work, in the spirit of the principle of parsimony. Moreover, the decay rates not only capture the long-memory effect, as in [37], but also represent the commonality of the joint data, as we expect a common structure in multiple assets with strong correlation.
In order to employ the proposed model for joint time series data, the data need to be tested before fitting the model. To this end, we deal with a test problem based on the CUSUM test to identify the presence of decaying coefficients in the multivariate HAR model of the fitted data. In general, the CUSUM test is a change-point test and is appropriate when parameter changes are expected within the time series. For example, Refs. [38,39] dealt with CUSUM(SQ) tests for mean and variance change-detection in univariate HAR(∞) models, and [40] proposed a CUSUM test for parameter change in dynamic panel models. However, the idea of the CUSUM tests can also be applied to detect other dynamic structures. In this work, we use this idea to detect the coefficient structure by generating pseudo-time series from two versions of residuals: the CUSUM(SQ) tests of mean and variance change-detection in [38,39] are applied to the difference series of the two types of residuals, not to the original data, and thereby identify the coefficient structure.
The key point is that under the null hypothesis, the mean or variance of the difference series does not change over time, whereas under the alternative hypothesis there exist change-points in the mean or variance of the difference series of the two residuals. This is a novel use of the CUSUM tests, applying them to other test problems in time series analysis beyond the conventional change-point detection on raw data.
This work proposes two CUSUM-based tests to detect whether the underlying model has exponentially decaying coefficients. The first test is conducted to test whether the model has an exponential decay rate for each asset, and the second tests whether the exponentially weighted multivariate HAR model has a common decay rate for all the multiple assets. The null limiting distributions are developed as the standard Brownian bridge, and the theoretical results are proven by means of a modified version of a martingale central limit theorem. Additionally, easy-to-implement estimators of the decay rates are discussed.
A Monte Carlo experiment is carried out to examine the sample paths of our model and to validate the proposed statistical methods. The sample paths display the long-memory feature as well as the strong cross-correlation of the simulated data. Furthermore, various related series, such as the difference series and the test statistics, are depicted to justify the proposed tests under the null and alternative hypotheses. The simulation study not only strongly supports the proposed CUSUM tests, with reasonable size and power performance, but also shows consistency of the decay-rate estimates. To compare with the conventional HAR model, root mean squared error (RMSE), mean absolute error (MAE), AIC, and BIC are evaluated for the models with several values of the fitting parameter, and the efficiency of the exponentially weighted HAR model relative to the benchmark HAR model is computed using the two metrics RMSE and MAE. Our proposed model, with fewer parameters, is reported to reduce the residual errors compared to the existing HAR models.
As an empirical application of this work, financial market stock prices with similar patterns are selected to suit the multivariate HAR model. It is interesting that the exponentially weighted multivariate HAR model is shown to be suitable for the joint data of U.S. stock prices, rather than the volatility. Our proposed CUSUM tests favor the existence of the decay rates in the multivariate HAR model of the stock prices, based on the computed test statistics. The decay-rate estimators for the stock prices are evaluated as well. The stock prices are well-matched to the exponentially weighted multivariate HAR model. To compare performance of the proposed model, RMSE, MAE, AIC, and BIC are evaluated along with those of the conventional univariate and multivariate HAR models via OLSE and LASSO. The exponentially weighted multivariate HAR model outperforms others in the chosen datasets of U.S. stock prices.
We summarize the main benefits of the exponentially weighted multivariate HAR model as follows: fewer parameters, reduced model-fitting errors, representation of the common structure through the decay rates, and suitability for joint datasets of stock prices with similar patterns. Our proposed model fits strongly cross-correlated multivariate (bivariate) data with similar patterns because the decay rates yield a common structure in the joint data. Along with the high applicability of the HAR model, the proposed model can be used to analyze and forecast joint data with strong correlation and long memory, and its extension with an exogenous variable, such as an associated uncertainty index, can be considered as in [12][13][14][15]. The proposed model can help analysts build simpler and more efficient models that produce smaller prediction errors in financial time series. Furthermore, it has the potential to extend to dynamic time series models with error terms exhibiting heteroscedasticity, time-varying variance, non-Gaussianity, or heavy-tailed distributions, which are more realistic in real-world financial markets.
The remainder of the paper is organized as follows. In Section 2 we describe the model and develop the main results of the tests; in Section 3 a simulation study is performed; in Section 4, empirical examples are given. Concluding remarks are stated in Section 5, and proofs are given in Appendix A.
Model and Main Results
In this work, we are particularly concerned with the multivariate HAR model with exponentially decaying coefficients, in order to place lesser weights on values farther in the past. In the conventional HAR model, the regressors are the previous value and the weekly and monthly averages of consecutive data, which are assigned coefficients in decreasing order to represent the long-memory features. See, for example, [37], which introduced the (univariate) HAR(∞) model with exponentially decaying coefficients to capture genuine long memory and showed that exponentially decaying coefficients yield algebraically decreasing autocovariance functions under appropriate lag conditions. Likewise, we consider the exponentially weighted coefficient version of model (1) with multiple assets, called the exponentially weighted multivariate HAR(p, q) model. In our proposed model, the coefficients are assumed to satisfy β^(i)_jk = c_jk λ_j^(i−1) for some c_jk and 0 < |λ_j| < 1, for j, k ∈ {1, . . . , q} and i ∈ {1, . . . , p}. Here c_jk is the first coefficient, attached to the previous value Y_{k,t−1} of the kth asset at the first lag t − 1, and λ_j is the decay rate for the subsequent coefficients. The exponentially weighted multivariate HAR model has long memory, as seen in Figures 1 and 2 in the next section, where the autocovariance functions and the sample paths of the model with decay rates are observed. The decay rates λ_j not only clearly represent the long-memory feature but also reduce the number of parameters to estimate. In this work, we mainly focus on detecting the existence of the decay rates in model (1) and additionally deal with estimating them.
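As a concrete sketch of this coefficient structure (the values of c_jk and λ_j below are hypothetical; in the paper they are estimated from data), the full coefficient array β^(i)_jk = c_jk λ_j^(i−1) can be generated as follows:

```python
import numpy as np

def har_exp_coefficients(c, lam, p):
    """Build beta[i - 1, j, k] = c[j, k] * lam[j]**(i - 1) for i = 1, ..., p,
    i.e. the exponentially decaying coefficients beta^{(i)}_{jk} = c_jk * lam_j^{i-1}."""
    c = np.asarray(c, dtype=float)        # (q, q) first-lag coefficients c_jk
    lam = np.asarray(lam, dtype=float)    # (q,) decay rates, 0 < |lam_j| < 1
    p = int(p)
    beta = np.empty((p, c.shape[0], c.shape[1]))
    for i in range(p):
        beta[i] = c * lam[:, None] ** i   # row j of block i scaled by lam_j**i
    return beta

# Hypothetical example with q = 2 assets and p = 3 HAR components
c = [[0.4, 0.2], [0.1, 0.5]]
lam = [0.5, 0.8]
beta = har_exp_coefficients(c, lam, 3)
```

The first block `beta[0]` equals c itself, and each subsequent block shrinks row j by the factor λ_j, so only the c_jk and λ_j need to be stored.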
In the multivariate HAR model, we first study the hypothesis test problem whether the underlying model is an exponentially weighted multivariate HAR model with decay rates, and secondly we handle easy-to-implement estimators of the decay rates.
For the hypothesis test problem, we consider two tests, (i) and (ii), as follows: (i) whether or not, in the multivariate HAR model, the jth asset has a decay rate λ_j satisfying β^(i)_jk = c_jk λ_j^(i−1) for some c_jk, for each j; (ii) whether or not the exponentially weighted multivariate HAR model has a common rate λ for all the multiple assets, i.e., β^(i)_jk = c_jk λ^(i−1) for some 0 < |λ| < 1, for all j, k. In test (i), each asset is first analyzed individually. Once test (i) has been conducted and favors the null, test (ii) is performed to detect a common rate. For test (i), the null hypothesis H_{j,0} and the alternative hypothesis H_{j,A} are stated, for each j, as H_{j,0}: β^(i)_jk = c_jk λ_j^(i−1) for some c_jk and 0 < |λ_j| < 1, vs. H_{j,A}: the model is not the exponentially weighted HAR model.
In order to introduce a test statistic, we adopt the ordinary least squares estimator (OLSE) of the multivariate HAR model. Suppose that we have observed {Y_{j,t} : −h_p + 1 ≤ t ≤ n, j = 1, 2, . . . , q} of sample size n. Let β̂_j denote the OLSE of the parameter vector β_j ≡ (β_{j0}, β^(1)_{j1}, . . . , β^(p)_{jq})′. The asymptotic property of the OLSE β̂_j in the multivariate HAR model was derived theoretically by Hong et al. [41]. From the OLSE, we first estimate c_jk by ĉ_jk = β̂^(1)_jk, and then consider a regression model with the decay rates as its coefficients under the null hypothesis, as in (2) and (4). Rewriting η_{j,t} in terms of the observable quantities η_{j,t} and W^(k)_{j,t−1}, k = 2, . . . , p, we consider the regression in (4) with coefficients λ_{j,1}, . . . , λ_{j,p−1}, which has a form similar to (2). From this regression we compute the OLSE Λ̂_{j,n} = (λ̂_{j,1}, . . . , λ̂_{j,p−1})′ of the parameters (λ_{j,1}, . . . , λ_{j,p−1})′. Note that under the null hypothesis, λ_{j,k} = λ_j^k, so that λ̂_{j,k} − λ̂^k_{j,1} converges to zero in probability, as in (5). To construct a test statistic, two types of residuals, {ε̂_{j,t}} and {ε̃_{j,t}}, are defined from the two fits, and we use the difference series of the two types of residuals (not the original time series). Let D_{j,t} denote this difference series, noting that E[D_{j,t}] → 0 as n → ∞ under the null hypothesis. We then define a CUSUM test statistic T_{j,n}(z), for 0 ≤ z ≤ 1. The following theorem states the asymptotic distribution of the statistic and provides critical values of the test for H_{j,0}. Theorem 1. Assume E|Y_{j,t}|^{2+δ} < ∞ for some δ > 0, for all j, t. If the multivariate HAR(p, q) model has exponential decay rate λ_j with β^(i)_jk = c_jk λ_j^(i−1) for some 0 < |λ_j| < 1 for each j, then, as n → ∞, T_{j,n}(z) converges in distribution to B°(z) = B(z) − zB(1), the standard Brownian bridge, where B(z) is standard Brownian motion.
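The CUSUM construction can be sketched in code. The paper's exact statistic is defined in Equation (8); the sketch below uses the standard CUSUM form consistent with the Brownian-bridge limit in Theorem 1, applied to a generic mean-zero difference series, and is illustrative rather than the paper's exact definition:

```python
import numpy as np

def cusum_statistic(D):
    """Standard CUSUM process for a series D_t:
    T_n(z) = (S_[nz] - z * S_n) / (sigma_hat * sqrt(n)), evaluated at z = t/n.
    Under the null (mean-zero, no change-point), sup_z |T_n(z)| converges to
    the supremum of the absolute standard Brownian bridge."""
    D = np.asarray(D, dtype=float)
    n = len(D)
    sigma_hat = D.std()                 # simple consistent scale estimate
    S = np.cumsum(D)
    z = np.arange(1, n + 1) / n
    T = (S - z * S[-1]) / (sigma_hat * np.sqrt(n))
    return np.abs(T).max()

# Mean-zero noise: the statistic stays at typical Brownian-bridge magnitudes.
rng = np.random.default_rng(0)
stat = cusum_statistic(rng.standard_normal(1000))
```

A series whose mean shifts partway through produces a much larger supremum, which is exactly how the test separates the null from the alternative.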

Remark 1.
In order to test H_{j,0} vs. H_{j,A}, we adopt the CUSUM test statistic T_{j,n}(z), rather than the full sum S_{j,n}, and the null hypothesis is rejected if |T_{j,n}(z)| is large. The reason is as follows. The pseudo-time series {D_{j,t} = D_{j,t,n} : t = 1, 2, . . . , n} is a triangular array; under the null hypothesis its coefficients are asymptotically zero, whereas under the alternative hypothesis the coefficients change over time without vanishing asymptotically, which creates a change-point in the mean or variance of the difference series. This is why we adopt a CUSUM-based test for our goal of detecting the exponential decay rates.
Sample paths of the series W^(i)_{j,t−1} and D_{j,t}, along with values of |T_{j,n}(z)|, are presented in Figures 3 and 4. There it is shown that under the null the difference series is asymptotically constant, owing to the asymptotically zero coefficients, whereas under the alternative it fluctuates with large variance; that is, there are change-points in its mean or variance. On the other hand, one might use the full sum S_{j,n} as a test statistic from a theoretical point of view. However, as seen in Figures 3 and 4, even under H_{j,0} the sum does not take small values, for the following reason: S_{j,n} can be expressed as a linear combination of {λ̂_{j,k} − λ̂^k_{j,1} : k = 2, . . . , p}, and hence as a linear combination of terms of the form ∑_{t=1}^n ĉ_jk Y^(i)_{k,t−1}, whose normalization by √n converges to a normal distribution with asymptotic mean zero. Thus S_{j,n} carries an asymptotic bias. Because this bias is not negligible, even though √n(λ̂_{j,k} − λ̂^k_{j,1}) tends to a normal distribution with mean zero under the null hypothesis, the sum S_{j,n} takes rather large values and thus cannot significantly distinguish the two hypotheses. Therefore, this work adopts the test statistic T_{j,n}(z) to resolve our problem. We would now further like to test whether or not the exponentially weighted multivariate HAR model has a common exponential decay rate λ for all the multiple assets. That is, in the exponentially weighted multivariate HAR(p, q) model with β^(i)_jk = c_jk λ_j^(i−1) for some 0 < |λ_j| < 1 for all j, k, after the first test has been performed, we test the null hypothesis H*_0 versus the alternative hypothesis H*_A as follows: H*_0: λ_1 = · · · = λ_q = λ with a common rate λ for all assets j; H*_A: not the null.
Similarly to the above, under the null H*_0 we have, for all j, η_{j,t} = λW_{j,t−1} + ε_{j,t} instead of (2), but we use a consistent estimate λ̂_j of λ_j for {Y_{j,t}}; estimation of the rates λ_j, j = 1, . . . , q, is discussed below in Remark 2. Using the estimates λ̂_j, we construct the difference series {d_{j,t}} and a test statistic T*_n(z) for testing whether the HAR model has a common rate. The following theorem provides the null limiting distribution of the test statistic. Theorem 2. Assume E|Y_{j,t}|^{2+δ} < ∞ for some δ > 0, for all j, t. If the multivariate HAR(p, q) model has a common exponential decay rate λ with β^(i)_jk = c_jk λ^(i−1) for some 0 < |λ| < 1 for all j, k, then, as n → ∞, T*_n(z) converges in distribution to the standard Brownian bridge. Note that under the null hypothesis H*_0, the difference series {d_{j,t}} take small values with small variance, whereas under the alternative hypothesis H*_A they take large values with variance that changes over time. Thus we use the CUSUMSQ test on the difference series {d_{j,t}} (not on the original data) to detect the change-point in the variance. Justification of the suitability of the CUSUMSQ test can be seen in the next section, where sample paths of the squared differences d²_{j,t} and the values of the test statistic in absolute value, |T*_n(z)|, are depicted under both hypotheses. Once the first test in Theorem 1 has been conducted on the datasets of multiple assets, we obtain estimates of the decay rates by using (9), and then the second test in Theorem 2 is conducted to see whether the datasets share a common rate. Finally, the estimate (10) is used to find the common rate.
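Analogously, a CUSUM-of-squares process for variance change in a difference series can be sketched as follows; this is the standard CUSUMSQ form, assumed here for illustration, while the paper's exact T*_n(z) is defined in the text above:

```python
import numpy as np

def cusumsq_statistic(d):
    """CUSUM-of-squares process for detecting variance change in d_t:
    C(z) = (sum_{t <= nz} d_t^2) / (sum_t d_t^2), with the statistic
    sqrt(n) * sup_z |C(z) - z|. Large values indicate that the variance of
    d_t changes over time, as under the alternative H*_A."""
    d2 = np.asarray(d, dtype=float) ** 2
    n = len(d2)
    C = np.cumsum(d2) / d2.sum()
    z = np.arange(1, n + 1) / n
    return np.sqrt(n) * np.abs(C - z).max()
```

A series whose variance jumps midway bends the cumulative-squares curve C(z) away from the diagonal z, so the supremum grows with the size of the variance break.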

Remark 2.
The following concerns the estimation of the decay rates. In the exponentially weighted multivariate HAR model (1) with coefficients β^(i)_jk = c_jk λ_j^(i−1), estimators of the decay rates λ_j can be obtained in a simple way. From the OLSEs of the parameter vector β_j, we construct an easy-to-implement estimator λ̂_j of λ_j as in (9); since β^(1)_jk = c_jk and β^(2)_jk = c_jk λ_j, the estimator is based on the ratio of the second coefficient estimates to the first. Furthermore, in the case of the common rate with β^(i)_jk = c_jk λ^(i−1), a pooled estimator λ̂ is given in (10). In the estimates of the decay rates in (9) and (10), only the first and second coefficient estimates, β̂^(1)_jk and β̂^(2)_jk, are used, because these two estimates have comparatively smaller standard errors than the others. To see their performance, sample means and standard errors of the estimates in (9) and (10) are computed and compared in the next section.
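Under the null identities β^(1)_jk = c_jk and β^(2)_jk = c_jk λ_j, a ratio-based implementation can be sketched as follows; the averaging over k is an assumption made here for illustration, and the paper's exact estimators are Equations (9) and (10):

```python
import numpy as np

def decay_rate_estimates(beta1, beta2):
    """Estimate lambda_j from the first two coefficient blocks. Under the null,
    beta^{(2)}_{jk} / beta^{(1)}_{jk} = lambda_j for every k, so we take the
    ratio and average over k (one estimate per asset j)."""
    beta1 = np.asarray(beta1, dtype=float)
    beta2 = np.asarray(beta2, dtype=float)
    return (beta2 / beta1).mean(axis=1)

def common_rate_estimate(beta1, beta2):
    """Pooled estimate of a common rate lambda across all j, k
    (cf. the paper's Eq. (10))."""
    return (np.asarray(beta2, float) / np.asarray(beta1, float)).mean()
```

Pooling over all j, k in the common-rate case uses more ratios per estimate, which is consistent with the smaller standard errors reported for (10).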
In the conventional multivariate HAR(p, q) model there are a total of (1 + pq)q coefficient parameters to estimate, whereas in the exponentially weighted multivariate HAR(p, q) model the number of parameters decreases to (2 + q)q: for each j = 1, 2, . . . , q, the equation for Y_{j,t} has one intercept, q coefficients for the previous lag values of the q assets, and one decay rate. For a simple case with p = 3 and q = 2, the number of parameters is reduced from 14 to 8. This implies that model-selection measures such as AIC and BIC might improve considerably, as shown in the following sections with simulated data and real data examples. In the multivariate HAR(p, q) model, the asymptotic normality of the OLSE β̂_{j,O} (≡ β̂_{j,OLSE}) of β_j has been established by Hong et al. [41].
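The parameter-count comparison is easy to verify directly:

```python
def n_params_har(p, q):
    """Conventional multivariate HAR(p, q): each of the q equations has one
    intercept and p*q coefficients -> (1 + p*q) * q parameters in total."""
    return (1 + p * q) * q

def n_params_exp_har(q):
    """Exponentially weighted version: per equation, one intercept, q first-lag
    coefficients c_jk, and one decay rate -> (2 + q) * q parameters."""
    return (2 + q) * q

# The paper's example: p = 3, q = 2 reduces 14 parameters to 8.
print(n_params_har(3, 2), n_params_exp_har(2))
```

Note that the reduced count is independent of p, so the saving grows with the number of HAR components.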

Remark 3.
The following concerns the bias adjustment for a finite sample. In our exponentially weighted multivariate HAR model with β^(i)_jk = c_jk λ_j^(i−1), we construct an estimator β̂_{j,Λ} of β_j = (β_{j0}, β^(1)_{j1}, . . . , β^(p)_{jq})′, called the rate-adopted estimator (RE). Here we need to examine the residuals for the empirical analysis with a finite sample. Let ε̂_{j,t,O} and ε̃_{j,t,Λ} be the residuals from the OLSE and the RE, respectively, and let μ̂_{j,Λ} = (1/n) ∑_{t=1}^n ε̃_{j,t,Λ}. By the asymptotic normality of √n(β_j − β̂_{j,O}) with asymptotic mean zero, the residual mean vanishes asymptotically; however, it is not negligible in a small finite sample. Thus we need a bias adjustment when fitting the model to a finite sample. When we fit the exponentially weighted HAR model to real datasets, especially those with small sample size, the error performance can be improved by means of the bias adjustment. One way is to shift the fitted model by the residual mean μ̂_{j,Λ}, which is a constant. An alternative is to shift the model by a moving average of the residuals, which is a time-varying process defined as follows. For a positive integer m and t = 1, 2, . . . , n, let ω̂_{m,t} be the average of the residuals over the interval [t − m, t + m], as in (11), and write the adjusted fitting model (12) as Y_{j,t} = F_{j,Λ}(X_{t−1}) + ε_{j,t}, where F_{j,Λ}(X_{t−1}) = β̂′_{j,Λ} X_{t−1} + ω̂_{m,t} and ε_{j,t} = ε̃_{j,t,Λ} − ω̂_{m,t}. Note that (1/n) ∑_{t=1}^n ε_{j,t} = o_p(1). Effects of m, called the fitting parameter, on the error performance of (12) will be discussed in the next section.
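A sketch of the moving-average adjustment is given below; truncation of the window at the sample edges is an assumed boundary rule, since the paper's exact definition of ω̂_{m,t} is Equation (11):

```python
import numpy as np

def moving_average_adjustment(resid, m):
    """Time-varying bias adjustment: omega[t] is the average of the RE
    residuals over the window [t - m, t + m], truncated at the sample edges
    (boundary handling is an assumption for this sketch)."""
    resid = np.asarray(resid, dtype=float)
    n = len(resid)
    omega = np.empty(n)
    for t in range(n):
        lo, hi = max(0, t - m), min(n, t + m + 1)
        omega[t] = resid[lo:hi].mean()
    return omega
```

The adjusted errors ε_{j,t} = ε̃_{j,t,Λ} − ω̂_{m,t} then average to roughly zero, and a smaller m makes ω̂_{m,t} track the residuals more closely, which is consistent with the smaller RMSE/MAE reported for small m.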

Monte Carlo Simulation
In this section, we first examine plots of sample paths of the proposed model and their autocorrelation functions (ACFs). Secondly, the finite-sample validity of the proposed tests is investigated, along with plots of various related series justifying the tests. Thirdly, the estimates of the decay rates are computed, and finally, comparisons with conventional HAR models are addressed in terms of measures such as RMSE, MAE, AIC, and BIC. Moreover, the efficiency of the proposed model relative to the benchmark HAR model is discussed.
In the simulation experiment, to examine the plots of the proposed model, simulated data are generated from bivariate exponentially weighted HAR models of order p = 3, i.e., HAR(3, 2) models, with lag structure h = (h_1, . . . , h_p) = (1, 5, 22), using i.i.d. standard normal N(0, 1) errors {ε_{j,t}} and size n = 400. In order to avoid the effect of the selected initial values in the models, data of size 600 are generated and the first 200 observations are discarded to obtain n = 400. Figures 1 and 2 display the simulated sample paths and their ACFs. We see that the simulated data are strongly correlated with each other and reveal the long-memory feature: in Figure 1, the two datasets have a correlation coefficient of 0.7856, and in Figure 2, the correlation coefficient is 0.6748.
Prior to reporting the test results, some plots of related series are illustrated in order to justify the suitability of the CUSUM tests. In particular, for Cases I, III, and IV, sample paths of W^(i)_{j,t−1}, D_{j,t}, and |T_{j,n}(z)| in (3), (6), and (8), respectively, are depicted for n = 500, 1000 in Figures 3 and 4, where the values of λ̂_{j,2} − λ̂²_{j,1} in Figure 5 are used to compute D_{j,t} together with W^(i)_{j,t−1}. In Figure 5, we see that λ̂_{j,2} − λ̂²_{j,1} tends to zero in Case I (j = 1, 2) and Case III (j = 2), as the theory in (5) indicates under the null hypothesis. However, Figure 5 also shows that Case III (j = 1) and Case IV (j = 1, 2) deviate from zero under the alternative. In Figures 3 and 4, under the null hypothesis with decay rates, the difference series D_{j,t} does not fluctuate, owing to the asymptotically zero λ̂_{j,2} − λ̂²_{j,1}, which can be interpreted as constant coefficients in the linear combination of W^(i)_{j,t−1}; under the alternative, the plots of D_{j,t} are dynamic with large variance because of nonzero λ̂_{j,2} − λ̂²_{j,1} (see Equation (6)). This yields larger values of the CUSUM test statistic in absolute value, |T_{j,n}(z)|, as seen in the third columns of Figures 3 and 4.
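A data-generating sketch consistent with this design, with hypothetical coefficient values and a burn-in that is discarded, might look as follows:

```python
import numpy as np

def simulate_exp_har(c, lam, n=400, burn=200, h=(1, 5, 22),
                     intercept=(0.0, 0.0), seed=0):
    """Simulate a bivariate exponentially weighted HAR(3, 2) path:
    Y_{j,t} = b_j0 + sum_i sum_k c_jk * lam_j**(i-1) * Ybar^{(i)}_{k,t-1} + e_{j,t},
    where Ybar^{(i)}_{k,t-1} is the average of the last h_i values of asset k
    and e_{j,t} are i.i.d. N(0, 1). Coefficient values are hypothetical."""
    rng = np.random.default_rng(seed)
    c, lam = np.asarray(c, float), np.asarray(lam, float)
    q, p, hmax = c.shape[0], len(h), max(h)
    T = n + burn + hmax
    Y = np.zeros((T, q))
    for t in range(hmax, T):
        ybar = np.array([Y[t - hi:t].mean(axis=0) for hi in h])  # (p, q)
        for j in range(q):
            mean = intercept[j] + sum(
                (c[j] * lam[j] ** i) @ ybar[i] for i in range(p))
            Y[t, j] = mean + rng.standard_normal()
    return Y[-n:]          # discard burn-in to remove initial-value effects

Y = simulate_exp_har(c=[[0.25, 0.15], [0.10, 0.30]], lam=[0.5, 0.7])
corr = np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]
```

The cross-coefficients c_jk for j ≠ k are what induce the cross-correlation between the two simulated series; the decay rates λ_j control how slowly the influence of the longer averages fades.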
Figures 3 and 4. The difference series D_{j,t} in the second column are computed from the series W^(i)_{j,t−1} in the first column using λ̂_{j,2} − λ̂²_{j,1} given in Figure 5. Test statistics in absolute value, |T_{j,n}(z)| (0 ≤ z ≤ 1), are computed from the difference series D_{j,t} in the second column. In the first row of Case I, with both null hypotheses H_{1,0} and H_{2,0}, D_{j,t} has no change in mean, and thus |T_{j,n}(z)| takes small values for all z. In the second row of Case III with H_{2,0}, j = 2 (in red), the same interpretation applies, and λ̂_{j,2} − λ̂²_{j,1} tends to zero as n increases to 1000.
As for the CUSUMSQ test in Theorem 2, Figure 6 depicts the series of squared differences, d²_{j,t}, and the test statistic values in absolute value, |T*_n(z)|, under the two hypotheses with n = 500 and 1000, respectively. It is shown that under H*_0 the difference series d_{j,t} has small variance with small values, whose squares are less than 0.025 for n = 500 and 0.006 for n = 1000, whereas under H*_A it takes large values whose squares lie between 0 and 2500. Therefore we apply the idea of the change-point test of variance to the difference series d_{j,t}, which yields a way to detect the existence of the exponential decay rate.
Throughout the simulation study of the CUSUM tests, a replication number of 1000, significance level α = 0.05, and sample sizes n = 100, 200, 500, and 1000 are used. Tables 1 and 2 display the evaluated rejection rates for Theorems 1 and 2, respectively. In Table 1, the rejection rates of the two tests of H_{j,0}, j = 1, 2, in Theorem 1 are reported for the four cases. To compute the test statistic T_{j,n}(z) in Theorem 1, a consistent estimate σ̂_{j,D} of the standard deviation is needed. Recall the two estimators σ̂²_{j,D} in (7), which estimate Var(D_{j,t}) = E[D²_{j,t}] − E[D_{j,t}]². Because E[D_{j,t}] → 0 as n → ∞, due to λ̂_{j,k} − λ̂^k_{j,1} →_p 0 under the null hypothesis H_{j,0}, both estimators in (7) may be used for σ̂²_{j,D}. However, using them directly incurs slow convergence rates because of the bias problem discussed in Remark 1: if the mean has a large bias, the convergence rate tends to be slow, as it is affected by the bias. Thus we adopt the two estimates partially, in order to adjust the convergence rate. In particular, to visualize the convergence to the nominal level with increasing n, we take the second sample variance in (7) of the form (1/n) ∑_{t=1}^n D²_{j,t} − (D̄_j)², where D̄_j = (1/n) ∑_{t=1}^n D_{j,t}; here (D̄_j)² converges to zero in probability along with |(1/n) ∑_{t=1}^n D_{j,t} − E[D_{j,t}]| →_p 0 as n → ∞. To use it partially, we construct a consistent estimator with the following threshold: σ̂²_{j,D} = (1/n) ∑_{t=1}^n D²_{j,t} − δ(D̄_j)², where δ = I(|D̄_j| < th*), with an indicator function I(·) and a threshold th*. In other words, we use either the first or the second estimator in (7), depending on the magnitude of the mean D̄_j; in the case of a large bias, the value δ of the indicator function is set to zero.
In this simulation, the threshold th* is chosen empirically such that P(|D̄_j| < th*) = 0.05; that is, if |D̄_j| ≥ th*, which occurs with probability 0.95, the first estimator in (7) is used, and otherwise, with probability 0.05, the second is used for σ̂²_{j,D}. This is because the probability of incurring the bias is high, as seen in Remark 1. In Table 1, rejection rates are computed using the estimate σ̂_{j,D} with the chosen threshold th*, and they converge to the nominal level α = 0.05 as n increases under the null hypothesis. For Table 2 and Theorem 2, a similar argument applies.
Table 1 shows that Case I favors both null hypotheses in Theorem 1; i.e., the models are exponentially weighted, with small rejection rates of the null hypothesis. Moreover, Table 2 reports reasonable rejection rates for the test in Theorem 2 for large sample sizes, under both null and alternative hypotheses. Note that in Table 2, the null hypothesis H*_0 indicates that λ_1 and λ_2 take the same value, i.e., the common rate λ = λ_1 = λ_2 ∈ {0.2, 0.4, 0.5}, for which small rejection rates are reported. The rejection rates in Tables 1 and 2 tend to the nominal level α = 0.05 as n increases under the null hypothesis.
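The thresholded variance estimator described above can be sketched directly (the choice of th* itself is empirical, as stated):

```python
import numpy as np

def variance_estimate_with_threshold(D, th_star):
    """Thresholded variance estimator:
    sigma2 = mean(D^2) - delta * Dbar^2, with delta = 1{|Dbar| < th*}.
    The squared sample mean is subtracted only when the mean is small enough
    that its bias (Remark 1) can be trusted; otherwise delta = 0."""
    D = np.asarray(D, dtype=float)
    Dbar = D.mean()
    delta = 1.0 if abs(Dbar) < th_star else 0.0
    return (D ** 2).mean() - delta * Dbar ** 2
```

With a large sample mean (relative to th*), the estimator falls back to the raw second moment, avoiding the slow convergence caused by the biased mean term.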
In Table 5, estimates of the decay rates in (9) and (10) are computed to obtain sample means and standard errors. Cases I-III have the same parameters as those in Table 1, whereas Cases IV*-VI* use common rates; for example, Case VI* uses the common rate λ_1 = λ_2 = λ = 0.9. Table 5 shows that the estimates are consistent across the sample sizes; for Cases IV*, V*, and VI*, the estimates λ̂_j in (9) are used for λ_j, j = 1, 2, and the estimates λ̂ in (10) are used for λ. We note that in the common-rate cases, Equation (10) gives smaller standard errors of the estimates. Finally, we discuss the fitting problem of the exponentially weighted HAR models in (12) and compare with the conventional HAR model fitted by OLSEs. The simulated data in Figure 1 are used to compute the OLSEs β̂_{j,O} of the coefficients and the estimates λ̂_j of the decay rates in (9). From the estimated rates λ̂_j, the rate-adopted estimators (REs) β̂_{j,Λ} of the coefficients are evaluated as a further step. To compare the models fitted by the OLSEs and the REs, Table 6 presents criteria such as the root mean square error (RMSE), mean absolute error (MAE), AIC, and BIC. For the exponentially weighted bivariate HAR models, the fitting parameter m is used with m = 5, 10, 20, 100. Because the conventional bivariate HAR model has 14 parameters whereas our proposed model has 8 parameters, the AIC and BIC of the latter are smaller than those of the former. Intuitively, a small choice of m yields small errors in the RMSEs and MAEs because the average ω̄_{m,t} in (11) over the interval [t − m, t + m] is closer to ω̃_{j,t,Λ} for smaller m, which makes the error term ε_{j,t} in (12) smaller. In the latter case, (12) is applied with m = 5 for the bias adjustment.
The RMSE and MAE of the OLSE residuals in fitting the conventional HAR model are 0.9971 and 0.8253 for j = 1 and 0.9644 and 0.7596 for j = 2, while those of the RE residuals in fitting our proposed model are 0.9424 and 0.7805 for j = 1 and 0.9226 and 0.7381 for j = 2.

Table 5. Sample means and standard errors (s.e.) of estimates for the decay rates λ_1, λ_2 of exponentially weighted HAR(3,2) models; replication number = 1000. Note that in the common-rate cases marked with *, Cases IV*, V*, and VI*, the estimates λ̂_j in (9) are used for λ_j, j = 1, 2, while the estimates λ̂ in (10) are used for λ.

Table 6. Comparison of exponentially weighted HAR(3,2) fitting models from the simulated data in Figure 1.

Furthermore, to elaborate on the comparison with the conventional HAR model, the efficiency of the proposed model relative to the conventional one is computed using the two metrics RMSE and MAE. The Exp. HAR model efficiency, relative to the benchmark HAR model, is defined by

Eff_RMSE = 100 × (RMSE_0 − RMSE_1) / RMSE_0 (%),   Eff_MAE = 100 × (MAE_0 − MAE_1) / MAE_0 (%),
where RMSE_0 and MAE_0 are the RMSE and MAE of the conventional HAR model, respectively, and RMSE_1 and MAE_1 are those of the exponentially weighted HAR model. Table 7 displays the Exp. HAR model efficiency in the first case of (λ_1, λ_2) of Table 3. All values are positive, with a highest value of 7.0817 percent, which means that the proposed model with the RE fitting improves on the conventional HAR model with respect to residual errors. The HAR model has high applicability in the financial market; in particular, it is very powerful for realized volatility forecasting [1,17,18,29]. However, beyond volatility forecasting, as a theoretically linear AR model it has many applications to various time series data, as in [12][13][14][42]. The exponentially weighted multivariate HAR model, a special case of HAR models, is suitable for joint data with strong cross-correlation. The decay rate of the model plays a key role in the common structure of strongly correlated joint data. In economics and finance, there are many strongly correlated time series that matter for policy decisions aimed at improving the global economy and human society. For example, stock prices in the same category tend to follow the same pattern. Stock price modelling is commonly based on the efficient market hypothesis (EMH), according to which the only relevant information about a stock is its current value. The proposed model may therefore be appropriate for stock price modelling because the current value and the current averages (with exponentially decaying coefficients) are used as regressor variables in the model. The following section addresses empirical analysis of joint data of strongly correlated stock prices to confirm this intuition.
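The text defines the efficiency only verbally; assuming it is the percentage reduction of the error metric relative to the benchmark (consistent with all values in Table 7 being positive when the proposed model improves), a minimal sketch using the residual errors reported above is:

```python
def har_efficiency(metric0, metric1):
    """Relative improvement (%) of the exponentially weighted HAR model over
    the conventional HAR benchmark; positive values mean the proposed model
    has the smaller error. (Formula assumed: percentage reduction.)"""
    return 100.0 * (metric0 - metric1) / metric0

# RMSE values reported in the simulation for j = 1:
# conventional HAR via OLSE (0.9971) vs. proposed model via RE (0.9424).
eff_rmse_j1 = har_efficiency(0.9971, 0.9424)   # about 5.49 percent
```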

Empirical Analysis
In this section, we provide empirical examples of U.S. stock prices to which the exponentially weighted multivariate HAR models are applied. Note that realized volatility does not fit our model, but the stock price itself may be suitable for the proposed model with exponentially decaying coefficients. To this end, we choose datasets of U.S. stock prices and conduct our proposed CUSUM tests. For bivariate joint data (q = 2), the stock prices of Amazon.com Inc. (AMZN) and Netflix Inc. (NFLX), and of Apple Inc. (AAPL) and Microsoft Corporation (MSFT), are analyzed; the results are summarized in Table 8, where the suprema of the test statistics and the decay rate estimates are computed by Theorem 1 and Equation (9), respectively. More specifically, conducting the CUSUM test of Theorem 1 to detect the presence of the exponential decay rates, the test statistics are evaluated as follows. For (AMZN, NFLX), testing the existence of λ_AMZN and λ_NFLX, the CUSUM test statistics are 0.3986 for AMZN and 0.3489 for NFLX. For (AAPL, MSFT), the CUSUM test statistics are 0.7508 for AAPL and 0.4979 for MSFT. These values imply that the null hypothesis is not rejected, because the critical values of the standard Brownian bridge are 1.224 at level α = 0.1 and 1.358 at level α = 0.05. On the other hand, when the test in Theorem 2 for a common rate is conducted, the test statistics exceed 2, which rejects the null hypothesis of a common rate. Comparisons with the conventional HAR models are now presented for the two pairs (AMZN, NFLX) and (AAPL, MSFT) and for the triple (FB, AAPL, MSFT). We compare performance on these datasets for the univariate HAR model, the (conventional) multivariate HAR model, and the exponentially weighted multivariate HAR (Exp. HAR) model. For the conventional HAR models, the two estimation methods OLSE and LASSO used in [43] are adopted. LASSO estimates are computed by LassoLarsCV in sklearn.linear_model with Python version 3.8.6. For the Exp. HAR model, the RE is computed.
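The decision rule just described can be sketched as follows, using the two Brownian-bridge critical values quoted in the text (the function name and dictionary are ours):

```python
# Critical values of the sup of the standard Brownian bridge,
# as quoted in the text for levels alpha = 0.10 and 0.05.
CRIT = {0.10: 1.224, 0.05: 1.358}

def cusum_decision(stat, alpha=0.05):
    """Return True iff the sup-CUSUM statistic exceeds the Brownian-bridge
    critical value, i.e. the null hypothesis is rejected at level alpha."""
    return stat > CRIT[alpha]

# Test statistics reported for the (AMZN, NFLX) pair: both are well below
# 1.358, so the null of an exponential decay rate is not rejected.
amzn_rejected = cusum_decision(0.3986)   # False
nflx_rejected = cusum_decision(0.3489)   # False
```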
In Table 9, for the two joint datasets (q = 2), the conventional HAR(3,2) model has 14 parameters, whereas the exponentially weighted HAR(3,2) model has 8 parameters. Each univariate HAR model has 4 parameters, so the total number is 8. Table 9 reports the comparison results for the HAR(3,2) models. The CUSUM test favors the existence of exponential decay rates, and thus the exponential bivariate HAR(3,2) model is fitted with the rate estimates. The decay rates for AMZN and NFLX are estimated as (λ̂_AMZN, λ̂_NFLX) = (0.2983, 0.1144), whereas those for AAPL and MSFT are (λ̂_AAPL, λ̂_MSFT) = (0.02175, −0.01483), as presented in Table 8. The four measures RMSE, MAE, AIC, and BIC are compared across the three models via OLSE, LASSO, and RE; in Table 9, the best values are displayed in bold. The exponential bivariate HAR(3,2) model performs best on RMSE and MAE, whereas the univariate HAR model with LASSO performs best on AIC and BIC. Figure 7 depicts the actual stock prices of (AMZN, NFLX) and the Exp. HAR(3,2) model fitted by the REs, along with their residuals. The stock prices of (AMZN, NFLX) are well matched by the fitted model. Table 10 reports the performance of the HAR(3,3) models for (FB, AAPL, MSFT). The conventional HAR(3,3) model has 30 parameters, whereas the exponentially weighted HAR(3,3) model has 15 parameters. The decay rate estimates for (FB, AAPL, MSFT) are (λ̂_FB, λ̂_AAPL, λ̂_MSFT) = (−0.03302, 0.03097, 0.02733), from which the Exp. HAR(3,3) models are fitted in Figure 8. As seen in Table 10, our proposed model performs better than the others with respect to RMSE, MAE, and AIC, whereas the univariate HAR model with LASSO performs well on BIC; the best values are indicated by bold numbers in Table 10.
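As a sketch of how the parameter counts above arise, the following builds the design matrix of one bivariate HAR equation and fits it by ordinary least squares. The horizons (1, 5, 22) are the standard daily/weekly/monthly HAR choices and are assumed here, since the excerpt does not state them; the synthetic random-walk data and function name are ours, and this illustrates the conventional OLSE fit, not the paper's RE procedure:

```python
import numpy as np

def har_design(y, horizons=(1, 5, 22)):
    """Design matrix of one bivariate HAR(3,2) equation: an intercept plus,
    for each of the two assets, a moving average of y over each horizon.
    With 3 horizons and 2 assets this gives 1 + 3*2 = 7 regressors per
    equation, i.e. 14 parameters for the full bivariate system, matching
    the count quoted in the text."""
    y = np.asarray(y, dtype=float)           # shape (n, q) with q = 2
    n, q = y.shape
    h_max = max(horizons)
    rows, targets = [], []
    for t in range(h_max, n):
        row = [1.0]                          # intercept
        for j in range(q):
            for h in horizons:
                row.append(y[t - h:t, j].mean())   # average of last h values
        rows.append(row)
        targets.append(y[t])                 # both assets' values at time t
    return np.array(rows), np.array(targets)

# Synthetic bivariate random walk, then joint OLS fit of both equations.
y = np.cumsum(np.random.default_rng(0).normal(size=(300, 2)), axis=0)
X, Y = har_design(y)
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)   # shape (7, 2): 14 parameters
```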
Consequently, the proposed model not only has fewer parameters than the conventional HAR models but also yields the best performance on loss errors such as RMSE and MAE. The exponentially weighted HAR model with decay rates is suitable for the stock prices of joint financial assets with strong cross-correlation in the stock market, rather than for their volatility.

Concluding Remark
This work presents exponentially weighted multivariate HAR models with exponentially decaying coefficients. The models represent very well two main features of financial market data: long memory and strong cross-correlation. The common structure of multivariate data with these features can be expressed by the existence of decay rates of the coefficients in the model. To detect the existence of the decay rates in the multivariate HAR models, CUSUM-based tests are established in two stages: the first tests whether the multivariate HAR model has an exponential decay rate for each asset, and the second tests whether the model has a common rate for all assets. To test the presence of the rates, difference series are generated from two types of residuals, and change points of their mean or variance are detected from these pseudo-time series rather than from the raw data. The null limiting distributions of the test statistics are derived to be the standard Brownian bridge and provide the asymptotic critical values for rejecting the hypotheses. Easy-to-implement estimators of the decay rates are computed. A Monte Carlo simulation study verifies the proposed tests by illustrating the related series and evaluating the finite-sample size and power of the tests. Empirical examples show the usefulness of our proposed models in the stock market, for stock prices in particular rather than volatility; fewer parameters and smaller residual errors are demonstrated for our models.
Let us consider four aspects: (i) a decreased number of parameters; (ii) smaller model-fitting errors; (iii) representation of common structure with decay rates; and (iv) a suitable model for stock price movement. These are the main advantages of the exponentially weighted multivariate HAR models, and they will help to provide, in practice, more efficient models with smaller prediction errors in financial time series modelling. In economics and finance, many strongly correlated data on joint assets play a crucial role in policymaking on economic and social regulations. The multivariate feature of our proposed model could be useful for improving the forecasting accuracy of financial assets, making it possible to fine-tune policymaking on such asset classes.
The HAR model has very high applicability in the financial market. An extension of our proposed model can be useful for joint multivariate data with strong correlation and long memory. In particular, as in [12][13][14][15], an extended model incorporating exogenous variables such as an associated-uncertainty index would be a good prediction model for assessing the forecasting gain. For example, joint datasets such as gold and silver, oil and exchange rates, or several stock prices in the same sector are affected together by global issues such as COVID-19 and the Ukrainian War in the current decade. Such an uncertainty-related index can be added to the model as a regressor, and the extended multivariate HAR model is expected to give good performance.
Also, an extension of the proposed model can be established as a dynamic time series model that is more applicable to real-world market data, for example, with time-varying variance, non-Gaussianity, or heavy-tailed distributions. Recently, Ref. [18] analyzed a multivariate HAR-RV model with GARCH errors, for which a weighting scheme based on the conditional variances of the errors is used to construct weighted least squares estimates. An extension of this work can be linked to heteroscedasticity. Exponentially weighted multivariate HAR models with time-varying variances such as GARCH errors or ARCH without intercept (see Ref. [44]) would be interesting topics. Reference [44] proposed a double AR model without an intercept (DARWIN model) as a modification of an AR-ARCH model as follows: y_t = φy_{t−1} + η_t √(αy²_{t−1}), where φ ∈ R, α > 0, and {η_t} is i.i.d. with zero mean and unit variance, independent of {y_j : j < t}. The DARWIN model is nonstationary and heteroscedastic regardless of the sign of the Lyapunov exponent, and hence it provides a new way to model nonstationary heteroscedastic time series. Analysis of a nonstationary, exponentially weighted HAR model combined with the DARWIN model will be an interesting topic in modelling heteroscedastic time series data. In the exponentially weighted multivariate HAR model with DARWIN errors, statistical methods for detecting and estimating the decay rates, along with DARWIN parameter estimation, will be a challenging study.
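A minimal simulation sketch of the DARWIN recursion, assuming Gaussian innovations for η_t (the function name, parameter values, and seed are ours):

```python
import numpy as np

def simulate_darwin(n, phi, alpha, y0=1.0, seed=0):
    """Simulate the DARWIN model (double AR without intercept):
        y_t = phi * y_{t-1} + eta_t * sqrt(alpha * y_{t-1}^2),
    with eta_t i.i.d. N(0, 1). The conditional variance is alpha * y_{t-1}^2,
    so the process is heteroscedastic by construction."""
    rng = np.random.default_rng(seed)
    y = np.empty(n)
    y[0] = y0                                # y0 = 0 would be absorbing
    for t in range(1, n):
        y[t] = phi * y[t - 1] + rng.normal() * np.sqrt(alpha * y[t - 1] ** 2)
    return y

path = simulate_darwin(500, phi=0.3, alpha=0.25)
```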

Conflicts of Interest:
The authors declare no conflict of interest.
Note that V_n U_n = exp(ιxS_n) and U_n →_p exp(−x²/2) =: a by (A2) and (A5). Because V_n U_n = V_n(U_n − a) + aV_n, it suffices to show that |V_n| is uniformly integrable and E(V_n) → 1.
Because σ²_{j,D} = lim_{n→∞} σ̂²_{j,D}, (A1) holds. Similarly, we can show that S_{[nz]} := ∑_{t=1}^{[nz]} X_{n,t} converges to the Brownian motion B(z). It follows that the desired result in Theorem 1 is obtained.