We structure our empirical study as follows. First, we describe the data and the construction of the portfolios in Section 4.1. Second, we estimate the time series models and provide the corresponding diagnostics in Section 4.2, where we compare the HAR approaches with simple AR(1) and AR(5) alternatives. Finally, in Section 4.3, we evaluate the performance of our approach by investigating whether our measures are suitable for reducing the portfolio variance from the investors’ perspective.
  4.1. Data and Construction of Portfolios
Our sample consisted of the 10 stocks listed in 
Table 1 and covered the period from 1 February 2001 to 31 December 2009, with 
 observations in total. This sample was investigated by 
Noureldin et al. (
2012) and is available through 
Heber et al. (
2009). We used the period until 28 February 2005, with 1022 observations, as the in-sample period to estimate our models. 
Table 2 displays the average in-sample and out-of-sample daily realized variances of all assets.
For our purposes, we constructed three portfolios, namely the equally-weighted portfolio of all 10 assets, denoted as , as well as two equally-weighted portfolios of five stocks each with the highest and lowest average in-sample daily realized variances, denoted as  and , respectively. In the out-of-sample period, which was marked by the subprime mortgage crisis of 2007–2009, the average portfolio variance increased relative to the in-sample period by 150.29% for , by 75.13% for , and by only 5.01% for .
We now consider the following setting: an investor holds the portfolio  and considers reducing the portfolio risk by including the portfolio  as an additional asset. To this end, we computed the realized diversification measures  and the realized GMVP weights , which correspond to the proportion of  in the new portfolio.
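To illustrate how such realized quantities can be obtained from a single day's realized covariance matrix, the following minimal Python sketch computes the GMVP weight of the additional asset and a log variance ratio for the two-asset case. The function and argument names are hypothetical. The sketch assumes that the diversification measure is the ratio of the GMVP variance to the variance of the initial portfolio; the definitions given with the realized measures earlier in the paper are authoritative and may differ (e.g., a ratio of volatilities).

```python
import math


def gmvp_two_assets(var_init, var_add, cov):
    """Two-asset global minimum variance portfolio (GMVP).

    var_init: realized variance of the initial portfolio
    var_add:  realized variance of the additional portfolio
    cov:      realized covariance between the two
    Returns (weight of the additional asset, GMVP variance).
    """
    denom = var_init + var_add - 2.0 * cov
    if denom <= 0.0:
        raise ValueError("degenerate realized covariance matrix")
    w_add = (var_init - cov) / denom
    var_gmvp = (var_init * var_add - cov ** 2) / denom
    return w_add, var_gmvp


def log_variance_ratio(var_init, var_add, cov):
    """Assumed form of the log diversification measure:
    log(GMVP variance / initial portfolio variance)."""
    _, var_gmvp = gmvp_two_assets(var_init, var_add, cov)
    return math.log(var_gmvp / var_init)
```

In this sketch, a weight outside the interval [0, 1] simply signals a short position in one of the two portfolios; no constraint is imposed.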
In 
Figure 1 and 
Figure 2, we provide the autocorrelation function for 
 and 
 for both the in-sample and out-of-sample periods. Both measures appeared to be rather persistent, which we account for in the time series modeling in 
Section 4.2.
  4.2. Time Series Modeling
For time series modeling of 
 and 
, we applied the HAR models as in (
14) and (
15). As simple benchmarks for both processes, we also considered AR(1) and AR(5) models, with, e.g., the AR(5) model for the realized 
, parameterized as:
All three models were estimated by OLS with the results reported in 
Table 3.
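To make the estimation step concrete, the sketch below builds the regressor matrices for an AR(p) model and for a HAR-type model and fits them by OLS with NumPy. It assumes the common HAR specification with a daily lag plus weekly (5-day) and monthly (22-day) averages of past values; the specifications in (14) and (15) are authoritative and may differ in detail. All function names are illustrative.

```python
import numpy as np


def ar_design(y, p):
    """Design matrix for AR(p): rows [1, y[t-1], ..., y[t-p]] for t = p..n-1."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    lags = [y[p - k : n - k] for k in range(1, p + 1)]
    return np.column_stack([np.ones(n - p)] + lags), y[p:]


def har_design(y, weekly=5, monthly=22):
    """Design matrix for a HAR-type model (assumed specification):
    rows [1, y[t-1], mean(y[t-5..t-1]), mean(y[t-22..t-1])]."""
    y = np.asarray(y, dtype=float)
    rows = [
        [1.0, y[t - 1], y[t - weekly : t].mean(), y[t - monthly : t].mean()]
        for t in range(monthly, len(y))
    ]
    return np.array(rows), y[monthly:]


def ols(X, target):
    """OLS coefficients and residuals via least squares."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta, target - X @ beta
```

Note that the two designs drop a different number of initial observations (p versus 22), so comparing fit criteria across models requires aligning the estimation samples.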
Almost all model coefficients proved to be significantly different from zero. Considering the 
, the HAR model gave the best fit, followed by the AR(5). Moreover, the AR(5) and HAR both had lower values for AIC and BIC than the AR(1) model. Next, we analyzed the in-sample regression residuals to further check the models’ adequacy. In 
Figure 3, 
Figure 4 and 
Figure 5, we show the Autocorrelation Functions (ACF) of the models’ residuals and their squares.
Based on the ACF plots, we conclude that our HAR and AR(5) modeling removed residual autocorrelation, whereas some autocorrelation remained for the AR(1) approach. Furthermore, there appeared to be no autocorrelation in the squared residuals for all models. Additionally, in 
Table 4, we provide the results of residual tests, namely the Ljung–Box (LB) test for autocorrelation, the ARCH-LM test for heteroskedasticity, and the Shapiro–Wilk (SW) test for normality.
Supporting the evidence from the ACF plots, the tests failed to reject the null hypotheses of no serial correlation and no ARCH effects for the HAR and AR(5) models. On the other hand, the Ljung–Box test rejected the null “no autocorrelation” for AR(1), indicating that this model does not reflect the underlying dynamics well enough. The normality assumption was clearly rejected for all models.
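For reference, the Ljung–Box statistic used in such residual checks can be computed directly from the sample autocorrelations. The sketch below is a plain NumPy illustration with hypothetical names, not the code behind Table 4. It returns only the Q statistic, which under the null of no autocorrelation is approximately chi-squared distributed with degrees of freedom equal to the number of lags (commonly reduced by the number of fitted autoregressive parameters when applied to model residuals).

```python
import numpy as np


def ljung_box_q(resid, n_lags):
    """Ljung-Box statistic Q = n (n + 2) * sum_k rho_k^2 / (n - k), k = 1..n_lags."""
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()                 # center the residuals
    n = len(x)
    denom = np.sum(x ** 2)           # n times the sample variance
    total = 0.0
    for k in range(1, n_lags + 1):
        rho_k = np.sum(x[k:] * x[:-k]) / denom   # lag-k sample autocorrelation
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total
```

A large value of Q relative to the chi-squared reference distribution indicates remaining serial correlation, as reported here for the AR(1) residuals.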
Next, we estimated the HAR, AR(5), and AR(1) models for the process of realized weights 
, with, e.g., the AR(5) model given as:
Similar to 
Table 3, in 
Table 5, we show the estimation results, whereas the model diagnostics are presented in 
Table 6. As in the case of 
, the model coefficients for 
 were mostly highly significant. At first glance, the AIC favored AR(5) over AR(1) and HAR; however, judging by the adjusted 
, the HAR still seemed to be the best specification among the considered models.
In 
Figure 6, 
Figure 7 and 
Figure 8, we show the in-sample residual ACFs. As for 
, in the case of 
, the ACFs for the HAR and AR(5) residuals showed no remaining autocorrelation, whereas for AR(1), there was still some autocorrelation left. The in-sample diagnostic test results are shown in 
Table 6. The HAR and AR(5) models for 
 seemed to pass all the tests, whereas the AR(1) residuals showed some remaining autocorrelation.
Summarizing our time series modeling, we conclude that both the HAR and AR(5) models appear appropriate for modeling realized diversification benefits 
 and realized portfolio weights 
. Next, we conduct an out-of-sample analysis in 
Section 4.3 to investigate whether this modeling helps to achieve lower portfolio variances.
  4.3. Economic Evaluation
We now conduct the out-of-sample analysis within the following framework. Consider an investor who holds the portfolio  and wants to know whether to diversify it further by including the portfolio  as a potential additional asset. Based on the in-sample data, we estimated the time series models for both  and  and denote the corresponding one-step-ahead out-of-sample forecasts by  and , respectively.
Next, suppose that the investor is willing to diversify only if volatility can be reduced by at least a certain amount, for example because of transaction costs. In practice, investors often base their decisions not on statistical significance, but on some (naive) empirical criteria; see, e.g., 
Brandt et al. (
2009). To mimic this setting, we assumed that the investor seeks to diversify away at least 
5% of portfolio risk, so that the ratio 
 must not exceed 0.95. This can be translated into a threshold 
ℓ for the log diversification measure 
 with the value 
. Thus, the corresponding decision rule would be to diversify if the forecast 
 and to stay with the initial portfolio if 
. Then, given the realized measures 
, one could learn in the next period whether this decision was correct or not. The resulting frequencies are visualized using 
2 × 2 decision matrices in 
Table 7.
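The decision rule and its evaluation can be sketched in a few lines of Python. The sketch assumes a natural logarithm, so that the threshold equals log(0.95), and that "diversify" corresponds to a log measure at or below the threshold. The names are hypothetical, and any inputs passed to these functions would be the forecasts and realized measures described above.

```python
import math

THRESHOLD = math.log(0.95)  # assumed: natural log of the 0.95 ratio bound


def diversify(log_measure, threshold=THRESHOLD):
    """True if the log diversification measure signals enough risk reduction."""
    return log_measure <= threshold


def decision_matrix(forecasts, realized, threshold=THRESHOLD):
    """Count (predicted decision, ex-post correct decision) pairs.

    Keys are (predicted_diversify, realized_diversify) booleans.
    """
    counts = {(p, r): 0 for p in (True, False) for r in (True, False)}
    for f, x in zip(forecasts, realized):
        counts[(diversify(f, threshold), diversify(x, threshold))] += 1
    return counts


def share_correct(counts):
    """Fraction of days on which the forecast-based decision was correct."""
    total = sum(counts.values())
    return (counts[(True, True)] + counts[(False, False)]) / total
```

The off-diagonal cells separate the two kinds of error: the (True, False) cell counts diversification signals that turned out to be wrong, while (False, True) counts missed opportunities.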
Judging only from the percentage of correct predictions,  and , the HAR model appeared to perform better than both AR(5) and AR(1). Note that the HAR approach is rather conservative, as it recommends not diversifying more often than, e.g., AR(5). To sum up, the HAR produced the most correct predictions and, moreover, resulted in the fewest wrong and costly diversification signals.
As a next step, we incorporated into the decision procedure the forecasted portfolio weight 
 in order to quantify the amount of a possible portfolio variance reduction. The strategy would be as follows: select the diversified portfolio with the forecasted weight 
 in the case of 
, which would lead to the variance 
, or remain with the initial portfolio 
 in the case of 
 with the variance 
. We denote the resulting portfolio variance from this diversification rule as 
, and we considered its ratio to the variances from three benchmark approaches: 
 corresponding to the ex-post GMVP, 
, and 
 for the portfolio with 50% in 
 and 50% in 
. The comparison of different models is provided in 
Table 8.
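A minimal sketch of this strategy for a single out-of-sample day is given below. It assumes the two-asset setting (initial portfolio plus one additional portfolio), a natural-log threshold of log(0.95), and that the second benchmark is the initial portfolio held unchanged. All names are hypothetical.

```python
import math


def two_asset_variance(w_add, var_init, var_add, cov):
    """Realized variance of a portfolio with weight w_add in the additional asset."""
    w_init = 1.0 - w_add
    return (w_init ** 2 * var_init + w_add ** 2 * var_add
            + 2.0 * w_init * w_add * cov)


def rule_variance(d_forecast, w_forecast, var_init, var_add, cov,
                  threshold=math.log(0.95)):
    """Variance obtained by following the forecast-based rule on one day."""
    if d_forecast <= threshold:      # signal: diversify with the forecast weight
        return two_asset_variance(w_forecast, var_init, var_add, cov)
    return var_init                  # signal: keep the initial portfolio


def benchmark_ratios(var_rule, var_init, var_add, cov):
    """Ratios of the rule's variance to the three benchmark variances."""
    denom = var_init + var_add - 2.0 * cov
    var_gmvp = (var_init * var_add - cov ** 2) / denom   # ex-post GMVP
    var_half = two_asset_variance(0.5, var_init, var_add, cov)
    return {
        "gmvp": var_rule / var_gmvp,
        "initial": var_rule / var_init,
        "fifty_fifty": var_rule / var_half,
    }
```

By construction, the ratio to the ex-post GMVP cannot fall below one, and the ratio to the initial portfolio equals exactly one on days without a diversification signal.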
The realized GMVP benchmark  provided the lower bound, so it was reported primarily for comparison purposes. Concerning the portfolio , the HAR model made it possible to reduce its variance by diversifying on more than 43% of days, leading to wrong decisions on only 12.3% of days. Similar evidence was found for the equally-weighted portfolio . The results became worse for the AR(5) models and appeared clearly unsatisfactory for the AR(1) approaches, where holding  led to a lower portfolio variance on more than 50% of days.
For a further illustration of our results, we visualize the time series of portfolio variance ratios with respect to the benchmarks of 
 and 
. In particular, for the HAR, AR(5), and AR(1) approaches, we report the time series of 
 and 
 in 
Figure 9 and 
Figure 10, respectively.
In 
Figure 9 for the benchmark 
, we observe that the HAR-based approach suggested diversifying on only a comparatively small number of days, whereas most of the time the ratio was equal to one, i.e., no diversification was recommended. It provided its major correct recommendation before the start of the crisis. The AR(5) model suggested diversification very often; however, these decisions appeared to be mostly disadvantageous from the start of the subprime mortgage crisis in the middle of 2007. The AR(1) model provided mostly wrong diversification decisions, especially during the crisis year 2008. Note that these false recommendations could be attributed to either 
 or 
 forecasting models. Hence, it is apparent that AR(1) is not really suitable for our purposes here.
Different from the case above, in 
Figure 10, for the benchmark 
, we observe that the HAR model provided reasonable diversification recommendations, especially after the crisis began in 2007; however, it was not very useful before the crisis started. Surprisingly, the other two approaches, AR(5) and AR(1), performed similarly to the HAR for this equally-weighted portfolio benchmark. We interpret these findings as evidence that not only the choice of the time series model, but also the choice of the benchmark can determine the success of a portfolio diversification strategy.