Modeling and Forecasting Realized Portfolio Diversification Benefits

Abstract: For a financial portfolio, we propose a realized measure of diversification benefits based on intraday high-frequency returns. Our measure quantifies the volatility reduction that could be achieved by including an additional asset in the portfolio. To make our approach feasible for investors, we also provide time series models for both the realized diversification measure and the realized portfolio weight. The performance of our approach is evaluated in-sample and out-of-sample. We find that our approach is helpful for the purpose of portfolio variance minimization.


Introduction
The mean-variance portfolio selection procedure of Markowitz (1952) remains a theoretical cornerstone of modern portfolio theory. Empirically, one should not apply mean-variance portfolio optimization directly, primarily due to the effect of estimation risk in the mean returns; see, e.g., Best and Grauer (1991); Chopra and Ziemba (1993). For this reason, variance minimization approaches leading to the choice of the global minimum variance portfolio (GMVP) (cf. Ledoit and Wolf 2003) or even naive equally-weighted portfolios (cf. DeMiguel et al. 2009a, 2009b) often appear to be preferable in practical portfolio selection. Further improvement of the GMVP performance could be gained by imposing constraints on portfolio weights (Jagannathan and Ma 2003), using shrinkage procedures (Golosnoy and Okhrin 2009; Frahm and Memmel 2010), or applying LASSO or other regularization techniques (Callot et al. 2017).
The essential concept of portfolio diversification postulates that non-systematic risks could be substantially reduced by including a sufficient number of imperfectly correlated risky assets in the portfolio. Hence, when selecting a portfolio composition, one of the crucial questions is whether the portfolio is already diversified enough or whether including additional assets would lead to a further noticeable risk reduction (cf. Evans and Archer 1968). Various measures of portfolio diversification have been proposed in the literature; see, e.g., Rudin and Morgan (2006), Bera and Park (2008), Choueifaty and Coignard (2008), Goetzmann and Kumar (2008), or Meucci (2009).
Recently, Frahm and Memmel (2010) suggested measuring diversification as the ratio of the current portfolio variance to the GMVP variance. Frahm and Wiechers (2013) analyzed the properties of this measure and showed that it possesses a convenient economic interpretation. The availability of intraday returns allows computing daily realized variances and covariances of risky asset returns, which are consistent estimators of the daily covariance matrix (Barndorff-Nielsen and Shephard 2004). By analogy, one could also calculate daily realized GMVP weights, which are a function of the realized covariance matrix (cf. Golosnoy et al. 2019). In this paper, we propose a daily realized portfolio diversification measure, which quantifies the portfolio volatility reduction due to the inclusion of an additional asset. Our statistic could be seen as a realized extension of the diversification measure of Frahm and Wiechers (2013). We provide its stochastic properties for a given realized covariance matrix estimator.
As investors intend to make portfolio decisions for the next period, it is of interest to predict the next period (day) diversification gains. Hence, we model realized diversification benefits ex-ante and directly in order to obtain the corresponding forecasts. For this purpose, we consider several time series models for our realized diversification measure. Forecasts provided by these models help an investor to decide whether or not to include an additional risky asset in the portfolio composition. We find that the cascade HAR model of Corsi (2009) is the most suitable in our empirical application both in-sample and out-of-sample, which supports the use of realized diversification statistics.
The rest of the paper is organized as follows: In Section 2, we introduce our realized diversification measure and establish its asymptotic stochastic properties. In Section 3, we propose time series models for diversification in order to forecast whether additional assets should be included in the portfolio. Then, in Section 4, we provide the empirical study where we estimate the time series models for diversification measures and evaluate the economic relevance of the corresponding diversification forecasts. In Section 5, we conclude the paper, whereas some theoretical results are placed in Appendix A.

Quantifying Diversification Benefits
Consider a portfolio of risky assets with log-price $p_p(\tau)$ and an additional asset (not yet included in this portfolio) with log-price $p_a(\tau)$, where $\tau \in \mathbb{R}_+$ represents continuous time. Denote the bivariate vector of their log-prices by $p(\tau) = (p_p(\tau), p_a(\tau))'$, and assume that $p(\tau)$ is a Brownian stochastic semimartingale with a spot covariance matrix $\Theta(\tau)$. The integrated covariance matrix at day $t$ is denoted by $\Sigma_t$ with:
$$\Sigma_t = \int_{t-1}^{t} \Theta(\tau)\, d\tau = \begin{pmatrix} \sigma^2_{p,t} & \sigma_{pa,t} \\ \sigma_{pa,t} & \sigma^2_{a,t} \end{pmatrix}, \tag{1}$$
where $\sigma^2_{p,t}$ and $\sigma^2_{a,t}$ are the daily variances of the initial portfolio and the additional asset returns, respectively, and $\sigma_{pa,t}$ denotes the corresponding covariance, so that $\mathrm{vech}(\Sigma_t) = (\sigma^2_{p,t}, \sigma_{pa,t}, \sigma^2_{a,t})'$. Further, we assume that the matrix $\Sigma_t$ is positive definite for all $t$.
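For day $t$, the log returns on the initial portfolio are $x_{p,t} = p_p(t) - p_p(t-1)$ and on the asset $x_{a,t} = p_a(t) - p_a(t-1)$. Construct a new portfolio with log return $x_t$, which is a linear combination of the returns on the initial portfolio $x_{p,t}$ and on the asset $x_{a,t}$:
$$x_t = w_t\, x_{p,t} + (1 - w_t)\, x_{a,t}. \tag{2}$$
The daily variance of the new portfolio return is denoted by $\sigma^2_t$. This problem can be reformulated as the task of constructing a two-asset (the original portfolio and the additional asset) GMVP, where the corresponding GMVP weight is obtained as the solution of the variance minimization task:
$$\min_{w_t}\; \sigma^2_t = w_t^2\, \sigma^2_{p,t} + (1 - w_t)^2\, \sigma^2_{a,t} + 2 w_t (1 - w_t)\, \sigma_{pa,t}. \tag{3}$$
The solution of the task (3) is the weight of the original portfolio, which is given by:
$$w_t = \frac{\sigma^2_{a,t} - \sigma_{pa,t}}{\sigma^2_{p,t} + \sigma^2_{a,t} - 2\sigma_{pa,t}}. \tag{4}$$
The variance of the new GMVP obtained by combining the original portfolio with the additional asset is given as:
$$\sigma^2_t = \frac{\sigma^2_{p,t}\,\sigma^2_{a,t} - \sigma^2_{pa,t}}{\sigma^2_{p,t} + \sigma^2_{a,t} - 2\sigma_{pa,t}}. \tag{5}$$
In line with Frahm and Wiechers (2013), we argue that a statistic that measures the distance between $\sigma^2_t$ and $\sigma^2_{p,t}$ is of great interest: when $\sigma^2_{p,t}$ is substantially larger than $\sigma^2_t$, there are diversification benefits to be achieved by including this additional asset in the portfolio. The diversification benefits could be quantified by the variance ratio $D_t$, which is defined as:
$$D_t = \frac{\sigma^2_{p,t}}{\sigma^2_t} - 1 = \frac{(\sigma^2_{p,t} - \sigma_{pa,t})^2}{\sigma^2_{a,t}\,\sigma^2_{p,t} - \sigma^2_{pa,t}} = \frac{(\sigma_{p,t}/\sigma_{a,t} - \rho_{pa,t})^2}{1 - \rho^2_{pa,t}}, \tag{6}$$
with $\rho_{pa,t} = \sigma_{pa,t}/(\sigma_{p,t}\sigma_{a,t})$. The denominator $\sigma^2_{a,t}\sigma^2_{p,t} - \sigma^2_{pa,t}$ is strictly positive due to the positive definiteness of the matrix $\Sigma_t$. Excluding in addition the boundary case $\sigma^2_{p,t} = \sigma_{pa,t}$, in which $w_t = 1$ and the original portfolio already coincides with the GMVP, we obtain $D_t \in (0, +\infty)$. A value of $D_t$ close to zero indicates no reasonable diversification effect from including this asset in the existing portfolio.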
However, for our purposes, it is more convenient to consider the log measure:
$$L_t = \ln D_t. \tag{7}$$
As the log measure takes values $L_t \in (-\infty, \infty)$, it appears to be advantageous from the perspective of time series modeling.
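The quantities in (4)-(7) depend only on the three entries of the covariance matrix, so they are easy to evaluate numerically. The following is a minimal Python sketch under that reading; the function name `gmvp_diversification` and its argument names are illustrative choices, not notation from the paper.

```python
import math


def gmvp_diversification(var_p, cov_pa, var_a):
    """Two-asset GMVP weight, GMVP variance, D_t and L_t = ln D_t.

    var_p, cov_pa, var_a are the entries of vech(Sigma_t); the names are
    illustrative assumptions.
    """
    det = var_p * var_a - cov_pa ** 2
    if var_p <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    denom = var_p + var_a - 2.0 * cov_pa      # Var(x_p - x_a), positive here
    w = (var_a - cov_pa) / denom              # weight of the original portfolio
    var_gmvp = det / denom                    # variance of the two-asset GMVP
    d = (var_p - cov_pa) ** 2 / det           # equals var_p / var_gmvp - 1
    l = math.log(d) if d > 0 else float("-inf")
    return w, var_gmvp, d, l


# Example: unit variances and correlation 0.3
w, var_gmvp, d, l = gmvp_diversification(1.0, 0.3, 1.0)
print(w, var_gmvp, d, l)
```

In this example the GMVP splits wealth equally and has variance 0.65, so that $D_t = 1/0.65 - 1 \approx 0.54$.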

Realized Measures for Diversification
The availability of intraday high-frequency returns provides the possibility to construct precise realized volatility measures. Using them, we introduce the realized diversification measure $\hat D_t$ and analyze its asymptotic stochastic properties.
Assume that we observe $m$ intraday log-prices for day $t$ at uniformly-spaced time intervals. Then, the $j$th intraday return vector is given by:
$$x_{t,j} = p\big(t - 1 + j/m\big) - p\big(t - 1 + (j-1)/m\big), \qquad j = 1, \ldots, m. \tag{8}$$
These high-frequency intraday returns are useful for the construction of realized covariance measures, which are precise nonparametric ex-post estimates of $\Sigma_t$. The simplest realized covariance matrix estimator is given as:
$$\hat\Sigma_t = \sum_{j=1}^{m} x_{t,j}\, x_{t,j}'. \tag{9}$$
Accordingly, for the entries of the matrix $\Sigma_t$, we get the realized estimators $\hat\sigma^2_{a,t} = \sum_{j=1}^{m} x^2_{a,t,j}$, $\hat\sigma^2_{p,t} = \sum_{j=1}^{m} x^2_{p,t,j}$, and $\hat\sigma_{pa,t} = \sum_{j=1}^{m} x_{p,t,j}\, x_{a,t,j}$. More advanced estimators, such as the realized kernel or the composite realized kernel (cf. Lunde et al. 2016), have been proposed in order to provide more elaborate realized volatility estimators.
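Given the realized covariance matrix $\hat\Sigma_t$, the realized diversification measure $\hat D_t$ is defined as:
$$\hat D_t = \frac{(\hat\sigma^2_{p,t} - \hat\sigma_{pa,t})^2}{\hat\sigma^2_{a,t}\,\hat\sigma^2_{p,t} - \hat\sigma^2_{pa,t}}.$$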
For time series modeling purposes, it appears to be more convenient to consider the measure $\hat L_t = \ln \hat D_t$, as its distribution is not as skewed as that of $\hat D_t$. We formulate the stochastic properties of both $\hat D_t$ and $\hat L_t$ in the following proposition, which is proven in Appendix A.
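Proposition 1. Consider a realized covariance matrix measure $\hat\Sigma_t$ in (9), which is a consistent estimator of the positive definite covariance matrix $\Sigma_t$ for the number of intraperiod returns $m \to \infty$. Then, the realized diversification measure $\hat D_t$ is a consistent estimator of $D_t$ and is asymptotically normally distributed for $m \to \infty$ with:
$$\sqrt{m}\,\big(\hat D_t - D_t\big) \xrightarrow{d} N\big(0,\; \nabla D_t'\, \Pi_t\, \nabla D_t\big), \tag{10}$$
where $\Pi_t$ denotes the asymptotic covariance matrix of $\sqrt{m}\,\big(\mathrm{vech}(\hat\Sigma_t) - \mathrm{vech}(\Sigma_t)\big)$. The expression for $\Pi_t$ was provided by Barndorff-Nielsen and Shephard (2004); its realized estimator $\hat\Pi_t$ is given in (A1) in Appendix A, whereas the gradient is given by:
$$\nabla D_t = \frac{\partial D_t}{\partial\, \mathrm{vech}(\Sigma_t)} = \left( \frac{2a_t}{b_t} - \frac{a_t^2\,\sigma^2_{a,t}}{b_t^2},\;\; -\frac{2a_t}{b_t} + \frac{2a_t^2\,\sigma_{pa,t}}{b_t^2},\;\; -\frac{a_t^2\,\sigma^2_{p,t}}{b_t^2} \right)',$$
with the shorthand $a_t = \sigma^2_{p,t} - \sigma_{pa,t}$ and $b_t = \sigma^2_{a,t}\sigma^2_{p,t} - \sigma^2_{pa,t}$, so that $D_t = a_t^2/b_t$. Next, the estimator $\hat L_t$ of $L_t$ is also consistent and asymptotically normally distributed for $m \to \infty$ with:
$$\sqrt{m}\,\big(\hat L_t - L_t\big) \xrightarrow{d} N\big(0,\; \nabla L_t'\, \Pi_t\, \nabla L_t\big),$$
with the gradient given as:
$$\nabla L_t = \frac{\nabla D_t}{D_t} = \left( \frac{2}{a_t} - \frac{\sigma^2_{a,t}}{b_t},\;\; -\frac{2}{a_t} + \frac{2\sigma_{pa,t}}{b_t},\;\; -\frac{\sigma^2_{p,t}}{b_t} \right)'.$$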
A similar asymptotic distribution for the optimal realized weight $\hat w_t$ could be obtained as a special case of the Theorem 1 result in Golosnoy et al. (2019). We provide the asymptotic variance of $\hat w_t$ in Appendix A, together with the asymptotic covariances $\mathrm{ACov}(\hat w_t, \hat D_t)$ and $\mathrm{ACov}(\hat w_t, \hat L_t)$ for $m \to \infty$.
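The realized quantities of this section can be sketched in a few lines of Python. The block below assumes that the intraday returns of one day are stored as an array with $m$ rows and two columns (initial portfolio, additional asset) and uses the simple estimator (9); the function name `realized_measures` and the simulated input are illustrative only.

```python
import numpy as np


def realized_measures(intraday_returns):
    """Realized covariance (9), realized D, L = ln D and realized GMVP weight.

    intraday_returns: array of shape (m, 2), columns ordered as
    (initial portfolio, additional asset). This layout is an assumption.
    """
    x = np.asarray(intraday_returns, dtype=float)
    cov = x.T @ x                                  # sum of outer products x_j x_j'
    var_p, cov_pa, var_a = cov[0, 0], cov[0, 1], cov[1, 1]
    det = var_p * var_a - cov_pa ** 2
    d_hat = (var_p - cov_pa) ** 2 / det
    w_hat = (var_a - cov_pa) / (var_p + var_a - 2.0 * cov_pa)
    return cov, d_hat, np.log(d_hat), w_hat


# Example with m = 78 simulated five-minute returns (illustrative data only)
rng = np.random.default_rng(1)
sigma = np.array([[1.0, 0.3], [0.3, 1.0]]) / 78
x = rng.multivariate_normal(np.zeros(2), sigma, size=78)
cov, d_hat, l_hat, w_hat = realized_measures(x)
print(cov, d_hat, l_hat, w_hat)
```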
The distributional result in Equation (10) in Proposition 1 is of particular importance for testing the usefulness of the additional asset in the portfolio composition. For example, the null hypothesis $H_0: D_t = D_0$ can be tested by means of the statistic:
$$T_{D,t} = \frac{\sqrt{m}\,\big(\hat D_t - D_0\big)}{\sqrt{\nabla \hat D_t'\, \hat\Pi_t\, \nabla \hat D_t}}, \tag{11}$$
where $\nabla \hat D_t$ denotes the gradient evaluated at $\hat\Sigma_t$. If this statistic $T_{D,t}$ is smaller than the $\alpha$-quantile of the $N(0, 1)$ distribution, no significant diversification benefits can be achieved by including the additional asset in the portfolio. The test for $H_0: L_t = L_0$ could be conducted by analogy with the test statistic:
$$T_{L,t} = \frac{\sqrt{m}\,\big(\hat L_t - L_0\big)}{\sqrt{\nabla \hat L_t'\, \hat\Pi_t\, \nabla \hat L_t}}. \tag{12}$$
To illustrate the results in Proposition 1, we conducted a Monte Carlo simulation study, which was designed as follows. For $T = 10^4$ days, we drew $m$ intraday returns $x_t = (x_{t,p}, x_{t,a})'$ from a bivariate normal distribution with $x_t \sim N(0, m^{-1}\Sigma)$ and $\mathrm{vech}(\Sigma) = (1, \sigma_{pa,t}, 1)'$. We performed this simulation study for $m = 78$, corresponding to 5-min intraday sampling, and for $m = 390$, corresponding to 1-min intraday sampling. We selected the correlation $\sigma_{pa,t} \in \{0, 0.3, 0.5\}$. Relying on the results from Proposition 1, we computed (11) and (12) for each $t = 1, \ldots, T$ and then calculated the sample moments (mean, variance, skewness, kurtosis) and applied the Kolmogorov-Smirnov (KS) and Jarque-Bera (JB) tests to check whether (11) and (12) followed a standard normal distribution. The values $D_0$ and $L_0$ were set to the true values for the given matrix $\Sigma$.
From the results reported in Table 1, we could observe that the sample moments were much better matched by $T_L$ compared to $T_D$. The p-values of the KS and JB tests indicated that at the intraday sampling frequency $m = 78$, the assumption of normality could be rejected in some cases. However, for $m = 390$, both the $\hat D_t$- and $\hat L_t$-based test statistics appeared to be quite close to the null hypothesis of the standard normal distribution.
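A reduced version of this simulation design can be sketched as follows. The sketch covers only the $L$-based statistic and, because the estimator (A1) is not reproduced here, replaces $\hat\Pi_t$ by a plug-in form that is valid under the simulation's own assumption of i.i.d. Gaussian intraday returns with constant covariance, where $m\,\mathrm{Cov}(\hat\sigma_{ab}, \hat\sigma_{cd}) = \sigma_{ac}\sigma_{bd} + \sigma_{ad}\sigma_{bc}$. All function names are illustrative, and the number of simulated days is smaller than in the study above to keep the run short.

```python
import numpy as np


def d_measure(s_p, s_pa, s_a):
    """Diversification measure D from the entries of vech(Sigma)."""
    return (s_p - s_pa) ** 2 / (s_p * s_a - s_pa ** 2)


def grad_log_d(s_p, s_pa, s_a):
    """Gradient of L = ln D with respect to (s_p, s_pa, s_a)."""
    a = s_p - s_pa
    b = s_p * s_a - s_pa ** 2
    return np.stack([2 / a - s_a / b, -2 / a + 2 * s_pa / b, -s_p / b], axis=-1)


def gaussian_pi(s_p, s_pa, s_a):
    """Plug-in Pi for i.i.d. Gaussian returns (an assumption, not (A1))."""
    pi = np.empty(s_p.shape + (3, 3))
    pi[..., 0, 0] = 2 * s_p ** 2
    pi[..., 0, 1] = pi[..., 1, 0] = 2 * s_p * s_pa
    pi[..., 0, 2] = pi[..., 2, 0] = 2 * s_pa ** 2
    pi[..., 1, 1] = s_p * s_a + s_pa ** 2
    pi[..., 1, 2] = pi[..., 2, 1] = 2 * s_pa * s_a
    pi[..., 2, 2] = 2 * s_a ** 2
    return pi


def simulate_t_l(rho, m, n_days, seed=0):
    """Simulate the L-based test statistic for n_days independent days."""
    rng = np.random.default_rng(seed)
    sigma = np.array([[1.0, rho], [rho, 1.0]])
    chol = np.linalg.cholesky(sigma / m)
    x = rng.standard_normal((n_days, m, 2)) @ chol.T   # rows ~ N(0, Sigma / m)
    s_p = (x[..., 0] ** 2).sum(axis=1)
    s_pa = (x[..., 0] * x[..., 1]).sum(axis=1)
    s_a = (x[..., 1] ** 2).sum(axis=1)
    l_hat = np.log(d_measure(s_p, s_pa, s_a))
    l_0 = np.log(d_measure(1.0, rho, 1.0))             # true value under Sigma
    g = grad_log_d(s_p, s_pa, s_a)
    avar = np.einsum("ti,tij,tj->t", g, gaussian_pi(s_p, s_pa, s_a), g)
    return np.sqrt(m) * (l_hat - l_0) / np.sqrt(avar)


t_l = simulate_t_l(rho=0.3, m=390, n_days=2000)
print(t_l.mean(), t_l.var())
```

If the asymptotic approximation works well, the printed mean and variance should be close to zero and one, respectively.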

Time Series Model for Diversification Measures
The realized diversification measures $\hat D_t$ and $\hat L_t$ are observable at the end of period (day) $t$. However, the investor must decide whether or not to include the additional asset in the portfolio already at $t-1$. Hence, it is necessary to forecast the diversification measures based on the information set $\mathcal{F}_{t-1}$ in order to facilitate the investment decisions. Since the measure $\hat D_t$ is a function of $\hat\sigma^2_{p,t}$, $\hat\sigma^2_{a,t}$, and $\hat\sigma_{pa,t}$, we expect that both time series $\hat D_t$ and $\hat L_t$ would exhibit properties similar to those of realized volatilities, such as slowly decaying autocorrelation functions, which is also supported by the empirical evidence in Section 4. Next, we concentrate on time series modeling of the observable realized measures $\hat L_t$.
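To capture the persistence of $\hat D_t$, we utilized the Heterogeneous Autoregressive (HAR) model proposed by Corsi (2009), which is a natural choice for modeling log realized volatility series. The HAR models have several advantages for our purposes. First, they can be estimated by a simple application of the Ordinary Least Squares (OLS) methodology. Second, the model is sparsely parameterized so that only a few parameters need to be estimated. Third, the model is able to accommodate various shocks easily due to its cascade nature, which makes the HAR approach rather robust. The HAR model for $\hat D_t$ is defined as:
$$\hat D_t = \beta_0 + \beta_d\, \hat D_{t-1} + \beta_w\, \hat D^{(w)}_t + \beta_m\, \hat D^{(m)}_t + \varepsilon_t, \tag{13}$$
with $\hat D^{(w)}_t = (1/5) \sum_{i=1}^{5} \hat D_{t-i}$ and $\hat D^{(m)}_t = (1/22) \sum_{i=1}^{22} \hat D_{t-i}$. Hence, the HAR model postulates that the current $\hat D_t$ depends on the previous day's $\hat D_{t-1}$, as well as on averages over the last week and over the last month. By analogy, we specify the model for $\hat L_t$:
$$\hat L_t = \beta_0 + \beta_d\, \hat L_{t-1} + \beta_w\, \hat L^{(w)}_t + \beta_m\, \hat L^{(m)}_t + \varepsilon_t, \tag{14}$$
with $\hat L^{(w)}_t = \ln\big((1/5) \sum_{i=1}^{5} \hat D_{t-i}\big)$, etc. For making investment decisions at $t-1$, it is also important to forecast the GMVP weight $\hat w_t$ based on the information set $\mathcal{F}_{t-1}$. As the time series properties of the realized $\hat w_t$ are rather similar to those of the realized volatilities, we applied the HAR model to $\hat w_t$ as well:
$$\hat w_t = \beta_0 + \beta_d\, \hat w_{t-1} + \beta_w\, \hat w^{(w)}_t + \beta_m\, \hat w^{(m)}_t + \varepsilon_t, \tag{15}$$
with $\hat w^{(w)}_t = (1/5) \sum_{i=1}^{5} \hat w_{t-i}$, etc. The coefficients and error terms are specific to each of the models (13)-(15). The corresponding one-step-ahead forecasts are denoted as $\tilde D_t$, $\tilde L_t$, and $\tilde w_t$. The HAR-type models in (13)-(15) are estimated by the OLS methodology in Section 4.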
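As an illustration of how the HAR regressors and the OLS fit could be set up, consider the following Python sketch. It assumes weekly and monthly windows of 5 and 22 days; the helper names (`har_design`, `fit_ols`, `har_forecast`) are hypothetical, and the input series is simulated only to make the example runnable. For the model (14), the regressors are the logarithms of the averaged $\hat D$ values, in line with the definition of $\hat L^{(w)}_t$ above.

```python
import numpy as np


def har_design(series, weekly=5, monthly=22):
    """Daily, weekly and monthly HAR regressors plus the aligned target."""
    y = np.asarray(series, dtype=float)
    rows = range(monthly, len(y))
    daily = np.array([y[t - 1] for t in rows])
    week = np.array([y[t - weekly:t].mean() for t in rows])
    month = np.array([y[t - monthly:t].mean() for t in rows])
    return np.column_stack([daily, week, month]), y[monthly:]


def fit_ols(regressors, target):
    """OLS coefficients with an intercept in the first position."""
    x = np.column_stack([np.ones(len(target)), regressors])
    beta, *_ = np.linalg.lstsq(x, target, rcond=None)
    return beta


def har_forecast(beta, series, weekly=5, monthly=22):
    """One-step-ahead forecast from the last observations of the series."""
    y = np.asarray(series, dtype=float)
    return (beta[0] + beta[1] * y[-1]
            + beta[2] * y[-weekly:].mean() + beta[3] * y[-monthly:].mean())


# Illustrative positive series standing in for realized D values
rng = np.random.default_rng(0)
d_series = np.exp(rng.normal(-1.0, 0.5, size=500))

# Model (13): HAR in levels of D
x_d, y_d = har_design(d_series)
beta_d = fit_ols(x_d, y_d)
print(har_forecast(beta_d, d_series))

# Model (14): regressors are logs of the averaged D values
beta_l = fit_ols(np.log(x_d), np.log(y_d))
print(beta_l)
```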

Empirical Application
We structure our empirical study in the following way. First, we describe the data and the construction of the portfolios in Section 4.1. Second, we estimate the time series models and provide the corresponding diagnostics in Section 4.2, where we compare the HAR approaches with simple AR(1) and AR(5) alternatives. Finally, in Section 4.3, we evaluate the performance of our approach by investigating whether our measures are suitable for reducing the portfolio variance from the investors' perspective.

Data and Construction of Portfolios
Our sample consisted of the 10 stocks listed in Table 2 and covered the period from 1 February 2001 to 31 December 2009, with T = 2242 observations in total. This sample was investigated by Noureldin et al. (2012) and is available through Heber et al. (2009). We used the period until 28 February 2005, with 1022 observations, as the in-sample period to estimate our models. Table 2 displays the average in-sample and out-of-sample daily realized variances of all assets. For our purposes, we constructed three portfolios, namely the equally-weighted portfolio of all 10 assets, denoted as $P_{ew}$, as well as two equally-weighted portfolios of five stocks each, containing the stocks with the highest and the lowest average in-sample daily realized variances and denoted as $P_{high}$ and $P_{low}$, respectively. In the out-of-sample period, which was marked by the subprime mortgage crisis of 2007-2009, the average portfolio variance increased compared to the in-sample period by 150.29% for $P_{high}$, by 75.13% for $P_{ew}$, and only by 5.01% for $P_{low}$.
We now consider the following setting: the investor holds the portfolio $P_{low}$ and considers the possibility of reducing the portfolio risk by including the portfolio $P_{high}$ as an additional asset. For this setting, we computed the realized diversification measures $\hat L_t$ and the realized GMVP weights $\hat w_t$, which correspond to the proportion of $P_{low}$ in the new portfolio.
In Figures 1 and 2, we provide the autocorrelation functions of $\hat L_t$ and $\hat w_t$ for both the in-sample and the out-of-sample periods. Both measures appeared to be rather persistent, which is taken into account by the time series modeling in Section 4.2.

Time Series Modeling
For time series modeling of $\hat L_t$ and $\hat w_t$, we applied the HAR models as in (14) and (15). Moreover, we considered both the AR(1) and AR(5) models as simple benchmarks for both processes with, e.g., the AR(5) for the realized $\hat L_t$ parameterized as:
$$\hat L_t = \phi_0 + \sum_{i=1}^{5} \phi_i\, \hat L_{t-i} + \varepsilon_t.$$
All three models were estimated by OLS, with the results reported in Table 3. Almost all model coefficients proved to be significantly different from zero. Considering the $R^2$, the HAR model gave the best fit, followed by the AR(5). Moreover, the AR(5) and HAR both had lower values for AIC and BIC than the AR(1) model. Next, we analyzed the in-sample regression residuals to further check the models' adequacy. In Figures 3-5, we show the Autocorrelation Functions (ACF) of the models' residuals and their squares. Based on the ACF plots, we conclude that our HAR and AR(5) modeling removed residual autocorrelation, whereas some autocorrelation remained for the AR(1) approach. Furthermore, there appeared to be no autocorrelation in the squared residuals for all models. Additionally, in Table 4, we provide the results of residual tests, namely the Ljung-Box (LB) test for autocorrelation, the ARCH-LM test for heteroskedasticity, and the Shapiro-Wilk (SW) test for the normality assumption.
Supporting the evidence from the ACF plots, the tests failed to reject the null hypotheses of no serial correlation and no ARCH effects for the HAR and AR(5) models.On the other hand, the Ljung-Box test rejected the null "no autocorrelation" for AR(1), indicating that this model does not reflect the underlying dynamics well enough.The normality assumption was clearly rejected for all models.The corresponding p-values are reported in parentheses.
Next, we estimated the HAR, AR(5), and AR(1) models for the process of realized weights $\hat w_t$, with, e.g., the AR(5) model given as:
$$\hat w_t = \phi_0 + \sum_{i=1}^{5} \phi_i\, \hat w_{t-i} + \varepsilon_t.$$
Similar to Table 3, in Table 5, we show the estimation results, whereas the model diagnostics are presented in Table 6. As in the case of $\hat L_t$, the model coefficients for $\hat w_t$ were mostly highly significant. At first glance, AR(5) appeared to be preferred by AIC compared to AR(1) and HAR; however, judging by the adjusted $R^2$, the HAR still seemed to be the best specification among the considered models. In Figures 6-8, we show the in-sample residual ACFs. As for $\hat L_t$, in the case of $\hat w_t$, the ACFs for the HAR and AR(5) residuals showed no remaining autocorrelation, whereas for AR(1), there was still some autocorrelation left. The in-sample diagnostic test results are shown in Table 6. The HAR and AR(5) models for $\hat w_t$ seemed to pass all the tests, whereas the AR(1) residuals showed some residual autocorrelation. Summarizing our time series modeling, we could conclude that both the HAR and AR(5) models seemed to be appropriate for modeling realized diversification benefits $\hat L_t$ and realized portfolio weights $\hat w_t$. Next, we conduct the out-of-sample analysis in Section 4.3 in order to investigate whether this modeling would be helpful to achieve lower portfolio variances.
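The AR benchmarks and the fit statistics used for the model comparison can be sketched as follows. The AIC and BIC formulas below are one common Gaussian-likelihood convention, $n \ln(\mathrm{RSS}/n) + 2k$ and $n \ln(\mathrm{RSS}/n) + k \ln n$, which is an assumption since the exact definitions are not stated above; the function names and the simulated AR(1) series are illustrative.

```python
import numpy as np


def ar_design(series, p):
    """Lag matrix (columns: lag 1, ..., lag p) and the aligned target."""
    y = np.asarray(series, dtype=float)
    lags = np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])
    return lags, y[p:]


def ols_with_criteria(regressors, target):
    """OLS with intercept, returning coefficients and fit statistics."""
    x = np.column_stack([np.ones(len(target)), regressors])
    beta, *_ = np.linalg.lstsq(x, target, rcond=None)
    resid = target - x @ beta
    n, k = x.shape
    rss = float(resid @ resid)
    tss = float(((target - target.mean()) ** 2).sum())
    r2 = 1.0 - rss / tss
    return {
        "beta": beta,
        "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - k),
        "aic": n * np.log(rss / n) + 2 * k,          # assumed convention
        "bic": n * np.log(rss / n) + k * np.log(n),  # assumed convention
        "resid": resid,
    }


# Illustrative AR(1) series with coefficient 0.6
rng = np.random.default_rng(0)
y = np.zeros(3000)
for t in range(1, len(y)):
    y[t] = 0.6 * y[t - 1] + rng.normal()

fit_ar1 = ols_with_criteria(*ar_design(y, 1))
fit_ar5 = ols_with_criteria(*ar_design(y, 5))
print(fit_ar1["beta"], fit_ar1["aic"], fit_ar5["aic"])
```

Note that models with different lag lengths lose different numbers of initial observations, so information criteria are only comparable if all models are fitted on a common sample.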

Economic Evaluation
We now provide the out-of-sample analysis within the following framework. Consider an investor who holds the portfolio $P_{low}$ and wants to know whether he/she should diversify it further by including the portfolio $P_{high}$ as a potential additional asset. Based on the in-sample data, we estimated the time series models for both $\hat w_t$ and $\hat L_t$ and denote the corresponding one-step-ahead out-of-sample forecasts by $\tilde w_t$ and $\tilde L_t$, respectively.
Next, consider that the investor is willing to diversify only if volatility could be reduced by at least a certain amount, for example because of transaction costs. In practice, investors often make decisions by relying not on statistical significance, but on some (naive) empirical criteria; see, e.g., Brandt et al. (2009). In order to resemble this setting, we assumed that the investor seeks to diversify away at least 5% of portfolio risk, so that the ratio $\sigma^2_t/\sigma^2_{low,t}$ must not exceed 0.95. This can be translated into a threshold $\ell$ for the log diversification measure $L_t$ with the value $\ell = \ln(1/0.95 - 1)$, reported as $-1.28$ (this figure matches the base-10 logarithm of $1/0.95 - 1$; the natural logarithm is approximately $-2.94$). Thus, the corresponding decision rule would be to diversify if the forecast $\tilde L_t \geq \ell$ and to stay with the initial portfolio if $\tilde L_t < \ell$. Then, given the realized measures $\hat L_t$, one could learn in the next period whether this decision was correct or not. The resulting frequencies are visualized using 2 × 2 decision matrices in Table 7. Judging only from the percentage of correct predictions, $(\tilde L_t < \ell \cap \hat L_t < \ell)$ and $(\tilde L_t \geq \ell \cap \hat L_t \geq \ell)$, the HAR model appeared to perform better than both AR(5) and AR(1). Note that the HAR approach is a rather conservative one, as it leads to frequent recommendations not to diversify compared, e.g., with AR(5). To sum up, the HAR produced the most correct predictions and, moreover, resulted in the fewest wrong and costly diversification signals.
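The decision rule and the 2 × 2 decision matrix can be sketched in Python as follows. The function names and the toy inputs are hypothetical; the threshold is computed with the natural logarithm, consistent with the definition $L_t = \ln D_t$, for which $\ln(1/0.95 - 1) \approx -2.94$.

```python
import math

import numpy as np


def log_threshold(min_variance_reduction):
    """Threshold for L_t implied by a required relative variance reduction."""
    return math.log(1.0 / (1.0 - min_variance_reduction) - 1.0)


def decision_matrix(l_forecast, l_realized, threshold):
    """Relative frequencies of the four forecast/realization outcomes.

    Rows: forecast below / at or above the threshold.
    Columns: realization below / at or above the threshold.
    """
    f = np.asarray(l_forecast, dtype=float) >= threshold
    r = np.asarray(l_realized, dtype=float) >= threshold
    counts = np.array([[np.sum(~f & ~r), np.sum(~f & r)],
                       [np.sum(f & ~r), np.sum(f & r)]], dtype=float)
    return counts / counts.sum()


ell = log_threshold(0.05)
print(ell)

# Toy forecasts and realizations (illustrative values only)
table = decision_matrix([-4.0, -1.0, -4.0, -1.0], [-4.0, -4.0, -1.0, -1.0], ell)
print(table)
print("share of correct predictions:", table[0, 0] + table[1, 1])
```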
As a next step, we incorporated the forecasted portfolio weight $\tilde w_t$ into the decision procedure in order to quantify the amount of a possible portfolio variance reduction. The strategy would be as follows: select the diversified portfolio with the forecasted weight $\tilde w_t$ in the case of $\tilde L_t \geq \ell$, which would lead to the variance $\sigma^2_t(\tilde w_t)$, or remain with the initial portfolio $P_{low}$ in the case of $\tilde L_t < \ell$, with the variance $\sigma^2_{low,t}$. We denote the resulting portfolio variance from this diversification rule as $\sigma^2_t(\tilde w_t, \tilde L_t)$, and we considered its ratio to the variances from three benchmark approaches: $\sigma^2_t(\hat w_t)$ corresponding to the ex-post GMVP, $\sigma^2_{low,t}$, and $\sigma^2_{ew,t}$ for the portfolio with 50% in $P_{high}$ and 50% in $P_{low}$. The comparison of the different models is provided in Table 8. The realized GMVP benchmark $\sigma^2_t(\hat w_t)$ provided the lower bound, so it was reported primarily for comparison purposes. Concerning the portfolio $P_{low}$, the HAR model provided the possibility to reduce its variance by diversifying in more than 43% of days, leading to wrong decisions only in 12.3% of days. Similar evidence was found for the equally-weighted portfolio $P_{ew}$. The results became worse for the AR(5) models and appeared to be really unsatisfactory for the AR(1) approaches, where holding $P_{low}$ led to a lower portfolio variance in more than 50% of days.
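The evaluation of the diversification rule against the three benchmarks can be sketched as below. The function names, argument names, and input numbers are purely illustrative; the summary mirrors the columns reported in Table 8 (mean ratio, standard deviation, and the percentages of days with a ratio above, below, and equal to one).

```python
import numpy as np


def portfolio_variance(w, var_low, cov, var_high):
    """Variance of a portfolio with weight w in P_low and 1 - w in P_high."""
    return w ** 2 * var_low + (1 - w) ** 2 * var_high + 2 * w * (1 - w) * cov


def rule_variance(w_forecast, l_forecast, threshold, var_low, cov, var_high):
    """Realized variance of the rule: diversify only if the forecast of L
    is at or above the threshold, otherwise keep P_low."""
    w_f = np.asarray(w_forecast, dtype=float)
    diversify = np.asarray(l_forecast, dtype=float) >= threshold
    diversified = portfolio_variance(w_f, var_low, cov, var_high)
    return np.where(diversify, diversified, var_low)


def ratio_summary(ratio):
    """Summary statistics of a series of variance ratios."""
    r = np.asarray(ratio, dtype=float)
    return {
        "mean": r.mean(),
        "std": r.std(ddof=1),
        "pct_gt_1": 100.0 * np.mean(r > 1),
        "pct_lt_1": 100.0 * np.mean(r < 1),
        "pct_eq_1": 100.0 * np.mean(r == 1),
    }


# Two illustrative days with identical realized (co)variances
var_low = np.array([1.0, 1.0])
cov = np.array([0.5, 0.5])
var_high = np.array([4.0, 4.0])
w_hat = (var_high - cov) / (var_low + var_high - 2 * cov)    # ex-post GMVP weight

rule = rule_variance([0.8, 0.8], [0.0, -5.0], -2.94, var_low, cov, var_high)
gmvp = portfolio_variance(w_hat, var_low, cov, var_high)
equal = portfolio_variance(0.5, var_low, cov, var_high)

print(ratio_summary(rule / var_low))
print(ratio_summary(rule / gmvp))
print(ratio_summary(rule / equal))
```

On the first day the rule diversifies with weight 0.8 and attains a variance of 0.96, whereas on the second day it keeps $P_{low}$, so the ratio to $\sigma^2_{low,t}$ equals one exactly.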
For a further illustration of our results, we visualize the time series of portfolio variance ratios with respect to the benchmarks $P_{low}$ and $P_{ew}$. In particular, for the HAR, AR(5), and AR(1) approaches, we report the time series of $\sigma^2_t(\tilde w_t, \tilde L_t)/\sigma^2_{low,t}$ and $\sigma^2_t(\tilde w_t, \tilde L_t)/\sigma^2_{ew,t}$ in Figures 9 and 10, respectively.
In Figure 9, for the benchmark $P_{low}$, we observe that the HAR-based approach suggested diversifying only on a comparatively small number of days, whereas most of the time, the ratio was equal to one, i.e., no diversification was recommended. It provided the major correct recommendation before the start of the crisis. The AR(5) suggested diversification very often; however, these decisions appeared to be mostly disadvantageous from the start of the subprime mortgage crisis in the middle of 2007. The AR(1) model provided mostly wrong diversification decisions, especially during the crisis year 2008. Note that these false recommendations could be attributed to either the $\hat L_t$ or the $\hat w_t$ forecasting model. Hence, it is apparent that AR(1) is not really suitable for our purposes here.
In contrast to the case above, in Figure 10, for the benchmark $P_{ew}$, we observe that the HAR model provided reasonable diversification recommendations, especially after the crisis began in 2007; however, it was not really useful before the start of the crisis. Surprisingly, the other two approaches, AR(5) and AR(1), performed similarly to the HAR for this equally-weighted portfolio benchmark. We interpret these findings as evidence that not only the choice of the time series model, but also the choice of the benchmark could determine the success of a portfolio diversification strategy.

Conclusions
The availability of intraday returns allows constructing precise realized volatility measures, which should be used for the improvement of risk management procedures. In this paper, we introduced a novel realized measure for portfolio diversification benefits. Our procedure helps to decide whether or not to include an additional security in a portfolio of risky assets in order to reduce its variance.
After providing the asymptotic properties of our realized diversification measures, we considered several time series models for them and formulated a diversification decision rule. The performance of these models was evaluated in the empirical study based on a dataset of 10 risky assets. We found that the HAR time series approach was the most suitable for out-of-sample prediction of realized diversification measures, as well as for forecasting the optimal proportion of wealth to invest in the additional asset.
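
Appendix A
The Delta method is also applied to derive the asymptotic distribution of $\hat L_t = \ln \hat D_t$:
$$\sqrt{m}\,\big(\hat L_t - L_t\big) \xrightarrow{d} N\big(0,\; \nabla L_t'\, \Pi_t\, \nabla L_t\big), \qquad \nabla L_t = \nabla D_t / D_t.$$
A further application of the Delta method yields the asymptotic distribution of $\hat w_t$, which is a special case of the results in Golosnoy et al. (2019), as well as the asymptotic covariances between $\hat w_t$ and $\hat D_t$, and between $\hat w_t$ and $\hat L_t$.
The realized weight of the original portfolio in the GMVP is given as:
$$\hat w_t = \frac{\hat\sigma^2_{a,t} - \hat\sigma_{pa,t}}{\hat\sigma^2_{p,t} + \hat\sigma^2_{a,t} - 2\hat\sigma_{pa,t}}.$$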

Figure 1. In-sample: autocorrelation functions of $\hat L_t$ (left) and $\hat w_t$ (right).

Figure 2. Out-of-sample: autocorrelation functions of $\hat L_t$ (left) and $\hat w_t$ (right).

Figure 3. $\hat L_t$, in-sample autocorrelations of HAR residuals (left) and their squares (right).

Figure 6. $\hat w_t$, in-sample autocorrelations of HAR residuals (left) and their squares (right).

Figure 9. Ratios $\sigma^2_t(\tilde w_t, \tilde L_t)/\sigma^2_{low,t}$ for the HAR, AR(5), and AR(1) models, from top to bottom. Note: the red lines correspond to a ratio of one, indicating equal variances.

Figure 10. Ratios $\sigma^2_t(\tilde w_t, \tilde L_t)/\sigma^2_{ew,t}$ for the HAR, AR(5), and AR(1) models, from top to bottom. Note: the red lines correspond to a ratio of one, indicating equal variances.

Table 1. Sample moments and p-values of Kolmogorov-Smirnov and Jarque-Bera tests for normality.
Note: computed by generating $T = 10^4$ days with $m = 78$ or $m = 390$ intraday returns.

Table 2. Average realized daily variances of assets and portfolios.

Table 3. Time series model estimates, $\hat L_t$.

Table 5. Time series model estimates, $\hat w_t$.

Table 6. $\hat w_t$, in-sample residual diagnostic test statistics.
Note: the corresponding p-values are reported in parentheses.

Table 8. Ratios of $\sigma^2_t(\tilde w_t, \tilde L_t)$ to different benchmark variances.
Columns: Benchmark | Mean Ratio | Std. Deviation of Ratio | % with >1 | % with <1 | % with =1
Note: "% with >1" is the percentage of days where the diversification rule delivers a larger portfolio variance than the benchmark.