Can Machine Learning Based Portfolios Outperform Traditional Risk-Based Portfolios? The Need to Account for Covariance Misspecification

The hierarchical risk parity (HRP) approach to portfolio allocation, introduced by [Lopez de Prado, 2016], applies graph theory and machine learning to build a diversified portfolio. Like the traditional risk-based allocation methods, HRP is a function of an estimate of the covariance matrix; however, it does not require the matrix to be invertible. In this paper we first study the impact of covariance misspecification on the performance of the different allocation methods. Next, we study whether, under an appropriate covariance forecast model, the machine learning based HRP outperforms the traditional risk-based portfolios. For our analysis we apply the test for superior predictive ability to out-of-sample portfolio performance, to determine whether the observed excess performance is significant or occurred by chance. We find that when the covariance estimates are crude, inverse volatility weighted portfolios are the most robust, followed by the machine learning based portfolios. Minimum variance and maximum diversification portfolios are the most sensitive to covariance misspecification. HRP occupies the middle ground: it is less sensitive to covariance misspecification than the minimum variance or maximum diversification portfolios, but it is not as robust as the inverse volatility weighted portfolio. We also study the impact of different rebalancing horizons and how the portfolios compare against a market-capitalization weighted portfolio.


Introduction
Many present-day portfolio optimization techniques are based on the mean-variance optimization framework developed by [Markowitz, 1952]. Because of the practical challenges of forecasting mean returns, the most popular risk-based portfolio optimization techniques require only a forecast of the covariance of returns. Notable risk-based allocation methods that rely only on covariance forecasts include minimum variance [Clarke et al., 2006], maximum diversification [Choueifaty and Coignard, 2008], equal risk budget [Leote et al., 2012], and equal risk contribution [Maillard et al., 2010].
The best known and most common estimator for the forecast of the covariance of returns is the sample covariance, calculated from the time series of historical returns. A covariance matrix of size N has $N(N + 1)/2$ free parameters, so at least that many independent and identically distributed (iid) return observations are needed to estimate it. Therefore, to construct a covariance matrix of returns for 50 assets, one would ideally need at least 5 years of daily returns, with the hope that they are iid! There is ample evidence that asset returns exhibit heteroskedasticity with volatility clustering, and that correlation structures do not remain invariant over such long periods ([Zakamulin, 2015], [Lopez de Prado, 2016]). There are broadly two directions of work that address this concern. The first is the development of better covariance forecast models. Notable works in this direction are the shrinkage estimator of the covariance matrix proposed by [Ledoit and Wolf, 2003] and the exponentially weighted covariance matrix popularized by [Riskmetrics, 1996]. The more sophisticated dynamic conditional correlation (DCC) model of [Engle, 2002], in which persistence in the variance and correlation dynamics is achieved through a GARCH(1,1) type model, is one of the most popular multivariate GARCH models for covariance forecasting.
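The arithmetic behind the 50-asset example can be checked with a few lines of Python. This is only an illustration; the figure of 252 trading days per year is an assumption, not a value taken from the paper.

```python
def min_iid_observations(n_assets: int) -> int:
    """Number of free parameters in an N x N covariance matrix: N(N + 1) / 2."""
    return n_assets * (n_assets + 1) // 2


TRADING_DAYS_PER_YEAR = 252  # assumed length of a trading year

obs = min_iid_observations(50)
years = obs / TRADING_DAYS_PER_YEAR
print(f"{obs} observations, about {years:.2f} years of daily data")
```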
Hierarchical Risk Parity (HRP), as proposed in [Lopez de Prado, 2016], uses graph theory and machine learning algorithms to infer hierarchical relationships between the assets, which are then used directly for portfolio diversification. This approach constitutes the second, more recent, direction of work for circumventing issues related to covariance matrix forecasts. Most traditional risk-based optimal allocations require the inversion of the covariance matrix, a step that HRP avoids. This gives HRP an additional advantage, as inverting an ill-conditioned matrix, which most risk-based portfolios require, can introduce significant estimation errors. The technique is extended in [Raffinot, 2017], where different hierarchical clustering methods are employed and the robustness and performance of these algorithms relative to traditional risk-based portfolios are studied. [Zakamulin, 2015] investigates the impact of various covariance matrix forecasting methodologies on the performance of minimum variance and target volatility strategies. That study, however, does not examine the performance of other popular risk-based allocation methods under these forecasting techniques. The impact of covariance matrix misspecification on the optimal weights of different risk-based optimization methods is reported in [Ardia et al., 2017]. That paper, however, studies the impact of misspecification only on the portfolio weights, not on portfolio performance. While [Raffinot, 2017] reports better performance for HRP and its variants than for traditional risk-based allocation techniques, the study does not account for the impact of covariance misspecification arising from the use of possibly inferior covariance forecasting methods.
The objectives of this paper are two-fold. The first is to study empirically whether there are covariance matrix forecasting methodologies that provide superior performance for both traditional risk-based and machine learning based portfolios. We address this by examining the out-of-sample performance of portfolios constructed using covariance matrices from different forecasting methodologies, at daily, weekly, and monthly forecasting horizons. The second objective is to study whether the more sophisticated machine learning algorithms provide better portfolio performance than traditional risk-based portfolios constructed using an appropriate covariance forecasting methodology. For both objectives, we use the stationary bootstrap based superior predictive ability (SPA) test proposed in [Hansen, 2005]. The SPA test is designed to evaluate whether an observed excess performance is significant or could have occurred by chance.
This paper is organized as follows. Section 2 describes the risk-based portfolio allocation methods considered in the paper, and Section 3 describes the covariance forecast models. Section 4 explains HRP, the machine learning based portfolio allocation approach that we consider in this work. In Section 5 we describe the data and the methodology used for the out-of-sample performance evaluations. We present our empirical results in Section 6, and Section 7 contains concluding remarks.

Risk-based Portfolios
We define a generic portfolio in a market with N risky assets by the N × 1 vector of portfolio weights $w \equiv (w_1, \ldots, w_N)^\top$. The N × N covariance matrix of the N × 1 arithmetic returns $r \equiv (r_1, \ldots, r_N)^\top$, forecast for the desired holding horizon, is denoted by $\Sigma$. We consider long-only, fully invested portfolios throughout our analysis, so the constraint set is
$$\mathcal{C} \equiv \{ w \in \mathbb{R}^N : w_i \ge 0,\ i = 1, \ldots, N,\ \mathbf{1}_N^\top w = 1 \}.$$
The study considers the following traditional risk-based portfolios.

Minimum variance portfolio (MVP)
The weights of the minimum variance portfolio are obtained by solving the quadratic optimization problem
$$w_{\min} \equiv \arg\min_{w \in \mathcal{C}} w^\top \Sigma w. \quad (1)$$
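As an illustration, Eq. (1) can be solved numerically with SciPy's general-purpose SLSQP solver. This is a minimal sketch assuming the long-only, fully invested constraint set (weights in [0, 1] summing to one); the function name and solver settings are hypothetical, and the paper does not state which solver was used.

```python
import numpy as np
from scipy.optimize import minimize


def min_variance_weights(cov: np.ndarray) -> np.ndarray:
    """Long-only, fully invested minimum variance weights (Eq. 1)."""
    n = cov.shape[0]
    result = minimize(
        fun=lambda w: w @ cov @ w,            # portfolio variance w' Σ w
        x0=np.full(n, 1.0 / n),               # start from equal weights
        jac=lambda w: 2.0 * cov @ w,          # analytic gradient
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,              # long only
        constraints=[{"type": "eq",           # fully invested: sum(w) = 1
                      "fun": lambda w: w.sum() - 1.0,
                      "jac": lambda w: np.ones_like(w)}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x


cov = np.diag([0.04, 0.09, 0.16])
w_mvp = min_variance_weights(cov)
```

With a diagonal covariance matrix the long-only minimum variance portfolio reduces to inverse-variance weights, which makes a convenient sanity check.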

Inverse volatility weighted portfolio (IVWP)
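Let $\sigma \equiv \sqrt{\mathrm{diag}(\Sigma)}$ be the N × 1 vector of standard deviations of arithmetic returns. The inverse volatility weighted portfolio then assigns the following weights to the N assets:
$$w_i = \frac{\sigma_i^{-1}}{\sum_{j=1}^{N} \sigma_j^{-1}}, \quad i = 1, \ldots, N.$$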
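The inverse volatility weights are available in closed form, so no optimizer is needed. A short Python sketch (the function name is hypothetical):

```python
import numpy as np


def inverse_volatility_weights(cov: np.ndarray) -> np.ndarray:
    sigma = np.sqrt(np.diag(cov))    # σ = sqrt(diag(Σ))
    inv_vol = 1.0 / sigma
    return inv_vol / inv_vol.sum()   # normalise so the weights sum to one


w_ivp = inverse_volatility_weights(np.diag([0.04, 0.16]))  # volatilities 0.2 and 0.4
```

With volatilities of 20% and 40%, the weights are 2/3 and 1/3.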

Equal risk contribution portfolio (ERC)
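This strategy assigns weights such that each asset contributes equally to the overall portfolio volatility. Denote by $\%RC_i$ the percentage risk contribution of the i-th asset,
$$\%RC_i \equiv \frac{w_i (\Sigma w)_i}{w^\top \Sigma w}, \quad \text{so that } \sum_{i=1}^{N} \%RC_i = 1.$$
In the ERC portfolio, $\%RC_i = 1/N$ for all assets. The weights can be calculated by solving
$$w_{\mathrm{ERC}} \equiv \arg\min_{w \in \mathcal{C}} \sum_{i=1}^{N} \left( \%RC_i - \frac{1}{N} \right)^2.$$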
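Solving a least-squares problem on the risk contributions directly can be numerically delicate, so the Python sketch below swaps in a common alternative: minimize the strictly convex function ½yᵀΣy − (1/N)Σᵢ log yᵢ over y > 0 and rescale y to sum to one. At its minimum yᵢ(Σy)ᵢ = 1/N for every asset, so the rescaled weights have equal percentage risk contributions. This is not necessarily the formulation used by the authors; the function names, test matrix and solver settings are assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def risk_contributions(w: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Percentage risk contributions %RC_i = w_i (Σw)_i / (w'Σw)."""
    marginal = cov @ w
    return w * marginal / (w @ marginal)


def erc_weights(cov: np.ndarray) -> np.ndarray:
    n = cov.shape[0]

    # Strictly convex surrogate 0.5 y'Σy - (1/n) Σ log y_i on y > 0. Its
    # stationary point satisfies y_i (Σy)_i = 1/n, i.e. equal risk contributions.
    def objective(y):
        return 0.5 * y @ cov @ y - np.log(y).sum() / n

    def gradient(y):
        return cov @ y - 1.0 / (n * y)

    result = minimize(objective, x0=1.0 / np.sqrt(np.diag(cov)), jac=gradient,
                      method="L-BFGS-B", bounds=[(1e-12, None)] * n,
                      options={"ftol": 1e-15, "gtol": 1e-12, "maxiter": 1000})
    y = result.x
    return y / y.sum()  # %RC is unchanged by positive rescaling


cov = np.array([[0.040, 0.006, 0.002],
                [0.006, 0.090, 0.009],
                [0.002, 0.009, 0.160]])
w_erc = erc_weights(cov)
```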

Maximum diversification portfolio (MDP)
This strategy maximizes the diversification ratio of the portfolio, defined as the ratio of the weighted average of the asset volatilities to the portfolio volatility:
$$DR(w) \equiv \frac{w^\top \sigma}{\sqrt{w^\top \Sigma w}}. \quad (4)$$
The weights of the maximum diversification portfolio, first proposed by [Choueifaty and Coignard, 2008], are then obtained by solving
$$w_{\mathrm{MDP}} \equiv \arg\max_{w \in \mathcal{C}} DR(w).$$
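Because the diversification ratio is unchanged when w is multiplied by a positive constant, maximizing it over long-only weights is equivalent to minimizing wᵀΣw subject to σᵀw = 1 and w ≥ 0, and then rescaling the weights to sum to one. The Python sketch below uses that reformulation with SciPy's SLSQP solver; it is an illustrative choice with hypothetical function names, not necessarily how the paper computed the MDP.

```python
import numpy as np
from scipy.optimize import minimize


def diversification_ratio(w: np.ndarray, cov: np.ndarray) -> float:
    sigma = np.sqrt(np.diag(cov))
    return float(w @ sigma / np.sqrt(w @ cov @ w))


def max_diversification_weights(cov: np.ndarray) -> np.ndarray:
    n = cov.shape[0]
    sigma = np.sqrt(np.diag(cov))
    # DR is scale invariant, so maximising it over long-only w is equivalent to
    # minimising w'Σw subject to σ'w = 1 and w >= 0.
    result = minimize(
        fun=lambda w: w @ cov @ w,
        x0=np.full(n, 1.0 / sigma.sum()),     # feasible start: σ'x0 = 1
        jac=lambda w: 2.0 * cov @ w,
        method="SLSQP",
        bounds=[(0.0, None)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w @ sigma - 1.0,
                      "jac": lambda w: sigma}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x / result.x.sum()         # rescale to sum to one


cov = np.diag([0.04, 0.09, 0.16])
w_mdp = max_diversification_weights(cov)
```

With uncorrelated assets the MDP assigns weights proportional to 1/σᵢ and attains DR = √N.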

Market-capitalization-weighted portfolio (MCWP)
A market-capitalization-weighted portfolio is constructed by assigning the portfolio weights
$$w_i = \frac{M_i}{\sum_{j=1}^{N} M_j}, \quad i = 1, \ldots, N,$$
where $M_i$ is the market capitalization of the i-th asset at the time of allocation. Market-cap-weighted portfolios serve as an important benchmark because, under a standard interpretation of the Capital Asset Pricing Model, the market portfolio maximizes the Sharpe ratio. Note, however, that [Hsu, 2004] shows empirically that market-cap-weighted portfolios can be sub-optimal.

Covariance matrix forecasting methods
Given the time series of T past returns, $r_{t-T}, \ldots, r_{t-1}$, we want to forecast $\Sigma_t$, the covariance of the returns $r_t$. We describe below the methods used in our study to forecast the covariance matrix.

Sample-Based Covariance (SMPL)
Following the notation of [Zakamulin, 2015], we assume that the vector of daily asset returns $r_t$ is driven by $\varepsilon_t$, the vector of white noise on day t, such that $E[\varepsilon_t \mid \mathcal{F}_{t-1}] = 0_N$, where $0_N$ is the N × 1 vector of zeros.
To estimate the sample covariance matrix, we use a rolling window of the T most recent historical log returns; the covariance forecast for day t, $\hat\Sigma_t$, is the sample covariance of $r_{t-T}, \ldots, r_{t-1}$. Because of the time aggregation property of log returns, the covariance matrix for weekly and monthly return projections can be obtained as the sum of the iterated one-day-ahead covariance predictions.
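A minimal Python sketch of the rolling-window estimator, assuming the window holds the T most recent daily log returns with rows ordered oldest first. NumPy's `np.cov` demeans the data and divides by T − 1; whether the paper demeans or uses the same divisor is not stated, so treat that as an assumption. The function name is hypothetical.

```python
import numpy as np


def sample_covariance_forecast(log_returns: np.ndarray, window: int,
                               horizon: int = 1) -> np.ndarray:
    """h-day covariance forecast from the last `window` daily log returns.

    log_returns has shape (time, assets), oldest observation first.
    """
    recent = log_returns[-window:]
    daily = np.cov(recent, rowvar=False)  # demeaned, divisor window - 1 (assumption)
    # The sample estimate is the same for every forecast step, so the sum of
    # `horizon` iterated one-day forecasts is simply horizon * daily.
    return horizon * daily


rng = np.random.default_rng(0)
r = 0.01 * rng.standard_normal((300, 3))
cov_1d = sample_covariance_forecast(r, window=250)
cov_5d = sample_covariance_forecast(r, window=250, horizon=5)
```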

Exponentially weighted moving average (EWMA)
This estimator places more weight on recent returns; the method was popularized by [Riskmetrics, 1996]. The exponentially weighted covariance matrix is estimated by the recursion
$$\hat\Sigma_t = \lambda \hat\Sigma_{t-1} + (1 - \lambda)\, r_{t-1} r_{t-1}^\top,$$
where, following the recommendation of the RiskMetrics group, a decay constant of λ = 0.94 is used for daily returns. We obtain the weekly and monthly EWMA covariance forecasts by multiplying the daily covariance matrix by the number of trading days in the subsequent week and month, respectively.
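The recursion translates directly into a loop. In this Python sketch, seeding the recursion with the sample covariance of the window is an assumption (the text does not say how the initial matrix is chosen), the function name is hypothetical, and the h-day forecast is obtained, as described above, by multiplying the daily forecast by h.

```python
from typing import Optional

import numpy as np


def ewma_covariance(log_returns: np.ndarray, lam: float = 0.94,
                    init: Optional[np.ndarray] = None,
                    horizon: int = 1) -> np.ndarray:
    """RiskMetrics recursion Σ_t = λ Σ_{t-1} + (1 - λ) r_{t-1} r_{t-1}'.

    Returns the forecast for the day after the last row of `log_returns`,
    scaled to an h-day horizon by multiplying by `horizon`.
    """
    # Seed with the sample covariance unless an initial matrix is given (assumption).
    sigma = np.cov(log_returns, rowvar=False) if init is None else init
    for r in log_returns:
        sigma = lam * sigma + (1.0 - lam) * np.outer(r, r)
    return horizon * sigma
```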
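
Dynamic conditional correlation GARCH (DCC-GARCH)
In the DCC model of [Engle, 2002], the conditional variance of each asset follows a univariate GARCH(1,1) process,
$$\sigma^2_{i,t} = \omega_i + \kappa_i \varepsilon^2_{i,t-1} + \lambda_i \sigma^2_{i,t-1},$$
and the conditional correlation is modeled as
$$Q_t = (1 - \alpha - \beta)\bar{Q} + \alpha\, z_{t-1} z_{t-1}^\top + \beta Q_{t-1}, \qquad R_t = \mathrm{diag}(Q_t)^{-1/2}\, Q_t\, \mathrm{diag}(Q_t)^{-1/2},$$
where $z_{i,t} \equiv \varepsilon_{i,t}/\sigma_{i,t}$ are the standardized residuals and $\bar{Q}$ is their unconditional correlation matrix. The covariance forecast is $\Sigma_t = D_t R_t D_t$ with $D_t \equiv \mathrm{diag}(\sigma_{1,t}, \ldots, \sigma_{N,t})$. The parameters are estimated via a two-stage optimization: first, the parameters ω, κ, λ that maximize the log-likelihood of the conditional variance are determined; in the second stage, the values of α and β that maximize the log-likelihood of the conditional correlation are determined, taking into account the results of stage one. We calibrate the model on log returns, so that, by the time aggregation property of log returns, the covariance matrix for weekly and monthly return projections can be obtained as the sum of the iterated one-day-ahead covariance predictions.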
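The Python sketch below shows only the filtering step of a DCC(1,1) model with GARCH(1,1) variances, taking the parameters ω, κ, λ, α, β as already estimated; it does not implement the two-stage maximum-likelihood estimation. Seeding the variance and correlation recursions at their sample values is an assumption, and the function name and parameter values are hypothetical.

```python
import numpy as np


def dcc_garch_forecast(eps, omega, kappa, lam, alpha, beta):
    """One-day-ahead covariance Σ_{T+1} = D R D from a DCC(1,1)-GARCH(1,1) filter.

    eps: (T, N) zero-mean daily residuals, oldest first.
    omega, kappa, lam: GARCH(1,1) parameters (scalars or length-N arrays).
    alpha, beta: DCC parameters. All parameters are assumed already estimated.
    """
    T = eps.shape[0]
    var = eps.var(axis=0)                         # seed σ² at the sample variance (assumption)
    z = np.empty_like(eps)
    for t in range(T):
        z[t] = eps[t] / np.sqrt(var)              # standardised residual on day t
        var = omega + kappa * eps[t] ** 2 + lam * var   # variance for the next day
    q_bar = np.corrcoef(z, rowvar=False)          # unconditional correlation of z
    q = q_bar.copy()                              # seed Q at Q-bar (assumption)
    for t in range(T):
        q = (1.0 - alpha - beta) * q_bar + alpha * np.outer(z[t], z[t]) + beta * q
    inv_sqrt = 1.0 / np.sqrt(np.diag(q))
    corr = q * np.outer(inv_sqrt, inv_sqrt)       # R = diag(Q)^-1/2 Q diag(Q)^-1/2
    vol = np.sqrt(var)
    return corr * np.outer(vol, vol)              # Σ = D R D
```

Because Q is updated as a positive combination of positive semi-definite matrices, the forecast stays symmetric and positive definite.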

Hierarchical risk parity (HRP)
Traditional risk-based portfolios are sensitive to the accuracy of the forecast covariance matrix (see [Ardia et al., 2017], [Zakamulin, 2015]). When assets are highly correlated, there is a greater need for diversification. However, for highly correlated returns the condition number of the covariance matrix, i.e., the ratio between its largest and smallest eigenvalues, is large. Weights calculated from a covariance matrix with a high condition number can have large estimation errors, because their calculation involves inverting the matrix. The benefit of diversification therefore cannot be realized in such cases, due to the large estimation errors in the portfolio weights. [De Prado, 2018] refers to this as Markowitz's curse.
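A small numerical illustration (not taken from the paper) of how correlation inflates the condition number: for an equicorrelated matrix with N = 10 assets and common correlation ρ, the eigenvalues are proportional to 1 + (N − 1)ρ (once) and 1 − ρ (N − 1 times), so the condition number is (1 + 9ρ)/(1 − ρ). The helper name and the 20% volatility are assumptions.

```python
import numpy as np


def equicorrelated_cov(n: int, rho: float, vol: float = 0.2) -> np.ndarray:
    corr = np.full((n, n), rho)
    np.fill_diagonal(corr, 1.0)
    return vol ** 2 * corr


for rho in (0.1, 0.5, 0.9):
    # Condition number = largest / smallest eigenvalue = (1 + 9ρ) / (1 - ρ) for n = 10.
    cond = np.linalg.cond(equicorrelated_cov(10, rho))
    print(f"rho = {rho}: condition number = {cond:.1f}")
```

The condition number rises from about 2.1 at ρ = 0.1 to 11 at ρ = 0.5 and 91 at ρ = 0.9.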
The Hierarchical Risk Parity approach ([Lopez de Prado, 2016]) addresses the problems of traditional risk-based portfolio optimisation by using the covariance matrix without inverting it. In essence, HRP calculates inverse volatility weights for groups of similar assets, and these weights are iteratively scaled down as one moves to ever smaller sub-groups, until each asset forms its own sub-group. The algorithm operates in three stages. The first stage determines the hierarchical relationships between the assets using a recursive cluster formation scheme: groups of similar assets are identified from their correlations and successively merged until a single cluster remains. The second stage quasi-diagonalises the covariance matrix by rearranging its rows and columns based on the information from the first stage. The aim is a more diagonal representation of the covariance matrix, with highly correlated assets placed next to each other and therefore close to the diagonal. Quasi-diagonalisation ensures that similar investments are grouped together and dissimilar ones are kept apart. Finally, weights are distributed using inverse variance allocation between sub-groups obtained by recursively bisecting the rearranged covariance matrix from the second stage. We detail the three stages below.

Clustering
Clustering is a partitioning technique that groups data points based on their characteristics. In HRP, the correlation coefficient is used to measure the similarity between time series, so that assets with similar time series are clustered together. HRP uses agglomerative nesting for clustering: initially, each individual asset forms a separate cluster, and on the basis of their correlations the assets are merged into progressively larger clusters until all similar assets are clustered together. First, a suitable distance metric is defined as
$$d_{i,j} \equiv \sqrt{\tfrac{1}{2}\left(1 - \rho_{i,j}\right)},$$
where $d_{i,j}$ is the correlation-distance index between the i-th and j-th assets and $\rho_{i,j}$ is the corresponding Pearson correlation coefficient. The matrix $D \equiv \{d_{i,j}\}_{i,j=1,\ldots,N}$ defined in this way is a proper distance metric (see [De Prado, 2018] for a proof). Next, the Euclidean distances between any two columns of D are collected in the matrix $\tilde{D}$, whose elements are
$$\tilde{d}_{i,j} \equiv \sqrt{\sum_{n=1}^{N} \left(d_{n,i} - d_{n,j}\right)^2}.$$
Agglomerative clustering starts with every asset representing a single cluster. At each step, the two closest clusters are merged into one. The measure of dissimilarity between clusters is known as the linkage criterion. Three different agglomerative clustering linkage criteria are used in this study.
• Single Linkage: The single linkage (SL) method takes the distance between two clusters to be the minimum distance between any two points in the clusters:
$$d_{SL}(C_i, C_j) \equiv \min_{x \in C_i,\, y \in C_j} \tilde{d}(x, y).$$
This method is simple to implement but is sensitive to outliers and may produce long chained clusters [Raffinot, 2017].
• Average Linkage: In the average linkage (AL) method, the distance is the average of the distances between any two points in the clusters. For clusters $C_i$, $C_j$:
$$d_{AL}(C_i, C_j) \equiv \frac{1}{|C_i|\,|C_j|} \sum_{x \in C_i} \sum_{y \in C_j} \tilde{d}(x, y).$$
• Ward's Method: The most popular method is Ward's method ([Ward Jr, 1963]). The distance between two clusters is the increase in the sum of squared errors when they are merged:
$$d_{W}(C_i, C_j) \equiv \frac{m_i m_j}{m_i + m_j} \left\lVert c_i - c_j \right\rVert^2,$$
where $m_i$, $m_j$ are the cluster sizes and $c_i$, $c_j$ are the centers of clusters $C_i$, $C_j$. The total sum of squared errors starts at zero and grows as clusters merge. Figure 1 gives a schematic of the outcome of agglomerative clustering of the assets.
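The clustering stage can be sketched with SciPy's hierarchical clustering routines. The code builds D from a correlation matrix, computes the condensed form of the column-distance matrix D-tilde with `pdist`, and passes it to `linkage` with one of the three criteria. The helper names and the two-block test correlation matrix are illustrative assumptions. Because the entries of D-tilde are Euclidean distances, SciPy's `ward` method is valid on this input.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist


def correlation_distance(corr: np.ndarray) -> np.ndarray:
    """D with d_ij = sqrt((1 - rho_ij) / 2); clip guards tiny negative round-off."""
    return np.sqrt(np.clip((1.0 - corr) / 2.0, 0.0, None))


def hrp_linkage(corr: np.ndarray, method: str = "single") -> np.ndarray:
    """Agglomerative clustering on Euclidean distances between the columns of D."""
    d = correlation_distance(corr)
    d_tilde = pdist(d.T, metric="euclidean")  # condensed D-tilde
    return linkage(d_tilde, method=method)    # "single", "average" or "ward"


# Two blocks: assets {0, 1, 2} and {3, 4}, high correlation within, low across.
corr = np.full((5, 5), 0.1)
corr[:3, :3] = 0.9
corr[3:, 3:] = 0.9
np.fill_diagonal(corr, 1.0)
for method in ("single", "average", "ward"):
    print(method, leaves_list(hrp_linkage(corr, method)))
```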

Quasi Diagonalisation
This step of the HRP algorithm rearranges the covariance matrix using the information from the clustering stage. It places highly correlated assets adjacent to each other and close to the matrix diagonal, so that similar assets are grouped together. The resulting ordering is the one on which the weight allocation described below operates.
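In SciPy, the leaf order of the dendrogram is returned by `leaves_list`, which should give the same seriation as the recursive quasi-diagonalisation in [Lopez de Prado, 2016], up to the ordering of sibling clusters. The Python sketch below reorders the covariance matrix accordingly; the helper name and the interleaved test matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist


def quasi_diagonalize(cov: np.ndarray, method: str = "single"):
    """Return the dendrogram leaf order and the covariance matrix reordered by it."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    d = np.sqrt(np.clip((1.0 - corr) / 2.0, 0.0, None))      # correlation distance
    order = leaves_list(linkage(pdist(d.T), method=method))  # dendrogram leaf order
    return order, cov[np.ix_(order, order)]                  # permute rows and columns


# Interleaved blocks: {0, 2, 4} and {1, 3} are highly correlated within each block.
corr = np.full((5, 5), 0.1)
for block in ([0, 2, 4], [1, 3]):
    corr[np.ix_(block, block)] = 0.9
np.fill_diagonal(corr, 1.0)
cov = 0.04 * corr
order, cov_qd = quasi_diagonalize(cov)
```

After reordering, the two highly correlated blocks sit next to each other along the diagonal.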

Recursive Bisection
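The weights are allocated between two clusters in inverse proportion to their variances and are scaled down as each cluster is recursively bisected, until a single asset is left in each cluster. The recursive bisection algorithm has the following steps (see [Lopez de Prado, 2016] for details):
1. Initialize a list of assets in the portfolio with $L = \{L_0\}$, with $L_0 = \{n\}_{n=1,\ldots,N}$, ordered as in the quasi-diagonalised matrix.
2. Assign a unit weight to all assets: $w_n = 1$, $n = 1, \ldots, N$.
3. Stop if $|L_i| = 1$ for all $L_i \in L$.
4. For each $L_i \in L$ with $|L_i| > 1$:
   a. bisect $L_i$ into two subsets, $L_i^{(1)} \cup L_i^{(2)} = L_i$, with $|L_i^{(1)}| = \mathrm{int}\left[\tfrac{1}{2}|L_i|\right]$;
   b. compute the variance of each subset $j = 1, 2$ as $\tilde V_i^{(j)} \equiv \tilde w_i^{(j)\top} V_i^{(j)} \tilde w_i^{(j)}$, where $V_i^{(j)}$ is the covariance matrix of the elements of $L_i^{(j)}$ and
   $$\tilde w_i^{(j)} \equiv \frac{\mathrm{diag}\big[V_i^{(j)}\big]^{-1}}{\mathrm{tr}\big(\mathrm{diag}\big[V_i^{(j)}\big]^{-1}\big)},$$
   which are the inverse-variance weights for the elements of the cluster;
   c. compute the split factor $\alpha_i \equiv 1 - \tilde V_i^{(1)} / \big(\tilde V_i^{(1)} + \tilde V_i^{(2)}\big)$;
   d. rescale $w_n$ by $\alpha_i$ for all $n \in L_i^{(1)}$ and by $1 - \alpha_i$ for all $n \in L_i^{(2)}$.
5. Replace each $L_i$ by its two halves and return to step 3.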
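A compact Python sketch of recursive bisection, following the published algorithm of [Lopez de Prado, 2016] with inverse-variance weights inside each cluster. The function names are hypothetical, and `order` is assumed to be the quasi-diagonal ordering produced by the previous stage.

```python
import numpy as np


def cluster_variance(cov: np.ndarray, items: list) -> float:
    """Variance of a cluster held with inverse-variance weights inside it."""
    sub = cov[np.ix_(items, items)]
    ivp = 1.0 / np.diag(sub)
    ivp /= ivp.sum()
    return float(ivp @ sub @ ivp)


def recursive_bisection(cov: np.ndarray, order) -> np.ndarray:
    """HRP weights for assets listed in quasi-diagonal `order`."""
    w = np.ones(cov.shape[0])                  # every asset starts with unit weight
    clusters = [list(order)]                   # one cluster holding all assets
    while clusters:                            # finished once no cluster can be split
        next_clusters = []
        for items in clusters:
            if len(items) < 2:
                continue                       # singletons are final
            half = len(items) // 2             # bisect the ordered list
            left, right = items[:half], items[half:]
            v_left = cluster_variance(cov, left)
            v_right = cluster_variance(cov, right)
            alpha = 1.0 - v_left / (v_left + v_right)   # lower variance gets more weight
            w[left] *= alpha
            w[right] *= 1.0 - alpha
            next_clusters += [left, right]
        clusters = next_clusters
    return w
```

With a diagonal covariance matrix the result coincides with the inverse-variance portfolio: for variances (0.04, 0.04, 0.16, 0.16) the weights are (0.4, 0.4, 0.1, 0.1).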

Data and methodology
The optimal weights of a portfolio depend on the general level of correlation between the assets in the investment universe [Schumann, 2013] and on the specific composition of the investment universe [Bertrand and Lapointe, 2018]. We aim to capture different correlation and composition structures by creating the five universes summarized in Table 1. We use the individual stocks that comprise the NIFTY 50 index of the National Stock Exchange (NSE) of India to create these sub-universes. The first universe includes the top 10 stocks in the financial sector, the second the top 10 stocks by market capitalization (as of Dec 2016), the third and fifth universes contain randomly selected stocks from the NIFTY 50, and the fourth universe contains stocks from the energy sector. The composition of the five universes is listed in Appendix A.
For each dataset, we divide the observations into an estimation period and an evaluation period. For the estimation period we use the daily adjusted closing prices of the individual stocks from Nov 2010 to Dec 2016 (a total of T = 1525 observations). The parameters of the three covariance forecast models described in Section 3 are estimated using the first T inter-day observations. We consider the following three cases for portfolio rebalancing: (a) daily, h = 1, (b) weekly, h = 5, and (c) monthly, h = 20. For daily rebalancing, given the returns up to time t and the model parameters calibrated on the rolling window of the T most recent observations, we forecast the covariance of returns on day t + 1. To obtain the weekly (t + 5) and monthly (t + 20) covariance forecasts, we sum the iterated one-step-ahead covariance predictions using the parameters calibrated at t. Summing daily covariance forecasts to obtain weekly and monthly forecasts is possible when working with log returns, because of their time aggregation property. The h-period covariance forecast of the log returns is then converted to the h-period covariance matrix of linear returns (see Appendix B for a detailed explanation) before calculating the optimal weights. This conversion is essential because only the weighted sum of linear (as opposed to log) asset returns equals the portfolio return.
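Appendix B is not reproduced here. One standard conversion, assumed in the Python sketch below, treats the h-period log returns as jointly normal with mean μ and covariance S, in which case the linear returns R = exp(r) − 1 have covariance Cov(Rᵢ, Rⱼ) = exp(μᵢ + μⱼ + (Sᵢᵢ + Sⱼⱼ)/2)(exp(Sᵢⱼ) − 1). Whether Appendix B uses exactly this form is an assumption, as is setting μ = 0 when no mean forecast is available; the function names are hypothetical.

```python
import numpy as np


def aggregate_log_cov(daily_forecasts) -> np.ndarray:
    """h-day log-return covariance as the sum of h iterated one-day forecasts."""
    return np.sum(np.asarray(daily_forecasts), axis=0)


def log_to_linear_cov(mu_log: np.ndarray, cov_log: np.ndarray) -> np.ndarray:
    """Covariance of linear returns R = exp(r) - 1 when r ~ N(mu_log, cov_log).

    Cov(R_i, R_j) = exp(mu_i + mu_j + (S_ii + S_jj) / 2) * (exp(S_ij) - 1).
    """
    growth = np.exp(mu_log + 0.5 * np.diag(cov_log))   # E[1 + R_i]
    return np.outer(growth, growth) * (np.exp(cov_log) - 1.0)


# Example: five identical daily forecasts aggregated to a weekly horizon, zero mean assumed.
daily = np.array([[1.0e-4, 2.0e-5], [2.0e-5, 4.0e-4]])
weekly_log = aggregate_log_cov([daily] * 5)
weekly_linear = log_to_linear_cov(np.zeros(2), weekly_log)
```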

Intra-day realized covariance estimator
During the evaluation period, we calculate the out-of-sample realized portfolio performance based on the risk measures described in Section 5.2. To compute the realized performance, we use minute-by-minute intra-day prices (about 400 data points per day) collected from NSE for the period from 2nd January 2017 to 31st December 2017. The intra-day returns are constructed by fitting a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) to the available data, in order to obtain prices on an equally spaced time grid of m = 200 points between 9:00 IST and 15:30 IST for all assets. The intra-day return over the i-th interval of day t is then defined as
$$r_{t,i} \equiv \log\left(P_{t,i} / P_{t,i-1}\right),$$
where $P_{t,i}$ is the vector of asset prices at the i-th grid point of day t, and the logarithm and division are taken element-wise. It is reasonable to assume that $E[r_{t,i} \mid \mathcal{F}_{t-1}] \approx 0_N$ and that intra-day returns have no autocorrelation for moderately large values of m (see [Hansen and Lunde, 2005]). The daily (open-to-close) log return is the sum of the intra-day log returns,
$$r_t = \sum_{i=1}^{m} r_{t,i},$$
so the realized daily covariance $\Sigma_t$ can be estimated as
$$\Sigma_t = \mathrm{Var}\Big(\sum_{i=1}^{m} r_{t,i} \,\Big|\, \mathcal{F}_{t-1}\Big) = \sum_{i=1}^{m} \mathrm{Var}\left(r_{t,i} \mid \mathcal{F}_{t-1}\right) \quad (11)$$
$$\approx \sum_{i=1}^{m} E\left[r_{t,i} r_{t,i}^\top \mid \mathcal{F}_{t-1}\right], \quad (12)$$
where the equality in Equation 11 follows from the absence of autocorrelation in the returns, and the approximation in Equation 12 follows from the assumption that the expected value of the intra-day returns is nearly zero. We therefore use $\hat\Sigma^{OC}_t \equiv \sum_{i=1}^{m} r_{t,i} r_{t,i}^\top$ as the estimator of the realized intra-day covariance. Because the NSE is not open 24 hours, the intra-day covariance misses the covariance contribution from the market close until the opening on the next trading day. We follow the approach of [Martens, 2002] and [Koopman and Hol Uspensky, 2002], in which a scaling factor converts intra-day volatility into a measure of volatility for the whole day. The scaling factor for the returns of the i-th stock is computed as
$$c_i \equiv \frac{\sigma^2_{co,i} + \sigma^2_{oc,i}}{\sigma^2_{oc,i}},$$
where $\sigma^2_{co,i}$ is the variance of the close-to-open log returns of the i-th stock and $\sigma^2_{oc,i}$ is the corresponding open-to-close variance of log returns, both measured over the evaluation period. Let $c \equiv (c_1, \ldots, c_N)^\top$; the measure of daily covariance from the intra-day returns is then
$$\hat\Sigma_t \equiv \sum_{i=1}^{m} \tilde r_{t,i} \tilde r_{t,i}^\top,$$
where $\tilde r_{t,i} \equiv \sqrt{c} \circ r_{t,i}$ and $\circ$ denotes the element-wise product. This realized covariance is then used to evaluate the performance of the portfolios with the measures described below.
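The estimator can be sketched in Python as follows. The PCHIP resampling uses SciPy's `PchipInterpolator`; the helper names, array layouts (rows are grid times, columns are assets) and the toy price path used for checking are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def resample_prices(times, prices, grid):
    """PCHIP-interpolate irregularly timed prices (rows = times) onto `grid`."""
    return PchipInterpolator(times, prices, axis=0)(grid)


def martens_scaling(var_close_open, var_open_close):
    """Per-asset scaling factor c_i = (sigma2_co + sigma2_oc) / sigma2_oc."""
    return (np.asarray(var_close_open) + var_open_close) / var_open_close


def realized_covariance(prices_grid, c=None):
    """Sum of outer products of intra-day log returns on the grid.

    prices_grid has shape (m + 1, N), giving m intra-day returns. If scaling
    factors c are given, each asset's returns are multiplied by sqrt(c_i).
    """
    r = np.diff(np.log(prices_grid), axis=0)  # intra-day log returns, shape (m, N)
    if c is not None:
        r = r * np.sqrt(c)                    # element-wise scaling per asset
    return r.T @ r                            # sum over i of r_i r_i'
```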

Portfolio risk measures
To assess the out-of-sample performance of the different portfolio strategies and covariance forecast methods, we use the following risk measures. These risk measures also serve as the loss functions for the superior predictive ability test described in Section 5.3.
1) Portfolio variance: We use the total daily variance of the portfolio as the first performance measure.
The realized variance of the portfolio is given by
$$\sigma^2_{p,t} = \hat w_t^\top \Sigma_t \hat w_t,$$
where $\hat w_t$ is the vector of weights obtained for a particular choice of portfolio allocation technique and covariance forecast $\hat\Sigma_t$, and $\Sigma_t$ is the realized covariance matrix obtained from the intra-day returns data for t = 1, ..., n, as described in Section 5.1. A higher realized portfolio variance indicates worse performance, so we can use the portfolio variance directly as a loss function for the SPA test.
2) Conditional Value-at-Risk (CVaR), also known as expected shortfall, is a risk measure defined as follows (see [Acerbi and Tasche, 2002]). Let X be the profit-loss of a portfolio over a specified time horizon and let α ∈ (0, 1) be a specified probability level. The expected α-shortfall of the portfolio is defined as
$$ES^{(\alpha)}(X) \equiv -\frac{1}{\alpha}\left( E\left[X \mathbf{1}_{\{X \le x^{(\alpha)}\}}\right] - x^{(\alpha)}\left( P\left[X \le x^{(\alpha)}\right] - \alpha \right) \right),$$
where P is the appropriate probability measure and
$$x^{(\alpha)} \equiv q_\alpha(X) = \inf\{x \in \mathbb{R} : P[X \le x] \ge \alpha\}.$$
Note that the corresponding value-at-risk is given by $VaR^{(\alpha)}(X) = -x^{(\alpha)}$. We first compute the out-of-sample realized intra-day returns of the constructed portfolio, $X_i = \hat w_t^\top r_{t,i}$, $i = 1, \ldots, m$, and sort them in increasing order of profit, $X_{1:m} \le \ldots \le X_{m:m}$. We approximate the number of α-tail elements in the sample by $s = [m\alpha] = \max\{v \mid v \le m\alpha,\ v \in \mathbb{N}\}$. The set of worst-case losses corresponding to the parameter α is then represented by the s smallest outcomes $\{X_{1:m}, \ldots, X_{s:m}\}$. The VaR of the portfolio is $-X_{s:m}$, and the expected shortfall can be estimated as
$$\widehat{ES}^{(\alpha)} = -\frac{1}{s} \sum_{i=1}^{s} X_{i:m}.$$
Again, as higher CVaR values indicate worse performance, we use the CVaR values directly as a loss function for the SPA test.
3) Herfindahl index ($H^*$) of percentage risk contributions: The normalized Herfindahl index is an indicator of concentration risk. It takes values between 0 and 1, where 0 signifies a perfectly diversified portfolio, and is calculated as
$$H^* \equiv \frac{\sum_{i=1}^{N} (\%RC_i)^2 - 1/N}{1 - 1/N}.$$
As a greater value of the index reflects greater risk concentration, we use the index directly as one of our loss functions for the SPA test.

4) Diversification ratio (DR): It is computed as defined in Equation 4. To compute the realized DR, we use the portfolio weights computed from the forecast covariance matrix and substitute the realized covariance matrix in the equation. It measures the diversification of the portfolio and, for long-only portfolios, takes values ≥ 1. As a higher diversification ratio indicates better performance, we use −DR as our loss function.
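The four loss functions can be written as short Python functions. The names are hypothetical, α = 0.05 is only a default for illustration (the paper's value of α is not stated in this section), and `intraday_returns` is assumed to be an m × N array of one day's intra-day log returns.

```python
import numpy as np


def portfolio_variance(w, cov_realized):
    return float(w @ cov_realized @ w)


def expected_shortfall(w, intraday_returns, alpha=0.05):
    """Estimator -(1/s) * sum of the s = floor(m * alpha) worst portfolio outcomes."""
    x = np.sort(intraday_returns @ w)          # X_{1:m} <= ... <= X_{m:m}
    s = int(np.floor(len(x) * alpha))
    if s < 1:
        raise ValueError("m * alpha must be at least 1")
    return float(-x[:s].mean())                # the VaR estimate would be -x[s - 1]


def herfindahl_rc(w, cov_realized):
    """Normalised Herfindahl index of the percentage risk contributions."""
    n = len(w)
    rc = w * (cov_realized @ w) / (w @ cov_realized @ w)
    return float((np.sum(rc ** 2) - 1.0 / n) / (1.0 - 1.0 / n))


def neg_diversification_ratio(w, cov_realized):
    sigma = np.sqrt(np.diag(cov_realized))
    return float(-(w @ sigma) / np.sqrt(w @ cov_realized @ w))
```

Each function returns a value where larger means worse, matching the convention required for the SPA losses.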

Test for superior predictive ability
In our study we want to evaluate whether a particular benchmark model is significantly outperformed by other models, while taking into account the large number of models being compared. Let k = 0, ..., l index the models being considered, with k = 0 the chosen benchmark model and k = 1, ..., l the models against which the benchmark is compared. Each model leads to a sequence of daily losses, $L_{k,t}$, t = 1, ..., m, where the losses are the realized portfolio variance, CVaR, $H^*(\%RC)$, and the negative diversification ratio, as described in Section 5.2. The relative performance variables are defined as
$$X_{k,t} \equiv L_{0,t} - L_{k,t}, \quad k = 1, \ldots, l,\ t = 1, \ldots, m.$$
Let $X_t = (X_{1,t}, \ldots, X_{l,t})^\top$ be the vector of relative performances. If $\mu = E(X_t) \in \mathbb{R}^{l \times 1}$ is well defined, our null hypothesis is $H_0: \mu \le 0$, that is, the benchmark model is not inferior to any of the alternative models when the objective is to minimize the expected loss.
The SPA test is based on the test statistic
$$T^{SPA}_m \equiv \max\left[ \max_{k = 1, \ldots, l} \frac{m^{1/2} \bar X_k}{\hat\omega_{kk}},\ 0 \right],$$
where $\bar X_k \equiv m^{-1} \sum_{t=1}^{m} X_{k,t}$ and $\hat\omega^2_{kk}$ is a consistent estimator of $\omega^2_{kk} \equiv \mathrm{var}(m^{1/2} \bar X_k)$; $T^{SPA}_m$ thus represents the largest standardized relative performance. We want to determine whether $T^{SPA}_m$ is too large for $\mu \le 0$ to be plausible. The SPA test does this by estimating the distribution of $T^{SPA}_m$ under the null hypothesis and obtaining its critical value. Under the assumptions that $X_t$ is stationary and has well-defined moments (see [Gonçalves and de Jong, 2003] for the necessary assumptions and [Hansen and Lunde, 2005] for their justification), the distribution of $\sqrt{m}(\bar X - \mu)$ converges to a multivariate normal distribution with mean 0 and covariance $\Omega \equiv \lim_{m \to \infty} E\left[m (\bar X - \mu)(\bar X - \mu)^\top\right]$. This result can be used to determine the distribution of $T^{SPA}_m$; however, since in practice m is not large enough relative to l, the l × l covariance matrix Ω cannot be estimated reliably. One therefore relies on the stationary bootstrap of [Politis and Romano, 1994] to estimate the distribution of $T^{SPA}_m$.

Stationary bootstrap based implementation
We obtain B bootstrap re-samples, $(X^*_{b,1}, \ldots, X^*_{b,m})$, b = 1, ..., B, using the stationary bootstrap of [Politis and Romano, 1994]. The re-samples are used to estimate $\omega^2_{kk}$ and the distribution of $T^{SPA}_m$. First we calculate the sample averages $\bar X^*_b = m^{-1} \sum_{t=1}^{m} X^*_{b,t}$, and then estimate
$$\hat\omega^2_{kk} \equiv \frac{m}{B} \sum_{b=1}^{B} \left(\bar X^*_{b,k} - \bar X_k\right)^2$$
from the bootstrapped re-samples, since the empirical distribution of $m^{1/2} \bar X^*_b$ converges to the true asymptotic distribution of $m^{1/2} \bar X$ (see [Gonçalves and de Jong, 2003]). As we seek the distribution of $T^{SPA}_m$ under the null hypothesis, the bootstrap variables must be recentred about the true value of µ. Since the true value of µ is unknown, we use the three estimates proposed in [Hansen, 2005]:
$$\hat\mu^l_k \equiv \min(\bar X_k, 0), \qquad \hat\mu^c_k \equiv \bar X_k \mathbf{1}_{\{\bar X_k \le -A_{k,m}\}}, \qquad \hat\mu^u_k \equiv 0,$$
where $A_{k,m} \equiv \frac{1}{4} m^{-1/4} \hat\omega_{kk}$ is the correction factor². We then redefine the performance variables for each bootstrapped re-sample as
$$Z^*_{b,k,t} \equiv X^*_{b,k,t} - \bar X_k + \hat\mu_k, \qquad \hat\mu_k \in \{\hat\mu^l_k, \hat\mu^c_k, \hat\mu^u_k\}.$$
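The procedure can be sketched end to end in Python, with the p-value taken as the fraction of bootstrap statistics at or above the observed statistic. The mean block length, the number of re-samples, the random seed, and counting ties with ≥ (so that a benchmark with a statistic of zero is never rejected) are assumptions, since the text does not specify them; the function names are hypothetical. A maintained implementation is also available as the `SPA` class in the `arch` package's `arch.bootstrap` module.

```python
import numpy as np


def stationary_bootstrap_indices(m, mean_block, rng):
    """Politis-Romano stationary bootstrap: random blocks with geometric lengths."""
    idx = np.empty(m, dtype=int)
    idx[0] = rng.integers(m)
    for t in range(1, m):
        if rng.random() < 1.0 / mean_block:
            idx[t] = rng.integers(m)           # start a new block
        else:
            idx[t] = (idx[t - 1] + 1) % m      # continue the block, wrapping around
    return idx


def spa_test(losses, n_boot=1000, mean_block=10, seed=0):
    """SPA statistic and p-values (l, c, u); column 0 of `losses` is the benchmark."""
    rng = np.random.default_rng(seed)
    X = losses[:, [0]] - losses[:, 1:]         # X_{k,t} = L_{0,t} - L_{k,t}
    m = X.shape[0]
    x_bar = X.mean(axis=0)
    boot_means = np.array([X[stationary_bootstrap_indices(m, mean_block, rng)].mean(axis=0)
                           for _ in range(n_boot)])
    omega = np.sqrt(m * np.mean((boot_means - x_bar) ** 2, axis=0))
    t_spa = max(float(np.max(np.sqrt(m) * x_bar / omega)), 0.0)
    a = 0.25 * m ** (-0.25) * omega            # correction factor A_{k,m}
    mu_hat = {"l": np.minimum(x_bar, 0.0),
              "c": np.where(x_bar <= -a, x_bar, 0.0),
              "u": np.zeros_like(x_bar)}
    p_values = {}
    for key, mu in mu_hat.items():
        z_bar = boot_means - x_bar + mu        # recentred bootstrap means
        t_boot = np.maximum(np.max(np.sqrt(m) * z_bar / omega, axis=1), 0.0)
        p_values[key] = float(np.mean(t_boot >= t_spa))
    return t_spa, p_values
```

Since the three recentring choices are ordered, the resulting p-values always satisfy p^l ≤ p^c ≤ p^u.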

Results
The paper pursues two objectives. The first is to determine, for a given portfolio allocation method and rebalancing horizon, whether there is a benchmark covariance forecast method that is not inferior to the other methods considered. The second is to identify, for different risk objectives, allocation methods that are not inferior to the other allocation methods across rebalancing horizons. For the second objective, each allocation method uses the covariance forecast model that emerged as the benchmark in the first study. For both objectives, we examine whether the outcomes are consistent across different rebalancing frequencies of the portfolios.

Superior method for forecasting covariance matrix
The covariance forecast models considered in this study are SMPL, EWMA, and DCC-GARCH, details of which are given in Section 3. The portfolio allocation methods considered and the corresponding loss functions used for the SPA test are reported in Table 2. The out-of-sample loss is calculated from the weights computed using the covariance forecasts of the different models, $\hat w_t$, together with the realized returns and covariances for that period. The results of the SPA test for daily rebalancing, in the form of p-values, are reported in Table 3 (we report only the $p^c$-values, as the $p^l$- and $p^u$-values are not significantly different). The p-values correspond to the null hypothesis that the chosen model is not inferior to any other model. A low p-value (we take a value ≤ 0.05) rejects the null hypothesis, which implies that the chosen model is inferior to at least one of the alternatives. We reach similar conclusions for weekly and monthly rebalancing, as the results (not reported here) are not significantly different from those for daily rebalancing. For daily, weekly, and monthly rebalancing, the performance of HRP is not inferior, in expectation, when DCC-GARCH is used as the covariance forecast model, although with longer rebalancing horizons the other forecasting methods are also not inferior in a few universes. For MDP, the outcomes are not inferior when DCC-GARCH is used to forecast the covariance matrix. MVP and ERC, in most universes, perform best when DCC-GARCH is used as the forecast model, although this conclusion is not as evident as for MDP. IVWP seems more robust to the choice of covariance forecast methodology, although DCC-GARCH still does not deliver inferior performance in most of the universes.

² The correction factor ensures that $\lim_{m\to\infty} P(\hat\mu^c_k = 0 \mid \mu_k = 0) = 1$ and $\lim_{m\to\infty} P(Z^*_{b,k,m} \le 0 \mid \mu_k < 0) = 1$, which is important for consistency, as models with $\mu_k < 0$ do not influence the asymptotic distribution of $T^{SPA}_m$; see [Hansen, 2005].

Benchmark allocation methods for different portfolio performance objectives
From Section 6.1 it is evident that DCC-GARCH can be used as the benchmark model for the traditional risk-based as well as the machine learning based allocation methods, as its performance is not inferior to that of any other covariance forecast model in most of the universes considered. We now determine whether there are benchmark allocation methods whose performance is not inferior to that of the other methods when different risk objectives are used as loss functions. Unless specified otherwise, we use DCC-GARCH to forecast the covariance matrix for all allocation methods. The market-cap weighted portfolio is also included in the study, as it serves as a proxy for passive investment strategies.

Out-of-sample portfolio variance
We first study the out-of-sample daily realized portfolio variance of the different allocation methods. Table 4 reports the corresponding p-values for the null hypothesis that the performance of the portfolio constructed using the benchmark allocation method is not inferior to the performance of the other allocation methods. The results reported in the table are for daily rebalancing of the portfolio. Notably, MVP, although designed to minimize portfolio variance, does not perform well as a benchmark model. We find that at least one of the variants of HRP has a large p-value in every universe. The performance of HRP(SL) cannot be considered inferior to the other models in any universe when DCC-GARCH is used for covariance estimation. When SMPL, an inferior covariance estimation method, is used instead, every allocation method is rejected as a benchmark model in at least one of the universes. IVWP seems the most robust among the traditional risk-based portfolios, and HRP(AL) among the machine learning based portfolios, as neither can be considered inferior to the other models in 3 out of 5 universes. We next study whether different portfolio rebalancing frequencies affect the choice of benchmark model for minimizing the portfolios' out-of-sample daily realized variance.
We also compute these p-values when the portfolio is rebalanced weekly and monthly. Again, DCC-GARCH is used to forecast the covariance matrix for the weekly and monthly horizons, and the realized intra-day asset returns and covariances are used to measure the realized portfolio variance. With weekly rebalancing, among the risk-based portfolios, ERC and IVWP are not inferior to the other allocation methods in three and two out of five universes, respectively. With monthly rebalancing, they are not inferior in one and two out of five universes, respectively. For the machine learning based allocations, the fraction of universes in which a variant of HRP is not inferior decreases as the forecasting and rebalancing horizons lengthen. A summary of the realized annual volatilities of the portfolios constructed using the above allocation methods with daily rebalancing is reported in Table 6. Three observations can be made. First, the market-cap weighted portfolios have the highest volatilities. Second, the volatilities of the risk-based and machine learning based portfolios are in a similar range, although when DCC-GARCH is used, the HRP variants have the lowest portfolio variance in all universes. Finally, an inferior covariance estimation model, in this case SMPL, results in higher volatilities. Even in this case, the HRP variants have the lowest out-of-sample volatility in each of the universes considered, except Universe 2, where ERC has the lowest volatility.

Out-of-sample weekly CVaR
Expected shortfall is a widely used coherent risk measure, especially for computing capital reserves for unforeseen losses. We next look at the out-of-sample realized weekly CVaR of the portfolios constructed using the different allocation methods, computed from their intra-day returns. Table 8 reports the p-values for the null hypothesis that a chosen benchmark model has, in expectation, a realized expected shortfall that is not higher than that of the others. From the reported values it is clear that when MVP and MDP are considered as benchmark models, the null hypothesis is rejected due to low p-values. IVWP and ERC are the only two risk-based portfolios that have large p-values in at least a couple of universes. However, at least one of the variants of HRP has a large p-value in each of the universes. The performance of HRP(SL) is not inferior to the other models considered, except in Universe 5, where HRP(AL) and HRP(Ward) performed better. With an inferior forecast model for the covariance, the only risk based portfolio that does not have an inferior performance is IVWP, which cannot be considered inferior in 3 out of 5 universes. With larger covariance misspecification HRP(SL) does not perform well; however, HRP(Ward) comes out as a benchmark model in 4 out of 5 universes.
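A weekly realized CVaR of the kind compared here can be sketched as below. The historical (empirical) tail-average estimator, the default confidence level of 95%, and grouping by a week label are assumptions, since the text does not state them; the function names are hypothetical.

```python
import numpy as np


def expected_shortfall(returns, alpha=0.95):
    """Historical expected shortfall (CVaR): mean of the worst (1 - alpha) share
    of outcomes, reported as a positive loss."""
    losses = np.sort(-np.asarray(returns, dtype=float))[::-1]     # largest loss first
    k = max(1, int(np.ceil((1.0 - alpha) * losses.size - 1e-9)))  # tail size, guarded against rounding
    return float(losses[:k].mean())


def weekly_cvar(intraday_returns, week_ids, alpha=0.95):
    """One realized CVaR value per week, from that week's intraday portfolio returns."""
    intraday_returns = np.asarray(intraday_returns, dtype=float)
    week_ids = np.asarray(week_ids)
    return np.array([expected_shortfall(intraday_returns[week_ids == w], alpha)
                     for w in np.unique(week_ids)])
```

The resulting weekly series, one per allocation method, is what a loss-based comparison such as the SPA test would consume.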
Table 8: p_c-values for different portfolio allocation benchmark models considered when weekly expected shortfall is taken as the loss function and the portfolio is rebalanced daily. The highlighted cells are outcomes for variants of HRP.
How does the portfolio rebalancing frequency affect the choice of benchmark model for minimizing the portfolio's out-of-sample weekly CVaR? Table 9 reports the p-values for weekly and monthly rebalancing frequencies, with DCC-GARCH used to forecast the covariance matrix. For weekly rebalancing, amongst the risk based portfolios, ERC's performance is not inferior to the other models in four out of five universes, while IVWP's performance is not inferior in three universes. HRP variants are not inferior in 4 out of 5 universes, with HRP(SL) still performing better (in terms of the fraction of universes in which it is not inferior) than HRP(AL) and HRP(Ward). With monthly rebalancing, both ERC and IVWP come out as a not-inferior choice in 3 out of 5 universes, while the null hypothesis with a variant of HRP as the benchmark model is not rejected in just 2 out of 5 universes.

Out-of-sample Herfindahl index and diversification ratio
While lower portfolio variance and expected shortfall can be seen as outcomes of better diversification, we next study the extent of out-of-sample diversification directly, using the Herfindahl index of the realized percentage risk contributions and the realized diversification ratio of the portfolio. Table 10 reports the p-values corresponding to different choices of the portfolio allocation method as the benchmark model, with daily rebalancing. Clearly MDP, although designed with the objective of maximizing portfolio diversification, does not perform well as a benchmark model, with the null hypothesis being rejected in all the universes. Only ERC and IVWP have large p-values, in 3 and 2 out of 5 universes, respectively. With weekly rebalancing (see Table 11) this becomes 4 and 3 out of 5 universes, respectively, while with monthly rebalancing it is again 3 and 2 out of 5 universes, respectively. IVWP can be considered as the benchmark model when the objective is to minimize the Herfindahl index computed from the realized percentage risk contributions of the underlying assets. In 2 out of 5 universes, ERC is not inferior to the others when the loss function is taken as H*(%RC). The conclusions for the choice of benchmark model with the objective of minimizing the Herfindahl index remain the same when different rebalancing horizons are considered, as reported in Table 11.

Table 9: p_c-values of different benchmark portfolios based on the out-of-sample CVaR for different rebalancing horizons. The covariance forecast model is taken as DCC-GARCH.
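The two diversification measures above can be sketched as follows, assuming the realized covariance matrix (for instance estimated from intraday returns) and the weights held over the period are available. Reading H* as the normalized Herfindahl index, which maps equal risk contributions to 0 and full concentration in one asset to 1, is our assumption; the function names are hypothetical.

```python
import numpy as np


def pct_risk_contributions(w, cov):
    """Percentage risk contribution of each asset, w_i (Σw)_i / (w'Σw); sums to one."""
    w = np.asarray(w, dtype=float)
    marginal = np.asarray(cov, dtype=float) @ w
    return w * marginal / (w @ marginal)


def herfindahl(pct_rc, normalized=True):
    """Herfindahl index of the risk contributions (optionally normalized to [0, 1])."""
    pct_rc = np.asarray(pct_rc, dtype=float)
    h = float(np.sum(pct_rc ** 2))
    n = pct_rc.size
    return (h - 1.0 / n) / (1.0 - 1.0 / n) if normalized else h


def diversification_ratio(w, cov):
    """Weighted average of asset volatilities divided by portfolio volatility."""
    w = np.asarray(w, dtype=float)
    cov = np.asarray(cov, dtype=float)
    vols = np.sqrt(np.diag(cov))
    return float(w @ vols / np.sqrt(w @ cov @ w))
```

For two uncorrelated assets with volatilities 20% and 10%, inverse-volatility weights (1/3, 2/3) give equal risk contributions, hence a normalized Herfindahl index of 0 and a diversification ratio of √2.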

Out-of-sample Sharpe ratios
So far we have tried to identify benchmark models that perform not inferior to the other models with respect to purely risk driven objectives. We now bring realized portfolio returns into our analysis by comparing the performance of the different models when the objective is to maximize the Sharpe ratio. As the loss function for our SPA test we take the negative of the realized weekly Sharpe ratio, computed using the intra-day returns. When DCC-GARCH is used for forecasting the covariance matrix, there are candidates from both the risk-based and the machine learning based portfolios whose performance cannot be considered inferior to that of the other models in most of the universes. The minimum variance portfolio can be considered as a benchmark model in 4 out of 5 universes. Market-cap weighted portfolios are also not inferior to the other methods in 4 out of 5 universes. Amongst the ML based portfolios, HRP(SL) can be considered as a benchmark model in almost all the universes for this case. When SMPL is used for forecasting the covariance matrix, the performance of both the traditional risk based portfolios and the machine learning based portfolios is significantly impacted, as is clear from the corresponding p-values. However, the market-capitalization weighted portfolio, which does not require a covariance forecast, can be considered not inferior to the other models in all the universes.
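The Sharpe-ratio loss can be sketched as below. We assume that each week's Sharpe ratio is the mean divided by the sample standard deviation of that week's intraday portfolio returns, with an optional per-period risk-free rate and no annualization; the text does not specify these details, and the function name is hypothetical. A week with a single observation would give an undefined (NaN) value.

```python
import numpy as np


def weekly_sharpe_losses(intraday_returns, week_ids, rf_per_period=0.0):
    """Negative weekly Sharpe ratios (a loss: lower is better), one per week."""
    r = np.asarray(intraday_returns, dtype=float)
    weeks = np.asarray(week_ids)
    losses = []
    for w in np.unique(weeks):
        excess = r[weeks == w] - rf_per_period       # excess intraday returns in week w
        losses.append(-excess.mean() / excess.std(ddof=1))
    return np.array(losses)
```

One such loss series per allocation method would form the inputs to the SPA comparison, in the same way as the variance and CVaR losses.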
The impact of the rebalancing frequency on the choice of benchmark model, whose weekly out-of-sample Sharpe ratios are not lower than those of the other models, is reported in Table 13. One observation is that with longer rebalancing horizons the relative performance of the machine learning based portfolios becomes inferior. While HRP(SL) could be considered not inferior in almost all the universes when the portfolio was rebalanced daily, with weekly and monthly rebalancing this reduces to just two universes. The relative performance of the minimum variance portfolio, on the other hand, improves, with p-values close to one in most of the universes. Another interesting observation is that in this case the relative performance of the market-cap weighted portfolio also deteriorates with longer rebalancing horizons. It should be noted that we comment only on the relative performance of the models within each rebalancing horizon considered. It should not be inferred, for instance, that HRP gives the highest Sharpe ratio when the portfolio is rebalanced daily; rather, if weekly rebalancing has been chosen, MVP would in expectation provide a Sharpe ratio not inferior to that of HRP.
For the sake of completeness, we provide a summary of the realized annual Sharpe ratios of the different portfolio strategies for the year 2017. Table 14 provides the realized Sharpe ratios when DCC-GARCH and SMPL, respectively, are used for the covariance forecast, with the portfolio rebalanced daily. Amongst the traditional risk based portfolios, the realized Sharpe ratios of MVP and MDP are significantly affected by covariance misspecification. The results for IVWP and ERC appear more robust in the presence of covariance misspecification, a result consistent with the findings of [Ardia et al., 2017]. The machine learning based portfolios, as expected from the previous experiments, perform better with DCC-GARCH. However, an inferior covariance estimator does not affect their outcomes as strongly as it does those of MVP. With daily rebalancing, the market-cap weighted portfolio is outperformed in the majority of the universes only by IVWP.
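Why MVP and MDP react more strongly to covariance misspecification than IVWP can be seen from how their weights depend on the covariance matrix. The sketch below uses unconstrained closed-form versions (an assumption: the portfolios in this study may carry long-only or other constraints, and the function names are hypothetical). IVWP uses only the diagonal, whereas MVP and MDP solve a linear system in the full matrix, so errors in the estimated correlations feed directly into their weights.

```python
import numpy as np


def ivwp(cov):
    """Inverse-volatility weights: only the diagonal of the covariance is used."""
    inv_vol = 1.0 / np.sqrt(np.diag(cov))
    return inv_vol / inv_vol.sum()


def mvp_unconstrained(cov):
    """Global minimum-variance weights Σ^{-1}1 / (1'Σ^{-1}1), short sales allowed."""
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return x / x.sum()


def mdp_unconstrained(cov):
    """Maximum-diversification weights proportional to Σ^{-1}σ, short sales allowed."""
    x = np.linalg.solve(cov, np.sqrt(np.diag(cov)))
    return x / x.sum()


if __name__ == "__main__":
    # Same volatilities, slightly different correlation estimates.
    vols = np.array([0.20, 0.22, 0.25])
    for rho in (0.85, 0.90):
        corr = np.full((3, 3), rho)
        np.fill_diagonal(corr, 1.0)
        cov = corr * np.outer(vols, vols)
        print(rho, ivwp(cov).round(3), mvp_unconstrained(cov).round(3),
              mdp_unconstrained(cov).round(3))
```

Running the example shows the IVWP weights unchanged across the two correlation estimates, while the MVP and MDP weights shift.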
The realized annual Sharpe ratios for different rebalancing horizons, when DCC-GARCH is used for the covariance forecast, are reported in Table 15. The Sharpe ratios improve for most of the allocation methods when moving from daily to monthly rebalancing, except for the market-capitalization weighted portfolio. The most significant improvement in the Sharpe ratios is for MVP, followed by the three variants of HRP. Overall, with longer rebalancing horizons, MVP performs the best, while the performances of the HRP variants, IVWP and ERC are similar for the dataset we consider. For our dataset, MDP and MWP perform comparatively poorly (in that order) when the objective is to maximize the Sharpe ratio.

Table 13: p_c-values for different choices of benchmark models when the loss function considered is the negative of the weekly Sharpe ratio and different rebalancing horizons are considered. The covariance matrix forecast is made using DCC-GARCH.
If an inferior covariance forecast method is used, the above inference can change significantly for longer rebalancing horizons. This is illustrated in Table 16, which reports the annual Sharpe ratios of portfolios rebalanced monthly using the covariance forecasts obtained from SMPL. For the dataset we consider, IVWP, which is more robust to covariance misspecification, performs the best in most of the universes. The other allocation methods that are not significantly affected are the variants of HRP and ERC. In this dataset, MVP and MDP appear to be the most significantly affected by inferior covariance forecasts.

Table 16: The annual Sharpe ratio for the different allocation methods when the portfolio is rebalanced monthly and SMPL is used to forecast the covariance matrix.

Conclusions
We have compared here the out-of-sample performance of portfolios constructed using traditional risk-based allocation methods with those constructed using machine learning methods. As the forecasted covariance matrix plays an important role in risk based allocation methods, we first determine whether there are benchmark covariance forecasting methods that lead to a superior performance of a particular portfolio allocation strategy. In line with the results of [Zakamulin, 2015] and [Ardia et al., 2017], we find, through the SPA test of [Hansen, 2005], that the minimum variance and maximum diversification schemes are highly sensitive to covariance misspecification and perform the best when DCC-GARCH is used as the forecast method. IVWP appears more robust towards covariance misspecification, while its performance is also not inferior when DCC-GARCH is used as the forecast model. The machine learning based HRP builds upon IVWP by supplementing the weights obtained using IVWP with the relatedness information of the assets obtained from their correlation matrix. Unlike in MVP and MDP, this relatedness information is obtained without inverting the correlation matrix, through a hierarchical clustering approach instead. Therefore, ideally HRP should inherit the robustness of IVWP and lead to better diversification than IVWP, as it incorporates the correlation information. We do observe that when the objective is to minimize the portfolio variance, the variants of HRP perform better than the other models. When an inferior covariance forecast model is used, IVWP performs better than HRP in most of the universes in our dataset. However, we observe that a poor covariance forecast does not impact the performance of HRP to the same degree as it does that of MVP or MDP. The observation is similar when the out-of-sample realized weekly expected shortfall is considered as the performance objective of the portfolio.
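The mechanism described above can be sketched in NumPy following the three steps of [Lopez de Prado, 2016]: clustering on the correlation distance √(½(1 − ρ)), quasi-diagonalization, and recursive bisection. This is a simplified illustration, not the authors' code: a naive agglomerative clustering with single or average linkage stands in for a library routine (Ward linkage is not implemented), the distance is used directly without a second distance-of-distances step, and within-cluster variances use inverse-variance weights as in the original HRP proposal. All function names are hypothetical.

```python
import numpy as np


def _seriation(dist, method="single"):
    """Naive agglomerative clustering; returns the leaf order of the dendrogram."""
    if method not in ("single", "average"):
        raise ValueError("only 'single' and 'average' linkage are sketched here")
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                block = dist[np.ix_(clusters[a], clusters[b])]
                link = block.min() if method == "single" else block.mean()
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]        # quasi-diagonal order: leaves of a, then b
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]


def _cluster_variance(cov, items):
    """Variance of a cluster held with inverse-variance weights."""
    sub = cov[np.ix_(items, items)]
    ivp = 1.0 / np.diag(sub)
    ivp /= ivp.sum()
    return ivp @ sub @ ivp


def hrp_weights(cov, method="single"):
    cov = np.asarray(cov, dtype=float)
    vols = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vols, vols)
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))   # correlation distance
    order = _seriation(dist, method)
    w = np.ones(cov.shape[0])
    stack = [order]
    while stack:                                  # recursive bisection of the ordered list
        items = stack.pop()
        if len(items) < 2:
            continue
        half = len(items) // 2
        left, right = items[:half], items[half:]
        v_left, v_right = _cluster_variance(cov, left), _cluster_variance(cov, right)
        alpha = 1.0 - v_left / (v_left + v_right)  # lower-variance half gets more weight
        w[left] *= alpha
        w[right] *= 1.0 - alpha
        stack += [left, right]
    return w
```

No matrix inversion appears anywhere: the covariance enters only through cluster variances, which is the property the robustness argument above relies on. For two uncorrelated assets the sketch reduces to inverse-variance weighting.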
With longer rebalancing horizons, the more robust IVWP gives a superior performance in most of the universes when it comes to minimizing the portfolio variance and expected shortfall.
MVP does shine when it comes to the out-of-sample Sharpe ratios. With a good covariance forecast estimator, its realized weekly Sharpe ratio is, in expectation, better than that of any other model. When the portfolio is rebalanced daily, however, there are other allocation methods, including HRP, whose performance cannot be considered inferior with respect to the realized Sharpe ratio. When the portfolio is rebalanced less frequently (weekly and monthly), MVP remains the sole allocation method with superior performance in most universes. However, with a poor choice of covariance forecast model, the performance of IVWP with respect to the realized Sharpe ratio is not inferior in most of the universes and for different rebalancing horizons.
The market-cap weighted portfolio can be a safe choice when the performance objective is the realized weekly Sharpe ratio, especially if only an inferior model for the covariance forecast is available. However, from our dataset it appears that market-capitalization weighted portfolios should be rebalanced frequently. A clear disadvantage of the market-cap weighted portfolio is its inferior performance on purely risk-driven objectives, such as achieving lower expected shortfall values.

Table 17: Ticker name of the constituents of each of the universes constructed for our study

Universe   Ticker name
1          AXISBANK, BANKBARODA, HDFCBANK, ICICIBANK, INDUSINDBK, KOTAKBANK, SBIN, TCS, YESBANK, HDFC
2          AXISBANK, HDFCBANK, HDFC, ICICIBANK, INFY, ITC, KOTAKBANK, LT, RELIANCE, TCS
3          BHARTIARTL, CIPLA, GAIL, ITC, MARUTI, NTPC, POWERGRID, TATAMOTORS, TATASTEEL, SUNPHARMA
4          BHEL, BPCL, GAIL, NTPC, ONGC, POWERGRID, RELIANCE, TATAPOWER, COALINDIA, HINDALCO
5          YESBANK, RELIANCE, ICICIBANK, IDEA, INFY, NTPC, CIPLA, HDFCBANK, WIPRO, ZEEL

B Converting covariance matrix of log returns to linear returns
The covariance matrix of log returns can be converted approximately to the covariance matrix of linear returns following the approach of [Meucci, 2001]. We denote the logarithmic and linear returns for asset $i$ by

$$
r^i_{t,\tau} = \ln\frac{P^i_t}{P^i_{t-\tau}}, \qquad R^i_{t,\tau} = \frac{P^i_t}{P^i_{t-\tau}} - 1,
$$

respectively, where $\tau$ is the time horizon and $P^i_t$ is the price at time $t$. Now, let $M_\tau$ be the vector of expected values of the linear returns of the $N$ assets, and $S_\tau$ the corresponding covariance matrix that we wish to determine, i.e., $M_\tau = \mathbb{E}(R_{t,\tau})$, $S_\tau = \operatorname{cov}(R_{t,\tau})$.
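Using the relation $R_{t,\tau} = e^{r_{t,\tau}} - 1$, and under the assumption that returns are log-normally distributed (i.e., log returns are jointly normal), [Meucci, 2001] shows that for assets with index $i$ and $j$

$$
\mathbb{E}(R^i_{t,\tau}) = \int R^i_{t,\tau}(r)\,\phi(r)\,dr = \int \left(e^{r^i_{t,\tau}} - 1\right)\phi(r)\,dr,
$$

$$
\begin{aligned}
\mathbb{E}(R^i_{t,\tau} R^j_{t,\tau}) &= \int R^i_{t,\tau}(r)\,R^j_{t,\tau}(r)\,\phi(r)\,dr = \int \left(e^{r^i_{t,\tau}} - 1\right)\left(e^{r^j_{t,\tau}} - 1\right)\phi(r)\,dr \\
&= \int e^{r^i_{t,\tau} + r^j_{t,\tau}}\,\phi(r)\,dr - \int e^{r^i_{t,\tau}}\,\phi(r)\,dr - \int e^{r^j_{t,\tau}}\,\phi(r)\,dr + 1,
\end{aligned}
\tag{15}
$$

where $\phi(r)$ is the joint (multivariate normal) probability density function of the log returns $r_{t,\tau}$. Let the covariance matrix of the logarithmic returns be $\Sigma_\tau$ and their expected value be $\mu_\tau = \mathbb{E}(r_{t,\tau})$, so that $\phi$ is the $\mathcal{N}(\mu_\tau, \Sigma_\tau)$ density. Then, under the above assumptions, the expected linear return of asset $i$ is

$$
M^i_\tau = e^{\mu^i_\tau + \frac{1}{2}\Sigma^{ii}_\tau} - 1,
$$

and the covariance between assets $i$ and $j$ is

$$
S^{ij}_\tau = \mathbb{E}(R^i_{t,\tau} R^j_{t,\tau}) - M^i_\tau M^j_\tau = e^{\mu^i_\tau + \mu^j_\tau + \frac{1}{2}\left(\Sigma^{ii}_\tau + \Sigma^{jj}_\tau\right)}\left(e^{\Sigma^{ij}_\tau} - 1\right).
$$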
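The conversion above can be coded directly. The following is a minimal NumPy sketch, assuming the log-return mean vector and covariance matrix for the horizon τ are already available (the function name is hypothetical). Note that $e^{\Sigma^{ij}_\tau}$ is an element-wise exponential, not a matrix exponential.

```python
import numpy as np


def log_to_linear_moments(mu, sigma):
    """Mean vector M and covariance S of linear returns, given the mean mu and
    covariance sigma of jointly normal log returns over the same horizon."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    var = np.diag(sigma)
    m = np.exp(mu + 0.5 * var) - 1.0                                    # M^i
    scale = np.exp(mu[:, None] + mu[None, :] + 0.5 * (var[:, None] + var[None, :]))
    s = scale * (np.exp(sigma) - 1.0)                                   # S^{ij}, element-wise exp
    return m, s
```

A quick sanity check is to simulate normal log returns with the same µ and Σ, map them to linear returns via $e^{r} - 1$, and compare the sample mean and covariance with `m` and `s`; on the diagonal the formula reduces to the familiar log-normal variance $e^{2\mu^i + \Sigma^{ii}}(e^{\Sigma^{ii}} - 1)$.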