Improved Large Covariance Matrix Estimation Based on Efﬁcient Convex Combination and Its Application in Portfolio Optimization

: The estimation of the covariance matrix is an important topic in the ﬁeld of multivariate statistical analysis. In this paper, we propose a new estimator, which is a convex combination of the linear shrinkage estimation and the rotation-invariant estimator under the Frobenius norm. We ﬁrst obtain the optimal parameters by using grid search and cross-validation, and then, we use these optimal parameters to demonstrate the effectiveness and robustness of the proposed estimation in the numerical simulations. Finally, in empirical research, we apply the covariance matrix estimation to the portfolio optimization. Compared to the existing estimators, we show that the proposed estimator has better performance and lower out-of-sample risk in portfolio optimization.


Introduction
With the development of information technology, the covariance matrix estimation plays a crucial role in multivariate statistics analysis, and it is used widely in many fields, such as finance, wireless communications, biology, chemometrics, social networks, health sciences, etc. [1][2][3][4][5]. In particular, due to the high noise of the sample covariance matrix, the properties of financial data are not characterized by multivariate normality and stationarity [6]. As an essential input to many financial models, it is vital to remove the sample noise to improve the estimation accuracy of the covariance matrix in asset allocation and risk management [4,7,8]. It is known that the sample covariance matrix is no longer a good estimator of the population covariance matrix when the dimension of the matrix is close to or larger than the sample size. In fact, the sample covariance matrix becomes a singular matrix in high-dimensional data. The so-called "high dimensions" mainly include large orders of 30 in magnitude and high data dimensions [1][2][3]. So far, some popular ways used to obtain a good estimator are the shrinkage estimation methods without prior information, sparse estimation methods with prior information, the factor model [9,10], the rank model [11], etc.
The shrinkage method, proposed by Stein [12], is one without prior information for estimating the covariance matrix. The essential idea of this method is to pull extreme eigenvalues of the sample covariance matrix toward the mean of the eigenvalues by shrinking the eigenvalues when the dimension of the matrix is close to the sample size. Ledoit and Wolf showed that the shrinkage estimation methods have an improvement over the sample covariance matrix. Specifically, they proposed the shrinkage estimation methods provide good solutions to deal with the overfitting of the sample covariance matrix [8,13,14].
Since the linear shrinkage method is the first-order approximation to a nonlinear problem, as the dimension of the matrix becomes high, it is no longer suitable for the improvement of the sample covariance matrix. Thus, they proposed the nonlinear shrinkage method [15], which has better performance for high-dimensional asymptotics. Recently, Ledoit and Wolf proposed optimal nonlinear shrinkage estimators [16], which are decisiontheoretically optimal within a class of nonlinear shrinkage estimators. For more details on the shrinkage methods, refer to [2,3,17,18].
The sparse estimation with prior information is another one for estimating the covariance matrix, which estimates the sparse matrix directly and its inverse indirectly. In the case of direct estimation, Bickel and Levina [19] showed that the estimation can be obtained by the threshold methods under the hypothesis of the sparseness of the true covariance matrix. In the case of no assumption of the sparse pattern, Rothman et al. [20] proposed a new class of generalized threshold estimators to obtain the sparse estimation by inducing sparsity and imposing the norm penalty. Theoretically, these methods are shown to be superior, and the generalized thresholding estimators are consistent with a large class of approximate sparse covariance matrices. In fact, the resulting estimators are not always positive-definite. In order to guarantee the positive definiteness of the covariance matrix estimation, Rothman et al. [21] built a convex optimization model based on the quadratic loss function under the Frobenius norm (F-norm) and studied the estimation of the high-dimensional covariance matrix. Subsequently, some convex optimization models with penalty functions such as the L 1 function were proposed [22,23], and some nonconvex penalty functions were used to achieve both sparsity and positive definiteness [24,25]. However, since the changes of the variance and covariance over time are not considered, they are affected by dimensional disasters and large noise problems. For more details about optimization algorithms and inverse matrix estimation methods, refer to the literature [7,[26][27][28][29][30][31] and the references therein.
In addition, Bun et al. [32] introduced the rotation-invariant estimation in which they assumed that the estimator of the population correlation matrix shares the same eigenvectors as the sample covariance matrix itself. The experiments' results demonstrated that the rotation-invariant estimator is more suitable for dealing with large dimension datasets than the eigenvalue clipping methods and can be significantly improved over the sample covariance matrix as the data size grows, but it did not perform well on a small sample data. In a recent paper [33], Deshmukh et al. combined the shrinkage transformation with the eigenvalue clipping to obtain the estimator of the covariance matrix for the convex combination of the optimal parameters. This estimator can achieve less out-of-sample risk in portfolio optimization for small datasets.
The research in this paper was mainly motivated by [32,33], and the novelties of this study are as follows:

1.
A new large covariance matrix estimator is proposed by constructing a convex combination of the linear shrinkage estimation and the rotation-invariant estimator under the Frobenius norm.

2.
Our new covariance matrix estimator improves the impact of the sample noise on the covariance matrix by adjusting the parameters of the convex combination in financial data.

3.
The proposed estimator has better performance and lower out-of-sample risk in portfolio optimization.
The rest of this paper is organized as follows: Section 2 describes the related work of covariance matrix estimation. Section 3 introduces our new proposed estimator and its application. Section 4 implements the numerical simulation and empirical research. Section 5 gives the conclusions.

The Rotation-Invariant Estimator
First, we briefly introduce the basic idea of the rotation-invariant estimator. For more details, we refer to [32].
Let r = (r 1 , r 2 , ..., r N ) denote a T × N matrix of T independent and identically distributed (iid) observations on a system of N random variables with mean vector µ. N and T denote the number of variable and the size of the variable, respectively. In this case, the sample covariance matrix is given by Let N and T be asymptotic in the high-dimensional regime, i.e., In addition, the concentration ratio is given by The construction steps of the rotation-invariant estimator are as follows: • Step 1: Calculate the Stieltjes transform of the empirical spectral measure of S 1 from where z denotes the spectral parameter and The function (4) contains all the information about the eigenvalues of the matrix S 1 , which has the same nonzero eigenvalues as Σ SCM . • Step 2: Update (4) based on the nonzero eigenvalues of Y Y and YY , i.e., where y i = r i − µ, Y = (y 1 , y 2 , ..., y N ), and λ i denotes the ith eigenvalue of the sample covariance matrix Σ SCM . • Step 3: Calculate the function δ i of the ith eigenvalue of S from where s(·) is the empirical Stieltjes transform from (6) and parameter η = T − 1 2 .

•
Step 4: Output the resulting covariance matrix estimator Σ RIE from where U N is an orthogonal matrix, whose columns [u 1 , u 2 , ..., u N ] are the corresponding eigenvectors, with the eigenvalue of the rotation-invariant estimator defined by One can easily verify that This implies that the estimator Σ RIE has the same trace as the sample covariance matrix. More literature reviews on rotation-invariant estimators are presented in the Table 1.

Author
Brief Introduction Ref.

Ledoit, O., Wolf, M.
Under the assumption of the large dimension asymptotic, Ledoit and Wolf kept the eigenvectors of the sample covariance matrix and shrunk the inverse sample eigenvalues to construct a rotation-invariant estimator of the large covariance matrix. [34] Donoho et al. Based on spiked covariance and the rotationinvariant estimator, Donoho et al. demonstrated that the optimal estimation of the population covariance matrix is related to the best shrinker, which acts as an element of the sample eigenvalues. [36] Debashis Paul, Alexander Aue Debashis Paul and Alexander Aue summarized the random matrix theory (RMT) and described how the development of highdimensional statistical inference theory and practice is affected by the corresponding development in the RMT field. [37]

Improved Covariance Estimator Based on Eigenvalue Clipping
Deshmukh et al. [33] introduced an improved estimation based on eigenvalue clipping, which takes the optimal parameters in the convex combination of the sample covariance matrix Σ SCM , the shrinkage target Σ F with and the matrix Σ MP obtained by applying eigenvalue clipping. Let y i = r i − r i be independent, identically distributed, random variables with finite variance σ. The Marchenko-Pastur density ρ Σ SCM (λ) of the eigenvalues of Σ SCM is defined by where n(λ) is the number of eigenvalues of the sample covariance matrix Σ SCM less than λ. In the condition of the limit N → ∞, T → ∞, and 1 c ≥ 1, the density follows from (14): where [λ min , λ max ] represents the MP law bounds. In this case, the covariance matrix can be cleaned by scaling the eigenvectors of Σ SCM with these new eigenvalues. Σ MP is obtained by this method. Let Σ be the population covariance matrix; the optimal parameters in convex combination can be found from the following optimization problem [33].
where θ and φ are the parameters of the convex combination. Usually, the eigenvectors of the sample covariance matrix deviate from those of the population covariance matrix under large-dimensional asymptotics. Correcting the deviation of the eigenvalues of the sample covariance matrix can improve the performance of the large covariance matrix. Although the estimation can adapt to changing the sampling noise conditions by performing parameter optimization, the performance of the estimation outperforms other estimations only for small-dimensional problems.

Proposed Estimator
For further improve the performance of the large covariance matrix, we replaced the eigenvalues falling inside Marchenko-Pastur (MP) law bounds with the rotation-invariant estimator Σ RIE and applied the linear shrinkage estimation to shrink the eigenvalues falling outside the MP law bounds in this paper. Our new estimation is presented below.
Thus, the estimation of the covariance matrix is given by where θ * and φ * are the optimal parameters of the optimization problem given by (19) and (20). It is well known that the financial data are heavy-tailed and non-normal [7,33]. However, the existing covariance matrix estimation methods generally requires the assumption of normality [32,38]. For overcoming this drawback, we propose a new estimator, which is a convex combination of the linear shrinkage estimation and the rotation-invariant estimator under the Frobenius norm. One advantage of the new estimation is that we can remove the noise caused by the bulk eigenvalues and the extreme eigenvalues in the financial data. Furthermore, we set five-fold cross-validation k = 5 to implement the simulation and empirical research for improving the accurate estimation of the covariance matrix.

Minimum Variance Portfolio Optimization
According to Markowitz's theory [39], we included an additional return constraint in the portfolio because even a risk-averse investor would expect a minimal positive return. The classic portfolio optimization model that satisfies the minimum expected return is defined by s.t.
where x, r, and r min represent the weight of portfolio optimization, daily return, and the minimum daily expected return, respectively. It is well known that the portfolio selection is widely used in the financial field, which is a convex quadratic programming problem [39].
In the portfolio optimization, the weight of each asset is closely related to the covariance matrix. An accurate covariance matrix can achieve a more reasonable weight distribution and better portfolio effect. Due to the heavy-tailed nature of financial data and the availability of limited samples [8], many studies started concentrating on the global minimum variance (GMV) portfolio. To improve the performance of the sample covariance matrix in the portfolio optimization, DeMiguel [40] added the additional constraint and regularizing asset weight vector into the minimum variance portfolio and showed that the estimator always leads the constructed portfolio to achieve a smaller variance and a higher Sharpe ratio than other portfolios. Furthermore, Ledoit and Wolf [18] applied the estimation to the portfolio optimization to overcome the dimension and noise problems of a high-dimensional covariance matrix, and the results were better than the linear shrinkage estimation [13]. Moreover, due to the influence of financial market information on covariance matrix estimation, the time-varying covariance matrix or the correlation matrix also have practical significance in portfolio optimization. For more details, refer to [41] and the references therein.
In this paper, we divided the sample data into in-sample data and out-of-sample data, which were used for the estimation and prediction of the covariance matrix, respectively. To measure the out-of-sample performance of the estimation of the covariance matrix in portfolio optimization, we used the out-of-sample risk, the average return, and the Sharpe ratio as the criteria of the measurement. The average return was annualized by multiplying it by 252 (252 trading days per year), and the standard deviation was annualized by multiplying it by √ 252. The out-of-sample performance of the portfolio model was evaluated through the following procedure.

•
Step 0: Input the returns of the current in-sample r in and out-of-sample data r out , the expected return r min , and the estimation of the covariance matrix Σ * . • Step 1: Set r := r in , and solve the optimal weight vector x * from Model (22) by the quadratic optimizer called quadprog in Matlab. • Step 2: Calculate the out-of-sample Σ * from (21), and obtain the out-of-sample standard deviation: the average return: and the Sharpe Ratio: where x * and r f represent the optimal weight vector and the risk-free interest, respectively.

Numerical Simulation
In the simulation, we used the simulation data of Engle et al. [41], and the mean return ranged between −0.0031 and 0.0036. We divided the dataset into in-sample data and out-of-sample data, and both the sample sizes were T = 500. In pursuit of accuracy, we implemented the five-fold cross-validation, and the parameter selection criterion was the F-norm of the estimator and the population covariance matrix. We set three dimensions for the return series, which were N = 100, 200, and 400, respectively. The maximum concentration ratio is To measure the performance of the estimators, we compared the error between each estimator and the population covariance matrix. The six estimators are shown in Table 2.

The Formulation of Estimation
Ref. [15,18,43] In the five-fold cross-validation, we obtained the error between the proposed estimation and the population covariance matrix for the different parameters θ and φ under three asset dimensions. In Figures 1-3, the horizontal and longitudinal axis represent the different values of θ and φ, respectively, and the vertical axis represents the sum of the error. It is obvious that there are two optimal parameters to minimizing the error between the proposed estimator and the population covariance matrix for all θ and φ. The results are shown in Table 3. To some degree, this ensures the effectiveness of the proposed estimation. Table 4 shows that the F-norm error of our new estimation is the smallest in the ones of the six estimations. Under this premise, we calculated the portfolio variance in the minimum variance portfolio that satisfies the minimum 0.0015 expected return. The results are shown in Table 5. Figures 4-6 show the mean return of out-of-sample data ranging from the 1st asset to the 400th asset. We mark individual points on the graph. The horizontal and longitudinal axis represent the order of assets in the total assets and the mean return of this asset, respectively. We can see that the mean return of the point that is marked is relatively high. Generally speaking, higher asset returns will also face relatively large investment risks. To understand the following description, we divided the return into three asset risk grades: high (r min ≥ 0.001), middle (0.0005 ≤ r min ≤ 0.001), and low (r min < 0.0005), respectively. In Table 5, it is obvious that the variance of Σ Identity is the largest in all asset dimensions. In the case of N = 200, the asset weights of the portfolio model corresponding to Σ Identity are distributed on the 15th, 51st, 71st, 75th, and 138th assets in Figure 7, respectively, with 87% high-risk assets and 13% medium assets. However, in the case of N = 400, the asset weights of the portfolio model corresponding to Σ Identity are distributed on twenty assets, with 60% high-risk assets, 39.5% medium assets, and only 0.5% low-risk assets, and we can see that the high-risk and medium-risk assets account for the vast majority of the 20 assets. Instead, the portfolio model corresponding to our new estimator Σ * distributed the weights on high-risk assets and medium-risk assets as 69% and 26.12%, respectively, to achieve the 0.0015 expected return. The remaining 5% was distributed on low-risk assets to reduce investment risk. The corresponding results are shown in Figures 8 and 9. Overall, the reasonable distribution of asset weights on high-, medium, and low-risk assets can appropriately decrease investment risks. As the number of the assets increased, the performance of our new estimator Σ * became better. At the same time, the proposed estimation in the minimum variance portfolio was more dispersed on the allocation of the assets. Table 3. The optimal parameters θ and φ in convex combination and the sum of the corresponding error.

Empirical Research
The data of this paper came from the component stock of CSI500, HS300, and SSE50 on the tushare financial website. The whole period of the samples was from 24 May 2017 to 1 July 2021. Removing the missing data of the samples from transaction, we finally obtained 426 component stocks of CSI500, 218 component stocks of HS300, and 41 component stocks of SSE50.
In this paper, we set T 1 = 500 and T 2 = 500 as the window of estimation and prediction, respectively. The maximum concentration ratio is We used the log return as we studied the object and divide all samples into two parts for estimation and prediction. We constructed six portfolio optimization models by using the estimator in Table 2. The procedures of estimation and prediction are as follows. • Step 0: Input the sample data; divide the data into in-sample data T 1 = 500 and out-of-sample data T 2 = 500. • Step 1: Calculate Σ F , Σ RIE , and Σ SCM based on the in-sample data. • Step 2: Set five-fold cross-validation; calculate the Σ est from (20) for parameters between 0 and 1; implement the minimum variance portfolio (22), where r min takes a value from the minimum to maximum mean return; solve the corresponding weight vector x by multiple times of iteration over the values of both θ and φ in the convex optimization. • Step 3: Calculate the standard deviation based on the out-of-sample data, and record the out-of-sample risk σ out in each iteration. • Step 4: Calculate the optimal parameters θ * , φ * when the sum of σ out of the five-fold cross-validation achieves the minimum. • Step 5: Implement the minimum variance portfolio (22) to obtain the assets' weights, the average return r ave , and the out-of-sample risk σ risk when satisfying the minimum 0.002 return constraints under the θ * and φ * . • Step 6: Calculate the Sharpe ratio, where the risk-free interest is set as 1.75%.
In portfolio optimization, the reduction of volatility at the first decimal place is also considered to be quite significant [13,33]. Tables 6 and 7 show the performance of the six portfolio optimization model based on the different asset dimensions. For out-ofsample data, we used the standard deviation as the performance metric. Furthermore, we calculated the average return and the Sharpe ratio in the portfolio optimization model (22) and the risk-free interest was set to 1.75%.
Tables [6][7][8] show that the average returns of each estimator are equal. Table 6 shows that the standard deviation of the portfolio optimization model corresponding to Σ Identity is only 27.53%, which is the smallest in each estimator. The standard deviation of the portfolio optimization model corresponding to our new estimator Σ * is 27.97%, and its performance ranks fourth among all estimators. Table 7 shows the performance of the portfolio optimization model corresponding to each estimator with 218 assets. It can be seen that the performance of the estimator Σ Identity becomes weak. In this case, Σ Identity is the worst estimator, which is expected as it assumes zero correlations among stocks, and Σ SCM is the second-worst. The Sharpe ratio of the portfolio optimization corresponding to Σ NL is the highest in each estimator followed by the one of Σ D . At the same time, the performance of our new estimator Σ * ranks third among all estimators. Comparing to the case of N = 218, the performance of our new estimator improved as the asset dimension increased. Table 8 compares the performance of each model with the number of assets of 426. The result shows that the portfolio optimization model corresponding to our new estimator Σ * obtaining the smallest standard deviation leads to the highest Sharpe ratio. Obviously, compared with other estimators, especially with Σ Identity , our new estimator has a significant decrease in the out-of-sample standard deviation. Meanwhile, this also implies that the performance of our new estimator Σ * is superior to the ones of the other five estimators as the asset dimension increases.
The homogeneity of the variance test is a metric to measure whether the variances of the two investment strategies are equal, and we used the improved bootstrap inference [44] to test the significant variance difference between Σ * and other estimations excess returns. Table 9 shows that the test between Σ * and the alternative methods all reject the null hypothesis of equal variances. Moreover, the sample variance of the excess return generated corresponding to our new estimator Σ * is significantly lower than the other portfolio optimization models as the number of assets increases. Therefore, our new estimation Σ * is superior to other estimations.    The difference in the out-of-sample variance between Σ * and the alternative estimation with all assets (the significance level is 5%).

Conclusions
In this paper, we proposed a new estimator for the covariance matrix, which is a convex combination of the linear shrinkage estimation and the rotation-invariant estimator under the F-norm. We first obtained the optimal parameters through considerable numerical operations, and then, we focused on the accuracy of the model and ignored the complexity of the calculation. Moreover, we demonstrated the effectiveness of the model in the simulation. Finally, we applied our estimation to the minimum variance portfolio optimization and showed that the performance of the proposed estimator is superior to the other five existing estimators in the portfolio optimization for high-dimensional data.
In addition, we only considered the sample noise on the covariance matrix in this article, but in the financial field, the covariance matrix estimation will vary with time due to the influence of market information, and the covariance matrix estimation will be affected by the market information. Therefore, the performance of our new estimation in the dynamic conditional correlation model [41] can be investigated as part of future work for a large-dimensional covariance matrix.