Article

Element Aggregation for Estimation of High-Dimensional Covariance Matrices

School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
Mathematics 2024, 12(7), 1045; https://doi.org/10.3390/math12071045
Submission received: 29 February 2024 / Revised: 26 March 2024 / Accepted: 28 March 2024 / Published: 30 March 2024
(This article belongs to the Special Issue Advances in High-Dimensional Data Analysis)

Abstract

This study addresses the challenge of estimating high-dimensional covariance matrices in financial markets, where traditional sparsity assumptions often fail due to the interdependence of stock returns across sectors. We present an element-aggregation method that pools matrix entries to estimate covariance matrices. The method applies to both sparse and non-sparse matrices, transcending the limitations of sparsity-based approaches, and its implementation is computationally simple, making it a practical tool for real-world applications. Theoretical analysis confirms the method's consistency and establishes its convergence rate in specific scenarios. Numerical experiments demonstrate its superior algorithmic performance and reduced relative estimation errors compared with conventional methods. Furthermore, empirical studies in financial portfolio optimization demonstrate significant risk management benefits, particularly the ability to mitigate portfolio risk effectively even with limited sample sizes.

1. Introduction

Covariance matrix estimation is essential in statistics [1], econometrics [2], finance [3], genomic studies [4], and other related fields. This matrix quantifies the pairwise interdependencies between variables in a dataset, and each element signifies the covariance between two variables. The accuracy of this estimation is critical for statistical and data analyses. As the size of the matrix increases, the efficiency of the estimation process often decreases, so the development of efficient estimation methods remains a significant research challenge. Many approaches rely on assuming specific structural features of the matrix. Early methods include principal component analysis (PCA) [5] and factor models [6]. Other prevalent techniques encompass the constant correlation approach [7], maximum likelihood estimation (MLE) [8], shrinkage methods [9,10], and others [11]. Notably, shrinkage methods have demonstrated exceptional efficacy in financial portfolio allocation.
For the estimation of sparse or approximately sparse covariance matrices, various element-wise thresholding techniques for the sample covariance matrix have emerged. These include hard thresholding [12,13], soft thresholding with extensions [14], and adaptive thresholding [15,16]. Generally, these methods offer low computational demands and high consistency. However, the resulting matrix might not always be semi-positive definite. To address this, more sophisticated methods have been introduced to ensure the estimators are semi-positive definite [17,18].
While the sparsity condition is often assumed in [6,19], it is not universally applicable. For instance, in financial studies, variables such as stock returns often share common factors, making the sparsity assumption less appropriate. Instead, a more common pattern in the covariance matrix of financial data is the clustering of entries. This occurs because stocks within the same industry sector are often similarly correlated with stocks from other sectors [20]. An illustrative example is provided by [21,22], who analyzed the daily returns of nine companies on the New York Stock Exchange (NYSE) market based on 2515 observations from 2000 to 2009. Their analysis revealed a matrix of correlation coefficients without zero entries, with many coefficients sharing identical values. Using statistical hypothesis testing, the correlation coefficients were grouped into five distinct values (0.27, 0.35, 0.56, 0.58, 0.69), as shown in Figure 1a. A similar observation was made by [23]. When the stocks are arranged in a particular order, the correlation coefficient matrix manifests itself as a block-symmetric matrix, as depicted in Figure 1b, which can be written as
$$
\begin{pmatrix} 1-0.27 & 0 & 0 \\ 0 & 1-0.58 & 0 \\ 0 & 0 & 1-0.69 \end{pmatrix} \otimes I_3
+ \begin{pmatrix} 0.27 & 0.35 & 0.35 \\ 0.35 & 0.58 & 0.56 \\ 0.35 & 0.56 & 0.69 \end{pmatrix} \otimes
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},
$$
where ⊗ is the Kronecker product. We will demonstrate later that this uncomplicated framework is highly effective for estimating the covariance matrix in Corollary 3.
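As a quick numerical check (a minimal NumPy sketch of the decomposition above; the variable names are ours, not from the paper), the 9 × 9 correlation matrix can be assembled directly from the two 3 × 3 factors:

```python
import numpy as np

# Distinct correlation levels reported for the nine NYSE stocks (Figure 1b).
Phi = np.array([[0.27, 0.35, 0.35],
                [0.35, 0.58, 0.56],
                [0.35, 0.56, 0.69]])
Lam = np.diag(1.0 - np.diag(Phi))        # diag(1 - 0.27, 1 - 0.58, 1 - 0.69)

# Block-symmetric correlation matrix: Lam ⊗ I_3 + Phi ⊗ J_3, with J_3 all ones.
R = np.kron(Lam, np.eye(3)) + np.kron(Phi, np.ones((3, 3)))

assert np.allclose(np.diag(R), 1.0)      # unit diagonal
assert np.allclose(R, R.T)               # symmetry
```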
The proposed estimation method in this paper derives from the correlation matrix structure mentioned above. This structure is widespread in numerous financial research settings, encompassing sparse covariance matrices, the block-wise covariance matrices noted above, and matrices in which all correlation coefficients equal a global constant [24]. However, many existing methods are unsuitable for estimating covariance matrices under this structure. To broaden our scope, we study the estimation of covariance matrices with clustered entries. To this end, we propose an element-aggregation method tailored to covariance matrix estimation. Moreover, we examine the theoretical properties of our method and confirm its effectiveness through extensive numerical simulations and real-world data analysis.
The rest of this paper is organized as follows. In Section 2, we propose the element-aggregation method for covariance matrix estimation and describe its implementation. Section 3 assesses the theoretical consistency of the estimation errors and exhibits the corresponding convergence rates. All theoretical proofs are given in Appendices A–F. Numerical simulation analyses and real data analysis with portfolio allocation are presented in Section 4 and Section 5, respectively. Conclusions are provided in Section 6.

2. Estimation and Implementation for Covariance Matrix

Suppose the random vector $X = (x_1, \ldots, x_p)^\top$ has covariance matrix $\Sigma = (\sigma_{ij})_{1 \le i,j \le p}$, and samples $X_i = (x_{i1}, \ldots, x_{ip})^\top$, $i = 1, \ldots, n$, are generated from $X$. The sample covariance matrix is defined as $\hat{\Sigma} = (\hat{\sigma}_{ij})_{1 \le i,j \le p}$, where, for $1 \le i, j \le p$,
$$\hat{\sigma}_{ij} = n^{-1} \sum_{\ell=1}^{n} \pi_{\ell,ij},$$
with $\pi_{\ell,ij} = (x_{\ell i} - \bar{x}_i)(x_{\ell j} - \bar{x}_j)$ and $\bar{x}_k = n^{-1} \sum_{\ell=1}^{n} x_{\ell k}$ for $k = 1, \ldots, p$.
Motivated by the structure of the correlation matrix discussed in the introduction and aiming to analyze covariance matrices with clustered entries, we consider the following specification for the covariance matrix. Let $\Omega = \{\sigma_{ij} : i < j\}$ denote the collection of off-diagonal elements in the covariance matrix $\Sigma$, and let the set of distinct elements in $\Omega$ be represented by
$$\mathcal{S} = \{\varsigma_k \in \Omega : \varsigma_k \neq \varsigma_{k'},\ 1 \le k \neq k' \le K\},$$
where $K$ is the cardinality of the set $\mathcal{S}$, and $K$ changes with $p$. In other words, $\sigma_{ij} \in \mathcal{S}$ for all $1 \le i < j \le p$. For $k = 1, \ldots, K$, the index set of covariance matrix elements that are equal to $\varsigma_k$ is defined as
$$\mathcal{G}_k = \{(i,j) : \sigma_{ij} = \varsigma_k,\ 1 \le i < j \le p\}.$$
In brief, $\mathcal{G}_k$ stands for the category of $\varsigma_k$. Similarly, for $1 \le i < j \le p$, the index set of covariance matrix elements that are equal to $\sigma_{ij}$ is defined as
$$\mathcal{G}_{ij} = \{(a,b) : \sigma_{ab} = \sigma_{ij},\ 1 \le a < b \le p\}.$$
Then, for $(i,j) \in \mathcal{G}_k$, we have $\mathcal{G}_{ij} = \mathcal{G}_k$.
If the sets $\mathcal{G}_k$, $k = 1, \ldots, K$, are known, i.e., the indices of equal elements in $\Sigma$ are known, then we can estimate $\varsigma_k$ by averaging the sample covariance elements $\hat{\sigma}_{ij}$ over all $(i,j) \in \mathcal{G}_k$:
$$\check{\varsigma}_k = \frac{1}{\#\mathcal{G}_k} \sum_{(i,j)\in\mathcal{G}_k} \hat{\sigma}_{ij} = \frac{1}{n\,\#\mathcal{G}_k} \sum_{\ell=1}^{n} \sum_{(i,j)\in\mathcal{G}_k} \pi_{\ell,ij}, \qquad (1)$$
where $\#\mathcal{G}_k$ is the cardinality of the set $\mathcal{G}_k$. Thus, the covariance matrix $\Sigma$ can be estimated by
$$\check{\Sigma} = (\check{\sigma}_{ij}), \quad \text{with } \check{\sigma}_{ij} = \check{\varsigma}_k \text{ if } (i,j) \in \mathcal{G}_k. \qquad (2)$$
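The oracle estimator in (1) and (2) is straightforward to compute when the grouping is known. A minimal NumPy sketch (the grouping array and function name are our own illustration, not notation from the paper):

```python
import numpy as np

def oracle_aggregation(X, groups):
    """Average sample covariances within each known group of equal entries.

    X      : (n, p) data matrix.
    groups : (p, p) integer array; groups[i, j] == groups[a, b] exactly when
             sigma_ij and sigma_ab are assumed equal (off-diagonal entries only).
    """
    S = np.cov(X, rowvar=False, bias=True)       # sample covariance, 1/n normalization
    Sigma_check = S.copy()
    off = ~np.eye(S.shape[0], dtype=bool)        # leave the diagonal as sample variances
    for g in np.unique(groups[off]):
        mask = (groups == g) & off
        Sigma_check[mask] = S[mask].mean()       # replace each entry by its group mean
    return Sigma_check
```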

Element-Aggregation Estimation Method (ELA)

We now introduce the element-aggregation (ELA) estimation method. As $\mathcal{G}_k$ is unknown, we estimate $\mathcal{G}_{ij}$, i.e., the category of each $\sigma_{ij}$, by
$$\tilde{\mathcal{G}}_{ij} = \Big\{(a,b) : |\hat{\sigma}_{ij} - \hat{\sigma}_{ab}| < c\sqrt{\frac{\hat{\varrho}_{ij}\log(pn)}{n}},\ 1 \le a < b \le p\Big\}, \qquad (3)$$
where $\hat{\varrho}_{ij}$ is a sample estimate of $\varrho_{ij} = \mathrm{Var}(\pi_{ij})$ and $c$ is a tuning parameter. Regarding the choice of this tuning parameter, Cai and Liu [15] show that a good choice of $c$ does not affect the rate of convergence, but it does affect the numerical performance of the estimators. The tuning parameter can be fixed at $c = 2$, as suggested in [15], or chosen empirically by cross-validation, such as the five-fold cross-validation used in [12,15]. The tuning parameter $c$ operates akin to a significance level in statistical testing, and the 2-$\sigma$ rule is widely endorsed for normally distributed data, wherein approximately 95% of observations fall within the interval of the mean plus or minus two standard deviations, $[\mu - 2\sigma, \mu + 2\sigma]$. In our simulations, we also find that taking $c = 2$ in the proposed ELA algorithm yields commendable performance in high-dimensional cases and does not differ significantly from the value obtained by cross-validation. Consequently, we adopt $c = 2$ as the tuning parameter for the subsequent analysis.
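For completeness, here is a rough sketch of how $c$ could be selected by five-fold cross-validation (our own illustration; the fold scheme and the Frobenius-norm criterion are assumptions on our part, not details taken from [12,15]):

```python
import numpy as np

def cv_choose_c(X, ela_estimator, candidates=(1.0, 1.5, 2.0, 2.5, 3.0), folds=5, seed=0):
    """Pick the tuning parameter c that minimizes the Frobenius distance between
    the ELA estimate on the training folds and the sample covariance on the
    held-out fold, averaged over folds."""
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)
    splits = np.array_split(idx, folds)
    scores = []
    for c in candidates:
        err = 0.0
        for k in range(folds):
            test = splits[k]
            train = np.concatenate(splits[:k] + splits[k + 1:])
            S_test = np.cov(X[test], rowvar=False)
            err += np.linalg.norm(ela_estimator(X[train], c=c) - S_test, ord="fro")
        scores.append(err / folds)
    return candidates[int(np.argmin(scores))]
```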
Analogous to (1), we estimate the covariance matrix element $\sigma_{ij}$ by
$$\tilde{\sigma}_{ij} = \frac{1}{n\,\#\tilde{\mathcal{G}}_{ij}} \sum_{\ell=1}^{n} \sum_{(s,t)\in\tilde{\mathcal{G}}_{ij}} \pi_{\ell,st},$$
and $\Sigma$ by $\tilde{\Sigma} = (\tilde{\sigma}_{ij})$. This is the element-aggregation (ELA) estimator of the covariance matrix.
In terms of computational complexity, the ELA process requires $O(p^2\log(p))$ time, where the $\log(p)$ factor accounts for the binary search used to compute $\tilde{\mathcal{G}}_{ij}$ within $\{\hat{\sigma}_{ab} : 1 \le a < b \le p\}$. In comparison, element-wise thresholding has a computational complexity of $O(p^2)$. Therefore, the ELA estimation calculation is only slightly more complex than the element-wise thresholding approach.
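The sketch below illustrates one way the ELA estimator could be implemented with the sorted-array/binary-search idea just described (our own simplified illustration for moderate $p$: here $\hat{\varrho}_{ij}$ is taken as the empirical variance of the products $\pi_{\ell,ij}$, the aggregation set is treated as a contiguous range of the sorted off-diagonal entries, and the diagonal is kept as the sample variances):

```python
import numpy as np

def ela_estimator(X, c=2.0):
    """Element-aggregation (ELA) covariance estimator -- illustrative sketch."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    pi = Xc[:, :, None] * Xc[:, None, :]       # pi[l, i, j] = (x_li - xbar_i)(x_lj - xbar_j)
    S = pi.mean(axis=0)                        # sample covariance (1/n version)
    rho = pi.var(axis=0)                       # crude estimate of Var(pi_ij)

    iu = np.triu_indices(p, k=1)               # off-diagonal (i < j) entries
    s_off = S[iu]
    s_sorted = np.sort(s_off)
    cum = np.concatenate(([0.0], np.cumsum(s_sorted)))   # prefix sums for fast group means

    thr = c * np.sqrt(rho[iu] * np.log(p * n) / n)       # entry-wise aggregation radius
    lo = np.searchsorted(s_sorted, s_off - thr, side="left")
    hi = np.searchsorted(s_sorted, s_off + thr, side="right")
    agg = (cum[hi] - cum[lo]) / (hi - lo)                # mean over the aggregated set

    Sigma = S.copy()
    Sigma[iu] = agg
    Sigma[iu[1], iu[0]] = agg                            # symmetrize
    return Sigma
```

With the off-diagonal entries sorted once, each aggregated mean costs two binary searches plus a prefix-sum lookup, which is consistent with the $O(p^2\log(p))$ cost stated above.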

3. Theoretical Properties

In this section, we provide the theoretical justification for the element-aggregation estimation under the assumption of a normal distribution of $X$. The results can be proven under more general conditions using more complicated techniques. Let $\|A\|$ denote the operator norm, i.e., $\|A\| = \{\lambda_{\max}(A^\top A)\}^{1/2}$, where $\lambda_{\max}(A^\top A)$ is the largest eigenvalue of $A^\top A$.
Lemma 1. 
Let $K$ be the number of distinct values among the off-diagonal elements of the covariance matrix $\mathrm{Cov}(X) = \Sigma$. Let $n_{i,k}$ be the number of elements equal to $\varsigma_k$ in the $i$-th row of $\Sigma$, i.e., $n_{i,k} = \#\{j : \sigma_{ij} = \varsigma_k,\ 1 \le j \neq i \le p\}$. If $\max_{1 \le i,j \le p} \sigma_{ij} \le M_0$ for some constant $M_0$ and
$$\mathrm{Var}\Big(\frac{1}{\sqrt{\#\mathcal{G}_k}} \sum_{(i,j)\in\mathcal{G}_k} \pi_{\ell,ij}\Big) = O(V_k^2), \qquad k = 1, \ldots, K,$$
so that $\mathrm{Var}(\check{\varsigma}_k) = O\big(V_k^2/(n\,\#\mathcal{G}_k)\big)$ for the estimate $\check{\varsigma}_k$ in (1), then the covariance matrix estimator $\check{\Sigma}$ in (2) has the following estimation error:
$$\|\check{\Sigma} - \Sigma\| = O_P\Big(\sqrt{\frac{\log(pn)}{n}}\,\max_{1\le j\le p}\sum_{k=1}^{K} \frac{n_{j,k}}{\sqrt{\#\mathcal{G}_k}}\,V_k\Big).$$
An important characteristic is that the effect of the dimension on the estimation error is regulated by a slowly varying function, $\log(p)$. Lemma 1 can also be extended to the case where $\mathcal{G}_k$ is unknown. In such cases, we need to identify the corresponding category $\mathcal{G}_k$. To simplify the explanation, we will use $\mathcal{G}_{ij}$ here, since $\mathcal{G}_{ij} = \mathcal{G}_k$ if $(i,j) \in \mathcal{G}_k$.
Lemma 2. 
Suppose $X$ is normally distributed with $\max_{1\le i,j\le p}\sigma_{ij} \le M_0$ for some constant $M_0$. If
$$\sqrt{n}\,\min\{|\varsigma_k - \varsigma_j| : k \neq j\}\,\big/\,(\log(pn))^{1/2} \to \infty,$$
then the identification by (3) is consistent, i.e., for any $(i,j) \in \mathcal{G}_k$, $k = 1, \ldots, K$,
$$\Pr\Big(\bigcup_{1\le i<j\le p}\{\tilde{\mathcal{G}}_{ij} \neq \mathcal{G}_{ij}\}\Big) \to 0.$$
Theorem 1. 
With the above notation, if the conditions in Lemmas 1 and 2 are satisfied, then
$$\|\tilde{\Sigma} - \Sigma\| = O_P\Big(\sqrt{\frac{\log(pn)}{n}}\,\max_{1\le j\le p}\sum_{k=1}^{K} \frac{n_{j,k}}{\sqrt{\#\mathcal{G}_k}}\,V_k\Big).$$
Below, we give some special cases of Theorem 1, and show the convergence rates in Corollaries 1–3.
Corollary 1. 
Suppose $\Sigma = (\sigma_{ij})_{1\le i,j\le p}$ is sparse, with the number of non-zero elements in each row being $o(p)$, i.e., $c(p)/p \to 0$, where $c(p) = \max_j \sum_{i=1}^{p} \mathbb{1}(\sigma_{ij} \neq 0)$. If the conditions in Theorem 1 hold, then
$$\|\tilde{\Sigma} - \Sigma\| = O_P\Big(c(p)\sqrt{\frac{\log(pn)}{n}}\Big).$$
This rate coincides with the one obtained for sparse matrices in [12]. It could be extended to more general cases.
Corollary 2. 
Suppose $X = (x_1, \ldots, x_p)^\top$ follows a normal distribution with $\Sigma = (\varsigma_{|i-j|})_{1\le i,j\le p}$. If $|\varsigma_k| = O(c^k)$ for some $c < 1$, then
$$\|\tilde{\Sigma} - \Sigma\| = O_P\Big(\log(p)\sqrt{\frac{\log(pn)}{n}}\Big).$$
Finally, we consider a simple block covariance matrix with the matrix in Figure 1b as a special case.
Corollary 3. 
Suppose $\Sigma = \Lambda_M \otimes I_m + \Phi_M \otimes \Xi_m$, where $I_m$ is the $m\times m$ identity matrix, $\Lambda_M$ is an $M\times M$ diagonal matrix, $\Xi_m$ is the $m\times m$ matrix with all elements equal to 1, and $\Phi_M$ is an $M\times M$ symmetric matrix. If $M$ is fixed as $p \to \infty$, then we have
$$\|\tilde{\Sigma} - \Sigma\| = O_P(n^{-1/2}).$$
Remark 1. 
Since the dimension $M$ of the matrix $\Phi_M$ and of the diagonal matrix $\Lambda_M$ assumed in Corollary 3 is fixed (as $p \to \infty$), the numbers of distinct values among their elements are at most $M(M+1)/2$ and $M$, respectively, so $\Sigma$ also has at most $M(M+3)/2$ distinct values, a number which remains fixed as $p \to \infty$. By Lemma 2, the estimates $\tilde{\Lambda}_M$ and $\tilde{\Phi}_M$ attain the rate $O_P(n^{-1/2})$, i.e., for any $\delta > 0$, a subset of the probability space $S_n$ with $\Pr(S_n) > 1 - \delta$ can be found such that
$$\|\tilde{\Lambda}_M - \Lambda_M\| = O_P(n^{-1/2}), \qquad \|\tilde{\Phi}_M - \Phi_M\| = O_P(n^{-1/2}),$$
where $\tilde{\Sigma} = \tilde{\Lambda}_M \otimes I_m + \tilde{\Phi}_M \otimes \Xi_m$. Using properties of eigenvalues and the Kronecker product, we show in Appendix F that the covariance matrix estimate $\tilde{\Sigma}$ also attains the same rate $O_P(n^{-1/2})$. That is, since the number of distinct elements of the covariance matrix is determined by the fixed $M$, which does not depend on $p$, the rate of convergence of the covariance matrix estimate $\tilde{\Sigma}$ is also independent of $p$.
Our ELA method estimates covariance matrices with the above structure both efficiently and with a convergence rate that is free of the dimension. Mathematically, this block-wise structure is straightforward. Furthermore, we hypothesize that these conclusions can be extended to more general block-wise matrices, such as those built from the Khatri–Rao product or the Tracy–Singh product [25].

4. Numerical Simulation

Our study presents the ELA method to estimate the covariance matrix using an element-aggregation idea. The ELA method is designed for simplicity and clarity, making it easy to implement. The steps for implementing the method were thoroughly detailed in the preceding section. This section focuses on comparing the algorithmic performance of the proposed method with several estimation methods for high-dimensional sparse and non-sparse covariance matrices. We compare our ELA estimation method with multiple established methods: (1) the adaptive thresholding method [15], (2) the POETk method [19] with $k = 0, 1, 2$ factors, and (3) the Rothman method [18].
The numerical simulation focuses on the relative matrix errors when estimating the covariance matrix and the correlation coefficient matrix. To estimate the covariance matrix, each variable is first standardized; the elements of the estimated correlation coefficient matrix are then multiplied by the corresponding sample standard deviations. This effectively transforms the estimation of the covariance matrix into the estimation of the corresponding correlation coefficient matrix. Thus, the ELA method's efficiency and accuracy in capturing the underlying data structure can be comprehensively assessed through the dual focus on both covariance and correlation coefficient matrix estimations.
The metric employed to compare the estimation efficiency between these estimation methods is the average matrix loss over 500 replications. The matrix losses are measured by the spectral norm of the correlation coefficient matrix and the covariance matrix, similar to that performed in [18]. To aid visualization, we define the relative estimation errors between the estimated matrix and the actual matrix for both the correlation coefficient matrix R and the covariance matrix Σ as follows:
$$e_{\mathrm{cor}} = \frac{\|\hat{R} - R\|}{\|R\|} \times 100\%, \qquad e_{\mathrm{cov}} = \frac{\|\hat{\Sigma} - \Sigma\|}{\|\Sigma\|} \times 100\%,$$
where $\|A\| = \{\lambda_{\max}(A^\top A)\}^{1/2}$ denotes the operator norm. The following simulation not only compares the relative matrix estimation errors for various matrix types and algorithms but also shows how the estimation error changes as the matrix dimension $p$ increases, in order to demonstrate more clearly the advantage of the proposed estimation algorithm.
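These error metrics are simple to compute; a small sketch (with our own helper names) that also converts an estimated correlation matrix back to a covariance matrix via the sample standard deviations, as described above:

```python
import numpy as np

def spectral_norm(A):
    # Operator (spectral) norm: the largest singular value of A.
    return np.linalg.norm(A, ord=2)

def relative_errors(R_hat, R, sd_hat, sd):
    """Relative spectral-norm errors for the correlation and covariance matrices.
    The covariance estimate is recovered by rescaling the estimated correlation
    matrix with the sample standard deviations."""
    Sigma_hat = R_hat * np.outer(sd_hat, sd_hat)
    Sigma = R * np.outer(sd, sd)
    e_cor = spectral_norm(R_hat - R) / spectral_norm(R) * 100
    e_cov = spectral_norm(Sigma_hat - Sigma) / spectral_norm(Sigma) * 100
    return e_cor, e_cov
```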
The following three covariance matrices $\Sigma = (\sigma_{ij})_{1\le i,j\le p}$ are considered.
Example 1. 
$\sigma_{ij} = ij\,\rho^{|i-j|}$ with $\rho = 0.9$.
Example 2. 
$\sigma_{ij} = (ij)^{1/4} r_{ij}$, where $r_{ii} = 1$, $r_{ij} = 0.6\,|i-j|^{-1} u_{ij}$ for $i \neq j$, and $u_{ij} = u_{ji}$ are IID uniform on $[0, 1]$. A similar example was used in [16].
Example 3. 
Constant block matrix
$$\Sigma = \begin{pmatrix} \Sigma_1 & \Sigma_2 \\ \Sigma_2 & \Sigma_3 \end{pmatrix},$$
where $\Sigma_1 = 0.8\,\Xi_{p/2} + 0.2\,I_{p/2}$, $\Sigma_2 = 0.4\,\Xi_{p/2}$, $\Sigma_3 = 0.3\,\Xi_{p/2} + 0.7\,I_{p/2}$, $\Xi_{p/2}$ is a $(p/2)\times(p/2)$ matrix with all elements 1, and $I_{p/2}$ is the identity matrix of size $p/2$.
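For reference, a short NumPy sketch (function name ours) that builds the Example 3 matrix and draws one simulated sample of the kind used throughout this section:

```python
import numpy as np

def example3_sigma(p):
    """Constant block covariance matrix of Example 3 (p assumed even)."""
    h = p // 2
    J, I = np.ones((h, h)), np.eye(h)
    S1 = 0.8 * J + 0.2 * I
    S2 = 0.4 * J
    S3 = 0.3 * J + 0.7 * I
    return np.block([[S1, S2], [S2, S3]])

# One simulated data set (illustrative sizes; Section 4 uses n = 100, 200 and larger p).
rng = np.random.default_rng(1)
Sigma = example3_sigma(100)
X = rng.multivariate_normal(np.zeros(100), Sigma, size=100)
```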
Remark 2. 
Due to the assumption of sparse covariance matrices in the adaptive thresholding method, the POET0 method, and the Rothman estimation method [6,15,18], the covariance matrix is estimated as a sparse matrix, where most of the matrix elements are estimated to be zero. This is significantly different from the non-sparse covariance matrix in Example 3. The relative estimation error of the covariance matrix estimated by these methods is relatively large compared to the sample covariance estimation method. Therefore, the adaptive thresholding method, the POET0 method, and the Rothman method cannot be used directly for Example 3.
Note that Σ can be written as
$$\Sigma = \Sigma_0 + \lambda_1 v_1 v_1^\top + \lambda_2 v_2 v_2^\top,$$
where $\Sigma_0$ is a sparse matrix and $\lambda_1$ and $\lambda_2$ are the first two largest eigenvalues of $\Sigma$, with $v_1$ and $v_2$ the corresponding eigenvectors, respectively. By this decomposition, the POETk method with $k = 2$ factors in [19] can be used, i.e., POET2 is applicable to the estimation in Example 3.
Random samples are generated from $X \sim N(0, \Sigma)$ with sample sizes $n = (100, 200)$ and $p = (100, 500, 900, 1400, 1900, 2300, 2500)$. The averages of the relative estimation errors, based on 500 replications, are depicted in Figure 2, Figure 3 and Figure 4. The ELA method is represented by a solid red line. Only methods with a performance comparable to the best one are displayed in each panel.
In Figure 2 for Example 1 with $n = 100$, our ELA estimator for the correlation coefficient matrix $R$ is slightly inferior to POET0 but outperforms Rothman. For the covariance matrix $\Sigma$, our method slightly lags behind both POET0 and Rothman. However, with $n = 200$, the ELA method outperforms them for both the correlation coefficient matrix and the covariance matrix when $p < 1000$, and it ranks first for the correlation coefficient matrix and second for the covariance matrix when $p > 1000$. In general, the ELA method is comparable to both POET and Rothman for Example 1.
In Figure 3 for Example 2, the ELA estimator consistently outperforms both POET and Rothman for both matrices when n = ( 100 , 200 ) . Although Remark 2 suggests that POET2 is suitable for Example 3, Figure 4 for Example 3 demonstrates the superior efficiency of the ELA method over POET2 for both matrices with n = ( 100 , 200 ) . In summary, the ELA method significantly outperforms both POET and Rothman for Examples 2 and 3.
Furthermore, we conducted a comprehensive analysis comparing the average computational time across 500 replications for our ELA method alongside the adaptive thresholding method [15], the POET method [19], and the Rothman method [18]. These comparisons are detailed in Table 1. The random samples are again generated from $X \sim N(0, \Sigma)$ with $\Sigma$ following Example 1, sample sizes $n = (100, 200)$, and covariance matrix dimensions $p = (100, 500, 900, 1400, 1900, 2300, 2500)$. Notably, since the POETk methods exhibit broadly similar computational times for different values of $k$, we present only the results for POET1.
Upon examination of Table 1, it is evident that, for lower dimensions of p = ( 100 , 500 ) , the ELA method has a higher computational cost compared to the adaptive thresholding and Rothman methods but outperforms the POET method. For p = ( 900 , 1400 ) , the ELA method is slightly slower than adaptive thresholding but remains superior to both the POET and Rothman methods. When dealing with higher dimensions of p = ( 2300 , 2500 ) , the ELA method demonstrates a significant computational advantage over its counterparts. This suggests that the ELA method scales more efficiently with increasing dimensionality, which is a desirable attribute for high-dimensional data analysis.
Our simulation studies yield the following conclusions. The ELA method outperforms both POET and Rothman for Examples 2 and 3, for both the correlation coefficient matrix and the covariance matrix, with $n = (100, 200)$ and $p = (100, \ldots, 2500)$; for Example 1, it performs comparably to them. These results emphasize the efficiency of the ELA method for estimating covariance matrices. The proposed ELA technique is not only computationally efficient but also markedly lowers the relative estimation error, i.e., the spectral-norm loss for both the correlation coefficient matrix and the covariance matrix is small.

5. Real Data Analysis and Portfolio Allocation

The component stocks of the SP500 index were analyzed using historical daily prices sourced from Yahoo Finance through the R package 'tidyquant'. We selected 430 stocks from the SP500 with daily prices spanning over 3000 days, from 1 January 2008 to 31 December 2019, to ensure a sufficiently large sample size. This allowed a larger window, $N = 500$, to be selected when we use the rolling window method later. Our emphasis is on the correlation coefficient matrix and covariance matrix of daily returns. Although the covariance matrix, in particular the individual stock variances, is known to fluctuate over time, the correlation coefficient matrix remains relatively stable. This stability underscores the significance of evaluating estimation methods based on the correlation coefficient matrix [26]. The constant conditional correlation multivariate GARCH model [26] is defined as $H_t = \Lambda_t R \Lambda_t$, where $\Lambda_t = \mathrm{diag}(\sigma_{1t}, \ldots, \sigma_{pt})$, $R = (\rho_{ij})_{1\le i,j\le p}$ is the constant correlation coefficient matrix, and $\sigma_{it}$ is the conditional volatility of the $i$-th stock, modeled by a GARCH(1,1) based on its past daily returns. See [27] for more discussion of the performance of GARCH(1,1). Let $r_{k,t}$ be the return of the $k$-th stock on day $t$, where $k = 1, \ldots, 430$. We begin by standardizing the returns using
$$u_{k,t} = r_{k,t} / \sigma_{k,t}.$$
In order to evaluate the estimation performance of the covariance matrix, we assume that the covariance matrix of $u_t = (u_{1,t}, \ldots, u_{p,t})^\top$ remains constant over time or changes very slowly.
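A rough sketch of this standardization step (a hypothetical illustration on our part: it assumes the `arch` Python package for GARCH(1,1) fitting and a returns DataFrame, neither of which is specified in the paper, which works in R):

```python
import pandas as pd
from arch import arch_model   # assumed dependency; any GARCH(1,1) fitter would do

def standardize_returns(returns: pd.DataFrame) -> pd.DataFrame:
    """Divide each stock's daily return by its GARCH(1,1) conditional volatility,
    mirroring u_{k,t} = r_{k,t} / sigma_{k,t} above."""
    std = {}
    for col in returns.columns:
        r = 100 * returns[col].dropna()              # percent returns help the optimizer
        fit = arch_model(r, vol="Garch", p=1, q=1, mean="Constant").fit(disp="off")
        std[col] = r / fit.conditional_volatility    # scale-free standardized returns
    return pd.DataFrame(std)
```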
Rooted in modern portfolio theory (MPT), portfolio allocation advocates strategically distributing investments across diverse asset classes to optimize the trade-off between risk and return. The MPT theory highlights the significance of not only choosing individual stocks but also the proportional weighting of these stocks in a portfolio [28,29]. Therefore, we investigate the optimal allocation for minimum risk portfolios of stocks, utilizing the estimated covariance matrix of various methods.
To evaluate the various methods, we apply their estimators to construct minimum risk portfolios. For any rolling window of time $(t-N, \ldots, t-1)$, we utilize the data from that window to estimate the covariance matrix via different methods, including (1) our ELA method, (2) the POETk method [19] with $k = 0, 1, 2$ factors, (3) the shrinkage method [9], denoted by ShrinkMarket, and (4) the simple sample covariance matrix, denoted by Sample. The simple sample estimate of the covariance matrix $\Sigma$ for a given time $t$ is defined as $\hat{\Sigma}_t^S = N^{-1}\sum_{s=t-N}^{t-1}(u_s - \bar{u})(u_s - \bar{u})^\top$ with $u_s = (u_{1,s}, \ldots, u_{p,s})^\top$.
For an estimated covariance matrix $\hat{\Sigma}_t$, the minimum risk portfolio $w_t = (w_{1,t}, \ldots, w_{p,t})^\top$ for time $t$ is defined as
$$w_t = \arg\min_{w}\ w^\top \hat{\Sigma}_t w \quad \text{subject to} \quad w_i \ge 0,\ i = 1, \ldots, p, \quad \text{and} \quad w_1 + \cdots + w_p = 1.$$
The realized portfolio return for time $t$ is then $w_t^\top u_t$. To determine the portfolio with the lowest risk, we calculate the standard deviation of the portfolio returns $\{w_t^\top u_t,\ t = N+1, \ldots, T\}$. Estimators yielding a smaller standard deviation are deemed superior in portfolio allocation.
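A minimal sketch of this constrained minimum-variance problem using SciPy's SLSQP solver (the solver choice is ours; the paper does not specify an optimizer):

```python
import numpy as np
from scipy.optimize import minimize

def min_risk_weights(Sigma_hat):
    """Long-only minimum-variance weights: minimize w' Sigma w  s.t.  w >= 0, sum(w) = 1."""
    p = Sigma_hat.shape[0]
    w0 = np.full(p, 1.0 / p)                              # start from the equal-weight portfolio
    res = minimize(
        fun=lambda w: w @ Sigma_hat @ w,
        x0=w0,
        jac=lambda w: 2.0 * Sigma_hat @ w,
        bounds=[(0.0, 1.0)] * p,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        method="SLSQP",
    )
    return res.x

# The realized portfolio return at time t is then min_risk_weights(Sigma_hat_t) @ u_t.
```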
To understand how risk varies with the number of stocks, we alphabetically order the stocks by their symbols on the New York Stock Exchange and analyze growing subsets of stocks with p = 50 , 70 , 90 , , 430 . The risks associated with portfolios based on different methods are illustrated in Figure 5, with the ELA method highlighted by a red solid line. As the sample size N in the rolling window increases from 100 to 500 across the panels, the performance gap between the different estimation methods narrows, illustrating the homogeneity of the correlation matrix.
Figure 5 shows that portfolios crafted using the ELA method consistently have the lowest risk compared to the POETk methods, the shrinkage method, and the sample covariance matrix method, especially with smaller sample sizes of N = 100 and N = 200 . Therefore, the ELA method is superior in constructing portfolios that minimize risk, is an effective tool for reducing portfolio risk, and is suitable for a wide range of sample sizes.

6. Conclusions

In this paper, we introduce a novel method called element-aggregation (ELA) estimation for the estimation of covariance matrices. The ELA method stands out due to its simplicity, low computational complexity, and applicability to both sparse and non-sparse matrices. Our theoretical analysis shows that the ELA method offers strong consistency while maintaining computational efficiency and dimensional independence, especially concerning block-wise covariance matrices.
In our numerical simulation study, we highlight the exceptional effectiveness of the ELA method in estimating correlation coefficient matrices and covariance matrices using diverse random samples. In comparison to established methods like POET and Rothman, the ELA method consistently either outperforms or equals them. The computational efficiency of our ELA method is complemented by a significant reduction in the relative estimation error for both correlation coefficients and covariance matrices.
In the real data analysis of the SP500 index stocks, the ELA method consistently generates portfolios with the lowest risk in comparison to other listed methods. This outcome is particularly pronounced in scenarios with smaller sample sizes, underscoring the potent ability of the ELA method to construct risk-minimized portfolios.

7. Future Work

This paper delves into the estimation of covariance matrices employing an elemental clustering approach, substantiating and evaluating the consistency and efficiency of the proposed methodology within the realm of high-dimensional data. The exponential growth in data volume and dimensionality due to advancements in computational and storage technologies has led to the emergence of ultra-high-dimensional and high-frequency datasets. Thus, the extensibility of the proposed method to these novel datasets, coupled with the demonstration of its consistency and efficacy, presents a significant avenue for future research. In addition, the exploration of computational strategies to increase efficiency under the constraints of ultra-high-dimensionality and to optimize the trade-off between computational resources and analytical accuracy is imperative. Furthermore, the proposed method has the potential to be integrated into diverse domains, such as psychology, social sciences, and genetic research. This also represents a promising direction for future research.

Funding

This research was funded by the Doctoral Foundation of Yunnan Normal University (Project No. 2020ZB014) and the Youth Project of Yunnan Basic Research Program (Project No. 202201AU070051).

Data Availability Statement

The author confirms that the data supporting the findings of this study are available within the article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Proof of Lemma 1

Proof. 
First, recall that $\check{\varsigma}_k = \frac{1}{\#\mathcal{G}_k}\sum_{(i,j)\in\mathcal{G}_k}\hat{\sigma}_{ij}$ with $\sigma_{ij} = \varsigma_k$ for all $(i,j)\in\mathcal{G}_k$, so that $E(\hat{\sigma}_{ij}) = \varsigma_k$ for all $(i,j)\in\mathcal{G}_k$ and $\check{\varsigma}_k$ is the average over $\ell = 1, \ldots, n$ of the independent group means $\#\mathcal{G}_k^{-1}\sum_{(i,j)\in\mathcal{G}_k}\pi_{\ell,ij}$, each with mean $\varsigma_k$. By the Bernstein inequality, we have
$$|\check{\varsigma}_k - \varsigma_k| = \frac{V_k}{\sqrt{\#\mathcal{G}_k}}\,O_P\Big(\sqrt{\frac{\log(pn)}{n}}\Big), \qquad k = 1, \ldots, K, \qquad \text{(A1)}$$
where $V_k$ is as in Lemma 1. For symmetric matrices, by [30], we have
$$\|\check{\Sigma} - \Sigma\| \le \max_{1\le j\le p}\sum_{i=1}^{p}|\check{\sigma}_{ij} - \sigma_{ij}|.$$
For the right-hand side above,
$$\sum_{i=1}^{p}|\check{\sigma}_{ij} - \sigma_{ij}| = \sum_{k=1}^{K}\sum_{i:\sigma_{ij}=\varsigma_k}|\check{\sigma}_{ij} - \varsigma_k| = \sum_{k=1}^{K} n_{j,k}\,|\check{\varsigma}_k - \varsigma_k| \le O_P\Big(\sqrt{\frac{\log(pn)}{n}}\Big)\sum_{k=1}^{K}\frac{n_{j,k}}{\sqrt{\#\mathcal{G}_k}}\,V_k.$$
We complete the proof. □

Appendix B. Proof of Lemma 2

Proof. 
Let $D_n = \{M\varrho_{ab}\log(pn)/n\}^{1/2}$. For any $(i,j)$ such that $\sigma_{ij} = \varsigma_k$, recall that $\tilde{\mathcal{G}}_{ij} = \{(a,b) : |\hat{\sigma}_{ij} - \hat{\sigma}_{ab}| < D_n\}$. Let $W_1 = \{(a,b)\in\mathcal{G}_k : (a,b)\notin\tilde{\mathcal{G}}_{ij}\}$ and $W_2 = \{(a,b)\in\tilde{\mathcal{G}}_{ij} : (a,b)\notin\mathcal{G}_k\}$. Obviously,
$$\{\tilde{\mathcal{G}}_{ij} \neq \mathcal{G}_k\} \subseteq W_1 \cup W_2.$$
Thus, it is sufficient to prove
$$\Pr(W_1) \to 0 \quad \text{and} \quad \Pr(W_2) \to 0.$$
For any $(a,b)\in W_1$, we have $(a,b)\in\mathcal{G}_k$, i.e., $\sigma_{ab} = \varsigma_k$, and
$$\{(a,b)\notin\tilde{\mathcal{G}}_{ij}\} = \big\{|\hat{\sigma}_{ij} - \hat{\sigma}_{ab}| \ge D_n\big\} \subseteq \big\{|\hat{\sigma}_{ij} - \varsigma_k| \ge \tfrac{1}{2}D_n\big\} \cup \big\{|\hat{\sigma}_{ab} - \varsigma_k| \ge \tfrac{1}{2}D_n\big\}.$$
For any $\sigma_{ab} = \varsigma_k$, if $X$ follows a normal distribution, by the Bernstein inequality we have
$$\Pr\Big(|\hat{\sigma}_{ab} - \varsigma_k| \ge \tfrac{1}{2}D_n\Big) \le \exp\Big(-\tfrac{1}{16}M\log(pn)\Big) = (pn)^{-M/16} \qquad \text{(A2)}$$
and
$$\Pr\Big(|\hat{\sigma}_{ij} - \hat{\sigma}_{ab}| \ge D_n\Big) \le \Pr\Big(|\varsigma_k - \hat{\sigma}_{ab}| \ge \tfrac{1}{2}D_n\Big) + \Pr\Big(|\varsigma_k - \hat{\sigma}_{ij}| \ge \tfrac{1}{2}D_n\Big) \le 2(pn)^{-M/16}.$$
Thus, as $n \to \infty$,
$$\Pr(W_1) \le \sum_{(a,b)\in\mathcal{G}_k}\Pr\{(a,b)\notin\tilde{\mathcal{G}}_{ij}\} \le 3(pn)^{-M/16}\,p^2 \to 0, \quad \text{if } M > 32. \qquad \text{(A3)}$$
Now, for any $(a,b)\in W_2$, we have $(a,b)\notin\mathcal{G}_k$ but instead $(a,b)\in\mathcal{G}_s$ for some $s \neq k$. By the assumption, we have $|\varsigma_k - \varsigma_s| > 2D_n$ when $k \neq s$ and $\log(pn)/n \to 0$. Thus,
$$\begin{aligned}
\{|\hat{\sigma}_{ij} - \hat{\sigma}_{ab}| \le D_n\} &= \{|(\hat{\sigma}_{ij} - \varsigma_k) + (\varsigma_s - \hat{\sigma}_{ab}) + (\varsigma_k - \varsigma_s)| \le D_n\} \\
&\subseteq \{|\varsigma_k - \varsigma_s| - |\hat{\sigma}_{ij} - \varsigma_k| - |\varsigma_s - \hat{\sigma}_{ab}| \le D_n\} \\
&= \{|\hat{\sigma}_{ij} - \varsigma_k| + |\varsigma_s - \hat{\sigma}_{ab}| \ge |\varsigma_k - \varsigma_s| - D_n\} \\
&\subseteq \{|\hat{\sigma}_{ij} - \varsigma_k| + |\varsigma_s - \hat{\sigma}_{ab}| \ge D_n\} \\
&\subseteq \{|\hat{\sigma}_{ij} - \varsigma_k| \ge \tfrac{1}{2}D_n\} \cup \{|\varsigma_s - \hat{\sigma}_{ab}| \ge \tfrac{1}{2}D_n\}.
\end{aligned}$$
Similar to (A2) and (A3), we have
$$\Pr(W_2) \le \sum_{(a,b)\notin\mathcal{G}_k}\Pr\{(a,b)\in\tilde{\mathcal{G}}_{ij}\} \le 3(pn)^{-M/16}\,p^2 \to 0, \quad \text{if } M > 32. \qquad \text{(A4)}$$
Thus, Lemma 2 follows from (A3) and (A4). □

Appendix C. Proof of Theorem 1

Proof. 
It follows immediately by applying Lemma 1 on the event $\bigcap_{1\le i<j\le p}\{\tilde{\mathcal{G}}_{ij} = \mathcal{G}_{ij}\}$, whose probability tends to one by Lemma 2. □

Appendix D. Proof of Corollary 1

Proof. 
For simplicity, we only consider the case with $EX = 0$ and $\hat{\sigma}_{ij} = n^{-1}\sum_{\ell=1}^{n} x_{\ell i}x_{\ell j}$. For any $\ell = 1, \ldots, n$, define $\pi_{\ell,ij} = x_{\ell i}x_{\ell j}$, and write $\pi_{ij} = x_i x_j$. Let $\mathcal{G}_1 = \{(i,j) : \sigma_{ij} = 0\}$. By the assumption, we have $\#\mathcal{G}_1 = p(p-1)(1 - o(1))$ and $\#\mathcal{G}_k = O(c(p))$ for $k = 2, \ldots, K$. Note that
$$V_k^2 = \mathrm{Var}\Big(\sum_{(i,j)\in\mathcal{G}_k}\pi_{\ell,ij}\Big/\sqrt{\#\mathcal{G}_k}\Big) \le \#\mathcal{G}_k \max_{(i,j)\in\mathcal{G}_k}\mathrm{Var}(\pi_{\ell,ij}) = O(1)\,\#\mathcal{G}_k.$$
Obviously, if we can prove that, for those $\sigma_{ij} = 0$,
$$V_1^2 = \mathrm{Var}\Big(\sum_{(i,j)\in\mathcal{G}_1}\pi_{ij}\Big/\sqrt{\#\mathcal{G}_1}\Big) = O(c(p)^2), \qquad \text{(A6)}$$
then
$$\sum_{k=1}^{K}\frac{n_{j,k}}{\sqrt{\#\mathcal{G}_k}}V_k \le \frac{p}{\sqrt{\#\mathcal{G}_1}}V_1 + \sum_{k=2}^{K}\frac{n_{j,k}}{\sqrt{\#\mathcal{G}_k}}V_k \le \frac{p}{\sqrt{\#\mathcal{G}_1}}O(c(p)) + \sum_{k=2}^{K}n_{j,k}\,O(1) = O(c(p)),$$
and thus Corollary 1 follows.
By the properties of the normal distribution, we have
$$\begin{aligned}
\mathrm{Cov}(\pi_{ij}, \pi_{ab}) &= E(x_i x_j x_a x_b) - \sigma_{ij}\sigma_{ab} \\
&= E(x_i x_j)E(x_a x_b) + E(x_i x_a)E(x_j x_b) + E(x_i x_b)E(x_j x_a) - \sigma_{ij}\sigma_{ab} \\
&= E(x_i x_a)E(x_j x_b) + E(x_i x_b)E(x_j x_a) = \sigma_{ia}\sigma_{jb} + \sigma_{ib}\sigma_{ja}.
\end{aligned} \qquad \text{(A7)}$$
Thus, by the assumption that $\#\{(i,j) : \sigma_{ij}\neq 0,\ 1\le i<j\le p\} = O(p\,c(p))$, it follows that
$$\#\{(i,j,s,t) : \mathrm{Cov}(x_{1i}x_{1j}, x_{1s}x_{1t}) \neq 0,\ i<j,\ s<t\} = O(p^2 c(p)^2).$$
Thus,
$$\mathrm{Var}\Big(\sum_{(i,j)\in\mathcal{G}_1}\pi_{ij}\Big/\sqrt{\#\mathcal{G}_1}\Big) = O\big(p^2 c(p)^2/p^2\big) = O(c(p)^2).$$
This completes the proof of (A6). □

Appendix E. Proof of Corollary 2

Proof. 
By (A7) and the assumption that $\sigma_{ij} = \varsigma_{|i-j|}$, we have
$$|\mathrm{Cov}(\pi_{ij}, \pi_{ab})| = O\big(\varsigma_{|i-a|}\varsigma_{|j-b|} + \varsigma_{|i-b|}\varsigma_{|j-a|}\big).$$
For $(i,j)$ and any $k > 0$, define
$$A_k = \{(a,b) : |i-a| = k \text{ and } |j-b| \le k\} \cup \{(a,b) : |i-a| \le k \text{ and } |j-b| = k\}.$$
It is easy to see that $\min(|i-a| + |j-b|,\ |i-b| + |j-a|) \ge k$ for any $(a,b)\in A_k$ and $\#A_k \le 8k$. Thus,
$$\varsigma_{|i-a|}\varsigma_{|j-b|} + \varsigma_{|i-b|}\varsigma_{|j-a|} = O(c^k), \quad \text{for any } (a,b)\in A_k,$$
and
$$\sum_{(a,b)}|\mathrm{Cov}(\pi_{ij}, \pi_{ab})| = \sum_{k=1}^{p}\sum_{(a,b)\in A_k}|\mathrm{Cov}(\pi_{ij}, \pi_{ab})| \le \sum_{k=1}^{p} 8k\,O(c^k) \le C_0,$$
where $C_0$ is a constant. Moreover, for any set $S$,
$$\mathrm{Var}\Big(\sum_{(a,b)\in S}\pi_{ab}\Big) \le C_0\,\#S. \qquad \text{(A9)}$$
Let $\Pi = x_1 x_1 + \cdots + x_p x_p$. By a similar calculation, we can show that
$$\mathrm{Var}(\Pi) = O(p).$$
Thus, we have
$$|\tilde{\varsigma} - \varsigma| = O_P\Big(\sqrt{\frac{\log(pn)}{p\,n}}\Big).$$
Let $I = \{k : |\varsigma_k - \varsigma_{k'}| > D_n \text{ for all } k' \neq k\}$, i.e., $I$ is the set of values in the matrix that can be identified by $D_n$. It can easily be seen that $\#I = O(\log(p))$, and thus
$$\sum_{k\in I}|\tilde{\varsigma}_k - \varsigma_k| = O_P\Big(\#I\sqrt{\frac{\log(pn)}{(p - \#I)\,n}}\Big) = O_P\Big(\sqrt{\frac{\log(pn)}{n}}\Big). \qquad \text{(A10)}$$
Let $II = \{k : |\varsigma_k| \le D_n/p^2\}$, i.e., $II$ collects all the values in the matrix that are very small in themselves. When $pn$ is large enough, $I \cap II = \emptyset$ and $\#II = p - O(\log(pn))$. For any $|\sigma_{ij}| \le D_n/p^2$, let
$$\tilde{\sigma}_{ij} = \frac{1}{\#\tilde{\mathcal{G}}_{ij}}\sum_{(s,t)\in\tilde{\mathcal{G}}_{ij}}\hat{\sigma}_{st}, \quad \text{and} \quad \bar{\sigma}_{ij} = \frac{1}{\#\tilde{\mathcal{G}}_{ij}}\sum_{(a,b)\in\tilde{\mathcal{G}}_{ij}}\sigma_{ab},$$
where $\tilde{\mathcal{G}}_{ij} = \{(a,b) : |\hat{\sigma}_{ij} - \hat{\sigma}_{ab}| < D_n\}$. It is easy to see that $\#\tilde{\mathcal{G}}_{ij} = p\,(p - O(\log(pn)))$, and by (A9),
$$\mathrm{Var}(\tilde{\sigma}_{ij}) = O\big(1/(np^2)\big).$$
Thus,
$$\max_{(i,j)\in\bigcup_{k\in II}\mathcal{G}_k}|\tilde{\sigma}_{ij} - \bar{\sigma}_{ij}| = O_P\Big(\sqrt{\frac{\log(pn)}{p^2 n}}\Big). \qquad \text{(A11)}$$
On the other hand, we can see that
$$\sum_{k\in II}|\varsigma_k| \le (D_n/p^2)\sum_{i=1}^{\infty}c^i = O(D_n/p^2),$$
and thus,
$$\max_{(i,j)\in\bigcup_{k\in II}\mathcal{G}_k}|\sigma_{ij} - \bar{\sigma}_{ij}| = O(D_n/p^2). \qquad \text{(A12)}$$
It follows from (A11) and (A12) that
$$\max_{(i,j)\in\bigcup_{k\in II}\mathcal{G}_k}|\tilde{\sigma}_{ij} - \sigma_{ij}| = O_P\Big(\sqrt{\frac{\log(pn)}{p^2 n}}\Big),$$
and
$$\sum_{k\in II}|\tilde{\varsigma}_k - \varsigma_k| = p\,O_P\Big(\sqrt{\frac{\log(pn)}{p^2 n}}\Big) = O_P\Big(\sqrt{\frac{\log(pn)}{n}}\Big). \qquad \text{(A13)}$$
Finally, let $III$ collect the remaining categories, $III = \{1, \ldots, K\}\setminus(I\cup II)$. Because $\varsigma_k = O(c^k)$, if $k \notin I$ we must have $c^k - c^{k+1} < c_1 D_n$ for some $c_1 > 0$, where $\sigma_{ij} = \varsigma_k$. Thus, $\sigma_{ij} = O(D_n)$ for any $(i,j)$ whose category belongs to $III$. It is easy to see that $\#III = O(\log(p))$. Let $\bar{\sigma}_{ij}$ be defined similarly as above. Because $|\hat{\sigma}_{ij} - \sigma_{ij}| = O_P(D_n)$ and, by definition,
$$\sup_{(a,b)\in\tilde{\mathcal{G}}_{ij}}|\hat{\sigma}_{ij} - \hat{\sigma}_{ab}| \le D_n,$$
we thus have
$$|\bar{\sigma}_{ij} - \sigma_{ij}| < 4D_n.$$
On the other hand, we have
$$\max_{(i,j)\in\bigcup_{k\in III}\mathcal{G}_k}|\tilde{\sigma}_{ij} - \bar{\sigma}_{ij}| = O(\log(p))\,O_P\Big(\sqrt{\frac{\log(pn)}{(\log p)^2\,n}}\Big) = O_P\Big(\sqrt{\frac{\log(pn)}{n}}\Big).$$
Similar to (A13), it follows from the above two equations that
$$\sum_{k\in III}|\tilde{\varsigma}_k - \varsigma_k| \le O_P(\log(p)\,D_n). \qquad \text{(A14)}$$
Corollary 2 then follows from (A1), (A10), (A13), and (A14). □

Appendix F. Proof of Corollary 3

Proof. 
It is easy to see that there are at most $M(M+3)/2$ different values in $\Sigma$, a number which remains fixed as $p \to \infty$. By Lemma 2, these elements can be consistently identified and estimated with root-$n$ consistency, i.e., for any $\delta > 0$, we can find a subset of the probability space $S_n$ with $\Pr(S_n) > 1 - \delta$ on which
$$\|\tilde{\Lambda}_M - \Lambda_M\| = O_P(n^{-1/2}), \qquad \|\tilde{\Phi}_M - \Phi_M\| = O_P(n^{-1/2}),$$
where $\tilde{\Lambda}_M = \mathrm{diag}(\hat{\lambda}_1, \ldots, \hat{\lambda}_M)$ and
$$\tilde{\Sigma} = \tilde{\Lambda}_M \otimes I_m + \tilde{\Phi}_M \otimes \Xi_m.$$
Let $\hat{\tau}_1, \ldots, \hat{\tau}_M$ be the $M$ eigenvalues of $\tilde{\Phi}_M$, with corresponding eigenvectors $\hat{\mu}_1, \ldots, \hat{\mu}_M$, respectively. Then, it is easy to check that the eigenvalues of $\tilde{\Phi}_M \otimes \Xi_m$ are $m\hat{\tau}_1, \ldots, m\hat{\tau}_M$, together with $0$ of $(m-1)M$ replications. Because $I_m$ is the identity matrix of the same dimension as $\Xi_m$, the eigenvalues of $\tilde{\Lambda}_M \otimes I_m + \tilde{\Phi}_M \otimes \Xi_m$ are $\hat{\lambda}_1 + m\hat{\tau}_1, \ldots, \hat{\lambda}_M + m\hat{\tau}_M$, together with $\hat{\lambda}_1$ of $(m-1)$ replications, …, and $\hat{\lambda}_M$ of $(m-1)$ replications.
Let the $M$ eigenvectors of $\Phi_M$ be $\mu_1, \ldots, \mu_M$, respectively. Note that the first eigenvector of $\Xi_m$ is $\nu_1 = (1, \ldots, 1)^\top/\sqrt{m}$. Thus, the first $M$ eigenvectors of $\tilde{\Lambda}_M \otimes I_m + \tilde{\Phi}_M \otimes \Xi_m$ are $\hat{\zeta}_k = \hat{\mu}_k \otimes \nu_1$, $k = 1, \ldots, M$, and those of $\Lambda_M \otimes I_m + \Phi_M \otimes \Xi_m$ are $\zeta_k = \mu_k \otimes \nu_1$, $k = 1, \ldots, M$, respectively. It is easy to verify that
$$\|\hat{\zeta}_k - \zeta_k\| = \|\hat{\mu}_k \otimes \nu_1 - \mu_k \otimes \nu_1\| = \|\hat{\mu}_k - \mu_k\| \times \|\nu_1\| = \|\hat{\mu}_k - \mu_k\| = O_P(n^{-1/2})$$
for $k = 1, \ldots, M$. □

References

1. Ledoit, O.; Wolf, M. The power of (non-) linear shrinking: A review and guide to covariance matrix estimation. J. Financ. Econ. 2022, 20, 187–218.
2. Zeileis, A. Econometric Computing with HC and HAC Covariance Matrix Estimators. J. Stat. Softw. 2004, 11, 1–17.
3. Ledoit, O.; Wolf, M. Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. Rev. Financ. Stud. 2017, 30, 4349–4388.
4. Schäfer, J.; Strimmer, K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Mol. 2005, 4, 1–32.
5. Tipping, M.E.; Bishop, C.M. Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B Stat. Method. 1999, 61, 611–622.
6. Fan, J.; Fan, Y.; Lv, J. High dimensional covariance matrix estimation using a factor model. J. Econom. 2008, 147, 186–197.
7. Driscoll, J.C.; Kraay, A.C. Consistent covariance matrix estimation with spatially dependent panel data. Rev. Econ. Stat. 1998, 80, 549–560.
8. Pourahmadi, M. Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika 2000, 87, 425–435.
9. Ledoit, O.; Wolf, M. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J. Empir. Financ. 2003, 10, 603–621.
10. Ledoit, O.; Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 2004, 88, 365–411.
11. Pepler, P.T.; Uys, D.W.; Nel, D.G. Regularized covariance matrix estimation under the common principal components model. Commun. Stat. Simul. Comput. 2018, 47, 631–643.
12. Bickel, P.J.; Levina, E. Regularized estimation of large covariance matrices. Ann. Stat. 2008, 36, 199–227.
13. El Karoui, N. Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Stat. 2008, 36, 2717–2756.
14. Rothman, A.J.; Levina, E.; Zhu, J. Generalized thresholding of large covariance matrices. J. Am. Stat. Assoc. 2009, 104, 177–186.
15. Cai, T.; Liu, W. Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 2011, 106, 672–684.
16. Cai, T.T.; Yuan, M. Adaptive covariance matrix estimation through block thresholding. Ann. Stat. 2012, 40, 2014–2042.
17. Lam, C.; Fan, J. Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Stat. 2009, 37, 4254–4278.
18. Rothman, A.J. Positive definite estimators of large covariance matrices. Biometrika 2012, 99, 733–740.
19. Fan, J.; Liao, Y.; Mincheva, M. Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B Stat. Method. 2013, 75, 603–680.
20. Onnela, J.P.; Chakraborti, A.; Kaski, K.; Kertesz, J.; Kanto, A. Dynamics of market correlations: Taxonomy and portfolio analysis. Phys. Rev. E 2003, 68, 056110.
21. Matteson, D.; Tsay, R.S. Multivariate Volatility Modeling: Brief Review and a New Approach; Manuscript, Booth School of Business, University of Chicago: Chicago, IL, USA, 2011.
22. Qian, J. Shrinkage Estimation of Nonlinear Models and Covariance Matrix. Doctoral Thesis, National University of Singapore, Department of Statistics and Applied Probability, Singapore, 2012. Available online: https://core.ac.uk/download/pdf/48656486.pdf (accessed on 27 March 2024).
23. Jiang, H.; Saart, P.W.; Xia, Y. Asymmetric conditional correlations in stock returns. Ann. Appl. Stat. 2016, 10, 989–1018.
24. Elton, E.J.; Gruber, M.J. Estimating the dependence structure of share prices–implications for portfolio selection. J. Financ. 1973, 28, 1203–1232.
25. Liu, S. Matrix results on the Khatri-Rao and Tracy-Singh products. Linear Algebra Appl. 1999, 289, 267–277.
26. Bollerslev, T. Modeling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model. Rev. Econ. Stat. 1990, 72, 498–505.
27. Hansen, P.R.; Lunde, A. A forecast comparison of volatility models: Does anything beat a GARCH(1,1)? J. Appl. Econ. 2005, 20, 873–889.
28. Aghamohammadi, A.; Dadashi, H.; Sojoudi, M.; Sojoudi, M.; Tavoosi, M. Optimal portfolio selection using quantile and composite quantile regression models. Commun. Stat. Simul. Comput. 2022, 1–11.
29. Zhang, Z.; Yue, M.; Huang, L.; Wang, Q.; Yang, B. Large portfolio allocation based on high-dimensional regression and Kendall’s Tau. Commun. Stat. Simul. Comput. 2023, 1–13.
30. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA, 2013.
Figure 1. Correlation coefficient matrix of the daily returns of 9 companies with stock symbols AIG, BA, BAC, GS, INTC, JPM, MS, PG, and WFC on the NYSE market.
Figure 2. Relative estimation error for the correlation matrix (left) and the covariance matrix (right) in Example 1 with sample sizes n = 100 and 200; the estimation is based on 500 replications per data point.
Figure 3. Relative estimation error for the correlation matrix (left) and the covariance matrix (right) in Example 2 with sample sizes n = 100 and 200; the estimation is based on 500 replications per data point.
Figure 4. Relative estimation error for the correlation matrix (left) and the covariance matrix (right) in Example 3 with sample sizes n = 100 and 200; the estimation is based on 500 replications per data point.
Figure 5. Portfolios of minimum risk based on different estimators of the covariance matrix.
Table 1. The average computational time (in seconds) over 500 replications of the ELA method, the adaptive thresholding method, the POET1 method, and the Rothman method for sample sizes n = (100, 200) and p = (100, 500, 900, 1400, 1900, 2300, 2500).

        Adaptive Thresholding    POET1               Rothman              ELA
p       n=100     n=200          n=100    n=200      n=100     n=200      n=100    n=200
100     0.005     0.006          0.222    0.815      0.012     0.013      0.034    0.037
500     0.370     0.409          0.806    2.042      0.828     0.830      0.863    0.930
900     2.176     2.207          2.657    5.163      4.680     4.910      2.849    3.131
1400    7.967     7.467          9.248    12.711     18.444    18.442     7.743    8.087
2300    44.901    43.974         34.241   42.228     81.292    81.834     20.525   23.054
2500    60.117    60.876         43.077   52.826     108.323   110.301    25.317   25.570
Citation: Yang, J. Element Aggregation for Estimation of High-Dimensional Covariance Matrices. Mathematics 2024, 12, 1045. https://doi.org/10.3390/math12071045