Estimation of Large-Dimensional Covariance Matrices via Second-Order Stein-Type Regularization

This paper tackles the problem of estimating the covariance matrix in large-dimension and small-sample-size scenarios. Inspired by the well-known linear shrinkage estimation, we propose a novel second-order Stein-type regularization strategy to generate well-conditioned covariance matrix estimators. We model the second-order Stein-type regularization as a quadratic polynomial in the sample covariance matrix and a given target matrix, which encodes prior information on the actual covariance structure. To obtain practical covariance matrix estimators, we choose the spherical and diagonal target matrices and develop unbiased estimates of the theoretical mean squared errors, which measure the distances between the actual covariance matrix and its estimators. We formulate the second-order Stein-type regularization as a convex optimization problem, which yields the optimal second-order Stein-type estimators. Numerical simulations reveal that the proposed estimators can significantly lower the Frobenius losses compared with the existing Stein-type estimators. Moreover, a real data analysis in portfolio selection verifies the performance of the proposed estimators.


Introduction
As a fundamental problem in modern multivariate statistics and various practical applications, estimating the covariance matrix of a large-dimensional random vector has attracted significant interest in the last two decades [1,2]. The traditional sample covariance matrix (SCM) becomes unstable and ill-conditioned when the dimension increases proportionally with the sample size. Algorithms that still employ the SCM as the covariance matrix estimator therefore suffer degraded performance or outright failure [3,4]. Although it remains unbiased, the SCM is no longer a satisfactory estimator of the actual covariance matrix [5,6]. Therefore, it is of great concern to develop well-conditioned covariance matrix estimators in large-dimensional scenarios [7][8][9].
A common approach to obtaining well-conditioned estimators is to improve the SCM [10]. In the Stein-type regularization (linear shrinkage estimation), a target matrix is preset by rectifying the SCM according to prior information on the covariance structure. For example, the spherical target is a scalar matrix whose coefficient is the average of the diagonal elements of the SCM [11]. The diagonal target is a diagonal matrix which retains the diagonal elements of the SCM [12,13]. Moreover, the Toeplitz-structured target is formed by averaging each diagonal of the SCM [14]. To some extent, the target matrices are themselves covariance matrix estimators: despite being biased, they usually enjoy low variances. The Stein-type estimator combines the SCM and the target matrix to balance the bias and the variance. For the spherical target matrix, this method generates a well-conditioned estimator by retaining the sample eigenvectors and shrinking the sample eigenvalues toward their grand mean. Moreover, the Stein-type regularization can be interpreted as a weighted average of the SCM and the target matrix. Many Stein-type estimators have been developed for various target matrices. Additionally, it is worth mentioning that the optimal Stein-type estimator can be expressed in closed form and significantly outperforms the traditional SCM under appropriate criteria [15][16][17].
In addition, the nonlinear shrinkage estimation has been proposed based on random matrix theory [18]. Taking the spectral decomposition of the SCM, this method retains the sample eigenvectors and estimates the actual eigenvalues by applying a nonlinear transformation to the sample eigenvalues. The nonlinear shrinkage estimator is then obtained by assembling the estimated eigenvalues and the sample eigenvectors. In the mean squared error (MSE) sense, the resulting nonlinear shrinkage estimator enjoys a significant advantage over the SCM. It also outperforms the Stein-type estimator for the spherical target. However, both the sample eigenvalues and the sample eigenvectors have serious deficiencies in the high-dimensional case [19]. Hence, the existing nonlinear shrinkage strategy, which modifies the sample eigenvalues while retaining the sample eigenvectors, has some limitations in improving the SCM. Moreover, the method can hardly exploit the prior structural information employed in the Stein-type regularization. Hence, developing a new nonlinear shrinkage technique is essential to generate better-performing covariance matrix estimators.
This paper combines the SCM and the target matrix via a nonlinear shrinkage strategy to obtain well-conditioned estimators of a large-dimensional covariance matrix. The main contributions are the following:
1. The second-order Stein-type estimator is modeled as a quadratic polynomial in the SCM and an almost surely (a.s.) positive definite target matrix. For the spherical and diagonal target matrices, the MSEs between the second-order Stein-type estimator and the actual covariance matrix are unbiasedly estimated under the Gaussian distribution.
2. We formulate the second-order Stein-type estimation for the two target matrices as convex quadratic programming problems, from which the optimal second-order Stein-type estimators are immediately obtained.
3. Numerical simulations and application examples are provided to compare the proposed second-order Stein-type estimators with the existing linear and nonlinear shrinkage estimators.
The outline of this paper is as follows. Section 2 proposes the second-order Stein-type estimator based on the Stein-type regularization. In Section 3, the spherical and diagonal matrices are employed as the target matrices. We obtain unbiased estimates of the MSEs between the second-order Stein-type estimators and the actual covariance matrix, and the optimal second-order Stein-type estimators are obtained by solving the corresponding optimization problems. Section 4 provides numerical simulations and two application examples to examine the performance of the proposed estimators in large-dimensional scenarios. Section 5 concludes the paper.

Notation, Motivation, and Formulation
The symbol $\mathbf{R}^p$ denotes the set of all $p$-dimensional real column vectors, $\mathbf{R}^{m \times n}$ denotes the set of all $m \times n$ real matrices, and $\mathbf{S}^p$ denotes the set of all $p \times p$ real symmetric matrices. The bold symbol $\mathbf{E}$ denotes the square matrix with all entries equal to 1 and appropriate dimensions. The symbol $I_p$ denotes the $p \times p$ identity matrix. For a matrix $A$, the symbols $A^T$, $\mathrm{tr}(A)$, and $\|A\|$ denote its transpose, trace, and Frobenius norm, respectively. For two matrices $A$ and $B$, $A \circ B$ denotes their Hadamard (element-wise) product.
Assume that $x_1, x_2, \ldots, x_n \in \mathbf{R}^p$ is an independent and identically distributed (i.i.d.) sample drawn from a certain distribution with mean $0$ and covariance matrix $\Sigma$. The SCM $S$ is defined by
$$S = \frac{1}{n}\sum_{m=1}^{n} x_m x_m^T. \tag{1}$$
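For readers who wish to experiment, the SCM in (1) takes only a few lines; the following NumPy sketch (our own illustration, with hypothetical variable names) assumes the mean-zero observations are stored as the rows of an $n \times p$ array:

```python
import numpy as np

def sample_covariance(X):
    """SCM (1): S = (1/n) * sum_m x_m x_m^T, for mean-zero rows x_m of X."""
    n = X.shape[0]
    return X.T @ X / n  # no mean subtraction, matching the mean-zero assumption
```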
As is widely known, the SCM is ill-conditioned in large-dimensional scenarios and is even singular when $p > n$. The Stein-type regularization can produce a well-conditioned covariance matrix estimator based on the SCM [20][21][22][23].
For an a.s. positive definite target matrix $T$ representing the prior information on the covariance structure, the Stein-type estimator combines the SCM and the target matrix through a linear function
$$\hat{\Sigma}_{\mathrm{LS}} = (1 - w)S + wT. \tag{2}$$
Through an equivalent transformation, the expression given by (2) can be rewritten as
$$\hat{\Sigma}_{\mathrm{LS}} = S + w(T - S), \tag{3}$$
where $T - S$ is a regularization term and $w$ is a tuning parameter. Moreover, the tuning parameter $w$ lies in $(0, 1]$ so as to keep the Stein-type estimator a.s. positive definite even when $n < p$. An interesting fact is that the matrix $(T - S)^2$ is still symmetric and positive definite, which motivates us to further consider a quadratic function of the SCM and the target matrix.
For an a.s. positive definite target matrix $T$, we model the second-order Stein-type estimator of the covariance matrix $\Sigma$ as
$$\hat{\Sigma} = S + w_1 (T - S) + w_2 (T - S)^2, \tag{4}$$
where $w_1, w_2$ are the tuning parameters. It is easy to see that $\hat{\Sigma} \in \mathbf{S}^p$. In the same manner, we further assume $w_1 \in (0, 1]$ and $w_2 \ge 0$ to keep the covariance estimator given by (4) a.s. positive definite. We note that this constraint on the tuning parameters is an easy-to-implement condition but is not necessary. One can also consider alternative assumptions, such as a condition number constraint, to obtain positive definite estimators of the large-dimensional covariance matrix [24].

Next, we choose the optimal tuning parameters in (4). In the Stein-type regularization, the MSE between the actual covariance matrix and its estimator is the most commonly used loss function. It involves unknown scalars concerning the expectation operator and the actual covariance matrix. One practical way is to make the MSE available by estimating the unknown scalars and then to obtain the optimal shrinkage intensity by minimizing the available MSE [15]. We follow the same steps to find the optimal tuning parameters in the second-order Stein-type estimator. To be specific, the loss function of the second-order Stein-type estimator $\hat{\Sigma}$ is defined by
$$M_T(w \mid \Sigma) = \mathrm{E}\big\|\hat{\Sigma} - \Sigma\big\|^2, \tag{5}$$
where $w = (w_1, w_2)^T$. Substituting the expression of $\hat{\Sigma}$ into (5), we obtain
$$M_T(w \mid \Sigma) = w^T H(T)\, w - 2\, b(T)^T w + c, \tag{6}$$
where
$$H(T) = \begin{pmatrix} \mathrm{E}\|T - S\|^2 & \mathrm{E}\,\mathrm{tr}\big[(T - S)^3\big] \\ \mathrm{E}\,\mathrm{tr}\big[(T - S)^3\big] & \mathrm{E}\big\|(T - S)^2\big\|^2 \end{pmatrix}, \tag{7}$$
$$b(T) = \begin{pmatrix} \mathrm{E}\,\mathrm{tr}\big[(T - S)(\Sigma - S)\big] \\ \mathrm{E}\,\mathrm{tr}\big[(T - S)^2(\Sigma - S)\big] \end{pmatrix}, \tag{8}$$
$$c = \mathrm{E}\|S - \Sigma\|^2. \tag{9}$$
Therefore, the second-order Stein-type regularization can be modeled as the following optimization problem:
$$\min_{w}\; w^T H(T)\, w - 2\, b(T)^T w + c \quad \text{s.t.}\quad 0 < e_1^T w \le 1,\; e_2^T w \ge 0, \tag{10}$$
where $e_1 = (1, 0)^T$ and $e_2 = (0, 1)^T$. It is easy to see that the loss function in problem (10) is a bivariate quadratic polynomial in $w$, with Hessian matrix $H(T)$. By the Cauchy-Schwarz inequality, we have
$$\Big(\mathrm{E}\,\mathrm{tr}\big[(T - S)^3\big]\Big)^2 < \mathrm{E}\|T - S\|^2 \cdot \mathrm{E}\big\|(T - S)^2\big\|^2. \tag{11}$$
Therefore, the Hessian matrix $H(T)$ is positive definite, and the optimization problem (10) is a convex quadratic program. However, it cannot be solved directly because the quantities $H(T)$, $b(T)$, and $c$ in the objective function are unknown. When the underlying distribution is given and the target matrix is prespecified, we can estimate $H(T)$, $b(T)$, and $c$. The optimization problem (10) then becomes available through the plug-in strategy and can be solved effectively.
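To make the construction concrete, the following sketch (our illustration, not the authors' code) evaluates the estimator (4) and the quadratic loss (6) for given tuning parameters; the inputs H and b are assumed to be supplied by the plug-in estimates developed below:

```python
import numpy as np

def second_order_stein(S, T, w1, w2):
    """Second-order Stein-type estimator (4):
    Sigma_hat = S + w1 (T - S) + w2 (T - S)^2,
    with w1 in (0, 1] and w2 >= 0 ensuring a.s. positive definiteness."""
    D = T - S
    return S + w1 * D + w2 * (D @ D)

def quadratic_loss(w, H, b, c=0.0):
    """Quadratic loss (6): M(w) = w^T H w - 2 b^T w + c, with w = (w1, w2)."""
    w = np.asarray(w, dtype=float)
    return float(w @ H @ w - 2.0 * b @ w + c)
```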

Remark 1.
The unknown quantity $c$ does not affect the choice of the optimal tuning parameters $w$ in the optimization problem (10). Moreover, $c$ is the theoretical MSE between the actual covariance matrix and the classic SCM, and it plays an important role in evaluating the performance of SCM-based improved covariance estimators.

Optimal Second-Order Stein-Type Estimators
In this section, once the target matrix is specified, we estimate the corresponding unknown quantities under the Gaussian distribution and then establish the available version of the optimization problem to obtain the optimal second-order Stein-type estimator.

Target Matrices
As mentioned before, the target matrix represents the prior information on the actual covariance structure. In the Stein-type regularization, the commonly used target matrices include the spherical target, the diagonal target, the Toeplitz-structured target, and the tapered SCM. Among these, the spherical target and the diagonal target are a.s. positive definite, whereas the Toeplitz-structured target and the tapered SCM are not necessarily so. Thereby, in the second-order Stein-type regularization, we employ the spherical target and the diagonal target, given by
$$T_1 = \frac{\mathrm{tr}(S)}{p}\, I_p, \qquad T_2 = \mathrm{diag}(S_{11}, \ldots, S_{pp}). \tag{12}$$
The diagonal target $T_2$ is also denoted by $D_S$ because it consists of the diagonal elements of the SCM.
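In code, the two targets in (12) are immediate (a sketch under the same conventions as above):

```python
import numpy as np

def spherical_target(S):
    """T1 in (12): scalar matrix whose coefficient is the average sample variance."""
    p = S.shape[0]
    return (np.trace(S) / p) * np.eye(p)

def diagonal_target(S):
    """T2 = D_S in (12): keeps only the diagonal elements of the SCM."""
    return np.diag(np.diag(S))
```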

Available Loss Functions
For the target matrices $T_1$ and $T_2$, we unbiasedly estimate the loss functions given by (6) by plugging in the estimates of the unknown quantities $H(T_i)$, $b(T_i)$, $i = 1, 2$, and $c$ under the Gaussian distribution.
First of all, by directly removing the expectation operator, the Hessian matrices $H(T_i)$ can be estimated by
$$\hat{H}(T_i) = \begin{pmatrix} \|T_i - S\|^2 & \mathrm{tr}\big[(T_i - S)^3\big] \\ \mathrm{tr}\big[(T_i - S)^3\big] & \big\|(T_i - S)^2\big\|^2 \end{pmatrix}, \quad i = 1, 2. \tag{13}$$
Furthermore, $\hat{H}(T_i)$, $i = 1, 2$ are, respectively, unbiased estimates of $H(T_i)$, $i = 1, 2$.
Next, for $i = 1, 2$, we decompose the unknown vectors $b(T_i)$ into two terms,
$$b(T_i) = u(T_i) + v(T_i), \quad u(T_i) = -\begin{pmatrix} \mathrm{E}\,\mathrm{tr}\big[(T_i - S)S\big] \\ \mathrm{E}\,\mathrm{tr}\big[(T_i - S)^2 S\big] \end{pmatrix}, \quad v(T_i) = \begin{pmatrix} \mathrm{E}\,\mathrm{tr}\big[(T_i - S)\Sigma\big] \\ \mathrm{E}\,\mathrm{tr}\big[(T_i - S)^2 \Sigma\big] \end{pmatrix}. \tag{14}$$
Similar to the Hessian matrices $H(T_1)$ and $H(T_2)$, the first term $u(T_i)$ can be unbiasedly estimated by
$$\hat{u}(T_i) = -\begin{pmatrix} \mathrm{tr}\big[(T_i - S)S\big] \\ \mathrm{tr}\big[(T_i - S)^2 S\big] \end{pmatrix}, \quad i = 1, 2. \tag{15}$$
Therefore, we only need to estimate the second term $v(T_i)$. It is challenging to estimate $v(T_i)$ unbiasedly because it involves both the expectation operator and the actual covariance matrix $\Sigma$. We need the following moment properties of the Wishart distribution [25].

Lemma 1.
Let $A$ and $B$ be arbitrary symmetric nonrandom matrices and let $S$ be the sample covariance matrix given by (1). Then the following equalities hold under the Gaussian distribution:
$$\mathrm{E}\big[\mathrm{tr}(AS)\,\mathrm{tr}(BS)\big] = \mathrm{tr}(A\Sigma)\,\mathrm{tr}(B\Sigma) + \frac{2}{n}\,\mathrm{tr}(A\Sigma B\Sigma),$$
$$\mathrm{E}\big[\mathrm{tr}(ASBS)\big] = \frac{n+1}{n}\,\mathrm{tr}(A\Sigma B\Sigma) + \frac{1}{n}\,\mathrm{tr}(A\Sigma)\,\mathrm{tr}(B\Sigma).$$

By Lemma 1, letting $A = B = I_p$, we can obtain
$$\mathrm{E}\big[\mathrm{tr}^2(S)\big] = \mathrm{tr}^2(\Sigma) + \frac{2}{n}\,\mathrm{tr}(\Sigma^2), \qquad \mathrm{E}\big[\mathrm{tr}(S^2)\big] = \frac{n+1}{n}\,\mathrm{tr}(\Sigma^2) + \frac{1}{n}\,\mathrm{tr}^2(\Sigma).$$
Moreover, letting $A = \Sigma$ and $B = I_p$, we have
$$\mathrm{E}\big[\mathrm{tr}(S)\,\mathrm{tr}(\Sigma S)\big] = \mathrm{tr}(\Sigma)\,\mathrm{tr}(\Sigma^2) + \frac{2}{n}\,\mathrm{tr}(\Sigma^3), \qquad \mathrm{E}\big[\mathrm{tr}(\Sigma S^2)\big] = \frac{n+1}{n}\,\mathrm{tr}(\Sigma^3) + \frac{1}{n}\,\mathrm{tr}(\Sigma)\,\mathrm{tr}(\Sigma^2).$$
Lemma 1 is very helpful for computing the second term $v(T_i)$ in (14). For the spherical target matrix $T_1$, we have
$$\mathrm{E}\,\mathrm{tr}\big[(T_1 - S)\Sigma\big] = \frac{1}{p}\,\mathrm{tr}^2(\Sigma) - \mathrm{tr}(\Sigma^2)$$
and
$$\mathrm{E}\,\mathrm{tr}\big[(T_1 - S)^2\Sigma\big] = \frac{\mathrm{tr}(\Sigma)}{p^2}\,\mathrm{E}\big[\mathrm{tr}^2(S)\big] - \frac{2}{p}\,\mathrm{E}\big[\mathrm{tr}(S)\,\mathrm{tr}(\Sigma S)\big] + \mathrm{E}\big[\mathrm{tr}(\Sigma S^2)\big].$$
For the diagonal target matrix $T_2$, we have
$$\mathrm{E}\,\mathrm{tr}\big[(T_2 - S)\Sigma\big] = \mathrm{tr}(D_\Sigma \Sigma) - \mathrm{tr}(\Sigma^2),$$
where $D_\Sigma = \mathrm{diag}(\Sigma_{11}, \ldots, \Sigma_{pp})$. Moreover, we can obtain
$$\mathrm{E}\,\mathrm{tr}\big[(T_2 - S)^2\Sigma\big] = \frac{n+2}{n}\,\mathrm{tr}(D_\Sigma^2 \Sigma) - \frac{2(n+2)}{n}\,\mathrm{tr}(D_\Sigma \Sigma^2) + \frac{n+1}{n}\,\mathrm{tr}(\Sigma^3) + \frac{1}{n}\,\mathrm{tr}(\Sigma)\,\mathrm{tr}(\Sigma^2).$$
Denote
$$a_1 = \mathrm{tr}(\Sigma),\; a_2 = \mathrm{tr}(\Sigma^2),\; a_3 = \mathrm{tr}^2(\Sigma),\; b_1 = \mathrm{tr}(\Sigma^3),\; b_2 = \mathrm{tr}^3(\Sigma),\; b_3 = \mathrm{tr}(\Sigma)\,\mathrm{tr}(\Sigma^2),\; d = \mathrm{tr}(D_\Sigma \Sigma),\; c_1 = \mathrm{tr}(D_\Sigma^2 \Sigma),\; c_2 = \mathrm{tr}(D_\Sigma \Sigma^2); \tag{27}$$
then the vectors $v(T_i)$, $i = 1, 2$ can be rewritten as
$$v(T_1) = \begin{pmatrix} \frac{1}{p}\, a_3 - a_2 \\[4pt] \frac{1}{p^2}\, b_2 + \Big(\frac{2}{np^2} + \frac{1}{n} - \frac{2}{p}\Big) b_3 + \Big(\frac{n+1}{n} - \frac{4}{np}\Big) b_1 \end{pmatrix}, \tag{28}$$
$$v(T_2) = \begin{pmatrix} d - a_2 \\[4pt] \frac{n+2}{n}\, c_1 - \frac{2(n+2)}{n}\, c_2 + \frac{n+1}{n}\, b_1 + \frac{1}{n}\, b_3 \end{pmatrix}. \tag{29}$$
It is worth noting that each element of $v(T_i)$, $i = 1, 2$ is a linear combination of the quantities in (27). The quantity $d$ admits the immediate unbiased estimate $\frac{n}{n+2}\,\mathrm{tr}(D_S^2)$, since $\mathrm{E}[S_{ii}^2] = \frac{n+2}{n}\,\Sigma_{ii}^2$. Therefore, we only need to find estimates of the remaining quantities in (27).
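Before proceeding, the moment identities of Lemma 1 can be checked numerically. The following Monte Carlo sketch (added for illustration; the parameter values are arbitrary) compares the empirical mean of $\mathrm{tr}(S^2)$ with the Lemma 1 prediction for $A = B = I_p$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 5, 20, 20_000
Sigma = np.diag(np.arange(1.0, p + 1.0))   # a simple positive definite Sigma
F = np.sqrt(Sigma)                          # its symmetric square root (diagonal here)

vals = []
for _ in range(reps):
    X = rng.standard_normal((n, p)) @ F     # rows are i.i.d. N(0, Sigma)
    S = X.T @ X / n                         # SCM (1)
    vals.append(np.trace(S @ S))

theory = (n + 1) / n * np.trace(Sigma @ Sigma) + np.trace(Sigma) ** 2 / n
print(np.mean(vals), theory)                # the two values should nearly agree
```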
Firstly, the unbiased estimates of the quantities $a_i$, $i = 1, 2, 3$ were proposed in [12,26,27]:
$$\hat{a}_1 = \mathrm{tr}(S), \qquad \hat{a}_2 = \tau_a\big[n\,\mathrm{tr}(S^2) - \mathrm{tr}^2(S)\big], \qquad \hat{a}_3 = \tau_a\big[(n+1)\,\mathrm{tr}^2(S) - 2\,\mathrm{tr}(S^2)\big],$$
where $\tau_a = \frac{n}{(n-1)(n+2)}$.
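A direct implementation of these estimates might read as follows (a sketch; the exact published forms should be taken from [12,26,27]):

```python
import numpy as np

def estimate_a(S, n):
    """Unbiased estimates of a1 = tr(Sigma), a2 = tr(Sigma^2), a3 = tr^2(Sigma)
    under Gaussian sampling, using tau_a = n / ((n - 1)(n + 2))."""
    tau_a = n / ((n - 1) * (n + 2))
    t1 = np.trace(S)
    t2 = np.trace(S @ S)
    return t1, tau_a * (n * t2 - t1 ** 2), tau_a * ((n + 1) * t1 ** 2 - 2 * t2)
```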

Secondly, the unbiased estimates of $b_i$, $i = 1, 2, 3$ can be obtained from the following theorem.

Theorem 1.
Under the Gaussian distribution, the following equations hold when $n \ge 3$:
$$\mathrm{E}\big[\mathrm{tr}(S^3)\big] = \frac{1}{n^2}\Big[(n^2 + 3n + 4)\, b_1 + 3(n+1)\, b_3 + b_2\Big],$$
$$\mathrm{E}\big[\mathrm{tr}(S)\,\mathrm{tr}(S^2)\big] = \frac{1}{n^2}\Big[4(n+1)\, b_1 + (n^2 + n + 4)\, b_3 + n\, b_2\Big],$$
$$\mathrm{E}\big[\mathrm{tr}^3(S)\big] = \frac{1}{n^2}\Big[8\, b_1 + 6n\, b_3 + n^2\, b_2\Big].$$

Proof. The actual covariance matrix $\Sigma$ has the spectral decomposition $\Sigma = \Gamma^T \Lambda \Gamma$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ and $\Gamma$ is orthogonal. Let $F$ be the symmetric square root of $\Sigma$ and let $z_m = F^{-1} x_m$, so that $z_1, \ldots, z_n$ are i.i.d. $N(0, I_p)$. Let $X = (x_1, \ldots, x_n)$ and $Z = (z_1, \ldots, z_n)$; then we have $X = FZ$. Notice that the SCM is $S = \frac{1}{n}\sum_{m=1}^{n} x_m x_m^T$; therefore, we have $nS = XX^T = FZZ^TF$. Moreover, we can obtain $\mathrm{tr}(nS) = \mathrm{tr}(FZZ^TF) = \mathrm{tr}(Z^T\Sigma Z) = \mathrm{tr}(Z^T\Gamma^T\Lambda\Gamma Z)$.
Define a matrix $Q^T = Z^T\Gamma^T = (q_1, \ldots, q_p)$ and denote $v_{ii} = q_i^T q_i$ and $v_{ij} = q_i^T q_j = q_j^T q_i$ for $i, j \in \{1, \ldots, p\}$; then the above equation can be rewritten as
$$\mathrm{tr}(nS) = \sum_{i=1}^{p} \lambda_i v_{ii}.$$
In the same manner, we can obtain
$$\mathrm{tr}\big[(nS)^2\big] = \sum_{i,j=1}^{p} \lambda_i \lambda_j v_{ij}^2$$
and
$$\mathrm{tr}\big[(nS)^3\big] = \sum_{i,j,k=1}^{p} \lambda_i \lambda_j \lambda_k\, v_{ij} v_{jk} v_{ki}.$$
By the moment properties of the random variables $v_{ii}$ and $v_{ij}$ in [26,28], we have
$$\mathrm{E}[v_{ii}] = n,\quad \mathrm{E}[v_{ii}^2] = n(n+2),\quad \mathrm{E}[v_{ii}^3] = n(n+2)(n+4),\quad \mathrm{E}[v_{ij}^2] = n,\quad \mathrm{E}[v_{ii} v_{ij}^2] = n(n+2),\quad \mathrm{E}[v_{ij} v_{jk} v_{ki}] = n,$$
where $i, j, k$ are arbitrary mutually unequal numbers in $\{1, \ldots, p\}$. The theorem follows by computing the mathematical expectations of the three trace identities above, term by term, based on these moment properties.
Then, the unknown scalars $b_i$, $i = 1, 2, 3$ can be unbiasedly estimated by solving the resulting linear system, i.e., by the corresponding linear combinations of $\mathrm{tr}(S^3)$, $\mathrm{tr}(S)\,\mathrm{tr}(S^2)$, and $\mathrm{tr}^3(S)$.

Remark 2.
In a large-sample scenario, the unknown scalars $b_i$, $i = 1, 2, 3$ can be consistently estimated by $\mathrm{tr}(S^3)$, $\mathrm{tr}^3(S)$, and $\mathrm{tr}(S)\,\mathrm{tr}(S^2)$ [29]. Theorem 1 shows that these estimates are biased, and the biases become non-negligible in high-dimensional situations [30]. Furthermore, by Theorem 1, the biases can be eliminated by linear combinations of $\mathrm{tr}(S^3)$, $\mathrm{tr}^3(S)$, and $\mathrm{tr}(S)\,\mathrm{tr}(S^2)$.
Thirdly, denote the matrices $G$, $R$, and $K$ in terms of $x_i$, the vector of the $n$ observations of the $i$-th variable, for $i = 1, \ldots, p$; then the following theorem holds.

Theorem 2.
Under the Gaussian distribution, the unknown scalars $c_1$ and $c_2$ can be unbiasedly estimated in terms of the matrices $G$, $R$, and $K$.

Proof.
Let $F = (f_{ij})$ be the symmetric square root of $\Sigma$, so that $F^2 = \Sigma = (\sigma_{ij})$. Therefore, for arbitrary $i, j \in \{1, \ldots, p\}$, the equalities $\sigma_{ij} = \sum_{k=1}^{p} f_{ik} f_{jk}$ and $\sigma_{ii} = \sum_{k=1}^{p} f_{ik}^2$ hold. For $m = 1, \ldots, n$, denote $x_m = (x_{m1}, x_{m2}, \ldots, x_{mp})^T$ and $z_m = F^{-1} x_m$; then $z_1, \ldots, z_n$ is an i.i.d. sample with $z_m \sim N(0, I_p)$. Let $z_m = (z_{m1}, z_{m2}, \ldots, z_{mp})^T$; then $z_{mk}$, $m = 1, \ldots, n$, $k = 1, \ldots, p$ are mutually independent standard Gaussian random variables. For arbitrary $m \in \{1, \ldots, n\}$ and $i, j \in \{1, \ldots, p\}$, we have $x_{mi} = \sum_{k=1}^{p} f_{ik} z_{mk}$ and $x_{mj} = \sum_{k=1}^{p} f_{jk} z_{mk}$. Writing the SCM as $S = (s_{ij})$, each entry can be decomposed as
$$s_{ij} = \frac{1}{n}\sum_{m=1}^{n} x_{mi} x_{mj}.$$
Then, for arbitrary $m \in \{1, \ldots, n\}$ and $i, j \in \{1, \ldots, p\}$, we have $\mathrm{E}[x_{mi} x_{mj}] = \sigma_{ij}$; in particular, when $i = j$, $\mathrm{E}[x_{mi}^2] = \sigma_{ii}$. The required expectations of the statistics built from $G$, $R$, and $K$ then follow from the mutual independence of the $z_{mk}$.

Then, the unknown scalars $c_1$ and $c_2$ can be unbiasedly estimated accordingly. By plugging the estimates of the quantities in (27) into (28) and (29), the unbiased estimates $\hat{v}(T_1)$ and $\hat{v}(T_2)$ are obtained. Combining (15) with $\hat{v}(T_1)$ and $\hat{v}(T_2)$, the unbiased estimates of the vectors $b(T_i)$, $i = 1, 2$ are given by
$$\hat{b}(T_i) = \hat{u}(T_i) + \hat{v}(T_i), \quad i = 1, 2.$$
In addition, the constant $c$ in (10) can be further calculated under the Gaussian distribution:
$$c = \mathrm{E}\|S - \Sigma\|^2 = \frac{1}{n}\Big[\mathrm{tr}(\Sigma^2) + \mathrm{tr}^2(\Sigma)\Big] = \frac{1}{n}(a_2 + a_3).$$
Therefore, the unbiased estimate of $c$ is given by
$$\hat{c} = \frac{1}{n}\big(\hat{a}_2 + \hat{a}_3\big).$$
To sum up, the unbiased estimates of the loss functions $M_{T_i}(w \mid \Sigma)$ are given by
$$\hat{M}_{T_i}(w) = w^T \hat{H}(T_i)\, w - 2\, \hat{b}(T_i)^T w + \hat{c}, \quad i = 1, 2. \tag{69}$$
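Assembling the pieces, a hypothetical plug-in routine for the available loss (69) could look as follows (names and interfaces are ours; b_hat and c_hat are assumed to be computed as described above):

```python
import numpy as np

def plug_in_loss(S, T, b_hat, c_hat):
    """Available loss (69): returns H_hat from (13) and the function
    w -> w^T H_hat w - 2 b_hat^T w + c_hat."""
    D = T - S
    D2 = D @ D
    H_hat = np.array([[np.sum(D * D),    np.trace(D @ D2)],
                      [np.trace(D @ D2), np.sum(D2 * D2)]])
    loss = lambda w: float(np.asarray(w) @ H_hat @ np.asarray(w)
                           - 2.0 * np.dot(b_hat, w) + c_hat)
    return H_hat, loss
```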

Optimal Second-Order Stein-Type Estimators
For the target matrices $T_i$, $i = 1, 2$, replacing the objective function in (10) with its unbiased estimate given by (69), we formulate the second-order Stein-type estimation as the following optimization problems:
$$\min_{w}\; w^T \hat{H}(T_i)\, w - 2\, \hat{b}(T_i)^T w + \hat{c} \quad \text{s.t.}\quad 0 < e_1^T w \le 1,\; e_2^T w \ge 0, \qquad i = 1, 2. \tag{70}$$
For $i = 1, 2$, the Hessian matrix of the objective function in (70) is $\hat{H}(T_i)$. By the inequality
$$\Big(\mathrm{tr}\big[(T_i - S)^3\big]\Big)^2 < \|T_i - S\|^2 \cdot \big\|(T_i - S)^2\big\|^2,$$
we can conclude that $\hat{H}(T_i)$ is positive definite. Therefore, the optimization problem (70) is a convex quadratic program, and the globally optimal solution can be obtained by an efficient algorithm. For the target matrices $T_i$, $i = 1, 2$, denoting the corresponding optimal tuning parameters by $w^i = (w_1^i, w_2^i)^T$, the optimal second-order Stein-type estimators can be expressed as
$$\hat{\Sigma}_i = S + w_1^i (T_i - S) + w_2^i (T_i - S)^2, \quad i = 1, 2. \tag{72}$$

Remark 3.
The proposed second-order Stein-type estimators are both well conditioned. Moreover, by taking the spectral decomposition, the SCM can be expressed as
$$S = U \Delta U^T,$$
where $\Delta = \mathrm{diag}(\delta_1, \ldots, \delta_p)$ is a diagonal matrix consisting of the sample eigenvalues and $U$ is the corresponding orthogonal matrix. Then, the second-order Stein-type estimator $\hat{\Sigma}_1$ has the following spectral decomposition:
$$\hat{\Sigma}_1 = U \hat{\Delta} U^T,$$
where $\hat{\Delta} = \mathrm{diag}(\hat{\delta}_1, \ldots, \hat{\delta}_p)$ with
$$\hat{\delta}_j = \delta_j + w_1^1 (\theta - \delta_j) + w_2^1 (\theta - \delta_j)^2, \quad j = 1, \ldots, p,$$
where $\theta = \frac{\mathrm{tr}(S)}{p}$ is the mean of the sample eigenvalues. Therefore, the proposed estimator $\hat{\Sigma}_1$ shrinks the sample eigenvalues by a nonlinear transformation while retaining the sample eigenvectors.
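An end-to-end sketch of problem (70) with a generic bound-constrained solver (our choice of solver; the paper only notes that the convex program can be solved efficiently):

```python
import numpy as np
from scipy.optimize import minimize

def optimal_second_order_stein(S, T, b_hat, c_hat=0.0):
    """Solve (70) over w1 in (0, 1], w2 >= 0, then form the estimator (72)."""
    D = T - S
    D2 = D @ D
    H = np.array([[np.sum(D * D),    np.trace(D @ D2)],
                  [np.trace(D @ D2), np.sum(D2 * D2)]])
    obj = lambda w: float(w @ H @ w - 2.0 * b_hat @ w + c_hat)
    grad = lambda w: 2.0 * (H @ w - b_hat)
    res = minimize(obj, x0=np.array([0.5, 0.0]), jac=grad,
                   method="L-BFGS-B", bounds=[(1e-8, 1.0), (0.0, None)])
    w1, w2 = res.x
    return S + w1 * D + w2 * D2, (w1, w2)
```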

Numerical Simulations and Real Data Analysis
This section presents numerical simulations and two application examples to examine the performance of the proposed second-order Stein-type estimators. The proposed covariance matrix estimators for the target matrices $T_1$ and $T_2$ are denoted by QS-T1 and QS-T2, respectively. The control estimators include the Stein-type estimator LS-T1 for $T_1$ in [11], the Stein-type estimator LS-T2 for $T_2$ in [12], and the nonlinear shrinkage estimator NS developed in [18].
Under Model 1, the diagonal elements are equal to 1, and the off-diagonal elements are tiny; therefore, the covariance matrix is close to a spherical matrix. Under Model 2, the diagonal elements are dispersive, and the off-diagonal elements correspond to weak correlations; therefore, the covariance matrix is close to a diagonal matrix. We carry out random sampling in each Monte Carlo run and compute the Frobenius loss of each covariance matrix estimator. The MSE performance of each covariance matrix estimator is evaluated by averaging the Frobenius losses over $5 \times 10^3$ runs.

Figures 1 and 2 report the logarithmic Frobenius loss of each estimator in large-dimensional scenarios where the dimension is 180 and the sample size varies from 10 to 100. Under Model 1, the Stein-type estimator LS-T1 and the second-order Stein-type estimator QS-T1 outperform the nonlinear shrinkage estimator NS and the shrinkage estimators LS-T2 and QS-T2, which employ the diagonal target matrix. Furthermore, the proposed second-order Stein-type estimator QS-T1 shows a significant advantage over the Stein-type estimator LS-T1, especially when the sample size is tiny. Similarly, under Model 2, the Stein-type estimator LS-T2 and the second-order Stein-type estimator QS-T2 perform better than the other three estimators. Moreover, the proposed estimator QS-T2 outperforms the corresponding Stein-type estimator LS-T2. Therefore, when the correct target matrix is employed, the second-order Stein-type estimators enjoy lower Frobenius losses than the linear and nonlinear shrinkage estimators in large-dimensional scenarios.

Figures 3 and 4 report the logarithmic Frobenius loss of each estimator versus the dimension, with the sample size fixed at 30. The logarithmic Frobenius losses become more prominent as the dimension increases. In Figure 3, the actual covariance matrix is close to spherical, and the proposed second-order Stein-type estimator QS-T1 significantly outperforms the other estimators. In Figure 4, the actual covariance matrix is close to diagonal, and the proposed second-order Stein-type estimator QS-T2 enjoys a lower Frobenius loss than the other estimators when the dimension exceeds 120.

In summary, the proposed second-order Stein-type estimators enjoy a significant MSE advantage over the linear and nonlinear shrinkage estimators, especially when the dimension is large compared to the sample size.
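The evaluation protocol can be mirrored in a few lines (a sketch; the `estimator` argument is a placeholder callable mapping the SCM and sample size to a covariance estimate, standing in for the five methods compared above):

```python
import numpy as np

def average_frobenius_loss(Sigma, estimator, n, runs=5000, seed=None):
    """Average squared Frobenius loss ||Sigma_hat - Sigma||^2 over Monte Carlo runs."""
    rng = np.random.default_rng(seed)
    F = np.linalg.cholesky(Sigma)
    losses = []
    for _ in range(runs):
        X = rng.standard_normal((n, Sigma.shape[0])) @ F.T   # rows ~ N(0, Sigma)
        S = X.T @ X / n
        losses.append(np.sum((estimator(S, n) - Sigma) ** 2))
    return float(np.mean(losses))
```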

Portfolio Selection
In finance, assets with higher expected returns generally involve higher risks. Therefore, investors must constantly balance the expected return and the risk tolerance. The portfolio strategy is a popular way to reduce risk and enhance return. Therefore, portfolio selection plays a vital role in asset investment.
In 1952, Markowitz introduced the famous mean-variance optimization to determine the optimal portfolio weights [31]. Let $m$ and $\Sigma$ be the expectation and covariance matrix of the daily returns. For a portfolio weight $k$, the variance of the portfolio is defined as $\sigma^2 = k^T \Sigma k$ in the Markowitz framework. As short selling is forbidden, the Markowitz portfolio optimization is formulated as the following mean-variance problem:
$$\min_{k}\; k^T \Sigma k \quad \text{s.t.}\quad m^T k = r,\; \mathbf{1}^T k = 1,\; k \ge 0, \tag{76}$$
where $r$ is a given expected return and $\mathbf{1}$ denotes the all-ones vector. By replacing $m$ and $\Sigma$ with their estimates $\hat{m}$ and $\hat{\Sigma}$, respectively, the optimal weight $\hat{k}_r$ can be computed by an efficient quadratic programming algorithm. It is obvious that the Markowitz optimization depends only on the estimates of the first and second moments of the daily returns. The sample mean and the SCM in the classic portfolio perform well in portfolio risk measurement; however, the SCM becomes unstable when the number of stocks is large, resulting in significant property loss [32]. Therefore, a well-performing covariance matrix estimator is important in modern portfolio selection [2,33].
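For completeness, the mean-variance problem (76) can be solved with a generic constrained solver; the sketch below (our illustration, assuming estimated inputs m_hat and Sigma_hat) uses SLSQP:

```python
import numpy as np
from scipy.optimize import minimize

def markowitz_weights(m_hat, Sigma_hat, r):
    """Problem (76): minimize k^T Sigma k subject to m^T k = r,
    sum(k) = 1, and k >= 0 (no short selling)."""
    p = len(m_hat)
    cons = ({"type": "eq", "fun": lambda k: float(np.sum(k) - 1.0)},
            {"type": "eq", "fun": lambda k: float(m_hat @ k - r)})
    res = minimize(lambda k: float(k @ Sigma_hat @ k),
                   x0=np.full(p, 1.0 / p),
                   bounds=[(0.0, 1.0)] * p,
                   constraints=cons, method="SLSQP")
    return res.x
```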
The covariance matrix estimators LS-T1, LS-T2, NS, QS-T1, and QS-T2 are generated from the daily return data $X$. For an expected return $r$, the realized risk is defined as $\sigma_r = \hat{k}_r^T \hat{\Sigma}\, \hat{k}_r$. We employ the realized risk, a key index for evaluating a portfolio, to verify the performance of the covariance matrix estimators. Figures 5 and 6 plot the realized risks of the three kinds of shrinkage estimators for different investment horizons. For a short-term investment of 44 trading days (2 months), the proposed QS-T2 has the lowest realized risk when the expected return is less than 0.5% and the highest realized risk when the expected return exceeds 0.6%. The proposed QS-T1 and the Stein-type estimator LS-T1 have the lowest realized risks when the expected return exceeds 0.5%. For a long-term investment of 280 trading days (13 months), the realized risks of the five estimators become similar to those of the short-term investment.

The proposed second-order Stein-type estimator QS-T2 offers good portfolio selection for short-term investment and prudent-return cases, whereas the proposed second-order Stein-type estimator QS-T1 is recommended in long-term investment and high-return scenarios.

Discriminant Analysis
We further examine the performance of the second-order regularized estimators in small-sample-size situations. The Parkinson's data are collected from the website https://archive-beta.ics.uci.edu/, accessed on 10 November 2022. In this dataset, $p = 160$ biomedical voice attributes are measured from $n_1$ patients and $n_2$ healthy individuals. Let $\hat{\Sigma}$ be the pooled estimator based on a certain estimation strategy. We use the quadratic discriminant rule
$$M = (x_i - \bar{x})^T \hat{\Sigma}^{-1} (x_i - \bar{x})$$
to make a diagnosis for each $x_i$, where $M$ denotes the Mahalanobis distance between the individual $x_i$ and the sample center $\bar{x}$. The individual $x_i$ is classified as a Parkinson's patient if $x_i$ is closer to the sample center of the patients in the sense of the Mahalanobis distance. Table 1 reports the classification accuracy rates of the different estimators. We can see that the Stein-type estimators LS-T1, LS-T2, QS-T1, and QS-T2 perform better than NS, and the accuracy rates of the Stein-type estimators become larger as the sample size increases. Moreover, the proposed second-order Stein-type estimator QS-T2 enjoys the largest accuracy rate when $n \le 40$, whereas the Stein-type estimator LS-T2 has the best performance when $n \ge 45$. Therefore, the proposed second-order Stein-type estimation performs better than the classic Stein-type estimation in this application.
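The discriminant rule admits a compact implementation (a sketch under our assumptions about the data layout; xbar_patient and xbar_healthy denote the two sample centers and Sigma_hat the pooled estimate):

```python
import numpy as np

def classify(x, xbar_patient, xbar_healthy, Sigma_hat):
    """Assign x to the class whose sample center is closer in the
    Mahalanobis distance M = (x - xbar)^T Sigma_hat^{-1} (x - xbar)."""
    d1 = x - xbar_patient
    d2 = x - xbar_healthy
    m1 = d1 @ np.linalg.solve(Sigma_hat, d1)   # solve avoids the explicit inverse
    m2 = d2 @ np.linalg.solve(Sigma_hat, d2)
    return "patient" if m1 < m2 else "healthy"
```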

Conclusions and Discussion
This paper investigated the problem of estimating a large-dimensional covariance matrix. Motivated by Stein's strategy, we developed a novel strategy named second-order Stein-type estimation. The proposed estimator takes the form of a quadratic polynomial in the SCM and the target matrix and is kept a.s. positive definite. Firstly, we specified the spherical and diagonal targets in the second-order Stein-type regularization, and the mean squared errors were obtained for the two targets. Secondly, we unbiasedly estimated the two mean squared errors under the Gaussian distribution. Thirdly, the optimal tuning parameters were obtained by solving the resulting convex quadratic programming problems, which yields the optimal second-order Stein-type estimators for the two target matrices. Finally, we verified the performance of the proposed estimators in numerical simulations and real data applications.
It is worth mentioning that the second-order Stein-type estimators were proposed under Gaussian distribution. In practical applications, the data may often deviate from the Gaussian distribution. Therefore, the problem of investigating the second-order Stein-type regularization under non-Gaussian distributions remains open and important.