Article

Consistent Estimators of the Population Covariance Matrix and Its Reparameterizations

Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(2), 191; https://doi.org/10.3390/math13020191
Submission received: 12 November 2024 / Revised: 21 December 2024 / Accepted: 31 December 2024 / Published: 8 January 2025
(This article belongs to the Special Issue Statistics for High-Dimensional Data)

Abstract

For the high-dimensional covariance estimation problem, when $\lim_{n\to\infty} p/n = c \in (0, 1)$, the orthogonally equivariant estimator of the population covariance matrix proposed by Tsai and Tsai exhibits certain optimal properties. Under some regularity conditions, the authors showed that their novel estimators of eigenvalues are consistent estimators of the eigenvalues of the population covariance matrix. In this paper, under the multinormal setup, we show that the novel estimator is a consistent estimator of the population covariance matrix under the high-dimensional asymptotic setup. We also show that the novel estimator is the MLE of the population covariance matrix when $c \in (0, 1)$. The novel estimator is used to establish the optimal decomposite $T_T^2$-test for a high-dimensional statistical hypothesis testing problem, which can be used to carry out statistical inference for high-dimensional principal component analysis-related problems without the sparsity assumption. In the final section, we discuss the situation in which $p > n$, especially for high-dimensional low-sample-size categorical data models in which $p \gg n$.

1. Introduction

The problem of high-dimensional covariance estimation is one of the most interesting topics in statistics (Pourahmadi [1], Zagidullina [2]). Stein ([3,4]) investigated an orthogonally equivariant nonlinear shrinkage estimator for the population covariance matrix. Stein's estimator has been considered the gold standard, from which a significant strand of research on the orthogonally equivariant estimation of the covariance matrix was generated (see, e.g., Ledoit and Wolf [5,6,7,8]; Rajaratnam and Vincenzi [9] and the references therein).
Tsai and Tsai [10] focused their attention on rotation-equivariant estimators, showing that Stein's estimator can be inadmissible when the dimension p is fixed. Under a high-dimensional asymptotic setup (namely, when both the sample size n and the dimension p are sufficiently large, with concentration $c = \lim_{n\to\infty} p/n$, $c \in (0,1)$), they re-examined the asymptotic optimality of the estimators proposed by Stein [3] and Ledoit and Wolf [6]. Moreover, Tsai and Tsai [10] looked into the mechanism of the Marčenko–Pastur equation (Silverstein [11]) to determine an explicit equality relationship between the quantiles of the limiting spectral distributions. They used the obtained equality to propose a new kind of orthogonally equivariant estimator for the population covariance matrix and showed that their novel estimators of the eigenvalues are consistent estimators of the eigenvalues of the population covariance matrix. When $p/n \to c \in (0,1)$, they further showed that their proposed covariance estimator is the best orthogonally equivariant estimator of the population covariance matrix under the normalized Stein loss function. In contrast, both Stein's estimator and the sample covariance matrix can be inadmissible.
In this context, the question naturally arises as to whether a consistent estimator of the population covariance matrix exists. In this paper, we further show that the estimator proposed by Tsai and Tsai [10] is a consistent estimator of the population covariance matrix $\Sigma$ when $p/n \to c \in [0,1)$. To achieve this, first, under the multinormal setup, we show that the components of the spectral decomposition of the sample covariance matrix are the maximum likelihood estimators (MLEs) of the components of the population covariance matrix when the dimension p is fixed and the sample size n is large (i.e., $c = 0$). This is demonstrated in Section 3. Then, in Section 4, we extend the results of Section 3 to the case in which $p/n \to c \in (0,1)$: namely, we show that the novel estimator is not only consistent but is also the MLE of the population covariance matrix. Based on the proposed covariance estimator, the optimal decomposite $T_T^2$-test for a high-dimensional statistical hypothesis testing problem is established. This test can also be applied to make statistical inferences for high-dimensional principal component analysis (PCA)-related problems without the sparsity assumption, as shown in Section 5. In the final section, we discuss the situation in which $p > n$, including the case $p \gg n$.

2. Preliminary Notations

Let $X_1, \ldots, X_n$ be independent p-dimensional random vectors with a common multivariate normal distribution $N_p(0, \Sigma)$. We assume that the dimension p is fixed throughout Section 2 and Section 3. A basic problem considered in the literature is the estimation of the $p \times p$ covariance matrix $\Sigma$, which is unknown and assumed to be non-singular. It is also assumed that $n \geq p$, so that the sufficient statistic
$$\mathbf{A} = \sum_{i=1}^n X_i X_i^\top$$
is positive definite with probability one. In the literature, the estimators $\phi(\mathbf{A})$ of $\Sigma$ are functions of $\mathbf{A}$. The sample space $\mathcal{S}$, the parameter space $\Theta$, and the action space $\mathcal{A}$ are taken to be the set $\mathcal{P}_p$ of $p \times p$ symmetric positive definite matrices. The general linear group $Gl(p)$ acts on the space $\mathcal{P}_p$. Note that $\mathbf{A}$ has a Wishart distribution, $W(\Sigma, n)$, and the maximum likelihood estimator (MLE) of $\Sigma$ is expressed as follows:
$$\hat\Sigma_{ML} = \mathbf{S}, \quad \text{where } \mathbf{S} = n^{-1}\mathbf{A},$$
which is unbiased (Anderson [12]).
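As a small illustration (our own sketch, with a hypothetical $\Sigma$ and simulated data; the paper itself contains no code), the sufficient statistic $\mathbf{A}$ and the MLE $\mathbf{S} = n^{-1}\mathbf{A}$ can be formed as follows:

```python
# Minimal sketch: the sufficient statistic A = sum_i X_i X_i' and the
# unbiased MLE S = A / n under the known-mean-zero multinormal model.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5                                # n >= p, dimension p fixed
Sigma = np.diag([4.0, 3.0, 2.0, 1.5, 1.0])   # hypothetical population covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

A = X.T @ X      # Wishart(Sigma, n); positive definite with probability one
S = A / n        # the MLE of Sigma, unbiased since E[X_i X_i'] = Sigma
print(np.round(S, 2))
```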
We consider an invariant loss function L; i.e., L satisfies the condition that $L(g\phi(\mathbf{A})g^\top, g\Sigma g^\top) = L(\phi(\mathbf{A}), \Sigma)$ for all $g \in Gl(p)$. An estimator $\hat\Sigma$ is defined as $Gl(p)$-equivariant if $\hat\Sigma(\mathbf{G}\mathbf{A}\mathbf{G}^\top) = \mathbf{G}\hat\Sigma(\mathbf{A})\mathbf{G}^\top$ for all $\mathbf{G} \in Gl(p)$ and $\mathbf{A} \in \mathcal{P}_p$. Suppose that a group G acts on $\mathcal{P}_p$, whereby the orbit through $x \in \mathcal{P}_p$ is the set $Gx = \{gx \mid g \in G\} \subseteq \mathcal{P}_p$. This action is called transitive if $\Theta$ is one orbit, which is defined by the condition that if $x, y \in \Theta$, there is some $g \in G$ with $gx = y$. It is then easy to note that if L is $Gl(p)$-invariant, $\hat\Sigma$ is $Gl(p)$-equivariant, and G acts transitively on $\mathcal{P}_p$, then the risk function is constant on $\mathcal{P}_p$: $R(\hat\Sigma, \Sigma) = R(\hat\Sigma, \mathbf{I})$ for all $\Sigma \in \mathcal{P}_p$.
One of the most interesting loss functions was introduced by Stein [13]:
$$L(\phi(\mathbf{S}), \Sigma) = \operatorname{tr}\Sigma^{-1}\phi(\mathbf{S}) - \log\det\Sigma^{-1}\phi(\mathbf{S}) - p,$$
where tr and det denote the trace and determinant of a matrix, respectively. Because $Gl(p)$ acts transitively on the space $\mathcal{P}_p$, the best $Gl(p)$-equivariant estimator exists. It can be easily determined that the MLE $\mathbf{S}$ of $\Sigma$ is the best $Gl(p)$-equivariant estimator. The minimum risk is
$$R_m(\hat\Sigma_{ML}, \Sigma) = \sum_{i=1}^p \{\log n - E[\log\chi^2_{n-i+1}]\},$$
where $E[X]$ denotes the expectation of the random variable X.

3. Optimal Estimators of Σ When It Is Reparameterized

As the general linear group $Gl(p)$ is not an amenable group, to study the minimax problem, James and Stein [14] reparameterized the parameter as $\Sigma = \Theta\Theta^\top$, $\Theta \in G_T^+$, where $G_T^+$ denotes the group of $p \times p$ lower triangular matrices with positive diagonal elements; the loss function is also invariant under $G_T^+$. Using the Cholesky decomposition, we may write $\mathbf{A} = \mathbf{T}\mathbf{T}^\top$, where $\mathbf{T} \in G_T^+$. As $G_T^+$ acts transitively on the space $\mathcal{P}_p$, the best $G_T^+$-equivariant estimator was proposed by James and Stein [14] as $\hat\Sigma_S = \mathbf{T}D_S^{-1}\mathbf{T}^\top$, where $D_S$ is a positive diagonal matrix with elements $d_{Sii} = n + p - 2i + 1$, $i = 1, \ldots, p$. The minimum risk for the best $G_T^+$-equivariant estimator $\hat\Sigma_S$ is
$$R_m(\hat\Sigma_S, \Sigma) = \sum_{i=1}^p \{\log(n + p - 2i + 1) - E[\log\chi^2_{n-i+1}]\}.$$
This is because $G_T^+$ is a solvable group and, hence, amenable. Thus, Stein's estimator $\hat\Sigma_S$ is minimax.

3.1. The Stein Phenomenon

It is easy to see that $R(\hat\Sigma_S, \Sigma) \leq R(\hat\Sigma_{ML}, \Sigma)$; thus, the MLE $\mathbf{S}$ is inadmissible, and the estimator $\hat\Sigma_S$ should be used instead of $\mathbf{S}$. This is the well-known Stein phenomenon in the covariance estimation problem (for details, see Anderson [12]).
In order to determine why the Stein phenomenon occurs, that is, why the MLE $\mathbf{S}$ of $\Sigma$ is inadmissible, we began to think about the deeper meaning behind it. Tsai [15] extended Stein's method to establish another minimax estimator. We explain this method briefly in the following. Let $\Sigma_{(k)}$ and $\mathbf{A}_{(k)}$ be partitioned as
$$\Sigma_{(k)} = \begin{pmatrix} \sigma_{(k)11} & \Sigma_{(k)12} \\ \Sigma_{(k)21} & \Sigma_{(k)22} \end{pmatrix} \quad\text{and}\quad \mathbf{A}_{(k)} = \begin{pmatrix} a_{(k)11} & \mathbf{A}_{(k)12} \\ \mathbf{A}_{(k)21} & \mathbf{A}_{(k)22} \end{pmatrix}$$
for all $k = 1, \ldots, p$, with $\Sigma_{(1)} = \Sigma$ and $\mathbf{A}_{(1)} = \mathbf{A}$. Define
$$\Sigma_{(k+1)} = \Sigma_{(k)22:1} = \Sigma_{(k)22} - \Sigma_{(k)21}\Sigma_{(k)12}/\sigma_{(k)11}$$
and
$$\mathbf{A}_{(k+1)} = \mathbf{A}_{(k)22:1} = \mathbf{A}_{(k)22} - \mathbf{A}_{(k)21}\mathbf{A}_{(k)12}/a_{(k)11}.$$
Note that the dimension of $\Sigma_{(k+1)}$ is one less than that of $\Sigma_{(k)}$; this is a process of successive diagonalization. Let
$$g_{(k)} = \begin{pmatrix} 1 & 0 \\ -\Sigma_{(k)21}\sigma_{(k)11}^{-1} & \mathbf{I} \end{pmatrix} \quad\text{and}\quad h_{(k)} = \begin{pmatrix} 1 & 0 \\ -\mathbf{A}_{(k)21}a_{(k)11}^{-1} & \mathbf{I} \end{pmatrix}, \quad k = 1, \ldots, p.$$
We then have the following:
$$\tilde{\Sigma}_{(k)} = g_{(k)}\Sigma_{(k)}g_{(k)}^\top = \begin{pmatrix} \sigma_{(k)11} & 0 \\ 0 & \Sigma_{(k)22:1} \end{pmatrix},$$
and
$$\tilde{\mathbf{A}}_{(k)} = h_{(k)}\mathbf{A}_{(k)}h_{(k)}^\top = \begin{pmatrix} a_{(k)11} & 0 \\ 0 & \mathbf{A}_{(k)22:1} \end{pmatrix}, \quad k = 1, \ldots, p.$$
Let
$$\Sigma^* = \operatorname{Diag}(\sigma_{(1)11}, \ldots, \sigma_{(p)11}) \quad\text{and}\quad \mathbf{A}^* = \operatorname{Diag}(a_{(1)11}, \ldots, a_{(p)11}).$$
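To make the successive diagonalization above concrete, here is a minimal numerical sketch (ours, not the authors'): repeatedly extracting the $(1,1)$ entry and taking the Schur complement turns $\mathbf{A}$ into $\mathbf{A}^*$.

```python
# Sketch of the successive diagonalization A -> A* via repeated Schur
# complements, A_(k+1) = A_(k)22 - A_(k)21 A_(k)12 / a_(k)11.
import numpy as np

def successive_diagonal(A):
    """Return Diag(a_(1)11, ..., a_(p)11)."""
    pivots = []
    Ak = np.array(A, dtype=float)
    while True:
        pivots.append(Ak[0, 0])
        if Ak.shape[0] == 1:
            break
        Ak = Ak[1:, 1:] - np.outer(Ak[1:, 0], Ak[0, 1:]) / Ak[0, 0]
    return np.diag(pivots)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
A = X.T @ X
A_star = successive_diagonal(A)
# Sanity check: the product of the successive pivots equals det(A).
print(np.isclose(np.prod(np.diag(A_star)), np.linalg.det(A)))
```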
Consequently, $\Sigma$ and $\mathbf{A}$ are individually transformed into the diagonal matrices $\Sigma^*$ and $\mathbf{A}^*$, so that the one-to-one correspondences $\Sigma \leftrightarrow \Sigma^*$ and $\mathbf{A} \leftrightarrow \mathbf{A}^*$ are established; for the Stein loss function, we take $\phi(\mathbf{A}) = D\mathbf{A}^*$ with $D \in D(p)$, the group of positive diagonal matrices. Based on the properties of the Wishart distribution (see Theorem 4.3.4, Theorem 7.3.4, and Theorem 7.3.6 of Anderson [12]), it is easy to note that $a_{(i)11}/\sigma_{(i)11}$, $i = 1, \ldots, p$, are independent $\chi^2$ random variables with $n - i + 1$ degrees of freedom. Let $D_0$ be the diagonal matrix with elements $d_{0ii} = n - i + 1$, $i = 1, \ldots, p$. We may then conclude that $\mathbf{A}^*$ is Wishart-type-distributed with mean matrix $D_0\Sigma^*$. Furthermore, it should be noted that all p Jacobians of the transformation $\mathbf{A} \to \mathbf{A}^*$ are one, and the Wishart density of $\mathbf{A}$ is equivalent to the density of $\mathbf{A}^*$. Thus, the Stein loss function is
$$L(\phi(\mathbf{A}^*), \Sigma^*) = \operatorname{tr}\Sigma^{*-1}D\mathbf{A}^* - \log\det\Sigma^{*-1}D\mathbf{A}^* - p.$$
As the group $D(p)$ acts transitively on the space of positive diagonal matrices, the best $D(p)$-equivariant estimator can be expressed in the following form:
$$\hat\Sigma^* = D_0^{-1}\mathbf{A}^*.$$
Thus, the minimum risk for the estimator $\hat\Sigma^*$ is
$$R_m(\hat\Sigma^*, \Sigma^*) = \sum_{i=1}^p \{\log(n - i + 1) - E[\log\chi^2_{n-i+1}]\}.$$
As the group $D(p)$ is also solvable, we may conclude that $\hat\Sigma^*$ is minimax. Based on (5) and (15), it is easy to see that $R_m(\hat\Sigma^*, \Sigma^*) \leq R_m(\hat\Sigma_S, \Sigma)$; hence, similarly to the Stein phenomenon, we may conclude that Stein's estimator $\hat\Sigma_S$ is inadmissible, while the estimator $\hat\Sigma^*$ is admissible.
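The three minimum risks derived above can be compared numerically using the identity $E[\log\chi^2_k] = \log 2 + \psi(k/2)$, where $\psi$ is the digamma function; the short sketch below (ours) confirms the ordering discussed in the next subsection.

```python
# Comparing the minimum risks of the best Gl(p)-, G_T^+-, and D(p)-
# equivariant estimators, using E[log chi^2_k] = log 2 + digamma(k/2).
import numpy as np
from scipy.special import digamma

def min_risks(n, p):
    i = np.arange(1, p + 1)
    e_log_chi2 = np.log(2.0) + digamma((n - i + 1) / 2.0)
    r_ml    = np.sum(np.log(n) - e_log_chi2)                  # MLE S
    r_stein = np.sum(np.log(n + p - 2 * i + 1) - e_log_chi2)  # Cholesky (Stein)
    r_star  = np.sum(np.log(n - i + 1) - e_log_chi2)          # full Iwasawa
    return r_ml, r_stein, r_star

r_ml, r_stein, r_star = min_risks(n=50, p=10)
print(r_star <= r_stein <= r_ml)   # True: smaller group, smaller minimum risk
```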

3.2. The Optimal Properties of the MLE

We may note that the MLE $\mathbf{S}$ of $\Sigma$ is the best $Gl(p)$-equivariant estimator. James and Stein [14] used the Cholesky decomposition to reparameterize $\Sigma$ and obtain the Stein estimator $\hat\Sigma_S$, which is the best $G_T^+$-equivariant estimator. Tsai [15] used the full Iwasawa decomposition to obtain the best $D(p)$-equivariant estimator $\hat\Sigma^*$. It is important to note that the inequality $R_m(\hat\Sigma^*, \Sigma^*) \leq R_m(\hat\Sigma_S, \Sigma) \leq R_m(\hat\Sigma_{ML}, \Sigma)$ holds. As $D(p) \subset G_T^+ \subset Gl(p)$, we can easily see why the above inequality holds: the minimum risk of the best equivariant estimator is larger for the larger group and smaller for the smaller group.
Tsai [15] showed that the minimum risks of the MLEs under the Cholesky decomposition and the full Iwasawa decomposition are the same when the geodesic distance loss function on the non-Euclidean space $\mathcal{P}_p$ is adopted. Comparing the minimum risks of estimators for different groups does not make much statistical sense; the comparison of different estimators is meaningful when they are compared under the same parameterized decomposition. For the spectral decomposition, when the dimension p is fixed, Tsai and Tsai [10] claimed that the sample covariance matrix $\mathbf{S}$ is the best orthogonally equivariant estimator under the Stein loss function, while Stein's ([3,4]) orthogonally equivariant estimator can be inadmissible under the spectral decomposition. These results are different from the Stein phenomenon, in which $\mathbf{S}$ is inadmissible. Hence, we cannot help but suspect that the Stein phenomenon is due to the parameterized decompositions and does not hold significant statistical meaning. We hope that this paper may make statisticians, who have been constantly warned not to use the MLE for the covariance matrix ever since the Stein phenomenon appeared, reconsider employing the MLE for the covariance matrix.
When the dimension p is fixed, each of the three estimators possesses optimal properties for its respective parameterized decomposition. The three estimators $\mathbf{S}$, $\hat\Sigma_S$, and $\hat\Sigma^*$ are the best $Gl(p)$-equivariant, $G_T^+$-equivariant, and $D(p)$-equivariant estimators, respectively. The sample covariance matrix $\mathbf{S}$ is not only the best $Gl(p)$-equivariant estimator but also the best $O(p)$-equivariant estimator. They are the MLEs for the $Gl(p)$, $G_T^+$, and $D(p)$ decompositions, respectively. The optimal property of the MLE is essentially not affected.
Note that the Stein loss function is essentially equivalent to the entropy loss function under the multinormal setup. When the dimension p is fixed, it is agreed in the literature that $\mathbf{S}$ and $\hat\Sigma^*$ converge to $\Sigma$ and $\Sigma^*$, respectively, almost surely (a.s.) as $n \to \infty$ (Anderson [12]). We extend Tsai's [15] approach to the case of the spectral decomposition, i.e., we determine whether the sample covariance matrix $\mathbf{S}$ is the best orthogonally ($O(p)$-) equivariant estimator of the population covariance matrix $\Sigma$ when the dimension p is fixed. In other words, we investigate whether $\mathbf{S}$ is the MLE of $\Sigma$ under the spectral decomposition, so that the sample components converge to the corresponding population components a.s. as $n \to \infty$.

3.3. The Best Orthogonally Equivariant Estimator

For the application to the statistical inference of principal component analysis, we need the notion of the spectral decomposition of the population covariance matrix, which can be understood as another reparametrization of $\Sigma$. Stein ([3,4]) considered the orthogonally equivariant estimator for the population covariance matrix, which has been considered the gold standard. Consider the spectral decomposition of the population covariance matrix, namely $\Sigma = \mathbf{V}\Gamma\mathbf{V}^\top$, where $\Gamma$ is a diagonal matrix with the eigenvalues $\gamma_{i,p}$, $i = 1, \ldots, p$, and $\mathbf{V} = (v_1, \ldots, v_p)$ is the corresponding orthogonal matrix, with $v_i$ being the eigenvector associated with the ith largest eigenvalue $\gamma_{i,p}$, $v_{i1} \geq 0$, $i = 1, \ldots, p$. The same applies to the sample spectral decomposition, i.e., $\mathbf{S} = \mathbf{U}\mathbf{L}\mathbf{U}^\top$, where $\mathbf{L}$ is a diagonal matrix with the eigenvalues $l_{i,p}$, and $\mathbf{U} = (u_1, \ldots, u_p)$ is the corresponding orthogonal matrix, with $u_i$ being the eigenvector corresponding to $l_{i,p}$, $u_{i1} \geq 0$, $i = 1, \ldots, p$. Write $\mathbf{L} = \operatorname{diag}(l_{1,p}, \ldots, l_{p,p})$ and $\Gamma = \operatorname{diag}(\gamma_{1,p}, \ldots, \gamma_{p,p})$. Note that the matrices $\mathbf{U}$ and $\mathbf{L}$ are consistent estimators of $\mathbf{V}$ and $\Gamma$, respectively, when the dimension p is fixed and the sample size n is large (for details, see Anderson [12]). Hence, we may conclude that there are two situations when the dimension p is fixed: (i) when $\Sigma$ is not reparameterized, the sample covariance matrix $\mathbf{S}$ is unbiased and hence consistent; (ii) when $\Sigma$ is reparameterized via the spectral decomposition, the components $\mathbf{U}$ and $\mathbf{L}$ are consistent estimators of $\mathbf{V}$ and $\Gamma$, respectively, and the sample covariance matrix $\mathbf{S}$ is still consistent.
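For concreteness, a small sketch (ours) of a spectral decomposition routine with descending eigenvalues and the sign convention $u_{i1}, v_{i1} \geq 0$ used above:

```python
# Spectral decomposition with descending eigenvalues and each eigenvector's
# first entry made nonnegative, matching the convention in the text.
import numpy as np

def spectral(M):
    vals, vecs = np.linalg.eigh(M)           # ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]   # make descending
    vecs = vecs * np.where(vecs[0, :] < 0, -1.0, 1.0)
    return vals, vecs

rng = np.random.default_rng(2)
n, p = 2000, 4
Gamma = np.array([5.0, 3.0, 2.0, 1.0])       # hypothetical eigenvalues
X = rng.multivariate_normal(np.zeros(p), np.diag(Gamma), size=n)
L, U = spectral(X.T @ X / n)
print(np.round(L, 2))   # close to Gamma: with p fixed, L -> Gamma a.s.
```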
 Remark 1. 
We aim to study the consistency property with the help of the optimal properties of MLEs. The main reason is that, based on the general theory of estimation, the maximum likelihood estimator is consistent; that is, it tends toward the true value with probability one as the sample size increases under certain regularity conditions, which are satisfied by the non-degenerate Wishart distribution.
We may note that when $\Sigma$ is not reparameterized, it is easy to see that the sample covariance matrix $\mathbf{S}$ is the MLE of $\Sigma$. When the spectral decomposition of $\Sigma$ is adopted, it is expected that the sample components $\mathbf{U}$ and $\mathbf{L}$ are the MLEs of the corresponding population components $\mathbf{V}$ and $\Gamma$, respectively.
First, when the dimension p is fixed, $n\mathbf{S}$ is Wishart-distributed when $n > p$. Under the spectral decompositions of $\Sigma$ and $\mathbf{S}$, we find the MLEs of $\mathbf{V}$ and $\Gamma$ in the following. Note that $\mathbf{V}, \mathbf{U} \in O(p)$, the set of orthogonal matrices. Let $\mathbf{H} = \mathbf{V}^\top\mathbf{U}$; then, $\mathbf{H} \in O(p)$. Assume that $n \geq p + 1$; then, the $-\tfrac{2}{n}$ log-likelihood function of $\mathbf{S}$ is
$$l(\mathbf{S}\,|\,\Sigma) = \operatorname{tr}\Sigma^{-1}\mathbf{S} - \log\det\Sigma^{-1}\mathbf{S} - \tfrac{2}{n}\log c_n(\mathbf{S}) = \operatorname{tr}\mathbf{V}\Gamma^{-1}\mathbf{V}^\top\mathbf{U}\mathbf{L}\mathbf{U}^\top - \log\det\mathbf{V}\Gamma^{-1}\mathbf{V}^\top\mathbf{U}\mathbf{L}\mathbf{U}^\top - \tfrac{2}{n}\log c_n(\mathbf{L}) = \operatorname{tr}\Gamma^{-1}\mathbf{H}\mathbf{L}\mathbf{H}^\top - \log\det\Gamma^{-1}\mathbf{L} - \tfrac{2}{n}\log c_n(\mathbf{L}),$$
where $c_n(\mathbf{S}) = n^{np/2}\,|\mathbf{S}|^{-(p+1)/2}\,/\,\{2^{np/2}\,\pi^{p(p-1)/4}\,\prod_{i=1}^p \Gamma[\tfrac{1}{2}(n-i+1)]\} = c_n(\mathbf{L})$, which is independent of $\Sigma$ (i.e., of $\mathbf{V}$, $\Gamma$). Equation (16) is essentially equivalent to the Stein loss function.
 Theorem 1 
(von Neumann [16]). For $\mathbf{H}$ orthogonal and $D_\gamma$ and $D_l$ diagonal ($\gamma_1 \geq \cdots \geq \gamma_p > 0$, $l_1 > \cdots > l_p > 0$),
$$\min_{\mathbf{H} \in O(p)} \operatorname{tr} D_\gamma^{-1}\mathbf{H}D_l\mathbf{H}^\top = \operatorname{tr} D_\gamma^{-1}D_l,$$
and a minimizing value of $\mathbf{H}$ is $\hat{\mathbf{H}} = \mathbf{I}$. For detailed proofs, see Theorem A.4.7 and Lemma A.4.6 of Anderson [12].
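A quick Monte Carlo check of Theorem 1 (our sketch): over random orthogonal $\mathbf{H}$, the trace never falls below its value at $\mathbf{H} = \mathbf{I}$.

```python
# Monte Carlo check of the von Neumann inequality:
# tr(D_gamma^{-1} H D_l H') >= tr(D_gamma^{-1} D_l) for all orthogonal H.
import numpy as np
from scipy.stats import ortho_group

gam = np.array([5.0, 4.0, 3.0, 2.0, 1.0])    # gamma_1 >= ... >= gamma_p > 0
l   = np.array([6.0, 4.5, 2.5, 1.2, 0.7])    # l_1 > ... > l_p > 0
Dg_inv, Dl = np.diag(1.0 / gam), np.diag(l)

floor = np.trace(Dg_inv @ Dl)                # the value at H = I
Hs = ortho_group.rvs(5, size=2000, random_state=3)
traces = [np.trace(Dg_inv @ H @ Dl @ H.T) for H in Hs]
print(min(traces) >= floor - 1e-9)           # True: H = I attains the minimum
```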
Based on the von Neumann Theorem, we can determine that the MLE of $\mathbf{V}$ is $\hat{\mathbf{V}} = \mathbf{U}$ and, hence,
$$\min_{\mathbf{V} \in O(p)} \operatorname{tr}\mathbf{V}\Gamma^{-1}\mathbf{V}^\top\mathbf{U}\mathbf{L}\mathbf{U}^\top = \operatorname{tr}\Gamma^{-1}\mathbf{L}.$$
Thus, we may state that
$$\min_{\mathbf{V} \in O(p)} l(\mathbf{S}\,|\,\Sigma) = \operatorname{tr}\Gamma^{-1}\mathbf{L} - \log\det\Gamma^{-1}\mathbf{L} - \tfrac{2}{n}\log c_n(\mathbf{L}).$$
After some calculations, the function $\min_{\mathbf{V} \in O(p)} l(\mathbf{S}|\Sigma)$ in (19) is further minimized with respect to $\Gamma$ at $\hat\Gamma = \mathbf{L}$. As such, when p is fixed, $\mathbf{U}$ is the MLE of $\mathbf{V}$, and $l_{i,p}$ is the MLE of $\gamma_{i,p}$, $i = 1, \ldots, p$. Thus, when p is fixed and the sample size n is large, by the properties of MLEs we have $\mathbf{U} \to \mathbf{V}$ and $\mathbf{L} \to \Gamma$ almost surely (a.s.); therefore, $\mathbf{U}$ and $\mathbf{L}$ are consistent estimators of $\mathbf{V}$ and $\Gamma$, respectively. Hence, $\mathbf{S} = \mathbf{U}\mathbf{L}\mathbf{U}^\top \to \mathbf{V}\Gamma\mathbf{V}^\top = \Sigma$ a.s. Therefore, in terms of the spectral decomposition, when the dimension p is fixed, the sample covariance matrix $\mathbf{S}$ is a consistent estimator of the population covariance matrix $\Sigma$. Based on the above arguments, when the dimension p is fixed, the MLEs play an important role in achieving optimality whether $\Sigma$ is reparameterized or not: when it is not reparameterized, the MLE $\mathbf{S}$ of $\Sigma$ is unbiased and consistent, while when it is reparameterized, the MLEs of the component parameters of the spectral decomposition are consistent.
However, the situation may be different when the dimension p is large, so that $p/n \to c \in (0,1)$, because the sample covariance matrix $\mathbf{S}$ is then no longer the MLE of the population covariance matrix $\Sigma$. Hence, the question naturally arises as to whether a consistent estimator of $\Sigma$ exists under the large dimensional asymptotic setup. Under the spectral decomposition, Tsai and Tsai [10] proved the consistency of their proposed estimators of the population eigenvalues with the help of random matrix theory. The necessary notation is presented below.

4. High-Dimensional Case

For a large $(n, p)$ setup, the large dimensional asymptotic framework takes $(n, p) \to \infty$ so that the concentration $c = \lim_{n\to\infty} p/n$ is fixed, with $0 \leq c < 1$. In this section, we extend the class of orthogonally equivariant estimators to the realm of large dimensional asymptotics with a concentration of $c \in (0,1)$.

4.1. The Marčenko–Pastur Equation

In accordance with Ledoit and Péché [17], we make the following assumptions:
A1. Note that $x_i = \Sigma^{1/2} z_i$, $i = 1, \ldots, n$, where the $z_i$ are independent and identically distributed with mean $\mathbf{0}$ and covariance matrix $\mathbf{I}$. Assume that the 12th absolute central moment of each variable $z_{ij}$ is bounded by a constant.
A2. The population covariance matrix $\Sigma$ is nonrandom and positive definite, with $\liminf_{p\to\infty} \gamma_{p,p} > 0$ and $\limsup_{p\to\infty} \gamma_{1,p} < \infty$.
A3. For a large $(n, p)$ setup, the large dimensional asymptotic framework is established when $(n, p) \to \infty$, so that $c = \lim p/n$ is fixed, with $0 \leq c < 1$ in this paper.
A4. Let $0 < \gamma_{p,p} \leq \cdots \leq \gamma_{1,p}$. The empirical spectral distribution of $\Sigma$, defined as $H_n(\gamma) = \frac{1}{p}\sum_{i=1}^p 1_{[\gamma_{i,p}, \infty)}(\gamma)$, converges as $p \to \infty$ to a probability distribution function $H(\gamma)$ at every point of continuity of H. The support of H, $\operatorname{Supp}(H)$, is included in a compact set $[h_1, h_2]$ with $0 < h_1 \leq h_2 < \infty$.
Let $F_n(\lambda) = \frac{1}{p}\sum_{i=1}^p 1_{[l_{i,p}, \infty)}(\lambda)$ be the sample spectral distribution and F its limit. It has been proved that $F_n$ converges to F a.s. as $n \to \infty$ (Marčenko–Pastur [18]).
The Stieltjes transform of a distribution function F is defined as follows:
$$m_F(z) = \int \frac{1}{l - z}\, dF(l), \quad z \in \mathbb{C}^+,$$
where $\mathbb{C}^+$ is the half-plane of complex numbers with a strictly positive imaginary part. Let
$$m_{F_n}(z) = p^{-1}\operatorname{tr}[(\mathbf{S} - z\mathbf{I})^{-1}].$$
Then, based on the results of random matrix theory, $F_n(z)$ converges to $F(z)$ if and only if $m_{F_n}(z)$ converges to $m_F(z)$. Subsequently, the well-known Marčenko–Pastur equation (Silverstein [11]) can be expressed in the following form:
$$m_F(z) = \int \frac{1}{\gamma[1 - c - c\,z\,m_F(z)] - z}\, dH(\gamma), \quad z \in \mathbb{C}^+,$$
where H denotes the limiting population spectral distribution. Based on the Marčenko–Pastur equation, meaningful information regarding the population spectral distribution can be retrieved under the large dimensional asymptotic framework. Choi and Silverstein [19] further showed that
$$\lim_{z \in \mathbb{C}^+ \to l} m_F(z) = \check{m}_F(l)$$
exists for any $l \in \mathbb{R}\setminus\{0\}$.
Using the Sokhotski–Plemelj formula, the term $\check{m}_F(l)$ can be separated into a real part, which is a principal value integral (the so-called Hilbert transform), and an imaginary part, which is $\pi$ times the limiting sample spectral density function $f(l)$. Namely,
$$\check{m}_F(l) = \operatorname{Re}[\check{m}_F(l)] + i\,\pi f(l),$$
where the Hilbert transform is
$$\operatorname{Re}[\check{m}_F(x)] = \Pr\!\int \frac{dF(t)}{t - x}.$$
In some special cases, $\check{m}_F(x)$ can be expressed explicitly. For example, let $\lambda_+ = (1 + \sqrt{c})^2$ and $\lambda_- = (1 - \sqrt{c})^2$. When $\Sigma = \mathbf{I}$, the Marčenko–Pastur density function has the following form:
$$f_{MP}(x) = \frac{\sqrt{(x - \lambda_-)(\lambda_+ - x)}}{2\pi c x}, \quad x \in (\lambda_-, \lambda_+).$$
Using the resolvent method, we then have
$$\check{m}_F(x) = \frac{1 - c - x}{2cx} + i\,\frac{\sqrt{(x - \lambda_-)(\lambda_+ - x)}}{2cx},$$
where the real part is the Cauchy principal value, i.e.,
$$\operatorname{Re}[\check{m}_F(x)] = \Pr\!\int \frac{f_{MP}(t)\,dt}{t - x} = \frac{1 - c - x}{2cx}.$$
Generally, $\Sigma$ is unknown, and the form of $\operatorname{Re}[\check{m}_F(x)]$ will not be explicit.
Stein [3] used the naive empirical counterpart $\check{m}_{F_n}(l_{i,p})\ \big(= \frac{1}{p}\sum_{j\neq i} \frac{1}{l_{j,p} - l_{i,p}}\big)$ to estimate the Hilbert transform $\operatorname{Re}[\check{m}_F(l_i)]$, where $l_i$ denotes the $(1-\alpha)$ quantile of the limiting sample spectral distribution F such that $[p(1-\alpha)] = i$, $i = 1, \ldots, p$, with $[x]$ denoting the largest integer not exceeding x. As $F_n(z)$ converges to $F(z)$ a.s., meaning that $m_{F_n}(z)$ converges to $m_F(z)$ a.s., Stein concluded that $l_{i,p}$ converges to $l_i$ a.s., $i = 1, \ldots, p$. Then, the empirical counterpart $\check{m}_{F_n}(l_{i,p})$ is a consistent estimator of the Hilbert transform.
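The sketch below (ours) checks this numerically in the $\Sigma = \mathbf{I}$ case, where the closed form $(1 - c - x)/(2cx)$ derived above is available:

```python
# Stein's empirical counterpart of the Hilbert transform versus the
# closed form (1 - c - x) / (2 c x), valid when Sigma = I.
import numpy as np

rng = np.random.default_rng(4)
n, p = 2000, 600                             # c = p / n = 0.3
c = p / n
X = rng.standard_normal((n, p))              # Sigma = I
l = np.linalg.eigvalsh(X.T @ X / n)          # sample eigenvalues (ascending)

def m_check_emp(l, i):
    """(1/p) * sum_{j != i} 1 / (l_j - l_i)."""
    return np.sum(1.0 / (np.delete(l, i) - l[i])) / l.size

i = p // 2                                   # a bulk eigenvalue
print(m_check_emp(l, i), (1.0 - c - l[i]) / (2.0 * c * l[i]))  # close
```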

4.2. The Consistent Estimators of Population Eigenvalues

The Marčenko–Pastur equation in (22) gives an implicit relationship between F and H. Tsai and Tsai [10] further established the following explicit equality relationship:
$$\gamma_i = \frac{l_i}{1 - c - c\, l_i \operatorname{Re}[\check{m}_F(l_i)]}, \quad i = 1, \ldots, p,$$
where $\gamma_i$ and $l_i$ denote the $(1-\alpha)$ quantiles of the limiting population and sample spectral distributions H and F, respectively, such that $[p(1-\alpha)] = i$, $i = 1, \ldots, p$, with $[x]$ denoting the largest integer not exceeding x. Let $\operatorname{Supp}(F)$ be the support of F. Using Theorem 2 of Choi and Silverstein [19], Ledoit and Péché [17] pointed out that if $l_i \in \operatorname{Supp}(F)$, then $l_i/(1 - c - c\, l_i \operatorname{Re}[\check{m}_F(l_i)]) \in \operatorname{Supp}(H)$ for $l_i \in \mathbb{R}\setminus\{0\}$, $i = 1, \ldots, p$.
Based on $\gamma_i = \psi_i(\mathbf{L})$, $i = 1, \ldots, p$, the result in (29), and the empirical counterpart of $\operatorname{Re}[\check{m}_F(l_i)]$, Tsai and Tsai [10] proposed a new kind of orthogonally equivariant estimator $\hat\Sigma_T$ of $\Sigma$, which has the following form:
$$\hat\Sigma_T = \mathbf{U}\hat\Psi(\mathbf{L})\mathbf{U}^\top, \quad\text{where } \hat\Psi(\mathbf{L}) = \operatorname{diag}(\hat\psi_1(\mathbf{L}), \ldots, \hat\psi_p(\mathbf{L}))$$
with
$$\hat\psi_i(\mathbf{L}) = \frac{n\, l_{i,p}}{n - p + 1 - p\, l_{i,p}\,\check{m}_{F_n}(l_{i,p})} = n\, l_{i,p}\Big(n - p + 1 - l_{i,p}\sum_{j\neq i}\frac{1}{l_{j,p} - l_{i,p}}\Big)^{-1}, \quad i = 1, \ldots, p.$$
When the dimension p is fixed and n is large (i.e., $c = 0$), as discussed in Section 3, we have that $\gamma_{i,p} \to \gamma_i$ and $\gamma_i = l_i$, $i = 1, \ldots, p$. However, when $p/n \to c \in (0,1)$, $\gamma_{i,p} \to \gamma_i$, but $\gamma_i$ is no longer $l_i$; rather, by Equation (29), it is of the form $l_i/(1 - c - c\, l_i \operatorname{Re}[\check{m}_F(l_i)])$, $i = 1, \ldots, p$. Note that, based on assumption A4, $H_n$ converges to H when $c \in (0,1)$, and thus $\gamma_{i,p}$ converges to $\gamma_i$, $i = 1, \ldots, p$. In other words, $\Gamma$ converges to $\Psi(\mathbf{L})$ as defined in Proposition 1. Hence, estimating $\gamma_{i,p}$ is the same as estimating $\gamma_i$ under the large dimensional asymptotic setup, $i = 1, \ldots, p$. Under certain regularity conditions, Tsai and Tsai [10] showed that their proposed estimators of the population eigenvalues are consistent. We summarize the results in the following.
 Proposition 1. 
Let $\Psi(\mathbf{L}) = \operatorname{diag}(\gamma_1, \ldots, \gamma_p)$ and $\hat\Psi(\mathbf{L}) = \operatorname{diag}(\hat\psi_1(\mathbf{L}), \ldots, \hat\psi_p(\mathbf{L}))$ be defined as in (29) and (30), respectively. Under the assumptions of Theorem 2 of Tsai and Tsai [10], $\hat\Psi(\mathbf{L})$ is then the consistent estimator of $\Psi(\mathbf{L})$; namely, $\hat\Psi(\mathbf{L})$ is the consistent estimator of $\Gamma$ when $p/n \to c \in (0,1)$.
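The estimator in (30) is straightforward to transcribe; the sketch below is our own reading of that formula (the function name is ours), with a toy comparison on simulated data with a hypothetical $\Sigma$:

```python
# Our reading of Equation (30): Sigma_T = U diag(psi_hat) U' with
# psi_hat_i = n l_i / (n - p + 1 - l_i * sum_{j != i} 1 / (l_j - l_i)).
import numpy as np

def sigma_hat_T(S, n):
    p = S.shape[0]
    l, U = np.linalg.eigh(S)
    psi = np.empty(p)
    for i in range(p):
        hilbert = np.sum(1.0 / (np.delete(l, i) - l[i]))   # = p * m_check_{F_n}(l_i)
        psi[i] = n * l[i] / (n - p + 1 - l[i] * hilbert)
    return U @ np.diag(psi) @ U.T

rng = np.random.default_rng(5)
n, p = 600, 150                                   # c = 0.25
Sigma = np.diag(np.linspace(1.0, 5.0, p))         # hypothetical Sigma
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = X.T @ X / n
for name, est in [("S", S), ("Sigma_T", sigma_hat_T(S, n))]:
    print(name, np.linalg.norm(est - Sigma, ord=2))   # spectral-norm error
```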
 Remark 2. 
In the literature, there are methods that impose additional structure on the covariance matrix estimation, such as sparse methods [20], the factor model [21], or a graph model [9]. Orthogonally equivariant estimators are widely adopted. Using the Marčenko–Pastur equation in (22), meaningful information on the population spectral distribution can be retrieved under the large dimensional asymptotic framework; however, the relationship is tangled. Under certain regularity conditions, Tsai and Tsai [10] established an explicit relationship (29) between the quantiles of the limiting sample spectral distribution F and the limiting population spectral distribution H. Then, consistent estimators of the population eigenvalues can easily be established. This result makes up for a deficiency of estimators in the literature, such as Stein's [4] and Ledoit and Wolf's [5] estimators, which are inconsistent estimators of the population eigenvalues. Tsai and Tsai [10] proposed a new kind of estimator, $\hat\Sigma_T$, for the population covariance matrix $\Sigma$ and showed that the proposed estimator is the best orthogonally ($O(p)$-) equivariant estimator of the population covariance matrix $\Sigma$ under the normalized Stein loss function when $c \in (0,1)$. Random matrix theory provides essential support for these results. However, it remains undetermined whether the proposed estimator is consistent for $\Sigma$, namely, whether the sample component estimators are consistent for the corresponding population components. To investigate this, we adopt the MLE approach; thus, in this paper, we assume that $X_i$, $i = 1, \ldots, n$, are independent and identically multinormally distributed.

4.3. The Consistent Estimator of the Population Covariance Matrix

When the dimension p is large, the sample covariance matrix $\mathbf{S}$ is no longer the MLE of $\Sigma$. It is difficult to directly determine a functional form of $\mathbf{S}$ that is the MLE of $\Sigma$; therefore, we take the detour of reparameterizing $\Sigma$ via the spectral decomposition. The main goal is then to see whether the orthonormal matrix $\mathbf{U}$ is the MLE of $\mathbf{V}$ or not. The requirement $E\,\mathbf{U} = \mathbf{V}$ would imply that the limiting distribution of $\mathbf{U}$ on $O(p)$ is entirely concentrated at $\mathbf{V}$; hence, unbiasedness is not a useful optimality property here, and its role is replaced by the property of equivariance. Ledoit and Péché [17] pointed out that the projections $p\,|u_i^\top v_j|^2$ of the sample eigenvectors onto the population eigenvectors wipe out the non-rotation-equivariant behavior, and the averages of the quantities $p\,|u_i^\top v_j|^2$ over the sample eigenvectors associated with the sample eigenvalues demonstrate how the eigenvectors of the sample covariance matrix deviate from those of the population covariance matrix under large dimensional asymptotics. This is one of the main reasons why we prefer to restrict attention to the class of rotation-equivariant estimators. Tsai and Tsai [10] established the best orthogonally equivariant estimator $\hat\Sigma_T$ of $\Sigma$. We continue to study whether the proposed estimator $\hat\Sigma_T$ is a consistent estimator of the population covariance matrix $\Sigma$ when $p/n \to c \in (0,1)$. Based on Proposition 1, it only remains to determine whether $\mathbf{U}$ is a consistent estimator of $\mathbf{V}$.
The orthogonal matrix $\mathbf{U}$ may not generally be a consistent estimator of $\mathbf{V}$ when the dimension p is large (see Bai et al. [22] and the references therein). Hence, we proceed under the restricted model, namely, under the Wishart distribution setup, when $p/n \to c \in (0,1)$.
When $\Sigma$ is reparameterized via the spectral decomposition, we aim to study the consistency of the component parameters when the dimension p is large. Under the multivariate normal setup, when the dimension p is fixed, $n\mathbf{S}$ is Wishart-distributed with mean matrix $\Sigma$. However, when $\lim_{n\to\infty} p/n = c \in (0,1)$, $n\mathbf{S}$ is not Wishart-distributed with mean matrix $\Sigma$, nor is $\mathbf{S}$ the MLE of $\Sigma$. Instead, we may note that $n\hat\Sigma_T$ is Wishart-type-distributed with mean matrix $\Sigma$ when $p/n \to c \in (0,1)$. It is easy to note that the $-\tfrac{2}{n}$ log-likelihood function $l(\hat\Sigma_T|\Sigma)$ of $\hat\Sigma_T$ is similar to (16), with $\hat\Sigma_T$ in (30) replacing $\mathbf{S}$ in (16) (i.e., with $\hat\Psi(\mathbf{L})$ replacing $\mathbf{L}$); it still satisfies the regularity conditions and does not degenerate. Based on $l(\hat\Sigma_T|\Sigma)$, our goal is to show that $\hat\Sigma_T$ is the MLE of $\Sigma$ when $\lim_{n\to\infty} p/n = c \in (0,1)$.
First, we want to show that $\mathbf{U}$ is the MLE of $\mathbf{V}$ when $p/n \to c \in (0,1)$; namely, we extend the von Neumann Theorem to the case in which $p/n \to c \in (0,1)$. Note that $\mathbf{H} = \mathbf{V}^\top\mathbf{U} \in O(p)$. Thus, $\min_{\mathbf{V}\in O(p)} l(\hat\Sigma_T|\Sigma)$ is equal to $\min_{\mathbf{H}\in O(p)} l(\hat\Sigma_T|\Sigma)$. Note that $\mathbf{H}\mathbf{H}^\top = \mathbf{I}$ implies that $d\mathbf{H}\,\mathbf{H}^\top + \mathbf{H}\,d\mathbf{H}^\top = 0$. Moreover,
$$d\operatorname{tr}\Gamma^{-1}\mathbf{H}\hat\Psi(\mathbf{L})\mathbf{H}^\top = \operatorname{tr}[\Gamma^{-1}\,d\mathbf{H}\,\hat\Psi(\mathbf{L})\mathbf{H}^\top + \Gamma^{-1}\mathbf{H}\hat\Psi(\mathbf{L})\,d\mathbf{H}^\top] = \operatorname{tr}[\Gamma^{-1}\,d\mathbf{H}\,\hat\Psi(\mathbf{L})\mathbf{H}^\top - \Gamma^{-1}\mathbf{H}\hat\Psi(\mathbf{L})\mathbf{H}^\top\,d\mathbf{H}\,\mathbf{H}^\top].$$
Then, the derivative is $d\operatorname{tr}\Gamma^{-1}\mathbf{H}\hat\Psi(\mathbf{L})\mathbf{H}^\top/d\mathbf{H} = \Gamma^{-1}\hat\Psi(\mathbf{L})\mathbf{H} - \Gamma^{-1}\mathbf{H}\hat\Psi(\mathbf{L})\mathbf{H}^\top\mathbf{H}$. Thus, $d\operatorname{tr}\Gamma^{-1}\mathbf{H}\hat\Psi(\mathbf{L})\mathbf{H}^\top/d\mathbf{H} = 0$ implies that $\mathbf{H}\hat\Psi(\mathbf{L})\mathbf{H}^\top = \hat\Psi(\mathbf{L})$. Similarly to the above arguments, we can also show that $\mathbf{H}\Gamma^{-1}\mathbf{H}^\top = \Gamma^{-1}$. Hence, we may have that $\min_{\mathbf{H}\in O(p)} \operatorname{tr}\Gamma^{-1}\mathbf{H}\hat\Psi(\mathbf{L})\mathbf{H}^\top = \operatorname{tr}\Gamma^{-1}\hat\Psi(\mathbf{L})$; namely, the minimum of $\operatorname{tr}\Gamma^{-1}\mathbf{H}\hat\Psi(\mathbf{L})\mathbf{H}^\top$ with respect to $\mathbf{H} \in O(p)$ occurs at $\hat{\mathbf{H}} = \mathbf{I}$ (i.e., $\hat{\mathbf{V}} = \mathbf{U}$). Thus, the von Neumann Theorem still holds for the case in which $p/n \to c \in (0,1)$. As such, in terms of the spectral decompositions, $\mathbf{U}$ is also the MLE of $\mathbf{V}$ when $p/n \to c \in (0,1)$. Hence, based on the properties of the MLE, we may summarize as follows:
 Theorem 2. 
Let $X_1, \ldots, X_n$ be independent p-dimensional random vectors with a common multivariate normal distribution $N_p(0, \Sigma)$. Consider the spectral decompositions $\Sigma = \mathbf{V}\Gamma\mathbf{V}^\top$ and $\mathbf{S} = \mathbf{U}\mathbf{L}\mathbf{U}^\top$, and let $\hat\Sigma_T$ be defined as in (30). Under the assumptions of Proposition 1, when $\lim_{n\to\infty} p/n = c \in (0,1)$, $\mathbf{U}$ is the MLE of $\mathbf{V}$. Hence, it is a consistent estimator of $\mathbf{V}$.
Based on Proposition 1 and Theorem 2, we may then conclude that the proposed novel estimator $\hat\Sigma_T$ is consistent for the population covariance matrix $\Sigma$ when the dimension p is large. Next, we continue to investigate whether the proposed estimator $\hat\Sigma_T$ is the MLE of $\Sigma$ or not. Note that
$$\min_{\mathbf{V}\in O(p)} l(\hat\Sigma_T\,|\,\Sigma) = \operatorname{tr}\Gamma^{-1}\hat\Psi(\mathbf{L}) - \log\det\Gamma^{-1}\hat\Psi(\mathbf{L}) - \tfrac{2}{n}\log c_n(\hat\Psi(\mathbf{L})).$$
After a number of calculations, the function $\min_{\mathbf{V}\in O(p)} l(\hat\Sigma_T|\Sigma)$ in (32) is further minimized with respect to $\Gamma$ at $\hat\Gamma = \hat\Psi(\mathbf{L})$. As such, when $p/n \to c \in (0,1)$, we may conclude that $\hat\Psi(\mathbf{L})$ is the MLE of $\Gamma$. Thus, based on Theorem 2, $\hat\Sigma_T$ is the MLE of $\Sigma$, whereas the sample covariance matrix $\mathbf{S}$ is not. According to the properties of the MLE, $\mathbf{U}$, $\hat\Psi(\mathbf{L})$, and $\hat\Sigma_T$ are consistent estimators of $\mathbf{V}$, $\Gamma$, and $\Sigma$, respectively. Therefore, the following can be stated:
 Theorem 3. 
Under the assumptions of Theorem 2, when $p/n \to c \in (0,1)$, $\hat\Sigma_T$ is the MLE of $\Sigma$. Hence, it is consistent.
 Remark 3. 
We may draw the following three conclusions: (i) the sample covariance matrix $\mathbf{S}$ is the MLE of the population covariance matrix $\Sigma$ when the dimension p is fixed; (ii) the estimator $\hat\Sigma_T$ is the MLE of $\Sigma$ when the dimension p is large, so that $\lim_{n\to\infty} p/n = c \in (0,1)$; (iii) it is easy to see that $\hat\Sigma_T$ reduces to the sample covariance matrix $\mathbf{S}$ when the dimension p is fixed and the sample size n is large (i.e., $c = 0$). These are insightful parallels. Hence, for simplicity, we may integrate the above results into a unified one: when p is fixed or $\lim_{n\to\infty} p/n = c \in (0,1)$ (i.e., $c \in [0,1)$), $n\hat\Sigma_T$ is Wishart-distributed with mean matrix $\Sigma$, and $\hat\Sigma_T$ is the MLE of $\Sigma$. Thus, $\hat\Sigma_T$ is a consistent estimator of $\Sigma$, and hence $\hat\Sigma_T$ converges to $\Sigma$ a.s. as $n \to \infty$. Therefore, we may use $\hat\Sigma_T$ to replace $\mathbf{S}$ when making statistical inferences, whether the dimension p is fixed or $c \in (0,1)$.
 Remark 4. 
Tsai and Tsai [10] used a fundamental statistical concept to determine the quantile equality relationship between the limiting sample and population spectral distributions, so that the consistency problems between the sample eigenvalues and the population eigenvalues can easily be handled. We can then use the likelihood function to make progress. As long as the density function does not degenerate, statistical inference can proceed similarly to the traditional case. The key point, based on the above conclusion, is to find a consistent estimator of the population covariance matrix, namely, to directly determine the MLE of $\Sigma$ when $n > p$.
When the dimension p is fixed, $l_{i,p}$ is the MLE of $\gamma_{i,p}$, $i = 1, \ldots, p$. However, it is no longer true that $l_{i,p}$ is the MLE of $\gamma_{i,p}$, $i = 1, \ldots, p$, when $p/n \to c \in (0,1)$.
Johnstone and Paul [23] provided a detailed discussion of sample eigenvalue bias and eigenvector inconsistency under the spiked covariance model and of related high-dimensional PCA phenomena.
 Remark 5. 
We may note that $n\hat\Sigma_T$ is Wishart-distributed with mean matrix $\Sigma$ and that $\hat\Sigma_T$ reduces to the sample covariance matrix $\mathbf{S}$ when the dimension p is fixed. With the proposed novel estimator $\hat\Sigma_T$ replacing the sample covariance matrix $\mathbf{S}$, the traditional fixed-dimension case and the new high-dimensional case can be integrated into one. Hence, we may suggest using the proposed consistent estimator $\hat\Sigma_T$ to replace the sample covariance matrix $\mathbf{S}$ in order to carry out multivariate statistical inference, including PCA-related problems, both in the case in which (i) p is fixed and n is large (i.e., $c = 0$) and in that in which (ii) $p/n \to c \in (0,1)$.
We provide an outline for the likelihood ratio test (LRT) of the hypothesis testing problem in the next section.

5. The Decomposite $T_T^2$-Test When the Dimension p Is Large

Let $X_i$, $i = 1, \ldots, n$, be n i.i.d. random vectors with a p-dimensional multinormal distribution, a mean vector $\mu$, and an unknown positive definite covariance matrix $\Sigma$. Consider the hypothesis testing problem
$$H_0: \mu = 0 \quad\text{versus}\quad H_1: \mu \neq 0$$
when both the dimension p and the sample size n are large. Let
$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \quad\text{and}\quad \mathbf{S} = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^\top.$$
Then, the well-known Hotelling's $T^2$-test statistic from the literature is
$$T^2 = n\,\bar{X}^\top\mathbf{S}^{-1}\bar{X}.$$
When the dimension p is fixed, Hotelling's $T^2$-test is optimal for problem (33).
However, when the dimension p is large, the performance of Hotelling's $T^2$-test is not optimal, because the sample covariance matrix $\mathbf{S}$ is no longer a consistent estimator of $\Sigma$. To overcome this difficulty, we can adopt the novel estimator $\hat\Sigma_T$ to replace the sample covariance matrix $\mathbf{S}$ and consider the following decomposite $T_T^2$-test statistic:
$$T_T^2 = n\,\bar{X}^\top\hat\Sigma_T^{-1}\bar{X},$$
where $\hat\Sigma_T$ is defined as in (30), with the $\mathbf{S}$ defined in (34) replacing the one defined in (2). It is evident that the $T_T^2$-test is the LRT statistic for problem (33).
To avoid the issue that the power of any reasonable test tends to one as $n \to \infty$, Le Cam's contiguity concept is adopted to study the asymptotic local distribution when the dimension p is fixed. Note that the traditional local alternatives do not depend on the dimension p. For a large dimension p, in her Ph.D. thesis, Chia-Hsuan Tsai incorporated the dimension p into the local alternatives to study the asymptotic distribution under
$$H_0: \mu = 0 \quad\text{versus}\quad H_{1n}: \mu = n^{-1/2}p^{1/4}\delta,$$
where $\delta$ is a fixed p-dimensional vector, which leads to the assumption that $\delta^\top\Sigma^{-1}\delta < \infty$ when p is large. Compared with the traditional setup, the local alternatives also depend on the dimension p, with a slight change in the convergence rate. Let
$$T_0^2 = n\,\bar{X}^\top\Sigma^{-1}\bar{X}.$$
Similarly to the arguments presented in the thesis, we can show that $T_T^2$ does not converge to $T_0^2$ in probability; however, $T_T^2$ does converge to $T_0^2$ in its local distribution. Note that in the traditional case of fixed dimension p, the proposed decomposite $T_T^2$-test reduces to Hotelling's $T^2$-test, and Hotelling's $T^2$-test statistic converges to $T_0^2$ in probability, which implies convergence in distribution. It is not difficult to ascertain that the $T_T^2$-test statistic reduces asymptotically and locally (under $H_{1n}$ with the rate $n^{-1/2}p^{1/4}$) to a non-central chi-square $\chi_p^2(\delta^\top\Sigma^{-1}\delta)$ distribution. This asymptotic local power function is still a monotone function of the non-centrality $\delta^\top\Sigma^{-1}\delta$. Hence, when $p/n \to c \in [0,1)$, the proposed decomposite $T_T^2$-test is optimal for problem (33).
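A hedged sketch (ours) of the resulting procedure: compute $T_T^2$ with $\hat\Sigma_T$ in place of $\mathbf{S}$ and refer it to the $\chi_p^2$ approximation discussed above.

```python
# Sketch of the decomposite T_T^2-test of H0: mu = 0, using the chi^2_p
# approximation for the null distribution discussed in the text.
import numpy as np
from scipy.stats import chi2

def tt2_test(X):
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar) / (n - 1)
    l, U = np.linalg.eigh(S)
    psi = np.array([n * l[i] / (n - p + 1 - l[i] * np.sum(1.0 / (np.delete(l, i) - l[i])))
                    for i in range(p)])          # eigenvalues replaced as in (30)
    Sigma_T = U @ np.diag(psi) @ U.T
    tt2 = n * xbar @ np.linalg.solve(Sigma_T, xbar)
    return tt2, chi2.sf(tt2, df=p)               # statistic, approximate p-value

rng = np.random.default_rng(6)
print(tt2_test(rng.standard_normal((600, 150)))) # H0 holds for this draw
```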
 Remark 6. 
The high-dimensional PCA problem has mainly been studied in spiked covariance models, in which sparsity of the population eigenvectors must be assumed for consistency (Johnstone and Lu [24] and the references therein). In contrast, with the proposed novel estimator $\hat\Sigma_T$ replacing the sample covariance matrix $\mathbf{S}$ for statistical inference, when $p/n \to c \in [0,1)$, the results of Theorem 3 can be applied to make multivariate statistical inferences and solve PCA-related problems without the sparsity assumption. When $c \in [0,1)$, our approach unifies the traditional ($c = 0$) and modern high-dimensional ($c \in (0,1)$) cases for multivariate statistical methods and high-dimensional PCA-related problems. The proposed novel estimator is incorporated to establish the optimal decomposite $T_T^2$-test for a high-dimensional statistical hypothesis testing problem and can be directly applied to high-dimensional PCA-related problems without the sparsity assumption.
In the final section, we discuss the case in which $p > n$, especially for high-dimensional low-sample-size categorical data models in which $p \gg n$.

6. General Remarks When p > n

6.1. When $p > n$, with Both n and p Fixed

Under the multinormal setup, when $p > n$ with p and n fixed, the density function of $\mathbf{S}$ becomes a singular Wishart distribution (Uhlig [25]), which degenerates. In this situation, assume that $\operatorname{rank}(\mathbf{S}) = n$; we may then use the reparametrization $\mathbf{S} = \mathbf{U}_1\mathbf{L}\mathbf{U}_1^\top$, where $\mathbf{L}$ is an $n \times n$ diagonal matrix and $\mathbf{U}_1 \in V_{n,p}$, the $np - n(n+1)/2$-dimensional Stiefel manifold of $p \times n$ matrices $\mathbf{U}_1$ with orthonormal columns ($\mathbf{U}_1^\top\mathbf{U}_1 = \mathbf{I}$). Note that $\mathbf{V} \in O(p)$; thus, $\operatorname{tr}\Sigma^{-1}\mathbf{S} = \operatorname{tr}\mathbf{V}\Gamma^{-1}\mathbf{V}^\top\mathbf{U}_1\mathbf{L}\mathbf{U}_1^\top = \operatorname{tr}\Gamma^{-1}\mathbf{H}_1\mathbf{L}\mathbf{H}_1^\top$, where $\mathbf{H}_1 = \mathbf{V}^\top\mathbf{U}_1 \in V_{n,p}$. Note that $\mathbf{H}_1 \notin O(p)$, which means that the von Neumann Theorem may fail to hold in general.

6.2. When $p > n$, with Both n and p Large, So That $c \in (1, \infty)$

When $p > n$, since the population covariance matrix $\Sigma$ is assumed to be a positive definite symmetric matrix, its p eigenvalues are all positive; however, $\operatorname{rank}(\mathbf{S}) = n$, so $\mathbf{S}$ has $p - n$ sample eigenvalues equal to 0. As such, it seems difficult to obtain consistent estimators of all the population eigenvalues. When the sample size n and the dimension p are both large, we may only need the n largest eigenvalue estimators to be consistent for the n largest population eigenvalues. If this is the case, the method developed in this note is still applicable.

6.3. Use in High-Dimensional Low-Sample-Size (HDLSS) Categorical Data Models

When $p \gg n$ (i.e., in HDLSS categorical models), our method might still be used in some situations. HDLSS categorical models are abundant in genomics and bioinformatics, with relatively small sample sizes n and often $p \gg n$. Motivated by the 2002 severe acute respiratory syndrome coronavirus (SARS-CoV) epidemic model, a general model comparing $G$ ($\geq 2$) groups is considered. Each sequence has P positions, each relating to a categorical response indexed $1, \ldots, C$, and there are $n_g$ sequences in the gth group, $g = 1, \ldots, G$. For the gth group, pth position, and cth category, let $n_{gpc}$ be the number of sequences, and let $n_{gp} = \sum_{c=1}^C n_{gpc}$ for $p = 1, \ldots, P$, $g = 1, \ldots, G$. Note that if there are no missing values, each sequence, at each position, takes on one of the C responses $1, \ldots, C$, so that $n_{gp} = n_g$ for all $p = 1, \ldots, P$. The combined group sample size is $n = \sum_{g=1}^G n_g$. For geographically separated sequences, the assumption of independence of the G groups may be reasonable, but the sequences within a group may not be independent due to their shared ancestry. For SARS-CoV or HIV genome sequences, because of the rapid evolution of the virus, the independence assumption may not be very stringent. Further, for each sequence, the responses at the P positions are generally neither independent nor necessarily identically distributed.
For SARS-CoV genome sequences, the scientific focus is the statistical comparison of different strata to coordinate plausible differences in response to pertinent environmental factors. In many fields of application, particularly in genomic studies, not only do we have $P \gg n$, but n is also often small, leading to a range of dimensionality problems. We encounter conceptual and operational roadblocks due to there being too many unknown parameters. For such genomic sequences, any single position (gene) yields very little statistical information. Hence, a composite measure of the qualitative variation over the entire sequence is thought to be a better way of gauging the statistical group discrimination. In this specific context, some molecular epidemiologic studies have advocated a suitable external sequence analysis, such as multivariate analysis of variance (MANOVA), although there are impasses of various types. Genomic research is a prime illustration of the need for an appropriate statistical methodology for comprehending the genomic variation in such high-dimensional categorical data models. Variation (diversity) in such large-P, small-n models cannot be properly studied using standard discrete multivariate analysis tools or the full likelihood approach. For qualitative data models, the Gini–Simpson (GS) index (Gini [26]; Simpson [27]) and Shannon entropy (Shannon [28]) are commonly used for statistical analysis in a range of fields, including genetic variation studies (Chakraborty and Rao [29]). The Hamming distance provides an average measure that does not ignore dependence or possible heterogeneity. The U-statistics methodology (Hoeffding [30]) is incorporated to obtain optimal nonparametric estimators and their jackknife variance estimators. In the genome sequence context, we are confronted with the $P \gg n$ environment. Within this framework, we encounter two scenarios: (i) $P \gg n$, in which n is at least moderately large, and (ii) $P \gg n$, in which n is small. In (i), the sample estimates of the Hamming distances are all U-statistics, to which standard asymptotics (Sen [31]) apply: the estimators are asymptotically (as $n \to \infty$) normal, and their jackknifed variance estimators are consistent. Hence, we shall not enter into a detailed discussion of (i). Case (ii), which is more commonly encountered in genomic studies, entails different perspectives; we must use appropriate central limit theory (CLT) for dependent sequences of bounded random variables.
Let $\pi_{gpc}$ denote the cth cell probability for the pth marginal law $\pi_{gp} = (\pi_{gp1}, \ldots, \pi_{gpC})$ of group g ($1 \leq c \leq C$, $1 \leq g \leq G$, $1 \leq p \leq P$), and let $n_{gpc}$ be the cell frequencies for the pth marginal table corresponding to the gth group, so that the MLE of $\pi_{gpc}$ is $\hat\pi_{gpc} = n_{gpc}/n_g$, $1 \leq c \leq C$, where $n_g = \sum_{c=1}^C n_{gpc}$, with the same process being applied to every $p = 1, \ldots, P$. We incorporate the jackknife methodology to obtain the nonparametric estimators. The jackknife estimator of the Hamming–Shannon measure, a plug-in estimator based on the MLE of $\pi_{gp}$, is considered. The difficulties associated with HDLSS asymptotics in the genomic context are assessed, and suitable permutation procedures are appraised. Under the null hypothesis, i.e., the homogeneity of the G groups, the advantage of the resulting permutation invariance structure is used. Therefore, we proceed with this extended permutation–jackknife methodology.
Consider all possible equally likely permutations of the observations for each p, each with the same conditional probability $1/N$, where $N = n!/\prod_{g=1}^G n_g!$. Let $Y_1 = (T_{2:1}, \ldots, T_{G:G-1})^t$, with the $T_{i:i-1}$ being well-defined statistics. In practice, to overcome the difficulty that N is too large, we may choose an $N_1$ that is sufficiently large, with $N_1 \ll N$, instead. Next, generate a set of $(N_1 - 1)$ permutations. For this construction, we use the permutation distribution generated by the set of all possible permutations among themselves. Let $Y_i$ be the $(i-1)$th corresponding permutation of $Y_1$, $i = 2, \ldots, N_1$, and let the corresponding covariance matrix be $\mathbf{S}_{N_1} = (N_1 - 1)^{-1}\sum_{i=1}^{N_1}(Y_i - \bar{Y})(Y_i - \bar{Y})^t$, where $\bar{Y} = N_1^{-1}\sum_{i=1}^{N_1} Y_i$. In practice, $N_1$ is taken to be sufficiently large so that $\lim_{N_1\to\infty} P/N_1 = c \in (0,1)$. Carry out the spectral decomposition of the matrix $\mathbf{S}_{N_1}$, with the eigenvalues of $\mathbf{S}_{N_1}$ replaced by the new corresponding eigenvalues obtained from Equation (30), in the same way that the sample covariance matrix $\mathbf{S}$ was replaced by $\hat\Sigma_T$, to obtain a new and improved jackknife covariance matrix replacing $\mathbf{S}_{N_1}$ for the statistical inference. Thus, under these circumstances, the procedure proposed in Section 4 works well for HDLSS categorical data models. The permutation step is sketched below.
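A loose sketch (ours) of the permutation step; the statistic `stat` is a hypothetical placeholder for the Hamming-distance contrasts described above, and the eigenvalue replacement would reuse the `sigma_hat_T` sketch given after Proposition 1.

```python
# Covariance S_{N1} of a statistic vector over N1 random relabelings.
import numpy as np

def permutation_covariance(data, labels, stat, N1, rng):
    """data: (n, P) array of sequences; labels: group label per row;
    stat: (data, labels) -> (G-1)-vector of well-defined statistics."""
    Y = np.array([stat(data, labels)] +
                 [stat(data, rng.permutation(labels)) for _ in range(N1 - 1)])
    return np.cov(Y, rowvar=False)               # S_{N1}

rng = np.random.default_rng(7)
data = rng.integers(0, 4, size=(30, 500))        # 30 sequences, P = 500, C = 4
labels = np.repeat([0, 1, 2], 10)                # G = 3 groups
# Hypothetical contrast: mean mismatch of each non-reference group vs. group 0.
stat = lambda d, lab: np.array([np.mean(d[lab == g] != d[lab == 0][0])
                                for g in (1, 2)])
S_N1 = permutation_covariance(data, labels, stat, N1=200, rng=rng)
print(S_N1.shape)                                # (2, 2)
```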
When the sample size n is larger than the dimension p, so that $\lim_{n\to\infty} p/n = c \in (0,1)$, as shown in Section 4, $n\hat\Sigma_T$ is Wishart-distributed with mean matrix $\Sigma$. It has been demonstrated that $\hat\Sigma_T$ is the MLE of the population covariance matrix $\Sigma$ and, hence, is consistent. Moreover, it is easy to ascertain that $\hat\Sigma_T$ reduces to the sample covariance matrix $\mathbf{S}$ when the dimension p is fixed. Hence, when $n > P$, with the proposed novel estimator $\hat\Sigma_T$ replacing the sample covariance matrix $\mathbf{S}$, the traditional case of a fixed dimension p and the modern case of a high-dimensional setup can be integrated into a unified theory. We may therefore also conclude that $\hat\Sigma_T$ is the best $O(p)$-equivariant estimator. Thus, when $n > P$, the proposed novel estimator $\hat\Sigma_T$ of the population covariance matrix $\Sigma$ plays a fundamental role in the further theoretical development of statistical inference. Practically, it has applications in certain HDLSS categorical data models. When the sample size n is moderate, optimal nonparametric methods for genomic data may be proposed. When $P \gg n$ and the sample size n is small, we may incorporate the permutation and jackknife methodologies to make statistical inferences for genomic data. Hopefully, the optimal statistical methods can be of help for scientific breakthroughs as well as for real-world applications in gene science.

Author Contributions

Conceptualization, M.-T.T.; Methodology, M.-T.T. and C.-H.T.; Validation, M.-T.T.; Investigation, C.-H.T.; Writing—original draft, C.-H.T.; Writing—review and editing, M.-T.T.; Project administration, C.-H.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

The authors thank the two reviewers for their helpful comments, and the Editor for his comments, which helped in the rewriting of Section 6.3 in a more concise manner. The authors are grateful to the reviewer who recommended that they expand the comments in Remark 2.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pourahmadi, M. High-Dimensional Covariance Estimation; Wiley: New York, NY, USA, 2013. [Google Scholar]
  2. Zagidullina, A. High-Dimensional Covariance Matrix Estimation: An Introduction to Random Matrix Theory; SpringerBriefs in Applied Statistics and Econometrics; Springer: Cham, Switzerland, 2021. [Google Scholar]
  3. Stein, C. Estimation of a covariance matrix. In Proceedings of the Rietz Lecture, 39th Annual Meeting of the IMS, Atlanta, GA, USA, 1975. [Google Scholar]
  4. Stein, C. Lectures on the theory of estimation of many parameters. J. Math. Sci. 1986, 43, 1373–1403. [Google Scholar] [CrossRef]
  5. Ledoit, O.; Wolf, M. Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist. 2012, 40, 1024–1060. [Google Scholar] [CrossRef]
  6. Ledoit, O.; Wolf, M. Optimal estimation of a large-dimensional covariance matrix under Stein’s loss. Bernoulli 2018, 24, 3791–3832. [Google Scholar] [CrossRef]
  7. Ledoit, O.; Wolf, M. Analytical nonlinear shrinkage of large-dimensional covariance matrices. Ann. Statist. 2020, 48, 3043–3065. [Google Scholar] [CrossRef]
  8. Ledoit, O.; Wolf, M. Shrinkage estimation of large covariance matrices: Keep it simple, statistician? J. Multivar. Anal. 2021, 186, 104796. [Google Scholar] [CrossRef]
  9. Rajaratnam, B.; Vincenzi, D. A theoretical study of Stein’s covariance estimator. Biometrika 2016, 103, 653–666. [Google Scholar] [CrossRef]
  10. Tsai, M.-T.; Tsai, C.-H. On the orthogonally equivariant estimators of a covariance matrix. arXiv 2024, arXiv:2405.06877. [Google Scholar]
  11. Silverstein, J.W. Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. J. Multivar. Anal. 1995, 55, 331–339. [Google Scholar] [CrossRef]
  12. Anderson, T.W. An Introduction to Multivariate Statistical Analysis, 3rd ed.; Wiley: New York, NY, USA, 2003. [Google Scholar]
  13. Stein, C. Inadmissibility of the usual estimator of the mean of a multivariate normal distribution. Proc. Third Berkeley Symp. Math. Statist. Probab. 1956, 1, 197–206. [Google Scholar]
  14. James, W.; Stein, C. Estimation with quadratic loss. Proc. Fourth Berkeley Symp. Math. Statist. Probab. 1961, 1, 361–379. [Google Scholar]
  15. Tsai, M.-T. On the maximum likelihood estimator of a covariance matrix. Math. Method. Statist. 2018, 27, 71–82. [Google Scholar] [CrossRef]
  16. von Neumann, J. Some matrix-inequalities and metrization of matric-space. Tomsk. Univ. Rev. 1937, 1, 286–300. [Google Scholar]
  17. Ledoit, O.; Péché, S. Eigenvectors of some large sample covariance matrix ensembles. Probab. Theory Relat. Fields. 2011, 151, 233–264. [Google Scholar] [CrossRef]
  18. Marčenko, V.A.; Pastur, L.A. Distribution of eigenvalues for some sets of random matrices. Sb. Math. 1967, 1, 457–483. [Google Scholar]
  19. Choi, S.I.; Silverstein, J.W. Analysis of the limiting spectral distribution of large dimensional random matrices. J. Multivar. Anal. 1995, 54, 295–309. [Google Scholar]
  20. Bickel, P.J.; Levina, E. Regularized estimation of large covariance matrices. Ann. Statist. 2008, 36, 199–227. [Google Scholar] [CrossRef]
  21. Fan, J.; Fan, Y.; Lv, J. High dimensional covariance matrices using a factor model. J. Econom. 2008, 147, 186–197. [Google Scholar] [CrossRef]
  22. Bai, Z.D.; Miao, B.Q.; Pan, G.H. On asymptotics of eigenvectors of large sample covariance matrix. Ann. Probab. 2007, 35, 1532–1572. [Google Scholar] [CrossRef]
  23. Johnstone, I.M.; Paul, D. PCA in high dimensions: An orientation. Proc. IEEE 2018, 106, 1277–1292. [Google Scholar] [CrossRef]
  24. Johnstone, I.M.; Lu, A.Y. On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 2009, 104, 682–693. [Google Scholar] [CrossRef]
  25. Uhlig, H. On singular Wishart and singular multivariate Beta distributions. Ann. Statist. 1994, 22, 395–405. [Google Scholar] [CrossRef]
  26. Gini, C.W. Variabilita e Mutabilita, Studi Economico-Giuridici della R; Universita de Cagliary: Cagliari, Italy, 1912; Volume 2, pp. 3–159. [Google Scholar]
  27. Simpson, E.H. The measurement of diversity. Nature 1949, 163, 688. [Google Scholar] [CrossRef]
  28. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656. [Google Scholar] [CrossRef]
  29. Chakraborty, R.; Rao, C.R. Measurement of genetic variation for evolutionary studies. In Handbook of Statistics Vol. 8: Statistical Methods in Biological and Medical Sciences; Rao, C.R., Chakraborty, R., Eds.; Elsevier: Amsterdam, The Netherlands, 1991; pp. 271–316. [Google Scholar]
  30. Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 1948, 19, 293–325. [Google Scholar] [CrossRef]
  31. Sen, P.K. Some invariance principles relating to Jackknifing and their role in sequential analysis. Ann. Statist. 1977, 5, 315–329. [Google Scholar] [CrossRef]