Abstract
For the high-dimensional covariance estimation problem, when $n > p$, the orthogonally equivariant estimator of the population covariance matrix proposed by Tsai and Tsai exhibits certain optimal properties. Under some regularity conditions, the authors showed that their novel estimators of eigenvalues are consistent estimators of the eigenvalues of the population covariance matrix. In this paper, under the multinormal setup, we show that they are consistent estimators of the population covariance matrix under the high-dimensional asymptotic setup in which $n \to \infty$ with $p/n \to c \in (0, 1)$. We also show that the novel estimator is the MLE of the population covariance matrix when $n \to \infty$. The novel estimator is used to establish that the optimality of the decomposite $\tilde T^2$-test is retained. A high-dimensional statistical hypothesis testing problem is used to carry out statistical inference for high-dimensional principal component analysis-related problems without the sparsity assumption. In the final section, we discuss the situation in which $p > n$, especially for high-dimensional low-sample-size categorical data models in which $p \gg n$.
MSC:
62C20; 62F10
1. Introduction
The problem of high-dimensional covariance estimation is one of the most interesting topics in statistics (Pourahmadi [1], Zagidullina [2]). Stein ([3,4]) investigated an orthogonally equivariant nonlinear shrinkage estimator for the population covariance matrix. Stein’s estimator has been considered the gold standard, from which a significant strand of research on the orthogonally equivariant estimation of the covariance matrix was generated (see, e.g., Ledoit and Wolf [5,6,7,8]; Rajaratnam and Vincenzi [9]; and the references therein).
Tsai and Tsai [10] focused their attention on rotation-equivariant estimators, showing that Stein’s estimator can be inadmissible when the dimension p is fixed. Under a high-dimensional asymptotic setup, namely, when both the sample size n and the dimension p are sufficiently large with a concentration $c = \lim_{n \to \infty} p/n \in (0, 1)$, they re-examined the asymptotic optimality of the estimators proposed by Stein [3] and Ledoit and Wolf [6]. Moreover, Tsai and Tsai [10] looked into the mechanism of the Marčenko–Pastur equation (Silverstein [11]) to determine an explicit equality relationship between the quantiles of the limiting spectral distributions. They used the obtained equality to propose a new kind of orthogonally equivariant estimator for the population covariance matrix and showed that their novel estimators of the eigenvalues are consistent estimators of the eigenvalues of the population covariance matrix. When $n > p$, they further showed that their proposed covariance estimator is the best orthogonally equivariant estimator of the population covariance matrix under the normalized Stein loss function, whereas both Stein’s estimator and the sample covariance matrix can be inadmissible.
In this context, the question regarding whether a consistent estimator of the population covariance matrix does or does not exist naturally arises. In this paper, we further show that the estimator proposed by Tsai and Tsai [10] is a consistent estimator of the population covariance matrix when $n \to \infty$ with $p/n \to c \in (0, 1)$. To achieve this, first, under the multinormal setup, we show that the components of the spectral decomposition of the sample covariance matrix are the maximum likelihood estimators (MLEs) of the components of the population covariance matrix when the dimension p is fixed and the sample size n is large (i.e., $n \to \infty$). This process is demonstrated in Section 3. Then, in Section 4, we extend the results of Section 3 to the case in which $p/n \to c \in (0, 1)$: namely, to show that the novel estimator is not only consistent but is also the MLE of the population covariance matrix. Based on the proposed covariance estimator, the optimal decomposite $\tilde T^2$-test for a high-dimensional statistical hypothesis testing problem is established. This test can also be applied to make statistical inferences for high-dimensional principal component analysis (PCA)-related problems without the sparsity assumption, as shown in Section 5. In the final section, we discuss the situation in which $p > n$, even when $n \to \infty$.
2. Preliminary Notations
Let $X_1, \ldots, X_n$ be independent p-dimensional random vectors with a common multivariate normal distribution, $N_p(\mathbf{0}, \Sigma)$. First, we assume that the dimension p is fixed in Section 2 and Section 3. A basic problem that is considered in the literature is the estimation of the covariance matrix $\Sigma$, which is unknown and assumed to be non-singular. It is also assumed that $n > p$, so that the sufficient statistic
$$ S = \sum_{i=1}^{n} X_i X_i^\top $$
is positive definite with a probability of one. In the literature, the estimators of $\Sigma$ are functions of $S$. The sample space, the parameter space $\mathcal{P}$, and the action space are taken to be the set of $p \times p$ symmetric positive definite matrices. The general linear group $G_l$ acts on the space $\mathcal{P}$ via $\Sigma \mapsto g\Sigma g^\top$, $g \in G_l$. Note that $S$ has a Wishart distribution, $W_p(n, \Sigma)$, and the maximum likelihood estimator (MLE) of $\Sigma$ is expressed as follows:
$$ \hat\Sigma_{ML} = \frac{S}{n}, $$
which is unbiased (Anderson [12]).
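As a quick check of the unbiasedness claim (a one-line verification under the mean-zero normal setup assumed above):
$$ E[S] = \sum_{i=1}^{n} E\big[X_i X_i^\top\big] = n\Sigma, \qquad \text{hence} \qquad E\big[\hat\Sigma_{ML}\big] = E[S]/n = \Sigma. $$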
We consider an invariant loss function L, i.e., L satisfies the condition that $L(g\hat\Sigma g^\top, g\Sigma g^\top) = L(\hat\Sigma, \Sigma)$ for all $g \in G_l$. An estimator $\hat\Sigma(S)$ is defined as G-equivariant if $\hat\Sigma(gSg^\top) = g\hat\Sigma(S)g^\top$ for all $g \in G$. Suppose that G acts on $\mathcal{P}$, whereby the orbit through $\Sigma$ is the set $\{g\Sigma g^\top : g \in G\}$. This action is called transitive if $\mathcal{P}$ is one orbit, which is defined by the condition that if $\Sigma_1, \Sigma_2 \in \mathcal{P}$, there is some $g \in G$ for which $\Sigma_2 = g\Sigma_1 g^\top$. It may then be easy to note the fact that if L is G-invariant, $\hat\Sigma$ is G-equivariant, and G acts transitively on $\mathcal{P}$, then the risk function $R(\hat\Sigma, \Sigma) = E[L(\hat\Sigma, \Sigma)]$ is constant on $\mathcal{P}$: $R(\hat\Sigma, \Sigma) = R(\hat\Sigma, I_p)$.
One of the most interesting loss functions was introduced by Stein [13]:
$$ L(\hat\Sigma, \Sigma) = \operatorname{tr}\big(\hat\Sigma\Sigma^{-1}\big) - \log\det\big(\hat\Sigma\Sigma^{-1}\big) - p, $$
where $\operatorname{tr}(\cdot)$ and $\det(\cdot)$ denote the trace and determinant of a matrix, respectively. Because $G_l$ acts transitively on the space $\mathcal{P}$, the best $G_l$-equivariant estimator exists. It can be easily determined that the MLE $\hat\Sigma_{ML} = S/n$ of $\Sigma$ is the best $G_l$-equivariant estimator. The minimum risk is
$$ R^*(G_l) = p\log n - \sum_{i=1}^{p} E\big[\log\chi^2_{n-i+1}\big], $$
where $E(X)$ denotes the expectation of the random variable X and $\chi^2_k$ denotes a chi-square random variable with k degrees of freedom.
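To see why the multiplier $1/n$ is best: full $G_l$-equivariance forces estimators of the form $cS$, and under the Stein loss as reconstructed above, the risk of $cS$ is minimized at $c = 1/n$:
$$ E\big[L(cS, \Sigma)\big] = cnp - p\log c - E\big[\log\det(S\Sigma^{-1})\big] - p, \qquad \frac{d}{dc}\,E\big[L(cS, \Sigma)\big] = np - \frac{p}{c} = 0 \;\Longrightarrow\; c = \frac{1}{n}. $$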
3. Optimal Estimators of $\Sigma$ When It Is Reparameterized
As the general linear group $G_l$ is not an amenable group, to study the minimax problem, James and Stein [14] reparameterized the parameter as $\Sigma = \Theta\Theta^\top$, where $\Theta \in G_T^+$ and $G_T^+$ denotes the group of lower triangular matrices with positive diagonal elements; the loss function is also invariant under $G_T^+$. Using the Cholesky decomposition, we may express $S = TT^\top$, where $T \in G_T^+$. As $G_T^+$ acts transitively on the space $\mathcal{P}$, the best $G_T^+$-equivariant estimator was proposed by James and Stein [14] as $\hat\Sigma_{JS} = TDT^\top$, where $D$ is a positive diagonal matrix with the elements $d_i = (n + p - 2i + 1)^{-1}$, $i = 1, \ldots, p$. The minimum risk for the best $G_T^+$-equivariant estimator is
$$ R^*(G_T^+) = \sum_{i=1}^{p}\Big\{\log(n + p - 2i + 1) - E\big[\log\chi^2_{n-i+1}\big]\Big\}. \tag{5} $$
This is because $G_T^+$ is a solvable group and, hence, it is amenable. Thus, Stein’s estimator $\hat\Sigma_{JS}$ is minimax.
3.1. The Stein Phenomenon
It is easy to see that $R^*(G_T^+) < R^*(G_l)$; thus, the MLE $\hat\Sigma_{ML}$ is inadmissible, and the estimator $\hat\Sigma_{JS}$ should be used instead of $\hat\Sigma_{ML}$. This is the well-known Stein phenomenon in the covariance estimation problem (for details, see Anderson [12]).
In order to determine why the Stein phenomenon occurs, namely, why the MLE of $\Sigma$ is inadmissible, we began to think about the deeper meaning behind the Stein phenomenon. Tsai [15] extended Stein’s method to establish another minimax estimator. We explain this method briefly in the following. Let $S$ and $\Sigma$ be partitioned into $2 \times 2$ blocks whose leading entries are scalars, and define the corresponding conditional (Schur complement) quantities of the trailing blocks. Note that the dimension of each trailing block is one less than that of its parent matrix, so that iterating the partition yields a process of successive diagonalization. Consequently, $S$ and $\Sigma$ are individually transformed into the diagonal matrices $D_S = \operatorname{diag}(s_1, \ldots, s_p)$ and $D_\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_p)$, so that the one-to-one correspondences $S \leftrightarrow D_S$ and $\Sigma \leftrightarrow D_\Sigma$ are established; this allows for the Stein loss function to be expressed in terms of $(D_S, D_\Sigma)$ and for the acting group to be $D_p^+$, the group of positive diagonal matrices. Based on the properties of the Wishart distribution (see Theorem 4.3.4, Theorem 7.3.4, and Theorem 7.3.6 of Anderson [12]), it is easy to note that $s_i/\sigma_i$, $i = 1, \ldots, p$, are independent chi-square random variables with $n - i + 1$ degrees of freedom. Let $D_\nu$ be the diagonal matrix with the elements $n - i + 1$, $i = 1, \ldots, p$. We may then conclude that $D_S$ follows a Wishart-type distribution with the mean matrix $D_\nu D_\Sigma$. Furthermore, it should be noted that all the p Jacobians of the transformation $S \mapsto D_S$ are one, and the Wishart density of $S$ is equivalent to the density of $D_S$. Thus, the Stein loss function is
$$ L\big(\hat D_\Sigma, D_\Sigma\big) = \operatorname{tr}\big(\hat D_\Sigma D_\Sigma^{-1}\big) - \log\det\big(\hat D_\Sigma D_\Sigma^{-1}\big) - p. $$
As $D_p^+$ also acts transitively on the space of positive definite diagonal matrices, the best $D_p^+$-equivariant estimator can be expressed in the following form:
$$ \hat D_\Sigma^{*} = \operatorname{diag}\Big(\frac{s_1}{n}, \ldots, \frac{s_p}{n - p + 1}\Big), \qquad \text{i.e.,}\quad \hat\sigma_i = \frac{s_i}{n - i + 1}. $$
Thus, the minimum risk for the estimator $\hat D_\Sigma^{*}$ is
$$ R^*(D_p^+) = \sum_{i=1}^{p}\Big\{\log(n - i + 1) - E\big[\log\chi^2_{n-i+1}\big]\Big\}. \tag{15} $$
As the group $D_p^+$ is also solvable, we may conclude that $\hat D_\Sigma^{*}$ is minimax. Based on (5) and (15), it is easy to see that $R^*(D_p^+) < R^*(G_T^+)$; hence, similarly to the Stein phenomenon, we may conclude that Stein’s estimator $\hat\Sigma_{JS}$ is inadmissible, while the estimator $\hat D_\Sigma^{*}$ is admissible.
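For concreteness, the three minimum risks can be compared numerically using $E[\log\chi^2_k] = \psi(k/2) + \log 2$, with $\psi$ the digamma function. The sketch below assumes the risk expressions given in our reconstructed displays (5) and (15); it is an illustration, not code from the paper:

```python
import numpy as np
from scipy.special import digamma

def e_log_chi2(k):
    # E[log X] for X ~ chi-square with k degrees of freedom
    return digamma(k / 2.0) + np.log(2.0)

def min_risks(n, p):
    i = np.arange(1, p + 1)
    base = e_log_chi2(n - i + 1).sum()
    r_gl = p * np.log(n) - base                     # MLE S/n, full linear group
    r_tri = np.log(n + p - 2 * i + 1).sum() - base  # James-Stein, triangular group
    r_diag = np.log(n - i + 1).sum() - base         # successive diagonalization
    return r_gl, r_tri, r_diag

print(min_risks(n=50, p=10))  # expect r_diag < r_tri < r_gl
```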
3.2. The Optimal Properties of the MLE
We may note that the MLE of $\Sigma$ is the best $G_l$-equivariant estimator. James and Stein [14] used the Cholesky decomposition to parameterize the parameter $\Sigma = \Theta\Theta^\top$ to obtain the Stein estimator $\hat\Sigma_{JS}$, which is the best $G_T^+$-equivariant estimator. Tsai [15] used the full Iwasawa decomposition to obtain the best $D_p^+$-equivariant estimator $\hat D_\Sigma^{*}$. It is important to note that the inequality $R^*(D_p^+) \leq R^*(G_T^+) \leq R^*(G_l)$ holds. As $D_p^+ \subset G_T^+ \subset G_l$, we can easily see that the above inequality holds. The minimum risk of the estimator is larger for the larger group and smaller for the smaller group.
Tsai [15] showed that the minimum risks of the MLEs under the Cholesky decomposition and the full Iwasawa decomposition are the same when the geodesic distance loss function on a non-Euclidean space is adopted. Comparing the minimum risks of estimators for different groups does not make much statistical sense. The comparison of different estimators may make sense when they are compared under the same parameterized decomposition. For the spectral decomposition and when the dimension p is fixed, Tsai and Tsai [10] claimed that the sample covariance matrix is the best orthogonally equivariant estimator under the Stein loss function, whereas Stein’s ([3,4]) orthogonally equivariant estimator can be inadmissible under the spectral decomposition. These results are different from the Stein phenomenon, in which the MLE is inadmissible. Hence, we cannot help but suspect that the Stein phenomenon is due to the parameterized decompositions and does not hold significant statistical meaning. We hope that this paper may impact those statisticians who have been constantly warned not to use an MLE for the covariance matrix ever since the Stein phenomenon was discovered, making them reconsider the employment of an MLE for the covariance matrix.
When the dimension p is fixed, each of the three estimators possesses its optimal properties for its respective parameterized decomposition. All three estimators, $\hat\Sigma_{ML} = S/n$, $\hat\Sigma_{JS}$, and $\hat D_\Sigma^{*}$, are the best $G_l$-equivariant, $G_T^+$-equivariant, and $D_p^+$-equivariant estimators, respectively. The sample covariance matrix is not only the best $G_l$-equivariant estimator but also the best $\mathcal{O}(p)$-equivariant estimator. They are the MLEs for the $G_l$, $G_T^+$, and $D_p^+$ parameterized decompositions, respectively. The optimal property of the MLE is essentially not affected.
Note that the Stein loss function is essentially equivalent to the entropy loss function under the multinormal setup. When the dimension p is fixed and the sample size $n \to \infty$, the literature agrees that the sample spectral components will converge to the corresponding population components almost surely (Anderson [12]). We extend Tsai’s [15] approach to the case of spectral decomposition, i.e., to determine whether the sample covariance matrix is the best orthogonally ($\mathcal{O}(p)$-)equivariant estimator of the population covariance matrix when the dimension p is fixed. In other words, we investigate whether $S/n$ is the MLE of $\Sigma$ under spectral decomposition, so that the sample components converge to the corresponding population components as $n \to \infty$.
3.3. The Best Orthogonally Equivariant Estimator
For the application to the statistical inference of principal component analysis, we need the notation of the so-called spectral decomposition of the population covariance matrix, which can be understood as another type of reparametrization of $\Sigma$. Stein ([3,4]) considered the orthogonally equivariant estimator for the population covariance matrix, which has been considered the gold standard. Consider the spectral decomposition of the population covariance matrix: namely, $\Sigma = \Gamma\Lambda\Gamma^\top$, where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$ is a diagonal matrix with the eigenvalues $\lambda_1 \geq \cdots \geq \lambda_p > 0$, and $\Gamma = (\gamma_1, \ldots, \gamma_p)$ is the corresponding orthogonal matrix with $\gamma_i$ being the eigenvector that is associated with the ith largest eigenvalue $\lambda_i$. Similar dynamics apply to the sample spectral decomposition, i.e., $S = HLH^\top$, where $L = \operatorname{diag}(l_1, \ldots, l_p)$ is a diagonal matrix with the eigenvalues $l_1 \geq \cdots \geq l_p > 0$, and $H = (h_1, \ldots, h_p)$ is the corresponding orthogonal matrix, with $h_i$ being the eigenvector corresponding to $l_i$. Write $\hat\Gamma = H$ and $\hat\Lambda = L/n$. Note that the matrices $H$ and $L/n$ are consistent estimators of $\Gamma$ and $\Lambda$, respectively, when the dimension p is fixed and the sample size n is large (for details, see Anderson [12]). Hence, we may conclude that there are two situations in which the dimension p is fixed: (i) When $\Sigma$ is not reparameterized, the sample covariance matrix $S/n$ is unbiased, and hence, it is consistent. (ii) When $\Sigma$ is reparameterized via spectral decomposition, the components $H$ and $L/n$ are consistent estimators of $\Gamma$ and $\Lambda$, respectively. Then, the sample covariance matrix is still consistent.
Remark 1.
We aim to study the consistency property with the help of the optimal properties of MLEs. The main reason is that, based on the general theory of estimation, the maximum likelihood estimator is consistent; that is, it tends toward the true value with a probability of one as the sample size increases under certain regularity conditions, which are satisfied by the non-degenerate Wishart distribution.
We may note that when $\Sigma$ is not reparameterized, it is easy to see that the sample covariance matrix $S/n$ is the MLE of $\Sigma$. When the spectral decomposition for $\Sigma$ is adopted, it is expected that the sample components $H$ and $L/n$ are the MLEs of the corresponding population components $\Gamma$ and $\Lambda$, respectively.
First, when the dimension p is fixed, $S$ is Wishart-distributed when $n > p$. Under the spectral decompositions for $\Sigma$ and $S$, we find the MLEs of $\Gamma$ and $\Lambda$ in the following. Note that $\mathcal{O}(p)$ constitutes the set of $p \times p$ orthogonal matrices. Let $V = \Gamma^\top H$; then, $V \in \mathcal{O}(p)$. Assume that $\Sigma = \Gamma\Lambda\Gamma^\top$ and $S = HLH^\top$; then, the log-likelihood function of $S$ is
$$ \ell(\Gamma, \Lambda) = c_0 - \frac{n}{2}\log\det\Lambda - \frac{1}{2}\operatorname{tr}\big(\Lambda^{-1} V L V^\top\big), \tag{16} $$
where $c_0$ is a remainder term, which is independent of the parameters (i.e., of $(\Gamma, \Lambda)$). Equation (16) is essentially equivalent to the Stein loss function.
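The reduction to (16) uses only the cyclic property of the trace together with $V = \Gamma^\top H$ (a one-step verification of the reconstructed display):
$$ \operatorname{tr}\big(\Sigma^{-1}S\big) = \operatorname{tr}\big(\Gamma\Lambda^{-1}\Gamma^\top H L H^\top\big) = \operatorname{tr}\big(\Lambda^{-1}(\Gamma^\top H)\, L\, (\Gamma^\top H)^\top\big) = \operatorname{tr}\big(\Lambda^{-1} V L V^\top\big). $$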
Theorem 1
(von Neumann [16]). For orthogonal $V \in \mathcal{O}(p)$ and diagonal $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$ and $L = \operatorname{diag}(l_1, \ldots, l_p)$ (with $\lambda_1 \geq \cdots \geq \lambda_p > 0$ and $l_1 \geq \cdots \geq l_p > 0$),
$$ \min_{V \in \mathcal{O}(p)} \operatorname{tr}\big(\Lambda^{-1} V L V^\top\big) = \sum_{i=1}^{p} \frac{l_i}{\lambda_i}, $$
and a minimizing value of V is $V = I_p$. For the detailed proofs, see Theorem A.4.7 and Lemma A.4.6 of Anderson [12].
Based on the result of the von Neumann Theorem, we can determine that the MLE of V is $\hat V = I_p$ and, hence,
$$ \hat\Gamma = H. $$
Thus, we may state that
$$ -2\,\ell(H, \Lambda) = n\sum_{i=1}^{p}\log\lambda_i + \sum_{i=1}^{p}\frac{l_i}{\lambda_i} - 2c_0. \tag{19} $$
After some calculations, the function in (19) is further minimized with respect to $\Lambda$ at $\hat\lambda_i = l_i/n$, $i = 1, \ldots, p$. As such, when p is fixed, we have that $\hat\Lambda = L/n$ is the MLE of $\Lambda$, and $\hat\Sigma = H(L/n)H^\top = S/n$ is the MLE of $\Sigma$. Thus, when p is fixed and the sample size n is large, according to the property of the MLE, we have $H \to \Gamma$ and $L/n \to \Lambda$ almost surely (a.s.); therefore, $H$ and $L/n$ are the consistent estimators of $\Gamma$ and $\Lambda$, respectively. Hence, $S/n \to \Sigma$ a.s. Therefore, in terms of the spectral decompositions, when the dimension p is fixed, the sample covariance matrix is the consistent estimator of the population covariance matrix $\Sigma$. Based on the above arguments, when the dimension p is fixed, we may note that the MLEs play an important role in achieving optimal conditions whether $\Sigma$ is reparameterized or not. When it is not reparameterized, the MLE of $\Sigma$ is unbiased and consistent, while when it is reparameterized, the MLEs of the component parameters for the spectral decomposition are consistent.
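The "some calculations" step is the coordinatewise stationarity condition:
$$ \frac{\partial}{\partial \lambda_i}\Big( n\log\lambda_i + \frac{l_i}{\lambda_i} \Big) = \frac{n}{\lambda_i} - \frac{l_i}{\lambda_i^{2}} = 0 \;\Longrightarrow\; \hat\lambda_i = \frac{l_i}{n}, $$
with second derivative $n^3/l_i^2 > 0$ at $\hat\lambda_i$, confirming a minimum of (19).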
However, the situation may be different when the dimension p is large, so that $p/n \to c \in (0, 1)$, because the sample covariance matrix is no longer the MLE of the population covariance matrix $\Sigma$. Hence, the question naturally arises as to whether a consistent estimator of $\Sigma$ exists or not under the large dimensional asymptotic setup. Under the spectral decomposition, Tsai and Tsai [10] proved the consistency of their proposed estimators of the population eigenvalues with the help of random matrix theory. Some notations for this are presented below.
4. High-Dimensional Case
For a large n setup, a large dimensional asymptotic framework is set up when $n \to \infty$, so that $c_n = p/n \to c$ is fixed, with $c \in (0, 1)$. In this section, we extend the class of orthogonally equivariant estimators to the realm of large dimensional asymptotics with a concentration of c.
4.1. The Marčenko–Pastur Equation
In accordance with Ledoit and Péché [17], we make the following assumptions:
A1. Note that $X_i = \Sigma_p^{1/2} Z_i$, where the $Z_i$ values are independent and identically distributed with a mean of $\mathbf{0}$ and covariance matrix $I_p$. Assume that the 12th absolute central moment of each coordinate of $Z_i$ is bounded by a constant.
A2. The population covariance matrix $\Sigma_p$ is nonrandom and positive definite. Its eigenvalues $\lambda_1 \geq \cdots \geq \lambda_p > 0$ are bounded away from 0 and $\infty$.
A3. For a large n setup, the large dimensional asymptotic framework is established when $n \to \infty$, so that $c_n = p/n \to c$ is fixed, with $0 < c < 1$ as in this paper.
A4. Let $\lambda_1 \geq \cdots \geq \lambda_p$ denote the eigenvalues of $\Sigma_p$. The empirical spectral distribution of $\Sigma_p$, defined as $H_p(\tau) = p^{-1}\,\#\{i : \lambda_i \leq \tau\}$, converges as $n \to \infty$ to a probability distribution function H at every point of continuity of H. The support of H, $\operatorname{Supp}(H)$, is included in a compact set $[h_1, h_2]$, with $0 < h_1 \leq h_2 < \infty$.
Let $F_p$ be the sample spectral distribution, i.e., the empirical distribution of the eigenvalues $\hat\gamma_1 \geq \cdots \geq \hat\gamma_p$ of $S/n$, and let F be its limit. It is proved that $F_p$ converges to F a.s. as $n \to \infty$ (Marčenko and Pastur [18]).
The Stieltjes transform of the distribution function F is defined as follows:
$$ m_F(z) = \int \frac{1}{\lambda - z}\, dF(\lambda), \qquad z \in \mathbb{C}^+, $$
where $\mathbb{C}^+$ is the half-plane of complex numbers with a strictly positive imaginary part. Let
$$ \underline{F}(\lambda) = (1 - c)\,\mathbf{1}_{[0, \infty)}(\lambda) + c\,F(\lambda), \qquad m_{\underline{F}}(z) = -\frac{1 - c}{z} + c\, m_F(z). $$
Then, based on the results of random matrix theory, $F_p$ converges to F if and only if $m_{F_p}(z)$ converges to $m_F(z)$ for each $z \in \mathbb{C}^+$. Subsequently, the well-known Marčenko–Pastur equation (Silverstein [11]) can be expressed in the following form:
$$ m_{\underline{F}}(z) = -\left[\, z - c \int \frac{\tau}{1 + \tau\, m_{\underline{F}}(z)}\, dH(\tau) \right]^{-1}, \qquad z \in \mathbb{C}^+, \tag{22} $$
where H denotes the limiting behavior of the population spectral distribution. Based on the Marčenko–Pastur equation, meaningful information regarding the population spectral distribution can be retrieved under the large dimensional asymptotic framework. Choi and Silverstein [19] further showed that
$$ \breve m_F(\gamma) \equiv \lim_{z \in \mathbb{C}^+ \to \gamma} m_{\underline{F}}(z) $$
exists for any $\gamma \in \mathbb{R} \setminus \{0\}$.
Using the Sokhotski–Plemelj formula, the boundary value of $m_F$ can be separated into the real part, which becomes a principal value integral (the so-called Hilbert transform), while the imaginary part becomes $\pi$ times the limiting sample spectral density function $f = F'$. Namely,
$$ \lim_{z \in \mathbb{C}^+ \to \gamma} m_F(z) = H_F(\gamma) + i\,\pi f(\gamma), $$
where the Hilbert transform denotes
$$ H_F(\gamma) = \operatorname{PV} \int \frac{f(\lambda)}{\lambda - \gamma}\, d\lambda. $$
In some special cases, $f$ can be expressed explicitly. For example, let $\Sigma_p = I_p$ and $H = \mathbf{1}_{[1, \infty)}$. When $0 < c \leq 1$, the Marčenko–Pastur density function is of the following form:
$$ f_c(x) = \frac{\sqrt{(b - x)(x - a)}}{2\pi c x}, \qquad a \leq x \leq b, $$
where $a = (1 - \sqrt{c})^2$ and $b = (1 + \sqrt{c})^2$.
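A small simulation illustrates the convergence of the spectrum of $S/n$ to $f_c$; the sample size, dimension, and seed below are arbitrary choices for a sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 1000                       # concentration c = p/n = 0.5
c = p / n
X = rng.standard_normal((n, p))         # Sigma = I_p
evals = np.linalg.eigvalsh(X.T @ X / n)

a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
x = np.linspace(a + 1e-6, b - 1e-6, 200)
f_c = np.sqrt((b - x) * (x - a)) / (2 * np.pi * c * x)  # MP density

hist, edges = np.histogram(evals, bins=40, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print("max abs gap between histogram and f_c:",
      np.max(np.abs(hist - np.interp(mid, x, f_c))))
```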
Using the resolvent method, we then have that
$$ \lim_{z \in \mathbb{C}^+ \to x} m_F(z) = \frac{1 - c - x}{2cx} + i\,\pi f_c(x), \qquad a < x < b, $$
where the real part is the Cauchy principal value, i.e.,
$$ H_F(x) = \operatorname{PV}\int_a^b \frac{f_c(\lambda)}{\lambda - x}\, d\lambda = \frac{1 - c - x}{2cx}, \qquad a < x < b. $$
Generally, H is unknown, and the form of $f$ will not be explicit.
Stein [3] used the naive empirical counterpart
$$ \hat H_F(\hat\gamma_i) = \frac{1}{p} \sum_{j \neq i} \frac{1}{\hat\gamma_j - \hat\gamma_i} $$
to estimate the Hilbert transform, where $\gamma_\alpha$ denotes the quantile of the limiting sample spectral distribution F such that $F(\gamma_\alpha) = \alpha$, and the matching sample eigenvalue is $\hat\gamma_{[\alpha p]}$, with $[x]$ denoting the largest integer of x. As $F_p$ converges to F a.s., meaning that $\hat\gamma_{[\alpha p]}$ converges to $\gamma_\alpha$ a.s., Stein concluded that $\hat H_F(\hat\gamma_{[\alpha p]})$ converges to $H_F(\gamma_\alpha)$. Then, the empirical counterpart is a consistent estimator of the Hilbert transform.
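The empirical counterpart can be checked against the closed form $H_F(x) = (1 - c - x)/(2cx)$ obtained above; a sketch under the same identity-covariance assumptions as before:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 1000
c = p / n
X = rng.standard_normal((n, p))
ev = np.sort(np.linalg.eigvalsh(X.T @ X / n))

i = p // 2                                         # an eigenvalue well inside the bulk
x = ev[i]
h_emp = np.sum(1.0 / (np.delete(ev, i) - x)) / p   # (1/p) sum_{j != i} 1/(ev_j - x)
h_theory = (1 - c - x) / (2 * c * x)
print(h_emp, h_theory)                             # should be close for large p
```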
4.2. The Consistent Estimators of Population Eigenvalues
The Marčenko–Pastur equation in (22) shows the implicit relationship between F and H. Tsai and Tsai [10] further established an explicit equality relationship, hereafter Equation (29), which expresses the quantile $\psi_\alpha$ directly in terms of the quantile $\gamma_\alpha$, the concentration c, and the boundary value $\breve m_F(\gamma_\alpha)$, where $\psi_\alpha$ and $\gamma_\alpha$ denote the quantiles of the limiting population and sample spectral distributions H and F, respectively, such that $H(\psi_\alpha) = \alpha = F(\gamma_\alpha)$, with $[x]$ denoting the largest integer of x. Let $S_F$ be the support of F. Using Theorem 2 of Choi and Silverstein [19], Ledoit and Péché [17] pointed out that if $c < 1$, then for $\gamma \in S_F$, the quantity
$$ \frac{\gamma}{\big|\,1 - c - c\,\gamma\, \breve m_F(\gamma)\,\big|^{2}} $$
is well defined.
Based on the results of (29) and the empirical counterpart of $\breve m_F$, Tsai and Tsai [10] proposed a new kind of orthogonally equivariant estimator of $\Sigma$, which is of the following form:
$$ \tilde\Sigma = H \tilde\Lambda H^\top, \qquad \tilde\Lambda = \operatorname{diag}\big(\tilde\lambda_1, \ldots, \tilde\lambda_p\big), \tag{30} $$
where $\tilde\lambda_i$ denotes the estimator of the ith population eigenvalue obtained by evaluating the quantile equality (29) at the empirical counterparts of its ingredients.
When the dimension p is fixed and n is large (i.e., $n \to \infty$), as discussed in Section 3, we have that $H \to \Gamma$ and $L/n \to \Lambda$. However, when $n \to \infty$ with $p/n \to c \in (0, 1)$, $L/n$ no longer converges to $\Lambda$, and the MLE of $\lambda_i$ is no longer $l_i/n$. In contrast, by means of Equation (29), it should be of the form $\tilde\lambda_i$ given in (30). Note that based on assumption A4, $H_p$ converges to H when $n \to \infty$, and thus, $\lambda_{[\alpha p]}$ converges to $\psi_\alpha$. In other words, $\tilde\lambda_{[\alpha p]}$ converges to $\psi_\alpha$ as defined in Proposition 1. Hence, to estimate $\lambda_{[\alpha p]}$ is the same as to estimate $\psi_\alpha$ under the large dimensional asymptotic setup, $n \to \infty$. Under certain regularity conditions, Tsai and Tsai [10] claimed that their proposed estimators of the population eigenvalues are consistent. We summarize the results in the following.
Proposition 1.
Let $\psi_\alpha$ and $\tilde\lambda_i$ be defined in (29) and (30), respectively. Under the assumptions of Theorem 2 of Tsai and Tsai [10], $\tilde\lambda_{[\alpha p]}$ is then the consistent estimator of $\psi_\alpha$; namely, $\tilde\Lambda$ is the consistent estimator of the eigenvalue matrix Λ in the decomposition $\Sigma = \Gamma\Lambda\Gamma^\top$, when $n \to \infty$.
Remark 2.
In the literature, there are methods for imposing additional structure, such as sparse methods [20], the factor model [21], or a graph model [9], on the covariance matrix estimation. Orthogonally equivariant estimators are widely adopted. Using the Marčenko–Pastur equation in (22), meaningful information on the population spectral distribution can be retrieved under the large dimensional asymptotic framework. However, the relationship is tangled. Under certain regularity conditions, Tsai and Tsai [10] established an explicit relationship, (29), between the quantiles of the limiting sample spectral distribution F and the limiting population spectral distribution H. Then, the consistent estimators of the population eigenvalues can easily be established. This result makes up for the deficiency of estimators in the literature, such as Stein’s [4] and Ledoit and Wolf’s [5] estimators, which are inconsistent estimators of the population eigenvalues. Tsai and Tsai [10] proposed a new kind of estimator, $\tilde\Sigma$, for the population covariance matrix Σ and showed that the proposed estimator is the best orthogonally ($\mathcal{O}(p)$-)equivariant estimator of the population covariance matrix Σ under the normalized Stein loss function when $n > p$. Random matrix theory provided essential support for these results. However, it remains undetermined whether the proposed estimator is consistent for Σ, namely, whether the sample component estimators are consistent for the corresponding population components. To investigate this, we adopt the MLE approach; thus, in this paper, we assume that $X_1, \ldots, X_n$ are independent and identically multinormally distributed.
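Structurally, the estimator in (30) keeps the sample eigenvectors and replaces only the eigenvalues. The skeleton below sketches this shape; the `eigenvalue_map` argument is a placeholder for the quantile-based eigenvalue estimators of Tsai and Tsai [10], whose exact formula we do not reproduce here:

```python
import numpy as np

def equivariant_estimator(S_over_n, eigenvalue_map):
    """Rotation-equivariant estimator: H diag(eigenvalue_map(l)) H'.

    eigenvalue_map stands in for the consistent eigenvalue estimators
    of Equation (30); it is NOT the Tsai-Tsai formula itself.
    """
    l, H = np.linalg.eigh(S_over_n)      # ascending eigenvalues
    l, H = l[::-1], H[:, ::-1]           # reorder to descending
    lam_tilde = eigenvalue_map(l)
    return H @ np.diag(lam_tilde) @ H.T

# usage sketch: the identity map reproduces S/n itself
# Sigma_tilde = equivariant_estimator(S_over_n, lambda l: l)
```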
4.3. The Consistent Estimator of the Population Covariance Matrix
When the dimension p is large, the sample covariance matrix $S/n$ is no longer the MLE of $\Sigma$. It is difficult to directly determine the functional form of the MLE of $\Sigma$; therefore, we may follow the detour of the reparametrization of $\Sigma$ via spectral decomposition. Following this, the main goal is to see whether the orthonormal matrix $H$ is the MLE of $\Gamma$ or not. In the fixed-p case, the limiting distribution of $H$ on $\mathcal{O}(p)$ is entirely concentrated at $\Gamma$; under large dimensional asymptotics, this concentration fails, so unbiasedness is not a useful optimal property, and its role is replaced by the property of equivariance. Ledoit and Péché [17] studied the average of the squared projections of the sample eigenvectors onto the population eigenvectors, over the sample eigenvectors associated with given sample eigenvalues, demonstrating how the eigenvectors of the sample covariance matrix deviate from those of the population covariance matrix under large dimensional asymptotics; such averaging wipes out the non-rotation-equivariant behavior. This is one of the main reasons why we prefer to restrict attention to the class of rotation-equivariant estimators. Tsai and Tsai [10] established the best orthogonally equivariant estimator of $\Sigma$. We continue to study whether the proposed estimator is the consistent estimator of the population covariance matrix or not when $n \to \infty$. Based on Proposition 1, it only needs to be determined whether $H$ is the consistent estimator of $\Gamma$.
The orthogonal matrix $H$ may not generally be a consistent estimator of $\Gamma$ when the dimension p is large (see Bai et al. [22] and the references therein). Hence, we may proceed under the restricted model, namely, under the Wishart distribution setup, when $n \to \infty$.
When $\Sigma$ is reparameterized via the spectral decomposition, we aim to study the consistency property of the component parameters when the dimension p is large. Under the multivariate normal setup, when the dimension p is fixed, $S$ is Wishart-distributed with the mean matrix $n\Sigma$. However, when $n \to \infty$ with $p/n \to c \in (0, 1)$, $S/n$ does not concentrate at $\Sigma$, and $S/n$ is no longer the MLE of $\Sigma$. Instead, we may notice that $n\tilde\Sigma$ is Wishart-type-distributed with the mean matrix $n\Sigma$ when $n \to \infty$. It is easy to note that the corresponding log-likelihood function is similar to (16), with $\tilde\Lambda$ in (30) replacing $\Lambda$ in (16) (i.e., with $\tilde\lambda_i$ replacing $\lambda_i$); it still satisfies the regularity conditions, and it does not degenerate. Based on this, our goal is to show that $H$ is the MLE of $\Gamma$ when $n \to \infty$.
First, we want to show that $\hat V = I_p$ is the MLE of $V = \Gamma^\top H$ when $n \to \infty$: namely, to extend the von Neumann Theorem to the case in which $n \to \infty$. Note that any $V \in \mathcal{O}(p)$ near $I_p$ can be written as $V = \exp(A)$, with A skew-symmetric. Thus, $\operatorname{tr}(\tilde\Lambda^{-1}VLV^\top)$ is equal to
$$ \operatorname{tr}\big(\tilde\Lambda^{-1}L\big) + \operatorname{tr}\big(\big(L\tilde\Lambda^{-1} - \tilde\Lambda^{-1}L\big)A\big) + O\big(\|A\|^2\big). $$
Note that $L\tilde\Lambda^{-1} - \tilde\Lambda^{-1}L = 0$, since both L and $\tilde\Lambda$ are diagonal; this implies that the first derivative vanishes at $V = I_p$. Moreover, the second-order term equals
$$ \sum_{i < j} a_{ij}^2\,\big(l_i - l_j\big)\big(\tilde\lambda_j^{-1} - \tilde\lambda_i^{-1}\big), $$
which is nonnegative because both $(l_i)$ and $(\tilde\lambda_i)$ are arranged in decreasing order. Similarly to the above arguments, the same rearrangement reasoning applies on all of $\mathcal{O}(p)$, and we can also show that $\min_{V \in \mathcal{O}(p)} \operatorname{tr}(\tilde\Lambda^{-1}VLV^\top) = \sum_{i=1}^{p} l_i/\tilde\lambda_i$. Hence, we may have that the minimum of $\operatorname{tr}(\tilde\Lambda^{-1}VLV^\top)$, with respect to V, occurs at $V = I_p$. Thus, the von Neumann Theorem still holds for the case in which $n \to \infty$. As such, in terms of the spectral decompositions, we may have that $H$ is also the MLE of $\Gamma$ when $n \to \infty$. Hence, based on the property of the MLE, we may summarize it as follows:
Theorem 2.
Let $X_1, \ldots, X_n$ be independent p-dimensional random vectors with a common multivariate normal distribution, $N_p(\mathbf{0}, \Sigma)$. Consider the spectral decompositions $\Sigma = \Gamma\Lambda\Gamma^\top$ and $S = HLH^\top$, and let $\tilde\Sigma = H\tilde\Lambda H^\top$ be defined as in (30). Under the assumptions of Proposition 1, when $n \to \infty$, $H$ is the MLE of $\Gamma$. Hence, it is the consistent estimator of $\Gamma$.
Based on Proposition 1 and Theorem 2, we may then conclude that the proposed novel estimator $\tilde\Sigma$ is consistent for the population covariance matrix when the dimension p is large. Next, we continue to investigate whether the proposed estimator is the MLE of $\Sigma$ or not. Note that profiling the log-likelihood at $\hat V = I_p$ yields (32), the analogue of (19) in which the eigenvalue estimators of (30) enter.
After a number of calculations, the function in (32) is further minimized with respect to $\Lambda$ at $\Lambda = \tilde\Lambda$. As such, when $n \to \infty$, we may conclude that $\tilde\Lambda$ is the MLE of $\Lambda$. Thus, based on Theorem 2, we have that $\tilde\Sigma = H\tilde\Lambda H^\top$ is the MLE of $\Sigma$; however, the sample covariance matrix $S/n$ is not. According to the property of the MLE, we have that $H$, $\tilde\Lambda$, and $\tilde\Sigma$ are the consistent estimators of $\Gamma$, $\Lambda$, and $\Sigma$, respectively. Therefore, the following can be stated:
Theorem 3.
Under the assumptions of Theorem 2, when $n \to \infty$, $\tilde\Sigma = H\tilde\Lambda H^\top$ is the MLE of Σ. Hence, it is consistent.
Remark 3.
We may make the following three conclusions: (i) The sample covariance matrix $S/n$ is the MLE of the population covariance matrix Σ when the dimension p is fixed. (ii) The estimator $\tilde\Sigma$ is the MLE of Σ when the dimension p is large, so that $p/n \to c \in (0, 1)$. (iii) It is easy to see that $\tilde\Sigma$ reduces to the sample covariance matrix when the dimension p is fixed and the sample size n is large (i.e., $n \to \infty$). These are insightful parallels. Hence, for simplicity, we may integrate the above results into a unified one: when p is fixed or $p/n \to c \in (0, 1)$ (i.e., when $n \to \infty$), $n\tilde\Sigma$ is Wishart-type-distributed with a mean matrix $n$Σ, and $\tilde\Sigma$ is the MLE of Σ. Thus, $\tilde\Sigma$ is the consistent estimator of Σ, and hence, $\tilde\Sigma$ converges to Σ as $n \to \infty$. Therefore, we may use $\tilde\Sigma$ to replace $S/n$ when making statistical inferences when the dimension p is fixed or $n \to \infty$.
Remark 4.
Tsai and Tsai [10] used the fundamental statistical concept to determine the quantile equality relationship of the limiting sample and population spectral distributions so that the consistency problems between the sample eigenvalues and the population eigenvalues can be easily handled. Then, we can use the likelihood function to progress. As long as the density function does not degenerate, the statistical inference can be performed similarly to the traditional one. The key point based on the conclusion is to find a consistent estimator of the population covariance matrix, namely, to directly determine the MLE of Σ when $n \to \infty$.
When the dimension p is fixed, we have that $S/n$ is the MLE of $\Sigma$. However, it is not true that $S/n$ is the MLE of $\Sigma$ when $n \to \infty$ with $p/n \to c \in (0, 1)$.
Johnstone and Paul [23] provided a detailed discussion of sample eigenvalue bias and eigenvector inconsistency under the spiked covariance model and for high-dimensional PCA-related phenomena.
Remark 5.
We may note that $n\tilde\Sigma$ is Wishart-type-distributed with the mean matrix $n$Σ, and $\tilde\Sigma$ reduces to the sample covariance matrix when the dimension p is fixed. With the proposed novel estimator, $\tilde\Sigma$, replacing the sample covariance matrix, $S/n$, the case for the traditional fixed dimension p and that for the new high-dimensional setup can be integrated into one. Hence, we may suggest using the proposed consistent estimator to replace the sample covariance matrix in order to carry out the multivariate statistical inference, including PCA-related problems, for both cases in which (i) p is fixed and n is large (i.e., $n \to \infty$) and (ii) $n \to \infty$ with $p/n \to c \in (0, 1)$.
We provide an outline for the likelihood ratio test (LRT) of the hypothesis testing problem in the next section.
5. The Decomposite $\tilde T^2$-Test When the Dimension Is Large
Let $X_1, \ldots, X_n$ be independent random vectors with a p-dimensional multinormal distribution, a mean vector $\xi$, and an unknown positive definite covariance matrix $\Sigma$. Consider the hypothesis testing problem of
$$ H_0: \xi = \mathbf{0} \quad \text{versus} \quad H_1: \xi \neq \mathbf{0} \tag{33} $$
when both the dimension p and sample size n are large. Let
$$ \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S = \sum_{i=1}^{n} \big(X_i - \bar X\big)\big(X_i - \bar X\big)^\top. \tag{34} $$
Then, the well-known Hotelling’s $T^2$-test statistic, which can be found in the literature, is denoted as follows:
$$ T^2 = n(n - 1)\, \bar X^\top S^{-1} \bar X. $$
When the dimension p is fixed, Hotelling’s $T^2$-test is optimal for the problem (33).
However, when the dimension p is large, the performance of Hotelling’s $T^2$-test is not optimal, because the sample covariance matrix is no longer a consistent estimator of $\Sigma$. To overcome this difficulty, we can adopt the novel estimator $\tilde\Sigma$ to replace the sample covariance matrix and then consider the following decomposite $\tilde T^2$-test statistic:
$$ \tilde T^2 = n\, \bar X^\top \tilde\Sigma^{-1} \bar X, $$
where $\tilde\Sigma$ is defined in (30), with the $S$ that is defined in (34) replacing the one defined in (2). It is evident that the $\tilde T^2$-test is the LRT statistic for the problem (33).
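A sketch of both statistics follows; the normalizations match our reconstructed displays above, and `eigenvalue_map` again stands in for the eigenvalue estimators of (30):

```python
import numpy as np

def hotelling_T2(X):
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar)        # S as in (34)
    return n * (n - 1) * xbar @ np.linalg.solve(S, xbar)

def decomposite_T2(X, eigenvalue_map):
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar)
    l, H = np.linalg.eigh(S / n)         # spectral decomposition of S/n
    Sigma_tilde = H @ np.diag(eigenvalue_map(l)) @ H.T
    return n * xbar @ np.linalg.solve(Sigma_tilde, xbar)
```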
To avoid the issue that the power of any reasonable test moves toward one as $n \to \infty$, Le Cam’s contiguity concept was adopted to study the asymptotically local distribution when the dimension p was fixed. Note that the traditional local alternatives do not depend on the dimension p. Based on a large dimension p, in her Ph.D. thesis, Chia-Hsuan Tsai incorporated the dimension p into the consideration to study the asymptotic distribution under local alternatives $H_{1n}$,
where the local mean vector is proportional to a fixed p-dimensional vector $\delta$, which leads to a constraint on the norm of $\delta$ when p is large. Compared with the traditional ones, these local alternatives also depend on the dimension p, with a slight change in the convergence rate. Let $\Delta^2 = \delta^\top \Sigma^{-1} \delta$ denote the corresponding non-centrality parameter.
Similarly to the arguments presented in Chia-Hsuan Tsai’s thesis, we can show that $\tilde T^2$ does not converge in terms of probability; however, it is true that it converges in its local distribution. Note that in the traditional case of the fixed dimension p, the proposed decomposite $\tilde T^2$-test reduces to Hotelling’s $T^2$-test, and Hotelling’s $T^2$-test statistic converges in terms of probability, which implies convergence in distribution. It is not difficult to ascertain that the $\tilde T^2$-test statistic reduces asymptotically and locally (under the local alternatives $H_{1n}$) to a non-central chi-square distribution. This asymptotically local power function is still a monotone function of the non-centrality $\Delta^2$. Hence, when $n \to \infty$, it is easy to see that the proposed decomposite $\tilde T^2$-test is optimal for the problem (33).
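In symbols, the claim of this paragraph is that, under the local alternatives $H_{1n}$, $\tilde T^2$ converges in distribution to $\chi^2_p(\Delta^2)$, so that the asymptotically local power is
$$ \beta(\Delta^2) = P\big(\chi^2_p(\Delta^2) > \chi^2_{p,\, 1-\alpha}\big), $$
which is strictly increasing in the non-centrality $\Delta^2$; this monotonicity is what the optimality statement invokes.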
Remark 6.
The high-dimensional PCA problem has mainly been studied in spiked covariance models, and there is a need to assume sparsity in the population eigenvectors for the consistency problem (Johnstone and Lu [24] and the references therein). In contrast, with the proposed novel estimator replacing the sample covariance matrix for statistical inference, when $n \to \infty$ with $p/n \to c \in (0, 1)$, the results of Theorem 3 can be applied to make multivariate statistical inferences and solve PCA-related problems without the sparsity assumption. When $n > p$, our approach unifies the traditional (fixed-p) and modern high-dimensional cases for multivariate statistical methods and high-dimensional PCA-related problems. The proposed novel estimator is incorporated to establish the optimal decomposite $\tilde T^2$-test for a high-dimensional statistical hypothesis testing problem and can be directly applied to high-dimensional PCA-related problems without the sparsity assumption.
In the final section, we discuss the case in which $p > n$, especially for high-dimensional low-sample-size categorical data models in which $p \gg n$.
6. General Remarks When $p > n$
6.1. When $p > n$, Both n and p Are Fixed
Under the multinormal setup, we may note the case in which $p > n$ with both p and n fixed. Then, the density function of $S$ becomes a singular Wishart distribution (Uhlig [25]), which degenerates. In this situation, assume that $\operatorname{rank}(S) = n$; we may then use the following notations: $L_1 = \operatorname{diag}(l_1, \ldots, l_n)$ is a diagonal matrix containing the n positive eigenvalues of $S$, and the reparametrization is $S = H_1 L_1 H_1^\top$, where $H_1$ belongs to $V_{n,p}$, the $(n, p)$-dimensional Stiefel manifold of $p \times n$ matrices with orthonormal columns ($H_1^\top H_1 = I_n$). Note that $H_1$ is not square; thus, $\operatorname{tr}(\Sigma^{-1}S) = \operatorname{tr}(\Lambda^{-1} V_1 L_1 V_1^\top)$, where $V_1 = \Gamma^\top H_1 \in V_{n,p}$. Note that $V_1 V_1^\top \neq I_p$, which means that the von Neumann Theorem may fail to be true in general.
6.2. When $p > n$, Both n and p Are Large, So That $p/n \to c > 1$
When $p > n$, since the population covariance matrix is assumed to be a positive definite symmetric matrix, its p eigenvalues are all positive; however, $\operatorname{rank}(S) \leq n < p$. Thus, $S$ has $p - n$ sample eigenvalues equal to 0 with a probability of one. As such, it seems difficult to obtain consistent estimators of all p population eigenvalues. When the sample size n and the dimension p are both large, we may only need the n largest eigenvalue estimators to be consistent for the first n largest population eigenvalues. If this is the case, the method developed in this note is still applicable.
6.3. Use in High-Dimensional Low-Sample-Size (HDLSS) Categorical Data Models
When $p \gg n$ (i.e., in HDLSS categorical models), our method might still be used in some situations. HDLSS categorical models are abundant in genomics and bioinformatics, with relatively smaller sample sizes n, but also often $P \gg n$. Motivated by the 2002 severe acute respiratory syndrome coronavirus (SARS-CoV) epidemic model, a general model of comparing G groups is considered. Each sequence has P positions, each one relating to a categorical response, indexed as $c = 1, \ldots, C$, and there are $n_g$ sequences in the gth group, with $g = 1, \ldots, G$. For the gth group, pth position, and cth category, let $n_{gpc}$ be the number of sequences, and let $\hat\pi_{gpc} = n_{gpc}/n_g$ for $c = 1, \ldots, C$. Note that if there is no missing value, each sequence, at each position, takes on one of the C responses, so that $\sum_{c=1}^{C} n_{gpc} = n_g$ for all $(g, p)$. The combined group sample size is $n = \sum_{g=1}^{G} n_g$. For geographically separated sequences, the assumption of independence of the G groups may be reasonable, but the sequences within a group may not be independent due to their shared ancestry. For SARS-CoV or HIV genome sequences, because of the rapid evolution of the virus, the independence assumption may not be very stringent. Further, for each sequence, the responses at the P positions are generally neither independent nor necessarily identically distributed.
For SARS-CoV genome sequences, the scientific focus is the statistical comparison of different strata to coordinate plausible differences in response to pertinent environmental factors. In many fields of applications, particularly in genomic studies, not only do we have $P \gg n$, but n is also often small, leading to a range of high-dimensional problems. We encounter conceptual and operational roadblocks due to there being too many unknown parameters. For such genomic sequences, any single position (gene) yields very little statistical information. Hence, a composite measure of the qualitative variation over the entire sequence is thought to be a better way of gauging the statistical group discrimination. In this specific context, some molecular epidemiologic studies have advocated for a suitable external sequence analysis such as multivariate analysis of variance (MANOVA), although there are impasses of various types. Genomic research is a prime illustration of the need for an appropriate statistical methodology for comprehending the genomic variation in such high-dimensional categorical data models. Variation (diversity) in such large P, small n models cannot be properly statistically studied using standard discrete multivariate analysis tools or the full likelihood approach. For qualitative data models, the Gini–Simpson (GS) index (Gini [26]; Simpson [27]) and Shannon entropy (Shannon [28]) are commonly used for statistical analysis in a range of fields, including genetic variation studies (Chakraborty and Rao [29]). The Hamming distance provides an average measure that does not ignore dependence or possible heterogeneity. The U-statistics methodology (Hoeffding [30]) is incorporated to obtain optimal nonparametric estimators and their jackknife variance estimators; an illustrative sketch is given after the next paragraph. In the genome sequence context, we are confronted with the $P \gg n$ environment. Within this framework, we encounter two scenarios: (i) P is large and n is at least moderately large, and (ii) P is large and n is small. In (i), the sample estimates of the Hamming distances are all U-statistics, to which standard asymptotics (Sen [31]) apply: the estimators are asymptotically (with $n \to \infty$) normal, and their jackknifed variance estimators are consistent. Hence, we shall not enter into a detailed discussion of (i). Case (ii), which is more commonly encountered in genomic studies, entails different perspectives. We must use the appropriate central limit theory (CLT) for dependent sequences of bounded random variables.
Let $\pi_{gpc}$ denote the cth cell probability for the pth marginal law of group g, $g = 1, \ldots, G$, and let $\{n_{gpc}: c = 1, \ldots, C\}$ be the cell frequencies for the pth marginal table corresponding to the gth group, so that the MLE of $\pi_{gpc}$ is $\hat\pi_{gpc} = n_{gpc}/n_g$, where $n_g = \sum_{c=1}^{C} n_{gpc}$, with the same process being applied to every position $p = 1, \ldots, P$. We incorporate the jackknife methodology to obtain the nonparametric estimators. The jackknife estimator, a plug-in estimator that is based on the MLE of $\pi_{gpc}$, of the Hamming–Shannon measure is considered. The difficulties associated with the HDLSS asymptotics in the HDLSS genomic context are assessed, and suitable permutation procedures are appraised. Under the null hypothesis, i.e., the homogeneity of the G groups, the advantage of the resulting permutation invariance structure is used. Therefore, we proceed with this extended permutation–jackknife methodology.
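As an illustration of the composite diversity measures mentioned above, the sketch below computes the Gini–Simpson index at one position and the average pairwise Hamming distance (a U-statistic) for hypothetical toy sequences; the data and dimensions are invented for illustration only:

```python
import numpy as np
from itertools import combinations

def gini_simpson(counts):
    """GS index 1 - sum_c pi_c^2 from the cell counts at one position."""
    pi = counts / counts.sum()
    return 1.0 - np.sum(pi ** 2)

def mean_hamming(seqs):
    """U-statistic: average proportion of mismatched positions over all pairs."""
    d = [np.mean(seqs[i] != seqs[j])
         for i, j in combinations(range(len(seqs)), 2)]
    return np.mean(d)

rng = np.random.default_rng(2)
seqs = rng.integers(0, 4, size=(5, 8))   # 5 sequences, 8 positions, C = 4
print(gini_simpson(np.bincount(seqs[:, 0], minlength=4)))
print(mean_hamming(seqs))
```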
Consider all possible equally likely permutations of the observations for each position p, each with the same conditional probability $1/N$, where N denotes the total number of such permutations. In practice, to overcome the difficulty that N is too large, we may choose a number $N^*$ of permutations, which is sufficiently large, instead of N, and generate a set of $N^*$ permutations. For this construction, we use the permutation distribution that has been generated by the set of all possible permutations among themselves. Let the kth permuted sample define the corresponding jackknife covariance matrix estimate, $k = 1, \ldots, N^*$; in practice, $N^*$ is taken to be sufficiently large. Carry out the spectral decomposition of the resulting jackknife covariance matrix, with its eigenvalues replaced by the new corresponding eigenvalues obtained based on Equation (30), in the same way that the sample covariance matrix was replaced by $\tilde\Sigma$, to obtain the new and improved jackknife covariance matrix for the statistical inference. Thus, under these circumstances, the procedure that we proposed in Section 4 works well for HDLSS categorical data models.
When the sample size n is larger than the dimension p, so that $p/n \to c \in (0, 1)$, as shown in Section 4, we can note that $n\tilde\Sigma$ is Wishart-type-distributed with a mean matrix $n\Sigma$. It is demonstrated that $\tilde\Sigma$ is the MLE of the population covariance matrix $\Sigma$; hence, it is consistent. Moreover, it is easy to ascertain that $\tilde\Sigma$ reduces to the sample covariance matrix when the dimension p is fixed. Hence, when $n > p$, with the proposed novel estimator $\tilde\Sigma$ replacing the sample covariance matrix $S/n$, the traditional case of a fixed dimension p and the modern case of a high-dimensional setup can be integrated into a unified theory. We may therefore also conclude that $\tilde\Sigma$ is the best $\mathcal{O}(p)$-equivariant estimator. Thus, when $n > p$, the proposed novel estimator of the population covariance matrix plays a fundamental role in the further theoretical development of statistical inference. Practically, it has applications in certain HDLSS categorical data models. When $P \gg n$ and the sample size n is moderate, the optimal nonparametric methods for the genomic data may be proposed. When $P \gg n$ and the sample size n is small, we may incorporate the permutation and jackknife methodologies to make statistical inferences for the genomic data. Hopefully, the optimal statistical methods can be of help for scientific breakthroughs as well as for real-world applications within gene science.
Author Contributions
Conceptualization, M.-T.T.; Methodology, M.-T.T. and C.-H.T.; Validation, M.-T.T.; Investigation, C.-H.T.; Writing—original draft, C.-H.T.; Writing—review and editing, M.-T.T.; Project administration, C.-H.T. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
No new data were created or analyzed in this study.
Acknowledgments
The authors thank the two reviewers for their helpful comments, and the Editor for his comments, which helped in the rewriting of Section 6.3 in a more concise manner. The authors are grateful to the reviewer who recommended that they expand the comments in Remark 2.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Pourahmadi, M. High-Dimensional Covariance Estimation; Wiley: New York, NY, USA, 2013. [Google Scholar]
- Zagidullina, A. High-Dimensional Covariance Matrix Estimation: An Introduction to Random Matrix Theory; SpringerBriefs in Applied Statistics and Econometrics; Springer: Cham, Switzerland, 2021. [Google Scholar]
- Stein, C. Estimation of a covariance matrix. Rietz Lecture, 39th Annual Meeting of the IMS, Atlanta, GA, USA, 1975. [Google Scholar]
- Stein, C. Lectures on the theory of estimation of many parameters. J. Math. Sci. 1986, 43, 1373–1403. [Google Scholar] [CrossRef]
- Ledoit, O.; Wolf, M. Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist. 2012, 40, 1024–1060. [Google Scholar] [CrossRef]
- Ledoit, O.; Wolf, M. Optimal estimation of a large-dimensional covariance matrix under Stein’s loss. Bernoulli 2018, 24, 3791–3832. [Google Scholar] [CrossRef]
- Ledoit, O.; Wolf, M. Analytical nonlinear shrinkage of large-dimensional covariance matrices. Ann. Statist. 2020, 48, 3043–3065. [Google Scholar] [CrossRef]
- Ledoit, O.; Wolf, M. Shrinkage estimation of large covariance matrices: Keep it simple, statistician? J. Multivar. Anal. 2021, 186, 104796. [Google Scholar] [CrossRef]
- Rajaratnam, B.; Vincenzi, D. A theoretical study of Stein’s covariance estimator. Biometrika 2016, 103, 653–666. [Google Scholar] [CrossRef]
- Tsai, M.-T.; Tsai, C.-H. On the orthogonally equivariant estimators of a covariance matrix. arXiv 2024, arXiv:2405.06877. [Google Scholar]
- Silverstein, J.W. Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. J. Multivar. Anal. 1995, 55, 331–339. [Google Scholar] [CrossRef]
- Anderson, T.W. An Introduction to Multivariate Statistical Analysis, 3rd ed.; Wiley: New York, NY, USA, 2003. [Google Scholar]
- Stein, C. Inadmissibility of the usual estimator of the mean of a multivariate normal distribution. Proc. Third Berkeley Symp. Math. Statist. Probab. 1956, 1, 197–206. [Google Scholar]
- James, W.; Stein, C. Estimation with quadratic loss. Proc. Fourth Berkeley Symp. Math. Statist. Probab. 1961, 1, 361–379. [Google Scholar]
- Tsai, M.-T. On the maximum likelihood estimator of a covariance matrix. Math. Method. Statist. 2018, 27, 71–82. [Google Scholar] [CrossRef]
- von Neumann, J. Some matrix-inequalities and metrization of matric-space. Tomsk. Univ. Rev. 1937, 1, 286–300. [Google Scholar]
- Ledoit, O.; Péché, S. Eigenvectors of some large sample covariance matrix ensembles. Probab. Theory Relat. Fields. 2011, 151, 233–264. [Google Scholar] [CrossRef]
- Marčenko, V.A.; Pastur, L.A. Distribution of eigenvalues for some sets of random matrices. Sb. Math. 1967, 1, 457–483. [Google Scholar]
- Choi, S.I.; Silverstein, J.W. Analysis of the limiting spectral distribution of large dimensional random matrices. J. Multivar. Anal. 1995, 54, 295–309. [Google Scholar]
- Bickel, P.J.; Levina, E. Regularized estimation of large covariance matrices. Ann. Statist. 2008, 36, 199–227. [Google Scholar] [CrossRef]
- Fan, J.; Fan, Y.; Lv, J. High dimensional covariance matrices using a factor model. J. Econom. 2008, 147, 186–197. [Google Scholar] [CrossRef]
- Bai, Z.D.; Miao, B.Q.; Pan, G.H. On asymptotics of eigenvectors of large sample covariance matrix. Ann. Probab. 2007, 35, 1532–1572. [Google Scholar] [CrossRef]
- Johnstone, I.M.; Paul, D. PCA in high dimensions: An orientation. Proc. IEEE 2018, 106, 1277–1292. [Google Scholar] [CrossRef]
- Johnstone, I.M.; Lu, A.Y. On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 2009, 104, 682–693. [Google Scholar] [CrossRef]
- Uhlig, H. On singular Wishart and singular multivariate Beta distributions. Ann. Statist. 1994, 22, 395–405. [Google Scholar] [CrossRef]
- Gini, C.W. Variabilita e Mutabilita, Studi Economico-Giuridici della R; Universita de Cagliary: Cagliari, Italy, 1912; Volume 2, pp. 3–159. [Google Scholar]
- Simpson, E.H. The measurement of diversity. Nature 1949, 163, 688. [Google Scholar] [CrossRef]
- Shannon, C.E. A mathematical theory of communication. Bell System Techni. J. 1948, 27, 379–423, 623–656. [Google Scholar] [CrossRef]
- Chakraborty, R.; Rao, C.R. Measurement of genetic variation for evolutionary studies. In Handbook of Statistics Vol. 8: Statistical Methods in Biological and Medical Sciences; Rao, C.R., Chakraborty, R., Eds.; Elsevier: Amsterdam, The Netherlands, 1991; pp. 271–316. [Google Scholar]
- Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 1948, 19, 293–325. [Google Scholar] [CrossRef]
- Sen, P.K. Some invariance principles relating to Jackknifing and their role in sequential analysis. Ann. Statist. 1977, 5, 315–329. [Google Scholar] [CrossRef]