Chi-square approximation for the distribution of individual eigenvalues of a singular Wishart matrix

This paper discusses the approximate distributions of eigenvalues of a singular Wishart matrix. We give the approximate joint density of eigenvalues by a Laplace approximation of the hypergeometric functions of matrix arguments. Furthermore, we show that the distribution of each eigenvalue can be approximated by a chi-square distribution with varying degrees of freedom when the population eigenvalues are infinitely dispersed. The derived result is applied to testing the equality of eigenvalues in two populations.


Introduction
In multivariate analysis, some exact distributions of eigenvalues of a Wishart matrix are represented by hypergeometric functions of matrix arguments. James (1964) classified multivariate statistical problems into five categories based on hypergeometric functions. However, the convergence of these functions is slow, and their numerical computation is cumbersome when sample sizes or dimensions are large. Consequently, the derivation of approximate distributions of eigenvalues has received a great deal of attention. Sugiyama (1972) derived the approximate distribution of the largest eigenvalue through the integral representation of the confluent hypergeometric function. Sugiura (1973) showed that the asymptotic distribution of the individual eigenvalues is normal for a large sample size. The chi-square approximation for the case in which the population eigenvalues are infinitely dispersed was discussed in Takemura and Sheena (2005) and Kato and Hashiguchi (2014). Approximations for hypergeometric functions have been developed and applied to multivariate distribution theory in Butler and Wood (2002, 2005, 2022). Butler and Wood (2002) provided the Laplace approximation for hypergeometric functions of a single matrix argument. The numerical accuracy of that approximation was demonstrated in the computation of noncentral moments of Wilks' lambda statistic and of the likelihood ratio statistic for testing block independence. The approximation was extended to the case of two matrix arguments in Butler and Wood (2005). All the results addressed above concern eigenvalue distributions of a non-singular Wishart matrix.
Recently, the distribution theory of eigenvalues for the non-singular case has been extended to the singular case; see Shimizu and Hashiguchi (2021, 2022) and Shinozaki et al. (2022). Shimizu and Hashiguchi (2021) showed that the exact distribution of the largest eigenvalue in the singular case is represented in terms of the confluent hypergeometric function, as in the non-singular case. A generalized representation for the non-singular and singular cases under the elliptical model was provided by Shinozaki et al. (2022).
The rest of this paper is organized as follows. In Section 2, we apply the Laplace approximation introduced by Butler and Wood (2005) to the joint density of eigenvalues of a singular Wishart matrix. Furthermore, we show that the distribution of the individual eigenvalues can be approximated by a chi-square distribution with varying degrees of freedom when the population covariance matrix has spiked eigenvalues. Section 3 discusses testing the equality of individual eigenvalues in two populations. Finally, in Section 4, we evaluate the precision of the chi-square approximation by comparing it with the empirical distribution through Monte Carlo simulation.

Approximate distributions of eigenvalues of a singular Wishart matrix
Suppose that an m × n real Gaussian random matrix X is distributed as X ∼ N_{m×n}(O, Σ ⊗ I_n), where O is the m × n zero matrix, Σ is an m × m positive definite symmetric matrix, and ⊗ is the Kronecker product. This means that the column vectors of X are independently and identically distributed (i.i.d.) as N_m(0, Σ) with sample size n, where 0 is the m-dimensional zero vector. The eigenvalues of Σ are denoted by λ_1, λ_2, …, λ_m, where λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_m > 0. Subsequently, we define the singular Wishart matrix as W = XX^⊤, where m > n; its distribution is denoted by W(n, Σ). The spectral decomposition of W is represented as W = H_1 L_1 H_1^⊤, where L_1 = diag(ℓ_1, …, ℓ_n) with ℓ_1 > ℓ_2 > ⋯ > ℓ_n > 0 and H_1 is an m × n matrix with orthonormal columns. The set of all m × n matrices H_1 with orthonormal columns is called the Stiefel manifold, denoted by V_{n,m} = {H_1 | H_1^⊤ H_1 = I_n}. For the definition of the exterior product (H_1^⊤ dH_1), see page 63 of Muirhead (1982). If m = n, the Stiefel manifold V_{m,m} coincides with the orthogonal group O(m). Uhlig (1994) gave the density of W as

f(W) = {2^{mn/2} Γ_n(n/2) (det Σ)^{n/2}}^{−1} π^{(−mn+n²)/2} (det L_1)^{(n−m−1)/2} etr(−Σ^{−1}W/2),

where Γ_m(a) = π^{m(m−1)/4} ∏_{i=1}^m Γ{a − (i−1)/2} and etr(·) = exp(tr(·)). Srivastava (2003) represented the joint density of the eigenvalues ℓ_1 > ⋯ > ℓ_n > 0 of W in a form that includes an integral over the Stiefel manifold:

f(ℓ_1, …, ℓ_n) = c_{m,n} (det Σ)^{−n/2} ∏_{i=1}^n ℓ_i^{(m−n−1)/2} ∏_{i<j}^n (ℓ_i − ℓ_j) ∫_{V_{n,m}} etr(−(1/2) Σ^{−1} H_1 L_1 H_1^⊤) (dH_1),   (1)

where ∫_{V_{n,m}} (dH_1) = Vol(V_{n,m}) and c_{m,n} is the normalizing constant. The above integral over the Stiefel manifold was evaluated by Shimizu and Hashiguchi (2021) in terms of the hypergeometric functions of matrix arguments. We approximate (1) by the Laplace approximation for the hypergeometric functions of two matrix arguments provided in Butler and Wood (2005).
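As a quick numerical companion to this setup (the parameter values are ours, chosen for illustration), the following sketch draws X with i.i.d. N_m(0, Σ) columns, forms W = XX^⊤, and confirms that W is singular with exactly n nonzero eigenvalues:

```python
import numpy as np

# Sketch of the singular Wishart setup above; m, n, and Sigma are our choices.
rng = np.random.default_rng(0)
m, n = 8, 3
lam = np.array([100.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])  # eigenvalues of Sigma
Sigma = np.diag(lam)

X = rng.multivariate_normal(np.zeros(m), Sigma, size=n).T  # m x n, columns ~ N_m(0, Sigma)
W = X @ X.T                                                # singular Wishart matrix, W(n, Sigma)

eig = np.sort(np.linalg.eigvalsh(W))[::-1]
rank = np.linalg.matrix_rank(W)
print(rank)                      # n = 3: W has rank n, not m
print(eig[:n])                   # the n nonzero eigenvalues l_1 > l_2 > l_3 > 0
print(np.max(np.abs(eig[n:])))   # remaining m - n eigenvalues vanish (up to rounding)
```

Because m > n, only the n × n diagonal block L_1 of L carries the randomness, which is what makes the Stiefel-manifold integral in (1) n-dimensional in spirit rather than m-dimensional.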
For a positive integer k, let κ = (κ_1, κ_2, …, κ_m) denote a partition of k with κ_1 ≥ κ_2 ≥ ⋯ ≥ κ_m ≥ 0 and κ_1 + κ_2 + ⋯ + κ_m = k. The set of all partitions of k with length less than or equal to m is denoted by P_k^m. The generalized Pochhammer symbol is defined as (α)_κ = ∏_{i=1}^m (α − (i−1)/2)_{κ_i}, where (α)_k = α(α+1)⋯(α+k−1) and (α)_0 = 1. For integers p, q ≥ 0 and m × m real symmetric matrices A and B, we define the hypergeometric function of two matrix arguments as

pF_q^{(m)}(α; β; A, B) = Σ_{k=0}^∞ Σ_{κ∈P_k^m} {(α_1)_κ ⋯ (α_p)_κ}/{(β_1)_κ ⋯ (β_q)_κ} · {C_κ(A) C_κ(B)}/{k! C_κ(I_m)},   (2)

where α = (α_1, …, α_p)^⊤, β = (β_1, …, β_q)^⊤, and C_κ(A) is the zonal polynomial indexed by κ with the symmetric matrix A; see the details given in Chapter 7 of Muirhead (1982). The hypergeometric function with a single matrix argument is defined as

pF_q(α; β; A) = Σ_{k=0}^∞ Σ_{κ∈P_k^m} {(α_1)_κ ⋯ (α_p)_κ}/{(β_1)_κ ⋯ (β_q)_κ} · C_κ(A)/k!.   (3)

The special cases 1F1 and 2F1 are called the confluent and Gauss hypergeometric functions, respectively. Butler and Wood (2002) proposed Laplace approximations of 1F1 and 2F1 through their integral representations and showed that the accuracy of these approximations is greater than that of previous results. This approximation was extended to the complex case in Butler and Wood (2022). An important property of (2) is the integral representation over the orthogonal group

pF_q^{(m)}(α; β; A, B) = ∫_{O(m)} pF_q(α; β; AHBH^⊤) (dH),   (4)

where (dH) is the normalized invariant measure on the m × m orthogonal group O(m). The integral representation (4) is a useful tool for obtaining approximations of pF_q^{(m)}. Asymptotic expansions of 0F_0^{(m)} were given by Anderson (1965) when the eigenvalues of both positive definite matrix arguments are widely spaced. Constantine and Muirhead (1976) gave the asymptotic behavior of 0F_0^{(m)} when the population eigenvalues are multiple. From the integral representation (4), Butler and Wood (2005) provided Laplace approximations for pF_q^{(m)}.
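The integral representation (4) can be illustrated numerically for p = q = 0, where 0F0(C) = etr(C). The sketch below (the matrices A and B are arbitrary choices of ours) estimates 0F0^{(m)}(A, B) by averaging etr(AHBH^⊤) over Haar-distributed orthogonal H, and checks the exactly computable case B = I_m:

```python
import numpy as np
from scipy.stats import ortho_group

# Monte Carlo illustration of the integral representation (4) with p = q = 0:
#   0F0^{(m)}(A, B) = average of etr(A H B H^T) over Haar-random H in O(m).
# Averaging gives a noisy estimate; this is a sketch, not a production method.
m = 4
A = np.diag([0.5, 0.3, 0.2, 0.1])
B = np.diag([1.0, 0.8, 0.2, 0.0])

H = ortho_group.rvs(dim=m, size=4000, random_state=1)  # Haar-distributed samples
est = np.mean([np.exp(np.trace(A @ h @ B @ h.T)) for h in H])
print(est)  # Monte Carlo estimate of 0F0^{(4)}(A, B)

# Sanity check: with B = I_m the integrand is constant in H, so the average
# must equal 0F0^{(m)}(A, I_m) = etr(A) up to rounding error.
est_id = np.mean([np.exp(np.trace(A @ h @ h.T)) for h in H])
print(abs(est_id - np.exp(np.trace(A))) < 1e-10)
```

For nontrivial B the zonal-polynomial series (2) converges slowly, which is exactly the motivation given above for Laplace approximations.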

Lemma 1. Let the two diagonal matrices be A = diag(a_1, …, a_m) and B = diag(b_1, …, b_m), whose distinct eigenvalues have multiplicities m_1, …, m_r. Then the Laplace approximation of 0F0^{(m)}(A, B) of Butler and Wood (2005) holds, where s = Σ_{i=1}^{r−1} Σ_{j=i+1}^{r} m_i m_j and the Hessian J is defined in Butler and Wood (2005).

Shimizu and Hashiguchi (2021) showed the relationship (5) for an m × m matrix B = (B_1 O; O O), where B_1 is an n × n symmetric matrix and O is the zero matrix. From (5), the joint density (1) can be rewritten as (6), where L = diag(ℓ_1, …, ℓ_n, 0, …, 0) is an m × m matrix and the symbol "∝" means that the constant required for scaling is omitted. Applying Laplace's method to this joint density, we obtain the approximation for the joint density of eigenvalues.
Proposition 1. The joint density of the eigenvalues of a singular Wishart matrix obtained by the Laplace approximation is expressed as (7).

Proof. Applying Lemma 1 to the hypergeometric function in (6), the integral over the Stiefel manifold in (1) is approximated as in (8). Substituting (8) into (1), we have the desired result.
In order to derive the approximate distributions of the individual eigenvalues, we define the spiked covariance model through a parameter ρ_k whose smallness implies that the first k eigenvalues of Σ > 0 are infinitely dispersed, namely ρ_k → 0 (9) for k = 1, …, n. Under condition (9), Takemura and Sheena (2005) proved that the distribution of the individual eigenvalues of a non-singular Wishart matrix is approximated by a chi-square distribution. An improvement of that approximation, for the case in which condition (9) cannot be assumed, was discussed in Tsukada and Sugiyama (2021). The following lemma was provided by Takemura and Sheena (2005) and Nasuda et al. (2022) in the non-singular case and can easily be extended to the singular case.
Lemma 2. Let W ∼ W(n, Σ), where m > n, and let ℓ_1, ℓ_2, …, ℓ_n be the eigenvalues of W. If ρ_n → 0, the stated convergence holds in the sense that ∀ε > 0, ∃δ > 0 such that the corresponding probability bound is satisfied. From Proposition 1 and Lemma 2, we obtain the chi-square approximation that is the main result of this paper.
Theorem 1. Let W ∼ W_m(n, Σ), where m > n, and let ℓ_1, ℓ_2, …, ℓ_n be the eigenvalues of W. If ρ_n → 0, it holds that

ℓ_i/λ_i →d χ²_{n−i+1}, i = 1, …, n,

where χ²_{n−i+1} is a chi-square distribution with n − i + 1 degrees of freedom and the symbol "→d" means convergence in distribution.
Proof. First, we rewrite the approximate joint density (7) in terms of the ratios ℓ_i/λ_i. From Lemma 2, as ρ_n → 0 the joint density (7) can be written as ∏_{i=1}^n g_{n−i+1}(ℓ_i/λ_i), where g_{n−i+1}(·) is the density function of the chi-square distribution with n − i + 1 degrees of freedom.
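A small experiment (our own, with an artificially strongly dispersed spectrum, not the paper's simulation) makes Theorem 1 concrete: when consecutive population eigenvalues differ by large factors, the empirical means of ℓ_i/λ_i settle near the chi-square means n − i + 1:

```python
import numpy as np

# Numerical sketch of Theorem 1: with widely dispersed population eigenvalues,
# l_i / lambda_i should be close in distribution to chi-square(n - i + 1).
rng = np.random.default_rng(2)
m, n, reps = 10, 4, 20000
lam = 1000.0 ** np.arange(m, 0, -1)   # lambda_1 >> lambda_2 >> ... >> lambda_m

ratios = np.empty((reps, n))
for r in range(reps):
    X = rng.standard_normal((m, n)) * np.sqrt(lam)[:, None]  # columns ~ N_m(0, diag(lam))
    # nonzero eigenvalues of W = X X^T equal the eigenvalues of X^T X (n x n)
    ell = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
    ratios[r] = ell / lam[:n]

# A chi-square with k degrees of freedom has mean k, so the empirical means
# of l_i / lambda_i should be near n - i + 1 = 4, 3, 2, 1.
print(np.round(ratios.mean(axis=0), 2))
```

Note the degrees of freedom decrease with the index i, in contrast to the fixed degree of freedom n in the HDLSS results discussed below.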
Remark 1. Note that if only the first k population eigenvalues are infinitely dispersed, that is, if ρ_k → 0, then the distribution of ℓ_k/λ_k can be approximated by g_{n−k+1} in the same way as in the proof of Theorem 1.
In the context of the High Dimension, Low Sample Size (HDLSS) setting, the asymptotic behavior of the eigenvalues of a sample covariance matrix was discussed in Ahn et al. (2007), Jung and Marron (2009), and Bolivar-Cime and Perez-Abreu (2014). Jung and Marron (2009) showed that the spiked sample eigenvalues are approximated by the chi-square distribution with n degrees of freedom. In contrast, Theorem 1 gives the approximation of the distribution of ℓ_k by a chi-square distribution with varying degrees of freedom.

Application to test for equality of the individual eigenvalues
This section discusses testing the equality of individual eigenvalues of the covariance matrices of two populations. For this testing problem, we give the approximate distribution of the test statistic based on the results derived in the previous section.
Let an m × n_i Gaussian random matrix X^{(i)} be distributed as X^{(i)} ∼ N_{m×n_i}(O, Σ^{(i)} ⊗ I_{n_i}), where Σ^{(i)} > 0 and i = 1, 2. The eigenvalues of Σ^{(i)} are denoted by λ_1^{(i)} ≥ λ_2^{(i)} ≥ ⋯ ≥ λ_m^{(i)} > 0. For fixed k, we consider the test of the equality of the individual eigenvalues in the two populations:

H_0: λ_k^{(1)} = λ_k^{(2)}  vs.  H_1: λ_k^{(1)} ≠ λ_k^{(2)}.   (10)

Sugiyama and Ushizawa (1998) reduced (10) to an equality-of-variance test for the principal components and proposed a testing procedure using the Ansari-Bradley test. Takeda (2001) proposed the test statistic ℓ_k^{(1)}/ℓ_k^{(2)} with n ≥ m for (10) and derived the exact distribution of ℓ_1^{(1)}/ℓ_1^{(2)}. Since Johnstone (2001) indicated that the first few eigenvalues are very large compared to the others in the large-dimensional setting, it is essential to understand how the distribution of the first few eigenvalues is constructed. We provide the exact density function of ℓ_1^{(1)}/ℓ_1^{(2)} with n < m in the same way as Takeda (2001).
As the dimension increases, the numerical computation of (11) becomes difficult due to its high computational complexity. From Theorem 1, we therefore give an approximate distribution for (11) in terms of the F distribution.
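The following is a hedged sketch of how such an F-approximation can be used in practice. It assumes, per Theorem 1, that ℓ_k^{(i)}/λ_k^{(i)} is approximately χ²_{n_i−k+1} and that the two samples are independent; the function name and the exact form of the statistic are our reconstruction, not necessarily the statistic of Corollary 1, whose equation is not shown here:

```python
import numpy as np
from scipy.stats import f as f_dist

def eigenvalue_ratio_test(ell1, ell2, n1, n2, k):
    """Two-sided p-value for H0: lambda_k^(1) = lambda_k^(2).

    Sketch: under H0, if ell_k^(i)/lambda_k^(i) ~ chi2(n_i - k + 1)
    approximately and independently, the mean-scaled ratio below is
    approximately F(n1 - k + 1, n2 - k + 1).
    """
    d1, d2 = n1 - k + 1, n2 - k + 1
    stat = (ell1 / d1) / (ell2 / d2)           # scale each chi-square by its mean
    p = 2 * min(f_dist.cdf(stat, d1, d2), f_dist.sf(stat, d1, d2))
    return stat, p

# Hypothetical observed largest eigenvalues (k = 1) from two samples.
stat, p = eigenvalue_ratio_test(ell1=52.0, ell2=48.0, n1=10, n2=10, k=1)
print(round(stat, 3))  # 1.083
print(0.0 < p <= 1.0)  # True
```

With n_1 = n_2 and k = 1 the scaling cancels and the statistic reduces to the raw ratio ℓ_1^{(1)}/ℓ_1^{(2)}, matching the form of Takeda's statistic.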

Simulation study
We investigate the accuracy of the approximation for the derived distributions. In the simulation study, we consider the following population covariance matrix, where a, b > 0. In the large-dimensional setting, the accuracy of the approximate distributions has mainly been investigated for the largest and second eigenvalues; see Sugiyama et al. (2013). We set (a, b) = (200, 3) as Case 1 and (a, b) = (50, 3) as Case 2; both cases imply that the population covariance matrix has two spiked eigenvalues, and the parameter ρ_k in (9) is smaller in Case 1 than in Case 2. We denote by F_1(x) and F_2(x) the chi-square distributions with n and n − 1 degrees of freedom, which are the approximate distributions of the largest and second eigenvalues, respectively. The empirical distribution based on 10^6 Monte Carlo simulations is denoted by F_sim.

Tables 1 and 2 show the α-percentile points of the distributions of ℓ_1 and ℓ_2 for m = 50 and n = 10, respectively. From the simulation study, we see that sufficient accuracy for the largest eigenvalue is already obtained in Case 2, while Case 1 is more accurate than Case 2 for the second eigenvalue. The desired accuracy is thus achieved when the parameter ρ_k is small. Tables 3 and 4 present the chi-square probabilities for Case 1 at the upper percentile points of the empirical distribution. In this part of the study, we set m = 20, 30, 40, 100 and n = 5, 15. We observe that all probabilities are close to α.

Finally, we compare the density of the F distribution in Corollary 1 with the empirical distribution. In Fig. 1, we superimpose the density of the F approximation (solid line) on the histogram of ℓ_1^{(1)}/ℓ_1^{(2)} for n_i = 10 (i = 1, 2) and m = 30 in Case 2, where the histogram shows the empirical distribution of ℓ_1^{(1)}/ℓ_1^{(2)} based on 10^6 iterations. From the 95% points of F_sim, we can confirm that the approximate probability is 0.950.

Fig. 1: n_i = 10 (i = 1, 2) and m = 30

Table 3: Approximate probabilities of ℓ_1 based on the empirical percentile points (Case 1)

Table 4: Approximate probabilities of ℓ_2 based on the empirical percentile points (Case 1)
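The percentile comparison for the largest eigenvalue can be reproduced in miniature as follows. We scale the paper's 10^6 replications down for speed, and we take Σ = diag(a, b, 1, …, 1) as a plausible spiked form for Case 1, since the exact Σ used above is not reproduced here:

```python
import numpy as np
from scipy.stats import chi2

# Rough re-creation of the percentile comparison for l_1 in Case 1
# ((a, b) = (200, 3)); parameters scaled down from the paper's study.
rng = np.random.default_rng(3)
m, n, reps, alpha = 50, 10, 20000, 0.95
a, b = 200.0, 3.0
lam = np.r_[a, b, np.ones(m - 2)]   # assumed spiked form of Sigma's spectrum

l1 = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((m, n)) * np.sqrt(lam)[:, None]
    l1[r] = np.linalg.eigvalsh(X.T @ X)[-1]   # largest eigenvalue of W = X X^T

emp = np.quantile(l1 / a, alpha)     # empirical 95th percentile of l_1 / lambda_1
approx = chi2.ppf(alpha, df=n)       # chi-square approximation F_1
print(emp, approx)                   # the two percentiles should be close
```

The larger the spike a (the smaller ρ_1), the closer the empirical percentile gets to the chi-square value, mirroring the Case 1 versus Case 2 comparison in the tables.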