Abstract
We prove the local Marchenko–Pastur law for sparse sample covariance matrices corresponding to rectangular observation matrices of order with (where ) and sparsity probability (where ). The bounds on the distance between the empirical spectral distribution function of the sparse sample covariance matrices and the Marchenko–Pastur distribution function, obtained in the complex domain with (where ), are of order and the domain bounds do not depend on while .
MSC:
60F99; 60B20
1. Introduction
Random matrix theory (RMT) dates back to the work of Wishart in multivariate statistics [1], which was devoted to the joint distribution of the entries of sample covariance matrices. The next RMT milestone was the work of Wigner [2] in the middle of the last century, which proposed modelling the Hamiltonian of excited heavy nuclei by a large-dimensional random matrix, thereby replacing the study of the energy levels of nuclei with the study of the distribution of the eigenvalues of a random matrix. Wigner studied the eigenvalues of random Hermitian matrices with centred, independent and identically distributed elements (such matrices were later named Wigner matrices) and proved that the density of the empirical spectral distribution function of the eigenvalues of such matrices converges to the semicircle law as the matrix dimension increases. Later, this convergence was named Wigner’s semicircle law, and Wigner’s results were generalised in various directions.
The breakthrough work of Marchenko and Pastur [3] gave impetus to new progress in the study of sample covariance matrices. Under quite general conditions, they found an explicit form of the limiting density of the expected empirical spectral distribution function of sample covariance matrices. Later, this convergence was named the Marchenko–Pastur law.
Sample covariance matrices are of great practical importance for problems in multivariate statistical analysis, particularly for principal component analysis (PCA). In recent years, many studies have connected RMT with other rapidly developing areas, such as wireless communication theory and deep learning. For example, the spectral density of sample covariance matrices is used in calculations related to multiple input multiple output (MIMO) channel capacity [4]. An important object of study for neural networks is the loss surface. The geometry and critical points of this surface can be predicted using the Hessian of the loss function. A number of works devoted to deep networks have suggested applying various RMT models for Hessian approximation, thereby allowing the use of RMT results to reach specific conclusions about the nature of the critical points of the surface.
Another area of application for sample covariance matrices is graph theory. The adjacency matrix of a bipartite graph contains a rectangular block, so the study of its singular values leads to sample covariance matrices. An example of such graphs is the bipartite random graph, whose vertices can be divided into two groups such that vertices within the same group are not connected to each other.
If we assume that the probability of an edge in the graph tends to zero as the number of vertices n increases to infinity, we arrive at the concept of sparse random matrices. The behaviour of the eigenvalues and eigenvectors of a sparse random matrix depends significantly on its sparsity, and results obtained for non-sparse matrices cannot be applied. Sparse sample covariance matrices also have applications in random graph models [5] and deep learning problems [6].
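To make the sparse model concrete, the following minimal Python sketch simulates the eigenvalues of a sparse sample covariance matrix and compares their support with the classical Marchenko–Pastur prediction. The normalisation (unit-variance entries, Bernoulli mask with success probability p rescaled by 1/√p) and all parameter values are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 2000, 1000            # sample size and dimension; aspect ratio y = m/n
p = n ** (-0.3)              # illustrative sparsity probability, p -> 0 as n grows

# Sparse observation matrix: iid standard normal entries times an independent
# Bernoulli mask, rescaled by 1/sqrt(p) so each entry keeps unit variance.
X = rng.standard_normal((m, n)) * rng.binomial(1, p, size=(m, n)) / np.sqrt(p)

W = X @ X.T / n              # m x m sample covariance matrix
eigs = np.linalg.eigvalsh(W)

# Classical Marchenko-Pastur support for aspect ratio y = m/n <= 1.
y = m / n
a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
print("empirical range:", eigs.min(), eigs.max())
print("MP support:     ", a, b)
```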
Sparse Wigner matrices have been considered in a number of papers (see [7,8,9,10]), in which many results have been obtained. By symmetrising sample covariance matrices, it is possible to apply these results when the observation matrices are square. However, when the sample size is greater than the observation dimension, the spectral limit distribution has a singularity at zero, which requires a different approach. The limiting spectral distribution of sparse sample covariance matrices with a sparsity of (where is arbitrarily small) was studied in [11,12]. In particular, a local law was proven under the assumption that the matrix elements satisfy the moment conditions . In this paper, we consider the case of a sparsity of for and assume that the moments of the matrix elements satisfy the conditions and for .
2. Main Results
We let , where . We consider independent and identically distributed zero-mean random variables , and with and , and an independent set of independent Bernoulli random variables , and with . In addition, we suppose that as . In what follows, we omit the index n from when this does not cause confusion.
We consider a sequence of random matrices:
We denote by the singular values of and define the symmetrised empirical spectral distribution function (ESD) of the sample covariance matrix as:
where denotes the indicator of the event A.
We let and be the symmetrised Marchenko–Pastur distribution function with the density:
where and . We assume that for . Denoting the Stieltjes transform of the distribution function by and the Stieltjes transform of the distribution function by , we obtain:
We also put:
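As a numerical reference for the symmetrised Marchenko–Pastur law and its Stieltjes transform, here is a short sketch using the standard convention in which the Marchenko–Pastur law for the eigenvalues of the sample covariance matrix is pushed forward to the signed singular values; this normalisation is our assumption, since conventions differ between papers.

```python
import numpy as np

def sym_mp_density(x, y):
    """Symmetrised Marchenko-Pastur density (standard convention, y = m/n <= 1):
    the MP law for eigenvalues of XX*/n pushed forward to +/- singular values."""
    x = np.asarray(x, dtype=float)
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    x2 = x ** 2
    out = np.zeros_like(x)
    inside = (x2 >= a) & (x2 <= b)
    out[inside] = (np.sqrt((b - x2[inside]) * (x2[inside] - a))
                   / (2 * np.pi * y * np.abs(x[inside])))
    return out

y = 0.5
xs = np.linspace(-2.5, 2.5, 4001)
dx = xs[1] - xs[0]
dens = sym_mp_density(xs, y)

z = 0.3 + 0.05j                      # a point in the upper half-plane
s_z = np.sum(dens / (xs - z)) * dx   # Stieltjes transform: int dens(x)/(x - z) dx
print("total mass ~ 1:", np.sum(dens) * dx)
print("s(z) =", s_z)
```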
In this paper, we prove the so-called local Marchenko–Pastur law for sparse sample covariance matrices. We let:
For a constant , we define the value . We assume a sparsity probability of and that the moments of the matrix elements satisfy the following conditions:
- Condition : for and , we have
- Condition : for , we have
- Condition : a constant exists, such that for all and , we have
We introduce the quantity with a positive constant . We then introduce the region:
For constants and V, we define the region:
Next, we introduce some notation. We let:
We introduce the quantity:
and put:
We state the improved bounds for and put:
Theorem 1.
Assume that conditions – are satisfied. Then, for any , positive constants , and exist such that for :
We also prove the following result.
Theorem 2.
Under the conditions of Theorem 1 and for , positive constants , , and exist such that for :
2.1. Organisation
The paper is organised as follows. In Section 3, we state Theorems 3–5 and several corollaries. In Section 4, delocalisation is considered. In Section 5, we prove the corollaries stated in Section 3. Section 6 is devoted to the proofs of Theorems 3–5. In Section 7, we state and prove some auxiliary results.
2.2. Notation
We use C for large universal constants, which may differ from line to line. and denote the Stieltjes transforms of the symmetrised Marchenko–Pastur distribution and of the spectral distribution function, respectively. denotes the resolvent matrix. We let , , and . We consider the -algebras generated by the elements of (with the exception of the rows from and the columns from ). We write instead of and instead of for brevity. The symbol denotes the matrix from which the rows with numbers in and the columns with numbers in have been deleted. In a similar way, we denote all objects in terms of : the resolvent matrix is , the Stieltjes transform of the ESD is , , etc. The symbol denotes the conditional expectation with respect to the -algebra , and denotes the conditional expectation with respect to the -algebra . We let and .
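To illustrate the minor notation just introduced, the sketch below builds the submatrix obtained by deleting a set of rows and columns from the observation matrix; the helper name `minor` and the index sets are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 5, 8
X = rng.standard_normal((m, n))

def minor(X, rows=(), cols=()):
    """Return X with the given rows and columns deleted, as in the
    minor notation used in the text (illustrative helper)."""
    keep_r = [i for i in range(X.shape[0]) if i not in set(rows)]
    keep_c = [j for j in range(X.shape[1]) if j not in set(cols)]
    return X[np.ix_(keep_r, keep_c)]

Xm = minor(X, rows=[0], cols=[2, 4])
print(X.shape, "->", Xm.shape)   # (5, 8) -> (4, 6)
```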
3. Main Equation and Its Error Term Estimation
Note that is the ESD of the block matrix:
where is a matrix with zero elements.
We let be the resolvent matrix of :
By applying the Schur complement, we obtained:
This implied:
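The block-matrix symmetrisation and its resolvent can be checked numerically. The sketch below (illustrative sizes and scaling) verifies that the eigenvalues of the block matrix are the signed singular values of the observation matrix padded with zeros, and that the normalised trace of the resolvent reproduces the Stieltjes transform of the ESD.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 60
X = rng.standard_normal((m, n)) / np.sqrt(n)   # illustrative scaling

# Block matrix V = [[0, X], [X^T, 0]]: its spectrum is +/- the singular
# values of X together with n - m zeros, i.e., the symmetrised ESD.
V = np.block([[np.zeros((m, m)), X], [X.T, np.zeros((n, n))]])

sv = np.linalg.svd(X, compute_uv=False)
ev = np.sort(np.linalg.eigvalsh(V))
sym = np.sort(np.concatenate([sv, -sv, np.zeros(n - m)]))
print("spectrum deviation:", np.abs(ev - sym).max())   # numerically zero

# Resolvent R(z) = (V - zI)^{-1}; its normalised trace is the Stieltjes
# transform of the ESD of V.
z = 0.5 + 0.1j
R = np.linalg.inv(V - z * np.eye(m + n))
print(np.trace(R) / (m + n), np.mean(1.0 / (ev - z)))  # the two agree
```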
For the diagonal elements of , we could write:
for and:
for . The correction terms for and for were defined as:
and
We let and V be positive constants depending on ; their exact values are specified below. For , we define as:
Remembering that:
and:
We defined:
For any , a constant exists such that:
For example, one could take . In what follows, we assume that and V are chosen so that (6) is satisfied, and we write:
We defined:
where
In this section, we demonstrate the following results.
Theorem 3.
Under the condition , positive constants , and exist such that for :
where
Remark 1.
Theorem 3 is auxiliary. is the perturbation of the main equation for the Stieltjes transform of the limit distribution. The size of is responsible for the stability of the solution of the perturbed equation. We are interested in estimates of that are uniform in the domain and have an order of (such estimates are needed for the proof of the delocalisation in Theorem 6). It is important to know to what extent the estimates depend on both and . The estimates behave differently in the bulk and at the ends of the support of the limit distribution (the functions and introduced above capture this dependence on the real part of the argument: in the bulk or at the ends of the support). For the estimation, there are two regimes: for , we use inequality (10), and for , we use inequality (18).
Corollary 1.
Under the conditions of Theorem 3, the following inequalities hold:
and
Corollary 2.
Under the conditions of Theorem 3 and in the domain:
for any , a constant C depending on Q exists such that:
Moreover, for satisfying and , and for , a constant C depending on Q exists such that:
Corollary 3.
Under the conditions of Theorem 3, for , a constant C depending on Q exists such that:
Theorem 4.
Under the conditions of Theorem 1, for , positive constants and exist such that for :
Moreover, for , positive constants , and exist such that for satisfying and :
where
To prove the main result, we needed to estimate the entries of the resolvent matrix.
Theorem 5.
Under the condition and for and , constants , , , and exist such that for and , we have:
where
Corollary 4.
Under the conditions of Theorem 5, for and , a constant H exists such that for :
4. Delocalisation
In this section, we demonstrate some applications of the main result. We let and be orthogonal matrices from the SVD of the matrix , such that:
where and . Here and in what follows, denotes a matrix with zero entries. The eigenvalues of matrix are denoted by ( for , for and for ). We let be the eigenvector of matrix , corresponding to eigenvalue , where .
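The correspondence between the SVD of the observation matrix and the eigenvectors of the block matrix can be verified directly; the sketch below also illustrates the delocalisation phenomenon that Theorem 6 quantifies (sup-norms of unit eigenvectors are small, of order √(log N/N) up to constants and log powers). Sizes, sparsity, and scaling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 200, 400
p = 0.2
X = rng.standard_normal((m, n)) * rng.binomial(1, p, (m, n)) / np.sqrt(p * n)

V = np.block([[np.zeros((m, m)), X], [X.T, np.zeros((n, n))]])

# If X = U diag(s) W^T, then (U[:, k], +/- W[:, k]) / sqrt(2) are unit
# eigenvectors of V with eigenvalues +/- s_k.
U, s, Wt = np.linalg.svd(X)
k = 0
u_plus = np.concatenate([U[:, k], Wt[k, :]]) / np.sqrt(2)
print("eigen-residual:", np.linalg.norm(V @ u_plus - s[k] * u_plus))

# Delocalisation: no eigenvector carries O(1) mass on a single coordinate.
_, evec = np.linalg.eigh(V)
print("max |entry|:", np.abs(evec).max(),
      "  sqrt(log N / N):", np.sqrt(np.log(m + n) / (m + n)))
```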
We proved the following result.
Theorem 6.
Under conditions –, for , positive constants and exist such that:
Moreover, for , we have:
Proof.
First, we noted that, according to [13] (which is based on [14]) and Theorem 1, exists such that:
Furthermore, by Lemma 11, we obtained:
where
We noted that:
and
These implied that:
We chose . Then, by Corollary 4, we obtained:
We obtained the bounds for in a similar way. Thus, the theorem was proven. □
5. Proof of the Corollaries
5.1. Proof of Corollary 4
Proof.
We could write:
Combining this inequality with , we found that:
By applying Theorem 5, we obtained what was required.
Thus, the corollary was proven. □
5.2. Proof of Corollary 2
Proof.
We considered the domain . We noted that for , we obtained:
and
First, we considered the case . This inequality implied that:
From there, it followed that:
Furthermore, for the case , we obtained . We used the inequality:
By Chebyshev’s inequality, we obtained:
By applying Corollary 1, we obtained:
where
First, we noted that for :
Moreover, for :
From there, it followed that:
Furthermore:
Using these estimations, we could show that:
By choosing and , we obtained:
Then, we considered the case . In this case:
By applying the inequality and Corollary 1, we obtained:
It was then simple to show that:
Thus, the first inequality was proven. The proof of the second inequality was similar to the proof of the first. We had to use the inequality:
which was valid on the real line, instead of , which held in the domain . Moreover, we noted that for any z value, we obtained:
Thus, the corollary was proven. □
5.3. Proof of Corollary 3
Proof.
According to Theorem 4:
We noted that for :
Furthermore:
We split the interval into subintervals by , such that for :
We noted that the event implied the event . From there, for , , we obtained:
□
6. Proof of the Theorems
6.1. Proof of Theorem 1
Proof.
We obtained:
The second term in the RHS of the last inequality was bounded by Corollary 3. For z (such that ), we used the inequality:
the inequality:
and the Markov inequality. We could write:
We recalled that in the case :
In the case and using Corollary 1, we obtained:
First, we considered the case . By our definition of , we obtained:
This inequality completed the proof for .
We then considered . We used inequality and Corollary 1 to obtain:
By choosing a sufficiently large K value, we obtained the proof. Thus, the theorem was proven. □
6.2. Proof of Theorem 2
Proof.
The proof of Theorem 2 was similar to the proof of Theorem 1. We only noted that inequality:
held for all . □
6.3. Proof of Theorem 5
Proof.
Using the definition of the Stieltjes transform, we obtained:
and
It is also well known that for :
and
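Two standard facts about Stieltjes transforms of probability measures fit this step (we state them as assumptions, since the displays themselves are not reproduced here): |s(u + iv)| ≤ 1/v, and v ↦ v·Im s(u + iv) is non-decreasing. A quick numerical check on an empirical measure:

```python
import numpy as np

rng = np.random.default_rng(3)
pts = rng.standard_normal(500)        # atoms of an empirical measure

def s(z):
    """Stieltjes transform of the empirical measure of `pts`."""
    return np.mean(1.0 / (pts - z))

u = 0.7
for v in [0.05, 0.1, 0.5, 1.0]:
    z = u + 1j * v
    # |s(z)| <= 1/v always holds; v * Im s(z) grows with v.
    print(v, abs(s(z)) <= 1 / v, v * s(z).imag)
```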
We considered the following event for and :
We set:
For , and u, we obtained:
We recalled:
Then:
We introduced the events:
It was easy to see that:
In what follows, we used .
Equations (4) and (5) and Lemma 10 yielded that for and for satisfying , the following inequalities held:
and
We noted that for and under appropriate and , we obtained .
We considered the off-diagonal elements of the resolvent matrix. It could be shown that for :
for :
and
where
Inequalities (21) and (22) implied that:
for and and that:
for and . Equations (23)–(25) produced:
for and:
for . Similarly, we obtained:
and
We noted that for , we obtained:
Using Rosenthal’s inequality, we found that:
for and that:
for . We noted that:
Using Chebyshev’s inequality, we obtained:
By applying the triangle inequality to the results of Lemmas 1–3 (which describe properties of the entries of the resolvent matrix), we arrived at the inequality:
Setting , and , and taking into account that and , we obtained:
Moreover, the constant c could be made arbitrarily large. We could obtain similar estimates for the quantities of , , , , . Inequalities (27) and (28) implied:
The last inequalities produced:
We noted that for . So, by choosing c large enough, we obtained:
This completed the proof of the theorem. □
6.4. Proof of Theorem 3
Proof.
First, we noted that for , a constant exists, such that:
Without loss of generality, we could assume that . We recalled that:
Then:
We considered the smoothing of the indicator :
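One standard choice of such a smoothing is a C¹ ramp that equals the indicator away from a small neighbourhood of the endpoints; the construction below is a generic illustration, not necessarily the exact smoothing used in the paper.

```python
import numpy as np

def smooth_indicator(x, a, b, eps):
    """C^1 smoothing of the indicator of [a, b]: equals 1 on [a + eps, b - eps],
    0 outside [a, b], with cubic ramps in between (generic illustrative choice)."""
    x = np.asarray(x, dtype=float)
    ramp = lambda t: 3 * t ** 2 - 2 * t ** 3        # C^1 ramp on [0, 1]
    up = ramp(np.clip((x - a) / eps, 0.0, 1.0))
    down = ramp(np.clip((b - x) / eps, 0.0, 1.0))
    return up * down

xs = np.linspace(-0.5, 1.5, 9)
print(smooth_indicator(xs, 0.0, 1.0, 0.1))
```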
We noted that:
where, as before:
We estimated the value:
It was easy to see that:
To estimate , we used the approach developed in [15], which refers back to Stein’s method. We let:
We set:
Then, we could write:
The equality:
implied that a constant C exists that depends on in the definition of , such that:
We considered:
Then:
By the definition of , we could rewrite the last inequality as:
We set:
where
We obtained:
and this yielded:
Then, we used:
Further, we considered:
We noted that, by Jensen's inequality, for :
We represented in the form:
where
Since , we found:
From there, it was easy to obtain:
6.4.1. Estimation of
Using the representation of , we could write:
where
By Hölder’s inequality:
Further:
We obtained:
In the case , we obtained:
This implied that:
Furthermore, in the case and , we obtained:
This implied that:
For , we could write:
Using this, we concluded that:
By applying Lemmas 2 and 3, we obtained:
Hölder’s inequality and (35) produced:
6.4.2. Estimation of
We noted that:
Using Hölder’s inequality and Cauchy’s inequality, we obtained:
By applying Lemmas 2, 3 and 5, we obtained:
6.4.3. Estimation of
Using Taylor’s formula, we obtained:
where is uniformly distributed on the interval and the random variables are independent of each other. Since yields , we found that:
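For reference, the probabilistic form of Taylor's integral remainder that this step presumably relies on (a general statement under our reading, with θ uniform on [0, 1]):

```latex
f(x+h)=\sum_{j=0}^{k-1}\frac{f^{(j)}(x)}{j!}\,h^{j}
      +\frac{h^{k}}{(k-1)!}\,\mathbb{E}_{\theta}\!\left[(1-\theta)^{k-1}f^{(k)}(x+\theta h)\right],
\qquad \theta\sim U[0,1],
```

which is the usual integral form of the remainder rewritten as an expectation over θ.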
Taking into account the inequality:
we obtained:
By applying Hölder’s inequality, we obtained:
Jensen’s inequality produced:
To estimate , we had to obtain the bounds for:
Using Cauchy’s inequality, we obtained:
where
6.4.4. Estimation of
Lemma 2 produced:
and, in turn, Lemma 3 produced:
By summing the obtained estimates, we arrived at the following inequality:
6.4.5. Estimation of
We considered . Since and , we obtained:
Further, we noted that:
Then:
We obtained:
Then, we returned to the estimation of . Equality (41) implied:
We could rewrite this as:
First, we found that:
and
We noted that:
It was straightforward to see that:
This bound implied that:
Further, since:
we could write:
By combining the estimates that were obtained for , we concluded that:
We noted that:
Then, Inequality (42) yielded:
We rewrote this as:
where
6.4.6. Estimation of
We recalled that:
By applying:
we obtained:
The last inequality produced:
We put:
and
By applying Lemma 5, we obtained:
Finally, using Lemma 6, we obtained:
Using:
we could write:
6.5. Proof of Theorem 4
Proof.
We considered the case , where
For z, we obtained:
This implied that a constant exists, depending on , such that:
First, we considered the case . Without loss of generality, we assumed that , where is the constant in the definition of . This meant that . Furthermore:
and
Using Theorem 3, we obtained:
We let:
The analysis of for .
- The bound of . By the definition of and , we obtained:
- The bound of . By the definition of , we obtained:
For this, we used .
- The bound for . By the definition of , we obtained:
- The bound of . Simple calculations showed that:
- The bound of . We noted that:
From there and from the definition of , it followed that:
- The bound of . Simple calculations showed that:
We defined:
By combining all of these estimations and using:
we obtained:
For (such that ), we could write:
Then, we considered . In this case, we used the inequality:
In what follows, we assumed that .
The bound of for .
- By the definition of , we obtained:
We could obtain from this that, for sufficiently small values:
- We noted that . This immediately implied that:
- We noted that for , we obtained:
and
From there, it followed that:
- Simple calculations showed that:
- Simple calculations showed that:
- It was straightforward to check that:
By applying the Markov inequality for , we obtained:
On the other hand, when , we used the inequality:
By applying the Markov inequality, we obtained:
This implied that:
We noted that for and that for :
On the other hand:
We chose , such that:
It was enough to put . We let . For , we defined:
and . We noted that and that:
We started with . We noted that:
This implied that:
From there, it followed that:
By repeating this procedure and using the union bound, we obtained the proof.
Thus, Theorem 4 was proven. □
7. Auxiliary Lemmas
Lemma 1.
Under the conditions of the theorem, for and , we have:
Proof.
For simplicity, we only considered the case and . We noted that:
By applying Schur’s formula, we obtained:
The second inequality was proven in a similar way. □
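Schur's formula used in the proof above is the block-inversion identity: for a block matrix M = [[A, B], [C, D]] with M and D invertible, the top-left block of M⁻¹ equals (A − B D⁻¹ C)⁻¹. A quick numerical sanity check (illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(4)
k, l = 3, 5
M = rng.standard_normal((k + l, k + l)) + (k + l) * np.eye(k + l)  # well conditioned

A, B = M[:k, :k], M[:k, k:]
C, D = M[k:, :k], M[k:, k:]

# Block-inversion (Schur complement) identity.
Minv = np.linalg.inv(M)
schur = np.linalg.inv(A - B @ np.linalg.inv(D) @ C)
print(np.allclose(Minv[:k, :k], schur))   # True
```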
Lemma 2.
Under the conditions of Theorem 5, for all , the following inequalities are valid:
and
In addition, for , we have:
and for , we have:
Proof.
For simplicity, we only considered the case and . The first two inequalities were obvious. We only considered . By applying Rosenthal’s inequality, for , we obtained:
We recalled that:
and under the conditions of the theorem:
By substituting the last inequality into Inequality (44), we obtained:
The second inequality could be proven similarly. □
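For reference, one standard sharp-constant form of Rosenthal's inequality used in arguments of this kind (for independent, mean-zero random variables ξ₁, …, ξₖ and p ≥ 2, with an absolute constant C; stated here as background, not as the paper's exact display):

```latex
\mathbb{E}\Biggl|\sum_{i=1}^{k}\xi_i\Biggr|^{p}
\le C^{p}\Biggl(p^{p/2}\Bigl(\sum_{i=1}^{k}\mathbb{E}\xi_i^{2}\Bigr)^{p/2}
               + p^{p}\sum_{i=1}^{k}\mathbb{E}\lvert\xi_i\rvert^{p}\Biggr).
```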
Lemma 3.
Under the conditions of the theorem, for all , the following inequalities are valid:
and
In addition, for , we have:
and for , we have:
Proof.
It sufficed to apply the inequality from Corollary 1 of [16]. □
We recalled the notation:
Lemma 4.
Under the conditions of the theorem, the following bounds are valid:
and
Proof.
We considered the equality:
It implied that:
Further, we noted that for a sufficiently small value, a constant H existed, such that:
Hence:
It was easy to see that:
We introduced the events:
It was obvious that:
Consequently:
Further, we considered . We obtained:
Then, it followed that:
Next, the following inequality held:
Under the condition and the inequality , we obtained the bounds:
By applying Lemmas 2 and 3, for the first term on the right side of (48), we obtained:
This completed the proof of Inequality (45).
Furthermore, by using representation (47), we obtained:
By applying Lemmas 2 and 3, we obtained:
By applying Young’s inequality, we obtained the required bound. Thus, the lemma was proven. □
Lemma 5.
Under the conditions of the theorem, we have:
Proof.
We set . Using Schur’s complement formula:
Since was measurable with respect to , we could write:
We introduced the notation:
In this notation:
We noted that:
Since:
Theorem 5 produced:
Similarly, for the moment of , we obtained the following estimate:
From the above estimates and Lemma 4, we concluded that:
Thus, the lemma was proven. □
Lemma 6.
Under the conditions of the theorem, for , we have:
Proof.
We used the representation:
We noted that by using Rosenthal’s inequality:
Similarly, for the second moment of , we obtained the following estimate:
From the estimates above and Lemma 4, we concluded that:
Lemma 7.
For , the following inequality holds:
Proof.
We noted that:
and
It was easy to show that for :
Indeed:
The last expression was non-positive for . From this non-positivity of the real part, it followed that:
This implied the required proof. Thus, the lemma was proven. □
Lemma 8.
There is an absolute constant , such that for :
and for satisfying and , the following inequality is valid:
Proof.
We changed the variables by setting:
and
In this notation, we could rewrite the main equation in the form:
It was easy to see that:
Then, it sufficed to repeat the proof of Lemma B.1 from [17]. We noted that this lemma implied that Inequality (50) held for all w with (and, therefore, for all z) and that Inequality (49) was satisfied for w. From this, we concluded that Inequality (49) held for , such that for a sufficiently small constant .
Thus, the lemma was proven. □
Lemma 9.
For , we have:
and
where
Proof.
First, we noted that:
Using this, we could write:
From there, it followed that:
This implied that:
Equality (51) yielded:
Thus, the lemma was proven. □
Lemma 10.
A positive absolute constant B exists, such that:
and
Proof.
First, we considered . Then, for :
In the case , we obtained:
We then considered the case :
To prove the second inequality, we considered the equality:
Thus, the lemma was proven. □
We let be a rectangular matrix with . We let be the singular values of the matrix . The diagonal matrix with was denoted by . We let be an matrix with zero entries. We put and . We let and be orthogonal (unitary, in the complex case) matrices such that the singular value decomposition held:
Furthermore, we let be the identity matrix and . We introduced the matrices and . We noted that and . We introduced the matrix and considered the matrix . We then obtained the following:
Lemma 11.
Proof.
The proof followed direct calculations. It was straightforward to see that:
Furthermore:
□
8. Conclusions
In this work, we obtained results under the assumption that conditions – are fulfilled. The condition is of a technical nature; by investigating the asymptotic behaviour of the Stieltjes transform in the bulk, this restriction could be eliminated. However, this is a technically cumbersome task that requires separate consideration.
Author Contributions
Writing—original draft, A.N.T. and D.A.T. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
The authors wish to thank F. Götze for several fruitful discussions on this paper.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Wishart, J. The generalised product moment distribution in samples from a normal multivariate population. Biometrika 1928, 20A, 32–52.
- Wigner, E.P. Characteristic vectors of bordered matrices with infinite dimensions. Ann. Math. 1955, 62, 548–564.
- Marchenko, V.A.; Pastur, L.A. Distribution of eigenvalues for some sets of random matrices. Mat. Sb. 1967, 72, 507–536.
- Telatar, E. Capacity of multi-antenna Gaussian channels. Eur. Trans. Telecomm. 1999, 10, 585–595.
- Newman, M. Random graphs as models of networks. In Handbook of Graphs and Networks; Bornholdt, S., Schuster, H.G., Eds.; Wiley-VCH: Hoboken, NJ, USA, 2002; pp. 35–68.
- Granziol, D. Beyond Random Matrix Theory for Deep Networks. arXiv 2021, arXiv:2006.07721v2.
- Erdős, L.; Knowles, A.; Yau, H.-T.; Yin, J. Spectral statistics of Erdős–Rényi graphs I: Local semicircle law. Ann. Probab. 2013, 41, 2279–2375.
- Erdős, L.; Knowles, A.; Yau, H.-T.; Yin, J. Spectral statistics of Erdős–Rényi graphs II: Eigenvalue spacing and the extreme eigenvalues. Comm. Math. Phys. 2012, 314, 587–640.
- Huang, J.; Landon, B.; Yau, H.-T. Bulk universality of sparse random matrices. J. Math. Phys. 2015, 56, 123301.
- Huang, J.; Yau, H.-T. Edge Universality of Sparse Random Matrices. arXiv 2022, arXiv:2206.06580.
- Lee, J.O.; Schnelli, K. Tracy–Widom distribution for the largest eigenvalue of real sample covariance matrices with general population. Ann. Appl. Probab. 2016, 26, 3786–3839.
- Hwang, J.Y.; Lee, J.O.; Schnelli, K. Local law and Tracy–Widom limit for sparse sample covariance matrices. Ann. Appl. Probab. 2019, 29, 3006–3036.
- Götze, F.; Tikhomirov, A.N. On the largest and smallest singular values of sparse rectangular random matrices. Electron. J. Probab. 2021; submitted.
- Rudelson, M.; Vershynin, R. Smallest singular value of a random rectangular matrix. Comm. Pure Appl. Math. 2009, 62, 1707–1739.
- Götze, F.; Naumov, A.A.; Tikhomirov, A.N. On the local semicircular law for Wigner ensembles. Bernoulli 2018, 24, 2358–2400.
- Götze, F.; Naumov, A.A.; Tikhomirov, A.N. Moment inequalities for linear and nonlinear statistics. Theory Probab. Appl. 2020, 65, 1–16.
- Götze, F.; Naumov, A.A.; Tikhomirov, A.N. Local Semicircle Law under Moment Conditions. Part I: The Stieltjes Transform. arXiv 2016, arXiv:1510.07350v4.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).