Abstract
We are interested in an $n \times p$ matrix $X$ whose $n$ rows are strictly stationary $\alpha$-mixing random vectors and whose $p$ columns are independent and identically distributed random vectors; $p = p(n)$ goes to infinity as $n \to \infty$, satisfying the growth condition stated in Theorem 3. We obtain a logarithmic law for $L_n = \max_{1 \le i < j \le p} |\rho_{ij}|$ using the Chen–Stein Poisson approximation method, where $\rho_{ij}$ denotes the sample correlation coefficient between the $i$th column and the $j$th column of $X$.
MSC:
60F15
1. Introduction
Random matrix theory is used in a variety of fields, from physics to various areas of mathematics. Ref. [1] tests the structure of the regression coefficient matrix under the multivariate linear regression model using the LRT statistic. The correlation coefficient matrix is a crucial statistic in multivariate analysis, and its maximum likelihood estimator is the sample correlation matrix. Consider an $n \times p$ matrix $X = (x_{ij})_{n \times p}$ representing $n$ observations from a specific multivariate distribution, with unknown mean, unknown correlation coefficient matrix, and unknown covariance matrix.
This paper shows the logarithmic law for the largest entries of the sample correlation matrix under an $\alpha$-mixing assumption. This study extends the statistical hypothesis-testing problem that [2] analyzed. Ref. [2] considered the statistical test in which the sample size $n$ and the sample dimension $p$ are both large; the null hypothesis is $H_0: R = I_p$, where $R$ is the population correlation matrix and $I_p$ is the identity matrix. The null hypothesis of [2] postulates that the components of the population are uncorrelated and, in the normal case, jointly follow a $p$-variate normal distribution. Ref. [2]’s test statistic is
$$L_n = \max_{1 \le i < j \le p} |\rho_{ij}|,$$
where
$$\rho_{ij} = \frac{\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^{n}(x_{kj} - \bar{x}_j)^2}}, \qquad \bar{x}_i = \frac{1}{n}\sum_{k=1}^{n} x_{ki},$$
is the Pearson correlation coefficient between the $i$th column and the $j$th column of $X$. Then, $\Gamma_n = (\rho_{ij})_{p \times p}$ is the sample correlation matrix generated by $X$.
Let $x_i = (x_{1i}, \dots, x_{ni})'$ denote the $i$th column of $X$; let $y_i = x_i - \bar{x}_i \mathbf{1}$, $1 \le i \le p$, where $\mathbf{1} = (1, \dots, 1)' \in \mathbb{R}^n$. We have
$$\rho_{ij} = \frac{\langle y_i, y_j \rangle}{\|y_i\|\,\|y_j\|},$$
where $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ represent the inner product and the Euclidean norm in $\mathbb{R}^n$.
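As a concrete illustration (the sizes and the standard normal entries below are illustrative choices, not prescribed by the paper), the statistic $L_n$ can be computed directly from the centered columns:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))  # illustrative i.i.d. N(0, 1) entries

# Center each column; then rho_ij = <y_i, y_j> / (||y_i|| * ||y_j||).
Y = X - X.mean(axis=0)
norms = np.linalg.norm(Y, axis=0)
R = (Y.T @ Y) / np.outer(norms, norms)

# L_n is the largest off-diagonal entry of R in absolute value.
L_n = float(np.abs(R[~np.eye(p, dtype=bool)]).max())
print(round(L_n, 4))
```

The matrix `R` built this way agrees with NumPy's built-in `np.corrcoef(X, rowvar=False)`.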
Ref. [2] proved the following theorems for the test statistic $L_n$ when the entries of $X$ are i.i.d. random variables: the $n$ rows are observations from a certain multivariate distribution, and each of the $p$ columns consists of $n$ observations of one variable of the population distribution.
Theorem 1.
Suppose that the $x_{ij}$ are i.i.d. random variables satisfying the original moment condition, and let $X$ be the $n \times p$ matrix $(x_{ij})$. If $p/n \to \gamma \in (0, \infty)$, then
$$\lim_{n \to \infty} \sqrt{\frac{n}{\log n}}\, L_n = 2 \quad \text{a.s.}$$
Theorem 2.
Suppose that the $x_{ij}$ are i.i.d. random variables with $E|x_{11}|^{30+\epsilon} < \infty$ for some $\epsilon > 0$, and let $X$ be the $n \times p$ matrix $(x_{ij})$. If $p/n \to \gamma \in (0, \infty)$, then, for any $y \in \mathbb{R}$,
$$\lim_{n \to \infty} P\left(n L_n^2 - 4\log n + \log\log n \le y\right) = F(y) := \exp\left\{-\frac{\gamma^2}{\sqrt{8\pi}}\, e^{-y/2}\right\}.$$
Subsequently, many scholars have considered the limiting properties of the largest entries of sample correlation matrices under weaker moment conditions. Ref. [3] showed that the moment condition in Theorem 2 can be weakened. Ref. [4] showed that Theorem 2 holds under a further relaxed moment condition. Ref. [5] improved the moment condition and obtained strong limit theorems for $L_n$. Refs. [6,7] established the corresponding results, with $p/n$ bounded away from zero and infinity, under more relaxed moment assumptions. Meanwhile, when $p$ tends to infinity much faster than $n$, some scholars have also pursued the limiting properties of the largest entries of sample correlation matrices, focusing on the relationship between the sample dimension $p$ and the sample size $n$. Ref. [8] obtained the limit theorem in the ultra-high-dimensional setting without the Gaussian assumption. Most of this work is based on the assumption of sample independence; such an assumption is often reasonable, but it is difficult to verify in practice. It is therefore necessary to study the largest entries of sample correlation matrices under mixing assumptions. Ref. [9] showed the asymptotic distribution of $L_n$ under dependence assumptions. Ref. [10] showed the asymptotic distribution of $L_n$ under an $\alpha$-mixing assumption. In the dependent case, the logarithmic law for $L_n$ remains largely unknown.
We will establish a logarithmic law for $L_n$ under an $\alpha$-mixing assumption. Let $\{X_n, n \ge 1\}$ be a sequence of random variables on a probability space $(\Omega, \mathcal{F}, P)$. Let $\mathcal{F}_a^b$ represent the $\sigma$-field generated by the random variables $\{X_i, a \le i \le b\}$. For any two $\sigma$-fields $\mathcal{A}, \mathcal{B} \subseteq \mathcal{F}$, set $\alpha(\mathcal{A}, \mathcal{B}) = \sup_{A \in \mathcal{A},\, B \in \mathcal{B}} |P(A \cap B) - P(A)P(B)|$. The strong mixing coefficients of $\{X_n, n \ge 1\}$ are defined as $\alpha(n) = \sup_{k \ge 1} \alpha(\mathcal{F}_1^k, \mathcal{F}_{k+n}^\infty)$; if $\alpha(n) \to 0$ as $n \to \infty$, then $\{X_n, n \ge 1\}$ is called $\alpha$-mixing (see [11]).
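For intuition, a Gaussian AR(1) process is a classical example of a strictly stationary $\alpha$-mixing sequence, with geometrically decaying mixing coefficients. The sketch below (the parameter `phi` and sample sizes are illustrative choices, not from the paper) shows the corresponding geometric decay of dependence through the autocorrelations:

```python
import numpy as np

# A Gaussian AR(1) process x_t = phi * x_{t-1} + e_t with |phi| < 1 is
# strictly stationary (after burn-in) and alpha-mixing with geometrically
# decaying mixing coefficients; phi and the sample size are illustrative.
rng = np.random.default_rng(1)
phi, burn, n = 0.5, 500, 20000
x, xs = 0.0, []
for t in range(burn + n):
    x = phi * x + rng.standard_normal()
    if t >= burn:
        xs.append(x)
xs = np.asarray(xs)

# The lag-k autocorrelation of AR(1) is phi**k, mirroring the geometric
# decay of the strong mixing coefficients alpha(n).
ac1 = float(np.corrcoef(xs[:-1], xs[1:])[0, 1])
ac5 = float(np.corrcoef(xs[:-5], xs[5:])[0, 1])
print(round(ac1, 3), round(ac5, 3))
```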
The first section introduces the background of, and the motivation for, this study. In Section 2, we show the main result of this paper. In Section 3, we introduce notations and present some classical or elementary facts, which we include for easier referencing. The proof of the main theorem is discussed in Section 4 and is the main novel ingredient of the paper. In Section 5, we present the significance of the main result and its applications.
2. Main Result
Assumption 1.
Let be a random vector sequence, , and assume is a strictly stationary α-mixing random vector sequence satisfying .
Assumption 2.
Let be a sequence of random vectors, and assume is an i.i.d. random vector sequence.
Remark 1.
For , is independent. Let be a random sampling of ; it is reasonable to suppose is independent. Therefore, under Assumption 2 we can reasonably obtain the logarithmic law of $L_n$.
Theorem 3.
Under Assumptions 1 and 2, let , and define . Suppose that , and , where , , and that , for some . Then,
Corollary 1.
Suppose that are i.i.d. random variables, . Let be an matrix, let , and define . Suppose that , where , , and that , for some . Then,
Remark 2.
Theorem 3 considers the case in which $p$ grows faster than $n$, whereas Theorems 1 and 2 only considered the case in which $p$ and $n$ are of the same order. In Theorem 3, the $n$ rows of $X$ are strictly stationary α-mixing random vectors. This case is more complex than the i.i.d. case considered in Theorems 1 and 2 because of the dependence. Under the i.i.d. assumption, we obtain Corollary 1, which generalizes Theorem 1 from $p$ of the same order as $n$ to a faster-growing $p$; the moment condition of Corollary 1 is also weaker than that of Theorem 1.
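As a numerical sanity check of the i.i.d. case (an illustrative Monte Carlo simulation; the sizes and the standard normal population are our choices, not the paper's), the normalized maximum $\sqrt{n/\log n}\,L_n$ should be close to 2 for moderately large $n$ when $p$ and $n$ are of the same order:

```python
import numpy as np

rng = np.random.default_rng(2)
n = p = 400  # p and n of the same order (here equal); illustrative sizes
X = rng.standard_normal((n, p))

# Largest absolute off-diagonal sample correlation.
R = np.corrcoef(X, rowvar=False)
L_n = np.abs(R[~np.eye(p, dtype=bool)]).max()

# Logarithmic-law normalization: sqrt(n / log n) * L_n should approach 2.
ratio = float(np.sqrt(n / np.log(n)) * L_n)
print(round(ratio, 3))
```

For a single sample of this size, the ratio typically lands within a few tenths of 2; the convergence is slow because the fluctuations of $n L_n^2$ are of constant order.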
3. Preliminaries
The proofs of the main result are intricate. We gather and establish several technical tools that contribute to the proof of the main result; the subsequent lemmas play a crucial role in validating our findings. Lemma 1 can be found in [12].
Lemma 1.
Suppose X and Y are random variables taking their values in Borel spaces $B_1$ and $B_2$, respectively, and suppose U is a uniform-$[0,1]$ random variable independent of $(X, Y)$. Let N be a positive integer and let $\{A_1, \dots, A_N\}$ be a measurable partition of $B_2$. Then there exists a $B_2$-valued random variable $Y^* = f(X, Y, U)$, where f is a measurable function from $B_1 \times B_2 \times [0,1]$ into $B_2$, such that
(i) $Y^*$ is independent of X,
(ii) the probability distributions of $Y^*$ and Y on $B_2$ are identical,
(iii) the probability that $Y^*$ and Y do not lie in the same element of the partition is controlled by the strong mixing coefficient between $\sigma(X)$ and $\sigma(Y)$.
Lemma 2.
Let be a strictly stationary α-mixing sequence of random variables: . If , for some , and , for some , then there exists such that
Proof of Lemma 2.
Let , where . Then, we have . Using Theorem 1 of [13], we have
Then, we can obtain
□
Lemma 3.
Assume is a sequence of α-mixing random variables, , and , , . Then
Proof of Lemma 3.
See Corollary 6.1 in [11]. □
Lemma 4.
For any sequence of independent random variables with mean zero and finite variances, there exists a sequence of independent normal random variables with , , such that
for all and , whenever , . Here, A is a universal constant.
Proof of Lemma 4.
See [14]. □
Lemma 5.
Let be an independent symmetric random variables sequence and . Then, for each integer , there exist positive numbers and depending only on j such that, for all ,
Proof of Lemma 5.
See [15]. □
Lemma 6.
Let $X_1, \dots, X_n$ be independent random variables and $S_k = X_1 + \dots + X_k$. For any $t > 0$ and $s > 0$ with $\max_{1 \le k \le n} P(|S_n - S_k| \ge s) < 1$,
$$P\left(\max_{1 \le k \le n} |S_k| \ge t + s\right) \le \frac{P(|S_n| \ge t)}{1 - \max_{1 \le k \le n} P(|S_n - S_k| \ge s)}.$$
Proof of Lemma 6.
See [16]. □
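Lemma 6 enters later as Ottaviani's maximal inequality (it is invoked under that name in the proof of Lemma 14). The inequality can be checked exactly on a small example by enumerating all equally likely sample paths; the $\pm 1$ step law and the parameters below are illustrative choices:

```python
from itertools import product

# Exact check of Ottaviani's maximal inequality
#   P(max_k |S_k| >= t + s) <= P(|S_n| >= t) / (1 - max_k P(|S_n - S_k| >= s))
# for i.i.d. +/-1 steps, by enumerating all 2^n equally likely paths.
n, t, s = 10, 3, 3

paths = list(product((-1, 1), repeat=n))
total = len(paths)

def partial_sums(path):
    out, acc = [], 0
    for step in path:
        acc += step
        out.append(acc)
    return out

lhs = sum(max(abs(v) for v in partial_sums(path)) >= t + s
          for path in paths) / total
num = sum(abs(sum(path)) >= t for path in paths) / total

# P(|S_n - S_k| >= s) depends only on the number m = n - k of remaining steps.
def tail(m):
    return sum(abs(sum(q)) >= s for q in product((-1, 1), repeat=m)) / 2**m

den = 1 - max(tail(n - k) for k in range(1, n + 1))
assert den > 0
print(lhs <= num / den)  # the inequality holds exactly
```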
Lemma 7.
Let be an independent random variables sequence and for any , ; then, there exists a positive constant depending only on g, such that
Proof of Lemma 7.
See [17]. □
Lemma 8.
Let be an i.i.d. random variables sequence with , , , and . Then,
Proof of Lemma 8.
See [16]. □
The following is the Chen–Stein method, as shown in [18].
Lemma 9.
Let $\{\eta_\alpha, \alpha \in I\}$ be 0–1 random variables on an index set $I$, and for each $\alpha \in I$ let $B_\alpha \subseteq I$ be a neighborhood of dependence with $\alpha \in B_\alpha$. Set $p_\alpha = E\eta_\alpha$, $p_{\alpha\beta} = E(\eta_\alpha \eta_\beta)$, $W = \sum_{\alpha \in I} \eta_\alpha$ and $\lambda = EW$. Then,
$$\left|P(W = 0) - e^{-\lambda}\right| \le \left(b_1 + b_2 + b_3\right) \frac{1 - e^{-\lambda}}{\lambda},$$
where
$$b_1 = \sum_{\alpha \in I} \sum_{\beta \in B_\alpha} p_\alpha p_\beta, \quad b_2 = \sum_{\alpha \in I} \sum_{\alpha \neq \beta \in B_\alpha} p_{\alpha\beta}, \quad b_3 = \sum_{\alpha \in I} E\left|E\left\{\eta_\alpha - p_\alpha \mid \sigma(\eta_\beta, \beta \notin B_\alpha)\right\}\right|,$$
and $\sigma(\eta_\beta, \beta \notin B_\alpha)$ is the $\sigma$-algebra generated by $\{\eta_\beta, \beta \notin B_\alpha\}$. If $\eta_\alpha$ is independent of $\{\eta_\beta, \beta \notin B_\alpha\}$ for each $\alpha$, then $b_3$ vanishes.
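To illustrate the Chen–Stein bound in the simplest possible setting (an illustrative example of our own: independent Bernoulli indicators with singleton neighborhoods, so that $b_2$ and $b_3$ vanish), the approximation error of $P(W=0)$ by $e^{-\lambda}$ can be compared against the bound exactly:

```python
import math

# Chen-Stein Poisson approximation with independent Bernoulli indicators and
# singleton neighborhoods B_a = {a}: then b2 = b3 = 0, b1 = sum of p_a^2, and
# |P(W = 0) - exp(-lambda)| <= b1 * (1 - exp(-lambda)) / lambda.
# The success probabilities below are illustrative.
p = [0.01, 0.02, 0.03, 0.04, 0.05]

lam = sum(p)
p_w0 = math.prod(1 - q for q in p)   # exact P(W = 0), by independence
approx = math.exp(-lam)              # Poisson approximation of P(W = 0)
b1 = sum(q * q for q in p)
bound = b1 * (1 - math.exp(-lam)) / lam

print(abs(p_w0 - approx) <= bound)  # True: the error respects the bound
```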
Now we define
Define for any square matrix .
Lemma 10.
Proof of Lemma 10.
See [19]. □
4. Proofs
The next lemma refers to Lemma 2 in [20]; we generalize its result to the α-mixing condition.
Lemma 11.
Under Assumptions 1 and 2, let , and be constants.
is a sufficient condition for
Proof of Lemma 11.
Without loss of generality, suppose . Note that, for and ,
where . To conclude that the probability on the left-hand side of this inequality is equal to zero, it is sufficient to show that
Let and . Then and . Let g be an integer such that . It is easy to see
We have that
Then, note that is a sequence of α-mixing random variables by Assumption 1. Using Lemma 2, we obtain
It is easy to see that , ,
where and C are constants that may change from line to line. We can obtain that
We could estimate for a large k. We have
for all . Hence,
Finally, since , we have
Hence,
Lemma 12.
Under Assumptions 1 and 2, , . Suppose that , where , . If, for some , , then
as .
Proof of Lemma 12.
It is easy to see that . For any , we know that . We obtain
Note that . By Lemma 11, when , the first and second maxima presented above tend to zero; this holds under the assumptions of Theorem 3. Therefore, the first limit is proved. The second limit follows from the first. Since , we have
the almost sure limit is proved by the relationship between and the rightmost term in (12). □
Let , where is sufficiently small for .
Let , move close to , . moves close to 0, such that . Let , , , . Let
let , , ,.
Lemma 13.
Under the conditions of Theorem 3, let be i.i.d. normal random variables with variance , . Then
Proof of Lemma 13.
We have , where we can write . Since , for some , it follows that
as . Therefore,
as . By Lemma 3, it follows that
as , where , , . We have
as , where , , . We have that , by Lemma 3, we can obtain
as , where . Hence,
as . Recall that , Conclusively,
where . □
Lemma 14.
Let be i.i.d. normal random variables with variance . Then,
Proof of Lemma 14.
Choose and set ; we can suppose that . Using Lemma 13, we have , as , where . Then, we can obtain
as n is large, where we use
as (shown in [16]). We define , for any integer ,
where
By (14),
Since , using the Borel–Cantelli lemma,
Now, let us estimate as in (17).
Define the partial sums and . Observe that the distribution of is equal to that of for all . Thus, by Lemma 6, we have
as n is sufficiently large, since , where Ottaviani’s inequality in Lemma 6 is used in the last inequality. Note that, for fixed g and t, as n is sufficiently large. From (15), we have
Therefore,
where , since g is chosen such that . Using the Borel–Cantelli lemma again, we obtain
Lemma 15.
Let be i.i.d. normal random variables with variance . Then,
Proof of Lemma 15.
We first prove that
as , for , depending only on t and the distribution of . For any , set , and ; we can suppose . Take an integer g satisfying . Then, we have . Since , using the Borel–Cantelli lemma, we obtain
for any . We can see the definition of in (17), and obtain
Now, we prove (22) using Lemma 9.
Let . Set , , , and . Using Lemma 9,
Evidently,
Recall that is a sum of i.i.d. normal random variables. Recall (15). We have
as . Provided that and ,
The two events in (27) are conditionally independent given the ’s; and represent the conditional probability and expectation given , respectively. Then, the probability in (27) is
Set
for and . Choose and . Let for . Then, . Using the Chebyshev inequality and Lemma 8,
as , where
Let be an independent copy of . Using (29), for any n that is large enough, we have
by repeating (29). Choose an integer , set . Lemma 5 implies that there are positive constants and , satisfying
Since , . From the equality in (31), we have that
Lemma 16.
Under the conditions of Theorem 3, take ; then,
Proof of Lemma 16.
Using the Markov inequality and Lemma 2, for , we can obtain
for and sufficiently large q, using the Borel–Cantelli lemma, we have
and using the Markov inequality and Lemma 2, for ,
for sufficiently large q, using the Borel–Cantelli lemma, we can obtain
Hence, we only need to prove
By Lemma 1, there exists an independent random variables sequence such that and have the same distribution, and we can obtain that . We can prove that
Using the Borel–Cantelli lemma, we only need to prove
Let , , where , and is an independent copy of . Thus, we only need to prove
Let be an independent normal random variables sequence with variance . By Lemma 4, we have that
for sufficiently large q. Using the Borel–Cantelli lemma,
Thus, with Lemmas 14 and 15 and the Borel–Cantelli lemma, we obtain the result. □
Proof of Theorem 3.
We can find in (4). Let , since . Using the triangle inequality and Lemmas 10 and 12, we have that
as n is large enough. Hence,
If , a.s. then , a.s. Hence, in order to prove Theorem 3, we need to show that
Take . We have that
Using Lemma 2, let for any , since , where ; for some , we have
Using the Borel–Cantelli lemma,
To prove , we need to show that
5. Examples
In certain applications, such as the construction of compressed sensing matrices, the means and are given and one is interested in
The corresponding coherence is defined by
$$\mu_n = \max_{1 \le i < j \le p} \frac{|\langle x_i, x_j \rangle|}{\|x_i\|\,\|x_j\|},$$
where $x_i$ denotes the $i$th column of the measurement matrix.
Compressed sensing is a rapidly evolving field, aiming to construct measurement matrices that enable the exact recovery of any k-sparse signal from linear measurements using computationally efficient recovery algorithms.
Two commonly employed conditions in compressed sensing are the Restricted Isometry Property (RIP) and the Mutual Incoherence Property (MIP). In this paper, the derived limiting laws can be utilized to assess the likelihood of a random matrix satisfying the MIP condition, as demonstrated by Cai and Jiang [19].
Example 1.
The MIP condition, which is frequently utilized, requires the pairwise correlations among the column vectors of the measurement matrix, denoted by $\mu_n$, to be small. It has been established that the condition
$$(2k - 1)\,\mu_n < 1$$
guarantees the exact recovery of a $k$-sparse signal $\beta$ in the noiseless case $y = X\beta$, and enables the stable recovery of a sparse signal in the noisy case $y = X\beta + z$. Here, $z$ represents an error vector that may not necessarily be random.
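The sketch below (the Gaussian ensemble and the sizes are illustrative choices of ours, not prescribed by the paper) computes the coherence of a column-normalized random matrix and the largest sparsity level $k$ compatible with the MIP condition $(2k-1)\mu_n < 1$:

```python
import math
import numpy as np

# Coherence mu of a column-normalized Gaussian matrix, and the largest k
# satisfying the MIP condition (2k - 1) * mu < 1. Sizes are illustrative.
rng = np.random.default_rng(3)
n, p = 256, 64
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)   # rescale to unit-norm columns

G = X.T @ X                      # off-diagonal entries: column inner products
mu = float(np.abs(G[~np.eye(p, dtype=bool)]).max())

# (2k - 1) * mu < 1  <=>  k < (1/mu + 1) / 2; take the largest such integer.
k_max = math.ceil((1.0 / mu + 1.0) / 2.0) - 1
print(round(mu, 3), k_max)
```

The limiting laws for $L_n$ then quantify how large this admissible $k$ can be for random measurement matrices of given shape.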
Author Contributions
Writing–original draft, H.Z.; Writing–review & editing, Y.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by National Natural Science Foundation of China under Grant [No. 11771178, 12171198]; the Science and Technology Development Program of Jilin Province under Grant [No. 20210101467JC]; and Fundamental Research Funds for the Central Universities.
Data Availability Statement
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Bai, Y.; Zhang, Y.; Liu, C. Moderate deviation principle for likelihood ratio test in multivariate linear regression model. J. Multivar. Anal. 2023, 194, 105139. [Google Scholar] [CrossRef]
- Jiang, T. The asymptotic distributions of the largest entries of sample correlation matrices. Ann. Appl. Probab. 2004, 14, 865–880. [Google Scholar] [CrossRef]
- Zhou, W. Asymptotic distribution of the largest off-diagonal entry of correlation matrices. Trans. Am. Math. Soc. 2007, 359, 5345–5363. [Google Scholar] [CrossRef]
- Liu, W.; Lin, Z.; Shao, Q. The asymptotic distribution and Berry-Esseen bound of a new test for independence in high dimension with an application to stochastic optimization. Ann. Appl. Probab. 2008, 18, 2337–2366. [Google Scholar] [CrossRef]
- Li, D.; Rosalsky, A. Some strong limit theorems for the largest entries of sample correlation matrices. Ann. Appl. Probab. 2006, 16, 423–447. [Google Scholar] [CrossRef]
- Li, D.; Liu, W.; Rosalsky, A. Necessary and sufficient conditions for the asymptotic distribution of the largest entry of a sample correlation matrix. Probab. Theory Relat. Fields 2010, 148, 5–35. [Google Scholar] [CrossRef]
- Li, D.; Qi, Y.; Rosalsky, A. On Jiang’s asymptotic distribution of the largest entry of a sample correlation matrix. J. Multivar. Anal. 2012, 111, 256–270. [Google Scholar] [CrossRef]
- Shao, Q.; Zhou, W. Necessary and sufficient conditions for the asymptotic distributions of coherence of ultra-high dimensional random matrices. Ann. Probab. 2014, 42, 623–648. [Google Scholar] [CrossRef]
- Liu, W.; Lin, Z. Asymptotic distributions of the largest entries of sample correlation matrices under dependence assumptions. Chin. Ann. Math. Ser. 2008, 29, 543–556. [Google Scholar]
- Zhao, H.; Zhang, Y. The asymptotic distributions of the largest entries of sample correlation matrices under an α-mixing assumption. Acta. Math. Sin.-Engl. Ser. 2022, 38, 2039–2056. [Google Scholar] [CrossRef]
- Lin, Z.; Lu, C. Limit Theory on Mixing Dependent Random Variables; Kluwer Academic Publishers: Dordrecht, The Netherland, 1997. [Google Scholar]
- Bradley, R. Approximation theorems for strongly mixing random variables. Mich. Math. J. 1983, 30, 69–81. [Google Scholar] [CrossRef]
- Kim, T. A note on moment bounds for strong mixing sequences. Stat. Probab. Lett. 1993, 16, 163–168. [Google Scholar] [CrossRef]
- Sakhanenko, A. On the accuracy of normal approximation in the invariance principle. Sib. Adv. Math. 1991, 1, 58–91. [Google Scholar]
- Li, D.; Rao, M.; Jiang, T.; Wang, X. Complete convergence and almost sure convergence of weighted sums of random variables. J. Theoret. Probab. 1995, 8, 49–76. [Google Scholar] [CrossRef]
- Chow, Y.; Teicher, H. Probability Theory, Independence, Interchangeability, Martingales, 2nd ed.; Springer: New York, NY, USA, 1988. [Google Scholar]
- Allan, G. Probability: A Graduate Course; Springer: New York, NY, USA, 2005. [Google Scholar]
- Arratia, R.; Goldstein, L.; Gordon, L. Two moments suffice for Poisson approximations: The Chen-Stein method. Ann. Probab. 1989, 17, 9–25. [Google Scholar] [CrossRef]
- Cai, T.; Jiang, T. Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Stat. 2011, 39, 1496–1525. [Google Scholar] [CrossRef]
- Bai, Z.; Yin, Y. Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix. Ann. Probab. 1993, 21, 1275–1294. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).