Abstract
High-dimensional parameter testing is commonly used in bioinformatics to analyze complex relationships in gene expression and brain connectivity studies, involving parameters like means, covariances, and correlations. In this paper, we present a novel approach for testing U-statistics-type parameters by leveraging jackknife pseudo-values. Inspired by Tukey’s conjecture, we establish the asymptotic independence of these pseudo-values, allowing us to reformulate U-statistics-type parameter testing as a sample mean testing problem. This reformulation enables the use of established sample mean testing frameworks, simplifying the testing procedure. We apply a multiplier bootstrap method to obtain critical values and provide a rigorous theoretical analysis to validate the approach. Simulation studies demonstrate the robustness of our method across a variety of scenarios. Additionally, we apply our approach to investigate differences in the dependency structures of a subset of genes within the Wnt signaling pathway, which is associated with lung cancer.
MSC:
62F03
1. Introduction
With the development of technology, high-dimensional parameter testing is widely used in various bioinformatics analyses. Numerous studies employ mean tests to investigate changes in genes of interest, such as [1,2]. Additionally, some studies use covariance matrix or precision matrix tests to illustrate brain connectivity analysis and apply correlation matrix tests for gene co-expression network analysis, such as [3,4,5]. In this paper, we concentrate on U-statistics type parameter testing, encompassing but not limited to mean, covariance, and correlation tests.
High-dimensional mean tests have been well studied, as reviewed by [6]. We categorize the proposed test statistics into five broad groups. Firstly, L2-norm-based test statistics, such as [7,8,9,10,11], are known for their effectiveness under dense alternatives. Another group consists of maximum-norm-based tests, as evidenced by [12,13,14], which are more suitable for sparse alternatives. Since the alternative hypothesis is usually unknown in advance, a third category of test statistics aims to accommodate diverse alternatives by combining the p-values of tests based on various norms, such as [15,16,17]. Additionally, some tests simplify the problem by projecting the high-dimensional mean vector onto lower dimensions. For instance, some studies explore random projections, such as [18,19], while ref. [20] seeks the optimal projection directions. Most of the tests mentioned above impose sparsity conditions on the covariance matrices, but dense patterns are common in practice. To address this, studies such as [21,22,23,24] enhance the signal strength and test performance by incorporating common factors.
High-dimensional covariance tests have also achieved significant advancements in recent years. For the one-sample covariance test, methods mainly include spectral-norm-based tests, such as [25,26,27], and Frobenius-norm-based tests, such as [28,29,30]. For the two-sample covariance test, Frobenius-norm-based tests, such as [30,31,32,33], perform well for dense alternatives, while tests based on the maximum entry-wise norm, such as [34], show strong performance for sparse alternatives.
High-dimensional correlation tests have received significant attention. For the one-sample Pearson correlation test, ref. [35] proposes a test suitable for sparse alternatives, while ref. [36] provides a test that is powerful for dense alternatives. For the two-sample correlation test, ref. [37] introduces a test suitable for sparse alternatives, and ref. [38] develops a general framework for testing correlation structures across one, two, and multiple sample scenarios. In addition to Pearson correlation matrix tests, rank-based correlation matrix tests, including Kendall’s tau and Spearman’s rank correlation, have been well studied, as demonstrated by [39,40,41]. Furthermore, ref. [42] proposes a framework for the equality test of U-statistic-based correlation matrices.
For all the aforementioned parameter tests, there exist U-statistic-based test statistics, such as [10,30,33,34,42]. There are also some adaptive and unified tests. Ref. [16] proposes a unified framework for testing high-dimensional parameters that can be estimated by U-statistic-based vectors. Ref. [43] constructs a family of U-statistics as unbiased estimators for Lq-norms of the test parameters, further combining the p-values across different orders q. Ref. [44] proposes a two-step Gaussian approximation for high-dimensional non-degenerate U-statistics and a bootstrap method for computing their probabilities within hyper-rectangles.
In this paper, we propose a novel approach for U-statistics-type parameters. Inspired by Tukey’s conjecture [45], we establish the asymptotic independence of jackknife pseudo-values for the U-statistic estimator. By constructing test statistics based on the sample means of these pseudo-values, we effectively transfer U-statistic-type testing into the sample mean testing framework. This reformulation allows us to apply established methods from sample mean testing, simplifying the testing procedure. We derive the critical values for our test statistics by applying a multiplier bootstrap to the pseudo-values. In addition, we conduct a comprehensive theoretical analysis of our proposed test, including validation of the multiplier bootstrap procedure for accurate critical value estimation, as well as an assessment of its asymptotic properties, such as size control and power performance.
The rest of this paper is organized as follows: In Section 2, we present the detailed testing procedures. Section 3 verifies the effectiveness of the multiplier bootstrap used in Section 2 and analyzes the theoretical performance of our proposed tests. Section 4 presents simulation results to justify the empirical performance of our methods. In Section 5, we apply our methodology to analyze the dependency differences in the Wnt signaling pathway between lung cancer patients and control patients. Finally, some conclusions and discussions are provided in Section 6.
2. Methodology
Let X and Y be two d-dimensional random vectors, independent of each other. Let X1, …, Xn1 be independent and identically distributed (i.i.d.) random samples distributed as X; similarly, let Y1, …, Yn2 be i.i.d. random samples distributed as Y. We set
where q is the dimensionality of the parameter we are interested in, and each component is defined through an m-order symmetric kernel function. We assume that each kernel function is symmetric and of the same order m only for notational simplicity. We then define two U-statistic-based vectors as in (1),
We use to denote the expectation of , i.e., , with for and . We are interested in testing the following hypotheses:
- (i) (One-sample problem) For a given
- (ii) (Two-sample problem)
Intuitively, with the estimators in (1), the two problems in (2) and (3) can be treated as one-sample and two-sample mean tests in high dimensions. However, it is difficult to derive or compute the asymptotic distribution of test statistics based directly on the U-statistic estimators in (1). To overcome this obstacle, we propose a test based on the jackknife pseudo-values, which take Tukey's classical delete-one form: writing the full-sample U-statistic vectors as U1 and U2, the pseudo-values are n1 U1 − (n1 − 1) U1(−i) for i = 1, …, n1, and n2 U2 − (n2 − 1) U2(−j) for j = 1, …, n2, where U1(−i) is the U-statistic vector based on the first sample with the i-th observation removed, and U2(−j) is the U-statistic vector based on the second sample with the j-th observation removed. As shown in [45,46], the jackknife pseudo-values are unbiased estimators of the target parameters, i.e.,
According to [47], the jackknife pseudo-values are not only uncorrelated but also asymptotically independent. Hence, the target parameter can be estimated by the sample mean of the jackknife pseudo-values.
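As a concrete sketch of this construction (not the paper's implementation; function names and the choice of Kendall's tau as an illustrative order-2 kernel are ours), the delete-one pseudo-values can be computed as follows. A useful check is that, for U-statistics, the sample mean of the pseudo-values recovers the full-sample estimate exactly:

```python
import numpy as np

def u_stat_kendall(x, y):
    """Kendall's tau as an order-2 U-statistic: average sign agreement over pairs."""
    n = len(x)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return s / (n * (n - 1) / 2)

def jackknife_pseudovalues(x, y, u_stat):
    """Tukey's delete-one pseudo-values: n*U_n - (n-1)*U_{n-1}^(-i)."""
    n = len(x)
    full = u_stat(x, y)
    pv = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i          # leave observation i out
        pv[i] = n * full - (n - 1) * u_stat(x[mask], y[mask])
    return pv

rng = np.random.default_rng(0)
x, y = rng.standard_normal(30), rng.standard_normal(30)
pv = jackknife_pseudovalues(x, y, u_stat_kendall)
```

For a vector-valued parameter, the same computation is applied coordinate-wise, one kernel per coordinate.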
Further, we provide the variance estimators for as follows:
Remark 1.
The variance estimator in (7) is also the delete-1 jackknife estimator for the variance. As long as the estimator is a smooth function of the observations, this jackknife estimator is consistent.
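For concreteness, one standard form of the delete-1 jackknife variance estimator is the sample variance of the pseudo-values divided by n; the sketch below uses that convention, which may differ in normalization from the paper's (7):

```python
import numpy as np

def jackknife_variance(pseudo_values):
    """Delete-1 jackknife variance estimate for the point estimator:
    sample variance of the pseudo-values divided by n."""
    pv = np.asarray(pseudo_values, dtype=float)
    n = len(pv)
    return pv.var(ddof=1) / n

pv = np.array([1.2, 0.8, 1.1, 0.9, 1.0])
var_hat = jackknife_variance(pv)   # (0.04+0.04+0.01+0.01)/4 / 5 = 0.005
```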
Hence, for the one-sample testing problem (2), we construct the following test statistic:
Remark 2.
If the parameter represents the mean of , the jackknife pseudo-values reduce to the independent sample observations, and the test statistic simplifies to the sample mean tests in [12,13].
In high dimensions, for centered independent random vectors , ref. [48] derives that the distribution of can be approximated by the maximum of a sum of Gaussian random vectors with the same covariance matrices as . Meanwhile, ref. [48] proposes a multiplier bootstrap procedure to obtain these Gaussian random vectors. Motivated by [48], we apply the multiplier bootstrap procedure to the asymptotically independent jackknife pseudo-values . Through this procedure, one can approximate the distributions of and in (8) and (9). Specifically, let the multipliers be a sequence of i.i.d. standard normal random variables with mean 0 and variance 1, independent of the data. The b-th multiplier bootstrap samples of are . Correspondingly, the b-th multiplier bootstrap sample of is as follows:
Based on , we define the b-th bootstrap sample of the test statistics and by the following:
Based on these multiplier bootstrap samples, the critical value and p-value for and can be estimated by
After obtaining the critical value, we obtain the test for the hypothesis in (2) and (3) as follows:
Thus, we reject (2) if and only if , and reject (3) if and only if . Correspondingly, we can construct the p-value estimators of and as
For a given significance level , we then reject the of (2) if and only if , and reject (3) if and only if .
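The one-sample procedure described above can be sketched end-to-end as follows. This is an illustrative implementation of a studentized max-type statistic with a Gaussian multiplier bootstrap; variable names, the B = 1000 default, and the exact studentization are our assumptions, not the paper's formulas:

```python
import numpy as np

def multiplier_bootstrap_test(pv, alpha=0.05, B=1000, rng=None):
    """One-sample max-type test on pseudo-values pv (n x q) for H0: mean = 0.
    Critical value and p-value come from a Gaussian multiplier bootstrap."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, q = pv.shape
    mean = pv.mean(axis=0)
    centered = pv - mean
    sd = centered.std(axis=0, ddof=1)
    T = np.sqrt(n) * np.max(np.abs(mean) / sd)        # observed statistic
    e = rng.standard_normal((B, n))                    # i.i.d. N(0,1) multipliers
    # b-th bootstrap statistic: max over coordinates of the multiplier-weighted sum
    T_boot = np.max(np.abs(e @ centered) / (np.sqrt(n) * sd), axis=1)
    crit = np.quantile(T_boot, 1 - alpha)
    pval = np.mean(T_boot >= T)
    return T, crit, pval

rng = np.random.default_rng(1)
pv_null = rng.standard_normal((80, 25))               # pseudo-values with mean zero
T0, crit, p0 = multiplier_bootstrap_test(pv_null)
T1, _, p1 = multiplier_bootstrap_test(pv_null + 1.0)  # strong signal in every coordinate
```

The two-sample version applies the same multiplier scheme to each group's centered pseudo-values and bootstraps the combined statistic.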
3. Theoretical Analysis
In this section, we justify the validity of the multiplier bootstrap method used in the last section and study the empirical size and power of our proposed tests. Before presenting the detailed theoretical results, some mild assumptions are introduced.
Assumption 1.
There exists such that . For the one-sample problem, . For the two-sample problem, , where the two sample sizes are comparable .
Next, we introduce additional notation to present the assumptions on the kernel functions. Specifically, based on , we define the centered kernel function and the quantities derived from it as follows:
Further, set
Analogously, we can construct vectors and based on .
Assumption 2.
For any indices and ,
Assumption 3.
There exists a positive constant b, such that and , for any , where .
Assumption 4.
There exists a constant such that holds for all , and .
This assumption specifies the relationship between the sample size n and the parameter dimension q. Assumption 1 permits both the parameter dimension q and the sample size n to go to infinity, as long as the stated rate condition holds. Additionally, it requires the sample sizes of the two groups to be of the same order. Assumption 2 requires that the centered kernel functions follow sub-exponential distributions. This assumption is mild, as bounded kernels (such as those of widely used rank-based U-statistics) satisfy this condition. Assumption 3 excludes degenerate U-statistics and requires that the relevant inner products are non-degenerate. Assumption 4 includes mild moment conditions. These assumptions are crucial for the high-dimensional central limit theorem.
The multiplier bootstrap method plays an important role in our tests. Based on the above assumptions, we justify the validity of the multiplier bootstrap procedure used in Section 2 by the following theorem:
The detailed proof of Theorem 1 is presented in Appendix A.3. This theorem ensures that the distributions of our test statistics can be well approximated by the corresponding multiplier bootstrap distributions under the null hypothesis.
Given the pre-specified level , Theorem 2 establishes that the sizes of and are well controlled. We omit the proof, as this theorem can be viewed as a consequence of Theorem 1. Specifically, according to Theorem 1, the distributions of our test statistics and can be well approximated by those of and . Further, by the Dvoretzky–Kiefer–Wolfowitz inequality in Massart's version, we have
where and are the empirical distributions used to obtain the critical values in the test procedures, while and represent the theoretical distributions in Theorem 1. Thus, Theorem 2 is established by combining these results. In addition to asymptotic size control, we investigate the power properties of and in the subsequent theorem.
Theorem 3.
The detailed proof for this theorem is provided in Appendix A.4. This result demonstrates that, under certain mild conditions, the power of our proposed test converges to 1.
4. Simulation Study
In this section, we conduct comprehensive simulations to investigate the empirical performance of our proposed test. As discussed in Section 2, for the sample mean test, our method reduces to the tests introduced by [12,13]. Therefore, we focus on testing the Kendall tau correlation matrix and compare our method with several established methods. Specifically, we consider the sample covariance-based methods from [49,50], as well as the U-statistic-based Kendall tau correlation test from [42]. For simplicity, we refer to these methods as CLX, HD, and ZU, respectively, and denote our method as UJB.
We generate two sets of independent random samples, and , where and . Here, , where , and are independent and identically distributed (i.i.d.) random variables with variances . To mimic practical scenarios, we assume the samples are drawn from the following three models:
- Model 1 (Gamma Distribution): Let , for .
- Model 2 (Zero-Inflated Poisson Distribution): Let with probability 0.15, and with probability 0.85, for .
- Model 3 (Student’s t Distribution): Let , where is drawn from Unif, for .
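The three generating mechanisms above can be sketched as follows. The Gamma shape and scale, the Poisson rate, and the degrees-of-freedom range are illustrative placeholders of ours, since the paper's exact constants do not survive in this excerpt; only the 0.15/0.85 zero-inflation split is taken from the text:

```python
import numpy as np

def model1_gamma(n, d, rng, shape=4.0, scale=0.5):
    # Model 1 sketch: i.i.d. Gamma entries (shape/scale are illustrative).
    return rng.gamma(shape, scale, size=(n, d))

def model2_zip(n, d, rng, p_zero=0.15, lam=2.0):
    # Model 2 sketch: zero with probability 0.15, Poisson(lam) with probability 0.85.
    pois = rng.poisson(lam, size=(n, d)).astype(float)
    return np.where(rng.random((n, d)) < p_zero, 0.0, pois)

def model3_t(n, d, rng, df_low=5, df_high=15):
    # Model 3 sketch: Student's t entries with uniformly drawn degrees of freedom.
    df = rng.integers(df_low, df_high + 1, size=d)
    return rng.standard_t(df, size=(n, d))

rng = np.random.default_rng(0)
Z1, Z2, Z3 = (m(200, 10, rng) for m in (model1_gamma, model2_zip, model3_t))
```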
Thus, the covariance matrix of is , and the covariance matrix of is . Under the null hypothesis, we assume . We consider the following four covariance structures for :
- Case I (Block): Set , where is a diagonal matrix with i.i.d. entries drawn from Unif. The matrix represents a block correlation structure, with for all k, for (for ), and otherwise.
- Case II (Decay): Set , where .
- Case III (Non-Sparse): Define , where with and . The matrix is diagonal, with entries drawn from Unif, and is uniformly distributed on the Stiefel manifold , i.e., and , where is the identity matrix of dimension (set ).
- Case IV (Long-range dependence): Set , where , with .
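Cases I and II can be sketched as follows; the block size, within-block correlation, and geometric decay rate below are illustrative stand-ins for the paper's elided constants:

```python
import numpy as np

def block_corr(d, block=5, rho=0.5):
    """Case I sketch: block correlation structure, rho within blocks
    (block size and rho are illustrative choices)."""
    R = np.full((d, d), 0.0)
    for s in range(0, d, block):
        e = min(s + block, d)
        R[s:e, s:e] = rho
    np.fill_diagonal(R, 1.0)
    return R

def decay_corr(d, rate=0.5):
    """Case II sketch: entries decay geometrically with |j - k| (AR(1)-type)."""
    idx = np.arange(d)
    return rate ** np.abs(idx[:, None] - idx[None, :])

R1, R2 = block_corr(12), decay_corr(12)
```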
Table 1 shows the empirical sizes across Models 1–3 and Cases I–IV. Overall, all methods maintain sizes close to the nominal level, which indicates good control of the Type I error rate. In Models 1 and 2, the empirical sizes of the CLX and HD methods remain stable around the nominal level. ZU and UJB also perform well, although UJB slightly exceeds the nominal size in some instances. In Model 3, which employs a heavy-tailed t-distribution, the empirical size of UJB continues to align closely with the nominal level. In contrast, the empirical sizes for CLX and HD are more conservative, highlighting the robustness of UJB in the presence of heavy-tailed data.
Table 1.
Empirical sizes for Models 1–3 with Cases I–IV, based on 500 replications.
Under the alternative hypothesis, we introduce a random symmetric matrix with exactly eight nonzero entries. Among these, four entries are randomly selected from the upper triangle of , with magnitudes generated from the uniform distribution on , where is the maximum diagonal value of . The remaining four entries are determined by symmetry. We then define and , where . These matrices, and , are used to generate samples for and under the alternative hypothesis.
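The perturbation described above can be sketched as a random symmetric matrix with exactly eight nonzero off-diagonal entries; the magnitude range used below is a placeholder for the paper's scaling by the maximum diagonal value:

```python
import numpy as np

def sparse_perturbation(d, k=4, scale=1.0, rng=None):
    """Symmetric d x d matrix with exactly 2*k nonzero entries: k positions
    sampled from the upper triangle, mirrored below (magnitudes illustrative)."""
    rng = np.random.default_rng(0) if rng is None else rng
    iu, ju = np.triu_indices(d, k=1)
    pick = rng.choice(len(iu), size=k, replace=False)   # distinct upper-triangle slots
    vals = rng.uniform(0.5 * scale, scale, size=k) * rng.choice([-1.0, 1.0], size=k)
    U = np.zeros((d, d))
    U[iu[pick], ju[pick]] = vals
    return U + U.T                                      # symmetry fills the lower half

U = sparse_perturbation(20)
```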
The empirical power performance across Models 1–3 and Cases I–IV is summarized in Table 2. For Models 1 and 2, which involve Gamma and zero-inflated Poisson distributions, the data quickly approach normality, which allows the CLX and HD methods to perform comparably to UJB and ZU. However, UJB and the rank-based ZU test demonstrate a slight advantage. In Model 3, where the data follow a heavy-tailed t-distribution, UJB and ZU significantly outperform CLX and HD, highlighting the effectiveness of rank-based correlation tests for heavy-tailed distributions. Across all cases, UJB achieves comparable or slightly superior performance to ZU, underscoring the strength and robustness of our proposed method.
Table 2.
Empirical powers for Models 1–3 with Cases I–IV, based on 500 replications.
5. Real Data Analysis
In this section, we apply our proposed method to explore potential dependency differences within the Wnt signaling pathway, which is associated with lung cancer as well as other cancers, including gastric and breast cancer [51,52,53]. The dataset used for this analysis is publicly available through the Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/, accessed on 25 November 2024) under accession number GDS2771. It contains 22,283 microarray-derived gene expression measurements from large airway epithelial cells in 97 patients diagnosed with lung cancer and 90 control patients without lung cancer.
In this study, we focus on a subset of 119 genes within the Wnt signaling pathway. As the number of potential dependencies grows quadratically with the number of genes, reliable statistical inference becomes challenging with a limited sample size. To address this, we apply individual t-tests to each gene to assess differential expression between the two groups. This conservative approach reduces dimensionality while retaining potentially relevant genes for further analysis. Ultimately, we select 32 genes with statistically significant expression differences for the dependency analysis. This selected subset includes genes previously identified as important in lung cancer research, such as WNT1, WNT2, WNT5A [51], FZD [54], and RHOA [55]. Based on this subset, we use the testing methods outlined in Section 4 to test for the equality of the correlation matrices of these 32 genes between lung cancer patients and control patients. Given that the CLX and HD methods are conservative for heavy-tailed data, we transformed the gene expression levels by applying a logarithmic transformation and then standardized each gene within its respective group.
All methods reject the null hypothesis, providing strong evidence against equal dependency structures. Specifically, the p-values for our proposed method (UJB) and the comparison methods (CLX and HD) are 0.019, 0.046, and 0.029, respectively. This finding suggests that the dependency structure within the Wnt signaling pathway is likely distinct between lung cancer patients and controls. Thus, beyond differential gene expression, changes in dependency structures among genes in the Wnt pathway may offer new insights into the mechanisms underlying lung cancer progression. For example, ref. [56] shows that WNT5A–RHOA signaling drives tumorigenesis and represents a therapeutic target in small-cell lung cancer. It also highlights the importance of gene dependency structures in the Wnt pathway for understanding lung cancer progression.
6. Discussion
In this paper, we introduce a novel approach that reformulates the testing of U-statistic-type parameters into a sample mean testing problem by using jackknife pseudo-values. This transformation allows us to apply established methods for sample mean testing, simplifying the testing process while maintaining the flexibility and power inherent to U-statistic-based inference. Moreover, we obtain critical values for the test via a multiplier bootstrap and establish the validity of this procedure.
Our simulation study, involving samples from various distributions with different covariance structures, demonstrates the robustness and adaptability of the proposed method. Theoretical analysis further confirms the validity of our approach. However, a notable limitation lies in the computational cost of the jackknife procedure, which increases with the sample size. This can become a significant challenge as the number of samples grows large. Future work could explore strategies to improve computational efficiency or alternative approaches that preserve the statistical accuracy of the test while reducing its computational burden.
In addition to addressing computational challenges, another possible direction for future work lies in addressing size distortions. Some empirical sizes in Table 1 deviate from the nominal significance level, particularly for Model 3. To improve the empirical performance, future work could incorporate methods like simulation-based calibration to better approximate null distributions [57], imputation-based techniques for enhanced accuracy [58], or adjustments to test statistics to address structural dependencies [59].
Author Contributions
Methodology, M.Z.; validation, L.J.; writing—original draft preparation, M.Z.; writing—review and editing, L.J. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (No. 12101412) to M.Z., and the National Philosophy and Social Science Foundation of China (No. 23BTJ046) to L.J.
Data Availability Statement
The real data can be obtained from the Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/, accessed on 25 November 2024).
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A. Proof of the Main Theorem
Appendix A.1. Useful Lemmas
To establish the proof of the main results, some useful lemmas are required. Firstly, some additional notations are introduced. Let be independent centered random vectors in with the covariance matrix , and with .
Set to be independent Gaussian random vectors in with the same mean vector and covariance matrix as . Further, set
Define as the infimum over all numbers u such that
Similarly, are the corresponding quantities for the analogous Gaussian case, namely with replaced by in the above definition. According to [48], the following Kolmogorov–Smirnov distance
can be bounded as follows.
Lemma A1
(Gaussian approximation, Theorem 2.2 of [48]). Suppose there exist some positive constants such that for all . For any , we have
where C is a positive constant that only depends on and , and .
For all , set . Then, in (A1) can be bounded by . Hence, by taking , we have the bound for as follows
In addition, suppose and are two centered Gaussian random vectors in , with covariance matrices and , respectively. Ref. [48] bounds the Kolmogorov–Smirnov distance between the maxima of these two Gaussian vectors via the following lemma.
Lemma A2
(Comparison of the distributions of Gaussian maxima, Lemma 3.1 of [48]). Suppose all the diagonal elements of are bounded away from 0 and ∞. Then, we have
where , and C is a positive constant.
Let be independent random vectors, and let be a symmetric kernel function of order m. Define the corresponding U-statistic as follows:
Lemma A3 (Hoeffding, 1963).
If the kernel function is bounded,
where a and b are the lower and upper bounds of the kernel function , respectively, and ⌊n/m⌋ denotes the greatest integer not exceeding n/m.
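As a numerical sanity check of this exponential inequality, in the standard Hoeffding (1963) form with exponent −2⌊n/m⌋t²/(b − a)² (which we assume here), the bound can be compared against simulated tail frequencies for a bounded order-2 kernel; the indicator kernel below is our illustrative choice:

```python
import math
import numpy as np

def u_stat_indicator(x):
    """Order-2 U-statistic with the bounded kernel h(u, v) = 1{u + v > 1}."""
    n = len(x)
    pairs = np.triu(x[:, None] + x[None, :] > 1, k=1)  # upper-triangle pairs only
    return pairs.sum() / (n * (n - 1) / 2)

def hoeffding_u_bound(n, m, t, a=0.0, b=1.0):
    """Hoeffding-type tail bound for a U-statistic with kernel values in [a, b]."""
    return math.exp(-2 * (n // m) * t ** 2 / (b - a) ** 2)

rng = np.random.default_rng(3)
n, m, t = 40, 2, 0.15
theta = 0.5                       # P(U1 + U2 > 1) for independent Unif(0,1) draws
reps = 2000
hits = sum(u_stat_indicator(rng.random(n)) - theta >= t for _ in range(reps))
tail_freq = hits / reps           # empirical tail probability, never above the bound
```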
Lemma A4.
is a random vector with marginal distribution . For any , we have
Since is the m-order symmetric kernel function, and combining the definition in (12), we denote the covariance matrices of and by and , respectively. Further, the corresponding correlation matrices are defined as follows:
Specifically, the -th entry of is given by the following:
The following lemma provides an upper bound for the estimation errors of the sample covariance matrix and its correlation matrix .
Lemma A5.
Suppose the Assumptions 1–4 hold. When are sufficiently large, we have that
Appendix A.2. Proof of Lemma A5
Proof.
To prove Lemma A5, it suffices to show that the upper bound in (A3) holds for both and individually. Given the analogous definitions and assumptions, the second bound follows from the same arguments. Without loss of generality, we will show . According to the definition,
By the definition of , we need to bound the following equation,
By Theorem 6 in [60], and the sub-exponential assumptions for , it is easy to obtain the following upper bound for :
For ,
To bound , we bound and , respectively. For , it is obvious that
Hence, the key to bound is to study . According to the definition of in (4), and with some simple calculations, we have the following:
Next, we introduce
where
with threshold . By the triangle inequality, we have
According to the definition of in (A7), and by the triangle inequality, we have
For any , by choosing proper C, we have . By setting , we have
Using the exponential inequality for bounded U-statistics, we have
By Assumption 2, we also have
Combining these results, we have
Therefore, for a sufficiently large , we have
with probability .
For , using the definition of , we simplify the expression by applying the following Hoeffding decomposition. Specifically,
where , . Similarly, for , we have
Set
According to the definition of , we have
Hence, plugging these into , and performing some calculations, we obtain
By the triangle inequality and the Cauchy–Schwarz inequality, we have
Next, we bound and , respectively. By Assumption 4, it is obvious that is bounded. For ,
Given , can be treated as a symmetric kernel function of zero mean, and is a U-statistic. Analogously, is a U-statistic with a kernel function of zero mean and order m. Using the same technique employed to bound , we can introduce a thresholded kernel and the exponential inequality for U-statistics. Thus, given , for sufficiently large , we have
with probability . Hence, with probability . Furthermore, for sufficiently large ,
with probability . Combining the bound of in (A10), it is straightforward to obtain
Therefore, combining the bound of in (A4), we have
Next, we show the bound for . According to the definition of and the triangle inequality, we have
Hence, we need to bound and , respectively. For , considering , we have
By Assumptions 3 and 4, there are constants b and B such that for . Hence, we have
For , we have
Therefore, we have
By combining the bounding results from (A13) and (A14), we complete the proof of Lemma A5. □
Appendix A.3. Proof of Theorem 1
Proof.
The proof procedures for (13) and (14) are similar. Without loss of generality, we provide the specific proof process for (14).
According to the definition of in (9), we introduce an oracle test statistic with known variances
As the jackknife pseudo-values are asymptotically independent, and by Lemma A1, we have
where is a Gaussian distribution random vector defined as with . Here, , where and are the covariance matrices of and , respectively.
For , with a simple calculation, we have
By Assumptions 3 and 4, there exist positive constants b and B such that . According to Assumption 1, the two sample sizes are of the same order. Thus,
Further, by Lemma A5 we have
Hence, we have . For , arguments similar to those in Lemma A5 apply. By Hoeffding's inequality, we have
Hence, by combining (A16), we have
Appendix A.4. Proof of Theorem 3
Proof.
The proof procedures for (17) and (19) are similar. Without loss of generality, we outline the specific proof process for (19). Following the approach in Theorem 2, we define the oracle critical value from the theoretical distribution ,
The critical value used in the test serves as the bootstrap estimator of . As , we obtain
Therefore, to prove Theorem 3, it suffices to show that the lower bound of approaches 1 as .
First, we establish the lower bound for . As demonstrated in the proof for Theorem 1, given and , we find that follows the standard normal distribution. According to Lemma A4, by setting and , we have
By Theorem 5.8 of [61], it follows that
Thus, we have
Consequently, we have
Next, we focus on establishing the lower bound for L. Under , there exists some , . Set,
By the triangle inequality we have
Define the subset
According to Lemma A5 and (A18), with a probability of at least , we have
Thus, we have the desired bound. Under the alternative hypothesis, we have
Consequently, we have
Considering the signal size requirements in (20), by choosing z satisfying , we have
By the triangle inequality,
Thus, we have
With arguments similar to those used to obtain the upper bound for in the proof of Lemma A5, we have as . Further, we have with probability tending to 1 as , i.e., ; Theorem 3 is thus proved. □
References
- Hu, R.; Qiu, X.; Glazko, G.; Klebanov, L.; Yakovlev, A. Detecting Intergene Correlation Changes in Microarray Analysis: A New Approach to Gene Selection. BMC Bioinform. 2009, 10, 20. [Google Scholar] [CrossRef] [PubMed]
- Hu, R.; Qiu, X.; Glazko, G. A New Gene Selection Procedure Based on the Covariance Distance. Bioinformatics 2010, 26, 348–354. [Google Scholar] [CrossRef] [PubMed]
- Shaw, P.; Greenstein, D.; Lerch, J.; Clasen, L.; Lenroot, R.; Gogtay, N.; Evans, A.; Rapoport, J.; Giedd, J. Intellectual Ability and Cortical Development in Children and Adolescents. Nature 2006, 440, 676–679. [Google Scholar] [CrossRef]
- Shedden, K.; Chen, W.; Kuick, R.; Ghosh, D.; Macdonald, J.; Cho, K.R.; Giordano, T.J.; Gruber, S.B.; Fearon, E.R.; Taylor, J.M.G.; et al. Comparison of Seven Methods for Producing Affymetrix Expression Scores Based on False Discovery Rates in Disease Profiling Data. BMC Bioinform. 2005, 6, 26. [Google Scholar] [CrossRef]
- Dubois, P.C.; Trynka, G.; Franke, L.; Hunt, K.A.; Romanos, J.; Curtotti, A.; Zhernakova, A.; Heap, G.A.R.; Adány, R.; Aromaa, A.; et al. Multiple Common Variants for Celiac Disease Influencing Immune Gene Expression. Nat. Genet. 2010, 42, 295–302. [Google Scholar] [CrossRef]
- Huang, Y.; Li, C.; Li, R.; Yang, S. An Overview of Tests on High-Dimensional Means. J. Multivar. Anal. 2022, 188, 104813. [Google Scholar] [CrossRef]
- Bai, Z.; Saranadasa, H. Effect of High Dimension: By an Example of a Two Sample Problem. Stat. Sin. 1996, 6, 311–329. [Google Scholar]
- Srivastava, M.S.; Du, M. A Test for the Mean Vector with Fewer Observations than the Dimension. J. Multivar. Anal. 2008, 99, 386–402. [Google Scholar] [CrossRef]
- Srivastava, M.S. A Test for the Mean Vector with Fewer Observations than the Dimension Under Non-Normality. J. Multivar. Anal. 2009, 100, 518–532. [Google Scholar] [CrossRef]
- Chen, S.X.; Qin, Y.L. A Two-Sample Test for High-Dimensional Data with Applications to Gene-Set Testing. Ann. Stat. 2010, 38, 808–835. [Google Scholar] [CrossRef]
- Li, H.; Aue, A.; Paul, D.; Peng, J.; Wang, P. An Adaptable Generalization of Hotelling’s T2 Test in High Dimension. Ann. Stat. 2020, 48, 1815–1847. [Google Scholar] [CrossRef]
- Cai, T.T.; Liu, W.; Xia, Y. Two-Sample Test of High Dimensional Means Under Dependence. J. R. Stat. Soc. Ser. B Stat. Methodol. 2014, 76, 349–372. [Google Scholar]
- Chang, J.; Zheng, C.; Zhou, W.X.; Zhou, W. Simulation-Based Hypothesis Testing of High Dimensional Means Under Covariance Heterogeneity. Biometrics 2017, 73, 1300–1310. [Google Scholar] [CrossRef] [PubMed]
- Xue, K.; Yao, F. Distribution and Correlation-Free Two-Sample Test of High-Dimensional Means. Ann. Stat. 2020, 48, 1304–1328. [Google Scholar] [CrossRef]
- Xu, G.; Lin, L.; Wei, P.; Pan, W. An Adaptive Two-Sample Test for High-Dimensional Means. Biometrika 2016, 103, 609–624. [Google Scholar] [CrossRef]
- Zhou, C.; Zhang, X.; Zhou, W.; Liu, H. A Unified Framework for Testing High Dimensional Parameters: A Data-Adaptive Approach. arXiv 2018, arXiv:1808.02648. [Google Scholar]
- Feng, L.; Jiang, T.; Li, X.; Liu, B. Asymptotic independence of the sum and maximum of dependent random variables with applications to high-dimensional tests. Stat. Sin. 2024, 34, 1745–1763. [Google Scholar] [CrossRef]
- Lopes, M.E.; Jacob, L.J.; Wainwright, M.J. A More Powerful Two-Sample Test in High Dimensions Using Random Projection. arXiv 2012, arXiv:1108.2401v2. [Google Scholar]
- Srivastava, R.; Li, P.; Ruppert, D. RAPTT: An Exact Two-Sample Test in High Dimensions Using Random Projections. J. Comput. Graph. Stat. 2016, 25, 954–970.
- Liu, W.; Yu, X.; Zhong, W.; Li, R. Projection Test for Mean Vector in High Dimensions. J. Am. Stat. Assoc. 2024, 119, 744–756.
- Zhou, C.; Kong, X.B. Testing of High Dimensional Mean Vectors via Approximate Factor Model. J. Stat. Plan. Inference 2015, 167, 216–227.
- Zhang, M.; Zhou, C.; He, Y.; Zhang, X. Adaptive Test for Mean Vectors of High-Dimensional Time Series Data with Factor Structure. J. Korean Stat. Soc. 2018, 47, 450–470.
- He, Y.; Zhang, M.; Zhang, X.; Zhou, W. High-Dimensional Two-Sample Mean Vectors Test and Support Recovery with Factor Adjustment. Comput. Stat. Data Anal. 2020, 151, 107004.
- Ma, H.; Feng, L.; Wang, Z.; Bao, J. Adaptive Testing for Alphas in Conditional Factor Models with High Dimensional Assets. J. Bus. Econ. Stat. 2024, 42, 1356–1366.
- Johnstone, I.M. On the Distribution of the Largest Eigenvalue in Principal Components Analysis. Ann. Stat. 2001, 29, 295–327.
- Soshnikov, A. A Note on Universality of the Distribution of the Largest Eigenvalues in Certain Sample Covariance Matrices. J. Stat. Phys. 2002, 108, 1033–1056.
- Péché, S. Universality Results for the Largest Eigenvalues of Some Sample Covariance Matrix Ensembles. Probab. Theory Relat. Fields 2009, 143, 481–516.
- Birke, M.; Dette, H. A Note on Testing the Covariance Matrix for Large Dimension. Stat. Probab. Lett. 2005, 74, 281–289.
- Srivastava, M.S. Some Tests Concerning the Covariance Matrix in High Dimensional Data. J. Jpn. Stat. Soc. 2005, 35, 251–272.
- Chen, S.X.; Zhang, L.X.; Zhong, P.S. Tests for High-Dimensional Covariance Matrices. J. Am. Stat. Assoc. 2010, 105, 810–819.
- Schott, J.R. A Test for the Equality of Covariance Matrices When the Dimension Is Large Relative to the Sample Sizes. Comput. Stat. Data Anal. 2007, 51, 6535–6542.
- Srivastava, M.S.; Yanagihara, H. Testing the Equality of Several Covariance Matrices with Fewer Observations than the Dimension. J. Multivar. Anal. 2010, 101, 1319–1329.
- Li, J.; Chen, S.X. Two Sample Tests for High-Dimensional Covariance Matrices. Ann. Stat. 2012, 40, 908–940.
- Cai, T.T.; Ma, Z. Optimal Hypothesis Testing for High Dimensional Covariance Matrices. Bernoulli 2013, 19, 2359–2388.
- Cai, T.T.; Jiang, T. Limiting Laws of Coherence of Random Matrices with Applications to Testing Covariance Structure and Construction of Compressed Sensing Matrices. Ann. Stat. 2011, 39, 1496–1525.
- Qiu, Y.; Chen, S.X. Test for Bandedness of High-Dimensional Covariance Matrices and Bandwidth Estimation. Ann. Stat. 2012, 40, 1285.
- Cai, T.T.; Zhang, A. Inference for High-Dimensional Differential Correlation Matrices. J. Multivar. Anal. 2016, 143, 107–126.
- Zheng, S.; Cheng, G.; Guo, J.; Zhu, H. Test for High Dimensional Correlation Matrices. Ann. Stat. 2019, 47, 2887–2921.
- Zhou, W. Asymptotic Distribution of the Largest Off-Diagonal Entry of Correlation Matrices. Trans. Am. Math. Soc. 2007, 359, 5345–5363.
- Bao, Z.; Lin, L.C.; Pan, G.; Zhou, W. Spectral Statistics of Large Dimensional Spearman’s Rank Correlation Matrix and Its Application. Ann. Stat. 2015, 43, 2588–2623.
- Han, F.; Chen, S.; Liu, H. Distribution-Free Tests of Independence in High Dimensions. Biometrika 2017, 104, 813–828.
- Zhou, C.; Han, F.; Zhang, X.S.; Liu, H. An Extreme-Value Approach for Testing the Equality of Large U-Statistic Based Correlation Matrices. Bernoulli 2019, 25, 1472–1503.
- He, Y.; Xu, G.; Wu, C.; Pan, W. Asymptotically Independent U-Statistics in High-Dimensional Testing. Ann. Stat. 2021, 49, 154.
- Chen, X. Gaussian and Bootstrap Approximations for High-Dimensional U-Statistics and Their Applications. Ann. Stat. 2018, 46, 642–678.
- Tukey, J.W. Bias and Confidence in Not Quite Large Samples. Ann. Math. Stat. 1958, 29, 614.
- Quenouille, M.H. Notes on Bias in Estimation. Biometrika 1956, 43, 353–360.
- Shi, X. The Approximate Independence of Jackknife Pseudo-Values and the Bootstrap Methods. J. Wuhan Univ. Hydraul. Electr. Eng. 1984, 2, 83–90.
- Chernozhukov, V.; Chetverikov, D.; Kato, K. Gaussian Approximations and Multiplier Bootstrap for Maxima of Sums of High-Dimensional Random Vectors. Ann. Stat. 2013, 41, 2786–2819.
- Cai, T.; Liu, W.; Xia, Y. Two-Sample Covariance Matrix Testing and Support Recovery in High-Dimensional and Sparse Settings. J. Am. Stat. Assoc. 2013, 108, 265–277.
- Chang, J.; Zhou, W.; Zhou, W.X.; Wang, L. Comparing Large Covariance Matrices Under Weak Conditions on the Dependence Structure and Its Application to Gene Clustering. Biometrics 2017, 73, 31–41.
- Mazieres, J.; He, B.; You, L.; Xu, Z.; Jablons, D.M. Wnt Signaling in Lung Cancer. Cancer Lett. 2005, 222, 1–10.
- Clements, W.M.; Wang, J.; Sarnaik, A.; Kim, O.J.; Macdonald, J.; Fenoglio-Preiser, C.; Groden, J.; Lowy, A.M. Beta-Catenin Mutation Is a Frequent Cause of Wnt Pathway Activation in Gastric Cancer. Cancer Res. 2002, 62, 3503–3506.
- Howe, L.R.; Brown, A.M. Wnt Signaling and Breast Cancer. Cancer Biol. Ther. 2004, 3, 36–41.
- Corda, G.; Sala, A. Non-Canonical WNT/PCP Signalling in Cancer: Fzd6 Takes Centre Stage. Oncogenesis 2017, 6, e364.
- Rapp, J.; Jaromi, L.; Kvell, K.; Miskei, G.; Pongracz, J.E. WNT Signaling – Lung Cancer Is No Exception. Respir. Res. 2017, 18, 167.
- Kim, K.B.; Kim, D.W.; Kim, Y.; Tang, J.; Kirk, N.; Gan, Y.; Kim, B.; Fang, B.; Park, J.I.; Zheng, Y.; et al. WNT5A–RHOA Signaling Is a Driver of Tumorigenesis and Represents a Therapeutically Actionable Vulnerability in Small Cell Lung Cancer. Cancer Res. 2022, 82, 4219–4233.
- Lloyd, C.J. Estimating Test Power Adjusted for Size. J. Stat. Comput. Simul. 2005, 75, 921–933.
- Cuparić, M.; Milošević, B. To Impute or to Adapt? Model Specification Tests’ Perspective. Stat. Pap. 2024, 65, 1021–1039.
- Papadimitriou, C.K.; Meintanis, S.G.; Andrade, B.B.; Tsionas, M.G. Specification Tests for Normal/Gamma and Stable/Gamma Stochastic Frontier Models Based on Empirical Transforms. Econom. Stat. 2024, in press.
- Delaigle, A.; Hall, P.; Jin, J. Robustness and Accuracy of Methods for High Dimensional Data Analysis Based on Student’s T-Statistic. J. R. Stat. Soc. Ser. B Stat. Methodol. 2011, 73, 283–301.
- Boucheron, S.; Lugosi, G.; Massart, P. Concentration Inequalities: A Nonasymptotic Theory of Independence; Oxford University Press: Oxford, UK, 2013.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).