Improved Test for High-Dimensional Mean Vectors and Covariance Matrices Using Random Projection
Abstract
1. Introduction
- and are the sample mean vectors,
- ,
- , and
- is the pooled sample covariance matrix:
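As a minimal sketch of how the quantities listed above fit together, the following Python code computes the pooled sample covariance matrix in its standard form and, assuming the list describes the ingredients of the classical two-sample Hotelling $T^2$ statistic, the statistic itself. The function names and the row-wise data layout are illustrative assumptions, not notation from the paper.

```python
import numpy as np


def pooled_covariance(x1, x2):
    """Pooled sample covariance of two samples (rows are observations)."""
    n1, n2 = x1.shape[0], x2.shape[0]
    # Unbiased sample covariance of each group; atleast_2d covers d = 1.
    s1 = np.atleast_2d(np.cov(x1, rowvar=False, ddof=1))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False, ddof=1))
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)


def hotelling_t2(x1, x2):
    """Classical two-sample Hotelling T^2 (assumed form, see lead-in)."""
    n1, n2 = x1.shape[0], x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    s = pooled_covariance(x1, x2)
    # Solve S z = diff rather than inverting S explicitly.
    return (n1 * n2) / (n1 + n2) * float(diff @ np.linalg.solve(s, diff))
```

The linear solve fails when the pooled covariance matrix is singular, which is always the case once the dimension exceeds $n_1 + n_2 - 2$; this is the situation in which a projection to a lower dimension becomes necessary.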
2. Random Projection
2.1. A Brief Review
2.2. An Improved Test for Equality of Two Mean Vectors
- (1) Maximum: ;
- (2) Average: ;
- (3) 100p-th Percentile: , where and $\lceil x \rceil$ denotes the smallest integer not less than $x$.
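The three summaries above can be sketched as follows, under the assumption that they are applied to a collection of $k$ test statistics, one per random projection, and that the 100p-th percentile is taken as the $\lceil kp \rceil$-th smallest value. The function name `combine_statistics` and the dictionary keys are hypothetical.

```python
import math

import numpy as np


def combine_statistics(stats, p=0.95):
    """Summarize per-projection statistics by max, average, and percentile."""
    ordered = np.sort(np.asarray(stats, dtype=float))
    k = ordered.size
    # 1-based rank of the order statistic: smallest integer not less than k*p.
    rank = math.ceil(k * p)
    return {
        "max": float(ordered[-1]),
        "average": float(ordered.mean()),
        "percentile": float(ordered[rank - 1]),
    }
```

Critical values for each summary are not covered by this sketch; it only shows how the combined statistics are formed from the individual ones.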
2.3. An Improved Test for Equality of Two Covariance Matrices
- (1) Minp: ;
- (2) Avep: .
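A rough sketch of how Minp and Avep could be formed is given below. It rests on several assumptions that the text above does not confirm: each random projection maps the data to one dimension using a Gaussian direction, the projected samples are compared with a two-sided $F$-test for equal variances, and Minp and Avep are the minimum and the average of the resulting p-values. All function and parameter names are hypothetical.

```python
import numpy as np
from scipy import stats


def projected_variance_pvalues(x1, x2, n_proj=100, seed=0):
    """Two-sided F-test p-values over random 1-D projections (assumed setup)."""
    rng = np.random.default_rng(seed)
    n1, d = x1.shape
    n2 = x2.shape[0]
    pvals = np.empty(n_proj)
    for j in range(n_proj):
        direction = rng.standard_normal(d)  # assumed Gaussian projection
        v1 = np.var(x1 @ direction, ddof=1)
        v2 = np.var(x2 @ direction, ddof=1)
        cdf = stats.f.cdf(v1 / v2, n1 - 1, n2 - 1)
        pvals[j] = 2.0 * min(cdf, 1.0 - cdf)
    return pvals


def minp_avep(pvals):
    """Return the minimum and the average of the per-projection p-values."""
    pvals = np.asarray(pvals, dtype=float)
    return float(pvals.min()), float(pvals.mean())
```

Because the p-values from different projections of the same data are dependent, neither summary can be compared directly with the nominal level; the calibration step is outside the scope of this sketch.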
3. Numerical Results
3.1. Comparing Two Mean Vectors
3.1.1. Setup and Simulation Design
- Independent Structure: , where is the identity matrix.
- Toeplitz Structure: is a symmetric Toeplitz matrix defined by the autocorrelation sequence , so that
- Dense alternative: 75% of the components of are nonzero.
- Sparse alternative: 1% of the components of are nonzero.
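The simulation ingredients above can be sketched as follows. The code builds a symmetric Toeplitz covariance matrix from a given autocorrelation sequence and a mean-difference vector with a prescribed fraction of nonzero components. The geometric sequence in the usage lines, the common `magnitude` of the nonzero entries, and the random choice of their positions are illustrative assumptions, since the exact values are not recoverable from the text.

```python
import numpy as np


def toeplitz_covariance(acf):
    """Symmetric Toeplitz matrix whose (i, j) entry is acf[|i - j|]."""
    acf = np.asarray(acf, dtype=float)
    d = acf.size
    lags = np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
    return acf[lags]


def mean_shift(d, fraction, magnitude, seed=0):
    """Vector of length d with round(fraction * d) nonzero entries."""
    rng = np.random.default_rng(seed)
    n_nonzero = int(round(fraction * d))
    delta = np.zeros(d)
    support = rng.choice(d, size=n_nonzero, replace=False)
    delta[support] = magnitude  # hypothetical common signal strength
    return delta


# Usage with hypothetical settings: d = 200, geometric autocorrelation.
sigma = toeplitz_covariance(0.5 ** np.arange(200))
dense = mean_shift(200, fraction=0.75, magnitude=0.1)
sparse = mean_shift(200, fraction=0.01, magnitude=0.1)
```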
3.1.2. Simulation Results
- (1) Maximum (Max): ;
- (2) Average (Ave): ;
- (3) 95th Percentile (95): .
Type-I Error Control
Power Comparisons
Case 1: Dense Alternative with and
Case 2: Sparse Alternative with and
Case 3: Dense Alternative with (Toeplitz) and
Case 4: Sparse Alternative with and
3.2. Comparing Two Covariance Matrices
3.2.1. Setup and Simulation Design
- Model a:
- and are independently generated as follows: let be generated from a standard normal distribution, and then construct by letting
- Model b:
- Model c:
3.2.2. Simulation Results
Type-I Error Control
Power Comparisons
4. Application to Acute Lymphoblastic Leukemia (ALL) Data
4.1. Data Preparation
4.2. Procedures
- Initial Screening: We follow the same procedure as in Chen et al. [6] and perform an initial screening with the genefilter package (Bioconductor version 3.6), which retains 2391 genes for analysis.
- Gene Set Selection: GO term-based gene sets are extracted, excluding those with fewer than 2 genes to ensure meaningful multivariate analysis. This yields 3468, 571, and 803 GO terms for BP, CC, and MF, respectively. The largest gene set contains 1644 genes.
- Projected Test: For each gene set, we first apply the Avep covariance matrix test. For the gene sets where equality of the covariance matrices is not rejected, we then apply the Ave test with random projections to compare the mean vectors of the BCR/ABL and NEG groups. We consider two training sample size settings: (no training data) and (with training data) as suggested by Table 5.
- Multiple Testing Correction: p-values are adjusted using the Benjamini-Hochberg procedure to control the false discovery rate (FDR) at 5%.
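The multiple testing step in the procedure above can be sketched as follows. This is a generic implementation of the Benjamini-Hochberg adjustment, not the code used for the analysis; the function name is hypothetical. A gene set is declared significant at a 5% FDR when its adjusted p-value is at most 0.05.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = adjusted <= 0.05  # FDR controlled at 5%
```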
4.3. Results
- Case 1:
- BP: 208 (6.00%) gene sets are significant ( after FDR correction)
- CC: 25 (4.38%) gene sets are significant
- MF: 27 (3.36%) gene sets are significant
- Case 2:
- BP: 206 (5.94%) gene sets are significant ( after FDR correction)
- CC: 28 (4.90%) gene sets are significant
- MF: 32 (3.99%) gene sets are significant
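The percentages reported above can be checked against the numbers of GO terms given in Section 4.2 (3468, 571, and 803 for BP, CC, and MF). The short script below recomputes them; the variable names are illustrative.

```python
# Number of GO-term gene sets tested per ontology (Section 4.2).
totals = {"BP": 3468, "CC": 571, "MF": 803}

# Number of significant gene sets reported in each case.
significant_counts = {
    "Case 1": {"BP": 208, "CC": 25, "MF": 27},
    "Case 2": {"BP": 206, "CC": 28, "MF": 32},
}


def percentages(counts, totals):
    """Share of significant gene sets per ontology, in percent (2 decimals)."""
    return {k: round(100.0 * counts[k] / totals[k], 2) for k in counts}


for case, counts in significant_counts.items():
    print(case, percentages(counts, totals))
```

The recomputed shares agree with the reported values to two decimal places.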
5. Summary and Conclusions
Funding
Data Availability Statement
Conflicts of Interest
References
- Lopes, M.; Jacob, L.; Wainwright, M.J. A More Powerful Two-Sample Test in High Dimensions using Random Projection. In Advances in Neural Information Processing Systems 24; Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q., Eds.; Curran Associates, Inc.: New York, NY, USA, 2011; pp. 1206–1214.
- Bai, Z.; Saranadasa, H. Effect of high dimension: By an example of a two sample problem. Stat. Sin. 1996, 6, 311–329.
- Chen, S.X.; Qin, Y.L. A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Stat. 2010, 38, 808–835.
- Srivastava, R.; Li, P.; Ruppert, D. RAPTT: An exact two-sample test in high dimensions using random projections. J. Comput. Graph. Stat. 2016, 25, 954–970.
- Thulin, M. A high-dimensional two-sample test for the mean using random subspaces. Comput. Stat. Data Anal. 2014, 74, 26–38.
- Chen, S.X.; Zhang, L.X.; Zhong, P.S. Tests for high-dimensional covariance matrices. J. Amer. Stat. Assoc. 2010, 105, 810–819.
- Bai, Z.D. Convergence rate of expected spectral distributions of large random matrices. II. Sample covariance matrices. Ann. Probab. 1993, 21, 649–672.
- Bai, Z.D.; Yin, Y.Q. Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix. Ann. Probab. 1993, 21, 1275–1294.
- Bickel, P.J.; Levina, E. Covariance regularization by thresholding. Ann. Stat. 2008, 36, 2577–2604.
- Bickel, P.J.; Levina, E. Regularized estimation of large covariance matrices. Ann. Stat. 2008, 36, 199–227.
- Li, J.; Chen, S.X. Two sample tests for high-dimensional covariance matrices. Ann. Stat. 2012, 40, 908–940.
- Won, J.H.; Lim, J.; Kim, S.J.; Rajaratnam, B. Condition-number-regularized covariance estimation. J. R. Stat. Soc. Ser. B Stat. Methodol. 2013, 75, 427–450.
- Li, P.; Hastie, T.J.; Church, K.W. Very Sparse Random Projections. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 20–23 August 2006; KDD ’06. pp. 287–296.
- Li, P.; Owen, A.B.; Zhang, C.H. One Permutation Hashing. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Granada, Spain, 12–15 December 2011; NIPS’12. pp. 3113–3121.
- Wu, T.L.; Li, P. Projected tests for high-dimensional covariance matrices. J. Stat. Plann. Inference 2020, 207, 73–85.
- Härdle, W.K.; Simar, L. Applied Multivariate Statistical Analysis, 5th ed.; Springer: Cham, Switzerland, 2019.

c | RAPTT (m1 = 10) | RAPTT (m1 = 20) | Max (m1 = 10) | Max (m1 = 20) | Ave (m1 = 10) | Ave (m1 = 20) | 95 (m1 = 10) | 95 (m1 = 20) |
---|---|---|---|---|---|---|---|---|
0 | 0.040 | 0.050 | 0.046 | 0.064 | 0.042 | 0.050 | 0.046 | 0.064 |
0.5 | 0.072 | 0.094 | 0.086 | 0.090 | 0.096 | 0.090 | 0.086 | 0.090 |
1 | 0.172 | 0.148 | 0.158 | 0.136 | 0.224 | 0.168 | 0.158 | 0.136 |
1.5 | 0.316 | 0.306 | 0.300 | 0.240 | 0.412 | 0.358 | 0.300 | 0.240 |
2 | 0.480 | 0.492 | 0.448 | 0.426 | 0.600 | 0.558 | 0.448 | 0.426 |
2.5 | 0.676 | 0.676 | 0.650 | 0.548 | 0.784 | 0.742 | 0.650 | 0.548 |
3 | 0.818 | 0.792 | 0.778 | 0.716 | 0.876 | 0.852 | 0.778 | 0.716 |
3.5 | 0.880 | 0.870 | 0.844 | 0.806 | 0.926 | 0.924 | 0.844 | 0.806 |
4 | 0.932 | 0.952 | 0.936 | 0.892 | 0.974 | 0.974 | 0.936 | 0.892 |
4.5 | 0.978 | 0.972 | 0.964 | 0.936 | 0.992 | 0.990 | 0.964 | 0.936 |
5 | 0.990 | 0.984 | 0.982 | 0.978 | 0.996 | 0.996 | 0.982 | 0.978 |
5.5 | 0.998 | 0.994 | 0.996 | 0.996 | 1.000 | 1.000 | 0.996 | 0.996 |
6 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |

c | RAPTT (m1 = 10) | RAPTT (m1 = 20) | Max (m1 = 10) | Max (m1 = 20) | Ave (m1 = 10) | Ave (m1 = 20) | 95 (m1 = 10) | 95 (m1 = 20) |
---|---|---|---|---|---|---|---|---|
0 | 0.064 | 0.052 | 0.044 | 0.046 | 0.058 | 0.044 | 0.042 | 0.034 |
0.5 | 0.212 | 0.178 | 0.112 | 0.122 | 0.222 | 0.172 | 0.212 | 0.144 |
1 | 0.510 | 0.388 | 0.296 | 0.216 | 0.500 | 0.384 | 0.446 | 0.316 |
1.5 | 0.770 | 0.632 | 0.454 | 0.380 | 0.766 | 0.632 | 0.692 | 0.552 |
2 | 0.888 | 0.788 | 0.664 | 0.542 | 0.898 | 0.774 | 0.858 | 0.698 |
2.5 | 0.972 | 0.930 | 0.796 | 0.746 | 0.974 | 0.918 | 0.942 | 0.858 |
3 | 0.988 | 0.982 | 0.898 | 0.874 | 0.988 | 0.982 | 0.978 | 0.952 |
3.5 | 1.000 | 0.994 | 0.970 | 0.920 | 0.998 | 0.990 | 1.000 | 0.974 |
4 | 1.000 | 0.998 | 0.996 | 0.974 | 1.000 | 0.998 | 1.000 | 0.998 |
4.5 | 1.000 | 1.000 | 0.996 | 0.984 | 1.000 | 1.000 | 1.000 | 0.996 |
5 | 1.000 | 1.000 | 0.996 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
5.5 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
6 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |

m1 | Minp (Model a) | Minp (Model b) | Minp (Model c) | Avep (Model a) | Avep (Model b) | Avep (Model c) | max Fj (Model a) | max Fj (Model b) | max Fj (Model c) |
---|---|---|---|---|---|---|---|---|---|
0 | 0.0420 | 0.0530 | 0.0590 | 0.0590 | 0.0580 | 0.0560 | 0.0420 | 0.0530 | 0.0590 |
2 | 0.0490 | 0.0520 | 0.0600 | 0.0500 | 0.0440 | 0.0410 | 0.0490 | 0.0520 | 0.0600 |
4 | 0.0410 | 0.0320 | 0.0450 | 0.0490 | 0.0470 | 0.0580 | 0.0410 | 0.0320 | 0.0450 |
6 | 0.0670 | 0.0530 | 0.0590 | 0.0640 | 0.0620 | 0.0590 | 0.0670 | 0.0530 | 0.0590 |
8 | 0.0500 | 0.0630 | 0.0630 | 0.0520 | 0.0600 | 0.0530 | 0.0500 | 0.0630 | 0.0630 |
10 | 0.0640 | 0.0710 | 0.0730 | 0.0450 | 0.0390 | 0.0430 | 0.0640 | 0.0710 | 0.0770 |
12 | 0.0430 | 0.0430 | 0.0420 | 0.0590 | 0.0470 | 0.0560 | 0.0430 | 0.0430 | 0.0420 |
14 | 0.0510 | 0.0520 | 0.0390 | 0.0490 | 0.0490 | 0.0440 | 0.0510 | 0.0520 | 0.0390 |
16 | 0.0620 | 0.0560 | 0.0520 | 0.0550 | 0.0610 | 0.0590 | 0.0620 | 0.0560 | 0.0520 |
18 | 0.0390 | 0.0440 | 0.0440 | 0.0480 | 0.0400 | 0.0550 | 0.0390 | 0.0440 | 0.0440 |
20 | 0.0480 | 0.0430 | 0.0480 | 0.0430 | 0.0450 | 0.0420 | 0.0480 | 0.0430 | 0.0480 |

m1 | Minp (Model a) | Minp (Model b) | Minp (Model c) | Avep (Model a) | Avep (Model b) | Avep (Model c) | max Fj (Model a) | max Fj (Model b) | max Fj (Model c) |
---|---|---|---|---|---|---|---|---|---|
0 | 0.0400 | 0.0460 | 0.0430 | 0.0610 | 0.0530 | 0.0500 | 0.0400 | 0.0460 | 0.0430 |
2 | 0.0660 | 0.0630 | 0.0610 | 0.0640 | 0.0480 | 0.0530 | 0.0660 | 0.0630 | 0.0610 |
4 | 0.0460 | 0.0330 | 0.0420 | 0.0720 | 0.0580 | 0.0660 | 0.0460 | 0.0330 | 0.0420 |
6 | 0.0440 | 0.0510 | 0.0490 | 0.0740 | 0.0470 | 0.0570 | 0.0440 | 0.0510 | 0.0490 |
8 | 0.0540 | 0.0420 | 0.0490 | 0.0770 | 0.0420 | 0.0440 | 0.0540 | 0.0420 | 0.0490 |
10 | 0.0620 | 0.0630 | 0.0650 | 0.0720 | 0.0580 | 0.0580 | 0.0620 | 0.0630 | 0.0650 |
12 | 0.0390 | 0.0610 | 0.0540 | 0.0760 | 0.0600 | 0.0670 | 0.0390 | 0.0610 | 0.0540 |
14 | 0.0590 | 0.0590 | 0.0540 | 0.0700 | 0.0490 | 0.0600 | 0.0590 | 0.0590 | 0.0540 |
16 | 0.0450 | 0.0380 | 0.0380 | 0.0800 | 0.0530 | 0.0670 | 0.0450 | 0.0380 | 0.0380 |
18 | 0.0420 | 0.0390 | 0.0460 | 0.0790 | 0.0610 | 0.0620 | 0.0420 | 0.0390 | 0.0460 |
20 | 0.0560 | 0.0510 | 0.0580 | 0.0820 | 0.0580 | 0.0520 | 0.0560 | 0.0510 | 0.0580 |

m1 | Minp (Model a) | Minp (Model b) | Minp (Model c) | Avep (Model a) | Avep (Model b) | Avep (Model c) | max Fj (Model a) | max Fj (Model b) | max Fj (Model c) |
---|---|---|---|---|---|---|---|---|---|
0 | 0.2380 | 0.1150 | 0.2280 | 0.5860 | 0.2400 | 0.5970 | 0.2340 | 0.1100 | 0.2560 |
2 | 0.2090 | 0.1060 | 0.2100 | 0.6280 | 0.2790 | 0.5930 | 0.2280 | 0.1370 | 0.2730 |
4 | 0.2320 | 0.1140 | 0.2410 | 0.6040 | 0.2590 | 0.5790 | 0.2130 | 0.0950 | 0.2210 |
6 | 0.2000 | 0.0930 | 0.2010 | 0.5070 | 0.2120 | 0.5210 | 0.2190 | 0.1330 | 0.2310 |
8 | 0.2300 | 0.1040 | 0.2350 | 0.5730 | 0.2490 | 0.5510 | 0.2320 | 0.1270 | 0.2360 |
10 | 0.1660 | 0.0860 | 0.1520 | 0.5520 | 0.2600 | 0.5660 | 0.2170 | 0.1110 | 0.2360 |
12 | 0.1810 | 0.0880 | 0.1980 | 0.5000 | 0.2010 | 0.5150 | 0.1920 | 0.0970 | 0.1980 |
14 | 0.1790 | 0.1150 | 0.2090 | 0.4620 | 0.2010 | 0.4910 | 0.1860 | 0.1190 | 0.2160 |
16 | 0.1970 | 0.1040 | 0.2130 | 0.5050 | 0.2230 | 0.4570 | 0.2190 | 0.1170 | 0.2040 |
18 | 0.1600 | 0.0990 | 0.1800 | 0.4430 | 0.1840 | 0.4280 | 0.1390 | 0.0880 | 0.1570 |
20 | 0.1580 | 0.1000 | 0.1700 | 0.3960 | 0.1760 | 0.4370 | 0.1690 | 0.0800 | 0.1820 |

m1 | Minp (Model a) | Minp (Model b) | Minp (Model c) | Avep (Model a) | Avep (Model b) | Avep (Model c) | max Fj (Model a) | max Fj (Model b) | max Fj (Model c) |
---|---|---|---|---|---|---|---|---|---|
0 | 0.2970 | 0.1300 | 0.3150 | 0.9990 | 0.8050 | 0.9980 | 0.3210 | 0.1560 | 0.3330 |
2 | 0.2890 | 0.1390 | 0.3120 | 0.9990 | 0.8350 | 0.9990 | 0.3720 | 0.1730 | 0.3760 |
4 | 0.3490 | 0.1720 | 0.4120 | 0.9970 | 0.7970 | 0.9970 | 0.2860 | 0.1190 | 0.3030 |
6 | 0.2840 | 0.1180 | 0.3070 | 0.9980 | 0.7910 | 0.9990 | 0.2510 | 0.1090 | 0.2670 |
8 | 0.2410 | 0.1330 | 0.2860 | 0.9960 | 0.7840 | 0.9990 | 0.2620 | 0.1340 | 0.2760 |
10 | 0.2690 | 0.1490 | 0.2730 | 0.9950 | 0.7600 | 0.9970 | 0.2910 | 0.1370 | 0.3040 |
12 | 0.1880 | 0.0940 | 0.2220 | 0.9950 | 0.7770 | 0.9960 | 0.2850 | 0.1450 | 0.2840 |
14 | 0.2590 | 0.1210 | 0.2790 | 0.9900 | 0.7500 | 0.9940 | 0.2470 | 0.1260 | 0.2540 |
16 | 0.2440 | 0.1200 | 0.2630 | 0.9860 | 0.7320 | 0.9900 | 0.2170 | 0.1100 | 0.2150 |
18 | 0.2350 | 0.1200 | 0.2430 | 0.9840 | 0.6610 | 0.9910 | 0.2140 | 0.1030 | 0.2310 |
20 | 0.2310 | 0.1380 | 0.2350 | 0.9760 | 0.6630 | 0.9890 | 0.2360 | 0.1140 | 0.2260 |
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wu, T.-L. Improved Test for High-Dimensional Mean Vectors and Covariance Matrices Using Random Projection. Mathematics 2025, 13, 2060. https://doi.org/10.3390/math13132060