On the (Apparently) Paradoxical Role of Noise in the Recognition of Signal Character of Minor Principal Components
Abstract
1. Introduction
1.1. The Peculiar Role of Principal Component Analysis in Life Sciences
1.2. Why in Biomedical Sciences the Relevant Information Often Hides in Minor Components
2. Methods
Noise–Signal Discrimination by Noise Titration
- Using a relatively short series: The first step of our simulation is the generation of a time series drawn from a Gaussian distribution with zero mean and unit standard deviation. This series, by the action of a 15-dimensional embedding procedure [14], gives rise to a multivariate matrix with 15 variables and 86 statistical units. The relatively low number of statistical units is consistent with the sample sizes typical of a great part of biological experimentation.
- Presence of correlation structures in the extracted components: Because PCA acts as a filter for correlated information, the component scores of the embedding matrix show a certain amount of internal correlation, so we do not have any ‘pure noise’ component. In the second step of the procedure, we added to the original series an extremely weak signal (a square wave) composed of alternating 0.1 and −0.1 values. The extremely low power of the added signal makes the resulting series practically identical to the original one (the Pearson correlation is reported in Section 3), and the eigenvalue distributions of the original and signal-added embedding matrices are largely superimposable (see Section 3). In the component loading matrix, the 14th component, well inside the noise floor, keeps a trace of the square-wave signal. This component is the ‘analyte’ we expect to ‘react’ with the added noise.
- Adding noise: Having checked the superposition between the original and signal-added series, and the consequent coincidence between the eigenvalue distributions of their 15-dimensional embedding matrices, as the third step we generated 20 noise-corrupted copies of the signal-added series. These contaminated series are ordered by the increasing amount of added zero-mean Gaussian noise, from a minimum of 0.05 to a maximum of 1, with each copy differing by 0.05 units from the previous one.
- Titration: As the fourth step, the 15-dimensional embedding matrices of the 20 noise-added series are analyzed by PCA. For each of the PC6 to PC15 (noise floor) components, we compute the Pearson correlation between its scores in the uncorrupted matrix and the scores of the corresponding component in each of the 20 noise-added embedding matrices. It is worth noting that, due to their almost identical eigenvalues, the PC6–PC15 ordering varies both across noise-corrupted data sets and with respect to the uncorrupted matrix. Thus, for any value of added noise, we took as the ‘corresponding component’ the one having the highest correlation with the original component, independent of its rank in explained variance.
- Recognition of the weak signal: The fifth and last step of the procedure is to check whether, for the ‘weak signal’ component (PC14 in the original matrix), the correlation between its original and noise-corrupted versions depends on the amount of added noise significantly more strongly than it does for the other minor components. It is worth noting that the recognition of the weak signal relies on the expected effect of added noise in decreasing the correlation between the original and noise-contaminated versions of the component (the Pearson r values between the amount of added noise and the original-versus-corrupted component correlations are all negative).
All the analyses were performed in R, using the ‘rnorm’ function to generate random data from a normal distribution and the ‘princomp’ function for PCA. Both functions belong to the ‘stats’ package.
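The five steps above can be sketched in a few lines of code. The analyses in the paper were run in R; what follows is only a minimal Python/NumPy illustration, not the authors' script. Several details are assumptions made for the sketch: a series length of 100 with unit lag (chosen because it yields 86 rows for a 15-dimensional embedding), the reading of the ‘amount’ of noise as a standard deviation, PCA on the covariance matrix, the use of the absolute correlation when matching components (the sign of a component is arbitrary), the restriction of the candidate matches to PC6 through PC15, and the random seed. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed (assumption)

def embed(x, dim=15, lag=1):
    """Delay embedding: each row is (x[i], x[i+lag], ..., x[i+(dim-1)*lag])."""
    n_rows = len(x) - (dim - 1) * lag
    return np.column_stack([x[j * lag: j * lag + n_rows] for j in range(dim)])

def pca_scores(m):
    """Component scores from the covariance matrix, sorted by decreasing variance."""
    centered = m - m.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigval)[::-1]
    return centered @ eigvec[:, order]

def best_abs_corr(target, candidates):
    """Highest |Pearson r| between one score vector and any candidate column."""
    return max(abs(np.corrcoef(target, candidates[:, k])[0, 1])
               for k in range(candidates.shape[1]))

# Steps 1 and 2: Gaussian series plus a weak alternating 0.1 / -0.1 signal.
n = 100  # assumed length: 100 - 14 = 86 embedded rows
original = rng.standard_normal(n)
signal_added = original + 0.1 * (-1.0) ** np.arange(n)
ref_scores = pca_scores(embed(signal_added))

# Step 3: 20 noise levels, 0.05 to 1.00 in steps of 0.05 (assumed to be std devs).
noise_levels = np.arange(1, 21) * 0.05

# Step 4: titration over the minor components PC6..PC15 (zero-based columns 5..14).
minor = range(5, 15)
titration = np.empty((len(noise_levels), len(minor)))
for i, sd in enumerate(noise_levels):
    noisy = signal_added + rng.normal(0.0, sd, size=n)
    noisy_minor = pca_scores(embed(noisy))[:, 5:]
    for j, pc in enumerate(minor):
        titration[i, j] = best_abs_corr(ref_scores[:, pc], noisy_minor)

# Step 5: Pearson r between noise level and correlation, one value per component.
noise_vs_corr = [np.corrcoef(noise_levels, titration[:, j])[0, 1]
                 for j in range(len(minor))]
```

Each column of `titration` is the titration curve of one minor component, and `noise_vs_corr` summarizes how strongly that curve depends on the amount of added noise. A single random realization will not necessarily reproduce the values reported in the paper.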
3. Results and Discussion
4. Conclusions
References
- Preisendorfer, R. Principal component analysis in meteorology and oceanography. Elsevier Sci. Publ. 1988, 17, 425.
- Giuliani, A. The application of principal component analysis to drug discovery and biomedical data. Drug Discov. Today 2017, 22, 1069–1076.
- Pelissetto, A.; Vicari, E. Critical phenomena and renormalization-group theory. Phys. Rep. 2002, 368, 549–727.
- Giuliani, A.; Mancini, A.M.; Ghirardi, O.; Ramacci, M.T.; Voronina, T.; Sirabella, P.; Colosimo, A. Micro-and macrostructure of learning in active avoidance: A quantitative approach. Neurobiol. Learn. Mem. 1996, 65, 82–90.
- David, C.C.; Jacobs, D.J. Principal component analysis: A method for determining the essential dynamics of proteins. Protein Dyn. Methods Protoc. 2014, 1084, 193–226.
- Gorban, A.N.; Smirnova, E.V.; Tyukina, T.A. Correlations, risk and crisis: From physiology to finance. Phys. A Stat. Mech. Its Appl. 2010, 389, 3193–3217.
- Zimatore, G.; Tsuchiya, M.; Hashimoto, M.; Kasperski, A.; Giuliani, A. Self-organization of whole-gene expression through coordinated chromatin structural transition. Biophys. Rev. 2021, 2, 031303.
- Giuliani, A.; Vici, A. Stability/Flexibility: The tightly coupled homeostasis generator is at the same time the driver of change. Ann. Ist. Sup. San. 2023, in press.
- Giuliani, A.; Colafranceschi, M.; Webber, C.L., Jr.; Zbilut, J.P. A complexity score derived from principal components analysis of nonlinear order measures. Phys. A Stat. Mech. Its Appl. 2001, 301, 567–588.
- Roden, J.C.; King, B.W.; Trout, D.; Mortazavi, A.; Wold, B.J.; Hart, C.E. Mining gene expression data by interpreting principal components. BMC Bioinform. 2006, 7, 194.
- Vilenchik, D.; Yichye, B.; Abutbul, M. To interpret or not to interpret PCA? This is our question. In Proceedings of the International AAAI Conference on Web and Social Media, Munich, Germany, 11–14 June 2019; Volume 13.
- Jade, A.M.; Srikanth, B.; Jayaraman, V.K.; Kulkarni, B.D.; Jog, J.P.; Priya, L. Feature extraction and denoising using kernel PCA. Chem. Eng. Sci. 2003, 58, 4441–4448.
- Song, F.; Guo, Z.; Mei, D. Feature selection using principal component analysis. In Proceedings of the 2010 International Conference on System Science, Engineering Design and Manufacturing Informatization, Yichang, China, 12–14 November 2010; IEEE: Washington, DC, USA, 2010; Volume 1, pp. 27–30.
- Broomhead, D.S.; King, G.P. Extracting qualitative dynamics from experimental data. Phys. D Nonlinear Phenom. 1986, 20, 217–236.
- Jolicoeur, P.; Mosimann, J.E. Size and shape variation in the painted turtle. A principal component analysis. Growth 1960, 24, 339–354.
- Hansen, L.K.; Larsen, J.; Nielsen, F.Å.; Strother, S.C.; Rostrup, E.; Savoy, R.; Lange, N.; Sidtis, J.; Saver, C.; Paulson, O.B. Generalizable patterns in neuroimaging: How many principal components? NeuroImage 1999, 9, 534–544.
- Giuliani, A.; Colosimo, A.; Benigni, R.; Zbilut, J.P. On the constructive role of noise in spatial systems. Phys. Lett. A 1998, 247, 47–52.
- Sneath, P.H.A. Distortions of taxonomic structure from incomplete data on a restricted set of reference strains. Microbiology 1983, 129, 1045–1073.
- Poon, C.S.; Barahona, M. Titration of chaos with added noise. Proc. Natl. Acad. Sci. USA 2001, 98, 7107–7112.
- Jolliffe, I.T. Principal Component Analysis; Springer: New York, NY, USA, 2002; pp. 111–149.
- Kay, S. Can detectability be improved by adding noise? IEEE Signal Process. Lett. 2009, 7, 8–10.
- Amini, A.; Wainwright, M.J. High-dimensional analysis of semidefinite relaxations for sparse principal components. In Proceedings of the 2008 IEEE International Symposium on Information Theory, Toronto, ON, Canada, 6–11 July 2008; IEEE: Washington, DC, USA, 2008.
- Garg, A.; Belarbi, M.O.; Tounsi, A.; Li, L.; Singh, A.; Mukhopadhyay, T. Predicting elemental stiffness matrix of FG nanoplates using Gaussian Process Regression based surrogate model in framework of layerwise model. Eng. Anal. Bound. Elem. 2022, 143, 779–795.
- Mukdasai, K.; Sabir, Z.; Raja, M.A.Z.; Sadat, R.; Ali, M.R.; Singkibud, P. A numerical simulation of the fractional order Leptospirosis model using the supervise neural network. Alex. Eng. J. 2022, 61, 12431–12441.
PC1 | PC2 | PC3 | PC4 | PC5 |
---|---|---|---|---|
0.996 | 0.0029 | 0.00006 | 0.00004 | 0.000005 |
Variables | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 |
---|---|---|---|---|---|---|---|---|
T0 | 0.293 | 0.000 | 0.443 | 0.000 | 0.191 | 0.000 | 0.259 | 0.000 |
T1 | 0.307 | 0.000 | 0.000 | 0.447 | 0.141 | −0.272 | 0.175 | −0.130 |
T2 | 0.180 | −0.167 | −0.168 | −0.105 | 0.473 | −0.298 | −0.254 | 0.000 |
T3 | 0.133 | −0.316 | 0.000 | 0.000 | 0.258 | 0.597 | −0.182 | 0.000 |
T4 | −0.114 | −0.434 | −0.400 | 0.000 | −0.116 | 0.143 | 0.402 | 0.180 |
T5 | −0.321 | −0.326 | 0.000 | −0.403 | 0.000 | −0.151 | 0.218 | −0.395 |
T6 | −0.383 | −0.202 | 0.190 | 0.121 | −0.263 | 0.138 | −0.506 | −0.154 |
T7 | −0.428 | 0.000 | 0.159 | 0.102 | 0.000 | 0.000 | 0.167 | 0.638 |
T8 | −0.287 | 0.345 | 0.223 | 0.172 | 0.105 | 0.000 | 0.314 | −0.437 |
T9 | −0.131 | 0.411 | −0.377 | 0.161 | 0.000 | −0.130 | −0.310 | −0.179 |
T10 | 0.000 | 0.396 | −0.138 | −0.372 | 0.118 | 0.253 | −0.107 | 0.289 |
T11 | 0.241 | 0.221 | −0.123 | 0.000 | −0.340 | 0.448 | 0.213 | −0.208 |
T12 | 0.192 | 0.000 | −0.321 | −0.202 | −0.481 | −0.285 | 0.000 | 0.000 |
T13 | 0.239 | 0.000 | 0.388 | −0.490 | −0.168 | −0.192 | 0.000 | 0.000 |
T14 | 0.263 | −0.158 | 0.250 | 0.339 | −0.408 | 0.000 | −0.220 | 0.000 |
Explained Variance | 0.101 | 0.096 | 0.084 | 0.083 | 0.082 | 0.073 | 0.068 | 0.066 |

Variables | PC9 | PC10 | PC11 | PC12 | PC13 | PC14 | PC15 |
---|---|---|---|---|---|---|---|
T0 | 0.206 | 0.447 | 0.439 | 0.209 | 0.183 | 0.263 | 0.114 |
T1 | 0.000 | 0.370 | −0.495 | 0.169 | 0.000 | −0.357 | 0.000 |
T2 | 0.466 | −0.259 | 0.000 | 0.144 | −0.231 | 0.395 | 0.000 |
T3 | 0.277 | 0.122 | 0.000 | −0.362 | −0.150 | −0.341 | 0.242 |
T4 | 0.147 | 0.122 | −0.128 | −0.181 | 0.385 | 0.349 | −0.225 |
T5 | 0.169 | 0.000 | 0.000 | 0.377 | 0.209 | −0.240 | 0.354 |
T6 | 0.140 | 0.408 | 0.000 | 0.240 | −0.125 | 0.166 | −0.320 |
T7 | 0.315 | 0.000 | −0.194 | 0.000 | −0.281 | 0.000 | 0.323 |
T8 | 0.414 | −0.187 | 0.000 | −0.316 | 0.000 | 0.000 | −0.328 |
T9 | 0.149 | 0.268 | 0.000 | −0.179 | 0.353 | 0.171 | 0.469 |
T10 | 0.235 | 0.000 | −0.165 | 0.306 | 0.344 | −0.293 | −0.353 |
T11 | 0.000 | 0.000 | −0.299 | 0.314 | −0.316 | 0.332 | 0.235 |
T12 | 0.350 | 0.198 | 0.416 | −0.105 | −0.293 | −0.233 | −0.119 |
T13 | 0.000 | 0.197 | −0.454 | −0.451 | 0.000 | 0.156 | 0.000 |
T14 | 0.333 | −0.449 | 0.000 | 0.000 | 0.404 | −0.127 | 0.101 |
Explained Variance | 0.058 | 0.057 | 0.056 | 0.051 | 0.048 | 0.037 | 0.031 |
Value of Adjusted R-Squared

PC6 | PC7 | PC8 | PC9 | PC10 | PC11 | PC12 | PC13 | PC14 | PC15 |
---|---|---|---|---|---|---|---|---|---|
0.393 | 0.588 | 0.350 | 0.530 | 0.612 | 0.662 | 0.609 | 0.600 | 0.800 | 0.681 |
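The table above reports one adjusted R-squared per minor component, but the regression model behind it is not spelled out in the Methods. As a hedged illustration only, the sketch below assumes a simple linear regression of the original-versus-corrupted correlation on the amount of added noise, for which R-squared equals the squared Pearson r. The function name and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean

def adjusted_r_squared(x, y, n_predictors=1):
    """Adjusted R^2 of a least-squares line y ~ x (assumed model, one predictor)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r_squared = sxy * sxy / (sxx * syy)  # squared Pearson r
    # Penalize R^2 for the number of predictors relative to the sample size.
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_predictors - 1)

# Toy data, invented for illustration (not values from the paper).
toy_noise = [1, 2, 3, 4]
toy_corr = [1, 3, 2, 4]
toy_value = adjusted_r_squared(toy_noise, toy_corr)  # about 0.46
```

Under this assumption, with 20 noise levels the correction factor is (20 − 1)/(20 − 2), so the adjusted value sits only slightly below the plain R-squared.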
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Giuliani, A.; Vici, A. On the (Apparently) Paradoxical Role of Noise in the Recognition of Signal Character of Minor Principal Components. Stats 2024, 7, 54-64. https://doi.org/10.3390/stats7010004