# Partition Quantitative Assessment (PQA): A Quantitative Methodology to Assess the Embedded Noise in Clustered Omics and Systems Biology Data


## Abstract


## Featured Application

**A method to statistically quantify the intrinsic noise of clustered data.**


## 1. Introduction

## 2. Methodology

#### 2.1. Assigning Numeric Labels to Classifications

#### 2.2. Partition Quantitative Assessment (PQA) Score

In the PQA score, $x_{i}$ denotes the $i$-th position of the order vector, $n$ the length of $x$, and ${\rho}_{i}$ the resulting SC.
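The order vector $x$ and its length $n$ come from replacing each classification with a numeric label. As a minimal sketch, assuming that labels are consecutive integers assigned in the order in which each class first appears along the clustered ordering (an assumption; the exact labeling rule of Section 2.1 is not reproduced here), $x$ and $n$ could be built as follows. The function name `numeric_labels` and the sample classes are hypothetical.

```python
def numeric_labels(classes):
    """Map class names, listed in clustered order, to integer labels 1, 2, ...

    Assumption: labels are assigned by order of first appearance.
    Returns the order vector x and the class-to-label mapping.
    """
    mapping = {}
    x = []
    for c in classes:
        if c not in mapping:
            mapping[c] = len(mapping) + 1  # next unused integer label
        x.append(mapping[c])
    return x, mapping


# Hypothetical clustered ordering of six samples with known classes.
order = ["tumor", "tumor", "normal", "tumor", "normal", "normal"]
x, mapping = numeric_labels(order)
n = len(x)
print(x, n)  # [1, 1, 2, 1, 2, 2] 6
```

Any other consistent labeling would work for the sketch, as long as every item of the same class receives the same integer.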

#### 2.3. Background-Noise Correlation Factor in the PQA Score

#### 2.4. Statistical Significance of the PQA Score

#### 2.5. Defining Noise Proportions

#### 2.6. Effect of the Length and Number of Partitions of the Vector in the Z-Score Distributions

## 3. Results and Discussion

#### 3.1. Effects of Permuted Numeric Labels on the Partition

#### 3.2. Length of Partitions as a Proxy of the Number of Classifications

#### 3.3. Proof of Concept: Quantifying Real Noise

#### 3.3.1. Cancer Methylation Signatures

^{−17}), both numbers imply that, even though there may be some disorder in the VP, neither the noise proportion nor the PQA score is particularly high. These results suggest that, as with any other statistical test, the larger the number of items in the partition, the more diluted the effect of disorder in the VP; a larger number of items also yields greater statistical significance, as shown in the analysis of the number of items and classifications. Moreover, the authors of that study concluded that their clustering results made sense given the molecular and biological background and their perspective on the analyzed profiles; however, they assessed the grouping only by visual inspection and concluded that it was sound. Understanding the noise within a cluster can help in the search for better markers, since it could narrow the search space in this kind of study.
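The dilution effect of partition length can be illustrated with a generic permutation test. The sketch below does not implement the PQA score; as a stand-in it uses a hypothetical order statistic (the number of neighbouring positions in the label vector that share a label) and compares it with its distribution over random permutations of the same labels. Under this assumption, a perfectly grouped vector gains significance as its length grows while the class proportions stay fixed. The names `adjacency_score` and `permutation_z` are hypothetical.

```python
import numpy as np


def adjacency_score(x):
    """Stand-in order statistic: count of neighbouring positions with equal labels."""
    x = np.asarray(x)
    return int(np.sum(x[1:] == x[:-1]))


def permutation_z(x, n_perm=2000, seed=0):
    """Z-score of the observed statistic against random permutations of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    observed = adjacency_score(x)
    null = np.array([adjacency_score(rng.permutation(x)) for _ in range(n_perm)])
    return (observed - null.mean()) / null.std(ddof=1)


# Two perfectly grouped vectors with four equally sized classes,
# differing only in length.
short = np.repeat([1, 2, 3, 4], 5)    # n = 20
long_ = np.repeat([1, 2, 3, 4], 50)   # n = 200
z_short = permutation_z(short)
z_long = permutation_z(long_)
print(f"n=20: Z={z_short:.1f}; n=200: Z={z_long:.1f}")
```

The same number of misplaced items would change the adjacency count of the longer vector by a smaller fraction, which is the dilution described in the paragraph above.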

#### 3.3.2. Distribution of microRNAs in Cancer

^{−10}), providing a quantitative assessment that supports the grouping claimed by the authors. Furthermore, compared with the methylation profiles discussed above, a partition that appears even less fuzzy has an even higher noise ratio, which supports the idea that visual inspection can lead to misleading conclusions.
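Noise ratios of this kind are read against curves of Z-scores computed after randomizing increasing percentages of items, as Figure 4 suggests. A minimal sketch of such a curve follows. It assumes the hypothetical adjacency statistic instead of the PQA score, and it assumes that "randomizing p% of items" means shuffling the labels found at a randomly chosen p% of positions among themselves; `randomize_fraction`, `z_curve` and their parameters are illustrative, not the paper's procedure.

```python
import numpy as np


def adjacency_score(x):
    """Stand-in order statistic: count of neighbouring positions with equal labels."""
    return int(np.sum(x[1:] == x[:-1]))


def permutation_z(x, rng, n_perm=500):
    """Z-score of the observed statistic against full random permutations of x."""
    null = np.array([adjacency_score(rng.permutation(x)) for _ in range(n_perm)])
    return (adjacency_score(x) - null.mean()) / null.std(ddof=1)


def randomize_fraction(x, fraction, rng):
    """Shuffle the labels at a random subset of positions (fraction of n)."""
    y = x.copy()
    k = int(round(fraction * len(x)))
    idx = rng.choice(len(x), size=k, replace=False)
    y[idx] = x[rng.permutation(idx)]  # labels at the chosen positions are permuted
    return y


def z_curve(x, fractions, n_rep=20, seed=0):
    """Mean Z-score over n_rep randomizations at each fraction of randomized items."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    return np.array([
        np.mean([permutation_z(randomize_fraction(x, f, rng), rng)
                 for _ in range(n_rep)])
        for f in fractions
    ])


ordered = np.repeat([1, 2, 3, 4], 25)  # hypothetical, perfectly grouped vector
fractions = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
curve = z_curve(ordered, fractions)
print(np.round(curve, 1))  # Z-scores fall as more items are randomized
```

Averaging over several randomizations per percentage smooths the curve, which matters when an observed Z-score is later located on it.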

#### 3.3.3. Comparison of Genetic Regulatory Networks with Theoretical Models

^{−40}, Z-score = 13.2) and the proportion of noise was 5.8% (Figure 6). In contrast to the previous examples, here we obtained a highly ordered clustering with a very low proportion of noise, which suggests that, although the models recapitulate some properties of genetic regulatory networks, none of them is sufficient to capture their structural properties.
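The proportion of noise reported here comes from locating the observed Z-score on the curve of Z-scores versus percentage of randomized items (the red dot in Figure 6). A minimal sketch of that lookup follows, assuming linear interpolation between the points of the curve (the interpolation scheme is an assumption). The curve values are made-up placeholders, not data from this study, so the printed result is unrelated to the 5.8% above; `noise_proportion` is a hypothetical name.

```python
import numpy as np


def noise_proportion(z_obs, pct_randomized, z_mean):
    """Interpolate the percentage of randomized items that yields z_obs.

    np.interp requires increasing x-coordinates, so the curve is sorted by Z first.
    Z-scores outside the curve's range are clamped to its end points.
    """
    pct = np.asarray(pct_randomized, dtype=float)
    z = np.asarray(z_mean, dtype=float)
    order = np.argsort(z)
    return float(np.interp(z_obs, z[order], pct[order]))


# Placeholder curve: mean Z-score at each percentage of randomized items.
pct = [0, 25, 50, 75, 100]
z_mean = [14.0, 9.0, 5.0, 2.0, 0.0]
print(noise_proportion(7.0, pct, z_mean))  # 37.5
```

Because the Z-score falls as more items are randomized, a high observed Z-score maps to a low noise proportion, matching the pairing of Z-score = 13.2 with 5.8% noise reported for the regulatory networks.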

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Kang, S.; Kim, B.; Park, S.B.; Jeong, G.; Kang, H.-S.; Liu, R.; Kim, S.J. Stage-specific methylome screen identifies that NEFL is downregulated by promoter hypermethylation in breast cancer. Int. J. Oncol. **2013**, 43, 1659–1665.
- Kiselev, V.Y.; Andrews, T.S.; Hemberg, M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. **2019**, 20, 273–282.
- Al-Harbi, S.H.; Rayward-Smith, V.J. Adapting k-means for supervised clustering. Appl. Intell. **2006**, 24, 219–226.
- Hassani, M.; Seidl, T. Using internal evaluation measures to validate the quality of diverse stream clustering algorithms. Vietnam. J. Comput. Sci. **2017**, 4, 171–183.
- Fyfe, S.; Williams, C.; Mason, O.J.; Pickup, G. Apophenia, theory of mind and schizotypy: Perceiving meaning and intentionality in randomness. Cortex **2008**, 44, 1316–1325.
- Getmansky, M.; Lo, A.W.; Makarov, I. An econometric model of serial correlation and illiquidity in hedge fund returns. J. Financial Econ. **2004**, 74, 529–609.
- Shen, J.; Hu, Q.; Schrauder, M.; Yan, L.; Wang, D.; Medico, L.; Guo, Y.; Yao, S.; Zhu, Q.; Liu, B.; et al. Circulating miR-148b and miR-133a as biomarkers for breast cancer detection. Oncotarget **2014**, 5, 5284–5294.
- Toyooka, S.; Toyooka, K.O.; Maruyama, R.; Virmani, A.K.; Girard, L.; Miyajima, K.; Brambilla, E. DNA Methylation Profiles of Lung Tumors. Mol. Cancer Ther. **2001**, 1, 61–67.
- Schieber, T.A.; Carpi, L.; Díaz-Guilera, A.; Pardalos, P.M.; Masoller, C.; Ravetti, M.G. Quantification of network structural dissimilarities. Nat. Commun. **2017**, 8, 13928.
- Escorcia-Rodríguez, J.M.; Tauch, A.; Freyre-González, J.A. Abasy Atlas v2.2: The most comprehensive and up-to-date inventory of meta-curated, historical, bacterial regulatory networks, their completeness and system-level characterization. Comput. Struct. Biotechnol. J. **2020**, 18, 1228–1237.
- Barabási, A.-L.; Oltvai, Z.N. Network biology: Understanding the cell’s functional organization. Nat. Rev. Genet. **2004**, 5, 101–113.

**Figure 2.** Z-scores of the PQA scores from partitions varying in the number of classifications and the length of the partition.

**Figure 3.** Visual representation of the clustered data used to assess the method. (**a**) Dataset from Jie Shen et al. (**b**) Dataset from Toyooka et al.

**Figure 4.** Z-score distribution by percentage of randomized items. (**a**) Dataset from Jie Shen et al. (**b**) Dataset from Toyooka et al. The red dots represent the Z-score interpolation of the corresponding data sets.

**Figure 5.** Cluster analysis of the distances among gene regulatory networks and theoretical network models. The abbreviations and colors used in the posterior classification are as follows: Barabási–Albert (BA, red), Erdős–Rényi (ER, blue), scale-free (SF, green), hierarchical modularity (HM, purple), and biological networks (Bi, orange).

**Figure 6.** Z-score distribution by percentage of randomized items of the vector of profiles (VP) from genetic regulatory networks. The red dot represents the Z-score interpolation of the actual data set.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Camacho-Hernández, D.A.; Nieto-Caballero, V.E.; León-Burguete, J.E.; Freyre-González, J.A.
Partition Quantitative Assessment (PQA): A Quantitative Methodology to Assess the Embedded Noise in Clustered Omics and Systems Biology Data. *Appl. Sci.* **2021**, *11*, 5999.
https://doi.org/10.3390/app11135999


**Chicago/Turabian Style**

Camacho-Hernández, Diego A., Victor E. Nieto-Caballero, José E. León-Burguete, and Julio A. Freyre-González.
2021. "Partition Quantitative Assessment (PQA): A Quantitative Methodology to Assess the Embedded Noise in Clustered Omics and Systems Biology Data." *Applied Sciences* 11, no. 13: 5999.
https://doi.org/10.3390/app11135999