Article
Peer-Review Record

Partition Quantitative Assessment (PQA): A Quantitative Methodology to Assess the Embedded Noise in Clustered Omics and Systems Biology Data

Appl. Sci. 2021, 11(13), 5999; https://doi.org/10.3390/app11135999
by Diego A. Camacho-Hernández 1,2,†, Victor E. Nieto-Caballero 1,2,†, José E. León-Burguete 1,2 and Julio A. Freyre-González 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 28 December 2020 / Revised: 8 January 2021 / Accepted: 10 January 2021 / Published: 28 June 2021
(This article belongs to the Special Issue Towards a Systems Biology Approach)

Round 1

Reviewer 1 Report

This article proposes a measure score called Partition Quantitative Assessment (PQA) to solve the subjectivity of the visual inspection of hierarchical clustered items to detect noise.

 

Comments:

Authors might consider changing the title from "...in clustered omics..." into "... in hierarchical clustered omic..." as the study just covered the hierarchical clustering approach.

 

Abstract:

Line 20 "none measure has been developed to statistically quantify the noise in an arranged vector posterior a clustering algorithm, i.e., how much of the clustering is due to randomness"

This article associates noise with randomness. It would make more sense to associate randomness with insignificance. A cluster might contain random objects, but these objects must be somewhat similar to have been grouped in that cluster. Thus, the cluster would be considered insignificant rather than noisy. This point needs to be explained in the article.

 

Line 41 "In several papers..." --> cite a few of them

 

Line 46 define the "intrinsic and extrinsic" measures

 

Line 56 “Unfortunately, as much as we know, there is …. ”

The motivation of this study is not clear. Why is there a need to replace the visual inspection method with a new measure? Would this new measure be sufficient to replace visual inspection? The results show that this method detected only 25% to 30% of noise.

 

Proofread for errors:

Line 48 "PQA gathers these elements no to"  --> not

Line 175 "though there may be disordered I the VP" --> what is "I"?

Line 235 "and concluded that that the effect is negligible" --> that

 

Define the symbols used in Eqs. 1 and 2.

Line 113 "the mean of PQA scores of one thousand randomizations of the VP" --> a justification for using one thousand randomizations is needed. Does this number depend on the size of the dataset?

 

Line 160, Proof of concept:

"Cancer methylation signatures dataset": "detected 25.1% of noise"

Line 179 “authors concluded that their clustering analysis results made sense from their molecular and biological background, as well as the perspectives about the analyzed profiles, they only assessed grouping just by visual inspection and concluded the grouping was well done”

If the clusters with noise made sense to the researchers, why should they use the proposed measure to detect that 25% of noise?

 

 

Author Response

This article proposes a measure score called Partition Quantitative Assessment (PQA) to solve the subjectivity of the visual inspection of hierarchical clustered items to detect noise.

Comments:

Authors might consider changing the title from "...in clustered omics..." into "... in hierarchical clustered omic..." as the study just covered the hierarchical clustering approach.

R. We only used hierarchical clustering in the proof of concept because it is one of the most widely used approaches in omics today; however, this is not a limitation of the method. We have incorporated the following into the conclusion:

Lines 253-255: “Although in this work we focused on examples where hierarchical clustering is performed, this framework can apply to any partition algorithm in which the elements are identified and a vector of the order can be acquired”.
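The condition stated in the conclusion (elements are identified and an order vector can be read off the partition) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: each element is assumed to carry a known integer class code, and the order vector is assumed to list those codes in the order in which the partition arranges the elements. Both helpers are hypothetical; for hierarchical clustering, a dendrogram leaf ordering (for example, the array returned by SciPy's `scipy.cluster.hierarchy.leaves_list`) could be passed to the second one.

```python
from typing import Dict, Hashable, List, Sequence


def order_vector_from_partition(
    cluster_of: Sequence[Hashable], class_code: Sequence[int]
) -> List[int]:
    """Hypothetical helper: list each element's known class code, cluster by cluster.

    Clusters appear in the order they are first met; members keep their input order.
    """
    if len(cluster_of) != len(class_code):
        raise ValueError("need exactly one cluster label and one class code per element")
    members: Dict[Hashable, List[int]] = {}
    for index, cluster in enumerate(cluster_of):
        members.setdefault(cluster, []).append(index)
    return [class_code[i] for indices in members.values() for i in indices]


def order_vector_from_leaf_order(
    leaf_order: Sequence[int], class_code: Sequence[int]
) -> List[int]:
    """Hypothetical helper: read class codes along a dendrogram's leaf order."""
    return [class_code[i] for i in leaf_order]
```

For instance, flat labels `["a", "a", "b", "b", "a"]` from any partitioning algorithm, together with class codes `[1, 1, 2, 2, 2]`, give the order vector `[1, 1, 2, 2, 2]`, because element 4 is moved next to the other members of cluster `"a"`.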

Abstract:

Line 20 "none measure has been developed to statistically quantify the noise in an arranged vector posterior a clustering algorithm, i.e., how much of the clustering is due to randomness"

This article associates noise with randomness. It would make more sense to associate randomness with insignificance. A cluster might contain random objects, but these objects must be somewhat similar to have been grouped in that cluster. Thus, the cluster would be considered insignificant rather than noisy. This point needs to be explained in the article.

R. We appreciate the reviewer's comment and agree: if a cluster joins random objects with true data, the algorithm has found similarities between them. Nonetheless, we provide PQA as a supporting measure for any cluster analysis. Each researcher needs to consider this effect and try to minimize it, which is why we highlight it in our conclusion:

Lines 262-267: “The PQA could be used as a benchmark to test what clustering algorithm should be appropriate for the analyzed dataset by minimizing the noise proportion and to guide omics experimental designs. Nevertheless, a word of caution, the PQA score alone can be subject to subjectivity if not used properly since it depended on the characteristics of the analyzed data. Thus, the PQA score is thought to be considered as a quantification of noise in clustered data and should be used with discretion.”

 

Line 41 "In several papers..." --> cite a few of them

R. We have not found explicit examples of the inappropriate use of this concept, but these terms could be confused, especially by readers unfamiliar with statistical methods or computational work. We changed the paragraph, which now reads as follows:

Lines 41-43: “This procedure should not be confused with “supervised clustering”, which provides a vector of classes starting the desired partitioning a priori. This is then used to guide the clustering algorithms by allowing the learning of the metric distances that optimizes the partitioning”

Line 46 define the "intrinsic and extrinsic" measures

R. We have defined them according to the reviewer’s suggestion. The sentence now reads:

Lines 46-48: “These metrics are used for clustering algorithm validation. The extrinsic validation compares the clustering to a goal to say whether it is a good clustering or not. The internal validation compares the elements within the cluster and their differences [4].”

These metrics are primarily used to assess the performance of a clustering algorithm. They should not be confused with PQA, which quantifies the noise of a resulting cluster.
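To make the extrinsic/internal distinction concrete, here is a small sketch with one measure of each kind. The choice of cluster purity (agreement with a reference labelling) as the extrinsic example and mean within-cluster Euclidean distance as the internal example is ours, for illustration only; these are not necessarily the metrics discussed in [4], and neither of them is PQA.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def purity(cluster_of, true_class):
    """Extrinsic example: share of elements that match their cluster's majority class."""
    classes_in = defaultdict(list)
    for cluster, label in zip(cluster_of, true_class):
        classes_in[cluster].append(label)
    matched = sum(Counter(labels).most_common(1)[0][1] for labels in classes_in.values())
    return matched / len(true_class)


def mean_within_cluster_distance(points, cluster_of):
    """Internal example: average Euclidean distance between members of the same cluster."""
    points_in = defaultdict(list)
    for point, cluster in zip(points, cluster_of):
        points_in[cluster].append(point)
    distances = [
        math.dist(p, q)  # requires Python 3.8+
        for members in points_in.values()
        for p, q in combinations(members, 2)
    ]
    return sum(distances) / len(distances) if distances else 0.0
```

The first function needs an external reference (`true_class`); the second uses only the data and the partition, which is the difference the revised text describes.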

Line 56 “Unfortunately, as much as we know, there is …. ”

The motivation of this study is not clear. Why is there a need to replace the visual inspection method with a new measure? Would this new measure be sufficient to replace visual inspection? The results show that this method detected only 25% to 30% of noise.

R. To explain this further, we have added the following in Lines 61-63: “This is a serious caveat, since the insertion of noise can lead to false conclusion or misleading results. Furthermore, the purging of this noise can lead to a more efficient descriptions of markers and its phenomena, accelerating the advance in many fields”.

Proofread for errors:

Line 48 "PQA gathers these elements no to"  --> not

R. Corrected  

Line 175 "though there may be disordered I the VP" --> what is "I"?

R. Corrected; it was a typo for “in”.

Line 235 "and concluded that that the effect is negligible" --> that

R. Corrected

Define the symbols used in Eqs. 1 and 2.

R. Lines 102-103: “Equation 1, $x_i$ (order vector i-th position), $n$ (length of $x$), $\rho_i$ (resulting SC)” and Lines 109-111: “Equation 2, $\rho_x$ (SC of the VP), $\rho_{Rand_x}$ (Mean of the SC of one thousand randomizations), $\rho_{Perfect_x}$ (SC of the sorted vector in ascending order)”.
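The sketch below shows how these symbols could fit together. It rests on two assumptions that this record does not state: that Eq. 1 is the Pearson correlation between the order vector and its copy lagged by one position (as Reviewer 2's mention of a "Pearson correlation of lagged partitions" suggests), and that Eq. 2 places the VP's correlation between the random and perfect references as $(\rho_x - \rho_{Rand_x}) / (\rho_{Perfect_x} - \rho_{Rand_x})$. The function names are hypothetical, and scoring a constant lagged vector as 0 is our own convention.

```python
import math
import random
from statistics import fmean


def lag1_pearson(x):
    """Assumed form of Eq. 1: Pearson r between x[:-1] and x[1:]."""
    if len(x) < 3:
        raise ValueError("need at least three positions")
    a, b = x[:-1], x[1:]
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    ss_a = sum((u - ma) ** 2 for u in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    if ss_a == 0 or ss_b == 0:
        return 0.0  # r undefined for a constant lagged vector; 0 is our convention
    return cov / math.sqrt(ss_a * ss_b)


def normalized_score(x, n_random=1000, seed=0):
    """Assumed form of Eq. 2: rescale rho_x between the random mean and the perfect case."""
    rng = random.Random(seed)
    rho_x = lag1_pearson(x)
    rho_perfect = lag1_pearson(sorted(x))  # ascending sort = every class contiguous
    shuffled_rhos = []
    for _ in range(n_random):
        y = list(x)
        rng.shuffle(y)  # same class counts, order destroyed
        shuffled_rhos.append(lag1_pearson(y))
    rho_rand = fmean(shuffled_rhos)
    if math.isclose(rho_perfect, rho_rand):
        raise ValueError("perfect and random references coincide")
    return (rho_x - rho_rand) / (rho_perfect - rho_rand)
```

Under this reading, a perfectly grouped vector such as `[1, 1, 2, 2, 2, 3, 3, 3]` scores exactly 1, a typical shuffle scores near 0, and an arrangement that alternates classes more than chance would can fall below 0.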

Line 113 "the mean of PQA scores of one thousand randomizations of the VP" --> a justification for using one thousand randomizations is needed. Does this number depend on the size of the dataset?

R. That number was chosen to ensure a solid random background against which to compare the real signal and to yield a statistically significant Z-score, without unnecessarily increasing the computational time of the method.

We have incorporated the following into the text:

Lines 127-129: “These randomizations have the purpose of generating a solid random background to compare it to the real signal. The number of randomizations does not depend on the size of the VP.”
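The role of the randomizations can be sketched as a permutation background. In this hypothetical helper, `statistic` stands for whichever score is being compared (for PQA, presumably the serial correlation of the VP). Each shuffle keeps the class counts but destroys the order, and the Z-score measures how far the observed value lies from that background. `n_random` is a free parameter that does not scale with `len(x)`, in line with the authors' statement; its default of 1000 mirrors the text. `adjacent_agreement` is a toy statistic included only so the example runs on its own.

```python
import random
from statistics import fmean, stdev
from typing import Callable, Sequence


def z_against_random_background(
    x: Sequence[int],
    statistic: Callable[[Sequence[int]], float],
    n_random: int = 1000,
    seed: int = 0,
) -> float:
    """Z-score of statistic(x) against the same statistic on shuffled copies of x."""
    if n_random < 2:
        raise ValueError("need at least two randomizations to estimate a spread")
    rng = random.Random(seed)
    observed = statistic(x)
    background = []
    for _ in range(n_random):  # fixed count, independent of len(x)
        y = list(x)
        rng.shuffle(y)  # keep class counts, destroy the order
        background.append(statistic(y))
    spread = stdev(background)
    if spread == 0:
        raise ValueError("random background has zero spread")
    return (observed - fmean(background)) / spread


def adjacent_agreement(v: Sequence[int]) -> float:
    """Toy statistic for the demo: fraction of neighbouring positions with equal codes."""
    return sum(a == b for a, b in zip(v, v[1:])) / (len(v) - 1)
```

For example, `[1] * 10 + [2] * 10 + [3] * 10` yields a large positive Z-score with this toy statistic, whereas `[1, 2, 3] * 10` yields a negative one.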

Line 160, Proof of concept: "Cancer methylation signatures dataset": "detected 25.1% of noise"

Line 179 “authors concluded that their clustering analysis results made sense from their molecular and biological background, as well as the perspectives about the analyzed profiles, they only assessed grouping just by visual inspection and concluded the grouping was well done”

If the clusters with noise made sense to the researchers, why should they use the proposed measure to detect that 25% of noise?

R. The clusters still make sense because not all of their content may be noise. Nevertheless, we think it is relevant to assess the noise in order to pursue better markers that can classify the elements more efficiently.

We added the following sentence to the text:

Lines 196-198: “However, understanding the noise in the cluster can help to pursue better markers since it could help to narrow the search space in these kind of studies”.

Reviewer 2 Report

The presented work is of clinical relevance.

Too often, publications misinterpret the results of unsupervised clustering, leading to discordant and inappropriate conclusions.

It is therefore mandatory to develop methods to evaluate the clustering algorithm results.

The presented Partition Quantitative Assessment (PQA) provides an original idea based on serial correlation, substituting the time effect with the detection of several variables to judge the quality of the clustering. The serial correlation is tested by evaluating the vector of profiles against 1000 permutations of each vector.

Text revision should be performed, mainly to correct typos.

In Figure 1, the Pearson correlation of lagged partitions does not seem to correspond to the plotted unsupervised hierarchical clustering. This should be modified.

The results of the three reported examples should be discussed further.

Author Response

The presented work is of clinical relevance.

Too often, publications misinterpret the results of unsupervised clustering, leading to discordant and inappropriate conclusions.

It is therefore mandatory to develop methods to evaluate the clustering algorithm results.

The presented Partition Quantitative Assessment (PQA) provides an original idea based on serial correlation, substituting the time effect with the detection of several variables to judge the quality of the clustering. The serial correlation is tested by evaluating the vector of profiles against 1000 permutations of each vector.

Text revision should be performed, mainly to correct typos.

R. Corrected  

In Figure 1, the Pearson correlation of lagged partitions does not seem to correspond to the plotted unsupervised hierarchical clustering. This should be modified.

R. It does not correspond to the plotted unsupervised hierarchical clustering because the purpose of Figure 1 is to visually explain the general pipeline of the method; we chose a small vector only to represent the concept of lagged partitions.

The results of the three reported examples should be discussed further.

R. In the first two examples, we contrast our results with the corresponding interpretations in the papers the data come from. We have added the following to show how visual inspection of different patterns can be equivocal:

Lines 214-216: “Furthermore, in comparison with the methylation profiles discussed above, we can appreciate that a partition which appear even less fuzzy has even a higher noise ratio, supporting the idea of how visual inspection could lead to misleading results.”

We also expanded our discussion of the third example:

Lines 233-235: “In contrast to the previous examples, here we obtained a highly ordered clustering and a very low proportion of noise, which suggests that although the models recapitulate some of the properties of genetic regulatory networks, each of them is not sufficient to capture their structural properties”.
