Article
Peer-Review Record

Common Molecular Mechanisms and Biomarkers in Breast, Colon and Ovarian Cancer

Appl. Sci. 2025, 15(13), 7018; https://doi.org/10.3390/app15137018
by Vicente M. García-Cañizares 1,†, Alejandro González-Vidal 1,2, Antonio M. Burgos-Molina 2,3, Silvia Mercado-Sáenz 2,4, Francisco Sendra-Portero 1,2 and Miguel J. Ruiz-Gómez 1,2,*,†
Reviewer 1:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5:
Submission received: 19 March 2025 / Revised: 6 June 2025 / Accepted: 18 June 2025 / Published: 22 June 2025
(This article belongs to the Special Issue Recent Applications of Artificial Intelligence for Bioinformatics)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear Authors,

  1. The authors are trying to identify common biomarkers across breast, colon, and ovarian cancer. The topic itself is clinically important, no doubt, but this is not really a new idea anymore. Also, when the entire workflow is based on data mining, it raises the question of how much we can trust these findings without further validation.
  2. Most of the work here depends on tools like STRING, GeneCodis4 (which has not even been updated since 2021), GEPIA2, and Metascape. Honestly, I don’t see any solid in-silico analysis going on beyond basic database usage. It feels like the authors are leaning too much on public resources without deeper investigation.
  3. There is no explanation given for why tools like STRING or Metascape were chosen over others. Also, why use the hypergeometric test? Why such an unusually strict p-value like p < 0.0000005? These choices need to be explained better.
  4. The authors claim MSH2 and KIT are promising biomarkers across all three cancers, but based on what is shown, that feels like a stretch. The expression differences are modest, and the survival analysis is not strong enough to support such a broad conclusion.
  5. The paper mentions that MSH2 and KIT are expressed in all cancers but in opposite directions. But what does that really mean biologically? There is no discussion on how this might matter for diagnosis or treatment. That’s a missed opportunity.
  6. Breast, colon, and ovarian cancers are very different biologically and clinically. So, I don’t quite understand why the authors combined all these patients into a single survival analysis. That seems like it would introduce a lot of noise and confounding. This needs more justification.

Author Response

REVIEWER #1

We sincerely appreciate the time and effort dedicated to reviewing our manuscript. Your valuable comments have greatly improved the quality of the work.

The changes made to the manuscript have been highlighted in yellow.

 

  1. The authors are trying to identify common biomarkers across breast, colon, and ovarian cancer. The topic itself is clinically important, no doubt, but this is not really a new idea anymore. Also, when the entire workflow is based on data mining, it raises the question of how much we can trust these findings without further validation.

Response:

We appreciate your comment and acknowledge the importance of validating findings obtained through data mining. However, our study goes beyond data extraction by integrating interaction network analysis and gene expression profiling to identify meaningful patterns. The identification of MSH2 and KIT as common biomarkers is supported by their differential expression and association with better survival rates, reinforcing their potential clinical value.

In addition, the data obtained has been extracted exclusively from curated databases that have undergone prior validation. Data sourced from literature has been excluded, as it lacks validation and is therefore not incorporated into curated databases.

The statement indicating that the data were extracted exclusively from curated databases appears on page 1, line 20; page 4, line 146; and page 5, lines 180-181, and is marked in yellow in the manuscript.

While the search for shared biomarkers across different cancer types has been explored previously, our work provides a comprehensive perspective by simultaneously analyzing breast, colon, and ovarian cancer, identifying a core set of 27 proteins involved in key biological processes. Furthermore, we believe that future experimental studies could validate our findings, further strengthening their clinical application.

As part of our methodology, we have conducted an initial in silico validation using real patient data through the GEPIA platform. This analysis provides additional support for our results, reinforcing the identification of relevant biomarkers. Nevertheless, we agree that a more exhaustive experimental validation would be ideal for consolidating these findings. However, such validation is beyond the scope of this study, which primarily aims to identify and screen potential biomarkers that facilitate future research and optimize resources by reducing the need for multiple experimental procedures in subsequent stages.

 

The following text has been included in discussion to clarify these aspects, page 19, line 567:

“To ensure the reliability of the findings, data was exclusively obtained from curated databases. Additionally, an in-silico validation was performed using real patient data through the GEPIA platform, providing further evidence of the relevance of the identified biomarkers. While the importance of experimental validation is recognized, the results presented offer a solid foundation for future studies that could confirm their clinical applicability. Furthermore, this approach stands out for its simultaneous integration of data from breast, colon, and ovarian cancer, enabling a comprehensive comparison of shared molecular mechanisms and presenting a novel perspective in the search for common biomarkers.”

 

  2. Most of the work here depends on tools like STRING, GeneCodis4 (which has not even been updated since 2021), GEPIA2, and Metascape. Honestly, I don’t see any solid in-silico analysis going on beyond basic database usage. It feels like the authors are leaning too much on public resources without deeper investigation.

Response:

We appreciate your comment and understand the importance of a robust methodology for in silico studies. While our analysis relies on tools such as STRING, GeneCodis4, GEPIA2, and Metascape, these are widely validated and commonly used platforms in biomedical research for data integration, protein interaction analysis, gene expression profiling, and functional enrichment.

We thank you for your observation and understand your concerns regarding the update frequency of the tools used in our analysis. It is important to note that many of these platforms access databases that are updated periodically, ensuring the relevance and accuracy of the information they provide.

In the specific case of STRING, the database is regularly updated, keeping its protein interaction data current. For further reference, its update history and usage details can be reviewed at the following links:

https://string-db.org/cgi/access?sessionId=b2RGIhAUYqhl&footer_active_subpage=archive

https://string-db.org/cgi/access?sessionId=b2RGIhAUYqhl&footer_active_subpage=usage

Additionally, some of the tools used in this study function primarily as analytical frameworks that incorporate algorithms for data processing. These algorithms access sources that are updated periodically, even if the algorithm itself is updated less frequently. This distinction does not diminish the validity of the analysis, as the underlying data remains current and reliable.

GeneCodis is not a database itself but rather an analytical tool that extracts information from various databases. The following link specifies the data sources from which it gathers information. Since GeneCodis functions as an analytical tool, its algorithm updates occur less frequently than those of the databases it relies on, which does not affect the validity of the results obtained.

https://genecodis.genyo.es/

11- Database sources used by GeneCodis

Bioplanet

https://tripod.nih.gov/bioplanet/download/pathway.csv

DoRothEA

https://github.com/saezlab/dorothea/raw/master/data/dorothea_hs.rda

https://github.com/saezlab/dorothea/raw/master/data/dorothea_mm.rda

https://github.com/saezlab/dorothea/tree/master/data (mus_musculus, homo_sapiens)

EBI

ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gaf.gz

EPIC

https://webdata.illumina.com/downloads/productfiles/methylationEPIC/infinium-methylationepic-v-1-0-b5-manifest-file-csv.zip

Ensembl

ftp://ftp.ensembl.org/pub/current_gtf (canis_lupus_familiaris, bos_taurus, caenorhabditis_elegans, danio_rerio, drosophila_melanogaster, gallus_gallus, homo_sapiens, mus_musculus, rattus_norvegicus, sus_scrofa)

EnsemblGenomes

ftp://ftp.ensemblgenomes.org/pub/bacteria/current/species_EnsemblBacteria.txt

ftp://ftp.ensemblgenomes.org/pub/bacteria/current/gtf (escherichia_coli_str_k_12_substr_mg1655_gca_000005845)

ftp://ftp.ensemblgenomes.org/pub/plants/current/gtf (arabidopsis_thaliana, oryza_sativa)

ftp://ftp.ensemblgenomes.org/pub/fungi/current/gtf (aspergillus_nidulans, saccharomyces_cerevisiae, candida_albicans_sc5314_gca_000784635)

Gene Ontology

http://purl.obolibrary.org/obo/go.obo

http://geneontology.org/data/sars-cov-2_targets.gaf

http://current.geneontology.org/annotations/ (all organisms)

Human Phenotype Ontology

https://ci.monarchinitiative.org/view/hpo/job/hpo.annotations/lastSuccessfulBuild/artifact/rare-diseases/util/annotation/genes_to_phenotype.txt

Human microRNA Disease Database

http://www.cuilab.cn/static/hmdd3/data/alldata.xlsx

Infinium MethylationEPIC

http://webdata.illumina.com.s3-website-us-east-1.amazonaws.com/downloads/productfiles/methylationEPIC/infinium-methylationepic-v-1-0-b5-manifest-file-csv.zip

KEGG

https://www.kegg.jp/kegg/rest/keggapi.html

LINCS

https://clue.io/touchstone

Mammalian ncRNA-Disease Repository

https://www.rna-society.org/mndr/download/miRNA-disease%20information.zip

Mouse Genome Informatics

http://www.informatics.jax.org/downloads/reports/HOM_AllOrganism.rpt

http://www.informatics.jax.org/downloads/reports/VOC_MammalianPhenotype.rpt

http://www.informatics.jax.org/downloads/reports/MRK_ENSEMBL.rpt

http://www.informatics.jax.org/downloads/reports/MGI_PhenoGenoMP.rpt

NCBI

https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz

ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz

ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz

ftp://ftp.ncbi.nih.gov/gene/DATA/mim2gene_medgen

ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/NAMES.RRF.gz

ftp://ftp.ncbi.nih.gov/gene/DATA/gene_refseq_uniprotkb_collab.gz

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2ensembl.gz

https://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip

Panther

http://data.pantherdb.org/ftp/pathway/current_release/

PharmGKB

https://s3.pgkb.org/data/drugs.zip

https://api.pharmgkb.org/v1/download/file/data/genes.zip

https://api.pharmgkb.org/v1/download/file/data/relationships.zip

Reactome

https://reactome.org/download/current/NCBI2Reactome_All_Levels.txt

TAM2

http://www.lirmed.com/tam2/Public/static/data/mirset_v9.txt

WikiPathways

http://data.wikipathways.org/current/gmt/ (Arabidopsis_thaliana, Bos_taurus, Caenorhabditis_elegans, Canis_familiaris, Danio_rerio, Drosophila_melanogaster, Gallus_gallus, Homo_sapiens, Mus_musculus, Oryza_sativa, Rattus_norvegicus, Saccharomyces_cerevisiae, Sus_scrofa)

miRBase

https://www.mirbase.org/download/CURRENT/miRNA.dat

miRTarBase

https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2019/cache/download/8.0/miRTarBase_SE_WR.xls

 

The applied methodology, based on a sequential analysis using various bioinformatics tools, enables efficient data filtering. This approach ensures that only biologically relevant information is incorporated, preventing the inclusion of insignificant data and thereby enhancing the robustness of the in silico analysis.

We hope this clarification helps provide better context regarding the reliability of the resources used in our work.

Our study is not limited to the basic use of these tools but employs a systematic approach to identify and prioritize common biomarkers in breast, colon, and ovarian cancer, integrating various analyses to generate interpretable and applicable results for future research. The in silico validation using real patient data through GEPIA provides initial support for our findings, reinforcing the identification of relevant biomarkers.

We acknowledge that further experimental validation would be ideal to strengthen these results and consider that our findings establish a solid foundation for future experimental studies.

 

The following text has been included in discussion to clarify these aspects, page 21, line 672:

“The methodology employed in this study integrates multiple bioinformatics tools to ensure a comprehensive and reliable analysis. By combining STRING, GeneCodis4, GEPIA2, and Metascape, the approach allows for robust data filtering and interpretation, minimizing the inclusion of irrelevant information. While these tools rely on publicly available databases, their curation, consistent updates and validation processes maintain high reliability in biomedical research. Furthermore, the sequential nature of the analysis strengthens the identification of meaningful biomarkers, providing a solid foundation for future experimental validation.”

 

  3. There is no explanation given for why tools like STRING or Metascape were chosen over others. Also, why use the hypergeometric test? Why such an unusually strict p-value like p < 0.0000005? These choices need to be explained better.

Response:

We sincerely appreciate your comments and recognize the importance of providing a clear and well-founded rationale for our methodological choices.

STRING and Metascape were selected for this study due to their reliability and widespread acceptance in molecular biology research. STRING provides access to multiple curated databases of protein interactions derived from both experimental findings and computational predictions, ensuring a robust foundation for network analysis. Metascape, meanwhile, integrates multiple functional enrichment databases within a unified framework, facilitating a comprehensive interpretation of biological processes. Moreover, it uses the MCODE algorithm to identify core genes and proteins based on biologically meaningful interactions. By leveraging the strengths of both platforms, we aimed to obtain a more thorough and informative perspective on shared biomarkers across breast, colon, and ovarian cancer.

 

The following text has been included in the Materials and methods section to clarify these aspects, page 4, line 156:

“STRING was selected for its reliability and widespread use in molecular biology research. It provides access to multiple curated databases containing protein interactions supported by both experimental data and computational predictions, ensuring a robust foundation for network analysis.”

And the following paragraph on page 6, line 202:

“Metascape integrates various functional enrichment resources within a unified analytical framework, facilitating the interpretation of biological processes in an efficient and comprehensive manner. Additionally, Metascape employs the MCODE algorithm to identify core genes and proteins based on biologically meaningful interactions, allowing for the detection of highly interconnected molecular clusters. The combination of these tools enables a systematic approach to biomarker identification.”

Regarding the statistical methodology, the hypergeometric test was employed as it is a well-established approach for enrichment analysis, allowing for the identification of biological categories that contain a greater representation of genes than would be expected by random distribution. This method provides a rigorous and unbiased evaluation of the functional significance of the identified gene sets. The test is built into the applications used, and that choice cannot be changed.
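As an illustration of the enrichment calculation described above, the following minimal Python sketch computes a one-sided hypergeometric p-value from first principles. The function name and the toy numbers are assumptions made for this example only; they are not taken from STRING, GeneCodis4, or Metascape, which perform the calculation internally.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, overlap: int) -> float:
    """One-sided enrichment p-value, P(X >= overlap).

    population: genes in the background set
    annotated:  background genes that belong to the category
    sample:     genes in the query list
    overlap:    query genes that belong to the category
    """
    max_overlap = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of every outcome at least as extreme as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers chosen only to show the call.
p_value = hypergeom_enrichment_p(population=10, annotated=4, sample=3, overlap=2)
print(f"p = {p_value:.4f}")  # 40/120 = 0.3333
```

A small p-value means the category holds more query genes than random sampling from the background would be expected to produce.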

We adopted p < 0.05 as the significance threshold. However, the results of the analysis showed p-values as low as 0.00000005 or even lower. These values reinforce the associations found and minimize the risk of false-positive associations, ensuring high specificity in biomarker identification. Given the large datasets involved, applying a highly restrictive statistical filter strengthens the confidence in our results and aligns with best practices in biomarker discovery. Additionally, comparable thresholds have been utilized in prior studies to ensure the robustness of reported associations.

The following text has been included in Materials and methods section to clarify, page 5, line 185:

“The hypergeometric test was employed as a standard method for enrichment analysis to determine whether specific gene sets are represented more frequently in biological categories than expected by chance. This approach ensures a rigorous evaluation of the functional relevance of identified biomarkers. The statistical threshold was set at p<0.05; however, the results obtained showed highly significant associations, with p-values as low as 0.00000005 or even lower. These values reinforce the reliability of the identified associations, minimizing the risk of false positives and ensuring high specificity in biomarker selection. Given the large datasets analyzed, applying a highly restrictive statistical filter enhances confidence in the findings, aligning with best practices in biomarker discovery.”

And the following paragraph in discussion, page 21, line 693:

“The statistical approach employed in this study, including the hypergeometric test and the application of a highly restrictive p-value threshold, reinforces the robustness of the identified associations. By minimizing the risk of false positives and ensuring high specificity in biomarker selection, these methodological choices strengthen confidence in the reported findings. Such stringent criteria align with best practices in biomarker discovery and have been applied in previous studies to enhance result reliability and interpretability.”

We appreciate the opportunity to clarify these methodological aspects and hope this explanation provides further insight into our approach. Please do not hesitate to share any additional concerns or suggestions, as we highly value constructive feedback that contributes to refining our study.

 

  4. The authors claim MSH2 and KIT are promising biomarkers across all three cancers, but based on what is shown, that feels like a stretch. The expression differences are modest, and the survival analysis is not strong enough to support such a broad conclusion.

Response:

We appreciate your comment and understand the importance of ensuring strong evidence when identifying potential biomarkers. While the expression differences observed for MSH2 and KIT are indeed modest, both genes are consistently implicated in critical biological processes such as DNA repair and cell proliferation. Their involvement in multiple cancer-related pathways suggests a functional relevance that warrants further investigation.

MSH2 and KIT were the only genes with statistically significant differential expression across all three analyzed cancer types, unlike other identified proteins that showed variability among different tumors. Their consistent regulation suggests their potential as transversal biomarkers and justifies their study in future experimental validations.
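The selection rule described here (keep only genes that are significantly altered in every cancer type) can be sketched as follows. This is a minimal illustration, not the GEPIA2 procedure: the `DiffExpr` record, the `alpha` cut-off, the placeholder gene names, and all numeric values are assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffExpr:
    log2_fold_change: float  # tumor versus normal
    p_value: float


def consistently_altered(results, cancers, alpha=0.05):
    """Map each gene significant in every cancer type to its direction per cancer."""
    shared = {}
    for gene, per_cancer in results.items():
        if any(c not in per_cancer for c in cancers):
            continue  # gene was not measured in every cancer type
        if all(per_cancer[c].p_value < alpha for c in cancers):
            shared[gene] = {
                c: "up" if per_cancer[c].log2_fold_change > 0 else "down"
                for c in cancers
            }
    return shared


CANCERS = ("breast", "colon", "ovarian")
# Placeholder genes and invented values, for illustration only.
toy_results = {
    "GENE_A": {"breast": DiffExpr(1.4, 0.001), "colon": DiffExpr(1.1, 0.01),
               "ovarian": DiffExpr(1.8, 0.002)},
    "GENE_B": {"breast": DiffExpr(-1.2, 0.004), "colon": DiffExpr(-0.3, 0.40),
               "ovarian": DiffExpr(-1.5, 0.003)},
    "GENE_C": {"breast": DiffExpr(-1.6, 0.002), "colon": DiffExpr(-1.3, 0.008),
               "ovarian": DiffExpr(-2.0, 0.001)},
}
shared = consistently_altered(toy_results, CANCERS)
print(sorted(shared))  # ['GENE_A', 'GENE_C']
```

Recording the direction per cancer type alongside significance makes it easy to see whether two shared genes move in the same or in opposite directions.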

 

To clarify these aspects in the manuscript, the following paragraphs have been included:

In materials and methods section, page 6, line 213:

“Differential gene expression analysis was performed to identify genes consistently altered across breast, colon, and ovarian cancer.”

In results section, page 16, line 457:

“The analysis revealed that MSH2 and KIT exhibited statistically significant differential expression across breast, colon, and ovarian cancer. In contrast, other identified proteins displayed variable expression patterns, with some showing differential expression in certain cancers but not in others. The consistent regulation of MSH2 and KIT suggests their potential role as transversal biomarkers.”

In discussion section, page 19 line 600:

“Although the differential expression levels of MSH2 and KIT may be considered modest, their consistent presence across all three cancer types and involvement in key biological processes—such as DNA repair and cell proliferation—support their functional relevance. This consistency, unlike the variability observed in other proteins, strengthens the rationale for further validation studies to explore their potential as shared biomarkers in cancer research.”

The text: “Specifically, the genes MSH2 and KIT stood out as potential common biomarkers across the three cancer types, showing significant expression alterations in all the cancer types studied.”, has been removed.

 

Regarding the survival analysis, although the association is not highly pronounced, the observed trends indicate a potential link between their expression levels and patient outcomes. We acknowledge that additional studies with larger datasets and experimental validation would be beneficial to further substantiate these findings. Nonetheless, our study provides an initial framework for considering MSH2 and KIT as promising candidates for future biomarker research.

 

To clarify these aspects in the manuscript, the following paragraphs have been included:

In results section, page 17, line 483:

“Survival analysis revealed a potential link between MSH2 and KIT expression levels and patient outcomes. Although the observed association was not strongly pronounced, the identified trends suggest that these genes may have a role in prognosis. Further studies with larger datasets will be necessary to confirm these findings and explore their clinical relevance in greater detail.”

In discussion section, page 21, line 702:

“Although the association between MSH2 and KIT expression and patient survival was modest, the observed trends suggest a possible prognostic value. These results highlight the need for further research with larger datasets and experimental validation to confirm their potential as reliable biomarkers. Nonetheless, this study provides an initial framework for considering MSH2 and KIT as potential biomarkers across breast, colon, and ovarian cancer. Therefore”

 

 

 

  5. The paper mentions that MSH2 and KIT are expressed in all cancers but in opposite directions. But what does that really mean biologically? There is no discussion on how this might matter for diagnosis or treatment. That’s a missed opportunity.

Response:

Thank you very much for your comment. We agree with the importance of discussing the biological and clinical implications of MSH2 and KIT expression patterns. MSH2, a key player in DNA mismatch repair, and KIT, a receptor tyrosine kinase involved in cell proliferation, exhibit opposing expression trends across the studied cancers. This divergence may reflect distinct regulatory mechanisms influencing tumor progression and response to treatment.

Understanding how these expression patterns affect tumor behavior is crucial for exploring potential applications in diagnostics and targeted therapies. Given that alterations in MSH2 have been linked to genomic instability and treatment resistance, and KIT dysregulation is associated with aberrant proliferative signaling, their combined assessment could provide valuable insights into cancer stratification.

 

To clarify these aspects in the manuscript, the following paragraphs have been included:

In discussion section, page 18, line 561:

“The opposing expression patterns of MSH2 and KIT across breast, colon, and ovarian cancer may reflect distinct tumor regulatory mechanisms. MSH2, involved in DNA mismatch repair, plays a critical role in maintaining genomic stability, whereas KIT, a receptor tyrosine kinase, influences proliferative signaling. These differences suggest potential implications in treatment responses, particularly regarding therapies targeting genomic integrity and proliferative pathways.”

 

In conclusion section, page 22, line 766:

“The opposing expression patterns of MSH2 and KIT across breast, colon, and ovarian cancer highlight potential implications for diagnosis and therapeutic strategies. Their distinct regulatory roles in genomic stability and proliferative signaling suggest that assessing their combined expression could provide valuable insights for patient stratification and treatment selection. Future studies are needed to explore their clinical applicability and validate their relevance as biomarkers in personalized oncology.”

 

 

 

  6. Breast, colon, and ovarian cancers are very different biologically and clinically. So, I don’t quite understand why the authors combined all these patients into a single survival analysis. That seems like it would introduce a lot of noise and confounding. This needs more justification.

Response:

We appreciate your comment and recognize the importance of ensuring a well-justified methodological approach. The combined survival analysis aimed to identify shared biomarkers across breast, colon, and ovarian cancer, focusing on molecular mechanisms common to multiple tumor types rather than their individual biological differences. By integrating these datasets, we sought to highlight genes with potential cross-cancer relevance.

While these cancers exhibit distinct biological characteristics, our approach was designed to minimize confounding effects by applying statistical controls to reduce noise in the analysis. This strategy allows for the identification of trends that may be relevant across different cancer types, providing a broader perspective on biomarker applicability. Nonetheless, we acknowledge that future studies analyzing survival data separately could offer additional insights into cancer-specific mechanisms.
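The response does not say which statistical controls were applied. One simple way to check whether a pooled trend also holds within each tumor type is to estimate survival separately for every combination of cancer type and expression group. The sketch below does this with a hand-written Kaplan-Meier estimator; the record layout, function names, and toy follow-up times are all assumptions for illustration and do not reproduce the GEPIA2 analysis.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with at least one event.

    events[i] is 1 if the patient died at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def curves_by_stratum(records):
    """records: iterable of (cancer, expression_group, time, event)."""
    strata = defaultdict(lambda: ([], []))
    for cancer, group, time, event in records:
        times, events = strata[(cancer, group)]
        times.append(time)
        events.append(event)
    return {key: kaplan_meier(t, e) for key, (t, e) in strata.items()}


# Invented follow-up times (months); not patient data.
toy_records = [
    ("breast", "high", 1, 1), ("breast", "high", 2, 0),
    ("breast", "high", 3, 1), ("breast", "high", 4, 1),
    ("colon", "high", 5, 1), ("colon", "high", 6, 0),
]
curves = curves_by_stratum(toy_records)
print(curves[("breast", "high")])  # [(1, 0.75), (3, 0.375), (4, 0.0)]
```

If the high and low expression curves separate in the same direction within each stratum, the pooled trend is less likely to be an artifact of mixing tumor types with different baseline survival.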

 

To clarify these aspects in the manuscript, the following paragraphs have been included:

In materials and methods section, page 6, line 215:

“To identify potential biomarkers relevant across multiple cancer types, a combined survival analysis was performed, integrating patient data from breast, colon, and ovarian cancer. Despite their biological differences, this approach aimed to highlight molecular mechanisms shared across tumor types. Statistical controls and normalization techniques were applied to minimize confounding effects, ensuring reliable identification of genes with cross-cancer relevance.”

 

In discussion section, page 19, line 608:

“By analyzing survival data collectively, this study identifies trends that may be relevant across multiple cancer types, highlighting potential biomarkers with cross-cancer applicability. Despite biological differences, this approach allows for the detection of shared molecular mechanisms that could inform future research into diagnosis and treatment strategies.”

 

Reviewer 2 Report

Comments and Suggestions for Authors

Review on Common Molecular Mechanisms and Biomarkers in Breast, Colon, and Ovarian Cancer.

 

Dear Authors,

The paper offers an organised analysis of common molecular pathways and biomarkers in breast, colon, and ovarian cancer that use bioinformatics methodologies. The study is useful in discovering possible prevalent targets that may aid in detection or treatment techniques. However, there are numerous areas where detailing, rigor, and impact might be improved.

  • The methodology employs an organized, multi-stage method, and the application of the curated STRING database ensures the veracity of protein data. However, the study did not give information for the criteria used for including/excluding proteins from STRING.
  • To highlight the study’s uniqueness and superiority, the authors must be explicit. While the study cited recent studies on similar processes and biomarkers, they failed to demonstrate clearly how their method is superior to other previous studies. I recommend that the authors compare the previously identified biomarkers (e.g., MSH2 and KIT) to those from previous multi-cancer biomarker experiments. 

Author Response

REVIEWER #2

 

The paper offers an organised analysis of common molecular pathways and biomarkers in breast, colon, and ovarian cancer that use bioinformatics methodologies. The study is useful in discovering possible prevalent targets that may aid in detection or treatment techniques. However, there are numerous areas where detailing, rigor, and impact might be improved.

Response:

We deeply appreciate your words and the time and effort dedicated to reviewing our manuscript. Your valuable comments have greatly improved the quality of the work.

The changes made to the manuscript have been highlighted in green.

 

The methodology employs an organized, multi-stage method, and the application of the curated STRING database ensures the veracity of protein data. However, the study did not give information for the criteria used for including/excluding proteins from STRING.

Response:

We appreciate your comment and the opportunity to clarify the protein selection criteria in our study. As described in the Materials and Methods section, we used the STRING database (version 12.0) due to its reliability and widespread application in molecular research.

Regarding the inclusion/exclusion criteria for proteins, we followed a procedure based on the following considerations:

Data source: Only proteins classified as relevant to breast, colon, and ovarian cancer in curated databases within STRING were included, ensuring reliability and accuracy in the information used.

Duplicate removal: To ensure analytical accuracy, redundant proteins were removed before comparing cancer types.

Exclusion criteria: To maintain the reliability and precision of the study, we excluded proteins that had only been reported in scientific publications without being supported by curated databases. This prevented the incorporation of proteins with limited or unverifiable evidence in the cancer context.

To ensure the relevance of the obtained data, only proteins associated with the search terms listed in Table 1 were selected. These terms were defined to reflect key mechanisms involved in the development and progression of breast, colon, and ovarian cancer. As a result, proteins that were not directly related to these terms were excluded, preventing the inclusion of molecules without a clear connection to the biological processes under investigation. This approach allowed us to generate a list of proteins with strong evidence of involvement in cancer biology, ensuring the accuracy of the analysis.
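The inclusion and exclusion steps above amount to a filter followed by a set intersection. The following minimal sketch shows that logic under stated assumptions: the record fields (`protein`, `cancer`, `term`, `evidence`), the search terms, and the protein names are hypothetical placeholders and do not correspond to the STRING interface or to the terms in Table 1.

```python
def shared_curated_proteins(records, search_terms, cancers):
    """Return proteins supported by curated evidence in every cancer type."""
    per_cancer = {cancer: set() for cancer in cancers}
    for rec in records:
        if rec["evidence"] != "curated":
            continue  # literature-only reports are excluded
        if rec["term"] not in search_terms:
            continue  # keep only hits for the predefined search terms
        if rec["cancer"] in per_cancer:
            # A set stores each protein once, which removes duplicates.
            per_cancer[rec["cancer"]].add(rec["protein"])
    return set.intersection(*per_cancer.values())


CANCERS = ("breast", "colon", "ovarian")
TERMS = {"term_x", "term_y"}  # hypothetical search terms
# Placeholder records, for illustration only.
toy_records = [
    {"protein": "PROT_1", "cancer": "breast", "term": "term_x", "evidence": "curated"},
    {"protein": "PROT_1", "cancer": "breast", "term": "term_y", "evidence": "curated"},
    {"protein": "PROT_1", "cancer": "colon", "term": "term_x", "evidence": "curated"},
    {"protein": "PROT_1", "cancer": "ovarian", "term": "term_y", "evidence": "curated"},
    {"protein": "PROT_2", "cancer": "breast", "term": "term_x", "evidence": "curated"},
    {"protein": "PROT_2", "cancer": "colon", "term": "term_x", "evidence": "literature"},
    {"protein": "PROT_2", "cancer": "ovarian", "term": "term_x", "evidence": "curated"},
]
common = shared_curated_proteins(toy_records, TERMS, CANCERS)
print(common)  # {'PROT_1'}
```

Here `PROT_2` is dropped because its only colon cancer record comes from literature rather than a curated source.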

The following text has been included in Materials and methods section, page 4, line 146, to clarify:

“To ensure the accuracy and relevance of the analyzed data, rigorous criteria for the inclusion and exclusion of proteins were applied. Only proteins described in curated databases within STRING were considered, ensuring that they are supported by experimental evidence or systematic reviews. Proteins reported exclusively in scientific literature without validation in curated databases were excluded, avoiding potential biases caused by unverified data. Additionally, only proteins corresponding to the search terms specified in Table 1 were selected, ensuring their direct relevance to the biological processes of breast, colon, and ovarian cancer. This approach enabled the identification of a highly reliable and functionally relevant set of proteins for studying shared mechanisms among these cancer types.”

 

To highlight the study’s uniqueness and superiority, the authors must be explicit. While the study cited recent studies on similar processes and biomarkers, they failed to demonstrate clearly how their method is superior to other previous studies. I recommend that the authors compare the previously identified biomarkers (e.g., MSH2 and KIT) to those from previous multi-cancer biomarker experiments. 

Response:

We appreciate your comment and the suggestion to emphasize the uniqueness of our study compared to previous research. In this regard, our approach presents several distinguishing features:

Multi-cancer data integration: Unlike previous studies that analyze biomarkers in individual cancer types, our study identifies common proteins in three types of cancer (breast, colon, and ovarian), providing a broader perspective on shared mechanisms.

Use of curated data and validation across multiple databases: Protein selection was based exclusively on curated databases, ensuring the reliability of the proposed biomarkers and avoiding biases that may arise from unverified data.

Study objective: Our goal is not to report biomarkers superior to those described in previous studies focused on a single cancer type, but rather to identify robust markers through a reliable methodology that are relevant across multiple cancers. This approach highlights the biological relationship between these tumors and lays the groundwork for new diagnostic and therapeutic strategies, considering tumor complexity and shared mechanisms across different cancers.

Differential identification of MSH2 and KIT: While previous studies have reported these biomarkers in isolated contexts, our analysis reveals their simultaneous differential expression in all three cancer types, emphasizing their potential role in patient stratification and targeted therapies. Moreover, identifying common markers across multiple tumors could be valuable in assessing the risk for patients with one tumor to develop another that shares characteristics and biomarkers. This opens new possibilities for clinical surveillance strategies and personalized prevention.

The following paragraph has been added to the Discussion section (page 20, line 645) to clarify this point:

“Our study does not aim to identify biomarkers superior to those described in previous research focused on a single tumor type but rather to provide a rigorous analysis that reveals robust markers present in multiple cancer types. The identification of shared proteins among breast, colon, and ovarian cancer not only highlights common molecular mechanisms but also presents new perspectives on the biological interconnection between these tumors. Specifically, the differential expression of MSH2 and KIT across the three cancer types suggests their potential role in patient stratification and the design of therapeutic strategies. Furthermore, the presence of common biomarkers in multiple tumors could be key in assessing a patient's risk of developing another cancer with similar biological characteristics.”

We remain open to any further observations or suggestions that may help improve the manuscript. Please do not hesitate to inform us of any additional questions you may have.

 

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This study utilized multiple bioinformatics approaches to explore the shared molecular mechanisms among three cancer types and identify potential novel biomarkers. However, several critical issues regarding data selection and methodology need to be addressed:

  1. What was the scientific basis for selecting these three specific cancer types? Given that some may be prevalent among women, was gender considered as a potential confounding factor in the analyses? Is there clinical or molecular evidence supporting the classification of these as synchronous tumors?
  2. While the study used protein data directly available from STRING, numerous other comprehensive cancer protein databases (e.g., CPTAC, TCGA, or HPA) were not utilized. What was the rationale for restricting data sources, and how might this limitation affect the robustness of the findings?
  3. The omics validation yielded concerning results, with only 2 out of 27 putative biomarkers showing significant differential expression. This low validation rate raises questions about the appropriateness of the initial protein selection criteria and the overall reliability of the identified biomarkers.

Author Response

REVIEWER #3

 

This study utilized multiple bioinformatics approaches to explore the shared molecular mechanisms among three cancer types and identify potential novel biomarkers. However, several critical issues regarding data selection and methodology need to be addressed:

We thank you for your thoughtful comments and the opportunity to refine our methodology. Your insights are valuable, and we are pleased to address the key concerns raised.

The changes made to the manuscript have been highlighted in blue.

 

  1. What was the scientific basis for selecting these three specific cancer types? Given that some may be prevalent among women, was gender considered as a potential confounding factor in the analyses? Is there clinical or molecular evidence supporting the classification of these as synchronous tumors?

Response:

We appreciate your question regarding the selection of the cancer types included in the study. The choice of breast, colon, and ovarian cancer was based on scientific criteria related to their clinical impact and the potential to identify shared molecular mechanisms. These tumors have high global incidence and have been extensively studied.

Regarding gender as a potential confounding factor, while breast and ovarian cancer predominantly affect women, colon cancer affects both sexes, allowing the exploration of molecular relationships that are not exclusively gender-dependent. Our approach focuses on the analysis of common proteins without making comparisons between sexes; thus, the study does not aim to assess gender-specific differences in biomarker expression.

Concerning the classification of these tumors as synchronous, although they are not traditionally considered synchronous from a clinical perspective, the identification of shared markers across different tumor types supports the hypothesis that certain biological mechanisms may be common. This could have implications for assessing the risk of developing multiple neoplasms with similar characteristics.

The following text has been added to the Discussion section (page 22, line 744) to clarify this point:

“Although breast and ovarian cancers predominantly affect women, the inclusion of colon cancer in this analysis allows for the exploration of shared molecular mechanisms without gender being a determining factor. Our approach does not aim to compare sex differences but rather to identify common proteins that could have implications across multiple cancer types. Additionally, while these tumors are not traditionally considered synchronous, the identification of shared biomarkers suggests the presence of common biological mechanisms that may influence the development of tumors with similar characteristics, providing new perspectives in cancer research.”

We remain open to any further observations that may help improve the manuscript.

 

  2. While the study used protein data directly available from STRING, numerous other comprehensive cancer protein databases (e.g., CPTAC, TCGA, or HPA) were not utilized. What was the rationale for restricting data sources, and how might this limitation affect the robustness of the findings?

Response:

We appreciate your comment regarding the selection of databases used in our study. The decision to use STRING was based on its reliability, its extensive use in protein interaction studies, and its integration of curated data from multiple sources. This approach allowed us to work with validated information and avoid biases stemming from unverified data.

While databases such as CPTAC, TCGA, or HPA provide valuable information on protein expression and clinical data, the objective of our study was not to evaluate expression differences at the tissue level or within clinical cohorts but rather to identify proteins with strong evidence of interactions across multiple cancer types considering biological information. The methodology used ensures that the identified biomarkers are supported by multiple sources within STRING, enhancing the reliability of our findings.

We acknowledge that incorporating additional data sources could complement our analysis, and we consider that future research could integrate multiple databases to validate and expand our observations. We remain open to exploring this possibility in subsequent studies.

We welcome any further suggestions or comments that may contribute to improving the manuscript.

The following passages have been added to clarify the use of STRING, also in response to the comment of Reviewer #1. Page 4, line 156:

“STRING was selected for its reliability and widespread use in molecular biology research. It provides access to multiple curated databases containing protein interactions supported by both experimental data and computational predictions, ensuring a robust foundation for network analysis.”

And page 4, line 167:

“While other databases such as CPTAC, TCGA, or HPA provide valuable information on protein expression and clinical data, the objective of our study was not to evaluate expression differences at the tissue level or within clinical cohorts but rather to identify proteins with strong evidence of interactions across multiple cancer types considering biological information.”

 

 

 

  3. The omics validation yielded concerning results, with only 2 out of 27 putative biomarkers showing significant differential expression. This low validation rate raises questions about the appropriateness of the initial protein selection criteria and the overall reliability of the identified biomarkers.

Response:

Thank you very much for your observation regarding the number of validated biomarkers. We would like to point out that the significant differential expression of only two of the 27 identified biomarkers does not indicate a weakness in the analysis; rather, it is the result of a rigorous and highly selective filtering process. Our methodological approach does not aim to maximize the number of biomarkers, but to identify those with the highest robustness and biological relevance in the context of breast, colon, and ovarian cancer. An excessively high number of biomarkers would have indicated a deficient filtering process, compromising the specificity of the analysis. The applied methodology, which included advanced bioinformatics tools and differential expression analysis, has allowed us to highlight those biomarkers with the highest likelihood of being relevant across multiple tumor types. In this regard, the validation of MSH2 and KIT reinforces the existence of shared molecular mechanisms among these cancers, underscoring their potential clinical utility in diagnosis and prognosis. Furthermore, the selection process employed ensures that the identified biomarkers are not the product of random variability but reflect established biological patterns, which supports the strength of the study.

To support this point and highlight the robustness provided by the bioinformatics filtering process, which had not been fully addressed in the discussion, the following paragraph has been added (page 21, line 684):

“The filtering process applied in this study has enabled the identification of biomarkers with the highest robustness and biological relevance in breast, colon, and ovarian cancer. The selection of 27 proteins and the validation of MSH2 and KIT as differential markers reinforce the existence of shared molecular mechanisms among these tumor types. An excessive number of biomarkers would have indicated a deficient filtering process, compromising the specificity of the analysis. On the contrary, the strict application of methodological and bioinformatics criteria ensures that the selected markers are representative of key processes, providing a solid foundation for future research and clinical validations.”

 

 

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The manuscript “Common molecular mechanisms and biomarkers in breast, colon and ovarian cancer” was submitted by Miguel J. Ruiz-Gómez and colleagues. This manuscript presents an extensive in-silico analysis aimed at identifying shared molecular mechanisms and potential biomarkers among breast, colon, and ovarian cancers. The authors employ a robust suite of bioinformatics tools and curated databases to identify 27 common proteins, analyze their interactions, and validate key candidates such as MSH2 and KIT using patient gene expression and survival data. The topic is of high relevance, and the integrative approach is well-conceived. However, some areas require clarification and revision to enhance the scientific rigor and clarity of presentation.

Revisions:

  1. However, the conclusions rely entirely on in-silico analyses. The authors should acknowledge this limitation more directly and, ideally, propose follow-up validation in independent datasets or experimental models to support the applicability of their findings.
  2. The statistical methodology needs clearer explanation. The manuscript should specify how p-values were corrected for multiple testing (e.g., FDR, Bonferroni). In the survival analyses, the authors mention Cox proportional hazards but do not report hazard ratios, confidence intervals, or model details.
  3. Several of the identified proteins, such as TP53, BRCA2, and MSH2, are already well-established cancer-related genes. The novelty of the study should be better emphasized, particularly in terms of what new insights it contributes beyond current knowledge.
  4. The discussion on viral pathways and radiation resistance is interesting but should be more cautiously interpreted unless supported by stronger mechanistic data. These associations are speculative in their current form.
  5. Figures and tables need clearer formatting. Table 2 is dense and could be streamlined. Figures 4–7 should include more detailed legends with statistical parameters. Additionally, references to supplementary tables and figures should be accompanied by short descriptions in the main text to improve flow and clarity.
  6. References are missing for lines/statements in the introduction, please insert references where you are discussing previously known facts, statistics or studies.
  7. Language and grammar are generally clear, but some editing would improve precision and readability (e.g., consistent use of “in-silico” vs. “computational,” clearer explanations in the abstract and discussion).
  8. Overall, the study is scientifically sound and addresses an important question in cancer biomarker research. With revisions to improve statistical transparency, completeness, and interpretation, the manuscript has strong potential.

All the best!

Comments for author File: Comments.pdf

Comments on the Quality of English Language

The overall quality of English in the manuscript is adequate but not polished. The writing conveys the scientific content clearly in most places, but there are frequent grammatical issues, awkward phrasing, and inconsistent use of terminology that detract from the manuscript’s professional presentation. The authors should consult a professional writing expert.

Author Response

REVIEWER #4

 

This manuscript presents an extensive in-silico analysis aimed at identifying shared molecular mechanisms and potential biomarkers among breast, colon, and ovarian cancers. The authors employ a robust suite of bioinformatics tools and curated databases to identify 27 common proteins, analyze their interactions, and validate key candidates such as MSH2 and KIT using patient gene expression and survival data. The topic is of high relevance, and the integrative approach is well-conceived. However, some areas require clarification and revision to enhance the scientific rigor and clarity of presentation.

Response:

Thank you very much for your thoughtful feedback and recognition of our methodological approach. Your positive remarks encourage us to refine our study further and enhance its scientific rigor.

The changes made to the manuscript have been highlighted in light green.

 

Revisions:

  1. However, the conclusions rely entirely on in-silico analyses. The authors should acknowledge this limitation more directly and, ideally, propose follow-up validation in independent datasets or experimental models to support the applicability of their findings.

Response:

We appreciate your comment regarding the need for additional experimental validation. While our study is based on a robust bioinformatics approach using curated data, we acknowledge that validation in experimental models and independent cohorts is a crucial step in confirming the clinical applicability of our findings. In this regard, we propose that future studies include analyses of biological samples and functional assays to evaluate the relevance of the identified biomarkers. Additionally, integrating data from other independent databases would further strengthen the reliability of our results and ensure their reproducibility across different clinical contexts.

Other reviewers have indicated the same limitation. In the discussion section, these limitations and needs have been discussed in response to the other reviewers' comments. In addition, following your suggestion, the following paragraph has been included in the conclusions, page 23, line 773:

“While this study is based on a rigorous in-silico approach, we acknowledge that experimental validation is essential to confirm the clinical relevance of the identified biomarkers. Future research should include validation in independent patient cohorts and experimental models to strengthen the applicability of these findings. Expanding the dataset and performing longitudinal studies could provide deeper insights into the role of these biomarkers in disease progression and treatment response. The methodological framework presented here lays the foundation for further studies that will validate and refine these results in a clinical setting.”

 

  2. The statistical methodology needs clearer explanation. The manuscript should specify how p-values were corrected for multiple testing (e.g., FDR, Bonferroni). In the survival analyses, the authors mention Cox proportional hazards but do not report hazard ratios, confidence intervals, or model details.

Response:

To ensure statistical rigor in our analysis, STRING automatically applies the False Discovery Rate (FDR) correction using the Benjamini–Hochberg procedure. This method is designed to control the expected proportion of false positives among rejected null hypotheses, offering a balance between specificity and statistical power. Unlike the Bonferroni correction, which is highly conservative and may reduce the ability to detect true biological associations, FDR-controlling procedures allow for greater sensitivity while maintaining a controlled error rate. By implementing this correction, we minimize the likelihood of false discoveries while ensuring that significant enrichment findings reflect meaningful molecular associations.

The following paragraph has been added to the Materials and Methods section (page 4, line 160) to clarify this point:

“For statistical analysis, STRING automatically applies the False Discovery Rate (FDR) correction using the Benjamini–Hochberg procedure. This method controls the expected proportion of false positives among rejected null hypotheses, ensuring a balance between sensitivity and specificity in biomarker selection. Compared to more conservative approaches like Bonferroni correction, FDR provides greater statistical power while maintaining control over Type I errors. This ensures that the reported associations reflect meaningful biological relevance in the context of cancer research.”
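For readers unfamiliar with the correction, the contrast between the Benjamini–Hochberg procedure and the Bonferroni correction can be illustrated with a short sketch. This is a generic textbook implementation, not STRING's internal code; the function name `benjamini_hochberg`, the example p-values, and the 0.05 threshold are assumptions chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone: adj(rank) = min over ranks >= rank of p * m / rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five enrichment terms.
p_values = [0.001, 0.012, 0.02, 0.03, 0.6]
alpha = 0.05

q_values = benjamini_hochberg(p_values)
bh_hits = [p for p, q in zip(p_values, q_values) if q <= alpha]

# Bonferroni multiplies every p-value by the number of tests.
bonferroni_hits = [p for p in p_values if p * len(p_values) <= alpha]

print(len(bh_hits), len(bonferroni_hits))
```

On these toy values, the FDR procedure retains more terms than Bonferroni at the same threshold, which is the gain in sensitivity the paragraph above refers to.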

 

Thank you very much for your comment on the survival analysis and the lack of details regarding the Cox proportional hazards model. For these analyses, we used the GEPIA2 tool, which employs the Log-rank test (also known as the Mantel-Cox test) for survival curve comparisons. GEPIA2 automatically calculates the hazard ratio (HR) and confidence intervals (CI), which we have reported in the corresponding figure.

To improve the transparency of the results, we have added a clarification in the manuscript specifying that GEPIA2 performs these calculations automatically and that the reported HR and CI values originate from this analysis.

We hope this information helps clarify our methodology, and we appreciate your observation, which has been useful in improving the presentation of our results.

 

The following paragraph has been added to the Results section (page 16, line 475) to clarify this point:

“The survival analyses in this study were conducted using the GEPIA2 platform, which employs the Log-rank test (Mantel-Cox test) for hypothesis testing and automatically calculates hazard ratios (HR) along with 95% confidence intervals (CI). The cutoff for high and low gene expression levels was set at 50%. The survival curves generated by GEPIA2 include dotted lines representing the 95% CI, ensuring the robustness of the statistical evaluation. Given that GEPIA2 performs these calculations automatically based on curated datasets, we directly report the HR and CI values provided by the platform without additional model adjustments, as shown in the corresponding figure.”
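The kind of comparison described above, a 50% (median) expression cutoff followed by a two-group Log-rank test, can be sketched in a few lines. This is a minimal illustration under stated assumptions and does not reproduce GEPIA2's implementation: the helper names `split_by_median` and `logrank_test`, the handling of ties at the median, and the toy data are all hypothetical, and the hazard ratio and confidence intervals are not computed here.

```python
import math
from statistics import median

def split_by_median(expression, times, events):
    """Split patients into high/low groups at the median expression value.
    Assumption: values equal to the median are assigned to the low group."""
    cutoff = median(expression)
    rows = list(zip(expression, times, events))
    high = [(t, e) for x, t, e in rows if x > cutoff]
    low = [(t, e) for x, t, e in rows if x <= cutoff]
    return high, low

def logrank_test(group_a, group_b):
    """Two-group Log-rank (Mantel-Cox) test on (time, event) pairs.
    Returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk in group A
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk overall
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical toy cohort: expression level, follow-up time, event flag (1 = death).
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]

high, low = split_by_median(expression, times, events)
chi2, p_value = logrank_test(high, low)
print(round(chi2, 2), p_value < 0.05)
```

In the toy cohort the high-expression group survives longer by construction, so the test rejects the null hypothesis of equal survival curves; real cohorts include censored observations, which the function handles through the event flag.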

 

  3. Several of the identified proteins, such as TP53, BRCA2, and MSH2, are already well-established cancer-related genes. The novelty of the study should be better emphasized, particularly in terms of what new insights it contributes beyond current knowledge.

Response:

Thank you for your comment regarding the novelty of our study. While it is true that genes such as TP53, BRCA2, and MSH2 are widely recognized for their roles in cancer, our approach provides a novel perspective by systematically analyzing their presence and functionality in three distinct cancer types simultaneously. The integration of protein interaction networks, functional enrichment analysis, and survival studies across multiple cancer types has allowed us to identify shared patterns that had not been previously characterized in such detail.

Additionally, our study highlights the differential expression of MSH2 and KIT among breast, colon, and ovarian cancers, a finding that suggests these genes may have broader diagnostic and therapeutic implications than previously known. We have strengthened this section in the revised manuscript to emphasize how these results can contribute to the development of common biomarkers and cross-cancer therapeutic strategies.

We hope these adjustments help to better highlight the contribution of our study beyond existing knowledge. We sincerely appreciate your valuable feedback.

The following paragraph has been added to the Discussion section (page 18, line 530) to clarify this point:

“While TP53, BRCA2, and MSH2 are well-known cancer-related genes, this study offers a novel perspective by systematically analyzing their common molecular mechanisms across breast, colon, and ovarian cancers. Unlike previous studies that focus on individual tumor types, our integrative approach identifies cross-cancer patterns and highlights potential common biomarkers that could aid in diagnostic and therapeutic strategies. By employing in silico methodologies and curated datasets, our findings contribute to a broader understanding of the molecular similarities among these cancers.”

 

  4. The discussion on viral pathways and radiation resistance is interesting but should be more cautiously interpreted unless supported by stronger mechanistic data. These associations are speculative in their current form.

Response:

Thank you very much for your comment regarding the interpretation of the associations between viral pathways and radiation resistance. We agree that these connections still require more robust validation with experimental data.

To address this concern, we have revised the discussion and adjusted the wording to emphasize that these findings are preliminary and should be explored further in future studies.

The paragraph referred to has been modified to read as follows, page 19, line 613:

“The identification of shared proteins also enabled the evaluation of their involvement in various signaling pathways, including those related to cancer and responses to viral infections. These pathways include PI3K-Akt, MAPK, p53, and pathways related to drug resistance, such as platinum drug resistance. In this regard, previous reports from our laboratory described the relationship between multiple dysregulated proteins and resistance to cisplatin [39], as well as the expression of multiple resistance mechanisms to radiation in tumor cells [40]. Although some studies have suggested potential interactions between these phenomena, these associations remain speculative and require further mechanistic validation. Friedenson [41] describes that the transformation caused by the Epstein-Barr virus in mammary cells induces genetic disruption of essential functions for protection against carcinogenesis, suggesting a possible link to tumor progression, though additional experimental studies are necessary to confirm these mechanisms. While the identified viral pathways and radiation resistance mechanisms provide interesting insights, it is important to interpret these findings with caution. The current evidence is based on bioinformatics analyses and previously reported associations, but experimental validation is required to confirm direct mechanistic links. Future studies should include functional assays to determine the extent to which viral-related signaling pathways influence resistance mechanisms in breast, colon, and ovarian cancer. Until such validation is available, these observations should be considered as preliminary hypotheses for further exploration.”

 

  5. Figures and tables need clearer formatting. Table 2 is dense and could be streamlined. Figures 4–7 should include more detailed legends with statistical parameters. Additionally, references to supplementary tables and figures should be accompanied by short descriptions in the main text to improve flow and clarity.

Response:

Thank you for your comment on the clarity of the tables and figures. Table 2 has been streamlined to include only the description and the essential information regarding the functions of each protein.

The legends of Figures 4–8 have been modified to include detailed statistical parameters. These parameters have also been included in the text.

References to supplementary tables and figures are now accompanied by short descriptions of the main results to improve flow and clarity. These additions have been marked in light green where appropriate throughout the Results section.

 

  6. References are missing for lines/statements in the introduction, please insert references where you are discussing previously known facts, statistics or studies.

Response:

We appreciate your observation regarding the assignment of references in the introduction. We have carefully reviewed the introduction section and adjusted the citations to ensure that each statement is properly supported by the corresponding literature. These modifications enhance the accuracy and coherence of the text, ensuring that the mentioned data and facts rigorously reflect the available information.

The modified references have been highlighted in light green in the introduction section.

 

  7. Language and grammar are generally clear, but some editing would improve precision and readability (e.g., consistent use of “in-silico” vs. “computational,” clearer explanations in the abstract and discussion).

Response:

Thank you for this observation. The language and grammar have been revised throughout the text.

After reviewing the study's approach, we have determined that in-silico is the most precise term, as the work focuses on specific bioinformatics analyses and simulations. In contrast, computational is a broader concept that encompasses various computational methods, not necessarily centered on the type of simulations and analyses conducted in this study. This choice ensures greater accuracy in methodological descriptions and consistency in terminology, guaranteeing that the language accurately reflects the nature of the work.

Some modifications have been made throughout the text. They have been marked in light green.

 

  8. Overall, the study is scientifically sound and addresses an important question in cancer biomarker research. With revisions to improve statistical transparency, completeness, and interpretation, the manuscript has strong potential.

All the best!

Response:

We sincerely appreciate your comments and the time you dedicated to reviewing the manuscript. We have addressed all the issues raised, and thanks to your valuable observations, we have improved the clarity, precision, and robustness of the article. Your review has been instrumental in optimizing the presentation of our findings and ensuring that the study transparently reflects its impact on cancer biomarker research. Thank you very much for your contribution to this process.

 

The overall quality of English in the manuscript is adequate but not polished. The writing conveys the scientific content clearly in most places, but there are frequent grammatical issues, awkward phrasing, and inconsistent use of terminology that detract from the manuscript’s professional presentation. The authors should consult a professional writing expert.

Response:

Thank you for your comment. We have reviewed the language in the manuscript to improve clarity and coherence, ensuring a more professional presentation.

 

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

Thank you for the opportunity to review this interesting manuscript, which tackles the timely question of shared molecular mechanisms across breast, colon, and ovarian cancers using an in-silico workflow. Below are my detailed comments.

  • Explain why only STRING’s “curated” subset was mined; valuable low-throughput experimental records in other STRING channels were excluded and may bias the protein lists. Justify selection of GeneCodis over more cancer-focused enrichment tools (e.g., g:Profiler, EnrichR).
  • Statistical thresholds / multiple testing: Report how many GO/KEGG terms survived FDR correction rather than p-value alone. Clarify whether log10(p) values in MCODE clusters already reflect multiple-testing correction.
  • RNA-seq expression in GEPIA2 is a useful first screen but does not establish protein-level relevance. Add discussion of limitations and propose future proteomic or immunohistochemical validation on clinical cohorts.
  • High MSH2 or KIT expression is linked to better survival in your pooled analysis. Explain the apparent paradox with existing literature where MSH2 loss drives mismatch-repair deficiency and KIT overexpression is oncogenic in some GI tumours.

Author Response

REVIEWER #5

 

Thank you for the opportunity to review this interesting manuscript, which tackles the timely question of shared molecular mechanisms across breast, colon, and ovarian cancers using an in-silico workflow. Below are my detailed comments.

Thank you very much for the time and effort you have dedicated to reviewing our manuscript. Your comments help us to improve the quality of our work.

The changes made according to your revision have been highlighted in orange.

 

  • Explain why only STRING’s “curated” subset was mined; valuable low-throughput experimental records in other STRING channels were excluded and may bias the protein lists. Justify selection of GeneCodis over more cancer-focused enrichment tools (e.g., g:Profiler, EnrichR).

Response:

This STRING “curated” subset was selected because it contains manually reviewed and validated data, ensuring higher accuracy in identifying relevant proteins. Other STRING channels include predicted or low-throughput interactions, which could introduce noise into the analysis and reduce the reliability of the identified biomarkers.

GeneCodis was chosen for its ability to perform multi-category enrichment analysis, integrating multiple information sources and allowing for a broader interpretation of the biological functions associated with common biomarkers.

Tools like g:Profiler and EnrichR are excellent for studies focused on a single cancer type, but GeneCodis enabled a more global and multidimensional perspective by considering multiple cancer types simultaneously.

Moreover, STRING was used because of its reliability and widespread acceptance in molecular biology research. It offers access to multiple curated databases of protein interactions derived from both experimental findings and computational predictions, providing a robust foundation for network analysis.

Only STRING’s curated subsets were mined due to their robustness and high confidence level. These curated subsets are based on well-documented, expert-curated protein-protein interactions from trusted sources such as KEGG, Reactome, and Gene Ontology Complexes. These resources offer reliable pathways, complexes, and functional relationships, ensuring a high level of reliability.

Unlike other STRING channels, including the experimental channel, the STRING database channel focuses on curated protein-protein associations and consistently assigns high confidence scores, highlighting their well-established nature.

Due to their high reliability, protein lists obtained from curated ontologies through data mining represent the most robust and dependable research resources currently available.

The STRING experimental channel gathers interaction evidence from laboratory-based assays, including biochemical, biophysical, and genetic experiments. It utilizes data from primary repositories such as BioGRID and the IMEx consortium. Confidence indices are established by globally benchmarking the accuracy of the annotated experimental techniques, as well as by evaluating performance and consistency within the dataset in the context of high-throughput experiments.

On the STRING website, after a curated gene set is selected from the list provided and "Continue" is clicked, the network page appears. Clicking "Viewers" shows eight clickable rectangles: one for the current network displayed above and seven corresponding to the different STRING channels. When we selected the experiment channel, the average detection confidences were mostly exploratory or medium, with only a few items showing high confidence. This scarcity of high-confidence items suggests that the experiment channel may not be a reliable source for data mining.

Certainly, focusing exclusively on STRING's curated subsets during the mining process can introduce bias in protein lists by overlooking other valuable sources, such as low-throughput experimental records from the STRING experiment channel. These records may have lower confidence scores for associations compared to the curated subsets. Nevertheless, protein lists obtained solely from curated ontologies remain robust and reliable, even if some proteins from less reliable sources or channels were not included in the mined list.

GeneCodis4 is a suitable enrichment tool for cancer research, similar to g:Profiler and EnrichR, as it draws from 21 diverse databases covering 14 species of major model organisms, including Homo sapiens.

GeneCodis4 was chosen for its unique capability as a web tool that performs both Singular Enrichment Analysis (SEA) and Modular Enrichment Analysis (MEA) simultaneously. It allows for the integration of diverse information to functionally characterize genes, proteins, and regulatory elements, including CpG sites, miRNAs, and transcription factors. GeneCodis4 analyzes data related to genes, proteins, and regulatory elements sourced from the National Center for Biotechnology Information (NCBI) and ENSEMBL.

GeneCodis4 is an appropriate enrichment tool, comparable to g:Profiler or EnrichR. It is a reliable web platform that integrates data from 21 well-established databases, organized into four categories: functional, regulatory, phenotype, and drug databases. The functional category includes several databases, such as BioPlanet, Gene Ontology (which consists of Biological Process, Molecular Function, and Cellular Component), KEGG Pathways, the Mouse Genome Informatics database, Reactome, and WikiPathways.

GeneCodis4 also integrates resources such as miRTarBase; the TAM database for miRNA set enrichment analysis; HMDD, a database of experimentally supported human microRNA; and MNDR (Mammalian ncRNA-Disease Repository), which curates associations between ncRNA and diseases. Additionally, it utilizes the Comparative Toxicogenomics Database (CTD) to explore how environmental chemicals impact human health; the Library of Integrated Network-Based Cellular Signatures (LINCS Consortium); the Pharmacogenomics Knowledgebase (PharmGKB); DisGeNET, a database of gene-disease associations; the Human Phenotype Ontology (HPO), a formal ontology of human phenotypes; and the Online Mendelian Inheritance in Man (OMIM), a comprehensive database of human inherited disorders and their genetic foundations.

The following text has been included in discussion to clarify, page 21, line 679:

“This study prioritized validated data by using STRING’s "curated" subset, minimizing noise and ensuring accuracy. GeneCodis was chosen for its multi-category analysis capability, allowing a broader evaluation of common cancer biomarkers. These methodological choices strengthen the reliability of the findings and their future experimental validation.”

 

  • Statistical thresholds / multiple testing: Report how many GO/KEGG terms survived FDR correction rather than p-value alone. Clarify whether log10(p) values in MCODE clusters already reflect multiple-testing correction.

Response:

Thank you for your comment. For Singular Enrichment Analysis (SEA), GeneCodis4 employs the standard hypergeometric test, which is equivalent to the one-sided Fisher’s Exact Test. This test assesses whether a significant association exists between two categorical variables, here membership in the input gene list and annotation to a given term. All computed p-values are adjusted for multiple testing using the False Discovery Rate (FDR) correction method proposed by Benjamini and Hochberg.
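To make the enrichment statistic concrete, the following is a minimal sketch of the upper-tail hypergeometric p-value, which coincides with the one-sided Fisher's exact test. It is not GeneCodis4's own code; the function name and the gene counts are hypothetical and serve only as an illustration.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric p-value, P(X >= k).

    N: genes in the background (assumed universe size)
    K: background genes annotated to the term
    n: genes in the input list
    k: input genes annotated to the term
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 5 of 50 input genes fall in a term of 100 genes,
# against a background of 20000 genes.
p_value = hypergeom_enrichment_p(N=20000, K=100, n=50, k=5)
print(f"raw enrichment p-value: {p_value:.3g}")
```

The value returned here is a raw p-value; it still has to be corrected for the number of terms tested before it is compared with a significance threshold.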

In our research paper, during the GO/KEGG enrichment stage, we used GeneCodis4's corrected p-values (pval_adj) for multiple testing via the False Discovery Rate (FDR) method according to Benjamini and Hochberg. We exclusively considered these adjusted values rather than the unadjusted p-values. This approach ensures robustness for all terms that passed the selection process. Our enrichment criteria included only those terms and their associated genes/proteins with a pval_adj (FDR adjusted) of less than 0.05. Terms with a pval_adj (FDR adjusted) of 0.05 or greater did not meet the FDR correction criteria, were rejected, and were consequently not considered enriched.
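The selection rule described above can be sketched as follows. This is an illustrative implementation of the Benjamini-Hochberg adjustment, not the code run by GeneCodis4; the term names and raw p-values are hypothetical, and only the 0.05 cutoff comes from our analysis.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical terms and raw p-values, for illustration only.
raw = {"term_a": 0.001, "term_b": 0.02, "term_c": 0.04, "term_d": 0.30}
names = list(raw)
adj = benjamini_hochberg([raw[name] for name in names])

# Keep only terms whose adjusted p-value is below 0.05.
enriched = [name for name, q in zip(names, adj) if q < 0.05]
print(enriched)
```

In this toy example, term_c has a raw p-value of 0.04 but an adjusted value of about 0.053, so it would be rejected under the adjusted criterion even though it passes an unadjusted 0.05 threshold.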

Metascape tests all ontology terms for enrichment using the widely recognized hypergeometric test. It then corrects the p-values with the Benjamini-Hochberg algorithm to provide a q-value that represents the False Discovery Rate (FDR), that is, the expected proportion of false positives (incorrectly rejected null hypotheses) among all positive results (rejected null hypotheses).

Metascape selects the term with the lowest, most significant p-value in each cluster to represent that cluster in tables, heatmaps, and bar graphs. Additionally, it offers a downloadable functional enrichment summary spreadsheet in CSV format. This spreadsheet includes both plain p-values (LogP) and Benjamini-Hochberg adjusted values, or q-values (Log(q-value)), in separate columns.

As previously mentioned, the log10(p) values displayed in MCODE clusters on the Metascape webpage do not reflect multiple-testing correction. Instead, the webpage should display q-values, which appropriately account for multiple testing using the False Discovery Rate (FDR). This adjustment would be a valuable improvement for Metascape to consider.
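As a rough illustration of how the two columns differ in practice, the sketch below converts LogP and Log(q-value) back to the linear scale and keeps only terms whose q-value is below 0.05. The column names LogP and Log(q-value) are those mentioned above; the Description column, the sample rows, and the cutoff are assumptions made for the example and are not taken from our results.

```python
import csv
import io

# Hypothetical excerpt of an enrichment summary export.
SAMPLE_CSV = """Description,LogP,Log(q-value)
hypothetical term A,-8.2,-4.9
hypothetical term B,-2.1,-0.6
"""


def significant_terms(csv_text, q_cutoff=0.05):
    """Return (description, p, q) for rows with q below the cutoff."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        p = 10 ** float(row["LogP"])          # undo the log10 transform
        q = 10 ** float(row["Log(q-value)"])  # FDR-adjusted value
        if q < q_cutoff:
            kept.append((row["Description"], p, q))
    return kept


for description, p, q in significant_terms(SAMPLE_CSV):
    print(f"{description}: p = {p:.2e}, q = {q:.2e}")
```

In the sample rows, term B has a raw p-value below 0.05 but a q-value well above it, which is the situation in which reading log10(p) alone would overstate significance.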

 

  • RNA-seq expression in GEPIA2 is a useful first screen but does not establish protein-level relevance. Add discussion of limitations and propose future proteomic or immunohistochemical validation on clinical cohorts.

Response:

Thank you for your comment. That is absolutely correct; GEPIA2 is a highly valuable web tool for the preliminary screening of RNA sequencing expression. The next logical step in this research is to assess the relevance of KIT and MSH2 protein levels through proteomics. This involves quantitation, isolation, separation, detection, and analysis of proteins using chromatographic, electrophoretic, and immunochemical techniques (immunoassays). This analysis should be conducted across multiple patient cohorts with breast, colon, and/or ovarian cancers at all stages of the disease, beginning with initial studies focused on the Andalusia region in southern Spain. Additionally, analyzing microRNA and non-coding RNA profiles within these clinical cohorts would provide valuable insights for clinicians.

The following text has been included in discussion to clarify, page 21, line 710:

“Although RNA-seq expression analysis through GEPIA2 provides valuable insights into gene expression differences between tumor and normal tissues, it does not establish protein-level relevance, which is essential for understanding functional implications. As gene expression does not always correlate directly with protein abundance or activity, future studies should incorporate proteomic analyses, such as mass spectrometry, to confirm protein expression patterns in clinical samples. Additionally, immunohistochemical validation in independent patient cohorts could provide further evidence of the biological significance of MSH2 and KIT as potential biomarkers. These complementary approaches will strengthen the translational impact of the findings and improve their applicability in personalized oncology.”

 

  • High MSH2 or KIT expression is linked to better survival in your pooled analysis. Explain the apparent paradox with existing literature where MSH2 loss drives mismatch-repair deficiency and KIT overexpression is oncogenic in some GI tumours.

Response:

Thank you very much for this important observation and suggestion.

KIT is a type 3 transmembrane receptor protein for stem cell factor, also known as mast cell growth factor (MGF). KIT is a proto-oncogene whose overexpression or mutations can lead to cancer development in patients. Mutations in this gene are associated with gastrointestinal (GI) stromal tumors, mast cell disease, acute myelogenous leukemia and piebaldism (http://gepia2.cancer-pku.cn/#index).

Interestingly, according to GEPIA2 (http://gepia2.cancer-pku.cn/#general), the gene expression profile of KIT varies across tumor samples and normal tissues depending on the organ under consideration. In other words, KIT overexpression [median expression: log2(TPM + 1)] is oncogenic in certain tumors in specific organs compared with the corresponding normal tissue, for example:

  • Adrenocortical carcinoma (ACC) [Tumor: 1.33; Normal: 0.31]
  • Cholangiocarcinoma (CHOL) [Tumor: 0.69; Normal: 0.33]
  • Lymphoid Neoplasm Diffuse Large B Lymphoma (DLBC) [Tumor: 0.24; Normal: 0.11]
  • Kidney Chromophobe (KICH) [Tumor: 6.47; Normal: 3.22]
  • Acute Myeloid Leukemia (LAML) [Tumor: 5.54; Normal: 2.18]
  • Pancreatic Adenocarcinoma (PAAD) [Tumor: 2.15; Normal: 0.7]
  • Pheochromocytoma Paraganglioma (PCPG) [Tumor: 1.26; Normal: 0.42]
  • Testicular Germ Cell Tumors (TGCT) [Tumor: 4; Normal: 2.06]

Conversely, in humans, the lower expression, rather than overexpression, of KIT is oncogenic in certain tumors in specific organs, for example:

  • Breast Invasive Carcinoma (BRCA) [Tumor: 2.36; Normal: 5.02]
  • Colon Adenocarcinoma (COAD) [Tumor: 1.06; Normal: 3.09]
  • Rectum Adenocarcinoma (READ) [Tumor: 1.34; Normal: 3.17]
  • Ovarian Serous Cystadenocarcinoma (OV) [Tumor: 0.4; Normal: 4.06]
  • Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC) [Tumor: 0.78; Normal: 2.45]
  • Kidney Renal Papillary Cell Carcinoma (KIRP) [Tumor: 1.07; Normal: 3.08]
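The comparison in the two lists above can be sketched with a few of the quoted medians. Because the values are on the log2(TPM + 1) scale, the tumor-minus-normal difference is a log2 ratio of (TPM + 1) values. The dictionary name and helper functions are hypothetical; the numbers are the medians quoted above for five of the tumor types.

```python
# Median KIT expression, log2(TPM + 1), as (tumor, normal), from the lists above.
MEDIAN_KIT = {
    "ACC": (1.33, 0.31),
    "LAML": (5.54, 2.18),
    "BRCA": (2.36, 5.02),
    "COAD": (1.06, 3.09),
    "OV": (0.4, 4.06),
}


def to_tpm(log2_value):
    """Convert a log2(TPM + 1) value back to TPM."""
    return 2 ** log2_value - 1


def direction(tumor, normal):
    """Label whether the tumor median is above or below the normal median."""
    return "higher in tumor" if tumor > normal else "lower in tumor"


for code, (tumor, normal) in MEDIAN_KIT.items():
    delta = tumor - normal  # log2 ratio of (TPM + 1) values
    print(
        f"{code}: {direction(tumor, normal)}, "
        f"log2 difference = {delta:+.2f}, "
        f"TPM tumor = {to_tpm(tumor):.2f}, TPM normal = {to_tpm(normal):.2f}"
    )
```

This only summarizes the direction and size of the median difference; it says nothing about statistical significance or about protein-level expression.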

The context-dependent role of KIT expression and its organ-specific variability present a promising area for further research. Instead of a simple activation or inhibition strategy, it is essential to explore the functional impact of KIT modulation in different tumor types. Future studies should investigate whether targeted modulation of KIT signaling could improve survival outcomes in breast, colon, and ovarian cancers, rather than assuming a universal activation approach.

In cancers arising in non-target organs, KIT expression is markedly elevated, as indicated by the RNA sequencing data. In such cancers, KIT inhibition remains a clinically relevant strategy for controlling tumor progression. Current tyrosine kinase inhibitors targeting KIT, such as Imatinib, Sunitinib, Dasatinib, and Sorafenib, exemplify this approach.

This reasoning is important because using KIT inhibitors may effectively treat cancer in specific organs while inadvertently promoting its development in others. In cancers where KIT is underexpressed, its modulation should be carefully evaluated in experimental and clinical studies.

Although this concept may seem unconventional, it could be highly relevant for personalized oncology, ensuring that therapeutic interventions align with the unique molecular characteristics of each tumor type.

The following text has been included in discussion to clarify, page 22, line 720:

“The association of high MSH2 and KIT expression with better survival outcomes in our analysis contrasts with previous reports where MSH2 loss leads to mismatch-repair deficiency and KIT overexpression is implicated in oncogenesis in certain gastrointestinal tumors. This apparent paradox underscores the context-dependent role of these proteins across different cancer types. While MSH2 is essential for DNA repair, its differential expression may reflect compensatory mechanisms in tumors attempting to maintain genomic integrity. Similarly, KIT signaling can contribute to oncogenesis in specific tissues, but its regulation varies depending on genetic alterations, tumor microenvironment, and cellular context. The organ-specific variability of KIT expression highlights the need for a nuanced approach to its modulation in cancer therapy. Rather than assuming a simple activation or inhibition strategy, future studies should explore the functional implications of KIT signaling in different tumor types. Targeted modulation of KIT, considering its oncogenic or tumor-suppressive roles in specific cancers, could provide a more refined therapeutic approach. In cases where KIT overexpression correlates with oncogenesis, inhibition remains a clinically relevant strategy for controlling tumor progression, as demonstrated by current tyrosine kinase inhibitors such as Imatinib, Sunitinib, Dasatinib, and Sorafenib. Conversely, in cancers where KIT is underexpressed, its potential functional restoration should be carefully evaluated through proteomic and immunohistochemical studies to determine its relevance in disease progression and response to therapy. These discrepancies underscore the complexity of tumor biology and reinforce the need for experimental validation to clarify the functional significance of MSH2 and KIT expression in breast, colon, and ovarian cancer. 
Future studies integrating proteomic analyses, mechanistic experiments, and clinical cohort validation will be essential to defining their precise roles in personalized oncology.”

 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript has been significantly enhanced, and all concerns raised have been thoroughly addressed.

Reviewer 3 Report

Comments and Suggestions for Authors

The new version has not resolved my previous concern. I am sorry that I cannot recommend acceptance.
