Review Reports - Interrelational Proteomic Sequence Features Enhance Predictive Modeling: Application to COVID-19 Severity

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The proposed web server is conceptually sound and has the potential to assist in the identification of descriptors that may contribute to the development of new protein categorization and prediction tools. However, during its use, I encountered several practical difficulties. In addition, some limitations and operational constraints of the web server are not sufficiently clarified in the manuscript and should be addressed, as detailed below. Furthermore, the Introduction would benefit from a clearer and more in-depth justification for the choice of the proposed descriptors, including their biological relevance and how they relate to existing approaches in protein characterization and prediction.

I attempted to use the web server with the UniProt IDs provided in the examples; however, none of them worked. For all tested IDs, the following error message was returned:
“Sorry! Some protein IDs have not been found in UniProt: XXXXX. Please, check that you have used correct protein entry names according to the UniProtKB format. Died at protein_features.pl line 171.”
I also tested the FASTA sequences provided in the examples and obtained the same error message.
The web server only functioned when I downloaded the sequence files from the example dataset and uploaded them directly to the server. This limitation should be clearly stated.
The output table could be more informative. For example, the ID column could include more descriptive identifiers instead of only abbreviations. In addition, the meaning of the numbers following domain identifiers in the SEQ_CT and SEQ_CK columns (e.g., PF00155 (4)) is not explained.
Does the web server only support protein sequences available in the UniProt database? If so, this restriction should be explicitly clarified in the manuscript.
Line 143: How are structural contacts computed?
Lines 213–214: “Anyway, when a user requests a protein not yet available locally, the system retrieves the necessary annotations in real time and incorporates them into the database.”
Does this apply to any protein sequence, or only to those present in the UniProt database?
Line 277: I am not fully convinced of the benefits of deriving MSA-based statistics from proteins that are functionally and evolutionarily unrelated. Additional justification or clarification would be helpful.
Line 329: Which features were excluded from the ranking procedure?

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

In this study, the authors proposed a tool called INPROF for calculating protein features. It generates inter-protein relationship features through multiple sequence alignment (MSA) and annotates protein sequences, structures, and functions. What’s more, they used COVID-19 data to demonstrate that these features are more effective than simple gene expression in classifying disease severity. This is a solid and valuable tool paper, and the study is clearly written, logically structured, and easy to follow. The web server and GitHub further enhance the reproducibility and practical utility of the work. Here are my comments.

It would be helpful to briefly clarify how variability in the number of DEGs (and corresponding proteins) across patients is handled when computing the final feature values.
A brief discussion of how INPROF features might generalize to other diseases or independent cohorts would further strengthen the manuscript and emphasize the broader applicability of the tool.

This study addresses whether inter-protein relationship features derived from multiple sequence alignment can provide a more informative representation than gene expression features for disease severity classification and introduces INPROF as a practical computational tool integrating protein sequence, structural, and functional annotations. The topic is relevant, and the approach is original in its focus on relationship-aware protein features, addressing a gap between transcriptomic analyses and protein-level modeling. Compared with existing methods, INPROF offers a distinct and useful framework, further strengthened by the availability of a web server and open-source implementation. The methodology is sound and supported by sufficient data, and the conclusions are consistent with the presented results. Minor clarifications regarding how variability in the number of differentially expressed genes across patients is handled, as well as a brief discussion of generalizability to other diseases or cohorts, would further strengthen the manuscript. The references, tables, and figures are appropriate. Overall, this is a valuable tool paper that requires only minor revision.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Your work reports a newly developed web server for MSI, seemingly based on basic statistical properties of amino acid chains. If improved in some of the claimed and tested areas, the work well deserves publication.

line 72
How exactly is this RNA-seq analysis applied in this study? Such analyses usually capture a functional snapshot of how cells respond to biological perturbations. The single case study here is both inadequate and not valid enough to support such broad and misleading claims. The authors should tone down their praise of the web server, which has yet to prove its efficiency.

line 131
An obvious weakness of such properties is their lack of 3D relations, which are vital to understanding PPIs, allosteric networks, and catalytic sites. The authors should provide a table of all 46 features and comment on whether or not they are based on these basic statistical properties only.

line 142
This is just an example of how this web server is capable of reporting some less noisy, 3D-related interpretation of protein sequence data. The authors should give more specifics on these claimed structural annotations and on whether or not they appear in the sensitive feature set in the downstream analysis.

Figure 3 line 317
While UMAP can reflect biologically meaningful disease progression and immune trajectories, t-SNE often tends to break trajectories into artificially separated clusters, especially when reproducibility and integration are vital. How are the authors confident that they are capturing real data and not noise?

Figure 3 line 342
Parameter sensitivity is one of the worst and most commonly encountered disadvantages of t-SNE, yet the authors use it as an evaluation metric. How can the authors explain this? At the least, the authors need to apply other dimensionality reduction techniques, such as UMAP, for comparison, since if UMAP fails to capture something, the observed trend is most likely just an over-separation and not a real neighboring cluster.

Comments for author File: Comments.pdf

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I acknowledge the authors’ effort in carefully addressing all of my previous comments and suggestions, both in the manuscript and in the web server.

Overall, the revisions have substantially improved the clarity and quality of the work. In my opinion, the manuscript is now in good shape, and the current version is suitable for publication in its present form.