Article
Peer-Review Record

PTMs_Closed_Search: Multiple Post-Translational Modification Closed Search Using Reduced Search Space and Transferred FDR

by Yury Yu. Strogov 1, Sergey A. Spirin 2,3,4, Mark V. Ivanov 5, Maria A. Kulebyakina 6, Anastasia Yu. Efimenko 6 and Oleg I. Klychnikov 7,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 21 October 2025 / Revised: 19 January 2026 / Accepted: 28 January 2026 / Published: 2 February 2026
(This article belongs to the Section Proteome Bioinformatics)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors Strogov et al. describe in their manuscript a Python-based framework for searching multiple post-translational modifications in datasets. Starting from an unmodified search and an external PTM database, the framework searches for each of the PTMs expected in the dataset separately. Subsequently, the framework estimates the FDR of the PTM-specific search results by extending an FDR transfer method by Fu et al., replacing the linear estimate of the proportion of modified hits among falsely identified results with a more accurate one based on multiple linear splines. Furthermore, the manuscript illustrates the performance of the modified transfer method, in comparison to separate searches, on a dataset from Chick et al., and also presents the result visualization capabilities of the framework.
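
To illustrate the distinction drawn above between a single linear estimate and multiple linear splines, the following minimal sketch (hypothetical data and variable names; not the authors' implementation) fits both to a toy, rank-dependent proportion of modified decoy PSMs and reports which tracks the underlying trend more closely:

```python
# Minimal sketch with toy data: compare a single linear fit and a piecewise-linear
# (degree-1) spline as estimators of the proportion of modified decoy PSMs as a
# function of PSM rank. All names and numbers here are hypothetical.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
ranks = np.arange(1, 2001)
true_prop = 0.05 + 0.25 / (1.0 + np.exp((ranks - 800) / 150))   # non-linear trend
observed_prop = np.clip(true_prop + rng.normal(0, 0.02, ranks.size), 0, 1)

# Single linear estimate over the whole rank range
slope, intercept = np.polyfit(ranks, observed_prop, 1)
linear_fit = slope * ranks + intercept

# Piecewise-linear spline (k=1) with smoothing proportional to the noise level
spline = UnivariateSpline(ranks, observed_prop, k=1, s=ranks.size * 0.02 ** 2)
spline_fit = spline(ranks)

for name, fit in (("single linear fit", linear_fit), ("linear spline", spline_fit)):
    rmse = np.sqrt(np.mean((fit - true_prop) ** 2))
    print(f"{name}: RMSE vs. true proportion = {rmse:.4f}")
```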

 

The manuscript provides a valuable tool for the PTM proteomics community. However, there remain a number of concerns.

 

Major:

 

1) The authors test their method against only a single dataset (Chick et al.), and this is not sufficient. We would like to see that the results also manifest on a second dataset. Furthermore, the dataset used is more than 10 years old and not representative of datasets generated with current technology. A suitable addition may be, for example, PXD063604 at ProteomeXchange.

 

2) The authors modify an existing FDR transfer method (Fu et al.) from a single linear estimate to multiple linear splines, but a quantitative comparison to the old method is lacking. We would like to see the performance of the old method, alongside separate FDR estimation, as references for the new modified FDR transfer method.

 

3) Validation of the quality of the search results gained with the new method, as compared to former methods, is not addressed. For instance, whether the presence or absence of differentially modified peptides makes biological sense given the particular dataset.

 

4) A database size of 5000 is presented as optimal. It would be valuable to use the second dataset to confirm that this is still the case, as this size could be dataset-dependent.

 

Some minor issues remain:

 

3) The term “standard search” is used (line 114) but not defined.

 

Several typos should be corrected:

 

4) Line 129: “variables” should be “variable”.

 

5) Line 368: “PSMs” should be “PSM”.

 

6) Line 372: “positives” should be “positive”.

 

7) In line 414, the word “improved” implies that false positives and false negatives decrease, and this is not shown. Please rephrase or supply the necessary evidence.

Author Response

Thank you very much for your comments. Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript describes a workflow for sequential PTM closed searches and introduces a transferred FDR method for PTM-specific FDR estimation. The authors show its performance on a single dataset, identifying 13 new PTM types and increasing protein coverage.

While the manuscript is well-organized, some method descriptions are unclear, and the workflow has not been tested on additional datasets or compared with other closed or open search methods. More technical explanations and evaluations are needed.

Major points:

  1. Lines 59-65: Lack of explanation for “one-by-one and transferred FDR” and “sequential analysis”. It seems “one-by-one FDR” and “separate FDR” mean the same thing. Please clarify these terms, especially what transferred FDR is, how it differs from separate FDR, and why it could improve PTM identification when combining PSMs from each PTM search and the non-PTM search.
  2. The transferred FDR calculation is quite different from the conventional FDR approach. Normally, PSMs are ranked from highest to lowest score, with the rank-1 PSM having the highest score. In the proposed method, however, this is reversed: rank 1 corresponds to the lowest hyperscore, not the highest (see the toy sketch after these major points). The authors should state this difference in this section and explain why PSMs are ranked in this way, from lowest to highest score.
  3. Are there any differences in hyperscore distributions between unmodified peptides from the initial non-PTM search and modified peptides from PTM searches? Reporting this would show whether hyperscores from multiple searches are suitable for FDR estimation of PTM-specific peptides.
  4. It’s quite difficult to follow Section “2.5.2 Error propagation calculation for the data filtering” of the Methods.
    a) The statement in lines 203-204 does not look correct. Based on Figure 5A, it seems that it is “the proportion of PTM decoys” (from the combined PSM results, not from each PTM search) that becomes unstable as the rank increases, rather than the “proportion of k-modified peptides within one PTM analysis”.
    b) The authors state, “as the rank value increases, the calculated proportion of k-modified peptides within one PTM analysis becomes unstable, and these data should be filtered out”. Are these high-rank PSMs also the ones with high scores? If so, can the authors explain why these PTM decoys have high scores, and what “unstable” means in this context?
    c) Please explain why error propagation is calculated from low- to high-scoring PSMs, and what it really captures.
  5. The reason for adding random target proteins to each database is unclear. In “3.2 Search space optimization,” please explain why the choice of search space matters and should be optimized, rather than using just the modified proteins or the full proteome.
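
For orientation, the following toy sketch (hypothetical scores and names; not the manuscript's code) shows the ranking direction discussed in points 2 and 4, with rank 1 assigned to the lowest hyperscore, and how a local proportion of PTM-decoy PSMs can be computed along the rank axis. With real data, the sparsely populated high-rank windows are where such an estimate would be expected to become noisy:

```python
# Toy sketch, hypothetical data: PSMs are ranked in ASCENDING hyperscore order
# (rank 1 = lowest score), and the proportion of PTM-decoy PSMs is computed in
# local windows of consecutive ranks.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
hyperscore = rng.gamma(shape=4.0, scale=5.0, size=n)                    # toy scores
is_ptm_decoy = rng.random(n) < np.clip(0.3 - hyperscore / 100.0, 0.01, 0.3)

order = np.argsort(hyperscore)            # ascending: rank 1 = lowest hyperscore
decoy_by_rank = is_ptm_decoy[order]

window = 200                              # window of consecutive ranks
cum = np.concatenate(([0], np.cumsum(decoy_by_rank)))
for rank in (200, 1000, 3000, 5000):      # window ends at this rank
    local_prop = (cum[rank] - cum[rank - window]) / window
    print(f"ranks {rank - window + 1}-{rank}: local PTM-decoy proportion = {local_prop:.3f}")
```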

Minor points:

  1. Line 27: The sentence “search for each PTM based on previously annotated PTMs” is incomplete; please clarify what is meant by “annotated PTMs.”
  2. Line 120: Please add references for the “IdentiPy search engine”.
  3. Line 127: How does the tool handle 5000 random forward protein sequences in FDR estimation for each target PTM search? Please elaborate.
  4. For q-value distributions, please clarify whether target PTM PSMs from these random proteins are included in the FDR calculations.
  5. Lines 234-238: Boxplots alone are not enough to support claims that q-value distributions are different or comparable between different databases, especially when adding 3,000 or 5,000 random proteins. Please provide statistical evaluations (one possible form of such a test is sketched below).
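
One possible form of such a statistical evaluation, sketched here with purely hypothetical q-value arrays (not the manuscript's data), is a two-sample test on the q-value distributions obtained with different database sizes:

```python
# Illustrative sketch only, hypothetical q-values: compare two q-value
# distributions (e.g. databases padded with 3,000 vs. 5,000 random proteins)
# with non-parametric two-sample tests.
import numpy as np
from scipy.stats import ks_2samp, mannwhitneyu

rng = np.random.default_rng(2)
qvals_db_3000 = rng.beta(0.8, 20, size=4000)   # placeholder q-values, +3,000 proteins
qvals_db_5000 = rng.beta(0.8, 19, size=4200)   # placeholder q-values, +5,000 proteins

ks_stat, ks_p = ks_2samp(qvals_db_3000, qvals_db_5000)
u_stat, u_p = mannwhitneyu(qvals_db_3000, qvals_db_5000, alternative="two-sided")

print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {ks_p:.3g}")
print(f"Mann-Whitney U: U = {u_stat:.0f}, p = {u_p:.3g}")
```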

Author Response

Thank you very much for your comments. Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Thank you for this opportunity to read and review this early version of “PTMs_closed_search: multiple post-translational modifications closed search using reduced search space and transferred FDR”. In this manuscript the authors describe an elegant looping model that I’m a little jealous I didn’t think of myself and that now appears very obvious in hindsight. Rather than deal with the explosion in search space when using “open search” techniques, this group stepped right outside the box, performed searches with small numbers of PTMs, and then looped back and reanalyzed the data with another set of PTMs. The logic is absolutely sound and the method is clever. The manuscript, however, is missing some key details that are necessary for other scientists with an interest in protein informatics to reproduce these results. With some additional details to allow recreation of these results, and some comparisons to existing tools and workflows, I would consider this a very worthy manuscript for publication here.

Major comments:

  • My biggest concern is the lack of details regarding the reanalyzed dataset. I could go look it up, and I’m fairly sure I’d recognize this dataset if I looked at the first figure, but a reader shouldn’t have to. Many scientists in this field probably aren’t aware of the staggering number of options the Orbitrap Elite had for fragmenting a peptide. Please include whether the spectra were high or low resolution and, if the former, at what resolution. Which fragmentation mode was employed is also important. The low-mass cutoff would be critical information as well.
  • While the FDR propagation issues seem to have been thoroughly considered by the authors, I do think a more focused analysis would be helpful here. What happens in the case of a single disagreement between Search 1 and Search 4? For example, if Search 1 identifies an unmodified peptide sequence but Search 4 identifies the residual modification of a ubiquityl group (K-GG), how does the FDR method described here resolve this conflict? Does a manual look at the MS/MS spectra support the identification post FDR analysis? If the team of authors is from a more informatics background than a mass spectrometry one, there are handy tools out there for generating MS/MS spectra visually for inspection; the Interactive Spectral Annotator online from the Coon lab is a great example, and Pyteomics likely has similar tools (see the sketch after these major comments). Sometimes it’s just nice to see a single sanity-check point.
  • Relatedly, I do think there will be objections if a reanalysis with at least one popular toolkit for open-search proteomics is not utilized for comparator data. MetaMorpheus from the Smith lab can search these files in a couple of hours on a desktop computer while looking for several hundred PTMs; FragPipe’s open search would be the more popular option. I’d listen to a rebuttal that this is beyond the scope and you’ve already done a lot of work, but I do think this could strengthen the manuscript. In the case of FragPipe, I suspect that your approach would look very powerful. True open-search FDR is, as you’ve noted, tough, and I always expect fewer IDs with open search than with a focused closed search. If the number of PSMs is the metric, you should perform well in this comparison.
  • I’m moving this to major comments, but there might be some formatting issues apparent in the PDF. I suspect this was a LaTeX-to-Word conversion, but it does insert odd symbols here and there. I’ve mentioned specific line items below.
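
Along the lines of the manual sanity check suggested in the second point above, the following sketch (hypothetical peptide, site, and numbers; not the authors' workflow) uses Pyteomics to list theoretical singly charged y-ion m/z values under both the unmodified and the K-GG hypotheses, so a single MS/MS spectrum can be inspected against each:

```python
# Hypothetical example: theoretical y-ion m/z values for one peptide, with and
# without a GG remnant (+114.0429 Da) on a chosen lysine, for manual comparison
# against an MS/MS spectrum.
from pyteomics import mass

PEPTIDE = "LVKAGFAGDDAPR"     # hypothetical tryptic peptide
GG_SITE = 2                   # 0-based index of the putatively modified lysine (K)
GG_DELTA = 114.04293          # monoisotopic mass of the GG remnant

for i in range(1, len(PEPTIDE)):
    y_seq = PEPTIDE[i:]                                    # y-ion = C-terminal fragment
    mz = mass.calculate_mass(sequence=y_seq, ion_type='y', charge=1)
    contains_site = GG_SITE >= i                           # fragment covers the modified K?
    mz_gg = mz + GG_DELTA if contains_site else mz
    print(f"y{len(y_seq)}: unmodified {mz:.4f} | K-GG hypothesis {mz_gg:.4f}")
```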

Minor comments

Overall comment: I would suggest including the name of the search engine that forms the basis of this study in the abstract and/or introduction.

Line 33: Suggest adding “relative” before “coverage”, or something similar.

Line 45: “Eukaryotic” might be the correct spelling for the region of this journal. Please consult the guide for authors.

Line 76: Suggest “in this study” for convention’s sake.

Line 77: I think there was a LaTeX-to-PDF conversion issue here with symbol insertion. Important to check the original meaning, I think.

Line 89: Clunky sentence that needs to be improved a little.

Section 2.1: Please include some relevant information for the readers regarding these files. I’m particularly interested in whether the MS/MS spectra were high resolution or low resolution, as well as the collision energy type. Probably CID in the ion trap or HCD with Orbitrap detection, but both and other combinations can be used on an Orbitrap Elite. This is important for downstream analysis considerations.

Line 100: Same odd symbols as in line 77; it might be nothing, but it is worth noting.

Sections 2.4 and 2.5: I don’t see the mass tolerances considered by Pyteomics here. Please include them.

Line 404: A discussion of sequential or open-aware FDR should be included here. I believe Alexey N. wrote a paper on the former several years ago, and the latter is discussed here: https://www.nature.com/articles/s41467-020-17921-y . I’m putting this here as minor because the authors note that they believe a new tool to roll up modified peptides is beyond the scope of this work. I’m now having trouble finding the former paper and have to run to a meeting marathon. In a nutshell, they proposed performing FDR estimation on unmodified peptides first, then on common human PTMs, and then on less common PTMs separately. I’m not sure it has ever been released in desktop pipelines, although it has been used in proteomics cloud computing tools.

Figure S4: Hopefully this can be made in higher resolution somehow. I am having trouble reading all the text on a relatively large screen.

Author Response

Thank you very much for your comments. Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The manuscript introduces a promising algorithm for PTM annotation that addresses key challenges in proteomics. The innovations in FDR estimation and database optimization are significant contributions to the field. However, expanding the scope of the algorithm, providing quantitative evidence, and addressing database dependency would further enhance its impact and applicability.

 

  1. A comparative analysis between the proposed algorithm and traditional CS and OMS methods is necessary. Including a table or figure summarizing differences in sensitivity, accuracy, and computational efficiency would provide readers with a clearer understanding of the algorithm's advantages and limitations.

 

  2. Section 3.4 mentions that the search analysis identified 13 protein modifications in the test data. However, this result lacks cross-verification or comparison with other methods. Including a validation step or benchmarking against known datasets would strengthen the reliability of these findings.

 

  3. The manuscript references Figures A2, A3, and A4 in Line 338, but these figures are either mislabeled or missing.

 

  4. The current algorithm is designed to focus exclusively on single modifications, including methionine oxidation. This limitation reduces its applicability for peptides with multiple simultaneous modifications. While the manuscript acknowledges this constraint, it does not propose specific strategies or future directions to extend the algorithm's capability to handle more complex modification scenarios.

 

  5. Glycosylation is a critical post-translational modification involved in protein-protein interactions and cell adhesion. The manuscript does not address how the proposed algorithm performs in identifying glycosylated peptides, which is a significant omission given the biological importance of this PTM. A dedicated discussion or example demonstrating the algorithm's efficacy in glycosylation analysis would enhance the manuscript's relevance.

 

  6. The algorithm relies heavily on UniProt and dbPTM databases for constructing protein datasets. This dependency could limit its generalizability, particularly for poorly annotated proteins or rare PTMs. The manuscript should explore alternative strategies, such as integrating experimental data, leveraging de novo peptide sequencing, or using machine learning models to predict PTMs.

 

  7. While the manuscript qualitatively highlights the benefits of transferred FDR and spline regression, it lacks quantitative comparisons. Metrics such as sensitivity, false positives, identification rates, and computational time should be included to substantiate the claims. These comparisons would provide a stronger basis for evaluating the algorithm's performance against traditional methods.

Author Response

Thank you very much for your comments. Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

My points have been addressed in the current form.

Author Response

 

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have addressed all comments except one: the figure related to my previous comment on the hyperscore distribution differences between unmodified and modified peptides appears to be missing from the appendices.

In addition, I have two minor comments on the revised version.

1. Line 121: Technical error. For LTQ Orbitrap Velos, only MS1 spectra are acquired in the Orbitrap analyzer. MS2 spectra are typically acquired in the LTQ (ion trap), not Orbitrap.

2. For Figure 10, reporting the number of identified ubiquitinated peptides, rather than PSM-level differences between the two approaches, would more effectively support conclusions regarding identification sensitivity. Please include the peptide-level differences as well.

Author Response

 

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

Thank you for taking the time to read and respond to my comments, in particular the requests for additional comparative analysis. The approach was both clever and elegant from the start; now it is simply better defended and more easily quantifiable. I do apologize for the delay in review; I had a big semester to prepare for during and after our holiday break. I believe it is ready for publication in the current form.

Author Response

 

Author Response File: Author Response.docx
