Article
Peer-Review Record

Identification and Development of Pathogen- and Pest-Specific Defense–Resistance-Associated SSR Marker Candidates Assisted by Machine Learning and Discovery of Putative QTL Hotspots in Camellia sinensis

by Ayşenur Eminoğlu
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 14 January 2026 / Revised: 26 January 2026 / Accepted: 30 January 2026 / Published: 2 February 2026
(This article belongs to the Special Issue Genomics and Transcriptomics for Plant Development and Improvement)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript presents a report on the development of SSR markers in Camellia sinensis based on machine learning. The article as a whole is systematized, is valuable, and is in the trend of modern research, but there are a number of comments on it.

1. The authors state that the work is "the first comprehensive SSR resource", however, the introduction and discussion do not clearly distinguish exactly how the proposed pQTL approach is fundamentally superior to the existing genic/EST-SSR panels for Camellia sinensis. It is recommended to add a direct comparison with previously published SSR resources and specify specific winning metrics (coverage, PIC, portability).

2. The use of Random Forest is described as auxiliary, but there is no quantification of its contribution (for example, uplift by PIC>0.5, precision@k). ML looks decorative without it. At least one ablative analysis is needed. Please adjust it.

3. The panel is strongly biased towards Exobasidium vexans (most SSRs and primers), which limits the versatility of the resource. It is necessary either to justify such an imbalance, or explicitly indicate the area of applicability of the panel.

4. The results are limited to in silico PCR. For practical applicability, at least a pilot in vitro validation (several markers with high PIC on 2-3 genotypes) or a clear discussion of the risks of discrepancy is necessary.

5. Permutation thresholds (95/99%) are applied correctly, but it is not clear everywhere how the cases σ=0 were handled and how this affects the Z-scores. It is recommended to make these rules more explicit in Methods and add code/parameters, for example, in the Supplementary.

6. Very poor quality of figures 2 and 7. This needs to be fixed.

Conclusion. The work is strong in scope and biological context, but requires enhanced comparative analysis, quantitative ML validation, and minimal experimental verification so that its results can be used in practice.

Author Response

Response to Reviewers' Comments

We would like to sincerely thank the reviewers for their careful evaluation of our manuscript and for their constructive comments and suggestions, which have helped us improve the clarity and quality of the work. All comments have been addressed, and the manuscript has been revised accordingly. Please find the detailed responses below and the corresponding revisions/corrections highlighted in red in the resubmitted file.

Response to Reviewer 1 Comments

Point-by-point response to Comments and Suggestions for Authors

Comment 1: The authors state that the work is "the first comprehensive SSR resource", however, the introduction and discussion do not clearly distinguish exactly how the proposed pQTL approach is fundamentally superior to the existing genic/EST-SSR panels for Camellia sinensis. It is recommended to add a direct comparison with previously published SSR resources and specify specific winning metrics (coverage, PIC, portability). 

Response 1: We thank the reviewer for this valuable comment. As the reviewer emphasized, comprehensive SSR panels have been developed in the literature for DNA barcoding or genotype identification/discrimination in tea. In this study, however, the term “comprehensive” refers to a large-scale SSR resource/pipeline approach focusing on target loci associated with pathogen–pest defense/resistance. Within the scope of our survey of the current and accessible literature, the only comparable study we encountered was the EST-SSR–based study of Karunarathna et al. (2021), which targeted blister blight (Exobasidium vexans). In that study, of the 11 EST-SSR loci screened in tea, only the EST-SSR073 locus was identified as a diagnostic marker; it showed high polymorphism across 64 cultivars (PIC = 0.7727) and was associated with blister blight. Such studies are valuable in terms of marker–trait association; however, they do not provide a hotspot-based and scalable SSR production pipeline that systematically targets defense/resistance loci. Our study aims to fill this gap by providing SSR candidates enriched around defense loci through literature-based target selection and pQTL-like hotspot identification. As the reviewer noted, the statement in the Introduction section, “In the present study, the first comprehensive and integrative SSR marker resource for tea was developed…”, could lead to misinterpretation; it was therefore revised as follows: “To address this gap, we developed the first comprehensive and integrative SSR primer resource centered on defense- and resistance-associated targets …” (lines 92–94).
In the present study, it was shown that the primers can also generate amplification in silico across different genomes, including Tieguanyin (n = 178), UPASI-3 (n = 168), TES-34 (n = 163), L618 (n = 156), and TV1 (n = 123); the number of loci to which the primers bind in each genome is reported in detail in Supplementary File S11. The PIC values are based not on population-based experimental genotyping but on an in silico evaluation performed across multiple genomes; therefore, the PIC results were not used for direct trait-diagnostic performance comparison and were evaluated only for the purpose of prioritizing candidate SSR primers. The relevant explanation was added to the Materials and Methods section under “4.7. In Silico PCR” (lines 859–862).

Reference: Karunarathna, K. H. T., Mewan, K. M., Weerasena, O. V. D. S. J., Perera, S. A. C. N., & Edirisinghe, E. N. U. (2021). A functional molecular marker for detecting blister blight disease resistance in tea (Camellia sinensis L.). Plant Cell Reports, 40(2), 351-359.
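
Because Response 1 turns on how PIC values are obtained, the following minimal Python sketch illustrates the standard polymorphism information content formula (Botstein et al., 1980), PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². It is an illustration only and not the authors' pipeline: the function name `pic` and the example amplicon sizes are hypothetical, and it assumes that each genome assembly contributes one allele call (for instance an in silico amplicon length) per primer pair.

```python
from collections import Counter
from itertools import combinations


def pic(alleles):
    """Polymorphism information content for one locus from a list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one allele call is required")
    freqs = [n / total for n in counts.values()]
    # Sum of squared allele frequencies (expected homozygosity).
    homozygosity = sum(p * p for p in freqs)
    # Correction term over all unordered pairs of distinct alleles.
    pair_term = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term


# Hypothetical in silico amplicon sizes for one primer pair across five assemblies.
amplicon_sizes = [212, 212, 218, 224, 218]
print(f"PIC = {pic(amplicon_sizes):.4f}")
```

With only a handful of assemblies the allele frequencies are coarse, which is consistent with the response's point that such values are suited to ranking candidate primers rather than to trait-diagnostic comparison.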

Comment 2: The use of Random Forest is described as auxiliary, but there is no quantification of its contribution (for example, uplift by PIC>0.5, precision@k). ML looks decorative without it. At least one ablative analysis is needed. Please adjust it.

Response 2: We thank the reviewer for this important suggestion. To quantify the contribution of the Random Forest step, we added an ablation analysis comparing the ML-based prioritization (ranking by prob_pos and selecting the top SSR primer candidates per locus) against a non-ML baseline ranking based on structural heuristics (repeat number and distance to gene). Using the same selection rule (top-5 SSR primer candidates per locus), the ML-based approach yielded a substantially higher enrichment of pQTL/hotspot-positive candidates (pos_rate = 0.964; 188/195) compared to the baseline (pos_rate = 0.488; 179/367), corresponding to an uplift of +0.476. These results demonstrate that the ML component is not decorative, but provides a measurable prioritization benefit. The full results are provided in Supplementary File S8. In addition, the ablation method and analysis results were added to the Results section (lines 261–267) and the Materials and Methods section (lines 785–805).
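
The arithmetic behind the ablation figures in Response 2 can be retraced with a short Python sketch. This is a hedged illustration, not the code in Supplementary File S8: the record fields `locus` and `hotspot_positive` and the helper names are hypothetical, and the way the baseline combines repeat number and distance to gene is not specified in this record. Only the counts (188/195 and 179/367) and the name `prob_pos` come from the response.

```python
from collections import defaultdict


def top_k_per_locus(candidates, score, k=5):
    """Keep the k highest-scoring candidates within each locus."""
    by_locus = defaultdict(list)
    for cand in candidates:
        by_locus[cand["locus"]].append(cand)
    selected = []
    for group in by_locus.values():
        selected.extend(sorted(group, key=score, reverse=True)[:k])
    return selected


def pos_rate(selected):
    """Fraction of selected candidates flagged as pQTL/hotspot-positive."""
    if not selected:
        raise ValueError("no candidates were selected")
    return sum(1 for c in selected if c["hotspot_positive"]) / len(selected)


# Counts as reported in Response 2 (positives / selected candidates).
ml_rate = 188 / 195
baseline_rate = 179 / 367
uplift = ml_rate - baseline_rate
print(f"ML pos_rate = {ml_rate:.3f}")
print(f"baseline pos_rate = {baseline_rate:.3f}")
print(f"uplift = {uplift:+.3f}")
```

In such a setup the ML arm would call `top_k_per_locus(candidates, score=lambda c: c["prob_pos"])`, while the baseline arm would pass a structural score instead; both arms then share the same `pos_rate` definition, so the uplift reflects the ranking alone.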

Comment 3: The panel is strongly biased towards Exobasidium vexans (most SSRS and primers), which limits the versatility of the resource. It is necessary either to justify such an imbalance, or explicitly indicate the area of applicability of the panel.

Response 3: We thank the reviewer for the comment. We would like to clarify that the apparent imbalance towards Exobasidium vexans was already addressed in the Results section. Specifically, we stated that “When the distribution of SSRs was examined according to pathogen and pest species, they were found to be clearly distributed in accordance with the locus density targeted in the study” (lines 160–163). We clarified in the Discussion that this distribution is expected, as resistance genes reported in the literature are predominantly concentrated in Exobasidium-related diseases, whereas certain pathogen–host interactions remain less explored (lines 456–468). Therefore, the SSR/primer distribution reflects the current locus-density landscape and available resistance-gene evidence rather than an unintended bias.

Comment 4: The results are limited in silico PCR. For practical applicability, at least a pilot in vitro validation (several markers with high PIC on 2-3 genotypes) or a clear discussion of the risks of discrepancy is necessary. 

Response 4: We thank the reviewer for this valuable suggestion. As the reviewer also noted, in vitro validation is an important step that strengthens the practical applicability of SSR primers. In the present study, the primers were evaluated across multiple genomes using in silico PCR, and this approach provides a strong framework in terms of pre-screening and portability. Nevertheless, the reasons that may cause discrepancies between in silico and in vitro results were explicitly added to the Discussion section (lines 589–599). It was also clearly emphasized in the Conclusion section that population-based wet-laboratory validation is required before routine use of the primers and trait diagnostic applications (lines 896–898). With this revision, the scope of the study and the roadmap to be followed in terms of practical applicability were framed in a more realistic and transparent manner.

Comment 5: Permutation thresholds (95/99%) are applied correctly, but it is not clear everywhere how the cases σ=0 were handled and how this affects the Z-scores. It is recommended to make these rules more explicit in Methods and add code/parameters, for example, in the Supplementary.

Response 5: We thank the reviewer for this constructive suggestion. In the manuscript, the relevant explanation was added to the Materials and Methods section under the heading “4.5.1. SSR Density–Based Detection of Putative QTL Hotspots” (lines 684–686) and to Table 4 on page 33, and the necessary additional code was provided separately in Supplementary File S14.
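
To make the σ = 0 issue raised in Comment 5 concrete, the sketch below shows one possible way to summarize a permutation null for a single window. It is not the code in Supplementary File S14, and the rule the authors adopted is not restated in this record. The assumptions here are that the null is built by placing the same number of SSRs uniformly at random in the region, that the Z-score is reported as undefined (`None`) when the null standard deviation is zero, and that the empirical p-value is used as the fallback; all function and parameter names are hypothetical.

```python
import math
import random
import statistics


def null_window_counts(n_ssr, region_length, window_start, window, n_perm, seed=0):
    """Null distribution of SSR counts in one window under uniform random placement."""
    rng = random.Random(seed)
    end = window_start + window
    counts = []
    for _ in range(n_perm):
        positions = (rng.randrange(region_length) for _ in range(n_ssr))
        counts.append(sum(1 for p in positions if window_start <= p < end))
    return counts


def permutation_summary(observed, null_values):
    """Mean, SD, Z-score, 95%/99% thresholds and empirical p-value for one window."""
    mu = statistics.fmean(null_values)
    sigma = statistics.pstdev(null_values)
    ranked = sorted(null_values)

    def nearest_rank(q):
        # Nearest-rank percentile of the null distribution.
        return ranked[max(0, math.ceil(q * len(ranked)) - 1)]

    # Guard against division by zero: with a constant null the Z-score is undefined.
    z = (observed - mu) / sigma if sigma > 0 else None
    exceed = sum(1 for v in null_values if v >= observed)
    return {
        "mean": mu,
        "sd": sigma,
        "z": z,
        "thr95": nearest_rank(0.95),
        "thr99": nearest_rank(0.99),
        # Add-one correction keeps the empirical p-value strictly positive.
        "p_emp": (1 + exceed) / (1 + len(null_values)),
    }


null = null_window_counts(n_ssr=40, region_length=100_000, window_start=0,
                          window=10_000, n_perm=1000)
print(permutation_summary(observed=12, null_values=null))
```

Reporting both the threshold comparison and the empirical p-value means a window can still be classified when the Z-score is undefined, which is the situation the reviewer asked to have documented.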

Comment 6: Very poor quality of figures 2 and 7. This needs to be fixed.

Response 6: We thank the reviewer for the valuable comment. Figure 2 was completely removed from the study, as its data structure as a heatmap resulted in excessive empty space, and this decision was made to improve readability and reduce the figure density in the main text. Figure 7 was removed from the main text and is now presented only in the Supplementary Materials section.

Final Comments: Conclusion. The work is strong in scope and biological context, but requires enhanced comparative analysis, quantitative ML validation, and minimal experimental verification so that its results can be used in practice.

Response to final comments: We thank the reviewer for the overall evaluation and constructive suggestions. This study was structured as a resource/pipeline work focusing on the development of an SSR primer panel through defense/resistance-oriented target locus selection, a pQTL-based hotspot approach, multi-layered statistical filtering, and in silico validation across multiple genomes. Accordingly, the comparative evaluation and the machine learning component were presented more clearly and quantitatively in the revision, and the risks that may lead to discrepancies between in silico and in vitro results, as well as the need for experimental validation, were explicitly stated in the Discussion and Conclusion sections. With these revisions, the framework and limitations required for the practical implementation of the study were clearly defined.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript presents a substantial and carefully designed in‑silico study that delivers a large, biologically contextualized SSR resource focused on pathogen‑ and pest‑associated defense loci in Camellia sinensis. The combination of literature‑based target selection, pQTL‑style sliding‑window analysis, permutation testing, and machine‑learning‑assisted prioritization is methodologically sound and, to my knowledge, novel in the tea context, and with further tightening the work should make a valuable resource and pipeline paper for the community. The Introduction provides a comprehensive overview of tea pests, defense pathways, and SSR applications, but it is longer than necessary and occasionally repetitive; condensing background on general SSR properties and plant defense into a smaller number of paragraphs and ending with a sharper statement of the specific gaps this work addresses would greatly improve readability.

In the Results, the overall logic from target selection through SSR discovery, hotspot definition, and primer design is clear, yet the text is very dense, with frequent reiteration that SSR distributions are not random. It would help readers if each subsection began with one or two concise “take‑home” topic sentences, with more detailed numerical summaries and explanations of non‑randomness moved into a smaller number of key paragraphs and into the Discussion, and if methodological detail that properly belongs in Materials and Methods were shifted out of the Results to avoid breaking the narrative flow. The figures and extensive supplementary files are informative and carefully prepared, but several multi‑panel figures are visually crowded; increasing font sizes, simplifying color schemes, and using the legends to highlight one or two concrete examples per figure would make the main findings more immediately accessible. Likewise, adding brief one‑line descriptions of each supplementary file at first mention would help readers navigate this rich supporting material.

The Methods are impressively detailed and should allow reproducibility, but their current length and level of embedded parameter detail make them hard to read end‑to‑end. A compact summary table of key parameter choices (motif classes, repeat thresholds, window sizes, permutation numbers, and Random Forest settings) would provide a useful overview, and a short description of the classifier’s performance on held‑out data would help readers gauge how well it distinguishes strong from weak SSR candidates. In the Discussion, the manuscript convincingly relates the observed SSR patterns to known biology of WRKY, NAC, PR1, NLRs, BAHD acyltransferases and other defense‑related gene families, and it clearly outlines how the resulting marker sets might be used in breeding and genetic studies. To keep expectations appropriately calibrated, it would be beneficial to state explicitly that all validation to date is in silico, that PIC values are derived from genome assemblies rather than field populations, and that functional and association studies will be required before these markers can be considered trait‑diagnostic.

Comments on the Quality of English Language

Overall, the English is very good, but a focused language edit to shorten long sentences and reduce repeated phrases such as “in this context” and “this distribution indicates” would substantially improve clarity. With these largely presentational revisions, the manuscript should provide a strong and useful contribution on defense‑oriented SSR development in tea.

Author Response

Response to Reviewers' Comments

Response to Reviewer 2 Comments

Point-by-point response to Comments and Suggestions for Authors

Comment 1: This manuscript presents a substantial and carefully designed in‑silico study that delivers a large, biologically contextualized SSR resource focused on pathogen‑ and pest‑associated defense loci in Camellia sinensis. The combination of literature‑based target selection, pQTL‑style sliding‑window analysis, permutation testing, and machine‑learning‑assisted prioritization is methodologically sound and, to my knowledge, novel in the tea context, and with further tightening the work should make a valuable resource and pipeline paper for the community. The Introduction provides a comprehensive overview of tea pests, defense pathways, and SSR applications, but it is longer than necessary and occasionally repetitive; condensing background on general SSR properties and plant defense into a smaller number of paragraphs and ending with a sharper statement of the specific gaps this work addresses would greatly improve readability.

Response 1: We thank the reviewer for the constructive evaluation. To make the Introduction more effective in terms of length and readability, the text was thoroughly revised, and repetitive/overly detailed explanations, particularly those related to plant defense mechanisms and the general properties of SSRs, were simplified and merged into a smaller number of paragraphs (lines 36–40 and 73–80). Repeated wording was corrected. In addition, to emphasize the specific gap addressed by this study more clearly, the final part of the Introduction was reorganized and the specific deficiency that this work aims to fill was stated more sharply (lines 88–98). With these revisions, the Introduction section was made more fluent, non-redundant, and focused. In addition, as the Introduction was condensed to reduce redundancy, the references cited in this section were also revised accordingly.

Comment 2: In the Results, the overall logic from target selection through SSR discovery, hotspot definition, and primer design is clear, yet the text is very dense, with frequent reiteration that SSR distributions are not random. It would help readers if each subsection began with one or two concise “take-home” topic sentences, with more detailed numerical summaries and explanations of non-randomness moved into a smaller number of key paragraphs and into the Discussion, and if methodological detail that properly belongs in Materials and Methods were shifted out of the Results to avoid breaking the narrative flow.

Response 2: We thank the reviewer for this constructive suggestion. To strengthen the readability and narrative flow of the Results section, the text was revised; in particular, repetitive statements such as “SSRs are not randomly distributed” were reduced, and interpretive statements were moved to the Discussion where appropriate. In addition, the Results subsections were initiated with concise introductory sentences summarizing the main quantitative outcomes, thereby presenting the findings in a clearer and more fluent manner.

Comment 3: The figures and extensive supplementary files are informative and carefully prepared, but several multi‑panel figures are visually crowded; increasing font sizes, simplifying color schemes, and using the legends to highlight one or two concrete examples per figure would make the main findings more immediately accessible. Likewise, adding brief one‑line descriptions of each supplementary file at first mention would help readers navigate this rich supporting material.

Response 3: We thank the reviewer for the suggestions regarding the readability of the figures. All figures were prepared in the R environment and presented at a resolution of 1200 dpi, and colorblind-friendly color palettes were used in multi-colored panels. Nevertheless, in line with the reviewer’s suggestions, the multi-panel figures in the main text were reviewed to improve readability; font sizes were increased and revisions were made to reduce visual complexity. We also streamlined the visual content. Specifically, Table 4 (“Distribution of defence-resistance associated SSR primers across genes, protein families and pathogen-pest targets”) was removed to reduce redundancy. In addition, the figure summarizing SSR marker counts (Figure 8, now renumbered as Figure 6) was simplified into a one-panel format showing the number of unique SSR markers designed for each gene/gene family. Accordingly, the figure and table numbering was updated throughout the manuscript. Moreover, at the first mention of the Supplementary Files, one-line descriptions briefly defining the content of each Supplementary File were added to help readers follow the supporting material more easily. Furthermore, the figure legends were clarified so that they explicitly highlight concrete examples representing the relevant findings.

Comment 4: The Methods are impressively detailed and should allow reproducibility, but their current length and level of embedded parameter detail make them hard to read end‑to‑end. A compact summary table of key parameter choices (motif classes, repeat thresholds, window sizes, permutation numbers, and Random Forest settings) would provide a useful overview, and a short description of the classifier’s performance on held‑out data would help readers gauge how well it distinguishes strong from weak SSR candidates.

Response 4: We thank the reviewer for the constructive suggestions. To make the parameter choices used in the study easier for the reader to follow, a compact table (Table 4) summarizing the motif classes, repeat thresholds, window sizes, permutation numbers, and Random Forest settings was added to the Materials and Methods section (page 33). In addition, the classifier’s performance on the held-out data was briefly reported, making its ability to distinguish strong and weak SSR candidates clearer. Furthermore, to quantify the contribution of the Random Forest step, we added an ablation analysis comparing the ML-based prioritization (ranking by prob_pos and selecting the top SSR primer candidates per locus) against a non-ML baseline ranking based on structural heuristics (repeat number and distance to gene). Using the same selection rule (top-5 SSR primer candidates per locus), the ML-based approach yielded a substantially higher enrichment of pQTL/hotspot-positive candidates (pos_rate = 0.964; 188/195) compared to the baseline (pos_rate = 0.488; 179/367), corresponding to an uplift of +0.476. These results demonstrate that the ML component is not decorative, but provides a measurable prioritization benefit. The full results are provided in Supplementary File S8. In addition, the ablation method and analysis results were added to the Results section (lines 261–267) and the Materials and Methods section (lines 785–805).

Comment 5: In the Discussion, the manuscript convincingly relates the observed SSR patterns to known biology of WRKY, NAC, PR1, NLRs, BAHD acyltransferases and other defense-related gene families, and it clearly outlines how the resulting marker sets might be used in breeding and genetic studies. To keep expectations appropriately calibrated, it would be beneficial to state explicitly that all validation to date is in silico, that PIC values are derived from genome assemblies rather than field populations, and that functional and association studies will be required before these markers can be considered trait-diagnostic.

Response 5: We thank the reviewer for the positive evaluation of the Discussion section. The reasons that may cause discrepancies between in silico and in vitro results were explicitly added to the Discussion section (lines 589–599). It was also clearly emphasized in the Conclusion section that population-based wet-laboratory validation is required before routine use of the primers and trait-diagnostic applications (lines 896–898). The PIC values are based not on population-based experimental genotyping but on an in silico evaluation performed across multiple genomes; therefore, the PIC results were not used for direct trait-diagnostic performance comparison and were evaluated only for the purpose of prioritizing candidate SSR primers, and the relevant explanation was added in the Materials and Methods section under “4.7. In Silico PCR” (lines 859–862).

Comment 6: Overall, the English is very good, but a focused language edit to shorten long sentences and reduce repeated phrases such as “in this context” and “this distribution indicates” would substantially improve clarity. With these largely presentational revisions, the manuscript should provide a strong and useful contribution on defense‑oriented SSR development in tea.

Response 6: We thank the reviewer for the positive evaluation regarding the language use. In line with the reviewer’s suggestion, long sentences throughout the manuscript were shortened, repetitive phrasing was reduced, and the overall readability and flow of the text were improved.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Thanks to the authors for addressing the comments.
