Open Access Article

Peer-Review Record

The KH Gene Family in Tomato (Solanum lycopersicum): Genomic Expansion, Structural Basis of RNA Binding, and Haplotype Variation Associated with Fruit Weight

Agronomy 2026, 16(5), 576; https://doi.org/10.3390/agronomy16050576

by Wen Liu, Zhaoyilan He, Yuanheng Li, Yingfeng Ding, Ting Wu, Zhengan Yang and Hui Shen *

Reviewer 1: Anonymous

Reviewer 2: Anju Pandey

Reviewer 3: Tiago Camponogara Tomazetti

Reviewer 4: Anonymous

Reviewer 5: Anonymous

Submission received: 20 January 2026 / Revised: 28 February 2026 / Accepted: 5 March 2026 / Published: 6 March 2026

(This article belongs to the Special Issue Genetic Basis of Crop Selection and Evolution)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript presents a comprehensive genome-wide analysis of the KH (K-homology) domain gene family in tomato (Solanum lycopersicum), integrating phylogenetics, synteny, evolutionary selection analyses, haplotype–phenotype associations, expression profiling, and AlphaFold 3–based structural modeling. The work represents a valuable resource for the community and goes beyond a descriptive gene family survey by linking specific SlKH members to fruit weight variation and proposing a structural mechanism for RNA recognition.

Overall, the study is well designed, data-rich, and technically sound. The manuscript is generally clearly written, figures are informative, and the conclusions are largely supported by the data. However, the current version would benefit from clarification and tightening in several areas.

Major Comments

The identification of several orthologous pairs with Ka/Ks > 1 (e.g., SlKH3/AtKH6, SlKH41/AtKH1) is intriguing. However, the manuscript occasionally over-interprets these ratios as definitive evidence of positive selection. Please clarify how many sites contribute to these elevated Ka/Ks values and whether they are driven by a small number of substitutions. Also, consider adding a brief cautionary note acknowledging that Ka/Ks > 1 at the gene level does not necessarily imply pervasive positive selection across the entire protein.
The haplotype analysis of SlKH47, SlKH43, and SlKH35 is one of the most interesting aspects of the study. However, the manuscript sometimes implies causality where only association has been demonstrated. Please temper statements suggesting that these genes “regulate” fruit weight, and instead emphasize that they are candidate loci associated with fruit weight variation. It would also be helpful to clarify whether population structure or relatedness was considered or controlled for in the ANOVA analysis, as this could influence haplotype–trait associations.
The observed negative or weak correlations between RNA-seq and RT-qPCR for several SlKH genes (e.g., SlKH29, SlKH11) are discussed, but the explanation remains somewhat speculative. A brief summary sentence acknowledging that these discrepancies limit direct quantitative comparison would improve transparency. Also, please clarify whether primer positions overlap all annotated isoforms.

Minor comments

Please ensure consistent use of terms such as “KH domain,” “KH family,” and “SlKH genes” throughout the text. In several places, “cis-regulatory elements” and “RNA-binding motifs” are used in close proximity; consider clarifying that these refer to DNA-level promoter elements versus RNA-level motifs.
In Section 2.6, please specify the minimum haplotype frequency threshold (if any) used to retain haplotypes for analysis. For Ka/Ks calculations, clarify how alignment quality was assessed before analysis.
Some figure legends (e.g., Figures 3 and 5) would benefit from slightly more detailed explanations of color scales and symbols. Please double-check minor typographical errors in figure labels (e.g., spacing, inconsistent capitalization).
The manuscript is generally well written, but a light language polish would improve readability in the Discussion, where sentences occasionally become overly long.

Author Response

Major Comments

Comment 1. The identification of several orthologous pairs with Ka/Ks > 1 (e.g., SlKH3/AtKH6, SlKH41/AtKH1) is intriguing. However, the manuscript occasionally over-interprets these ratios as definitive evidence of positive selection. Please clarify how many sites contribute to these elevated Ka/Ks values and whether they are driven by a small number of substitutions. Also, consider adding a brief cautionary note acknowledging that Ka/Ks > 1 at the gene level does not necessarily imply pervasive positive selection across the entire protein.

Response: We thank the reviewer for this highly insightful and constructive comment. We completely agree that a gene-level Ka/Ks ratio > 1 does not inherently indicate pervasive positive selection across the entire protein, and that such signals are often driven by localized, highly variable regions. We also recognize the necessity of avoiding over-interpretation of these algorithmic outputs. To fully address your concerns, we have performed a new set of targeted analyses and made comprehensive revisions to the manuscript.

To explicitly answer the question regarding which sites contribute to the elevated Ka/Ks values, we conducted a new region-specific evolutionary pressure analysis. Based on Conserved Domain Database (CDD) annotations, we partitioned the sequence alignments of these orthologous pairs into highly conserved functional domains (Domains) and less constrained inter-domain regions (Linkers). We then recalculated the Ka/Ks ratios for these regions independently. The new results (newly added Figure S5 and updated Table S4) clearly demonstrate that the core functional domains generally remain under strict purifying selection (Ka/Ks < 1). The elevated global Ka/Ks ratios are indeed predominantly driven by a small number of substitutions localized in the highly variable linker regions (Ka/Ks > 1). This confirms your hypothesis and provides a much more nuanced understanding of their evolutionary dynamics. Following your recommendation, we have incorporated explicit cautionary statements in both the Materials and Methods (lines 196-199) and the Results (lines 460-464), stating that a gene-level Ka/Ks > 1 does not necessarily imply pervasive positive selection across the entire protein. Furthermore, in the Discussion section, we added a note acknowledging the methodological limitations: with the algorithms employed, Ka/Ks values may be overestimated when sequence divergence is limited, so positive selection signals require cautious interpretation (lines 729-732). Additionally, the definitive claim of pervasive positive selection was revised (lines 735-738).
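The domain/linker partition described in this response can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequences, and the domain coordinates are hypothetical, and the code only tallies raw synonymous and nonsynonymous codon differences per region rather than estimating Ka and Ks, which would require a dedicated estimator run on each sub-alignment.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split an in-frame aligned coding sequence into codons."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3].upper() for i in range(0, usable, 3)]


def count_by_region(seq_a, seq_b, domains):
    """Tally codon differences inside and outside annotated domains.

    domains: list of (start, end) codon indices in alignment coordinates,
    0-based with end exclusive (hypothetical input format).
    """
    counts = {
        "domain": {"syn": 0, "nonsyn": 0},
        "linker": {"syn": 0, "nonsyn": 0},
    }
    pairs = zip(split_codons(seq_a), split_codons(seq_b))
    for idx, (ca, cb) in enumerate(pairs):
        # Skip identical codons and codons with gaps or ambiguous bases.
        if ca == cb or ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue
        in_domain = any(start <= idx < end for start, end in domains)
        region = "domain" if in_domain else "linker"
        same_aa = CODON_TABLE[ca] == CODON_TABLE[cb]
        counts[region]["syn" if same_aa else "nonsyn"] += 1
    return counts


# Toy pairwise codon alignment; the first three codons are the "domain".
toy_a = "ATGAAAGGTCTG---TTT"
toy_b = "ATGAAGGGTCCGGCTTTA"
print(count_by_region(toy_a, toy_b, domains=[(0, 3)]))
# {'domain': {'syn': 1, 'nonsyn': 0}, 'linker': {'syn': 0, 'nonsyn': 2}}
```

Raw counts like these indicate how many codon differences underlie a regional signal, which is the quantity the reviewer asked about; they do not correct for the number of synonymous and nonsynonymous sites or for multiple substitutions.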

Comment 2. The haplotype analysis of SlKH47, SlKH43, and SlKH35 is one of the most interesting aspects of the study. However, the manuscript sometimes implies causality where only association has been demonstrated. Please temper statements suggesting that these genes “regulate” fruit weight, and instead emphasize that they are candidate loci associated with fruit weight variation. It would also be helpful to clarify whether population structure or relatedness was considered or controlled for in the ANOVA analysis, as this could influence haplotype–trait associations.

Response: We fully agree with the importance of strictly differentiating genetic association from causality and ensuring that population structure is properly accounted for. As suggested, we have thoroughly revised the manuscript (Sections 3.6 and 4.4) to replace causal terms such as “regulate” or “regulator” with more appropriate phrasing, such as “candidate loci associated with fruit weight variation” or “candidate genes linked to fruit weight.” This ensures that our conclusions remain within the scope of the genetic association evidence.

Regarding the ANOVA analysis, we acknowledge that standard ANOVA does not inherently account for population stratification. To address this, we have now integrated a multi-layered validation (newly added Figure S4 and Table S5). We implemented an MLM (Q+K) framework using the rMVP package, which incorporates principal components from a Principal Component Analysis (PCA) as fixed effects and a kinship matrix (VanRaden algorithm) as a random effect. As shown in the Q-Q plots (Figure S4A), the observed P-values strictly follow the expected null distribution, indicating that population structure and relatedness were effectively controlled. Under this rigorous model, SNPs within the SlKH47 (P_min = 2.48×10^-2) and SlKH43 (P_min = 4.69×10^-2) regions retained significant associations with fruit weight (Table S5). Additionally, to definitively exclude confounding effects from macro-evolutionary divergence, we performed haplotype-trait associations within the wild (SP), cherry (SLC), and cultivated (SLL) subgroups separately (Figure S4C). Notably, significant phenotypic differences between ancestral and elite haplotypes were observed even within the same subpopulation (e.g., within the SLC group for SlKH43), providing robust evidence that these loci exert functional effects independent of the broader population stratification (lines 221-233, 501-510, 801-810).

We believe these integrated statistical and stratified analyses provide a solid foundation for the reported associations.
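For readers unfamiliar with the kinship term in the Q+K model, the sketch below computes a genomic relationship matrix using the formula commonly attributed to VanRaden (2008, method 1): G = ZZ' / (2 Σ p(1 − p)), where Z holds allele counts centered by twice the allele frequency. This is an illustrative pure-Python version under stated assumptions (genotypes coded 0/1/2, no missing data); the function name is hypothetical, and rMVP's internal implementation may differ in detail.

```python
def vanraden_kinship(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts.

    genotypes: list of individuals, each a list of allele counts per SNP.
    Assumes complete data (no missing genotypes).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each SNP.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    denom = 2 * sum(p * (1 - p) for p in freqs)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    # Center each genotype by its expected value, 2p.
    centered = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / denom
         for k in range(n)]
        for i in range(n)
    ]


# Toy example: three individuals genotyped at two SNPs.
toy = [[0, 2], [2, 0], [1, 1]]
for row in vanraden_kinship(toy):
    print(row)
```

In the toy data the first two individuals carry opposite homozygous genotypes, so their off-diagonal entry is negative, while the fully heterozygous third individual sits at the population mean and has zero relationship to both.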

Comment 3. The observed negative or weak correlations between RNA-seq and RT-qPCR for several SlKH genes (e.g., SlKH29, SlKH11) are discussed, but the explanation remains somewhat speculative. A brief summary sentence acknowledging that these discrepancies limit direct quantitative comparison would improve transparency. Also, please clarify whether primer positions overlap all annotated isoforms.

Response: Thank you for bringing this up. We have verified the gene structures for all 11 genes analyzed by RT-qPCR using the SL3.0 assembly in Ensembl Plants. We confirm that each of these genes is annotated with only one transcript isoform. Consequently, the RT-qPCR primers were designed to amplify a region present in all mature transcripts of their respective genes. Therefore, the discordance between RNA-seq and RT-qPCR profiles is likely driven by factors other than isoform-specific amplification, such as differences in the dynamic range, sensitivity of the detection platforms, or post-transcriptional buffering. In response to the suggestion, we have added a clarifying sentence in the Discussion section to enhance transparency regarding these discrepancies (lines 694-699).
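The RNA-seq versus RT-qPCR comparison discussed here rests on a per-gene correlation coefficient R. The sketch below shows how such a value could be computed, assuming a Pearson correlation across matched samples; the record does not state which correlation method was used, and the expression values shown are invented for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical expression of one gene across four tissues or stages.
rnaseq_values = [12.0, 8.5, 5.1, 2.3]      # e.g. FPKM or TPM (invented)
rtqpcr_values = [0.4, 0.9, 1.6, 2.8]       # relative expression (invented)
print(round(pearson_r(rnaseq_values, rtqpcr_values), 2))
```

With only a handful of samples per gene, R is very sensitive to a single point, which is one reason such discrepancies limit direct quantitative comparison between the two platforms.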

Minor comments

Comment 1. Please ensure consistent use of terms such as “KH domain,” “KH family,” and “SlKH genes” throughout the text. In several places, “cis-regulatory elements” and “RNA-binding motifs” are used in close proximity; consider clarifying that these refer to DNA-level promoter elements versus RNA-level motifs.

Response: We have conducted a thorough revision of terminology throughout the manuscript. "KH domain" is now used consistently to refer to the protein structural unit; “KH family” and “KH gene family” have been unified as "KH gene family"; and "SlKH genes" refers specifically to the tomato members. Although "cis-regulatory element" refers to the DNA level by default, to distinguish the DNA and RNA levels we have added an explicit DNA-level description (or an explicit link to "promoter regions") wherever cis-regulatory elements are mentioned.

Comment 2. In Section 2.6, please specify the minimum haplotype frequency threshold (if any) used to retain haplotypes for analysis. For Ka/Ks calculations, clarify how alignment quality was assessed before analysis.

Response: In Section 2.6, we have now specified that a minimum haplotype frequency threshold of 5% was applied in our analysis. This filtering step was used to ensure the statistical power of the comparisons and to minimize the impact of rare variants on the overall association results. For the rare haplotype SlKH43-Hap2, which represents 4.7% of the population, we have added a brief justification for its inclusion due to its extreme and consistent phenotypic effect. For the Ka/Ks analysis, we have added the following to Section 2.2: "To ensure alignment quality and reliability of substitution rate estimation, poorly aligned regions (gap > 50%) were identified and abnormal Ka/Ks values (e.g., 9.9999) were removed." (lines 173-176).
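The haplotype frequency filter described above can be sketched as follows. This is an illustrative Python example rather than the authors' code: the function name, the input format (a mapping from accession to haplotype label), and the `keep` argument for explicitly retained rare haplotypes are all assumptions introduced here.

```python
from collections import Counter


def filter_haplotypes(assignments, min_freq=0.05, keep=()):
    """Drop accessions whose haplotype is rarer than min_freq.

    assignments: dict mapping accession ID -> haplotype label.
    keep: haplotype labels retained regardless of frequency, for cases
          where a rare haplotype is included with explicit justification.
    """
    counts = Counter(assignments.values())
    total = sum(counts.values())
    retained = {
        hap for hap, n in counts.items()
        if n / total >= min_freq or hap in keep
    }
    return {acc: hap for acc, hap in assignments.items() if hap in retained}


# Toy population: 70 Hap1, 26 Hap2, 4 Hap3 (Hap3 is at 4%).
labels = ["Hap1"] * 70 + ["Hap2"] * 26 + ["Hap3"] * 4
population = {f"acc{i}": hap for i, hap in enumerate(labels)}

print(len(filter_haplotypes(population)))                  # 96
print(len(filter_haplotypes(population, keep={"Hap3"})))   # 100
```

Making the exception an explicit argument keeps the threshold rule and the justified inclusion of a rare haplotype visibly separate, which mirrors how the response treats SlKH43-Hap2.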

Comment 3. Some figure legends (e.g., Figures 3 and 5) would benefit from slightly more detailed explanations of color scales and symbols. Please double-check minor typographical errors in figure labels (e.g., spacing, inconsistent capitalization).

Response: We have optimized all figures in this study. Specifically, the cis-regulatory element visualization originally included in Figure 2 has been moved to a standalone Supplementary Figure S2 to improve clarity. In Figure 3B, the Ka/Ks values, previously represented by color, have now been replaced with numerical labels for greater precision. The legend for Figure 5 has been expanded to clearly distinguish between chromosomal representations and Ka/Ks bubble plots. Additionally, all figure labels have been carefully checked and corrected for consistent capitalization and spacing.

Comment 4. The manuscript is generally well written, but a light language polish would improve readability in the Discussion, where sentences occasionally become overly long.

Response: We fully agree with the reviewer that the Discussion section required language polishing to improve its readability. In response, we have carefully revised the entire Discussion by breaking down overly long and complex sentences, correcting minor grammatical issues, and ensuring consistent formatting of gene and protein names. We also clarified the logical connection between the single transcript isoform status of SlKH genes and the observed discrepancies between RT-qPCR and RNA-seq results in Section 4.1. These revisions have enhanced the clarity and flow of the manuscript, and all changes have been clearly marked in the revised version.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript provides a comprehensive genome-wide analysis of the K-homology (KH) domain gene family in tomato, identifying 47 SlKH genes and characterizing their evolutionary relationships, gene structures, expression patterns, and putative roles in fruit development. The authors employ a wide range of bioinformatics approaches, including phylogenetic and synteny analyses, haplotype analysis, expression profiling, and AlphaFold3-based structural modeling.

Minor comments:

Some figures are dense and difficult to interpret, particularly Figures 2 and 6; improving clarity and readability would be beneficial.
Table S2 is referenced in the text, but its specific contents are not adequately described.
It may be an issue on my end, but several figures appear to be repeated in the manuscript. For example, Figure 3 is shown at lines 353 and 365. Similar repetitions are observed for Figures 4, 6, and 10. Please verify and correct these duplications.

Author Response

Minor comments:

Comment 1. Some figures are dense and difficult to interpret, particularly Figures 2 and 6; improving clarity and readability would be beneficial.

Response: Thank you for pointing this out. We have comprehensively revised the figures as suggested. To reduce density, the cis-element visualization was extracted from Figure 2 and is now presented as Figure S2. This allows for a clearer view of both gene/protein structure and regulatory elements. We have redesigned the visualization of Ka/Ks ratios in Figure 3 to make the pairwise comparisons clearer and easier to interpret. For Figure 5, we have added a chromosome legend and detailed labels to provide better context for the syntenic blocks. We have optimized Figure 6 by increasing the font size and adjusting the colors of the base characters and other text for better readability, and by making the color scheme more distinct. Finally, all figures were thoroughly reviewed to standardize capitalization and formatting for consistency and professional presentation.

Comment 2. Table S2 is referenced in the text, but its specific contents are not adequately described.

Response: We agree that the specific contents of Table S2 should be clearly introduced in the text. In the revised manuscript, we have expanded the description in Section 3.1 to explicitly detail the comprehensive parameters provided in Table S2 (lines 302-306).

Comment 3. It may be an issue on my end, but several figures appear to be repeated in the manuscript. For example, Figure 3 is shown at lines 353 and 365. Similar repetitions are observed for Figures 4, 6, and 10. Please verify and correct these duplications.

Response: Thank you for your careful review and for bringing to our attention the potential duplication of figures in the manuscript. We have thoroughly examined the text and confirm that, due to an oversight during manuscript preparation, several figures were inadvertently repeated in the PDF version. We have now corrected these errors by removing the duplicate figures and ensuring that each figure appears only once at the appropriate location in the text. We have also verified all figure citations and numbering to maintain consistency throughout the manuscript. The revised version now presents each figure uniquely and clearly.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors,

Thank you for the opportunity to review your manuscript, "The KH Gene Family in Tomato (Solanum lycopersicum): Genomic Expansion, Structural Basis of RNA Binding, and Haplotype Variation Associated with Fruit Weight" (agronomy-4135962). This is a comprehensive and well-executed study that represents a significant contribution to the fields of tomato genomics and fruit biology. The work is methodologically sound, the results are presented with exceptional clarity, and the conclusions are well-supported by the data.

Major Strengths:

Comprehensive Scope: The multi-layered approach, combining genome-wide identification, phylogenetic analysis, expression profiling, population genetics, and structural modeling, is a major strength. This provides a complete and robust characterization of the SlKH gene family.
Novel and Significant Findings: The identification of a significant association between haplotypes in SlKH47, SlKH43, and SlKH35 and fruit weight is a novel and highly impactful discovery. This finding is of immediate interest to the tomato breeding community and provides a strong foundation for future functional studies.
High-Quality Presentation: The manuscript is exceptionally well-written, and the figures and tables are of high quality. Figure 4, in particular, is an excellent example of how to clearly and effectively communicate complex association data. The logical flow of the manuscript makes it easy for the reader to follow your scientific narrative.
Scientific Rigor: The methods are appropriate, transparent, and described in sufficient detail to ensure reproducibility. The cautious and well-reasoned interpretation of the results, particularly in distinguishing correlation from causation, demonstrates high scientific integrity.

Minor Suggestions for Improvement: While the manuscript is already of a very high standard, the following minor suggestions may help to further refine it:

In the Discussion section: When discussing the structural model from AlphaFold 3 (Figure 6), you use the word "elucidated" to describe the mechanism. While the model is highly informative, it is still a prediction. You might consider slightly softening the language to something like "provides a predictive model for" or "suggests a molecular mechanism by which..." to more precisely reflect the in silico nature of the finding. This is a minor semantic point, as the inference is strong, but it can add a layer of technical precision.
In the Introduction: The introduction is very effective. To make it even more compelling, you could consider adding a single sentence that briefly touches upon why post-transcriptional regulation is so critical specifically for complex processes like fleshy fruit development and ripening (e.g., mentioning its role in mediating hormonal signals or ensuring precise temporal control). This could strengthen the rationale for studying RNA-binding proteins in tomato even further.

These are minor points intended for polishing and do not detract from the overall high quality and merit of your work. Congratulations on an excellent study. I am confident it will be of great interest to the readership of Agronomy.

Author Response

Minor Suggestions for Improvement: While the manuscript is already of a very high standard, the following minor suggestions may help to further refine it:

In the Discussion section: When discussing the structural model from AlphaFold 3 (Figure 6), you use the word "elucidated" to describe the mechanism. While the model is highly informative, it is still a prediction. You might consider slightly softening the language to something like "provides a predictive model for" or "suggests a molecular mechanism by which..." to more precisely reflect the in silico nature of the finding. This is a minor semantic point, as the inference is strong, but it can add a layer of technical precision.

Response: Thank you very much for your time and effort in reviewing our manuscript. We sincerely appreciate your constructive suggestions, which have helped us further refine the technical precision and improve the overall quality of our study. We have carefully addressed all the minor comments provided. We agree that precision in language is paramount, especially when differentiating between experimental data and predictive modeling. Following your recommendation, we have updated the text in Section 4.3 to soften the language. We have changed "elucidated the molecular basis of this constraint" to "provides a predictive model for the molecular basis of this constraint" to more accurately reflect that this mechanism is derived from in silico modeling (line 742). Additionally, we have reviewed the entire Discussion section to ensure that descriptions related to AlphaFold 3 structural modeling are presented as predictive hypotheses rather than definitive experimental conclusions.

In the Introduction: The introduction is very effective. To make it even more compelling, you could consider adding a single sentence that briefly touches upon why post-transcriptional regulation is so critical specifically for complex processes like fleshy fruit development and ripening (e.g., mentioning its role in mediating hormonal signals or ensuring precise temporal control). This could strengthen the rationale for studying RNA-binding proteins in tomato even further.

Response: This is an excellent point that significantly strengthens the rationale of our study. We have added a sentence in the Introduction to emphasize the role of post-transcriptional regulation in fruit development. We inserted the following sentence: "precisely orchestrating this complex process requires post-transcriptional regulation to modulate ripening-related hormonal signals and ensure strict temporal control over transcriptome reprogramming." (lines 89-91)

We believe these minor revisions have enhanced the precision and impact of our manuscript. Thank you again for your time and effort in reviewing our work.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

Although there are multiple plant papers with similar strategies on other crops, this manuscript has some novelty that makes it publishable. While KH-domain proteins have been extensively studied in Arabidopsis and rice, direct functional links between KH family members and tomato fruit size are scarce. Therefore, the association of SlKH47, SlKH43, and SlKH35 haplotypes with fruit weight represents a potentially novel contribution. However, the current evidence remains correlative and would benefit from functional validation or population structure-controlled association analysis.

The study presents intriguing correlations between SlKH haplotypes and fruit weight. However, functional validation through reverse genetics approaches (e.g., CRISPR knockout, RNAi, or overexpression lines) would be necessary to establish causality. The current conclusions should therefore be presented more cautiously as predictive or hypothesis-generating, and the language in the Discussion should be toned down accordingly.

Comments

Name of species used

In lines 105-106

“The complete genome, protein sequences, and general genome feature files of Solanum lycopersicum, Oryza sativa, Arabidopsis thaliana, Solanum tuberosum, and Vitis vinifera were downloaded…” But the later phylogeny uses C. sativa instead of S. tuberosum.

In lines 162-163

The five species (S. lycopersicum, A. thaliana, V. vinifera, C. sativa, and O. sativa).

Number of genes used for qPCR? Please clarify.

In Line 216

“Relative Expression of 8 KH family genes…”

In lines 226-227

The text refers to specific primers for 10 SlKH genes.

Furthermore, lines 247-248 refer to RNA-binding motifs for 12 SlKH genes. Please check the exact number of SlKH genes studied (8, 10, or 12).

Intron use?

In lines 185-187

“filtered to retain sites located in the UTR and CDS regions, while intronic and intergenic variants were excluded.”

Later, in lines 753-756

The text claims that SlKH43 exhibits a signature of selection enriched in intronic regions…

Correlation value

In lines 570-574 … SlKH29 (R = -0.79), but the Discussion states in line 672 SlKH29 (R = -0.98).

In lines 357-358

It is stated that SlKH5 (Chr2) – SlKH22 (Chr2) are segmental duplicates. If these genes are near each other, this may be a tandem duplication rather than a segmental one.

Ka/Ks interpretation overstated

The authors should reconsider whether this interpretation is overstated (lines 360-363, 440-444, and 706-710). If the claim is retained, alignment filtering or reliability checks should also be reported.

Grammar errors

Line 121, conserved domain, line 197 visualizations, line 371 blue, line 147 were excluded, line 248 direct experimental evidence, line 594 investigation.

The supplementary figure.

In the method section, you are not required to put the figure caption; just state the figure number.

Tone down

Lines 623-629, 728-737, 778-779. These imply an experimentally validated atomic mechanism, but the study uses AF3 prediction only.

Figure layout correction

Many figures do not display properly in the PDF version; I hope this will be corrected in the final submitted version. Please check all of these figures and lay them out properly: Figures 3A, 3B, 4B, 10A, and 10B.

Detailed method explanation

Please provide the exact genome assembly versions/release numbers, the exact parameters for MEME (motif width range, number of motifs, background model), and the criteria for calling “segmental duplication” versus other duplication types in MCScanX.

References

Line 927, DOI appears inconsistent with the journal.

Some DOIs appear duplicated/mismatched (e.g., one citation shows an IJMS DOI attached to an unrelated paper). This should be checked carefully by the authors.

Inconsistent DOI formatting (“DOI:https://DOI.org/…”).

Author Response

Comment 1. Name of species used.

In lines 105-106: “The complete genome, protein sequences, and general genome feature files of Solanum lycopersicum, Oryza sativa, Arabidopsis thaliana, Solanum tuberosum, and Vitis vinifera were downloaded…” But the later phylogeny uses C. sativa instead of S. tuberosum.

Response: We sincerely apologize for this inconsistency. In the early stages of this study, potato (Solanum tuberosum) was initially included for comparative analysis. However, to align the evolutionary study with the downstream RNA-binding motif analysis (which utilized specific empirical data available for Cannabis sativa in the CISBP-RNA database), we subsequently replaced S. tuberosum with C. sativa. Unfortunately, a reference to S. tuberosum was inadvertently left in the Methods section. We have now corrected this in the revised manuscript (Lines 108-112).

Comment 2. In lines 162-163. The five species (S. lycopersicum, A. thaliana, V. vinifera, C. sativa, and O. sativa). Number of genes used for qPCR? Please clarify.

Response: We thank the reviewer for pointing out this lack of clarity. We have revised the manuscript to clearly distinguish between the species used for comparative genomics and the specific genes used for expression analysis. The five species mentioned (S. lycopersicum, A. thaliana, V. vinifera, C. sativa, and O. sativa) were used for phylogenetic and comparative synteny analyses to understand the evolution of the KH gene family. For the RT-qPCR expression profiling, we specifically focused on 10 SlKH genes from S. lycopersicum.

Comment 3. In line 216: “Relative Expression of 8 KH family genes…” In lines 226-227, the text refers to specific primers for 10 SlKH genes. Furthermore, lines 247-248 refer to RNA-binding motifs for 12 SlKH genes. Please check the exact number of SlKH genes studied (8, 10, or 12).

Response: We apologize for the clerical error in the sub-heading of Section 2.9. We have corrected the manuscript to ensure consistency in the gene numbers used for each analysis. A total of 10 SlKH genes were selected for RT-qPCR validation. We have corrected the title of Section 2.9 (line 249). For the RNA-binding motif analysis (Section 2.10), we utilized 12 SlKH proteins for which "Inferred Motif Evidence" was available in the CISBP-RNA database. The remaining members of the family were excluded from this specific analysis as they lacked PWM models in the database.

Comment 4. Intron use?

In lines 185-187: “filtered to retain sites located in the UTR and CDS regions, while intronic and intergenic variants were excluded.” Later, in lines 753-756, the text claims that SlKH43 exhibits a signature of selection enriched in intronic regions…

Response: The intronic regions were indeed used for the genome-wide selection analysis, but excluded for haplotype-phenotype association analysis. We have clarified this distinction in the Methods section (Lines 204-207) to remove the contradiction.

Comment 5. Correlation value.

In lines 570-574 … SlKH29 (R = -0.79), but the Discussion states in line 672 SlKH29 (R = -0.98). In lines 357-358, it is stated that SlKH5 (Chr2) – SlKH22 (Chr2) are segmental duplicates. If these genes are near each other, this may be a tandem duplication rather than a segmental one.

Response: We apologize for the inconsistency in reporting the correlation coefficient (R) for SlKH29. We have re-checked our data analysis pipeline, and the correct value is -0.99. We have updated Section 3.8 (line 589) and removed the value from the Discussion to ensure consistency throughout the manuscript.

We appreciate the reviewer's guidance in improving the accuracy of the genomic analysis. Based on the physical genomic coordinates obtained from the GFF3 annotation file, we determined the physical distance and the number of intervening genes between each pair. Pair 1 (SlKH5/SlKH22, Solyc02g067210 / Solyc02g088117) is located on Chromosome 2, separated by approximately 12.98 Mb and 1672 intervening genes. Pair 2 (SlKH19/SlKH20, Solyc10g078750 / Solyc10g080480) is located on Chromosome 10, separated by approximately 1.25 Mb and 170 intervening genes. According to the classification criteria established by Cannon et al. (2004), tandem duplications are typically defined as genes located in close proximity, generally separated by no more than five intervening genes. Given the large physical distances and the high number of intervening genes, these pairs are more accurately classified as segmental duplications or proximal duplications, not tandem duplications. We have added the reference to Section 3.3 (line 377).

Related reference:
Cannon, S.B.; Mitra, A.; Baumgarten, A.; Young, N.D.; May, G. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 2004, 4, 10. DOI: 10.1186/1471-2229-4-10.
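The classification reasoning in this response can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function name, the `max_intervening` parameter, and the simple tandem versus non-tandem split are assumptions made here, while the intervening-gene counts are the ones reported in the response.

```python
def classify_duplicate_pair(intervening_genes: int, max_intervening: int = 5) -> str:
    """Label a same-chromosome paralog pair by the number of genes between its members.

    Assumption: pairs separated by at most `max_intervening` genes are called
    tandem (the criterion attributed to Cannon et al. 2004 in the response).
    Everything else is left as "segmental or proximal", because telling those
    two apart needs collinearity information that this sketch does not model.
    """
    if intervening_genes < 0:
        raise ValueError("intervening_genes must be non-negative")
    if intervening_genes <= max_intervening:
        return "tandem"
    return "segmental or proximal"


# Intervening-gene counts as reported in the author response.
pairs = {
    "SlKH5/SlKH22": 1672,
    "SlKH19/SlKH20": 170,
}

labels = {name: classify_duplicate_pair(count) for name, count in pairs.items()}
```

Under this threshold, neither pair qualifies as tandem, which matches the conclusion drawn in the response. Separating segmental from proximal duplicates would additionally require the collinear-block evidence that the authors describe in their Methods.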

Comment 6. Ka/Ks interpretation overstated.

Response: We have revised the text in the Discussion to adopt a more cautious tone, acknowledging the limitations of global Ka/Ks ratios, especially when sequence divergence is low or when the signal is driven by only a few sites. Specifically, we have clarified that even if localized Ka/Ks values within structural domains exceed 1, these signals may still be overestimated due to methodological limitations or insufficient sequence divergence (Lines 735-738).

Comment 7. Grammar errors

Line 121, conserved domain, line 197 visualizations, line 371 blue, line 147 were excluded, line 248 direct experimental evidence, line 594 investigation.

Response: Thank you for pointing out these grammatical errors and typos. We have carefully reviewed the manuscript and corrected all the mentioned issues (Lines 121 to 127; 197 to 232; 371 to 395, “blue” was changed to “highlighted”; 147 to 155; 248 to 273, and 594 to 612) in the revised version.

Comment 8. The supplementary figure.

In the method section, you are not required to put the figure caption; just state the figure number.

Response: We sincerely appreciate the reviewer’s reminder regarding the formatting of supplementary figure citations. We have removed the detailed captions and extensive descriptions of supplementary files from this section. All supplementary data are now cited only by their respective numbers (e.g., Table S1, Figure S1) to maintain conciseness.

Comment 9. Tone down

Line 623-629, 728-737, 778-779. These imply an experimentally validated atomic mechanism, but the study uses AF3 prediction only.

Response: We sincerely apologize for the overstatement in our original manuscript regarding the AlphaFold 3 results. We agree that AF3 provides powerful predictive insights, but not direct experimental proof of the atomic mechanism. To address this, we have extensively revised the Results section (Section 3.10, lines 633-651) and the Discussion section (Section 4.3, lines 740-777). We have replaced assertive terms like "determinants" and "dissect the molecular mechanics" with cautious, predictive terminology such as "predictive model," "putative," and "suggests," to accurately reflect the in-silico nature of these findings.

Comment 10. Figure layout correction

Many figures are cut off in the PDF version; I hope this will be corrected in the final submitted version. Please check all of these figures and lay them out properly: Figures 3A, 3B, 4B, 10A, 10B.

Response: We apologize for the formatting issues in the PDF version of the manuscript. We have thoroughly reviewed all figures, and have adjusted their layout and resolution to ensure they are properly displayed and fully visible in the final submission.

Comment 11. Detail method explanation.

Response: We thank the reviewer for pointing out these missing details. We have comprehensively updated the Methods section to include all the requested methodological parameters to ensure full reproducibility. We have added the exact version/release numbers for all species used (Solanum lycopersicum SL 3.0, Oryza sativa IRGSP-1.0, Arabidopsis thaliana TAIR10, and Vitis vinifera ASM3070453v1 from Ensembl Plants). For Cannabis sativa, we utilized the Purple Kush genome assembly from CannabisGDB rather than Ensembl Plants. This choice was made to ensure direct consistency with the protein sequence data retrieved from the CISBP-RNA database, which serves as the primary reference for the RNA-binding motif analysis and is based on this specific assembly. Using the same genomic resource across both motif scanning and phylogenetic analyses ensures the reliability and traceability of the comparative genomic data (lines 109-112). Additionally, we have explicitly stated the parameters used, including the motif width range (6 to 50 amino acids), the number of motifs (10), and the background model, a zero-order Markov background model derived from the input dataset (lines 132-136). We also added the specific criteria used to differentiate duplication types, clarifying that tandem duplications were defined as homologous genes separated by fewer than five intervening genes on the same chromosome, while segmental duplications were designated when homologous pairs were located within large collinear blocks containing at least five conserved gene pairs (lines 160-167).
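For readers unfamiliar with the term, a zero-order Markov background model derived from the input dataset amounts to the pooled residue frequencies of the input sequences, with no dependence on neighboring residues. The sketch below illustrates only that idea. The function name and the toy sequences are hypothetical, and the motif-discovery software's own implementation may differ (for example, by adding pseudocounts).

```python
from collections import Counter


def zero_order_background(sequences):
    """Return residue frequencies pooled over all input sequences.

    In a zero-order model each residue is treated as independent of its
    neighbors, so the model is fully described by these frequencies.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues in input")
    return {residue: n / total for residue, n in counts.items()}


# Hypothetical toy peptides, not taken from the manuscript.
toy_sequences = ["GKGG", "GAKG"]
background = zero_order_background(toy_sequences)
```

Because the frequencies come from the input set itself, a residue that is common across the whole family contributes less to a motif's score than it would against a uniform background.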

Comment 12. References

Line 927, DOI appears inconsistent with the journal.

Some DOIs appear duplicated/mismatched (e.g., one citation shows an IJMS DOI attached to an unrelated paper). This should be checked carefully by the authors.

Inconsistent DOI formatting (“DOI:https://DOI.org/…”).

Response: All references have now been carefully reviewed and corrected where necessary. We appreciate the reviewer's attention to this detail, which has helped improve the accuracy and professionalism of our manuscript.

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

The manuscript addresses a relevant topic within the field of agronomy and presents data that may contribute to current knowledge. The objectives are clearly stated, and the study design is generally appropriate. However, several aspects of the manuscript would benefit from clarification and refinement to improve its overall quality and impact.

Strengthen the Introduction.
While the background is adequate, the research gap should be more explicitly articulated. Incorporating additional recent references (last 3–5 years) would better contextualize the study within current developments in the field.

Clarify Methodological Details. Certain experimental procedures require additional detail to ensure reproducibility. Please clarify replication strategy, statistical analyses (including assumptions and software used), and any parameter settings critical to data interpretation.

Improve Presentation of Results. The results are well organized; however, interpretation occasionally overlaps with data presentation. Separating descriptive results from discussion will enhance clarity. Consider streamlining repetitive statements.

Enhance Figures and Tables: Some figures and tables would benefit from more descriptive captions that allow them to be understood independently of the main text. Ensure all abbreviations are defined and units are consistently reported.

Expand the Discussion. The discussion should more clearly compare findings with previously published studies and highlight the practical or agronomic implications of the results. Including a brief acknowledgment of study limitations would strengthen the conclusions.

Language Revision: The manuscript would benefit from careful English language editing to correct grammatical inconsistencies and improve readability.

Overall, the study shows merit and potential relevance. Addressing the points above will significantly improve clarity, rigor, and presentation quality.

Comments on the Quality of English Language

The manuscript is generally understandable; however, several grammatical inconsistencies, awkward sentence constructions, and minor typographical errors are present throughout the text. In some sections, long and complex sentences reduce clarity and readability. Additionally, inconsistencies in verb tense and article usage were observed.

It is recommended that the manuscript undergo careful English language editing, preferably by a fluent English speaker or a professional editing service, to improve clarity, flow, and overall readability. Addressing these language issues will enhance the scientific communication and strengthen the presentation of the work.

Author Response

Comment 1. Strengthen the Introduction. While the background is adequate, the research gap should be more explicitly articulated. Incorporating additional recent references (last 3–5 years) would better contextualize the study within current developments in the field.

Response: We sincerely appreciate this constructive feedback. We have carefully revised the Introduction to more explicitly articulate the research gap. Specifically, in the fourth paragraph of the revised manuscript, we highlighted that while KH-domain proteins have been studied in model plants, direct functional and genetic links between KH family members and agronomically important traits (such as tomato fruit weight and ripening) remain remarkably scarce (lines 77-78). Furthermore, to better contextualize our study within current developments, we have incorporated new references. The additional citations cover recent pan-genomic advances in crop RNA-binding proteins and the latest applications of structural prediction technologies in plant biology, thereby strengthening the rationale for our integrated evolutionary and structural approach (lines 88-94).

Related references:
Rehman, S.; Bahadur, S.; Xia, W.; Runan, C.; Ali, M.; Maqbool, Z. From genes to traits: trends in RNA-binding proteins and their role in plant trait development: a review. Int. J. Biol. Macromol. 2024, 282, 136753. DOI: 10.1016/j.ijbiomac.2024.136753.

Bach-Pages, M.; Menon, A.; Wadley, B.; Castello, A.; Preston, G.M. RNA-binding proteins orchestrating immunity in plants. Plant J. 2025, 123, e70433. DOI: 10.1111/tpj.70433.

Lin, P.Y.; Huang, S.C.; Chen, K.L.; Huang, Y.C.; Liao, C.Y.; Lin, G.J.; Lee, H.; Chen, P.Y. Analysing protein complexes in plant science: insights and limitation with AlphaFold 3. Bot. Stud. 2025, 66, 14. DOI: 10.1186/s40529-025-00462-2.

Comment 2. Clarify Methodological Details. Certain experimental procedures require additional detail to ensure reproducibility. Please clarify replication strategy, statistical analyses (including assumptions and software used), and any parameter settings critical to data interpretation.

Response: Thank you for highlighting the importance of methodological transparency. We have comprehensively updated the Materials and Methods section to ensure full reproducibility, addressing this alongside similar concerns raised by other reviewers. Specifically, we detailed the statistical analyses, including exact parameter settings used for software and algorithms, such as the motif width range and background model for MEME (lines 132-136), and the specific criteria used to differentiate duplication types in MCScanX (lines 160-167).

These additions ensure that all parameters critical to data interpretation are clearly documented.

Comment 3. Improve Presentation of Results. The results are well organized; however, interpretation occasionally overlaps with data presentation. Separating descriptive results from discussion will enhance clarity. Consider streamlining repetitive statements.

Response: We thank the reviewer for this suggestion, which has greatly helped us enhance the clarity of the manuscript. We have meticulously reviewed the manuscript to ensure that the Results section remains strictly descriptive. Speculative interpretations, comparisons with previous studies, and detailed mechanisms of selection that were previously included in the Results section have been removed.

We have also streamlined the Discussion section by removing repetitive descriptions of specific data points that were already fully detailed in the Results. The Discussion now focuses solely on interpreting the biological significance and context of our findings (e.g., lines 684-688, 699-704, and 714-717).

Comment 4. Enhance Figures and Tables: Some figures and tables would benefit from more descriptive captions that allow them to be understood independently of the main text. Ensure all abbreviations are defined and units are consistently reported.

Response: We have optimized all figures and tables in the revised manuscript. Following your advice and similar feedback from other reviewers, we have significantly expanded the figure legends (e.g., Figures 3 and 5) to provide more detailed explanations of color scales, symbols, and abbreviations, ensuring they can be understood independently of the main text. We also redesigned dense figures for better readability (e.g., extracting the cis-element visualization to Figure S2), and verified all abbreviations and numerical labels for consistency.

Comment 5. Expand the Discussion. The discussion should more clearly compare findings with previously published studies and highlight the practical or agronomic implications of the results. Including a brief acknowledgment of study limitations would strengthen the conclusions.

Response: In Section 4.1, we now compare our proposed cytoplasmic role for SlKH proteins with recent functional evidence from tomato, where the RNA-binding protein SlRBP1 interacts with translation initiation factor SleIF4A2 to regulate photosynthetic gene expression (Ma et al., 2022). This supports our hypothesis that SlKH members may similarly couple nuclear splicing with cytoplasmic translational control (lines 676-680). In Section 4.2, we contextualize our finding that Motif 1 serves as an evolutionary anchor by referencing a recent structural analysis of over 40 KH domain-nucleic acid complexes, which defined a conserved Helix-clasp-Helix-Strand-Loop (HcH-SL) recognition motif centered on the GXXG clasp (Tainer & Tsutakawa, 2025). This provides structural validation for the ultra-conservation of Motif 1 observed across angiosperms in our study (lines 720-724). In Section 4.4, we now link our observation of multi-level selection signatures (UTRs, introns, and CDS) to the broader regulatory networks governing tomato fruit development, where transcription factors, regulatory RNAs, and epigenetic modifiers coordinately control flowering and fruit set (Shchennikova et al., 2023). This places SlKH genes within an integrated model of domestication-driven regulatory fine-tuning (lines 775-781).

We have incorporated brief acknowledgments of study limitations, particularly regarding the need for future experimental validation of Ka/Ks-based positive selection signals (lines 736-739) and AF3 predictions (lines 751-754), and the need for larger population studies to confirm causal variants in the haplotype analysis (lines 802-805).

Comment 6. Language Revision: The manuscript would benefit from careful English language editing to correct grammatical inconsistencies and improve readability.

Response: We deeply appreciate the reviewer’s helpful suggestion regarding the language and readability of our manuscript. We have taken this comment seriously and carefully revised the entire text.

The manuscript has been extensively edited by an experienced native-level academic editor to correct grammatical inconsistencies, refine vocabulary, and polish the overall English expression. The language has been smoothed for clarity, flow, and structural coherence. We hope the revised manuscript now reads naturally and meets the high linguistic standards of the journal.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I appreciate the authors’ thorough and thoughtful responses to the previous round of comments. The newly added analyses and revisions substantially improve the rigor, clarity, and interpretative balance of the manuscript. In particular, the additional region-specific Ka/Ks analysis, the MLM-based population structure correction, and the clarification regarding isoform annotation significantly strengthen the study. The manuscript is now much clearer and more cautious in its claims.

The revisions to terminology, figure clarity, and language in the Discussion have improved readability. I appreciate the authors’ careful attention to detail in this revision.

Reviewer 4 Report

Comments and Suggestions for Authors

The authors have addressed all my comments. The manuscript is OK for publication.
