A Hybrid Analytical Framework for Cracking and Some Fruit Quality Features in Sweet Cherries
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The abstract should be restructured using a Background–Methods–Results–Conclusions format and should include numerical results.
Methodological Innovation: The integration of PCA (for dimensionality reduction), Random Forest (for nonlinear prediction), and SHAP (for interpretability) is a novel contribution to horticultural phenotyping. Highlight this triad in the Abstract and Introduction as a replicable pipeline for trait-dissection studies.
The research design is generally appropriate but requires supplementation of critical details. In the methodology section, exact models and manufacturers of instruments must be specified, along with their accuracy, to ensure reproducibility.
Measurement protocols for individual parameters should be described in detail rather than solely referencing prior studies. Criteria for harvest maturity (for example, color index, TSS threshold) and the methodology for determining the harvest stage must be clearly outlined.
Additionally, it is necessary to clarify how fruits were allocated into replicates and whether replicates contained comparable proportions of fruit by size. The randomized block design with three replications lacks clarity on fruit size distribution across replicates; it should be specified whether replicates contained equal proportions of small, medium, and large fruits to avoid sampling bias.
There is also a lack of information on calibration protocols, such as the frequency of penetrometer calibration using certified weights, which is critical for reproducibility. For all measuring instruments, the type and calibration procedures must be stated.
There is no justification for excluding field validation of cracking predictions.
Hypotheses are implied but not explicitly stated. For example, the hypothesis that biochemical traits are more important than physical traits in cracking prediction should be clearly formulated.
The use of excessive rhetorical questions distracts from scientific focus and should be avoided.
The term "full maturity" for fruit ripeness criteria is undefined; measurable parameters should be provided.
The study analyses data from 2020–2021 but does not justify differences in experimental protocols between years; it should be clarified whether environmental variations, such as precipitation in Table 1, influenced methodological adjustments.
The manuscript mentions climatic data but omits storage humidity levels during fruit sampling; specific humidity parameters during fruit storage and transport should be added to ensure reproducibility.
Results presentation is organized, but table and figure captions require revisions. For Table 3, the caption should be expanded to include interpretation of LSD (Least Significant Difference) and CV (Coefficient of Variation), and error margins for presented values should be specified.
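As a reference for such a caption, the conventional definitions are given below. They assume Fisher's LSD for a balanced design with r replicates (r = 3 in this study), an error mean square MS_e with df_e error degrees of freedom from the ANOVA, and a grand mean ȳ; the notation is ours, not the manuscript's.

```latex
\mathrm{LSD}_{\alpha} = t_{\alpha/2,\;df_e}\sqrt{\frac{2\,MS_e}{r}}, \qquad
\mathrm{CV}\,(\%) = \frac{\sqrt{MS_e}}{\bar{y}} \times 100
```

Two means differ at significance level α when their absolute difference exceeds LSD_α; CV expresses the experimental error as a percentage of the grand mean.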
In Figure 5 (SHAP Summary Plot), units for DPPH antioxidant capacity (e.g., mmol TE/g) are missing.
Statistical analyses are appropriate, but p-values and standard deviations should be consistently reported across all results.
The manuscript repeatedly emphasizes the relationship between total soluble solids (TSS) and cracking susceptibility without advancing to a mechanistic synthesis. For instance, while TSS is noted as a predictor of cracking risk, the biochemical basis (e.g., osmotic stress mechanisms) remains underdeveloped. Consolidate these sections to clarify whether TSS acts as a direct driver or a correlated variable.
The SHAP analysis assumes feature independence, which is unrealistic in biological systems where variables like fruit size and antioxidant levels are often correlated. Address how multicollinearity might distort SHAP values (e.g., overestimating DPPH’s impact due to its linkage to fruit weight). Include sensitivity analyses or variance inflation factors (VIF) to quantify these interactions.
The SHAP interpretation is tied to the specific Random Forest model used. Discuss whether similar results would emerge with alternative algorithms (e.g., gradient boosting) or datasets, and propose validation steps for broader applicability.
Link the hybrid framework’s findings to concrete breeding strategies.
Replace vague statements like "mutual effects" with specific outcomes.
Author Response
Comments 1:
The abstract should be restructured using a Background–Methods–Results–Conclusions format and should include numerical results.
Response 1:
The abstract has been revised according to the reviewer's suggestion, with an emphasis on background, methods, results, and conclusions.
Comments 2:
Methodological Innovation: The integration of PCA (for dimensionality reduction), Random Forest (for nonlinear prediction), and SHAP (for interpretability) is a novel contribution to horticultural phenotyping. Highlight this triad in the Abstract and Introduction as a replicable pipeline for trait-dissection studies.
Response 2:
We added the following sentences to the Abstract:
A hybrid analytical pipeline was developed by integrating Principal Component Analysis (PCA) for dimensionality reduction, Random Forest regression for nonlinear prediction, and Shapley Additive Explanations (SHAP) for interpretability. This triad offers a robust and replicable framework for trait-dissection studies in horticultural phenotyping, enabling deeper insights into complex trait interactions.
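A minimal sketch of the PCA step in such a pipeline, written with NumPy only. It assumes traits are z-score standardized before decomposition (a common preprocessing choice, not confirmed by the text) and uses a synthetic trait matrix rather than the study's data; the Random Forest and SHAP stages are not shown.

```python
import numpy as np

def pca_explained_variance(X):
    """Return the explained-variance ratio of each principal component.

    X is an (n_samples, n_traits) array. Columns are z-score standardized
    first, so traits measured in different units contribute comparably.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Singular values of the standardized matrix give the component variances,
    # returned by NumPy in descending order.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s**2 / (Z.shape[0] - 1)
    return var / var.sum()

rng = np.random.default_rng(0)
base = rng.normal(size=(30, 1))
# Hypothetical traits: three share a common factor, two are independent noise.
X = np.hstack([base + 0.3 * rng.normal(size=(30, 1)) for _ in range(3)]
              + [rng.normal(size=(30, 2))])
ratios = pca_explained_variance(X)
print(np.round(ratios, 3))
```

Standardizing first keeps a trait with a large numeric range (e.g., fruit weight in grams) from dominating the components simply because of its units.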
We also revised the Introduction to explicitly emphasize the methodological innovation combining PCA, Random Forest, and SHAP. This triad is now clearly presented as a replicable and interpretable pipeline for trait-dissection studies in horticultural phenotyping.
Comments 3:
The research design is generally appropriate but requires supplementation of critical details. In the methodology section, exact models and manufacturers of instruments must be specified, along with their accuracy, to ensure reproducibility.
Response 3:
According to the reviewer’s suggestion, the basic specifications, brands, and accuracy levels of the instruments used have been included in the Methods section.
Comments 4:
Measurement protocols for individual parameters should be described in detail rather than solely referencing prior studies. Criteria for harvest maturity (for example, color index, TSS threshold) and the methodology for determining the harvest stage must be clearly outlined.
Response 4:
We would like to express our sincere thanks to the reviewer for this specific comment, as it provided us with an opportunity to address a gap in the clarity of our manuscript. The harvest was carried out when each cultivar reached its characteristic color and TSS content values. This information has also been included in the Methods section. As with other revisions, this addition has been highlighted in red within the text.
Comments 5:
Additionally, it is necessary to clarify how fruits were allocated into replicates and whether replicates contained comparable proportions of fruit by size. The randomized block design with three replications lacks clarity on fruit size distribution across replicates; it should be specified whether replicates contained equal proportions of small, medium, and large fruits to avoid sampling bias.
Response 5:
To ensure homogeneity among the fruits in the experiment, an equal number of small, medium, and large fruits were used. This detail has been included in the Methods section, and the relevant text has been highlighted in red.
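The allocation described in Response 5 can be sketched as a stratified split: fruits are first graded into size classes and then dealt out at random so that every replicate receives the same number from each class. The class labels, the 30 fruits per class, and the function name `allocate_by_size` are hypothetical; the response does not state the actual counts or grading thresholds.

```python
import random
from collections import Counter

def allocate_by_size(fruits, n_reps=3, seed=42):
    """Deal fruits into n_reps replicates with equal counts per size class.

    fruits: list of (fruit_id, size_class) tuples. Each class count must be
    divisible by n_reps so that every replicate gets the same number.
    """
    rng = random.Random(seed)
    by_class = {}
    for fruit_id, size in fruits:
        by_class.setdefault(size, []).append(fruit_id)
    reps = [[] for _ in range(n_reps)]
    for size, ids in sorted(by_class.items()):
        if len(ids) % n_reps:
            raise ValueError(f"{len(ids)} {size} fruits cannot be split into {n_reps} equal parts")
        rng.shuffle(ids)  # random order within the class
        for i, fruit_id in enumerate(ids):
            reps[i % n_reps].append((fruit_id, size))  # round-robin keeps counts equal
    return reps

# Hypothetical batch: 30 fruits in each size class.
fruits = [(f"{size}-{i}", size) for size in ("small", "medium", "large") for i in range(30)]
reps = allocate_by_size(fruits)
for r, rep in enumerate(reps, start=1):
    print(f"Replicate {r}:", dict(Counter(size for _, size in rep)))
```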
Comments 6:
There is also a lack of information on calibration protocols, such as the frequency of penetrometer calibration using certified weights, which is critical for reproducibility. For all measuring instruments, the type and calibration procedures must be stated.
Response 6:
The instruments used in the analyses were calibrated according to the calibration protocols provided by the manufacturers. These calibration protocols have been specified in the Methods section.
Comments 7:
There is no justification for excluding field validation of cracking predictions.
Response 7:
The local genotypes included in the experiment are relatively new, and studies on their field cracking performance are still ongoing. However, the field cracking performance of the standard cultivars has been provided in parentheses and highlighted in red in the Materials section.
Comments 8:
The term "full maturity" for fruit ripeness criteria is undefined; measurable parameters should be provided.
Response 8:
As stated above, full fruit maturity was defined based on cultivar-specific color and cultivar-specific TSS values.
Comments 9:
The study analyses data from 2020–2021 but does not justify differences in experimental protocols between years; it should be clarified whether environmental variations, such as precipitation in Table 1, influenced methodological adjustments.
Response 9:
Because the data from the two years were similar, two-year averages are presented. The climatic conditions of both years were also comparable.
Comments 10:
The manuscript mentions climatic data but omits storage humidity levels during fruit sampling; specific humidity parameters during fruit storage and transport should be added to ensure reproducibility.
Response 10:
Fruit characteristics of fresh fruits were determined under laboratory conditions (20 ± 1 °C), and the cracking index was measured under the same conditions. For biochemical measurements, samples were transported to Istanbul at -80 °C, where the analyses were performed under laboratory conditions. This information has been included in the Methods section and is highlighted in red.
Comments 11:
Results presentation is organized, but table and figure captions require revisions. For Table 3, the caption should be expanded to include interpretation of LSD (Least Significant Difference) and CV (Coefficient of Variation), and error margins for presented values should be specified.
Response 11:
We agree with the reviewer’s suggestions, and they have been implemented in the tables.
Comments 12:
In Figure 5 (SHAP Summary Plot), units for DPPH antioxidant capacity (e.g., mmol TE/g) are missing.
Response 12:
The necessary revisions have been made in accordance with the reviewer’s suggestion.
Comments 13:
The manuscript repeatedly emphasizes the relationship between total soluble solids (TSS) and cracking susceptibility without advancing to a mechanistic synthesis. For instance, while TSS is noted as a predictor of cracking risk, the biochemical basis (e.g., osmotic stress mechanisms) remains underdeveloped. Consolidate these sections to clarify whether TSS acts as a direct driver or a correlated variable.
Response 13:
New text, highlighted in red, has been added to the part of the Discussion that explains the relationship between cracking and TSS.
Comments 14:
The SHAP analysis assumes feature independence, which is unrealistic in biological systems where variables like fruit size and antioxidant levels are often correlated. Address how multicollinearity might distort SHAP values (e.g., overestimating DPPH’s impact due to its linkage to fruit weight). Include sensitivity analyses or variance inflation factors (VIF) to quantify these interactions.
Response 14:
Thank you for highlighting the issue regarding the assumption of feature independence in SHAP analysis. In response, we conducted a multicollinearity diagnostic using Variance Inflation Factors (VIF) to evaluate the degree of interdependence among predictor variables. The results revealed substantial multicollinearity among biochemical traits, especially among DPPH, anthocyanin, flavonoids, and phenolics (VIF > 50). Based on this finding, we revised the Discussion section to explicitly address how shared variance may lead to overestimation of individual feature contributions in SHAP outputs. The following paragraph has been added to the Discussion section to clarify this point.
Since SHAP analysis assumes feature independence, we conducted a multicollinearity diagnostic using Variance Inflation Factors (VIF). Results indicated high collinearity among biochemical traits, particularly among DPPH, anthocyanin, flavonoids, and phenolics (VIF > 50). This suggests that SHAP importance scores may overrepresent individual biochemical contributions due to shared variance. Therefore, interpretability results should be viewed in light of these interdependencies.
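The diagnostic described above can be sketched as follows, using the textbook definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing trait j on all other traits. The simulated traits (three biochemical variables driven by a shared factor plus one unrelated physical trait) are hypothetical and only illustrate the pattern; values above about 10 are conventionally read as serious collinearity. This is not necessarily the software the authors used.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples, n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares
    regression (with intercept) of feature j on all other features.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centred = y - y.mean()
        r2 = 1 - (resid @ resid) / (centred @ centred)
        out[j] = 1 / (1 - r2)
    return out

rng = np.random.default_rng(1)
n = 50
phenolics = rng.normal(size=n)
anthocyanin = phenolics + 0.1 * rng.normal(size=n)   # shares variance with phenolics
dpph = 0.5 * phenolics + 0.1 * rng.normal(size=n)    # shares variance with phenolics
stalk_thickness = rng.normal(size=n)                 # unrelated to the others
X = np.column_stack([phenolics, anthocyanin, dpph, stalk_thickness])
v = vif(X)
for name, value in zip(["phenolics", "anthocyanin", "DPPH", "stalk_thickness"], v):
    print(f"{name:16s} VIF = {value:6.1f}")
```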
Comments 15:
The SHAP interpretation is tied to the specific Random Forest model used. Discuss whether similar results would emerge with alternative algorithms (e.g., gradient boosting) or datasets, and propose validation steps for broader applicability.
Response 15:
Thank you for this important observation. We acknowledge that SHAP interpretations are inherently tied to the specific model architecture. To assess the robustness and generalizability of our findings, we plan to extend our analysis using alternative algorithms such as Gradient Boosting Machines (GBM) and XGBoost. Additionally, validating the analytical framework on independent datasets or across different cultivars will be a key focus of future work. This strategy will help determine whether the identified feature contributions are stable across models and biological contexts. A note to this effect has been added to the Discussion section.
It is important to note that SHAP-based interpretability is inherently model-specific. While Random Forest served as the core algorithm in this study, future analyses should include additional ensemble models such as Gradient Boosting and XGBoost to assess the consistency of trait importance rankings. Moreover, applying the hybrid framework to external datasets or other sweet cherry cultivars would further test the robustness and generalizability of the observed relationships. Such validation steps are critical for translating the findings into widely applicable breeding strategies.
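One way to make the proposed cross-model check concrete is to compare trait-importance rankings, such as mean absolute SHAP values from Random Forest and from a gradient-boosting model, with Spearman's rank correlation. The importance values below are invented for illustration and are not results from the study; a coefficient near 1 would indicate that the two models rank traits consistently.

```python
def ranks(values):
    """Return 1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

# Hypothetical mean |SHAP| values per trait from two models.
rf = {"DPPH": 2.1, "anthocyanin": 1.8, "stalk_thickness": 1.5, "TSS": 1.1, "fruit_weight": 0.4}
gb = {"DPPH": 1.9, "anthocyanin": 2.0, "stalk_thickness": 1.2, "TSS": 1.0, "fruit_weight": 0.5}
traits = list(rf)
rho = spearman([rf[t] for t in traits], [gb[t] for t in traits])
print(f"Spearman rho between importance rankings: {rho:.2f}")
```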
Comments 16:
Link the hybrid framework’s findings to concrete breeding strategies.
Response 16:
In response to this critique, a new paragraph has been added to the final section of the Results, and the text has been highlighted in red.
Comments 17:
Replace vague statements like "mutual effects" with specific outcomes.
Response 17:
Thank you for this valuable suggestion. In response, we have added a new paragraph at the end of the Results section, which clearly links the findings of the hybrid analytical framework to concrete breeding strategies. Specifically, we highlight how the identification of genotypes combining high antioxidant capacity (e.g., DPPH) with low cracking susceptibility can inform selection decisions. Additionally, vague expressions such as “mutual effects” have been revised and replaced with specific trait interactions and measurable outcomes. The revised text has been added to the Discussion section of the manuscript for your review.
These findings have direct implications for breeding programs aiming to reduce cracking susceptibility while enhancing fruit quality. For instance, genotypes exhibiting both high DPPH antioxidant activity (>13.5 µmol TE/g) and low cracking index (<10%) represent promising candidates for cultivar development. This dual selection criterion enables breeders to target genotypes that combine physiological resilience with nutritional value. Moreover, identifying the dominant influence of stalk thickness and anthocyanin content on cracking risk provides actionable insights for parent selection and cross-breeding strategies.
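The dual selection criterion stated above translates directly into a filter. The thresholds (DPPH > 13.5 µmol TE/g, cracking index < 10%) come from the text; the genotype names, their values, and the helper `shortlist` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Genotype:
    name: str
    dpph: float            # DPPH antioxidant activity, µmol TE/g
    cracking_index: float  # cracking index, %

def shortlist(genotypes, min_dpph=13.5, max_ci=10.0):
    """Return genotypes with DPPH > min_dpph and cracking index < max_ci,
    highest antioxidant activity first."""
    keep = [g for g in genotypes if g.dpph > min_dpph and g.cracking_index < max_ci]
    return sorted(keep, key=lambda g: g.dpph, reverse=True)

# Hypothetical candidates, not data from the study.
candidates = [
    Genotype("G1", dpph=14.2, cracking_index=6.5),
    Genotype("G2", dpph=15.0, cracking_index=18.0),  # high DPPH but cracks too easily
    Genotype("G3", dpph=12.9, cracking_index=4.0),   # resistant but below DPPH threshold
    Genotype("G4", dpph=13.8, cracking_index=9.1),
]
selected = shortlist(candidates)
print([g.name for g in selected])
```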
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This manuscript presents relevant and promising research that uses modern analytical tools to tackle a complex agronomic problem. However, it requires a more thorough description of methods, model validation, and limitations, along with improvements in English language clarity and technical accuracy. Upon revision, the paper has the potential to make a valuable contribution to the literature on cherry breeding and postharvest quality analysis.
Below are my more detailed comments:
The Introduction is somewhat disorganized. The issue of fruit cracking is introduced without prior context or explanation, making it feel disconnected at first. The literature review on the causes of cracking (e.g., environmental, anatomical, biochemical) is minimal. I suggest reorganizing the flow of the Introduction by clearly defining the problem of fruit cracking, summarizing current research, and then justifying the hybrid analytical approach.
In the Methodology section there are many unknowns. The number of genotypes and the sample size per genotype are unclear. The RF methodology is underspecified: key parameters (e.g., number of trees, maximum depth), model performance metrics (e.g., R², RMSE), and the validation method (e.g., cross-validation) are missing. The calculation of the cracking index is not explained; readers unfamiliar with the referenced method would benefit from at least a formula or a short explanation. You should provide more detail on the RF implementation and the cracking index computation, and include any preprocessing steps such as data normalization. Additionally, information on the analysis of phenols, flavonoids, anthocyanins, and antioxidant activity (DPPH) is lacking; only literature references are given.
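For readers unfamiliar with the validation step the reviewer mentions, a minimal k-fold split is sketched below in plain Python: each observation lands in exactly one test fold, and the model (a Random Forest in this study, not shown here) would be refit on the remaining folds and scored on the held-out one. The helper name `kfold_indices`, the sample size of 23, and k = 5 are arbitrary illustration choices.

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test

for fold, (train, test) in enumerate(kfold_indices(23, k=5), start=1):
    print(f"fold {fold}: {len(train)} train / {len(test)} test")
```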
Results: reporting lacks precision. For example, exact values for correlation coefficients, variable importance ranks, and explained variance in PCA are not always provided. There is no information on the predictive performance of the RF model (e.g., how well the model explains variance in the cracking index). Including quantitative model performance metrics (e.g., R², MAE) and correcting the figure captions will enhance clarity.
The Discussion section lacks critical reflection on the limitations of the study, such as its geographic or temporal constraints and the potential for overfitting. I suggest adding more explanation of the mechanistic role of antioxidants in reducing cracking (e.g., effects on cell wall integrity, water uptake regulation). It would also be possible to discuss in more detail why antioxidants protect against cracking (e.g., by stabilizing the cell wall structure). Have similar relationships been observed in other fruit species or reported in the literature more generally?
Formatting and Graphics: Ensure that all figures (Figs. 2–6) and tables are legible and appropriately labeled. For example, Figs. 5 and 6 should have clear legends and units. Some graphics are barely legible (at least to me).
There are occasional grammatical issues and awkward phrasing throughout the text that detract from its readability.
Here are some examples: “how consumer wants cherries”, "It has was revealed".
I recommend careful proofreading.
Author Response
Comments 1: This manuscript presents relevant and promising research that uses modern analytical tools to tackle a complex agronomic problem. However, it requires a more thorough description of methods, model validation, and limitations, along with improvements in English language clarity and technical accuracy. Upon revision, the paper has the potential to make a valuable contribution to the literature on cherry breeding and postharvest quality analysis.
Response 1: We sincerely thank the reviewer for their valuable comments. The manuscript has been revised for language clarity and checked using the Grammarly program.
Comments 2: Below are my more detailed comments: The Introduction is somewhat disorganized. The issue of fruit cracking is introduced without prior context or explanation, making it feel disconnected at first. The literature review on the causes of cracking (e.g., environmental, anatomical, biochemical) is minimal. I suggest reorganizing the flow of the Introduction by clearly defining the problem of fruit cracking, summarizing current research, and then justifying the hybrid analytical approach.
Response 2: A paragraph addressing fruit cracking, its importance, and underlying causes has been added to the Introduction section. The addition has been highlighted in red.
Comments 3: In the Methodology section there are many unknowns. The number of genotypes and the sample size per genotype are unclear. The RF methodology is underspecified: key parameters (e.g., number of trees, maximum depth), model performance metrics (e.g., R², RMSE), and the validation method (e.g., cross-validation) are missing. The calculation of the cracking index is not explained; readers unfamiliar with the referenced method would benefit from at least a formula or a short explanation. You should provide more detail on the RF implementation and the cracking index computation, and include any preprocessing steps such as data normalization. Additionally, information on the analysis of phenols, flavonoids, anthocyanins, and antioxidant activity (DPPH) is lacking; only literature references are given.
Response 3: The necessary revisions, such as methods of biochemical analysis, cracking index formula, have been made in the Methods section and highlighted in red.
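The response does not reproduce the formula. A widely used option in sweet cherry work is Christensen's immersion test, in which 50 fruits are immersed in distilled water and inspected after 2, 4, and 6 h. The sketch below assumes that protocol, with a, b, and c counting fruits newly cracked at each inspection; whether the manuscript uses exactly this variant is an assumption, and the function name is hypothetical.

```python
def cracking_index(cracked_2h, cracked_4h, cracked_6h, n_fruits=50):
    """Christensen-style cracking index (%), assuming an immersion test.

    a, b, c are the numbers of fruits newly cracked after 2, 4, and 6 h.
    Early cracking is weighted more heavily (5, 3, 1); the maximum score is
    5 * n_fruits (250 for 50 fruits), so CI = (5a + 3b + c) / 250 * 100.
    """
    a, b, c = cracked_2h, cracked_4h, cracked_6h
    if a + b + c > n_fruits:
        raise ValueError("more cracked fruits than fruits tested")
    return (5 * a + 3 * b + c) * 100 / (5 * n_fruits)

# Hypothetical counts: 5, 8, and 6 fruits newly cracked at 2, 4, and 6 h.
print(cracking_index(5, 8, 6))
```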
Comments 4: Results: reporting lacks precision. For example, exact values for correlation coefficients, variable importance ranks, and explained variance in PCA are not always provided. There is no information on the predictive performance of the RF model (e.g., how well the model explains variance in the cracking index). Including quantitative model performance metrics (e.g., R², MAE) and correcting the figure captions will enhance clarity.
Response 4: Thank you for this valuable comment. We have revised the Results section to include precise reporting of correlation coefficients, PCA-explained variance (PC1 = 47.6%, PC2 = 20.7%), and variable importance rankings. Additionally, we now provide Random Forest model performance metrics (R² = 0.63, MAE = 2.37, RMSE = 3.66) to quantify predictive accuracy. Figure captions have also been corrected to reflect the analytical content and clarify model outputs.
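The three metrics reported above have standard definitions, sketched here in plain Python on toy values (not the study's predictions). Ideally they are computed on held-out data, such as cross-validation folds, rather than on the training set.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R², MAE, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    resid = [t - p for t, p in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    ss_res = sum(r * r for r in resid)                # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    mae = sum(abs(r) for r in resid) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse

y_obs = [10.0, 20.0, 30.0, 40.0]   # toy observed cracking index (%)
y_hat = [12.0, 18.0, 33.0, 37.0]   # toy model predictions
r2, mae, rmse = regression_metrics(y_obs, y_hat)
print(f"R² = {r2:.3f}, MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

Because RMSE squares the residuals, it can never be smaller than MAE; a noticeably larger RMSE, as in the reported 3.66 versus 2.37, suggests that some individual predictions carry larger errors.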
Comments 5: The Discussion section lacks critical reflection on the limitations of the study, such as its geographic or temporal constraints and the potential for overfitting. I suggest adding more explanation of the mechanistic role of antioxidants in reducing cracking (e.g., effects on cell wall integrity, water uptake regulation). It would also be possible to discuss in more detail why antioxidants protect against cracking (e.g., by stabilizing the cell wall structure). Have similar relationships been observed in other fruit species or reported in the literature more generally?
Response 5: As suggested by the reviewer, the discussion of the effect of antioxidants on fruit cracking is now further supported by four recent studies ([60], [61], [62], [63]).
Comments 6: Formatting and Graphics: Ensure that all figures (Figs. 2–6) and tables are legible and appropriately labeled. For example, Figs. 5 and 6 should have clear legends and units. Some graphics are barely legible (at least to me).
Response 6: Necessary adjustments have been made to improve the clarity of the figures.
Comments 7: There are occasional grammatical issues and awkward phrasing throughout the text that detract from its readability. Here are some examples: “how consumer wants cherries”, "It has was revealed".
Response 7: The necessary revisions have been made in the Introduction section.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Thank you for taking my comments into account.
The article now really does look better. Readers will judge its value.