by Weiqing Wu and Liping Chen *

Reviewer 1: Anonymous Reviewer 2: Anonymous Reviewer 3: Anonymous

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors
  1. Lines 125-126: It is necessary for the author to explain why the cotton canopy at this growth stage was selected for filming. The model constructed from this stage's dataset corresponds to addressing the subsequent practical issues mentioned by the author in the introduction section (Lines 38-41).
  2. Lines 130-132: The dataset constructed solely from a single top-down perspective lacks representativeness. Additionally, how can the authors ensure consistency in parameters such as height during each handheld shooting session using a mobile phone?
  3. Lines 159-163: The cotton leaf mask obtained in this stage is used to distinguish the degree of leaf obstruction. How does the author achieve feature extraction in the case of leaf overlap?
  4. The statistical distribution of cotton leaves in each RGB channel has specificity. How do the authors address the loss of discriminative information caused by CBAM's use of simple averaging?
  5. 3.1 should belong to the "Materials and Methods" section.
  6. Table 2: I suggest adding image segmentation metrics such as Dice coefficient, IoU per class, and Hausdorff distance for more accurate comparison and evaluation.
  7. The “Discussion” section needs major revisions. The existing content is a simple summary of the results section, lacking a discussion on the comparison and differences with other current research results.
  8. The structure of this paper does not fully conform to the standard pattern of general scientific papers. It is necessary to supplement the "Conclusion" section and clarify the shortcomings and potential solutions of this study.

Author Response

Dear Reviewer 1,

Thank you for your thorough review and insightful comments on our manuscript. We have carefully addressed each of your points and revised the paper accordingly. All changes made in the manuscript are highlighted in yellow for your convenience. Below is our point-by-point response.

Comment 1: Lines 125-126: It is necessary for the author to explain why the cotton canopy at this growth stage was selected for filming...
Response: Thank you for this suggestion. We have added the explicit rationale in Section 2.1.
Changes Made: The text now states that data collection was conducted from early June to mid-August 2024, targeting the mid-to-late growth stages when the cotton canopy exhibits maximum foliage density, overlap, and occlusion. Constructing and validating our model with this most challenging dataset is intended to ensure robustness for real-world field applications mentioned in the introduction.

Comment 2: Lines 130-132: The dataset constructed solely from a single top-down perspective lacks representativeness. Additionally, how can the author ensure consistency in parameters such as height during each handheld shooting...
Response: We appreciate your attention to methodological detail. We have clarified the procedure and acknowledged a limitation.
Changes Made:

  1. Perspective: We have clarified that the top-down perspective was chosen as the most feasible viewpoint for capturing leaf morphology in this study, and we now explicitly acknowledge it as a defined scope condition of the work.

  2. Consistency: We added that an approximately constant height was maintained by keeping the smartphone at arm's length during handheld capture. Furthermore, the data augmentation techniques (e.g., random scaling, rotation) employed in Section 2.2.3 help simulate variations in scale and viewpoint, enhancing model robustness.

Comment 3: Lines 159-163: The cotton leaf mask obtained in this stage is used to distinguish the degree of leaf obstruction. How does the author achieve feature extraction in the case of leaf overlap?
Response: This is a crucial question regarding the model's core capability. We have enhanced the explanation in the methodology section.
Changes Made: While high-quality masks from Section 2.2.1 provide the supervisory signal, the core mechanism for disambiguating overlapping leaves lies in the model architecture. We have elaborated in the yellow-highlighted portion of Section 2.4.5 that the synergy between BiFPN and CBAM is key. BiFPN enriches multi-scale contextual features. Building on this, the spatial attention mechanism in CBAM dynamically weights the importance of each location in the feature map, compelling the model to focus on discriminative local regions, such as leaf edges and texture discontinuities, that indicate separations between overlapping leaves.
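The per-location weighting described here can be sketched in a few lines of NumPy. This is illustrative only, not the authors' implementation: CBAM fuses the two pooled maps with a 7×7 convolution, for which a plain sum stands in below.

```python
import numpy as np

def spatial_attention(feat):
    """CBAM-style spatial attention: pool across channels, then gate each location.

    feat: (C, H, W) feature map.
    """
    avg = feat.mean(axis=0, keepdims=True)    # (1, H, W) average over channels
    mx = feat.max(axis=0, keepdims=True)      # (1, H, W) max over channels
    # CBAM fuses the two maps with a 7x7 conv; a plain sum stands in here.
    attn = 1.0 / (1.0 + np.exp(-(avg + mx)))  # sigmoid -> per-location weight in (0, 1)
    return feat * attn
```

Locations with strong pooled responses (edges, texture discontinuities) receive weights near 1, while flat regions are suppressed toward 0.5 or below.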

Comment 4: The statistical distribution of cotton leaves in each RGB channel has specificity. How do the authors solve the discriminative information loss caused by using CBAM for simple averaging.
Response: We thank you for this profound technical point, which touches on a key aspect of our design.
Changes Made: We fully agree. In the yellow-highlighted text in Section 2.4.5, we have explicitly clarified and corrected the description of CBAM's mechanism. We now state that our CBAM module employs a dual-pooling strategy, utilizing both Global Average Pooling (GAP) and Global Max Pooling (GMP). While GAP integrates overall channel statistics (e.g., dominant hue), GMP captures the most salient localized features within each channel (e.g., leaf veins, disease spots). The fusion of outputs from both pathways ensures the resulting channel weights more comprehensively retain the discriminative information specific to cotton leaves across RGB channels, mitigating potential detail loss from relying on averaging alone.
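The dual-pooling channel attention described above can be sketched as follows. This is a minimal NumPy illustration of the mechanism, not the authors' code; the MLP weight shapes `W1`/`W2` (a shared bottleneck with reduction ratio r) are assumptions.

```python
import numpy as np

def channel_attention(feat, W1, W2):
    """CBAM-style channel attention with dual pooling (GAP + GMP).

    feat: (C, H, W) feature map.
    W1: (C // r, C) and W2: (C, C // r) weights of the shared two-layer MLP.
    """
    gap = feat.mean(axis=(1, 2))                  # GAP: overall channel statistics
    gmp = feat.max(axis=(1, 2))                   # GMP: most salient response per channel
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)  # shared MLP with ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(mlp(gap) + mlp(gmp))))  # sigmoid of the fused paths
    return feat * weights[:, None, None]          # reweight each channel
```

Because both pooled paths pass through the same MLP before fusion, a channel that is strong in only a small region (e.g., a vein or a disease spot) can still receive a high weight via the GMP path, even when its global average is low.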

Comment 5: 3.1 should belong to the "Materials and Methods" section.
Response: Accepted. We have restructured accordingly.
Changes Made: The content from the original "3.1 Model Training Results" subsection has been moved to Chapter 2 and renamed as "2.5 Experimental Environment and Parameter Settings", aligning the paper's structure with standard conventions.

Comment 6: Table 2: I suggest adding image segmentation metrics such as Dice coefficient, IoU per class, and Hausdorff distance for more accurate comparison and evaluation.
Response: We thank you for this excellent suggestion, which enables a more comprehensive evaluation.
Changes Made: We have added formal definitions for the Dice coefficient, Intersection over Union (IoU), and Hausdorff Distance (HD) in Section 2.6.2. Furthermore, a new Table 5 (and corresponding analysis in the results section) presents these pixel-level metrics for all compared models, providing a finer-grained performance assessment.
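For reference, the three metrics can be illustrated with a minimal sketch for boolean masks and boundary point sets (a brute-force illustration, not the paper's evaluation code):

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient between two boolean masks: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou(pred, gt):
    """Intersection over Union between two boolean masks: |A∩B| / |A∪B|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two (N, 2) point sets (brute force)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())        # worst nearest-neighbor gap
```

Dice and IoU measure region overlap, while the Hausdorff distance penalizes the worst boundary deviation, which is why reporting all three gives a finer-grained picture of segmentation quality.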

Comment 7: The “Discussion” section needs major revisions... It is necessary to supplement the "Conclusion" section...
Response: We sincerely appreciate your guidance on improving the paper's depth and structure. This has led to significant revisions.
Changes Made:

  1. Discussion: We have completely rewritten Section 4 ("Discussion"). The new section contextualizes our findings within related work cited in the introduction, discusses the innovation and implications of our approach for segmenting dense, irregular leaves, and objectively outlines the model's limitations and the significance of cross-crop generalization.

  2. Conclusion: We have expanded the conclusion into a separate Section 5 ("Conclusion"). This section concisely summarizes the main contributions, clearly states the current limitations (e.g., performance boundaries under extreme occlusion, pending edge-device validation), and proposes specific directions for future work to address them.

We believe these revisions have thoroughly addressed all your valuable comments. Thank you again for the time and expert insight you have dedicated to improving our manuscript.

Sincerely,
Liping Chen
Corresponding Author
December 24, 2025

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

The topic is relevant, and the approach has potential applications; however, to strengthen the contribution and credibility of the claims regarding deployment, robustness, and field applicability, I suggest addressing the following observations.

The suggested observations are presented below:

  1. Moderate the claims about the general applicability of the model and/or present a plan for its validation in other crops and geographical locations. All data and images are from a single location (Xinjiang, China) and pertain exclusively to cotton plants. Claiming that the model is a solution for precision agriculture in general is a very optimistic extrapolation. It is unknown whether the method would work with other crops (such as soybeans in Brazil or corn in Iowa) or under different environmental, light, or soil conditions. Therefore, the scope of the conclusions should be proportionate to the evidence. Without this validation, the work is limited to segmenting cotton leaves in Xinjiang. 
  2. Conduct a more in-depth and detailed analysis of the model's shortcomings, especially in cases of severe occlusion. Despite the promising results, the manuscript acknowledges that the model still struggles with severely occluded leaves (i.e., when they are heavily overlapped). While this weakness is recognized, it is not explored in depth. Simply stating that it "sometimes fails" is insufficient; it is necessary to specify which type of occlusion (e.g., coverage percentage or shadow hardness) causes problems. Being transparent about the technology's limitations does not weaken the work; rather, it strengthens it by providing the next research team with a clear starting point for future improvements.
  3. Consider presenting the performance results (speed and power consumption) on an edge computing platform, such as an Nvidia Jetson. The model was tested on a powerful lab GPU; this hardware is not representative of what would be used in the field, as a drone or agricultural robot operates with much more modest chips and limited batteries. The claim that it is a lightweight, deployable model remains purely theoretical until its efficiency is demonstrated on the hardware used in practice. The gap between the lab and the field is a crucial obstacle that many good AI ideas fail to overcome.

Author Response

Dear Reviewer 2,

Thank you for your thorough review and highly valuable observations regarding the practical application and credibility of our work. We agree that addressing these points significantly strengthens the manuscript. We have carefully considered and implemented revisions based on your suggestions, as detailed below. All corresponding changes in the manuscript are highlighted in green for your easy reference.

Comment 1: Moderate the claims about the general applicability of the model and/or present a plan for its validation in other crops and geographical locations...
Response: We sincerely thank you for this critical point. We agree that over-claiming generality without cross-validation is a common pitfall. To directly address this and provide concrete evidence of the model’s transfer potential, we have conducted a new cross-crop generalization experiment.
Changes Made:

  1. We have introduced a new Section 2.3 (“Cross-Crop Generalization Evaluation Dataset”) describing the independent ‘SoyCotton-Leafs’ public dataset (developed by the University of São Paulo), which contains mixed soybean and cotton leaves under varied conditions.

  2. A comprehensive zero-shot evaluation is presented in the new Section 3.6 (“Cross-Crop Generalization Analysis”), including quantitative results (Table 8) and visual examples (Figures 17 and 18).

  3. We have moderated the language throughout the manuscript (especially in the Abstract and Conclusion) to clarify that our primary contribution is a robust solution for cotton leaves, while the cross-crop experiment provides promising evidence of feature transferability to other dicots like soybean, setting a clear foundation for future multi-crop systems.


Comment 2: Conduct a more in-depth and detailed analysis of the model's shortcomings, especially in cases of severe occlusion.
Response: Thank you for this suggestion. We have substantially expanded the failure analysis to provide deeper insight.
Changes Made: The new Section 3.5 (“Analysis of Failure Cases and Limitations”) and Figure 16 are dedicated to this. We categorize and visualize failure modes occurring under specific challenging conditions prevalent in the field, namely occlusion by other leaves, drastic illumination changes, and surface coverage by dust. We then qualitatively link these failures to the loss of discriminative features or context when leaves are heavily obscured or their appearance is homogenized. The analysis explicitly defines the model’s current operational envelope and states that addressing these edge cases requires future work on structural reasoning.

Comment 3: Consider presenting the performance results (speed and power consumption) on an edge computing platform, such as an Nvidia Jetson...
Response: This is an exceptionally important and practical point. We completely agree that lab GPU performance is not representative of field deployment constraints.
Changes Made: While performing comprehensive profiling on an edge device like Jetson was logistically unfeasible within the current revision cycle, we have taken significant steps to acknowledge this gap and outline a clear validation path:

  1. In the Discussion (Section 4), we now explicitly state that the reported high FPS is achieved on a research-grade GPU and acknowledge that validation on resource-constrained edge hardware remains a critical, pending step for real-world deployment.

  2. More importantly, in the Conclusion (Section 5), we have listed hardware-specific validation as a primary item in our future work plan: “Implement a full deployment workflow, including model conversion (e.g., to TensorRT), quantization, and rigorous performance/power profiling on platforms like NVIDIA Jetson.”

We believe this honest framing strengthens the paper by clearly scoping the current contribution while providing a concrete and necessary direction for transitioning the technology from lab to field. Thank you again for your valuable feedback.


Sincerely,
Liping Chen
Corresponding Author
December 24, 2025

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript presents a valuable contribution to agricultural instance segmentation; however, several methodological and reporting gaps must be addressed.

  1. To substantiate the claim of superior instance segmentation performance, comparisons with at least one or two strong non-YOLO segmentation models should be included.
  2. Since SAM-assisted labeling is a major contribution, quantitative evidence is necessary. Without such evidence, the contribution remains speculative.
  3. Since the method targets real-world agriculture, testing on other crops (e.g., tomato, grape, wheat leaves) or external datasets is necessary to validate the model’s robustness.
  4. Complex margins and extreme occlusion are acknowledged as limitations, but no visual or quantitative analysis is provided for failure modes.
  5. The manuscript claims suitability for lightweight deployment but provides no actual measurements on edge platforms.
  6. A deeper analysis is necessary to understand how feature fusion and attention impact FPS under varying input resolutions and batch sizes.
  7. Although described as lightweight, the exact overhead of BiFPN and CBAM (params, FLOPs) is not reported separately.
  8. The dataset limitations were not fully addressed. The dataset consists of images from a single region, a fixed camera type, and a narrow time window. Its bias and generalization impact should be discussed more rigorously.
  9. Some segmentation colors overlap heavily, making results difficult to interpret (Figures 11 and 12). Using distinct boundary contours or label outlines would improve clarity.
  10. Mathematical formulations should be standardized and reorganized. Equations (2)–(11) mix different notational styles and lack explanation of variables (e.g., exact dimensions of feature maps). A consistent notation format is needed.
  11. A table summarizing contributions of each module (parameter changes, FPS impact) would help readability.
  12. The reference formatting and completeness require correction. The reference list must be polished to meet journal standards (e.g., Ref. 12).
Comments on the Quality of English Language

Improving the English would enhance clarity, increase readability, and better highlight the novelty of the method. Many sentences are complex, with nested clauses; meaning can be simplified without losing precision.

Author Response

Dear Reviewer 3,

Thank you for your comprehensive and constructive review. Your detailed observations have been invaluable in significantly improving the rigor, clarity, and presentation of our manuscript. We have carefully addressed each of your points, as detailed in the point-by-point response below. All corresponding changes in the revised manuscript are highlighted in green for your easy reference.

Comment 1: To substantiate the claim… comparisons with at least one or two strong non-YOLO segmentation models should be included.
Response: We agree that comparison with non-YOLO models provides a more comprehensive benchmark. As suggested, we have added a classical two-stage instance segmentation model to the comparison.
Changes Made: Mask R-CNN has been included as a baseline in the comparative experiments. Its performance metrics are now reported in the updated Table 4 in Section 3.2. The accompanying analysis discusses the efficiency-accuracy trade-off between one-stage and two-stage architectures.

Comment 2: Since SAM-assisted labeling is a major contribution, quantitative evidence is necessary.
Response: We thank you for highlighting this. We have conducted a controlled experiment to quantify the efficiency gain.
Changes Made: A new subsection, 2.2.2 “Annotation Efficiency Comparison: SAM-assisted vs. Manual”, has been added. It reports the average time saving of 73.26% in a new Table 1, and provides a visual quality comparison in the new Figure 4 and Figure 5.

Comment 3: …testing on other crops or external datasets is necessary to validate the model’s robustness.
Response: We fully agree. We have performed a new cross-crop evaluation.
Changes Made: We have introduced an independent, public ‘SoyCotton-Leafs’ dataset in a new Section 2.3. A comprehensive zero-shot evaluation is presented in the new Section 3.7, including quantitative results (Table 8) and visualizations (Figures 17 and 18).

Comment 4: Complex margins and extreme occlusion… no visual or quantitative analysis is provided for failure modes.
Response: We have substantially expanded the analysis of the model’s limitations.
Changes Made: A dedicated Section 3.6 (“Analysis of Failure Cases and Limitations”) and the new Figure 16 have been added. This section categorizes, visualizes, and qualitatively analyzes failure modes under severe occlusion and homogeneous clusters.

Comment 5: The manuscript claims suitability for lightweight deployment but provides no actual measurements on edge platforms.
Response: We fully acknowledge this crucial gap between lab and field deployment.
Changes Made: In the Discussion (Section 4) and Conclusion (Section 5), we now explicitly state that validation on edge hardware (e.g., NVIDIA Jetson) is a key limitation and a primary, concrete item for future work.

Comment 6: A deeper analysis is necessary to understand how feature fusion and attention impact FPS under varying input resolutions and batch sizes.
Response: We thank you for this suggestion, which prompted an empirical analysis that strengthens our discussion on deployment efficiency.
Changes Made: We have added a new Section 3.4 (“Efficiency Analysis under Different Operational Settings”) and Table X. It reports empirically measured FPS (e.g., 86.3 FPS at 640×640 with batch size=1) under various configurations, providing a realistic view of real-time performance and clarifying the distinction with theoretical throughput.
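The measurement protocol behind such FPS figures can be sketched with a generic timing loop (an illustrative sketch with assumed warm-up and run counts, not the authors' benchmark script):

```python
import time

def measure_fps(infer, batch, warmup=3, runs=10):
    """Rough FPS estimate: warm up first, then time repeated inference calls.

    infer: a callable that runs inference on a batch of images.
    batch: the list of inputs passed to each call.
    """
    for _ in range(warmup):
        infer(batch)                       # warm-up runs excluded from timing
    t0 = time.perf_counter()
    for _ in range(runs):
        infer(batch)
    elapsed = time.perf_counter() - t0
    return runs * len(batch) / elapsed     # images per second
```

Excluding warm-up iterations matters because the first calls typically pay one-time costs (memory allocation, kernel compilation) that would otherwise deflate the reported throughput.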

Comment 7: Although described as lightweight, the exact overhead of BiFPN and CBAM (params, FLOPs) is not reported separately.
Response: Thank you for pointing out this missing detail.
Changes Made: In the ablation study (Section 3.3), we have added a new Table 7 (“Stepwise analysis of incremental module contributions”). This table quantitatively reports the changes in parameters (ΔParams), computational cost (ΔGFLOPs), inference speed (ΔFPS), and performance (ΔmAP) introduced by each module.

Comment 8: The dataset limitations were not fully addressed…
Response: We have addressed the generalizability concern both methodologically and through new experiments.
Changes Made: As noted in response to Comment 3, the use of the external ‘SoyCotton-Leafs’ dataset for evaluation directly tackles this concern. We have also moderated claims about generality in the Abstract and Conclusion, framing cross-crop performance as “promising evidence” for future development.

Comment 9: Some segmentation colors overlap heavily… Using distinct boundary contours or label outlines would improve clarity.
Response: We agree and have updated the visualization for better interpretability.
Changes Made: In all result visualization figures (e.g., Figure 11, 12), the segmentation masks are now overlaid with distinct white boundaries to clearly separate adjacent instances.

Comment 10: Mathematical formulations should be standardized and reorganized…
Response: We have thoroughly reviewed and reformatted all mathematical expressions for consistency.
Changes Made: All equations in Section 2.4 have been checked and standardized to use a consistent notation format. Variable definitions have been clarified inline.

Comment 11: A table summarizing contributions of each module… would help readability.
Response: We agree. This table now complements the ablation study.
Changes Made: As mentioned in response to Comment 7, the new Table 7 serves precisely this purpose, providing a clear, consolidated summary of each module's impact.

Comment 12: The reference formatting and completeness require correction.
Response: We have meticulously corrected the entire reference list.
Changes Made: The reference list has been completely reformatted to strictly adhere to the journal’s provided template. Author names, journal abbreviations, and DOI formatting have been standardized.

We believe these revisions have thoroughly and positively addressed all your concerns. Thank you again for your time and expert feedback, which have been essential in enhancing the quality of our work.

Sincerely,
Liping Chen
Corresponding Author
December 24, 2025

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have already responded well to my comments.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have moderately replied to all my questions.