Optimization of Sorghum Spike Recognition Algorithm and Yield Estimation
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Summary of the work
This paper describes a complete detection-tracking-estimation pipeline for sorghum spike detection in a natural field environment. The YOLOv8s architecture is improved by adding a dual-branch GOLD feature pyramid and an LSKA attention mechanism, resulting in a lighter model with higher accuracy. An enhanced version of DeepSort is also implemented for spike tracking in cases involving occlusion and motion, with the parameters optimized for real-time use. Yield estimation is done through a field-based average dry weight approach. The suggested approach delivers robust performance in detection (mAP = 85.86%, F1 = 81.19%) and tracking (MOTA = 67.96%, 42 FPS), with yield estimation accuracy reaching up to 96% for images and 93% for video.
Major comments
Lines 109-130: the dataset seems to be self-produced and not publicly accessible. This severely hinders reproducibility. Please explain whether the dataset (4,500 images post-augmentation) can be shared on request or deposited in a public repository. If not, provide unambiguous information on how comparable data can be obtained or simulated.
Lines 303, 329: the settings for n = 3, max_age = 40, and threshold = 0.46 are based on experimental results; however, the justification would be more convincing with additional statistical measures, such as standard deviations or significance tests of detection errors across different parameter configurations. For the threshold = 0.46, Figure 8 supports the choice by showing the difference between the actual and detected quantities at varying confidence levels, but similar evidence is lacking for the other parameters.
Several passages are awkwardly or verbosely worded. Examples include:
Line 12: “Severe occlusion among spikes significantly increase” should be “increases”.
Line 157: “for experimentation” can be changed to “for this study” or “for model development”.
Minor comments
Line 141: “Error! Reference source not found.” results from an unresolved citation placeholder in the manuscript. Either remove this placeholder or correct it, and make sure all figure and reference links are functional.
Line 131: "LambelImg" is a typo and must be corrected.
Equations 208-246 (Kalman filter equations): correct but could be improved with a corresponding diagram or algorithm pseudo-code to facilitate understanding, particularly for agricultural readers who may not be familiar with tracking theory.
Lines 496-498 (Figure 13): The drying protocol is described well, but the measurement instruments (e.g., model of oven or balance) are not specified. Please add more details about the equipment used to ensure method reproducibility.
Lines 541-542 (Table 5): the label “Picture number” is inconsistent with the fact that this is video-based yield estimation. “Segment number” might be a more accurate term.
Line 576: this sentence is clear but can be reinforced by indicating in what ways this work is quantitatively different from current state-of-the-art approaches.
Author Response
For research article
Response to Reviewer 1 Comments
1. Summary |
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions/corrections highlighted/in track changes in the re-submitted files.
||
2. Questions for General Evaluation | Reviewer’s Evaluation | Response and Revisions |
Does the introduction provide sufficient background and include all relevant references? | Can be improved | Changes have been made in the manuscript |
Is the research design appropriate? | Yes | |
Are the methods adequately described? | Can be improved | Changes have been made in the manuscript |
Are the results clearly presented? | Can be improved | Changes have been made in the manuscript |
Are the conclusions supported by the results? | Yes | |
Are all figures and tables clear and well-presented? | Can be improved | Changes have been made in the manuscript |
3. Point-by-point response to Comments and Suggestions for Authors |
||
Comments 1: Lines 109-130: the dataset seems to be self-produced and not publicly accessible. This severely hinders reproducibility. Please explain whether the dataset (4,500 images post-augmentation) can be shared on request or deposited in a public repository. If not, provide unambiguous information on how comparable data can be obtained or simulated. |
||
Response 1: Thank you for your attention to our research and for your valuable suggestions. Regarding the dataset availability, the dataset used in this study was independently collected and constructed by our research team, consisting of a total of 4,500 images after augmentation. As this dataset is part of our ongoing research projects, it is currently not suitable for public release in an open repository. We fully recognize the importance of reproducibility in scientific research. Therefore, for researchers with a genuine academic need, we welcome direct contact via email. Upon reasonable request and with a clear description of the intended use, we will evaluate and may provide limited access to the dataset for academic purposes. Thank you again for your thoughtful review and understanding. |
||
Comments 2: Lines 303, 329: the settings for n = 3, max_age = 40, and threshold = 0.46 are based on experimental results; however, the justification would be more convincing with additional statistical measures, such as standard deviations or significance tests of detection errors across different parameter configurations. For the threshold = 0.46, Figure 8 supports the choice by showing the difference between the actual and detected quantities at varying confidence levels, but similar evidence is lacking for the other parameters. |
||
Response 2: Thank you for pointing this out. In response to your comment regarding the lack of statistical justification for the parameter settings, we have updated the original Figure 8 by adding error bars that show the mean differences and standard deviations between actual and detected counts under different confidence thresholds. This provides a clearer visual representation of the rationale for, and the stability of, the selected threshold value. For more information, see line 407. |
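As a rough illustration of what such error bars summarize, the sketch below computes the mean and sample standard deviation of the difference between actual and detected counts at each confidence threshold. The function name `count_error_stats` and the input layout are assumptions made for illustration, and the counts shown are placeholder values, not data from the manuscript.

```python
from statistics import mean, stdev


def count_error_stats(actual, detected_by_threshold):
    """Return {threshold: (mean_diff, std_diff)} of actual minus detected counts.

    actual: ground-truth spike counts, one per image.
    detected_by_threshold: dict mapping a confidence threshold to the detected
        counts for the same images, in the same order.
    """
    stats = {}
    for threshold, detected in detected_by_threshold.items():
        if len(detected) != len(actual):
            raise ValueError("each detected list must match the length of actual")
        diffs = [a - d for a, d in zip(actual, detected)]
        # stdev needs at least two samples; report 0.0 for a single image.
        spread = stdev(diffs) if len(diffs) > 1 else 0.0
        stats[threshold] = (mean(diffs), spread)
    return stats


# Placeholder counts for illustration only (not taken from the manuscript).
example = count_error_stats(
    actual=[10, 12, 9],
    detected_by_threshold={0.30: [12, 13, 11], 0.46: [10, 12, 9]},
)
```

Each `(mean, std)` pair corresponds to one point and its error bar in a plot of count difference against confidence threshold.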
||
Comments 3: Line 12: “Severe occlusion among spikes significantly increase” should be “increases”. Line 157: “for experimentation” can be changed to “for this study” or “for model development”. |
||
Response 3: Thank you for pointing this out. We have made the corrections based on your suggestions. For more information, see lines 12 and 216. |
||
Comments 4: Line 141: “Error! Reference source not found.” is due to a citation placeholder not being handled in the manuscript. Either remove this placeholder or correct it. Make sure all the figure and reference links are functional. |
||
Response 4: Thank you for pointing this out. We have reinserted the references. For more information, see line 200. |
||
Comments 5: Line 131: "LambelImg" is a typo and must be corrected. |
||
Response 5: Thank you for pointing this out. We have made the corrections based on your suggestions. For more information, see line 190. |
||
Comments 6: Equations 208-246 (Kalman filter equations): correct but could be improved with a corresponding diagram or algorithm pseudo-code to facilitate understanding, particularly for agricultural readers who may not be familiar with tracking theory. |
||
Response 6: Thank you for pointing this out. We have added a corresponding flowchart to the manuscript to visually illustrate the “prediction–update” cycle, helping readers better understand the algorithm's logic. For more information, see lines 325 to 327. |
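For readers unfamiliar with tracking theory, the “prediction–update” cycle can be sketched as follows. This is a one-dimensional simplification under stated assumptions, not the manuscript's implementation: DeepSort-style trackers use a multi-dimensional constant-velocity state, whereas this sketch tracks a single scalar with a static motion model. The function names and the noise values `q` and `r` are hypothetical.

```python
def kalman_predict(x, p, q):
    """Prediction step: propagate the state estimate x and its variance p.

    A static (random-walk) motion model is assumed, so the state is carried
    over unchanged and the uncertainty grows by the process noise q.
    """
    return x, p + q


def kalman_update(x_pred, p_pred, z, r):
    """Update step: correct the prediction with a measurement z of variance r."""
    k = p_pred / (p_pred + r)          # Kalman gain
    x_new = x_pred + k * (z - x_pred)  # blend prediction and measurement
    p_new = (1.0 - k) * p_pred         # uncertainty shrinks after the update
    return x_new, p_new


def track(measurements, x0=0.0, p0=1.0, q=0.01, r=0.25):
    """Run the prediction-update cycle over a sequence of measurements."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        x, p = kalman_predict(x, p, q)
        x, p = kalman_update(x, p, z, r)
        estimates.append(x)
    return estimates
```

Calling `track` on a list of noisy positions returns one filtered estimate per measurement; each pass through the loop is one predict step followed by one update step.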
||
Comments 7: Lines 496-498 (Figure 13): The drying protocol is described well, but the measurement instruments (e.g., model of oven or balance) are not specified. Please add more details about the equipment used to ensure method reproducibility. |
||
Response 7: Thank you for pointing this out. We have added the specific model of the drying oven used in the experiments to ensure the reproducibility of the method (HYHG-II-270 model, Hengzi brand, Shanghai Yuejin Medical Equipment Co., Ltd). For more information, see lines 495 to 499. |
||
Comments 8: Line 541-542 (Table 5): there is a discrepancy with “Picture number” and the fact that this is video-based yield estimation. “Segment number” might be a more correct term. |
||
Response 8: Thank you for pointing this out. We have made the corrections based on your suggestions. For more information, see line 671. |
||
Comments 9: Line 576: this sentence is clear but can be reinforced by indicating in what ways this work is quantitatively different from current state-of-the-art approaches. |
||
Response 9: Thank you for pointing this out. We conducted a quantitative comparison between our proposed method and current mainstream approaches in the original text (see Table 3: Performance Comparison Table). The improved DeepSort algorithm (YOLOv8s-GOLD-LSKA + DeepSort) achieved a MOTA of 67.96% and a processing speed of 42 FPS. Compared to the most competitive baseline, YOLOv8 + DeepSort (63.62% MOTA, 28 FPS), our method improved MOTA by 4.34 percentage points and increased processing speed by approximately 50%. Additionally, the MOTA of our method exceeds that of YOLOv5 + DeepSort (60.15%) and SSD + DeepSort (55.42%) by 7.81 and 12.54 percentage points, respectively, effectively addressing the challenges of object tracking under occlusion. For more information, see lines 725 to 730. |
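The differences quoted in this response can be checked with a few lines of arithmetic. The sketch below uses only the MOTA and FPS figures stated in the response; the variable names are illustrative.

```python
# MOTA (%) and speed (FPS) values as quoted in the response.
mota = {
    "YOLOv8s-GOLD-LSKA + DeepSort": 67.96,
    "YOLOv8 + DeepSort": 63.62,
    "YOLOv5 + DeepSort": 60.15,
    "SSD + DeepSort": 55.42,
}
fps = {"YOLOv8s-GOLD-LSKA + DeepSort": 42, "YOLOv8 + DeepSort": 28}

proposed = "YOLOv8s-GOLD-LSKA + DeepSort"

# Absolute MOTA gain over each baseline, in percentage points.
mota_gain = {
    name: round(mota[proposed] - value, 2)
    for name, value in mota.items()
    if name != proposed
}

# Relative speed-up over the YOLOv8 + DeepSort baseline, in percent.
speedup_pct = (fps[proposed] / fps["YOLOv8 + DeepSort"] - 1) * 100
```

The gains are expressed in percentage points (an absolute difference between two percentages), while the speed-up is a relative change, which is why the two are computed differently.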
||
4. Response to Comments on the Quality of English Language |
||
Point 1: The English is fine and does not require any improvement. |
||
Response 1: Thank you. |
||
Reviewer 2 Report
Comments and Suggestions for Authors
In the field of smart agriculture, integrating image data and deep learning models to develop sorghum spike recognition algorithms and yield estimation methods is methodologically feasible and has important practical significance for precision management of sorghum. This study employs an optimized YOLOv8s deep learning model to detect spikes and estimate yield. The overall research framework is reasonable, and the results support the study's conclusions. However, several issues remain and require further improvement to enhance the quality of the manuscript:
In the introduction, the authors state that "they still face numerous challenges in the practical application of sorghum spike detection." What are these specific challenges? A deeper analysis and discussion on the nature of these problems are needed.
The literature review primarily focuses on the application scenarios and model performance of previous methods, lacking a comprehensive review and summary of existing studies on sorghum spike detection. It is recommended to focus the introduction on existing research related to sorghum spike detection.
The term “complex agricultural environment” needs to be clearly defined. The word "complex" is too qualitative; a more quantitative description of the complexity and its causes is recommended.
In the methods section, the study directly lists the model parameters but lacks a description of the parameter optimization process. It is suggested that the authors provide a detailed explanation of how these parameters were optimized to facilitate reproducibility of the study.
The dataset used in this study was collected from August to October 2023. It is recommended to analyze the model's performance at different growth stages of sorghum.
The spatiotemporal performance of the proposed deep learning model for sorghum spike detection requires further analysis and validation.
The results section needs to be reorganized. Typically, the results section should report the findings of the study, without repeating data and methods descriptions. For example, Section 3.4.1 “Measurement of average dry weight of sorghum spikes” introduces data and should be moved to the data section.
Table 4 presents the performance of the yield estimation model but includes only five comparison samples, which is insufficient. The authors are encouraged to increase the number of yield data samples to improve the reliability and robustness of the conclusions.
The discussion section lacks in-depth analysis and interpretation of the study's results. For instance, there is no uncertainty analysis of the findings, nor a comparative discussion with existing studies.
Author Response
For research article
Response to Reviewer 2 Comments
1. Summary |
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions/corrections highlighted/in track changes in the re-submitted files.
||
2. Questions for General Evaluation | Reviewer’s Evaluation | Response and Revisions |
Does the introduction provide sufficient background and include all relevant references? | Can be improved | Changes have been made in the manuscript |
Is the research design appropriate? | Can be improved | Changes have been made in the manuscript |
Are the methods adequately described? | Can be improved | Changes have been made in the manuscript |
Are the results clearly presented? | Can be improved | Changes have been made in the manuscript |
Are the conclusions supported by the results? | Can be improved | Changes have been made in the manuscript |
Are all figures and tables clear and well-presented? | Can be improved | Changes have been made in the manuscript |
3. Point-by-point response to Comments and Suggestions for Authors |
||
Comments 1: In the introduction, the authors state that "they still face numerous challenges in the practical application of sorghum spike detection." What are these specific challenges? A deeper analysis and discussion on the nature of these problems are needed. |
||
Response 1: Thank you for pointing this out. We have further expanded and refined the introduction to detail key challenges in practical sorghum panicle detection: (1) Dense distribution under natural field conditions causes severe occlusion and overlap, resulting in ambiguous target boundaries that often lead to missed and false detections; (2) Significant color and morphological variations across growth stages and cultivars increase the difficulty of model generalization; (3) Background interference from visually similar leaf structures and complex lighting conditions compromises detection accuracy; (4) During video tracking, target displacement, occlusion, and rapid movement frequently trigger ID switches that disrupt counting continuity. For more information, see lines 51 to 109. |
||
Comments 2: The literature review primarily focuses on the application scenarios and model performance of previous methods, lacking a comprehensive review and summary of existing studies on sorghum spike detection. It is recommended to focus the introduction on existing research related to sorghum spike detection. |
||
Response 2: Thank you for pointing this out. We have supplemented the introduction with research on sorghum panicle detection. Given the scarcity of publicly available studies specifically focused on sorghum panicle detection, we have also incorporated relevant research on millet and rice, crops that closely resemble sorghum in target characteristics, imaging perspectives, and occlusion complexity. The expanded review thereby focuses on the latest advances in addressing critical technical challenges including dense occlusion, small-target detection, and lightweight deployment. For more information, see lines 51 to 109. |
||
Comments 3: The term “complex agricultural environment” needs to be clearly defined. The word "complex" is too qualitative; a more quantitative description of the complexity and its causes is recommended. |
||
Response 3: Thank you for pointing this out. We have revised the original term "complex agricultural environment" to "natural in-field conditions" throughout the manuscript. This terminology more accurately reflects actual data acquisition scenarios as it encompasses uncontrolled variables such as varying illumination, occlusion, and background interference. The updated phrasing provides a more precise and objective characterization of our study's application context. For more information, see line 76. |
||
Comments 4: In the methods section, the study directly lists the model parameters but lacks a description of the parameter optimization process. It is suggested that the authors provide a detailed explanation of how these parameters were optimized to facilitate reproducibility of the study. |
||
Response 4: Thank you for pointing this out. We have supplemented the manuscript with comprehensive model training procedures and hardware environment specifications, including: optimization method (SGD), training epochs (300), batch size (5), BN regularization, momentum (0.946), and weight decay coefficient (0.0005). For more information, see lines 219 to 230. |
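For reproducibility, the hyperparameters listed in this response could be recorded in a single configuration object, as in the minimal sketch below. The key names and the `describe` helper are illustrative assumptions, not the authors' actual configuration file or any specific framework's API; only the values come from the response.

```python
# Hyperparameter values as listed in the response; key names are illustrative.
TRAIN_CONFIG = {
    "optimizer": "SGD",
    "epochs": 300,
    "batch_size": 5,
    "momentum": 0.946,
    "weight_decay": 0.0005,
}


def describe(config):
    """Format the configuration as a single line for a methods section or log."""
    return ", ".join(f"{key}={value}" for key, value in config.items())


summary = describe(TRAIN_CONFIG)
```

Keeping the settings in one place makes it easier to report them consistently in the manuscript and in any released training scripts.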
||
Comments 5: The dataset used in this study was collected from August to October 2023. It is recommended to analyze the model's performance at different growth stages of sorghum. |
||
Response 5: Thank you for pointing this out. The primary objective of this study is to develop and validate models for sorghum yield estimation at maturity. Consequently, data collection and analysis focused on three critical maturation stages where yield formation and panicle characteristics exhibit the most significant color and morphological changes: the dough stage, physiological maturity, and harvest maturity. These stages were selected for two key reasons: First, they present the most pronounced observable changes in sorghum panicles, providing essential discriminative information for image-based models. Second, these phases directly determine final grain yield, representing the most relevant periods for yield estimation. Although data were collected from August to October 2023, model training and validation exclusively utilized data from these three maturation stages. While we acknowledge the value of analyzing model performance across the entire growth cycle, systematic exploration of this generalization capability will be addressed in future work. |
||
Comments 6: The spatiotemporal performance of the proposed deep learning model for sorghum spike detection requires further analysis and validation. |
||
Response 6: Thank you for pointing this out. This study primarily explores and validates the performance of the YOLOv8s deep learning model for detecting and tracking sorghum panicle targets under natural field conditions, with a focus on maturation stages. The maturation phase was selected because panicle characteristics exhibit maximal distinctiveness and stability during this period, facilitating robust feature learning and performance evaluation. To control external variables (e.g., soil type, planting density, management practices), experiments were conducted within a single experimental plot, minimizing environmental interference to better evaluate the model's inherent detection capabilities. Despite spatiotemporal limitations, results demonstrate that YOLOv8s achieves high detection accuracy and operational efficiency during sorghum maturation. We fully recognize that practical utility depends on model robustness across broader environmental conditions. Future work will expand data collection to encompass diverse geographical regions, soil types, and management practices, systematically covering critical growth stages including heading, grain filling, and maturation. Additionally, we will implement advanced data augmentation techniques, domain adaptation strategies, and robust model architectures to advance practical deployment of sorghum panicle detection algorithms in complex agricultural scenarios. |
||
Comments 7: The results section needs to be reorganized. Typically, the results section should report the findings of the study, without repeating data and methods descriptions. For example, Section 3.4.1 “Measurement of average dry weight of sorghum spikes” introduces data and should be moved to the data section. |
||
Response 7: Thank you for pointing this out. We have meticulously reviewed Section 3.4.1 and relocated the data acquisition and measurement methodology pertaining to "average dry weight measurement of sorghum panicles" to the preceding Data and Methods chapter. This adjustment eliminates functional overlap with the Results section while retaining essential analytical outcomes and data interpretation within the Results section, thereby enhancing structural clarity and ensuring compliance with academic conventions. For more information, see lines 458 to 504. |
||
Comments 8: Table 4 presents the performance of the yield estimation model but includes only five comparison samples, which is insufficient. The authors are encouraged to increase the number of yield data samples to improve the reliability and robustness of the conclusions. |
||
Response 8: Thank you for pointing this out. This study employed the "five-point sampling method", a widely adopted technique in agricultural surveys, in which five representative sampling points (the center and four corners) were systematically selected within each experimental plot for data collection. This approach provides balanced representativeness for crop yield estimation, enabling efficient and comparable yield data acquisition across target areas without substantially increasing labor or resource expenditures. We acknowledge the reviewer's constructive feedback regarding sample size and will implement expanded sampling coverage and increased sample volumes in subsequent research to enhance the robustness and generalizability of our findings. For more information, see line 648. |
||
Comments 9: The discussion section lacks in-depth analysis and interpretation of the study's results. For instance, there is no uncertainty analysis of the findings, nor a comparative discussion with existing studies. |
||
Response 9: Thank you for pointing this out. This paper provides an in-depth analysis of the innovations and limitations of our findings, with key additions to the discussion: (1) Technical Comparisons: The proposed GOLD module achieves a 3.43 percentage-point higher mAP (85.86% vs. 82.43%) than Shi's cross-scale connection approach, attributed to its fusion of local geometric calibration and global Transformer-based semantic modeling. Model size (7.48 MB) represents only 52.3% of Xu's solution, owing to LSKA's optimized dilated convolution reducing computational redundancy. Our enhanced DeepSORT strategy reduces ID switch rates by 12.7% compared to Lou's CBAM spatial calibration. (2) Limitation Attribution: Video accuracy fluctuations (75%-93%) stem from perspective distortion inducing >60% inter-panicle overlap. Current models exhibit a 15.2±2.1% increase in missed detection rates for red-panicle cultivars. Future work will expand datasets with diverse cultivars to enhance model generalization capabilities. For more information, see lines 674 to 696. |
||
4. Response to Comments on the Quality of English Language |
||
Point 1: The English could be improved to more clearly express the research. |
||
Response 1: Thank you, language improvements have been made. |
||
Reviewer 3 Report
Comments and Suggestions for Authors
- There is a citation error in line 141: Error! Reference source not found.
- Figure 12 presents the comparison of test results before and after model improvement. However, the figure legend is not complete (the blue color is missing)…
- Is any of the formulas (15)-(18) mentioned in the text? Please explain the meaning of parameters such as MC and DW.
- Can you describe in more detail how yield has been estimated using images or videos of spikes? Has the area been considered only? Or the number of spikes on that area with the average measured weight has been considered too…
- One of the model applications is its deployment on UAVs and mobile terminals... Can you provide more information about the characteristics of these platforms in the field?
Author Response
For research article
Response to Reviewer 3 Comments
1. Summary |
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions/corrections highlighted/in track changes in the re-submitted files.
||
2. Questions for General Evaluation | Reviewer’s Evaluation | Response and Revisions |
Does the introduction provide sufficient background and include all relevant references? | Can be improved | Changes have been made in the manuscript |
Is the research design appropriate? | Can be improved | Changes have been made in the manuscript |
Are the methods adequately described? | Must be improved | Changes have been made in the manuscript |
Are the results clearly presented? | Must be improved | Changes have been made in the manuscript |
Are the conclusions supported by the results? | Can be improved | Changes have been made in the manuscript |
Are all figures and tables clear and well-presented? | Must be improved | Changes have been made in the manuscript |
3. Point-by-point response to Comments and Suggestions for Authors |
||
Comments 1: There is a citation error in line 141: Error! Reference source not found. |
||
Response 1: Thank you for pointing this out. We have reinserted the references. For more information, see line 200. |
||
Comments 2: Figure 12 presents the comparison of test results before and after model improvement. However, the figure legend is not complete (the blue color is missing) … |
||
Response 2: Thank you for pointing this out. We have re-added the original image. For more information, see line 590. |
||
Comments 3: Is any of the formulas (15)-(18) mentioned in the text? Please explain the meaning of parameters such as MC and DW. |
||
Response 3: Thank you for pointing this out. Section 3.4.1 of the manuscript clearly presents formulas (15)–(18), which are used to calculate the average fresh weight, dry weight, and moisture content of sorghum spikes, along with detailed explanations of the experimental procedures. In these formulas, FW refers to the Fresh Weight, i.e., the weight of the sample before drying; DW denotes the Dry Weight, which is the weight of the sample after being dried to a constant weight at 105°C; and MC stands for Moisture Content, representing the proportion of water in the fresh weight. For more information, see lines 468 to 479. |
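The definitions of FW, DW, and MC given in this response can be expressed as a short calculation. The sketch below is reconstructed from those definitions only; formulas (15)–(18) themselves are not reproduced here, so the manuscript's exact form (for example, whether MC is reported as a fraction or a percentage) is an assumption, and the function names are hypothetical.

```python
def moisture_content(fresh_weight, dry_weight):
    """Moisture content (MC) as the fraction of water in the fresh weight.

    fresh_weight (FW): sample weight before drying.
    dry_weight (DW): sample weight after drying to constant weight.
    Both weights must use the same unit.
    """
    if fresh_weight <= 0:
        raise ValueError("fresh_weight must be positive")
    if not 0 <= dry_weight <= fresh_weight:
        raise ValueError("dry_weight must lie between 0 and fresh_weight")
    return (fresh_weight - dry_weight) / fresh_weight


def average(values):
    """Arithmetic mean, e.g. the average dry weight over sampled spikes."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)
```

Multiplying the result of `moisture_content` by 100 gives the value as a percentage.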
||
Comments 4: Can you describe in more detail how yield has been estimated using images or videos of spikes? Has the area been considered only? Or the number of spikes on that area with the average measured weight has been considered too… |
||
Response 4: Thank you for pointing this out. Yield estimation is conducted in two steps: (1) Spike count per unit area: the quantity of sorghum spikes within a 1 m² area (from static images) or a 1 m² mobile range (from videos) is detected by the model, with values averaged across samples. (2) Field-scale yield calculation: using a preset average dry weight per spike (determined from representative field samples of the same variety), yield per mu is calculated via the formula: Yield (kg/mu) = (Spikes per unit area × Dry weight per spike × 666.7) / 1000. For more information, see line 484. |
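The two-step calculation described in this response can be sketched as follows. The formula is the one quoted in the response; the function names are hypothetical, and the unit of the dry weight (grams) is an assumption inferred from the division by 1000 to obtain kilograms.

```python
M2_PER_MU = 666.7  # area of one mu in square metres, as used in the formula


def mean_spike_count(counts_per_m2):
    """Step 1: average the detected spike counts over the sampled 1 m² areas."""
    if not counts_per_m2:
        raise ValueError("counts_per_m2 must not be empty")
    return sum(counts_per_m2) / len(counts_per_m2)


def yield_kg_per_mu(spikes_per_m2, dry_weight_per_spike_g):
    """Step 2: scale spikes per m² and dry weight per spike (g) to kg per mu."""
    return spikes_per_m2 * dry_weight_per_spike_g * M2_PER_MU / 1000.0
```

For example, an average of 10 spikes per m² at 50 g dry weight per spike gives 10 × 50 × 666.7 / 1000 = 333.35 kg/mu; these input numbers are illustrative and not taken from the manuscript.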
||
Comments 5: One of the model applications is its deployment on UAVs and mobile terminals... Can you provide more information about the characteristics of these platforms in the field? |
||
Response 5: Thank you for pointing this out. We have incorporated real-world deployment cases of representative edge computing platforms, such as Jetson Nano and Jetson TX2, in the manuscript. This addition details their performance metrics including inference speed, model size, and resource utilization (e.g., GPU load), while examining their adaptability to technical challenges like dense occlusion and small object detection tasks. This comprehensive validation demonstrates the deployment feasibility and practical applicability of our model. For more information, see lines 51 to 109. |
||
4. Response to Comments on the Quality of English Language |
||
Point 1: The English is fine and does not require any improvement. |
||
Response 1: Thank you. |
||
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Thanks for the revision!
Author Response
Thank you for your guidance and evaluation, which have helped us improve our work.
Reviewer 3 Report
Comments and Suggestions for Authors
Please check the citations. Something is seriously wrong...
Author Response
Thank you for your guidance and evaluation, which have helped us improve our work. In the previously submitted version, some references may have been incorrect because of an error in the reference plug-in. We have reinserted the references one by one.