A Strawberry Ripeness Detection Method Based on Improved YOLOv8
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Improve the quality of Figures 3 and 5.
I found English grammar mistakes.
Adjust the style of all equations.
If equations were developed by others, include references and explain how they work.
Based on Figure 10, provide quantitative comparative results.
Based on Figure 10, provide comparative detection accuracy results of your model with other models.
Author Response
Comments 1: Improve the quality of Figures 3 and 5.

Response 1: Thank you for pointing this out. I agree with this comment and have therefore improved the quality of Figures 3 and 5. In the initial version, we inserted screenshots of the finalized diagrams into the manuscript. Based on your valuable feedback, we have re-exported the diagrams in high-quality PNG format and updated the manuscript accordingly; the revised figures are now Figure 4 and Figure 8. The changes can be found on page 4, Section 2.3.1, line 145 for the updated Figure 4, and on page 7, Section 2.3.2, line 229 for Figure 8.

Comments 2: I found English grammar mistakes.
Response 2: Thank you for your valuable feedback. Since you did not specify the exact grammatical errors, I have taken the opportunity to revise and refine the entire manuscript.

Comments 3: Adjust the style of all equations. If equations were developed by others, include references and explain how they work.

Response 3: Thank you for your valuable feedback. I have revised the formatting of all the equations, provided explanations of their principles, and included the relevant references. You can review these changes in Section 2.3.3 (Wise-IoU Loss Function).

Comments 4: Based on Figure 10, provide quantitative comparative results.

Response 4: Thank you for your valuable feedback. Before running the experiments, I applied a standardized quantitative treatment. The visualization settings were configured as follows: the method is set to GradCAMPlusPlus, the layer to [12], and the backward_type to all, with a confidence threshold of 0.2. To maintain consistency, both the bounding-box display (show_box = False) and the renormalization (renormalize = False) functions were disabled.

Comments 5: Based on Figure 10, provide comparative detection accuracy results of your model with other models.

Response 5: Thank you for your valuable feedback. I have conducted heatmap experiments for all the models used in the comparative analysis. The latest improvements can be found at line 479 of the manuscript, in Section 3.3 (Heatmap Visualization).
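For readers who want to reproduce the heatmap setup quoted in Response 4, the stated settings can be collected in a single configuration object. The sketch below is illustrative only: the HeatmapConfig class and its field names are assumptions chosen to mirror the quoted values, not the authors' actual script; a CAM tool such as pytorch-grad-cam would consume values like these.

```python
from dataclasses import dataclass, field

@dataclass
class HeatmapConfig:
    # Values quoted in Response 4; the field names themselves are hypothetical.
    method: str = "GradCAMPlusPlus"  # CAM variant used to produce the heatmaps
    layers: list = field(default_factory=lambda: [12])  # model layer(s) to visualize
    backward_type: str = "all"       # back-propagate through all output components
    conf_threshold: float = 0.2      # minimum detection confidence to visualize
    show_box: bool = False           # bounding-box overlay disabled for consistency
    renormalize: bool = False        # per-box renormalization disabled for consistency

cfg = HeatmapConfig()
print(cfg)  # one place to audit that every model was visualized identically
```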
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
This paper introduces YOLOv8-CDW, which integrates Channel Attention, DySample upsampling, and Wise-IoU loss into YOLOv8 to detect strawberry ripeness, achieving 96.9% precision and 93.6% recall (roughly +10% and +9% over the original) at real-time speeds.
Define each acronym on first use, for example PLS as “Partial Least Squares.”
Line 256: “Model performance is evaluated using metrics including mean average precision (mAP), computational complexity, and inference speed.”
For a more precise discussion of speed, if you claim “real-time,” specify the actual frame rate or latency (e.g., “≈45 fps on an RTX 3090”). You should also compare this figure directly with the inference speed of the original YOLOv8.
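A minimal way to obtain such a figure, assuming the ultralytics Python package and a placeholder weights file, is to time repeated forward passes after a warm-up; the sketch below reports both per-frame latency and throughput.

```python
import time

import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # placeholder weights; substitute the trained checkpoint
frame = np.zeros((640, 640, 3), dtype=np.uint8)  # dummy frame at the input resolution

# Warm-up so one-time initialization (CUDA context, kernels) does not skew timing.
for _ in range(10):
    model.predict(frame, verbose=False)

n = 100
start = time.perf_counter()
for _ in range(n):
    model.predict(frame, verbose=False)
elapsed = time.perf_counter() - start

print(f"{1000 * elapsed / n:.1f} ms/frame, about {n / elapsed:.1f} fps")
```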
The paper shows three figures (Figures 2, 4, and 6) of YOLO models that contain a large number of elements, and it is not easy to see the differences. The authors should clearly highlight the differences in these figures; they could also add a three-column table listing the differences between the three models.
Line 266, Table 1: best results should be in bold to be more obvious.
Line 293, Table 2: best results should be in bold to be more obvious.
In the conclusion, highlight the key contributions by stating that the CA module boosts feature discriminability, DySample preserves fine object details during upsampling, and Wise-IoU dynamically balances hard and easy samples.
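To make the Wise-IoU point concrete, here is a minimal PyTorch sketch of the WIoU v1 bounding-box loss, written from the published formulation rather than the authors' code; the function name and (x1, y1, x2, y2) box layout are assumptions. The v3 variant adds a non-monotonic focusing coefficient on top of this term, which is what performs the hard/easy-sample balancing mentioned above.

```python
import torch

def wise_iou_v1(pred, target, eps=1e-7):
    """Wise-IoU v1 sketch for boxes in (x1, y1, x2, y2) format.

    A distance-based penalty R scales the plain IoU loss. The enclosing-box
    term is detached, as in the original formulation, so it acts as a
    gradient-free attention factor.
    """
    # Intersection and union for plain IoU.
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between predicted and ground-truth box centers.
    cx_p = (pred[..., 0] + pred[..., 2]) / 2
    cy_p = (pred[..., 1] + pred[..., 3]) / 2
    cx_t = (target[..., 0] + target[..., 2]) / 2
    cy_t = (target[..., 1] + target[..., 3]) / 2
    dist2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    # Diagonal of the smallest enclosing box, detached from the graph.
    wg = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    hg = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    r_wiou = torch.exp(dist2 / (wg ** 2 + hg ** 2 + eps).detach())

    return r_wiou * (1 - iou)
```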
Author Response
Comments 1: Define each acronym on first use, for example PLS as “Partial Least Squares.”

Response 1: Thank you for pointing this out. I have addressed the requirement to define each acronym upon first use throughout the entire paper. For example, PLS is defined as “Partial Least Squares.” You can refer to the paper in the attached file.

Comments 2: Line 256: “Model performance is evaluated using metrics including mean average precision (mAP), computational complexity, and inference speed.”

Response 2: Thank you for pointing this out. We have already used mAP as an evaluation metric. Additionally, we conducted a comprehensive evaluation of both the original YOLOv8 and the improved YOLOv8, which is why we did not include metrics such as computational complexity and inference speed. The modifications are discussed in Section 3.1, lines 361 to 446.

Comments 3: For a more precise discussion of speed, if you claim “real-time,” specify the actual frame rate or latency (e.g., “≈45 fps on an RTX 3090”). You should also compare this figure directly with the inference speed of the original YOLOv8.

Response 3: Thank you for pointing this out. I apologize if I have not fully understood your point, but I have conducted numerous comparison experiments between YOLOv8 and YOLOv8-CDW. The modifications are discussed in Section 3.1, lines 361 to 446.

Comments 4: The paper shows three figures (Figures 2, 4, and 6) of YOLO models that contain a large number of elements, and it is not easy to see the differences. The authors should clearly highlight the differences in these figures; they could also add a three-column table listing the differences between the three models.

Response 4: Thank you for pointing this out. I have made changes based on your suggestions. After the modifications, Figure 2 corresponds to Figure 3, which shows the original YOLOv8 model. Figure 4 corresponds to Figure 7, where the CA module has been bolded. Figure 6 corresponds to Figure 9, where a different bolding method for CA is used, making it easier to see the improvements clearly.

Comments 5: Line 266, Table 1: best results should be in bold to be more obvious.

Response 5: Thank you for pointing this out. I have made changes based on your suggestions.

Comments 6: Line 293, Table 2: best results should be in bold to be more obvious.

Response 6: Thank you for pointing this out. I have made changes based on your suggestions.
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
General Comments: The subject addressed in this article, "A Strawberry Ripeness Detection Method Based on Improved YOLOv8," is worthy of investigation. The authors propose improving the YOLOv8 algorithm to automatically detect strawberry ripeness. Notably, two new processing blocks are proposed to be added to the YOLOv8 general architecture: channel attention, to address the issue of feature redundancy during the extraction of ripe strawberry image features, and DySample, an alternative upsampling method.
Strengths:
- A good review of the state of the art was provided.
- A practical dataset was created to study the problem of strawberry ripeness detection by computer vision.
- A good comparison with other deep neural network models was achieved.
Weaknesses:
- The authors must contextualize their findings by discussing how the results align with or diverge from previous studies and the initial working hypotheses, which, in general terms, motivate modifying the YOLOv8 method. Quantitative experimental results are similar between all compared models, so why is adding the Channel Attention and DySample steps essential?
- The dataset must include ground truth information. Additionally, more images should be provided in the experimental section to better showcase the contribution.
- Sections 2.3.1 and 2.3.2 could include Channel Attention and DySample examples to better describe the proposal.
- A thorough analysis of the proposed method's computational complexity is necessary for a meaningful comparison with the YOLOv8 approach. The comparison of mAP_0.5 and loss is too constrained. Figures 7, 8, and 9 illustrate that the proposed method is similar to the original YOLOv8. Thus, a quantitative and qualitative comparison is recommended.
- A general typographical review is necessary. The abstract section lacks some spaces after full stops.
Comments for author File: Comments.pdf
Author Response
Comments 1: The authors must contextualize their findings by discussing how the results align with or diverge from previous studies and the initial working hypotheses, which, in general terms, motivate modifying the YOLOv8 method. Quantitative experimental results are similar between all compared models, so why is adding the Channel Attention and DySample steps essential?

Response 1: Thank you for pointing this out. Although the overall quantitative metrics of all the compared models show similar trends, integrating the CA and DySample modules yields substantial improvements in classification precision and background suppression. These findings are consistent with earlier studies suggesting that attention mechanisms can strengthen feature representation by emphasizing salient channels and reducing inter-class confusion. CA recalibrates the feature response of each channel, helping the network focus on discriminative information relevant to the target categories. DySample is a dynamic sampling strategy that adjusts sample selection to the data distribution, which helps the model learn from difficult examples and improves its generalization ability.
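As an illustration of the channel-recalibration idea described in this response (not the paper's exact CA block, which may differ), a squeeze-and-excitation style gate can be sketched in a few lines of PyTorch; the class name is hypothetical.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention sketch.

    Global average pooling summarizes each channel; a small bottleneck MLP
    produces per-channel weights in (0, 1) that recalibrate the feature map.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: (B, C, H, W) -> (B, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, 1), # bottleneck excitation
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # per-channel weight in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)  # recalibrate channel responses

feat = torch.randn(1, 64, 40, 40)
print(ChannelAttention(64)(feat).shape)  # torch.Size([1, 64, 40, 40])
```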
Comments 2: The dataset must include ground truth information. Additionally, more images should be provided in the experimental section to better showcase the contribution.

Response 2: Thank you for pointing this out. I have made revisions based on your suggestions. The ground truth information for the dataset is described at line 71, and the additional experimental images are discussed in Sections 3.1 (Comparative Experiment) and 3.2 (Ablation Experiment).

Comments 3: Sections 2.3.1 and 2.3.2 could include Channel Attention and DySample examples to better describe the proposal.

Response 3: Thank you for pointing this out. The newly added material can be found in Section 2.3.1 (CA example images) and Section 2.3.2 (DySample example data).
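For readers unfamiliar with DySample, its core idea, replacing a fixed upsampling kernel with content-aware point sampling, can be sketched as follows. This is a simplified illustration written from the DySample paper's idea, not the authors' implementation; the class name and the small 0.25 offset scope are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicUpsample(nn.Module):
    """Minimal point-sampling upsampler in the spirit of DySample.

    A 1x1 convolution predicts a content-aware offset for every output point;
    the upsampled map is gathered with grid_sample instead of a fixed kernel.
    """
    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        # Predict (dx, dy) for each of the scale*scale sub-positions per pixel.
        self.offset = nn.Conv2d(channels, 2 * scale * scale, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s = self.scale
        offset = self.offset(x) * 0.25          # small scope for the learned shifts
        offset = F.pixel_shuffle(offset, s)     # (B, 2, H*s, W*s)

        # Base sampling grid in normalized [-1, 1] coordinates.
        ys = torch.linspace(-1, 1, h * s, device=x.device)
        xs = torch.linspace(-1, 1, w * s, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).expand(b, -1, -1, -1)

        grid = grid + offset.permute(0, 2, 3, 1)  # shift sampling points dynamically
        return F.grid_sample(x, grid, align_corners=True)

feat = torch.randn(1, 64, 20, 20)
print(DynamicUpsample(64)(feat).shape)  # torch.Size([1, 64, 40, 40])
```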
Comments 4: A thorough analysis of the proposed method's computational complexity is necessary for a meaningful comparison with the YOLOv8 approach. The comparison of mAP_0.5 and loss is too constrained. Figures 7, 8, and 9 illustrate that the proposed method is similar to the original YOLOv8. Thus, a quantitative and qualitative comparison is recommended.

Response 4: Thank you for pointing this out. Changes have been made based on your suggestions, addressing the previous errors. The new experimental comparison images can be found in Sections 3.1 (Comparative Experiment) and 3.2 (Ablation Experiment).

Comments 5: A general typographical review is necessary. The abstract section lacks some spaces after full stops.

Response 5: Thank you for pointing this out. I have made revisions based on your suggestions.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Well edited and explained.
Reviewer 3 Report
Comments and Suggestions for Authors
General Comments: The authors considered all suggestions made by the reviewers. Notably, the manuscript was improved in the abstract, image collection, description of the proposed model, description of the metrics to evaluate the method's performance, and the experimental results section. Considering the above, I strongly recommend accepting the paper as it appears now.