Article
Peer-Review Record

An Efficient Group Convolution and Feature Fusion Method for Weed Detection

Agriculture 2025, 15(1), 37; https://doi.org/10.3390/agriculture15010037
by Chaowen Chen 1,2, Ying Zang 1,2,3,4, Jinkang Jiao 1,2, Daoqing Yan 1,2, Zhuorong Fan 1,2, Zijian Cui 1,2 and Minghua Zhang 1,2,3,4,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 28 November 2024 / Revised: 19 December 2024 / Accepted: 24 December 2024 / Published: 27 December 2024
(This article belongs to the Special Issue Intelligent Agricultural Machinery Design for Smart Farming)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript proposes a YOLOv8-EGC-Fusion (YEF) model for vegetable weed detection, but several questions remain: (1) What kind of data (images or hyperspectral images) did the authors build or use? This cannot be determined from the Abstract. (2) The title refers to “weed detection”, but there are no experimental results (subjective or objective evaluation) for images containing both vegetables and weeds; can the method detect weeds among vegetable samples? (3) How can the method realize plug-and-play?

Additional comments:

1. The vegetable and weed species (in Lines 107-110) should be given their systematic names.

2. Adaptive Feature Fusion (AFF) is introduced repeatedly, in lines 305, 319, and 494.

3. What do P3, P4, and P5 in Figure 10 and N3, N4, and N5 in Figure 11 mean?

4. The samples in Table 10 are not described in Section 2.1.1.

5. The paper should provide references for the comparison methods (Faster R-CNN, TOOD-R50, RTMDet-Tiny, and RetinaNet).

6. The methods selected for comparison are not the latest: Faster R-CNN (2015), TOOD-R50 (2021), RTMDet-Tiny (2022), and RetinaNet (2018).

7. “Although YOLOv9 and YOLOv10 have seen further improvements, they have yet to be widely validated and are less applied [21-22]”. Why did the manuscript not compare with YOLOv10?

8. A symbol is missing in Line 242.

9. Lines 320-323 state that “three GCAA-Fusion modules are incorporated before each detection head. The inputs to the three GCAA-Fusion modules are sourced from the 4th and 15th layers, the 6th and 18th layers, and the 9th and 21st layers”. How can this information be obtained from Figure 11?

10. YEF provides significant performance enhancements, as can be seen in Table 4 (results of YOLOv8 with varying numbers of GCAA-Fusion modules). What are Fusion-1 and Fusion-2?

11. What are MLCA, CBAM, SE, and EMA?

Author Response

Response to the comments of Reviewer #1

The manuscript proposes a YOLOv8-EGC-Fusion (YEF) model for vegetable weed detection, but several questions remain: (1) What kind of data (images or hyperspectral images) did the authors build or use? This cannot be determined from the Abstract. (2) The title refers to “weed detection”, but there are no experimental results (subjective or objective evaluation) for images containing both vegetables and weeds; can the method detect weeds among vegetable samples? (3) How can the method realize plug-and-play?

Comment No.1: What kind of data (images or hyperspectral images) did the authors build or use? This cannot be determined from the Abstract.

Response:  Thanks to the expert's valuable feedback, it has been revised in lines 29-30.

Comment No.2: The title refers to “weed detection”, but there are no experimental results (subjective or objective evaluation) for images containing both vegetables and weeds; can the method detect weeds among vegetable samples?

Response: Thank you for pointing out this deficiency. The method can indeed detect weeds among vegetable samples, and the experimental results addressing this have been added in Section 3.4.

Comment No.3: How can the method realize plug-and-play?

Response: Thank you for your question. The plug-and-play capability of the method is achieved through its modular design, which allows individual components, such as the GCAA-Fusion modules or attention mechanisms, to be added or replaced without affecting the overall system. This modular approach ensures flexibility in adapting the model to different tasks or datasets. Additionally, the method can be easily integrated into other models, including YOLOv5, making it simple to experiment with various configurations and enhance performance.
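To illustrate what such a plug-and-play interface can look like in practice, here is a minimal PyTorch sketch of a hypothetical fusion block (the name PlugInFusion and its internals are placeholders for illustration, not the GCAA-Fusion design from the manuscript). Because the block's output has the same shape as its high-level input, it can be placed in front of a YOLO detection head, or swapped for another attention/fusion module, without changing the surrounding layers.

```python
# Illustrative sketch only: a hypothetical plug-and-play fusion block.
# The GCAA-Fusion internals are described in the paper; the body below is a
# placeholder that demonstrates the drop-in interface, not the actual method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlugInFusion(nn.Module):
    """Fuses a low-level and a high-level feature map of equal channel width."""

    def __init__(self, channels: int):
        super().__init__()
        # Placeholder fusion: a 1x1 convolution over the concatenated maps.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Match the low-level map to the high-level spatial size, then fuse.
        low = F.interpolate(low, size=high.shape[-2:], mode="nearest")
        return self.fuse(torch.cat([low, high], dim=1))

# Usage: feature maps taken from two layers of any backbone/neck.
low = torch.randn(1, 256, 80, 80)    # shallow, high-resolution features
high = torch.randn(1, 256, 40, 40)   # deep, low-resolution features
out = PlugInFusion(256)(low, high)   # shape: (1, 256, 40, 40), same as `high`
```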

Comment No.4: The vegetable and weed species (in Lines 107-110) should be given their systematic names.

Response: Thank you for pointing out this deficiency. The necessary revisions have been made in lines 123-127 to include the systematic names of the vegetable and weed species.


Comment No.5: Adaptive Feature Fusion (AFF) is introduced repeatedly, in lines 305, 319, and 494.

Response: Thank you for your suggestion; this has been revised.

Comment No.6: What do P3, P4, and P5 in Figure 10 and N3, N4, and N5 in Figure 11 mean?

Response: Thank you for your comment. The annotations for P3, P4, P5 in Figure 10 and N3, N4, N5 in Figure 11 (now updated to Table 11 and Table 12) have been updated for clarity. These updates can be found in lines 355-356 and 367-369.

Comment No.7: The samples in Table 10 are not described in Section 2.1.1.

Response: Thank you for your comment. The data in Table 10 (now updated to Table 11) are results from tests conducted on a public dataset, not from our own collected dataset. The explanation for this is provided in line 524 of the manuscript.

Comment No.8: The paper should provide references for the comparison methods (Faster R-CNN, TOOD-R50, RTMDet-Tiny, and RetinaNet).

Response: Thank you for your suggestion; this has been revised. The updates can be found in line 505.

Comment No.9: The methods selected for comparison are not the latest: Faster R-CNN (2015), TOOD-R50 (2021), RTMDet-Tiny (2022), and RetinaNet (2018).

Response: Thank you for your valuable suggestion! We have incorporated new comparison experiments using DINO and YOLOv10 models. The earlier models were chosen to ensure a comprehensive evaluation: the classic two-stage Faster R-CNN model and single-stage RetinaNet model represent foundational baselines. Additionally, we included comparisons with the end-to-end DINO model, the lightweight RTMDet-Tiny model, and the large-scale TOOD-R50 model to cover a diverse range of architectures. The updates can be found in lines 502-505.

Comment No.10: “Although YOLOv9 and YOLOv10 have seen further improvements, they have yet to be widely validated and are less applied [21-22]”. Why did the manuscript not compare with YOLOv10?

Response: Thanks to the expert's opinion, it has been revised. We have added new comparative experiments with the YOLOv10 model.

Comment No.11: A symbol is missing in Line 242.

Response: Thank you for your comment. The missing symbol has been added in Line 259.

Comment No.12: Lines 320-323 state that “three GCAA-Fusion modules are incorporated before each detection head. The inputs to the three GCAA-Fusion modules are sourced from the 4th and 15th layers, the 6th and 18th layers, and the 9th and 21st layers”. How can this information be obtained from Figure 11?

Response: Thank you for your comment. The corresponding explanation has been added in the text to clarify how this information can be interpreted from Figure 11 (now updated to Table 12). The updates can be found in lines 359-363.
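Purely as an illustration of how feature maps from specific layer indices (such as the 4th and 15th layers mentioned above) can be captured and paired as fusion inputs, the sketch below attaches PyTorch forward hooks to a toy sequential model. The layer list, channel widths, and spatial sizes are placeholders and do not reproduce the YEF architecture.

```python
# Toy example: capture intermediate features by layer index with forward hooks.
# The 22-layer stack here is a stand-in, not the YOLOv8/YEF network definition.
import torch
import torch.nn as nn

layers = nn.Sequential(
    *[nn.Conv2d(3 if i == 0 else 16, 16, kernel_size=3, padding=1) for i in range(22)]
)
captured = {}

def save(idx):
    def hook(module, inputs, output):
        captured[idx] = output     # store the output of layer `idx`
    return hook

for idx in (4, 15, 6, 18, 9, 21):  # the layer pairs named in the response
    layers[idx].register_forward_hook(save(idx))

_ = layers(torch.randn(1, 3, 64, 64))

# Each pair would feed one fusion module placed before a detection head.
pairs = [(captured[4], captured[15]),
         (captured[6], captured[18]),
         (captured[9], captured[21])]
print([tuple(a.shape) for a, _ in pairs])
```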

Comment No.13: YEF provides significant performance enhancements, as can be seen in Table 4 (results of YOLOv8 with varying numbers of GCAA-Fusion modules). What are Fusion-1 and Fusion-2?

Response: Thanks to the expert's opinion, it has been revised. Detailed explanations about Fusion-1 and Fusion-2 have been added in Lines 402-403 to clarify their meanings.

Comment No.14: What are MLCA, CBAM, SE, and EMA?

Response: Thanks to the expert's opinion, it has been revised. MLCA, CBAM, SE, and EMA are attention mechanisms employed to test the effectiveness of high-level and low-level feature fusion in the study. Detailed explanations regarding these mechanisms have been added in lines 412–415 of the revised manuscript.
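For readers unfamiliar with these mechanisms, the sketch below shows a minimal Squeeze-and-Excitation (SE) block, the simplest of the four attention modules compared in the manuscript. It follows the standard published SE design and is not the authors' implementation, nor the attention used inside GCAA-Fusion.

```python
# Minimal Squeeze-and-Excitation (SE) channel-attention block (standard design).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # squeeze: global average per channel
        self.fc = nn.Sequential(                      # excitation: channel-wise gating
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # reweight the input channels

x = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```

CBAM adds a spatial-attention branch on top of channel attention, and MLCA and EMA are lighter variants that mix channel and spatial cues at multiple scales; all follow this same "reweight the feature map" pattern.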

Reviewer 2 Report

Comments and Suggestions for Authors


The manuscript presents an innovative model, YOLOv8-EGC-Fusion, designed to enhance feature extraction and improve detection accuracy for vegetables and weeds. While the study demonstrates promising results, several aspects need further elaboration and clarification:

1.     The current research results are limited to the detection of single targets (vegetables or weeds). In practical applications, images may contain multiple plant types. It is recommended to supplement the experiments with detection results for multiple plant types in the same environment.

2.     Regarding feature fusion modules (e.g., MLCA, CBAM, SE, and EMA), the authors are encouraged to include relevant literature references.

3.     The rationale behind the selection of data augmentation techniques (including brightness adjustment, rotation, color variation, and sharpness enhancement) requires further elaboration. Please explain how these methods contribute to enhancing the model's generalization capability.

4.     Provide a detailed description of the dataset types used, including specific information on the validation datasets for cotton weeds and sesame weeds.

5.     The specific differences between the models listed in Table 4 (Fusion-1 and Fusion-2) are not clearly explained. Supplementary information and a detailed explanation are recommended.

6.     The language in the manuscript requires further refinement to ensure clarity and professionalism.

7.     Line 105: Why is the phone placed 90 cm away from the object?

Author Response

The following is a point-to-point response to the reviewer’s comments.

Response to the comments of Reviewer #2

The manuscript presents an innovative model, YOLOv8-EGC-Fusion, designed to enhance feature extraction and improve detection accuracy for vegetables and weeds. While the study demonstrates promising results, several aspects need further elaboration and clarification:

Comment No.1: The current research results are limited to the detection of single targets (vegetables or weeds). In practical applications, images may contain multiple plant types. It is recommended to supplement detection experimental results involving multiple plant types in the same environment.

Response: Thank you for pointing out this deficiency. The method can indeed detect weeds among vegetable samples; we have added experimental results in Section 3.4 to address this.

Comment No.2: Regarding feature fusion modules (e.g., MLCA, CBAM, SE, and EMA), the authors are encouraged to include relevant literature references.

Response: Thank you for pointing out this deficiency; it has been addressed. The relevant literature references have now been incorporated in line 414.

Comment No.3: The rationale behind the selection of data augmentation techniques (including brightness adjustment, rotation, color variation, and sharpness enhancement) requires further elaboration. Please explain how these methods contribute to enhancing the model's generalization capability.

Response: Thank you for your valuable comment. We have now elaborated on the rationale behind the selection of data augmentation techniques, including brightness adjustment, rotation, color variation, and sharpness enhancement. These techniques help improve the model's generalization capability by introducing variations in the input data, thereby preventing overfitting and enabling the model to better handle diverse real-world scenarios. The detailed explanation has been added in lines 137-138.
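As a purely illustrative sketch of how these four augmentation types could be composed (not the authors' actual pipeline), the snippet below uses torchvision; all parameter ranges are assumptions, and for detection training the bounding boxes would have to be transformed consistently, which is omitted here.

```python
# Illustrative augmentation pipeline covering the four types named above.
# Parameter ranges are assumptions, not the settings used in the manuscript.
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3),                        # brightness adjustment
    transforms.RandomRotation(degrees=15),                         # rotation
    transforms.ColorJitter(saturation=0.3, hue=0.05),              # color variation
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.5),   # sharpness enhancement
])

# augmented = augment(pil_image)  # apply to a PIL image; boxes need matching transforms
```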

Comment No.4: Provide a detailed description of the dataset types used, including specific information on the validation datasets for cotton weeds and sesame weeds.

Response: Thank you for pointing out this deficiency. We have now provided a detailed description of the dataset types used, including specific information on the validation datasets for cotton weeds and sesame weeds. The relevant details have been added in lines 527 and 537.


Comment No.5: The specific differences between the models listed in Table 4 (Fusion-1 and Fusion-2) are not clearly explained. Supplementary information and a detailed explanation are recommended.

Response: Thank you for your suggestion; this has been revised in lines 402-404.

Comment No.6: The language in the manuscript requires further refinement to ensure clarity and professionalism.

Response: Thank you for your suggestion; this has been revised.

Comment No.7: Line 105: Why is the phone placed 90 cm away from the object?

Response: Thank you for your question; this has been addressed in the revision. The phone is placed 90 cm away from the object to simulate the working height of a weed control robot, as now explained in line 120.

 

Reviewer 3 Report

Comments and Suggestions for Authors

The authors present an adaptation of YOLOv8 to detect weeds. The manuscript is relevant, but I have some suggestions for improvement:

- I suggest improving the introduction to make the reason for testing YOLOv8 clearer. When other studies are mentioned, I suggest showing the accuracy obtained and why the previous versions mentioned were not sufficient for weeds. Do the cited authors suggest new tests?

- In all figures, I suggest making the captions more explanatory rather than as terse as they currently are. For example, in figures that have panels a), b), and so on, it is also important to explain what the panels mean in the caption and not only in the text where the figure is mentioned;

- In the materials and methods at the beginning, I suggest including a map of the location of where the data was collected;

- Considering that the paper uses YOLOv8n (nano), I believe it is worth mentioning that the “n” stands for “nano”, because readers who have never worked with YOLO will not understand what it means;

- I suggest improving the discussion, because the results and discussion section has almost no discussion, only the presentation of the results. There are only 3 references in the section. After showing the results, it is interesting to make clear how these results contribute to the area and whether other studies also show similar characteristics. If not, what do the studies say differently and how do these results contribute to what was previously studied? It is necessary to have a good discussion of the results.

Author Response

The following is a point-to-point response to the reviewer’s comments.

Response to the comments of Reviewer #3

The authors present an adaptation of YOLOv8 to detect weeds. The manuscript is relevant, but I have some suggestions for improvement:

Comment No.1: I suggest improving the introduction to make the reason for testing YOLOv8 clearer. When other studies are mentioned, I suggest showing the accuracy obtained and why the previous versions mentioned were not sufficient for weeds. Do the cited authors suggest new tests?

Response:  Thank you for highlighting this deficiency. The introduction has been revised to more clearly explain the rationale for testing YOLOv8. In the section discussing previous versions, additional details regarding the performance of earlier models have been included, along with an explanation of their limitations and the reasons for selecting YOLOv8. The revisions can be found in lines 71-98.

Comment No.2: In all Figures, I suggest making the caption more explanatory and not as direct as it is. For example, in Figures that have a) and b)... it is also important to explain what they mean in the caption and not only in the text when the Figure is mentioned;

Response:  Thank you for your insightful comment. The figure captions have been revised to provide more detailed explanations, particularly for Figures 2, 7, 11, and 12.

Comment No.3: In the materials and methods at the beginning, I suggest including a map of the location of where the data was collected;

Response: Thank you for noticing this detail. A new map has been created and is shown in Figure 1.

Comment No.4: Considering that the paper used YOLOv8n (nano), I believe it is interesting to mention that it means “nano”, because during the reading, people who have never worked with YOLO will not understand what it means;

Response: Thank you for your valuable suggestion. The explanation regarding the choice of "nano" has been added in lines 158-160.


Comment No.5: I suggest improving the discussion, because the results and discussion section has almost no discussion, only the presentation of the results. There are only 3 references in the section. After showing the results, it is interesting to make clear how these results contribute to the area and whether other studies also show similar characteristics. If not, what do the studies say differently and how do these results contribute to what was previously studied? It is necessary to have a good discussion of the results.

Response: Thank you for your valuable suggestion. The paper has been revised accordingly. We have added a discussion after each experiment to provide a clearer interpretation of the results. The updated discussion sections can be found in red text in Chapter 3.

Reviewer 4 Report

Comments and Suggestions for Authors

In this work, the authors focus on the problem of weed detection and propose to tackle it with a YOLOv8-EGC-Fusion (YEF) model (which they describe as an improvement of the YOLOv8 model). The work provides a novel method of analysis described as “modular plug & play”, and the results of the proposed model indicate improved detection accuracy for vegetable and weed identification, providing a precise detection tool. Some details are missing, however, especially in positioning the research as an important contribution to the area and in discussing how much better the model performs when its results are compared with those of similar algorithms.

Some particular issues that need attention are:

 

INTRODUCTION.

1.     Please mention at least other significant methods for weed detection, such as hyperspectral imaging analysis or other related remote perception techniques.

2.     Add information detailing: What new information has been generated with this work? Why is this a novel technique and not another application of well-known algorithms?

 

MATERIALS AND METHODS

3.     Section 2.2.3. Could you provide quantitative information on how much loss of information occurs due to the resolution reduction? Perform a better review of previous methods and algorithms dealing with resolution enhancement so the reader can understand different approaches to this issue.  

 

RESULTS AND DISCUSSION

4.     Figure 12. Improve the resolution of the figure.

5.     Table 5: When comparing the different feature fusion modules, what are the uncertainties? Most models would be equivalent if there were an uncertainty of about 1% in the reported values. How significant, then, is “a 1% improvement over the original model”? Please discuss this in detail.

6.     Figure 14. Improve the resolution of the figure.

 

CONCLUSIONS.

7.     I’m unsure that the model is “validated”, since only a limited number of application examples were presented in this work. A “validation process” requires more than just checking whether the model reproduces some application examples; it requires a more detailed testing process with larger controlled sets. Please replace “validation” with “successfully tested” or similar wording.

Comments on the Quality of English Language

Perform a general check of typos and misspellings throughout the text. Even in the title, there are errors. 

Author Response

The following is a point-to-point response to the reviewer’s comments.

Response to the comments of Reviewer #4

In this work, the authors focus on the problem of weed detection and propose to tackle it with a YOLOv8-EGC-Fusion (YEF) model (which they describe as an improvement of the YOLOv8 model). The work provides a novel method of analysis described as “modular plug & play”, and the results of the proposed model indicate improved detection accuracy for vegetable and weed identification, providing a precise detection tool. Some details are missing, however, especially in positioning the research as an important contribution to the area and in discussing how much better the model performs when its results are compared with those of similar algorithms.

Comment No.1:  Please mention at least other significant methods for weed detection, such as hyperspectral imaging analysis or other related remote perception techniques.

Response:  Thanks to the expert's valuable feedback, a description of other significant methods was added in lines 57-60.

Comment No.2: Add information detailing: What new information has been generated with this work? Why is this a novel technique and not another application of well-known algorithms?  

Response: Thank you for the valuable feedback on the introduction. The introduction has been revised according to the suggestions: the limitations of previous techniques have been highlighted through comparisons, and the rationale for designing the new method has been added. Specifically, it is emphasized that the algorithm is designed for weed detection in the complex environments of vegetable fields. A brief overview of the contributions of this work, such as the EGC module and GCAA-Fusion, has also been included. The revised content can be found in lines 83-111.

Comment No.3: Section 2.2.3. Could you provide quantitative information on how much loss of information occurs due to the resolution reduction? Perform a better review of previous methods and algorithms dealing with resolution enhancement so the reader can understand different approaches to this issue.  

Response: Thank you for your valuable suggestion. A new comparative experiment has been designed to measure the extent of information loss due to resolution reduction. Two relevant metrics, PSNR and SSIM, have been introduced to quantify this loss. The experiment demonstrates this point, and the details are provided in lines 289-310.
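As an illustrative sketch of how such a measurement can be made (not necessarily the authors' exact procedure), the snippet below downscales an image to an assumed 640 x 640 network input size, upscales it back, and computes PSNR and SSIM with scikit-image. The file path and target size are hypothetical, and the channel_axis argument assumes scikit-image >= 0.19.

```python
# Quantify information loss from resolution reduction with PSNR and SSIM.
# The image path and 640x640 target size are illustrative assumptions.
import numpy as np
from skimage import io, transform
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

original = io.imread("field_image.jpg")            # hypothetical RGB input image
h, w = original.shape[:2]

# Downscale to the (assumed) network input size, then upscale back for comparison.
small = transform.resize(original, (640, 640), anti_aliasing=True)
restored = transform.resize(small, (h, w), anti_aliasing=True)

orig_f = original.astype(np.float64) / 255.0       # both arrays now in [0, 1]
psnr = peak_signal_noise_ratio(orig_f, restored, data_range=1.0)
ssim = structural_similarity(orig_f, restored, data_range=1.0, channel_axis=-1)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```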

Comment No.4: Figure 12. Improve the resolution of the figure.

Response: Thank you for the expert's opinion. The resolution of the figure has been improved, and the updated image is now Figure 13.

Comment No.5: Table 5: When comparing the different feature fusion modules, what are the uncertainties? Most models would be equivalent if there were an uncertainty of about 1% in the reported values. How significant, then, is “a 1% improvement over the original model”? Please discuss this in detail.

Response: Thanks to the expert's valuable opinion, it has been revised. This has been addressed in detail in Section 3.1. The different attention mechanisms in the feature fusion modules focus on distinct aspects, and their results should not be interpreted solely in terms of a single metric. For example, CBAM shows the best performance in Precision, while SE excels in Recall. The advantage of each model in different metrics allows them to adapt to various recognition tasks. The proposed GCAA-Fusion module demonstrates the best overall performance for the weed detection task in this paper. These results suggest that the feature fusion module we introduced is effective, and replacing the attention mechanisms within the fusion module leads to improvements in the model's performance. The detailed discussion can be found in lines 412-428.

Comment No.6:  Figure 14. Improve the resolution of the figure.

Response: Thank you for the expert's opinion. The resolution of the figure has been improved, and the updated image is now Figure 15.

Comment No.7: I’m unsure that the model is “validated”, since only a limited number of application examples were presented in this work. A “validation process” requires more than just checking whether the model reproduces some application examples; it requires a more detailed testing process with larger controlled sets. Please replace “validation” with “successfully tested” or similar wording.

Response: Thanks to the expert's valuable opinion, it has been revised. The term "validation" has been replaced with "successfully tested" or similar wording in the discussion to better reflect the testing process.

Comment No.8: Perform a general check of typos and misspellings throughout the text. Even in the title, there are errors.

Response: Thanks to the expert's opinion, it has been revised.

 

 

Round 2

Reviewer 4 Report

Comments and Suggestions for Authors

The paper has improved significantly with the revision. In my opinion, it is now ready to be published.
