Review Reports - YOLO-MECD: Citrus Detection Algorithm Based on YOLOv11

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript presents YOLO-MECD, a model based on YOLO11s, applied to citrus fruit detection and the future development of smart orchards. It is well written and interesting, but it has some critical issues and minor errors. Because the PDF file has no line numbers, I quote extracts from the paper below and comment on each.

It would be better to replace YOLO11 in the keywords since it is already present in the title
Introduction: With the advancement of computer technology, automated detection technology based on machine vision has been widely used in various scenarios, enabling in-situ non-destructive detection of round fruits through morphological methods

It would be better to use computer vision technology instead of computer technology

Materials and Method

Section 2.1: The citrus images were collected from the Garden of Jiangxi Agricultural University in Nanchang City, Jiangxi Province, during October to November 2024. The images were captured using a Honor 70 camera (Honor Device Co., Ltd., Shenzhen, China)

It would be better to replace “from” with “at” and include coordinates of the experimental site where the photos were collected

Section 2.4: YOLO11 represents the latest evolution in the YOLO (You Only Look Once) object detection algorithm series. In comparison to its antecedents, it introduces sophisticated architectural paradigms and technological innovations, demonstrating substantial enhancements in both model accuracy and computational efficiency. The YOLO11 framework encompasses five distinct model variants—n, s, l, m, and x—characterized by incrementally increasing network depth and detection precision, strategically designed to accommodate diverse application scenarios. Among these variants, YOLO11s achieves an optimal equilibrium between detection accuracy and model complexity, thereby serving as the foundational architecture for our investigation.

It would be better to include bibliographic sources in this paragraph.

Results

It would be better to also evaluate the results presented in this section through an appropriate statistical analysis.
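
One form such an analysis could take is a paired comparison of the baseline and the modified model over repeated training runs. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the per-run scores are placeholder values and not results from the manuscript, and a paired t statistic is just one of several reasonable choices.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of per-run scores."""
    return mean(scores), stdev(scores)


def paired_t_statistic(baseline, improved):
    """Paired t statistic for the per-run differences (improved - baseline)."""
    if len(baseline) != len(improved) or len(baseline) < 2:
        raise ValueError("need two equally long samples with at least 2 runs")
    diffs = [new - old for old, new in zip(baseline, improved)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (spread / sqrt(len(diffs)))


# Placeholder scores for illustration only; NOT values from the manuscript.
baseline_runs = [0.80, 0.82, 0.81]
improved_runs = [0.83, 0.84, 0.85]

baseline_mean, baseline_sd = summarize(baseline_runs)
improved_mean, improved_sd = summarize(improved_runs)
t_value = paired_t_statistic(baseline_runs, improved_runs)
```

The resulting statistic would be compared against a t distribution with n - 1 degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel` performs the same test and also returns a p-value.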

Section 3.1: Figure 11 illustrates the detection performance of the YOLO-MECD model under different scenarios. (a) demonstrates the detection results of upward-angle conditions, where nineteen citrus fruits were successfully detected, with even small, distant fruit targets not being missed. (b) presents the detection effectiveness in wide-angle and backlighting conditions, successfully detecting 12 larger but partially occluded hanging fruits and 13 fallen fruits at a distance with small targets. (c) shows the detection performance under clear sky and front-lighting conditions, where despite some occlusion and small targets, 17 hanging fruits and 3 fallen fruits were accurately detected. (d) displays the detection results under overcast and downward-angle conditions, where the model could still effectively identify targets with partial occlusion and overlap, successfully detecting 9 hanging fruits and 5 fallen fruits.

It would be appropriate to double-check this part and insert capital letters where needed.

Tables: Check the tables in the Results section and state, where applicable, what the values in bold indicate.

Discussion

Section 4.1: This is not a suitable part of the Discussion; it would be better to include it in the Results.

Section 4.2: The content of this paragraph needs to be reorganized, for example by inserting a descriptive table of the numerical values cited and moving the future prospects to the Conclusions section.

References

Check the numbering of the references; the numbers in the list appear twice.

Best regards

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

[Major] The scientific soundness of the article is very low because important elements of the text are written inaccurately. For example, the names of CNN architectures are written incorrectly: "YOLO11" appears as "YOLOv11", "YOLO V11", and "YOLO-v11s", while the architectures R-CNN and Faster R-CNN appear as "RCNN" and "FasterRCNN". The correct form is "YOLOv8", not "YOLO8". YOLO belongs to the object detection domain, not to "target detection" as stated in the keywords. The article presents experiments with citrus, pomelo, and kumquat, but the keywords contain "orange".

[Major] The object of the study is the YOLO-MECD architecture, which is developed from the YOLO11 architecture. The architecture is described in detail and tested in comprehensive experiments. However, the "agriculture" domain is lost: the only agricultural element is the dataset of citrus images. The introduction stresses the importance of counting dropped fruit, yet the aspects of drop counts and post-disaster auditing are entirely absent from the rest of the paper. For the "agronomy" domain, it would be more appropriate to select the best architecture among existing solutions (based on a comparison) and to investigate the effectiveness of AI in practice by comparing manual and AI-based methods. A new architecture, by contrast, must be tested on benchmarks.

[Major] An experiment design for precision farming must compare the manual and AI-based methods (manually counted versus AI-counted fruit). If images are collected from different sides, the correct number of dropped fruits must be predicted by regression models, and the percentage must be calculated from the counts of dropped fruits and fruits on trees. How is the combined number of fruits on trees and dropped fruits obtained? The main idea is lost. The photo collection methodology for the AI-based method must also be described (is it done by UGVs, by UAVs, or by people?).
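
The comparison requested here can be sketched in a few lines of Python. This is a minimal illustration, not the authors' protocol: the function names are hypothetical, the counts are made-up placeholders, and the sketch covers only the drop percentage and the agreement between manual and AI counts, not the regression step for multi-view images.

```python
from math import sqrt


def drop_percentage(on_tree, dropped):
    """Share of dropped fruit among all fruit (on tree + dropped), in percent."""
    total = on_tree + dropped
    if total == 0:
        raise ValueError("no fruit counted")
    return 100.0 * dropped / total


def count_agreement(manual, predicted):
    """Return (MAE, RMSE, R^2) of AI counts measured against manual counts."""
    if len(manual) != len(predicted) or not manual:
        raise ValueError("need two non-empty lists of equal length")
    n = len(manual)
    errors = [p - m for m, p in zip(manual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    manual_mean = sum(manual) / n
    ss_total = sum((m - manual_mean) ** 2 for m in manual)
    ss_residual = sum(e * e for e in errors)
    # R^2 is undefined when every manual count is identical.
    r_squared = 1.0 - ss_residual / ss_total if ss_total > 0 else float("nan")
    return mae, rmse, r_squared


# Placeholder per-tree counts for illustration only; not data from the study.
manual_counts = [10, 20, 30]
ai_counts = [12, 18, 30]
mae, rmse, r_squared = count_agreement(manual_counts, ai_counts)
tree_drop_rate = drop_percentage(on_tree=17, dropped=3)
```

Reporting the error metrics per tree, alongside the drop percentage derived from each method, would show whether the AI count can stand in for the manual count when cultivars are ranked.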

[Major] There is no "Discussion" or experiment that addresses the introduction text: "Therefore, citrus breeders have been seeking superior varieties that possess certain resistance to natural disasters and are less prone to fruit drop, which is of great significance for the development of China's citrus industry. Currently, breeders and farmers count the on tree or dropped fruit after natural disasters manually, which inevitably involves problems such as high costs, low efficiency, and poor accuracy. This deficiency limits the selection of superior citrus varieties and has become a bottleneck of the citrus industries."

Summary (Major): At present, this article is neither about a new architecture (YOLO-MECD) nor about counting dropped fruit after a disaster to evaluate cultivars.

[Ethical and Plagiarism] YOLO11 must be referenced in accordance with its user license and, because it is mentioned in the text, as a credit to its authors.

[Minor] List:

[Abstract] The aim is to minimize the CNN; therefore, the numbers of parameters of YOLO-MECD and YOLO11 must be compared as well.

[Abstract] Information about the dataset used to compare the CNNs must be provided.

[Introduction, p. 3] "From the above content, it can be seen that while fruit object detection based on deep learning has made certain progress, current research subjects mostly focus on counting on tree fruits without addressing dropped fruit detection."
This conclusion is not correct, because the preceding text is about applied AI solutions and the accuracy they achieved.

[Introduction, p. 3] "traditional YOLO algorithms" - how are traditional and non-traditional algorithms distinguished? This is demagogy.

[Ch.2.2] "This augmentation protocol expanded our initial dataset to 7,200 images." Augmentation is traditionally done not before the experiment but within each epoch.

[Fig. 2] A legend must be provided.

[Page 5] "Among these variants, YOLO11s achieves an optimal equilibrium between detection accuracy and model complexity, thereby serving as the foundational architecture for our investigation." Why is it the most optimal? Other models could be optimal too. The claim is subjective without an experiment. It would be better to justify the choice by the minimal size.

[Ch.2.5] "The YOLO11s model was designed for generalized applications, thus it can not obtain excellent result for citrus identification in complex environmental conditions."
This conclusion is incorrect: YOLO11 is trained for general-purpose tasks, not for citrus detection. The limitation is not related to the architecture of YOLO11; you can train it from scratch and tune it for citrus detection.

[Ch.2.5] "while enhancing small object detection capabilities." - this was not tested experimentally.

[Fig.6] Sorry, I cannot distinguish the filter and I/O elements; their colors look similar to me.

[Conclusions] The conclusions are not supported by the experimental data.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This manuscript describes machine learning methods for counting citrus fruit based on images. The authors present a lightweight (smaller number of parameters) implementation of the YOLOv11 method. Additionally, they attempt to count the fruit on the tree and the fruit dropped to the ground. The reported metrics (accuracy etc.) are slightly better than those of the basic YOLOv11 but with far fewer parameters, which makes this interesting.

Given that this is a new application on a new dataset, I believe this is worth publishing, but there are a number of issues that I think must be addressed (see the comments below).

Comment 1) The introduction mentions a number of studies which have used YOLO type methods successfully to detect citrus fruits. It would be good to state how the proposed methods compare against these. What are the limitations of the existing studies? And how does YOLO-MECD overcome them? Please clarify the novelty of your methods.

Comment 2) There are other existing studies which use YOLO methods combined with EMA attention mechanisms. I even found one for citrus. Similarly, there are existing methods using YOLO with MPDIoU loss functions. Is this the first time these different modifications have been used together? Please clarify the novelty of your methods.

In particular, the corresponding author published an EMA-YOLO method last year in the journal Sensors. It would be good to also mention that study for comparison.

Comment 3) I am not sure which font the authors have used, but it is a little hard to read and I do not think it is the one suggested in the Agronomy template. I suggest switching to the recommended fonts, using the styles in the Agronomy template.

Comment 4) The referencing in the text is a little strange.

- I suggest moving the citations (numbers in square brackets) to the end of sentences in most cases.

- When referring to a set of authors in the text, it is not necessary, and is perhaps confusing, to write their full names. For example, “Zhang Fu et al.” should be changed to “Zhang et al.”

- When referring to reference [28] in the introduction the authors refer to “He Bin et al.[28]” but in the reference list [28] does not have any author by that name.

Comment 5) The formatting of references in the reference list contains errors.

- “et al.” should not be used in the reference list. I think all authors should be listed.

- The references have been numbered twice (e.g. 9. 9. )

- Please include the full DOI link for all studies where available (http://doi.org/.....)

Comment 6) All acronyms should be defined at the first point where they are used:

- YOLO

- EMA

- CSPPC

- MPDIoU

Etc.

Comment 7) One of the keywords is “orange”, but you have consistently used “citrus” throughout the manuscript, so I suggest changing this to “citrus”.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The publication concerns a method for identifying citrus fruits on trees and fruits that have fallen due to environmental conditions (diseases, cultivars). The neural network methods used are in line with current trends in research using image analysis. The description of the methods is detailed and explained. The results correspond to the scope of the research.

Comments:

Consider deleting Figures 3, 5, 6, and 9. The figures do not contain new information and unnecessarily enlarge the publication. Section 2.7 can also be shortened and known formulas removed. The publication database is known and properly cited in the publication.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

All the aspects highlighted in the previous review have been considered; however, there are still two points that need clarification.

More detailed descriptions should be added for each table, specifying the meaning of the numbers in bold. Furthermore, no statistical analysis of the results was conducted; it would be more appropriate to include it in the relevant section of the Materials and Methods.

After these changes, the paper will be suitable for publication.

Best regards

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

[Major] The main problem of the article is captured in Response 5: "This article primarily focuses on the design of the YOLO-MECD model for counting citrus fruits (such as oranges, pomelos, and kumquats)." The journal is called "Agronomy"; it is an international, cross-disciplinary scholarly journal on agronomy and agroecology, not a journal about artificial intelligence itself. Horticultural problems and the related business tasks must be the primary subject of discussion. The request was to extend the article with content aimed specifically at fruit growers and orchard managers. The Discussion is very short compared with the rest of the content, so it can be extended with exactly this information (about horticulture). Why must the fruits on the ground be counted? The importance and practical use of AI was not discussed either. See the previous Major comments.

Comments 6: [Ethical and Plagiarism] You have added references to publications about YOLO11. However, if the YOLO11 framework was used ([Page 5, lines 29-31] “The YOLOv11 framework encompasses five distinct model variants—n, s, l, m, and x—characterized by incrementally increasing network depth and detection precision, strategically designed to accommodate diverse application scenarios.”), the following acknowledgement is requested: https://docs.ultralytics.com/models/yolo11/#citations-and-acknowledgements.

@software{yolo11_ultralytics,
  author = {Glenn Jocher and Jing Qiu},
  title = {Ultralytics YOLO11},
  version = {11.0.0},
  year = {2024},
  url = {https://github.com/ultralytics/ultralytics},
  orcid = {0000-0001-5950-6979, 0000-0002-7603-6750, 0000-0003-3783-7069},
  license = {AGPL-3.0}
}

If the YOLO11 project was not used, please rewrite the text to remove any ambiguity about use of the Ultralytics framework.

Comments 7: [Abstract] The aim is to minimize the CNN; therefore, the numbers of parameters of YOLO-MECD and YOLO11 must be compared as well.

This comment is about the Abstract. You mention 2,297,334 parameters and 4.66MB for YOLO-MECD but give no corresponding values for YOLO11s.

Comments 8: [Abstract] Information about the dataset used to compare the CNNs must be provided.

This comment is about the Abstract.

Comments 11: [Ch.2.2] "This augmentation protocol expanded our initial dataset to 7,200 images." Augmentation is traditionally done not before the experiment but within each epoch.

Response 11: “Since YOLO has a built-in Mosaic data augmentation method, data augmentation is performed in each epoch.”. The same point is made in Comment 11, but the article gives a different description: that you generated 7,200 images by augmentation and then ran YOLO training. Please correct the description.
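
The two procedures being contrasted can be sketched as follows. This is a simplified illustration under stated assumptions: a random horizontal flip stands in for a real augmentation pipeline (it is not Mosaic and not the authors' protocol), images are plain nested lists, and all function names are hypothetical.

```python
import random


def horizontal_flip(image):
    """Mirror a row-major image given as a list of pixel rows."""
    return [row[::-1] for row in image]


def augment(image, rng):
    """Flip with 50% probability; a stand-in for a real augmentation pipeline."""
    return horizontal_flip(image) if rng.random() < 0.5 else image


def offline_augmentation(images, copies, rng):
    """Expand the dataset once, before training; every epoch then sees the same files."""
    return [augment(image, rng) for image in images for _ in range(copies)]


def online_epochs(images, epochs, rng):
    """Yield a freshly augmented view of the original dataset in each epoch."""
    for _ in range(epochs):
        yield [augment(image, rng) for image in images]


rng = random.Random(0)
originals = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# Offline: the dataset grows before training starts.
expanded = offline_augmentation(originals, copies=4, rng=rng)

# Online: the dataset size is unchanged, but each epoch draws new variants.
per_epoch_sizes = [len(batch) for batch in online_epochs(originals, epochs=3, rng=rng)]
```

In the offline case the reported dataset size grows and the augmented files are fixed; in the online case the dataset size stays the same and only the per-epoch views differ. The Methods section should state which of the two was done, or that both were combined.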

Comments 12: [Fig. 2] Legend must be provided.

The chart legend is the element of a chart that explains its visual notation.

Comments 15: [Ch.2.5] "while enhancing small object detection capabilities." - this was not tested experimentally.

Table 2 demonstrates improved citrus detection. It does not demonstrate improved small-object detection, because small objects must be precisely defined and measurable, and they must be distinguished from large objects (for a bounding box in an image, size is measured in pixels). Citrus, as such, is neither a large nor a small object. Additionally, this improvement must be tested on benchmarks. The article can argue conceptually only about citrus detection, not about detection in general.
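
A measurable definition of this kind could look like the sketch below. It borrows the area thresholds of the COCO evaluation convention (small below 32² pixels, large above 96² pixels) as an assumption; the manuscript may justify different thresholds, the treatment of boxes exactly on a boundary is a choice made here, and the function names are hypothetical.

```python
from collections import defaultdict

# Area thresholds in pixels, following the COCO small/medium/large convention.
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def size_category(width_px, height_px):
    """Classify a bounding box by its pixel area."""
    area = width_px * height_px
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def recall_by_size(ground_truth):
    """Recall per size bucket.

    ground_truth: iterable of (width_px, height_px, detected) tuples, one per
    annotated fruit, where detected says whether the model found that fruit.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for width_px, height_px, detected in ground_truth:
        category = size_category(width_px, height_px)
        totals[category] += 1
        hits[category] += int(detected)
    return {category: hits[category] / totals[category] for category in totals}


# Placeholder annotations for illustration only; not data from the study.
example = [(20, 20, True), (20, 20, False), (100, 100, True)]
recalls = recall_by_size(example)
```

Reporting recall (or AP) separately for the small bucket, for both the baseline and the modified model, would turn the small-object claim into something the experiments can confirm or refute.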

Comments 17: [Conclusions] Conclusions are not supported with the experiment data.

“Conclusion supported by experiment data” means that quantitative data from the experiments must be summarized and presented in the conclusions.

Comment 18: In the previous review, there was a dedicated field for commenting on inappropriate literature. The recommendation was to remove the paragraph [p2, Lines 14-31] (Ref. [6-10]), because your study is based on the YOLO11 architecture. The text that follows provides a sufficient background overview.

Comments on the Quality of English Language

"Quality of English Language" is a mandatory field. However, the form states: "Please only provide feedback if, based on your own proficiency in English, you feel qualified and able to assess the quality of English in this paper." There is no option to skip this input.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

I think the article is sufficiently improved.

[Minor] [p 25, Line 36] "the program achieves broader applicability" - Lines 25-34 discuss AI and a monitoring system, so "program" should be "system".