Article
Peer-Review Record

YOLOv11n-KL: A Lightweight Tomato Pest and Disease Detection Model for Edge Devices

Horticulturae 2026, 12(1), 49; https://doi.org/10.3390/horticulturae12010049 (registering DOI)
by Shibo Peng 1,†, Xiao Chen 2,†, Yirui Jiang 3, Zhiqi Jia 1, Zilong Shang 4, Lei Shi 4, Wenkai Yan 1,* and Luming Yang 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 21 November 2025 / Revised: 19 December 2025 / Accepted: 22 December 2025 / Published: 30 December 2025
(This article belongs to the Section Vegetable Production Systems)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper presents a variation of the YOLOv11 model for object detection, with a particular focus on pest detection in agriculture. The results highlighted in the abstract are interesting, and the work has scientific merit due to its real-world application.

I see that there are some points that need to be improved for the paper to be accepted:

1 - The discussion section (4) should be integrated into the results section, rather than being separate.

2 - The paper presents only one paragraph of conclusion, which is considerably short given that a new method was proposed in this work. Therefore, a specific section is needed to highlight the conclusions and provide insight into future work.

3 - Table 3 presents the values of the hyperparameters used. Why did the author not use hypertuning to select these values? An optimized solution would be more interesting than a predefined one. If you do not want to do hypertuning, consider including this as future work and discussing the subject. Here are some suggested references: Hypertuned-YOLO for interpretable distribution power grid fault location based on EigenCAM; Automatic digitalization of railway interlocking systems engineering drawings based on hybrid machine learning methods; Enhanced insulator fault detection using optimized ensemble of deep learning models based on weighted boxes fusion. As you can see, this topic is very popular and could be mentioned in your work.

4 - Between pages 3 and 4 there is a large blank space that does not need to be included.

5 - In the “Data Availability Statement” section, the link shows the availability of the algorithm but not the data, which is considered very important in this work.

6 - A related works section is not presented; one is considered necessary for the proposal of a new method, considering that much has been done in this area.

Author Response

Reviewer 1

The paper presents a variation of the YOLOv11 model for object detection, with a particular focus on pest detection in agriculture. The results highlighted in the abstract are interesting, and the work has scientific merit due to its real-world application.

Thank you for the comment.

I see that there are some points that need to be improved for the paper to be accepted:

1. The discussion section (4) should be integrated into the results section, rather than being separate.

Response: We appreciate the reviewer’s suggestion to integrate the Discussion into the Results section. However, we respectfully disagree with this recommendation. According to the journal’s formatting guidelines, the Discussion and Results sections should remain separate. This structure also aligns with standard practices in our field, allowing for a clearer distinction between objective findings and their interpretation. Therefore, we have retained the original structure with distinct Results and Discussion sections.

2. The paper presents only one paragraph of conclusion, which is considerably short given that a new method was proposed in this work. Therefore, a specific section is needed to highlight the conclusions and provide insight into future work.

Response: Thank you for the comment. In response, we have added a separate Conclusion section and expanded it accordingly.

3. Table 3 presents the values of the hyperparameters used. Why did the author not use hypertuning to select these values? An optimized solution would be more interesting than a predefined one. If you do not want to do hypertuning, consider including this as future work and discussing the subject. Here are some suggested references: Hypertuned-YOLO for interpretable distribution power grid fault location based on EigenCAM; Automatic digitalization of railway interlocking systems engineering drawings based on hybrid machine learning methods; Enhanced insulator fault detection using optimized ensemble of deep learning models based on weighted boxes fusion. As you can see, this topic is very popular and could be mentioned in your work.

Response: Thank you for your helpful suggestion. In the current study, the hyperparameters listed in Table 3 were determined based on commonly adopted settings in YOLO-based models and empirical validation to ensure stable and fair comparisons across experiments. We did not perform a systematic hyperparameter tuning procedure (e.g., grid search or Bayesian optimization) in this work, as our primary focus was to evaluate the effectiveness of the proposed method rather than exhaustively optimizing the training configuration. We agree that applying hyperparameter tuning could further improve model performance and provide a more optimized solution. Following the reviewer’s suggestion, we have discussed hyperparameter optimization as an important direction for future work in the revised manuscript and cited recent relevant studies on hypertuned YOLO and optimized deep learning frameworks. These approaches will be considered in our subsequent research to further enhance detection accuracy and robustness. (Line 48-Line 50)
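As a point of reference for the grid search mentioned in this response, the following is a minimal sketch of an exhaustive search over a small hyperparameter grid. The parameter names (`lr0`, `momentum`), the grid values, and the `toy_score` function are illustrative assumptions, not the settings from Table 3; in practice `toy_score` would be replaced by a full training run that returns a validation metric such as mAP@0.5.

```python
from itertools import product


def grid_search(train_and_score, grid):
    """Evaluate every combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for training + validation.

    A smooth toy function that peaks at lr0=0.01, momentum=0.9.
    """
    return -((params["lr0"] - 0.01) ** 2) * 1e4 - (params["momentum"] - 0.9) ** 2


# Assumed search space, chosen only for illustration.
search_space = {"lr0": [0.001, 0.01, 0.1], "momentum": [0.8, 0.9, 0.95]}
best_params, best_score = grid_search(toy_score, search_space)
print(best_params, best_score)
```

The cost of a grid search grows as the product of the grid sizes, which is one reason Bayesian optimization is often preferred when each evaluation is a full training run.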

4. Between pages 3 and 4 there is a large blank space that does not need to be included.

Response: Thanks. We have incorporated the requested revisions accordingly.

5. In the “Data Availability Statement” section, the link shows the availability of the algorithm but not the data, which is considered very important in this work.

Response: Because the tomato pests and diseases dataset constitutes a valuable research resource and is subject to data usage restrictions, it cannot be publicly released at this stage. However, as stated in the manuscript, the processed dataset will be made available by the authors upon reasonable request. We have clarified this point in the Data Availability Statement to avoid any ambiguity.

6. A related works section is not presented; one is considered necessary for the proposal of a new method, considering that much has been done in this area.

Response: Thanks for your suggestion. According to the journal’s manuscript template and guidelines, a dedicated Related Works section is not mandatory. In the current manuscript, the relevant studies, recent advances, and existing limitations in this research area have been systematically reviewed and discussed within the Introduction section. We believe that this organization allows for a concise and coherent presentation of the research background while clearly positioning the proposed method with respect to existing work. Nevertheless, we have carefully revised the Introduction to further improve the clarity and completeness of the related work discussion.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript shows that YOLOv11n-KL is a lightweight tomato pest and disease detection model that integrates KernelWarehouse-based Conv_KW and C3k2_KW modules, plus a Detect_LSCD head, to enhance small-target feature extraction and multi-scale calibration. It achieves an mAP@0.5 of 92.5% with only 3.0 GFLOPs and 5.2 M parameters, reducing computation by 52.4% while slightly improving accuracy over YOLOv11n, and is therefore suitable for deployment on resource-constrained edge devices.
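The reported figures can be cross-checked with simple arithmetic. The sketch below assumes that the 52.4% reduction refers to GFLOPs measured relative to the YOLOv11n baseline; under that assumption it recovers the baseline cost implied by the 3.0 GFLOPs figure. The function name is hypothetical.

```python
def implied_baseline(reduced_value, reduction_fraction):
    """Recover the original value from a reduced value and a fractional reduction.

    reduced = baseline * (1 - reduction)  =>  baseline = reduced / (1 - reduction)
    """
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return reduced_value / (1.0 - reduction_fraction)


# 3.0 GFLOPs after a 52.4% reduction (both figures quoted in the summary above).
baseline_gflops = implied_baseline(3.0, 0.524)
print(f"implied baseline: {baseline_gflops:.2f} GFLOPs")
```

Under the stated assumption, the implied baseline is roughly 6.3 GFLOPs.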

The sections of the manuscript are interesting, but important and relevant information is still missing. For example, at the end of the Introduction the hypothesis needs to be clearly stated and later revisited in the Discussion section. In Materials and Methods, the application of YOLO to correct identification should be described in detail: what was the sample size? Which software and equipment were used (including manufacturer, city and country)? The Results must be clearly and explicitly related to the collected data. The figure legends need to be updated so that each caption fully describes the corresponding figures. A dedicated Discussion section is required, comparing the advances reported here with previous findings in the literature. The references should be updated; even if many are from 2020, they must be directly related to the topic. Finally, statistical analyses should be included in the bar charts and tables.

Author Response

Reviewer 2

This manuscript shows that YOLOv11n-KL is a lightweight tomato pest and disease detection model that integrates KernelWarehouse-based Conv_KW and C3k2_KW modules, plus a Detect_LSCD head, to enhance small-target feature extraction and multi-scale calibration. It achieves an mAP@0.5 of 92.5% with only 3.0 GFLOPs and 5.2 M parameters, reducing computation by 52.4% while slightly improving accuracy over YOLOv11n, and is therefore suitable for deployment on resource-constrained edge devices.

Thank you for the comment.

The sections of the manuscript are interesting, but important and relevant information is still missing. For example, at the end of the Introduction the hypothesis needs to be clearly stated and later revisited in the Discussion section.

Response: We appreciate the reviewer’s suggestion. The underlying hypothesis of this study is reflected in the motivation and objectives described at the end of the Introduction, and it is further examined through the experimental results and analysis in the Discussion. Given the methodological focus of this study, we did not explicitly formulate the hypothesis as a separate statement. Instead, it is inherently addressed through the proposed framework and validation process.

In Materials and Methods, the application of YOLO to correct identification should be described in detail: what was the sample size? Which software and equipment were used (including manufacturer, city and country)?

Response: We thank the reviewer for this comment. The details requested regarding the application of YOLO for accurate identification (sample size, software, and equipment) are already provided in the Materials and Methods section. Specifically, the dataset contains 10,429 images, and the software, system configurations, and parameters are comprehensively listed in Table 2 and Table 3. We believe this information sufficiently describes the experimental setup and resources used for reproducibility.

The Results must be clearly and explicitly related to the collected data. The figure legends need to be updated so that each caption fully describes the corresponding figures.

Response: Thanks. We have incorporated the requested revisions accordingly. (Line 354, Line 384, Line 409)

A dedicated Discussion section is required, comparing the advances reported here with previous findings in the literature.

Response: Thanks for your suggestion. The revised manuscript includes a dedicated Discussion section in which we explicitly compare the proposed YOLOv11n-KL model with existing approaches, including YOLOv11n, YOLOv8s, YOLOv10n, and Faster R-CNN. We highlight the superior trade-off achieved between detection accuracy and computational efficiency, and discuss how the Conv_KW, C3k2_KW, and Detect_LSCD modules contribute to these improvements. Additionally, attention map analyses and cross-scale experiments demonstrate the scalability and generalizability of our method relative to prior work. Limitations and directions for future research are also discussed to provide context and guidance for subsequent studies. This discussion situates our contributions clearly within the current literature while emphasizing the novelty and advantages of the proposed model.

The references should be updated; even if many are from 2020, they must be directly related to the topic.

Response: We have carefully reviewed all the references in the manuscript, and confirm that each cited work is directly relevant to the research topic. While some references are from 2020 or earlier, they represent foundational studies and key developments in the field. We believe that the current reference list appropriately supports the context, methodology, and discussion presented in the manuscript.

Finally, statistical analyses should be included in the bar charts and tables.

Response: Thanks for your suggestion concerning statistical analyses. In object detection tasks, performance metrics such as mAP and F1-score are typically reported as single comprehensive results derived from a complete training-testing cycle using a fixed dataset and a fixed random seed. This approach, commonly adopted in similar studies (e.g., A Lightweight Pine Wilt Disease Detection Method Based on Vision Transformer-Enhanced YOLO), inherently reflects the model’s representative performance. In our study, all experiments were conducted using fixed dataset splits under controlled hardware and software settings to ensure reproducibility. Additionally, cross-validation was employed to further ensure the robustness and reliability of the reported results.
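For context, the kind of statistical summary the reviewer requests is commonly reported as a mean and standard deviation over repeated runs with different random seeds. The sketch below shows that computation; the per-seed scores are placeholder values chosen for illustration and are not results from the manuscript.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of a metric over repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(scores), stdev(scores)


# Placeholder mAP@0.5 values keyed by random seed (NOT from the manuscript).
map50_by_seed = {0: 0.90, 1: 0.92, 2: 0.94}

m, s = summarize_runs(list(map50_by_seed.values()))
print(f"mAP@0.5 = {m:.3f} ± {s:.3f} (n={len(map50_by_seed)})")
```

A summary of this form could serve as error bars in bar charts or as a "mean ± std" column in tables.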

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

1. How do the authors ensure that the performance comparisons are fair and not biased by suboptimal training configurations for baseline models such as YOLOv8n, YOLOv10n, or Faster R-CNN?

2. Provide theoretical justification or further analysis explaining why these modules interact synergistically rather than functioning as independent improvements.

3. To further strengthen the introduction, the authors may cite works that demonstrate the wider use of neural-network and classification methods. “Fuzzy PD-Based Control for Excavator Boom Stabilization Using Work Port Pressure Feedback” shows the value of intelligent data-driven control, while “Neural Network-Enhanced Internal Leakage Analysis for Efficient Fault Detection in Hydraulic Actuator Cylinders” illustrates effective NN-based fault classification. These studies highlight the broader applicability of learning-based approaches.

4. Have the authors tested the YOLOv11n-KL model under more challenging field conditions, and what strategies do they propose to ensure robustness and generalization beyond curated datasets?

5. The authors should also include a brief discussion of potential future work in the conclusion to highlight directions for further advancement and contextualize the broader impact of their findings.

Author Response

Reviewer 3

1. How do the authors ensure that the performance comparisons are fair and not biased by suboptimal training configurations for baseline models such as YOLOv8n, YOLOv10n, or Faster R-CNN?

Response: Thanks. To ensure fair comparisons, all models—including YOLOv11n-KL, YOLOv8n, YOLOv10n, and Faster R-CNN—were trained and evaluated using identical hardware (Intel Xeon Platinum 8352V, NVIDIA RTX 4090) and software environments (PyTorch 2.2.2, CUDA 12.1). Key training settings were standardized, including input resolution (640×640), optimizer (SGD with identical momentum and weight decay), number of training epochs (200), and early stopping criteria. All models were trained and evaluated on the same dataset (10,429 images, 13 categories) with fixed train/validation/test splits. Evaluation metrics—including mAP@0.5, precision, recall, F1-score, and computational cost—were calculated on the same test set. Stage-wise ablation studies confirmed that the observed performance gains originated from the proposed modules rather than from differences in training procedures. Collectively, these measures ensure that the superior performance of YOLOv11n-KL (92.5% mAP@0.5 with 52.4% reduced computational cost) accurately reflects genuine architectural improvements.
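The standardization described in this response can be pictured as a single immutable configuration shared by every model. The sketch below is only an illustration of that idea: the class and function names are hypothetical, the resolution, epoch count, and optimizer are taken from the response, and the seed is an assumed placeholder because its value is not stated. Momentum and weight decay are omitted for the same reason.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    imgsz: int = 640        # input resolution stated in the response
    epochs: int = 200       # number of training epochs stated in the response
    optimizer: str = "SGD"  # optimizer stated in the response
    seed: int = 0           # assumption: the actual seed is not given


SHARED = TrainConfig()
MODELS = ["YOLOv11n-KL", "YOLOv8n", "YOLOv10n", "Faster R-CNN"]


def build_runs(models, config):
    """Pair every model name with a copy of the same settings."""
    return {name: asdict(config) for name in models}


runs = build_runs(MODELS, SHARED)

# Every model should receive identical settings.
assert all(cfg == asdict(SHARED) for cfg in runs.values())
```

Freezing the dataclass prevents a per-model override from slipping in unnoticed, which is the property a fair comparison depends on.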

2. Provide theoretical justification or further analysis explaining why these modules interact synergistically rather than functioning as independent improvements.

Response: Thank you for the comment. The Conv_KW, C3k2_KW, and Detect_LSCD modules interact synergistically because they address complementary aspects of feature representation: Conv_KW enhances local feature extraction, C3k2_KW improves multi-scale feature fusion, and Detect_LSCD leverages these enriched features for accurate detection. Ablation studies demonstrate that the combined application of these modules yields larger performance gains than the sum of their individual contributions, confirming a synergistic effect. This aligns with theoretical principles of hierarchical feature representation and dynamic convolution, where improvements in both extraction and fusion propagate through the network to enhance final detection outcomes.
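The synergy claim in this response amounts to a super-additivity check on ablation results: the gain of the combined model over the baseline should exceed the sum of the gains from each module added alone. A minimal sketch of that check follows; the function name, the generic module labels, and all scores are placeholders for illustration and are not the manuscript's ablation values.

```python
def synergy_excess(baseline, individual, combined):
    """Return the combined gain minus the sum of the individual gains.

    A positive value means the combination is super-additive (synergistic);
    zero means purely additive; negative means the gains overlap.
    """
    sum_individual = sum(score - baseline for score in individual.values())
    return (combined - baseline) - sum_individual


# Placeholder scores (NOT from the manuscript), in percentage points.
baseline = 90.0
individual = {"module_1": 90.5, "module_2": 90.5, "module_3": 90.5}
combined = 92.0

excess = synergy_excess(baseline, individual, combined)
print(f"excess over additive gains: {excess:+.2f} points")
```

With single-run metrics, a small positive excess may fall within run-to-run variation, so this check is most informative when paired with repeated runs.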

3. To further strengthen the introduction, the authors may cite works that demonstrate the wider use of neural-network and classification methods. “Fuzzy PD-Based Control for Excavator Boom Stabilization Using Work Port Pressure Feedback” shows the value of intelligent data-driven control, while “Neural Network-Enhanced Internal Leakage Analysis for Efficient Fault Detection in Hydraulic Actuator Cylinders” illustrates effective NN-based fault classification. These studies highlight the broader applicability of learning-based approaches.

Response: Thanks. We have incorporated the requested revisions accordingly. (Line 479-Line 484)

4. Have the authors tested the YOLOv11n-KL model under more challenging field conditions, and what strategies do they propose to ensure robustness and generalization beyond curated datasets?

Response: Currently, the YOLOv11n-KL model has been developed and validated on curated datasets but has not yet been deployed in real-world field conditions. Ensuring robustness and generalization to more challenging and uncontrolled environments represents a key direction for future research. In future studies, we plan to further optimize the model and conduct extensive testing under practical field conditions, encompassing varying lighting, occlusion, and background complexity. These efforts aim to evaluate and improve the model’s reliability, scalability, and generalization capabilities beyond controlled datasets.

5. The authors should also include a brief discussion of potential future work in the conclusion to highlight directions for further advancement and contextualize the broader impact of their findings.

Response: Thanks for your suggestion. We have revised as suggested. (Line 486-Line 496)

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors did not make changes considering the suggestions that I made; however, they explained well why they didn't. I consider the paper acceptable. 

Note: If a minor revision is requested by other reviewers, please include more references in your text to support the background.

 
