Review Reports
- Xuelin Li1,2,
- Huanyin Yue1,2,* and
- Jianli Liu3,4
- et al.
Reviewer 1: Orly Enrique Apolo-Apolo
Reviewer 2: Anonymous
Reviewer 3: Xinwei Li
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript presents AMS-YOLO, an enhanced object detection model based on YOLOv11n, specifically designed for detecting cannabis plants from UAV-acquired imagery. The authors propose architectural modifications including a pinwheel-shaped convolution (PConv) and asymmetric padding modules (APC2f) in the backbone, and a multi-scale fusion (MSF) neck using partial convolution to improve detection of small and occluded targets. The work includes the construction of a custom UAV cannabis image dataset and evaluates the model’s performance against several YOLO versions and Faster R-CNN. The paper’s strengths lie in its practical relevance, focus on a socially important problem, and integration of modern object detection components.
2) General concept comments
Limited dataset scale and representativeness: the cannabis detection task aims to be robust to complex real-world scenarios (e.g., occlusion, dense vegetation, varied terrain). However, the dataset is derived from only 334 UAV images, expanded to 1972 small patches, which is insufficient to justify claims of generalizability. A larger and more diverse dataset (~5000+ samples from multiple regions/times) would be necessary for meaningful model evaluation in such a challenging detection context.
Lack of public dataset or code: the dataset and source code are not publicly released, which severely affects reproducibility and limits the manuscript’s value to the scientific community. The authors should either commit to releasing the data/code or justify why it cannot be made available.
Discussion section lacks depth: the discussion (Section 4) does not critically analyze failure cases, limitations, or broader implications of the findings. It misses an opportunity to explain when and why the model fails, how it might be improved, and how it compares to other recent methods beyond YOLO variants.
Performance gains are modest and unstable: while the AMS-YOLO model achieves higher mAP50 (90.7%) than YOLOv11n (88.6%), the magnitude of improvement is modest, and statistical significance or variability across runs is not reported. The lack of mAP@[.5:.95], a standard benchmark, further weakens the robustness of the evaluation.
Architectural choices need better justification: the rationale for replacing specific modules (e.g., PConv only at the first CBS location) and the design of the MSF neck structure lack theoretical justification and are supported only by the reported empirical gains. A broader ablation across multiple architectural variants would strengthen these claims.
3) Specific comments
Lines 106–128: the claim that this is the "first systematic application" of YOLO to cannabis detection should be clarified and supported with more literature references, including prior work on opium poppy or similar crops using UAV and YOLO-based methods.
Line 118: Authors mention collecting "high-resolution images covering various growth stages and complex backgrounds." However, it's unclear how many different locations, seasons, or lighting conditions were sampled. This needs to be explicitly described.
Figure 2 (Line 171): The visual examples are useful, but the scale bar, resolution metadata, and annotations (e.g., bounding boxes) are missing, which are essential for understanding the dataset.
Line 295: Tables 2 and 3, which report the hardware environment and training hyperparameters, are not results per se and should be moved to the methodology section. Including them under the results disrupts the logical flow of the manuscript and may confuse readers about what constitutes an experimental outcome versus a configuration detail. Consider integrating these tables into Section 2 (Materials and Methods), where they more naturally fit.
Lines 313–319 / Table 4: mAP50 is reported, but mAP@[.5:.95] is not. This is a major omission, as mAP50 alone is no longer considered sufficient in modern object detection benchmarks.
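For reference, the metric requested here is usually defined following the COCO convention, sketched below. The symbols are assumptions for illustration and do not come from the manuscript: C is the number of classes, AP_c(t) is the average precision of class c at IoU threshold t, and T is the set of ten thresholds from 0.50 to 0.95 in steps of 0.05.

```latex
\mathrm{mAP}@t = \frac{1}{C} \sum_{c=1}^{C} \mathrm{AP}_c(t),
\qquad
\mathrm{mAP}@[.5{:}.95] = \frac{1}{|T|} \sum_{t \in T} \mathrm{mAP}@t,
\qquad
T = \{0.50, 0.55, \dots, 0.95\}, \quad |T| = 10
```

Under this definition mAP50 is the single term at t = 0.50, so it rewards loosely localized boxes that the averaged metric penalizes.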
Lines 366–369 / Figure 7: the visual comparison is helpful, but quantitative examples of false positives/negatives per class or per scenario (e.g., under occlusion) would strengthen the results. Additionally, it is unclear whether manual annotations were used for evaluation or if an automated scoring procedure was implemented.
Line 497: the claim that the model has “promising practical potential” is overstated without further deployment analysis, such as testing on unseen regions, edge-device speed benchmarks, or flight-integrated experiments.
Line 319: despite the architectural enhancements proposed in AMS-YOLO, the results presented in Table 4 do not convincingly demonstrate that the model significantly outperforms existing state-of-the-art approaches. The improvement in mAP50 over the YOLOv11n baseline is marginal (90.7% vs. 88.6%), and precision and recall values fluctuate within a narrow range compared to other YOLO variants. For example, AMS-YOLO achieves the same recall (79.8%) as YOLOv10n and lower recall than YOLOv6 and YOLOv8n, which raises concerns about its ability to consistently detect targets. Additionally, the absence of statistical metrics such as standard deviation, confidence intervals, or mAP@[.5:.95] further limits the interpretability and significance of the reported gains. Without stronger evidence, it is not clear that AMS-YOLO provides a meaningful improvement over existing lightweight models.
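The variability statistics requested above could be reported with a summary along the following lines. This is a minimal sketch, not the authors' procedure: it assumes the model is retrained several times with different random seeds and that one mAP value per run is available. The function name `summarize_runs` and the small t-table are illustrative choices.

```python
import math
import statistics

# Two-sided 95% Student-t critical values for 1 to 9 degrees of freedom.
# Hard-coded so the sketch needs only the standard library.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def summarize_runs(scores):
    """Return (mean, sample std, 95% confidence interval) for per-run scores.

    `scores` is assumed to hold one metric value (e.g. mAP50) per
    independent training run; between 2 and 10 runs are supported.
    """
    n = len(scores)
    if n - 1 not in T95:
        raise ValueError("this sketch supports between 2 and 10 runs")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    half_width = T95[n - 1] * std / math.sqrt(n)
    return mean, std, (mean - half_width, mean + half_width)
```

Applying the same summary to the baseline and to AMS-YOLO would show whether the two intervals overlap, which is the kind of evidence the comment above asks for.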
Author Response
Please see the attachment for our point-by-point response.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This paper conducts research on cannabis detection in UAV imagery, focusing on three main aspects: (1) it proposes an Asymmetric Backbone Network to improve the detection of serrated edges of cannabis leaves and palmate textures; (2) it designs an MSF-type Neck Network to enhance the detection ability for small objects in complex backgrounds; (3) it constructs a UAV-based cannabis dataset. The manuscript is generally well-written and clearly structured. However, several important issues still need to be addressed.
- Lines 106 to 108 of the Introduction state: "To overcome the challenges outlined, this paper introduces AMS-YOLO (Asymmetric Multi-Scale YOLO), an enhanced YOLO model designed for cannabis detection in UAV imagery." AMS-YOLO is thus described as being designed for cannabis detection in UAV imagery. However, since the model does not appear to incorporate any domain-specific knowledge related to cannabis, how can it be demonstrated that it is specifically tailored for cannabis detection? Is merely enhancing the detection of the serrated edges of cannabis leaves and palmate textures sufficient to justify that the model is specifically designed for cannabis detection?
- The proposed improvements, namely the Asymmetric Backbone Network Design and the MSF-type Neck Network, are essentially integrations and adjustments of existing model components. How can the authors demonstrate the innovation of these contributions?
- It is reasonable for the authors to use YOLOv11 as the baseline model. However, we note that the latest version, YOLOv13, has been released recently. Although it is a very recent release, we recommend including a brief comparison or discussion with the latest model in the related work or discussion section.
Author Response
Please see the attachment for our point-by-point response.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Dear author:
This study proposes an improved AMS-YOLO model based on YOLOv11n. By introducing an asymmetric backbone network and a multi-scale fusion network, it aims to address the issue of insufficient monitoring accuracy of cannabis. However, the manuscript has deficiencies in several key aspects and needs to be revised.
Insufficient data processing and analysis
- Lines 165-167 mention sliding window cropping but do not specify the basis for the chosen overlap rate. Is 20% optimal?
- Line 168 mentions that the dataset contains 1,972 images, which is of moderate size. However, it might be too small for deep learning. It is suggested to discuss data augmentation strategies or plans for expanding the dataset in the future.
- Figure 9 contains an "Origin" panel, while Figures 10 and 11 do not. It is recommended that the figures be made consistent.
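To make the overlap question concrete, the sketch below shows one common way sliding window cropping is set up. It is an illustration under stated assumptions, not the authors' implementation: the 640-pixel tile size is hypothetical (the report does not state it), only the 20% overlap comes from the comment above, and the function names are invented for this example.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of windows of size `tile` along an axis of size `length`.

    `overlap` is the fraction of a tile shared with its neighbour.
    If the axis is shorter than one tile, a single window at 0 is
    returned and the caller is expected to pad the crop.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    if length <= tile:
        return [0]
    stride = max(1, int(round(tile * (1 - overlap))))
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        # Shift a final window back so the image border is still covered.
        starts.append(length - tile)
    return starts


def sliding_windows(width, height, tile=640, overlap=0.2):
    """Crop boxes (x0, y0, x1, y1) covering a width x height image."""
    return [
        (x, y, x + tile, y + tile)
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]
```

With these assumed values the stride is 512 pixels, so a plant cut by one tile border lies fully inside a neighbouring tile only if it spans fewer than 128 pixels. Relating the overlap to the largest expected target size in this way is one possible basis for the choice.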
The depth of the discussion and conclusion is insufficient
- The discussion does not address the model's performance under adverse weather (haze, rain, and fog) or low-light conditions.
- It is suggested that the shortcomings and prospects of the experiment be added to the discussion.
Improper wording
- Both "YOLOv11" and "YOLOv11n" appear throughout the article; the naming should be made consistent.
Author Response
Please see the attachment for our point-by-point response.
Author Response File:
Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
This paper proposes AMS-YOLO, a cannabis detection model tailored for UAV imagery. The authors have provided some clarification regarding the issues raised in the previous review. Although the work is not highly innovative, it still holds considerable practical value. Overall, it is generally well-written and clearly structured. However, there are still some minor issues that need to be addressed.
(1) Several technical abbreviations, such as “APC2f,” “CBS,” and “APBottleneck,” appear without definition at their first occurrence. It is recommended to provide the full term in English along with its abbreviation upon first use, and then use only the abbreviation thereafter.
(2) While “mAP50” and “mAP50:95” are commonly used metrics in the object detection domain, it is preferable in a journal submission to include at least one formula or explanation. Moreover, these notations are not strictly standard; “mAP@0.5” and “mAP@0.5:0.95” are more widely recognized.
(3) The reference formatting does not conform to standard styles, particularly those used by the Drones journal. For example, in Ref. [1] , "Abdel-Salam, O.M.E. The Neurotoxic Effects of Cannabis on Brain: Review of Clinical and Experimental Data. MOLECULAR SCIENCES AND APPLICATIONS 2022, 2, 11–23, doi:10.37394/232023.2022.2.3.", there should be a comma between the journal name and the year.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Round 3
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have designed AMS-YOLO using YOLOv11 as the baseline for detecting illegal cannabis cultivation. By leveraging the morphological characteristics of cannabis, they improved both the backbone and neck networks, achieving a good balance between detection accuracy and computational efficiency. It is a meaningful work and makes a valuable contribution.