YOLOSO: An Improved YOLO-Based Algorithm for UAV to Detect Small Ground Targets
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript proposes YOLOSO, an improved YOLOv11n-based detector for small ground targets in UAV imagery. The model adds a P2 high-resolution branch with stride 4, redesigns the C3k2 and C2PSA modules as C3k2SO and C2PSASO, and uses a combined loss function to reduce small-object feature loss and improve scale adaptation under complex backgrounds. Experiments on VisDrone2019-DET report 3.55M parameters, Precision of 0.484, Recall of 0.362, mAP50 of 0.375, and mAP50-95 of 0.221, with gains over YOLOv11n on several metrics. Yet the evidence is mainly limited to one dataset, so claims about generalizability and real UAV deployment should be made more cautiously.
I have some comments as follows.
- The novelty of the manuscript appears mainly incremental, since high-resolution detection branches, multi-scale fusion, attention redesign, and modified loss functions are already common strategies in small object detection, so the authors need to state more clearly what is technically distinct in YOLOSO compared with recent YOLO-based UAV and small object detectors.
- The experimental validation relies only on VisDrone2019-DET, which weakens the evidence for generalization, so the authors should add at least one further UAV or remote sensing dataset, or conduct cross-dataset testing to assess robustness under different scenes, altitudes, densities, and imaging conditions.
- The baseline comparison is not fully convincing because it mainly uses nano YOLO models and one RT-DETR-L model, so the manuscript should include closer competitors, especially YOLO variants with P2 branches, attention modules, or designs made for small object detection in UAV imagery.
- The ablation study does not isolate all claimed contributions with sufficient clarity, since the loss function is not tested separately and the current design does not fully show the individual and joint effects of C3k2SO, C2PSASO, and the structural changes.
- The claim that the method is suitable for real-time UAV deployment is not yet supported by enough engineering evidence, so the authors should report FPS, latency, FLOPs, memory use, and preferably results on an edge device or UAV-relevant hardware platform.
- The absolute detection performance remains modest, with Recall of 0.362 and mAP50 of 0.375, so the manuscript should avoid overly strong claims and should include error analysis by class, object size, occlusion, truncation, and scene condition to clarify the practical limits of the model.
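To illustrate the ablation comment above, the following is a minimal sketch of a full factorial ablation grid, in which every component is switched on and off independently so that both individual and joint effects can be read from the results. The component names come from the review; the flag names and the `describe` helper are hypothetical and are not taken from the manuscript.

```python
from itertools import product

# Component names follow the review; the flag identifiers are illustrative only.
COMPONENTS = ("p2_branch", "c3k2so", "c2psaso", "combined_loss")


def ablation_grid(components=COMPONENTS):
    """Return every on/off combination of the components as a list of dicts."""
    return [
        dict(zip(components, flags))
        for flags in product((False, True), repeat=len(components))
    ]


def describe(config):
    """Build a readable label for one configuration."""
    enabled = [name for name, on in config.items() if on]
    return "baseline" if not enabled else "+".join(enabled)


grid = ablation_grid()
labels = [describe(config) for config in grid]
print(len(grid), "configurations")
print(labels[0], "...", labels[-1])
```

With four components the grid has 16 rows: the baseline, four single-component runs, and eleven combinations. If training all 16 is too costly, the single-component rows plus the full model are the smallest subset that still tests the loss function separately.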
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Thank you for your submission. Here are my questions:
1) Adding a P2 head for small UAV targets is well established (TPH-YOLO, QueryDet, many VisDrone works). What is new beyond P2 + module retuning?
2) "Loss function optimization" is claimed as one of three contributions, but Equations 11 to 14 use Focal, Inner-SIoU, and BCE off the shelf. Either drop this from contributions or show the design choice and ablation.
3) Recent VisDrone-specialized methods are missing (e.g., QueryDet, ClusDet, TPH-YOLOv5, UFPMP-Det, CEASC, DRENet). Comparing only against generic n-scale YOLOs understates the field. Please add at least two other methods for comparison; otherwise the comparison gives your method an unfair advantage.
4) Typos: Equation 1 uses "Unsample", which should be "Upsample". The section heading "3.2 ptimization Design" is missing the leading O. Section 3.3, "Overall Algorithm Architecture", duplicates the title of Section 3.1.
5) No analysis by object size bucket (tiny, small, medium) or by occlusion level. VisDrone provides occlusion and truncation flags, please use them.
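The breakdown requested in point 5 could be sketched as follows. This is only an illustration under stated assumptions: the annotation layout `left,top,width,height,score,category,truncation,occlusion` is the commonly documented VisDrone-DET format and should be checked against the dataset toolkit; the "tiny" threshold is a hypothetical cut-off, while the other two thresholds follow the common COCO small/medium convention; and the `matched` flags are assumed to come from the authors' own evaluation code. The sample lines are made up.

```python
from collections import defaultdict

# Assumed upper bounds on box area in square pixels. "tiny" is hypothetical;
# 32*32 and 96*96 follow the common COCO small/medium convention.
SIZE_BUCKETS = (("tiny", 16 * 16), ("small", 32 * 32), ("medium", 96 * 96))


def size_bucket(width, height):
    """Map a box to a size bucket by its area."""
    area = width * height
    for name, upper in SIZE_BUCKETS:
        if area < upper:
            return name
    return "large"


def parse_annotation(line):
    """Parse one ground-truth line in the assumed layout
    left,top,width,height,score,category,truncation,occlusion."""
    fields = [int(v) for v in line.strip().rstrip(",").split(",")]
    _, _, width, height, _, category, truncation, occlusion = fields[:8]
    return {"width": width, "height": height, "category": category,
            "truncation": truncation, "occlusion": occlusion}


def recall_by_group(objects, matched, key):
    """Recall per group; matched[i] is True if object i was detected."""
    hits, totals = defaultdict(int), defaultdict(int)
    for obj, ok in zip(objects, matched):
        group = key(obj)
        totals[group] += 1
        hits[group] += int(ok)
    return {group: hits[group] / totals[group] for group in totals}


# Made-up example data, for illustration only.
lines = ["10,10,8,8,1,4,0,0", "50,60,20,25,1,4,0,1",
         "5,5,10,12,1,1,1,2", "100,100,60,40,1,6,0,0"]
matched = [False, True, False, True]
objects = [parse_annotation(line) for line in lines]

by_size = recall_by_group(objects, matched,
                          lambda o: size_bucket(o["width"], o["height"]))
by_occlusion = recall_by_group(objects, matched, lambda o: o["occlusion"])
print(by_size)
print(by_occlusion)
```

The same `recall_by_group` call works for the truncation flag or the category by changing the `key` function, which keeps the per-size, per-occlusion, and per-class tables consistent with each other.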
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper proposes an improved YOLOSO algorithm based on YOLOv11, addressing the demand for detecting small ground targets using UAVs. The topic holds practical application value, the overall research methodology is clear, and the experimental design is relatively complete, achieving certain performance improvements. However, there are still notable deficiencies:
1. The innovations are largely combinatorial optimizations of existing techniques, and the original contribution is not prominent enough.
2. The model is validated on only a single dataset, lacking sufficient support for its generalization ability and robustness in complex scenarios.
3. Real-time metrics, such as inference speed, are not provided, resulting in an inadequate discussion on the feasibility of edge deployment on UAVs.
4. The description of the core modules lacks detail, the standardization of figures, tables, and writing needs improvement, and the references could be updated with the latest achievements in recent years.
It is recommended that the authors further clarify the core innovations, supplement the study with multi-dataset and robustness experiments, add an analysis of real-time performance, refine the theoretical description of the modules, and improve the writing standards to enhance the completeness and academic rigor of the research.
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Thanks for the efforts in making the manuscript acceptable.
Author Response
Comments 1: Thanks for the efforts in making the manuscript acceptable.
Response 1: Thank you very much for your affirmation and encouragement. We are glad that the manuscript has met your approval, and we sincerely appreciate your time and effort throughout the review process.
Reviewer 2 Report
Comments and Suggestions for Authors
Thank you for addressing my comments.
Author Response
Comments 1: Thank you for addressing my comments.
Response 1: Thank you very much for your affirmation and encouragement. We are glad that the manuscript has met your approval, and we sincerely appreciate your time and effort throughout the review process.
Reviewer 3 Report
Comments and Suggestions for Authors
After thorough revisions, this paper generally satisfies the publication requirements. However, GFLOPs is a measure of floating-point computational cost and does not by itself reflect real-time performance, which is an indispensable prerequisite for deployment on UAVs. The authors are advised to add the relevant analysis.
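As a rough illustration of the kind of measurement this comment asks for, the sketch below times repeated calls to an inference function and reports latency and FPS. It is not the authors' procedure: `measure_latency` and `dummy_infer` are hypothetical names, and the placeholder workload stands in for a real detector. On a GPU, execution is asynchronous, so the device would need to be synchronized before each clock reading for the numbers to be meaningful.

```python
import statistics
import time


def measure_latency(infer, sample, warmup=10, runs=50):
    """Time repeated calls of infer(sample); warm-up calls are not counted."""
    for _ in range(warmup):
        infer(sample)
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(timings_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
        "fps": 1000.0 / mean_ms,  # throughput at batch size 1
    }


def dummy_infer(image):
    # Placeholder workload; a real test would run the detector here.
    return sum(x * x for x in image)


stats = measure_latency(dummy_infer, list(range(10000)), warmup=3, runs=20)
print({name: round(value, 3) for name, value in stats.items()})
```

Reporting the median and maximum alongside the mean helps show whether occasional slow frames would matter on the target hardware, which a single GFLOPs figure cannot reveal.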
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
