S-Drone-YOLO: A Parameter-Efficient P2-Guided Quality-Aware YOLO Detector for Infrared Small UAV Detection
Abstract
1. Introduction
- A four-scale YOLO detector is designed for infrared UAV detection. It adds a stride-4 P2 path to the standard P3–P5 prediction structure, enabling small targets to be evaluated before significant spatial detail is lost.
- A Coordinate-Aware Residual C2f Block (CAR-C2f) is placed only in the P2 branch. It strengthens location-sensitive features while preserving the same input and output channel dimensions.
- A P2-Guided Quality-Aware Detection Head (P2-QADH) is introduced. It uses the nearby P3 context to refine P2 and applies a controlled P2-based quality bias to classification logits without changing the regression branch.
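To make the effect of the stride-4 P2 path concrete, the following minimal sketch computes the prediction-grid size at each pyramid level and the number of grid cells a small target spans along one axis. The strides 8, 16, and 32 for P3–P5 are the usual YOLO convention and are assumed here; the 640-pixel input matches the SIDD training setting reported in this paper, and the 12-pixel target width is a hypothetical example.

```python
# Assumed strides: P2 = 4 (stated above); P3-P5 = 8, 16, 32 (usual YOLO convention).
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def grid_size(image_size: int, stride: int) -> int:
    """Side length of the square prediction grid for a square input image."""
    return image_size // stride


def cells_spanned(target_px: float, stride: int) -> float:
    """Extent of a target, in grid cells, along one axis at a given stride."""
    return target_px / stride


if __name__ == "__main__":
    image_size = 640  # SIDD training resolution
    target_px = 12    # hypothetical small-UAV width in pixels
    for level, stride in STRIDES.items():
        print(level, grid_size(image_size, stride), cells_spanned(target_px, stride))
```

Under these assumptions, a 12-pixel target covers three cells on the 160 × 160 P2 grid but less than half a cell on the 20 × 20 P5 grid.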
2. Related Work
2.1. Small-Object Detection Challenges and Evaluation
2.2. YOLO Detectors and UAV Detection
2.3. Attention-Based Feature Refinement
2.4. Infrared UAV Datasets and Generalization
2.5. Rationale for the YOLOv5 Base Architecture
3. Method
3.1. Design Logic and Overall Architecture
3.2. Replacement of C3 with C2fAttn in the Neck
3.3. Coordinate-Aware Residual C2f Block
3.4. P2-Guided Quality-Aware Detection Head
4. Experimental Preparation
4.1. SIDD Dataset
4.2. Generalization Datasets
4.3. Implementation Details
4.4. Evaluation Metrics
5. Results and Analysis
5.1. Architecture I Development: P2 and C2fAttn
5.2. Architecture II Model Performance and Component Ablation
5.3. Comparison with Recent YOLO Baselines
5.4. Performance by Target
5.5. Fine-Tuning Generalization on External UAV Datasets
5.6. Per-Background Analysis on SIDD
5.7. Confusion Matrix and Error Analysis
5.8. Visual Detection Analysis
6. Discussion
6.1. Effect of the P2 Pathway
6.2. Role of CAR-C2f and P2-QADH
6.3. Accuracy-Efficiency Trade-Off
6.4. Limitations
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| YOLO | You Only Look Once |
| UAV | Unmanned Aerial Vehicle |
| SIDD | Single-frame Infrared Small-Drone Detection Dataset |
| RGB | Red, Green, and Blue |
| COCO | Common Objects in Context |
| IoU | Intersection over Union |
| P2 | Pyramid level 2 |
| P3 | Pyramid level 3 |
| P4 | Pyramid level 4 |
| P5 | Pyramid level 5 |
| C3 | CSP Bottleneck with Three Convolutions |
| C2f | Cross-Stage Partial Bottleneck with Two Convolutions and Feature Fusion |
| C2fAttn | C2f with Attention |
| AP | Average Precision |
| mAP | Mean Average Precision |
| CAR-C2f | Coordinate-Aware Residual C2f Block |
| GFLOP | Giga Floating-Point Operations |
| P2-QADH | P2-Guided Quality-Aware Detection Head |
| PANet | Path Aggregation Network |
| QA | Quality-Aware |
| SOD | Small Object Detection |
| CSPNet | Cross-Stage Partial Network |
| SSD | Single Shot MultiBox Detector |
| CNN | Convolutional Neural Network |
| mAP50 | Mean Average Precision at IoU 0.50 |
| mAP50-95 | Mean Average Precision from IoU 0.50 to 0.95 |
| TP | True Positive |
| FP | False Positive |
| FN | False Negative |
| Params | Parameters |
| FPS | Frames Per Second |
| SGD | Stochastic Gradient Descent |
| LR | Learning Rate |
| AMP | Automatic Mixed Precision |
| CUDA | Compute Unified Device Architecture |
References
1. Aldubaikhi, A.; Patel, S. Advancements in Small-Object Detection (2023–2025): Approaches, Datasets, Benchmarks, Applications, and Practical Guidance. Appl. Sci. 2025, 15, 11882.
2. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
3. Yuan, S.; Sun, B.; Zuo, Z.; Huang, H.; Wu, P.; Li, C.; Dang, Z.; Zhao, Z. IRSDD-YOLOv5: Focusing on the Infrared Detection of Small Drones. Drones 2023, 7, 393.
4. Zhang, Q.; Wang, X.; Shi, H.; Wang, K.; Tian, Y.; Xu, Z.; Zhang, Y.; Jia, G. BRA-YOLOv10: UAV Small Target Detection Based on YOLOv10. Drones 2025, 9, 159.
5. Zhai, X.; Huang, Z.; Li, T.; Liu, H.; Wang, S. YOLO-Drone: An Optimized YOLOv8 Network for Tiny UAV Object Detection. Electronics 2023, 12, 3664.
6. Zamri, F.N.M.; Gunawan, T.S.; Yusoff, S.H.; Alzahrani, A.A.; Bramantoro, A.; Kartiwi, M. Enhanced Small Drone Detection Using Optimized YOLOv8 with Attention Mechanisms. IEEE Access 2024, 12, 90629–90643.
7. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
8. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
9. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
10. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
11. Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 390–391.
12. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 13713–13722.
13. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
14. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
15. Sun, H.; Yang, J.; Shen, J.; Liang, D.; Li, N.-Z.; Zhou, H. TIB-Net: Drone Detection Network with Tiny Iterative Backbone. IEEE Access 2020, 8, 130697–130707.
16. Zheng, Y.; Chen, Z.; Lv, D.; Li, Z.; Lan, Z.; Zhao, S. Air-to-Air Visual Detection of Micro-UAVs: An Experimental Evaluation of Deep Learning. IEEE Robot. Autom. Lett. 2021, 6, 1020–1027.
17. Cheng, Q.; Wang, Y.; He, W.; Bai, Y. Lightweight Air-to-Air Unmanned Aerial Vehicle Target Detection Model. Sci. Rep. 2024, 14, 2609.
18. Rouhi, A.; Umare, H.; Patal, S.; Kapoor, R.; Deshpande, N.; Arezoomandan, S.; Shah, P.; Han, D.K. Long-Range Drone Detection Dataset. In Proceedings of the 2024 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 6–8 January 2024; pp. 1–6.
19. Zhao, J.; Zhang, J.; Li, D.; Wang, D. Vision-Based Anti-UAV Detection and Tracking. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25323–25334.
20. Foresti, G.L.; Scagnetto, I.; Tavaris, D.; Voltan, G. Thermal UAV 2UAV Dataset for Training a Counter UAV System: A Strategic Challenge in Civil and Military Domain. Strateg. Leadersh. J. 2024, 2, 59–67.
21. Ultralytics. YOLOv8 Official Model Card and Detection Model Table. Hugging Face. 2026. Available online: https://huggingface.co/Ultralytics/YOLOv8 (accessed on 2 May 2026).
22. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725.
23. Tian, Y.; Ye, Q.; Doermann, D. YOLO12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524.
24. Ultralytics. YOLO26 Official Model Card and Detection Model Table. Hugging Face. 2026. Available online: https://huggingface.co/Ultralytics/YOLO26 (accessed on 2 May 2026).

| Object Size (pixels) | Train Count | Train % | Test Count | Test % |
|---|---|---|---|---|
| Large, >96 × 96 | 0 | 0 | 0 | 0 |
| Medium, 32 × 32 to 96 × 96 | 603 | 15.92 | 148 | 15.60 |
| Small, <32 × 32 | 3185 | 84.08 | 801 | 84.40 |
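The size bins in the table above match the COCO area thresholds of 32² and 96² pixels. The sketch below shows one way to bin a box and recomputes the test-split percentages from the reported counts. The function name and the treatment of boxes that fall exactly on a threshold are assumptions, since the source does not state them.

```python
def size_category(width_px: float, height_px: float) -> str:
    """Assign a box to a COCO-style size bin by its pixel area.

    Assumption: area < 32*32 is small, 32*32 <= area <= 96*96 is medium,
    and anything larger is large.
    """
    area = width_px * height_px
    if area < 32 * 32:
        return "small"
    if area <= 96 * 96:
        return "medium"
    return "large"


# Test-split object counts taken from the table above.
test_counts = {"medium": 148, "small": 801}
total = sum(test_counts.values())
test_shares = {name: round(100 * n / total, 2) for name, n in test_counts.items()}
print(total, test_shares)
```

The recomputed shares agree with the Test % column, and the total of 949 test objects is the denominator used for recall later in the paper.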
| Dataset | Modality | Images | Key Characteristics | Reference |
|---|---|---|---|---|
| TIB-Net | RGB | 2850 | Multi-rotor and fixed-wing UAVs in low-altitude scenes | [15] |
| Det-Fly | RGB | 13,271 | DJI Mavic targets in the sky, city, field, and mountain backgrounds | [16] |
| UAVfly | RGB | 10,281 | Urban, suburban, desert, field, lake, sky, and mountain scenes | [17] |
| LRDD v1 | RGB | 21,190 | Long-range drones with weather, scale, and background-blending challenges | [18] |
| DUT Anti-UAV | RGB | 10,000 | More than 35 UAV types, complex outdoor backgrounds, and tracking sequences | [19] |
| Thermal UAV 2UAV | IR | 3856 | UAV-to-UAV thermal images captured from an onboard UAV sensor; includes four quadcopters and two hexacopters with single-UAV, two-UAV, and background images | [20] |

| Item | SIDD Experiments | External Fine-Tuning |
|---|---|---|
| Task | detect | detect |
| Mode | train | fine-tune |
| Epochs | 250 | 250 |
| Image size | 640 | 1920 |
| Batch size | 8 | 8 |
| Optimizer | SGD | SGD |
| Initial learning rate | 0.01 | 0.01 |
| Final LR factor | 0.01 | 0.01 |
| Momentum | 0.937 | 0.937 |
| Weight decay | 0.0005 | 0.0005 |
| Pretrained weights | True | True |
| Seed | 0 | 0 |
| Deterministic | True | True |
| AMP | True | True |
| Patience | 600 | 600 |
| Validation IoU | 0.7 | 0.7 |
| Max detections | 300 | 300 |
| Augmentation | hsv_h = 0.015, hsv_s = 0.7, hsv_v = 0.4, translate = 0.1, scale = 0.5, fliplr = 0.5, mosaic = 1.0, mixup = 0.0, auto_augment = randaugment, erasing = 0.4 | hsv_h = 0.015, hsv_s = 0.7, hsv_v = 0.4, translate = 0.1, scale = 0.5, fliplr = 0.5, mosaic = 1.0, mixup = 0.0, auto_augment = randaugment, erasing = 0.4 |
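The option names in the table above resemble the training arguments of the Ultralytics framework. That mapping is an assumption, as are the model and data file names in the commented call. The sketch collects the SIDD settings in a dictionary, derives the external fine-tuning variant, and computes the final learning rate implied by the initial rate and the final LR factor.

```python
# SIDD training settings from the table, written with Ultralytics-style
# argument names (assumed mapping, not confirmed by the source).
SIDD_TRAIN_ARGS = {
    "epochs": 250,
    "imgsz": 640,
    "batch": 8,
    "optimizer": "SGD",
    "lr0": 0.01,            # initial learning rate
    "lrf": 0.01,            # final LR factor
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "pretrained": True,
    "seed": 0,
    "deterministic": True,
    "amp": True,
    "patience": 600,
    "iou": 0.7,             # listed as "Validation IoU" in the table
    "max_det": 300,
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
    "translate": 0.1, "scale": 0.5, "fliplr": 0.5,
    "mosaic": 1.0, "mixup": 0.0,
    "auto_augment": "randaugment", "erasing": 0.4,
}


def external_finetune_args(base: dict) -> dict:
    """Copy the SIDD settings and change only the image size, as in the table."""
    args = dict(base)
    args["imgsz"] = 1920
    return args


def final_learning_rate(args: dict) -> float:
    """Learning rate at the end of the schedule: initial rate times final factor."""
    return args["lr0"] * args["lrf"]


# Hypothetical usage; requires the ultralytics package plus model and data YAMLs
# whose names are placeholders here:
# from ultralytics import YOLO
# YOLO("s-drone-yolo.yaml").train(data="sidd.yaml", **SIDD_TRAIN_ARGS)
```

If patience is read as an early-stopping window in epochs, a value of 600 with 250 epochs means early stopping cannot trigger, so every run trains for the full schedule.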
| Model | Precision | Recall | mAP50-95 | F1-Score | Params (M) | GFLOPs |
|---|---|---|---|---|---|---|
| YOLOv5 baseline | 0.972 | 0.864 | 0.679 | 0.915 | 5.27 | 7.7 |
| YOLOv5 + P2 | 0.983 | 0.925 | 0.684 | 0.954 | 5.4 | 19.5 |
| YOLOv5 + P2 + C2fAttn | 0.989 | 0.931 | 0.695 | 0.959 | 8.09 | 55.6 |

| Model | Precision | Recall | mAP50-95 | F1-Score | Params (M) | GFLOPs |
|---|---|---|---|---|---|---|
| Architecture I model | 0.989 | 0.931 | 0.695 | 0.959 | 8.09 | 55.6 |
| Architecture I + CAR-C2f | 0.986 | 0.935 | 0.697 | 0.959 | 7.95 | 50.3 |
| Architecture I + CAR-C2f + P2-QADH | 0.988 | 0.939 | 0.699 | 0.962 | 6.45 | 31.3 |
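The cost columns of the ablation table can be summarized as relative changes between the Architecture I model and the full model. The sketch below recomputes them from the reported values; the helper name is illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a reduction)."""
    return 100.0 * (after - before) / before


# (Params in millions, GFLOPs) as reported in the table above.
arch_i = (8.09, 55.6)   # Architecture I model
arch_ii = (6.45, 31.3)  # Architecture I + CAR-C2f + P2-QADH

params_change = relative_change(arch_i[0], arch_ii[0])
gflops_change = relative_change(arch_i[1], arch_ii[1])
print(f"Params: {params_change:.1f}%  GFLOPs: {gflops_change:.1f}%")
```

On the reported figures, the full model uses about 20.3% fewer parameters and about 43.7% fewer GFLOPs than the Architecture I model, while recall and mAP50-95 are slightly higher.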
| Model | Precision | Recall | mAP50-95 | F1-Score | Params (M) | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|
| S-Drone-YOLO (Architecture II) | 0.988 | 0.939 | 0.699 | 0.962 | 6.45 | 31.3 | 232 |
| YOLOv8s | 0.982 | 0.871 | 0.696 | 0.923 | 11.1 | 28.4 | 476 |
| YOLO11s | 0.985 | 0.872 | 0.687 | 0.925 | 9.4 | 21.3 | 360 |
| YOLO12s | 0.979 | 0.859 | 0.683 | 0.915 | 9.2 | 21.2 | 237 |
| YOLO26s | 0.988 | 0.918 | 0.659 | 0.952 | 9.4 | 20.5 | 245 |

| Object Size (pixels) | Test Count | Recall | Precision |
|---|---|---|---|
| Large, >96 × 96 | 0 | - | - |
| Medium, 32 × 32 to 96 × 96 | 148 | 0.993 | 0.986 |
| Small, <32 × 32 | 801 | 0.936 | 0.965 |

| Dataset | Modality | Images | Precision | Recall | mAP50-95 | F1-Score |
|---|---|---|---|---|---|---|
| TIB-Net | RGB | 2850 | 0.923 | 0.960 | 0.431 | 0.941 |
| UAVfly | RGB | 10,281 | 1.000 | 0.999 | 0.897 | 0.999 |
| LRDD v1 | RGB | 21,190 | 0.980 | 0.957 | 0.754 | 0.968 |
| DUT Anti-UAV | RGB | 10,000 | 0.974 | 0.925 | 0.710 | 0.948 |
| Det-Fly | RGB | 13,271 | 0.985 | 0.970 | 0.766 | 0.977 |
| Thermal UAV 2UAV | IR | 3856 | 0.934 | 0.970 | 0.843 | 0.951 |

| Background | Precision | Recall | mAP50-95 | F1-Score |
|---|---|---|---|---|
| Sky | 1.000 | 1.000 | 0.837 | 1.000 |
| Sea | 0.992 | 0.993 | 0.615 | 0.992 |
| City | 0.995 | 0.963 | 0.797 | 0.979 |
| Mountain | 0.977 | 0.886 | 0.611 | 0.929 |

| Measure | Count or Value |
|---|---|
| True positives (TP) | 897 |
| False positives (FP) | 29 |
| False negatives (FN) | 52 |
| Precision from the matrix | 0.969 |
| Recall from the matrix | 0.945 |
| F1-score from the matrix | 0.957 |
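The derived rows of the table above follow from the counts by the standard definitions of precision, recall, and F1-score. A short check, with function names chosen for illustration:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of ground-truth objects that are detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)


tp, fp, fn = 897, 29, 52  # counts from the table above
print(round(precision(tp, fp), 3), round(recall(tp, fn), 3), round(f1_score(tp, fp, fn), 3))
```

The counts are also consistent with the size-distribution table: TP + FN = 949, which equals the 148 medium plus 801 small objects in the test split.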
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Aldubaikhi, A.; Patel, S. S-Drone-YOLO: A Parameter-Efficient P2-Guided Quality-Aware YOLO Detector for Infrared Small UAV Detection. Appl. Sci. 2026, 16, 5854. https://doi.org/10.3390/app16125854

