A Novel Object Detection-Based Air-to-Ground Target Search and Localization Strategy
Highlights
- The RepViT-enhanced detection model (RepViT-M1.5-YOLOv4-CBAM) achieves an mAP_0.5 of 98.58% at 18.70 FPS on a custom emergency rescue dataset, 2.0 FPS faster than the standard YOLOv4 baseline (16.70 FPS). The CBAM attention module sharpens the model's focus on discriminative target regions in complex backgrounds and occluded scenarios.
- A tiered search localization strategy is proposed that assigns rescue personnel, supply boxes, and vehicles to priority tiers and applies the nearest-neighbor principle to plan the UAV search route; four diverse simulation scenarios confirm its correctness and robustness.
- The lightweight and accurate RepViT-enhanced model (22.76 M parameters, 22.46 G FLOPs) enables UAVs to perform reliable real-time ground target detection on resource-constrained onboard microcomputers, making it well suited for deployment in time-critical applications such as emergency rescue and material delivery.
- The tiered search strategy provides a systematic and priority-aware framework for autonomous UAV operations in complex environments, offering practical guidance for future multi-target search and localization missions while identifying dynamic re-routing as a key direction for future work.
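The speed and size claims in the highlights can be checked with simple arithmetic. The sketch below uses only the FPS, parameter, and FLOPs figures reported in the comparison tables of this article; the dictionary names and the `relative_change` helper are illustrative assumptions, not part of the authors' code.

```python
# Figures copied from the comparison tables in this article.
# Dictionary names and the helper below are hypothetical, for illustration only.
YOLOV4 = {"fps": 16.70, "params_m": 64.36, "flops_g": 60.53}
REPVIT_CBAM = {"fps": 18.70, "params_m": 22.84, "flops_g": 22.46}


def relative_change(new: float, base: float) -> float:
    """Return the change from base to new as a percentage of base."""
    return (new - base) / base * 100.0


# Absolute and relative speed difference on the onboard microcomputer.
fps_gain = REPVIT_CBAM["fps"] - YOLOV4["fps"]
fps_gain_pct = relative_change(REPVIT_CBAM["fps"], YOLOV4["fps"])

# Relative change in model size and compute (negative means smaller).
params_change_pct = relative_change(REPVIT_CBAM["params_m"], YOLOV4["params_m"])
flops_change_pct = relative_change(REPVIT_CBAM["flops_g"], YOLOV4["flops_g"])

print(f"FPS gain: {fps_gain:.2f} ({fps_gain_pct:.1f}%)")
print(f"Parameter change: {params_change_pct:.1f}%")
print(f"FLOPs change: {flops_change_pct:.1f}%")
```

Under these figures the RepViT-based model runs roughly 12% faster than YOLOv4 while using roughly a third of its parameters and FLOPs.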
Abstract
1. Introduction
- A RepViT-enhanced detection framework is proposed that replaces the standard CSPDarknet53 backbone of YOLOv4 with the lightweight Re-Parameterization Vision Transformer (RepViT-M1.5), while retaining the SPP and PANet feature fusion modules. This design delivers competitive detection accuracy (mAP_0.5 of 98.58% on the custom dataset) while maintaining edge-deployable inference speed (18.70 FPS on an onboard UAV microcomputer), and it achieves a higher mAP_0.5 than the MobileNet-series backbones on the same hardware.
- The Convolutional Block Attention Module (CBAM) is inserted at the junction between the backbone and neck to enhance spatial and channel-wise feature weighting, improving the model’s ability to focus on small and partially occluded ground targets. A custom data augmentation pipeline that combines mosaic augmentation, Gaussian noise injection, and rotation is also developed to improve robustness under varying illumination and occlusion conditions.
- A tiered target search strategy is devised that prioritizes rescue personnel (Tier 1) over supply boxes (Tier 2) and vehicles (Tier 3), using a nearest-neighbor traversal within each tier. Simulation results confirm the feasibility of this approach for systematic UAV-based search and localization in emergency rescue scenarios.
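The three augmentation operations named in the contributions above (mosaic, Gaussian noise injection, rotation) can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the grayscale list-of-rows image format, the fixed 90-degree rotation, the equal-size 2×2 mosaic, and all function names are assumptions made for illustration.

```python
import random
from typing import List

Image = List[List[int]]  # assumed format: grayscale rows of 0-255 intensities


def add_gaussian_noise(image: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to every pixel and clamp to [0, 255]."""
    return [
        [min(255, max(0, round(px + rng.gauss(0.0, sigma)))) for px in row]
        for row in image
    ]


def rotate90(image: Image) -> Image:
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def mosaic4(a: Image, b: Image, c: Image, d: Image) -> Image:
    """Tile four equal-size images into one 2x2 mosaic (simplified: no rescaling or cropping)."""
    top = [row_a + row_b for row_a, row_b in zip(a, b)]
    bottom = [row_c + row_d for row_c, row_d in zip(c, d)]
    return top + bottom


rng = random.Random(0)  # fixed seed so the sketch is reproducible
base = [[10, 20], [30, 40]]
noisy = add_gaussian_noise(base, sigma=5.0, rng=rng)
rotated = rotate90(base)
tiled = mosaic4(base, rotated, noisy, base)
```

In a detection setting the bounding-box labels would have to be transformed together with the pixels; that bookkeeping is omitted here.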
2. Materials and Methods
2.1. RepViT-Enhanced Algorithm
2.2. CBAM Module
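For reference, the standard CBAM formulation from the original CBAM paper is reproduced below as a brief sketch. The symbols are those of that paper and are assumptions with respect to this article: $F$ is the input feature map, $M_c$ and $M_s$ are the channel and spatial attention maps, $\sigma$ is the sigmoid function, $f^{7\times 7}$ is a convolution with a 7 × 7 kernel, and $\otimes$ denotes element-wise multiplication with broadcasting.

```latex
\begin{aligned}
M_c(F)  &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \\
F'      &= M_c(F) \otimes F, \\
M_s(F') &= \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big), \\
F''     &= M_s(F') \otimes F'.
\end{aligned}
```

Channel attention pools over the spatial dimensions and passes both pooled vectors through a shared MLP, whereas spatial attention pools along the channel axis and concatenates the two resulting maps before the convolution.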
2.3. Target Detection-Based Search Localization Strategy
1. The state was initialized, and the starting point was removed from the unvisited set Q and added to the path.
2. The distances from the starting point to all nodes in the first tier were calculated, and the node with the shortest distance was selected. This node was then removed from the unvisited set Q and added to the path, and the distances to the neighboring nodes in the first tier were updated. The selection was repeated from the most recently visited node until the first tier was exhausted.
3. After the first tier had been searched, the last node visited was taken as the new starting point, and step 2 was repeated for the second tier, updating the unvisited set Q and the path.
4. After the second tier had been searched, the last node visited was taken as the new starting point, and step 2 was repeated for the third tier, updating the unvisited set Q and the path.
5. When all points had been visited, the algorithm ended. The final planned path was stored in the path queue.
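The tiered nearest-neighbor procedure described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: targets are 2D points, distance is Euclidean, and the function names and example coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tiered_nearest_neighbor(start: Point, tiers: List[List[Point]]) -> List[Point]:
    """Visit tiers in priority order, always moving to the nearest unvisited node in the current tier."""
    path = [start]
    current = start
    for tier in tiers:
        unvisited = list(tier)  # the unvisited set Q, restricted to this tier
        while unvisited:
            nearest = min(unvisited, key=lambda p: math.dist(current, p))
            unvisited.remove(nearest)
            path.append(nearest)
            current = nearest  # the last node visited becomes the new starting point
    return path


def path_length(path: List[Point]) -> float:
    """Total Euclidean length of the planned path."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


# Hypothetical scenario: coordinates are made up for illustration.
example_tiers = [
    [(4.0, 0.0), (1.0, 0.0)],  # Tier 1: rescue personnel
    [(4.0, 3.0), (0.0, 3.0)],  # Tier 2: supply boxes
    [(0.0, 1.0)],              # Tier 3: vehicles
]
route = tiered_nearest_neighbor((0.0, 0.0), example_tiers)
print(route, path_length(route))
```

In this example the vehicle at (0.0, 1.0) is the closest target to the starting point, yet it is visited last because tier priority takes precedence over distance.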
3. Results and Discussion
3.1. Dataset for Specific Scenarios and Specific Targets
3.2. Ablation Experiments on the Self-Built Dataset
3.3. Ablation Experiments of Target Detection Algorithms on Public Datasets
3.4. Deep Search Based on Recognition Results
3.5. Discussion
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Model | fireman_AP (%) | redbox_AP (%) | car_AP (%) | mAP_0.5 (%) | FPS |
|---|---|---|---|---|---|
| RepViT-M1.5-YOLOv4-CBAM | 98.74 | 97.96 | 99.03 | 98.58 | 18.70 |
| RepViT-M1.5-YOLOv4 | 97.16 | 95.06 | 99.17 | 97.13 | 18.91 |
| YOLOv4 | 98.06 | 98.79 | 99.69 | 98.85 | 16.70 |
| MobileNetV3-YOLOv4 | 97.35 | 94.71 | 98.48 | 96.85 | 21.29 |
| MobileNetV2-YOLOv4 | 96.31 | 92.16 | 99.88 | 96.26 | 18.31 |
| MobileNetV1-YOLOv4 | 97.34 | 92.25 | 99.16 | 96.25 | 19.45 |
| YOLOv5s | 99.56 | 99.24 | 99.92 | 99.58 | 17.65 |

| Model | mAP_0.5 (%) | #Params | FLOPs |
|---|---|---|---|
| RepViT-M1.5-YOLOv4 | 89.99 | 22.76 M | 22.46 G |
| RepViT-M1.5-YOLOv4-CBAM | 89.28 | 22.84 M | 22.46 G |
| YOLOv4 | 92.25 | 64.36 M | 60.53 G |
| MobileNetV3-YOLOv4 | 79.01 | 11.73 M | 7.70 G |
| MobileNetV2-YOLOv4 | 80.12 | 10.80 M | 8.29 G |
| MobileNetV1-YOLOv4 | 79.72 | 12.69 M | 10.65 G |
| YOLOv5s | 86.20 | 7.277 M | 17.16 G |

| Scenario | Tier 1/2/3 Targets | Spatial Distribution | Priority Enforced | Path Deviation |
|---|---|---|---|---|
| Balanced (Figure 13) | 3/3/3 | Uniform random | Yes | <10% |
| Priority-skewed | 5/2/1 | Uniform random | Yes | <12% |
| Clustered | 3/3/3 | Tier-clustered | Yes | <8% |
| Sparse wide-area | 2/2/2 | Dispersed edges | Yes | <15% |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Li, H.; Zhang, Q.; Zhen, M. A Novel Object Detection-Based Air-to-Ground Target Search and Localization Strategy. Drones 2026, 10, 375. https://doi.org/10.3390/drones10050375

