ECP-YOLO: Integrating Edge-Aware Attention and Contextual Refinement for UAV Object Detection
Abstract
1. Introduction
- (1) A direction-selective sparse sampling mechanism based on pinwheel-shaped convolution (PConv) [3] that preserves target silhouettes against linear background clutter while reducing parameter redundancy.
- (2) An Edge-Aware Attention Fusion Module (EAFM) that integrates deterministic Sobel operators with learnable multi-scale attention to explicitly reinforce structural boundary cues.
- (3) A spatially gated context refinement block (CRB) that suppresses background noise through lightweight global context aggregation.
- (4) A Progressive Inter-Scale Feature Fusion (PISF) strategy that cascades shallow spatial details into deep semantic layers to enforce cross-scale consistency.
- (5) A high-resolution P2 detection head (P2H) for localizing micro-scale targets.
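Contribution (5) adds a high-resolution P2 detection head. The arithmetic below illustrates both its benefit for micro-scale targets and its cost. It is a sketch under two assumptions not stated in this section: the usual strides of 4, 8, 16, and 32 for pyramid levels P2 to P5, and a 640×640 input (a common YOLO default).

```python
# Assumed strides for pyramid levels P2..P5 (standard YOLO-style layout).
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}
IMG = 640  # assumed input resolution (not given in this section)


def grid_cells(img_size: int, stride: int) -> int:
    """Number of prediction cells on a square feature map at this stride."""
    side = img_size // stride
    return side * side


def cells_per_side(obj_px: float, stride: int) -> float:
    """How many grid cells an object of obj_px pixels spans along one side."""
    return obj_px / stride


cells = {lvl: grid_cells(IMG, s) for lvl, s in STRIDES.items()}
without_p2 = sum(v for lvl, v in cells.items() if lvl != "P2")
with_p2 = sum(cells.values())
spans = {lvl: cells_per_side(8, s) for lvl, s in STRIDES.items()}  # an 8-pixel object

print(cells)                 # {'P2': 25600, 'P3': 6400, 'P4': 1600, 'P5': 400}
print(without_p2, with_p2)   # 8400 34000
print(spans)                 # {'P2': 2.0, 'P3': 1.0, 'P4': 0.5, 'P5': 0.25}
```

Under these assumptions an 8-pixel object still covers a 2×2 patch of cells at P2 but only a fraction of a cell at P5, while the P2 level alone holds about three times as many cells as P3 to P5 combined. That is consistent in direction with the ablation table, where adding the P2 head (row A) raises GFLOPs from 21.2 to 28.8 and lowers FPS from 180 to 131.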
2. Related Work
3. Methods
3.1. Structure of ECP-YOLO
3.2. EAFM
3.3. CRB Module
3.4. Pinwheel Convolution
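Contribution (1) credits PConv with reducing parameter redundancy, and the ablation table shows parameters falling from 9.58 M to 9.16 M when PConv is added to the P2-head model (row A to row C). A rough weight count shows why strip-shaped kernels are cheaper than a square one. Pinwheel-shaped convolution [3] builds its pattern from asymmetrically padded horizontal (1×k) and vertical (k×1) convolutions; the exact layout counted here (four independent branches of c_out/4 channels each, followed by a 2×2 fusion convolution, with biases and normalization ignored) is an assumption for illustration and should be checked against the reference implementation.

```python
def conv_weights(c_in: int, c_out: int, kh: int, kw: int) -> int:
    """Weight count of a dense convolution (bias and normalization ignored)."""
    return c_in * c_out * kh * kw


def square_conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Standard k x k convolution."""
    return conv_weights(c_in, c_out, k, k)


def pinwheel_weights(c_in: int, c_out: int, k: int) -> int:
    """Hypothetical pinwheel layout (assumed, not taken from the paper):
    two 1 x k and two k x 1 branches, each producing c_out // 4 channels,
    concatenated and fused by a 2 x 2 convolution (c_out -> c_out)."""
    branch = c_out // 4
    horizontal = 2 * conv_weights(c_in, branch, 1, k)
    vertical = 2 * conv_weights(c_in, branch, k, 1)
    fusion = conv_weights(c_out, c_out, 2, 2)
    return horizontal + vertical + fusion


for k in (3, 5):
    sq = square_conv_weights(64, 64, k)
    pw = pinwheel_weights(64, 64, k)
    print(k, sq, pw, round(pw / sq, 3))
# 3 36864 28672 0.778
# 5 102400 36864 0.36
```

Under these assumptions the saving grows with kernel size, because the strip branches scale linearly in k while a square kernel scales quadratically.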
3.5. Progressive Inter-Scale Feature Fusion Strategy
3.6. Sobel Operator-Based Edge Enhancement
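Contribution (2) describes the EAFM as combining deterministic Sobel operators with learnable attention. The sketch below covers only the fixed Sobel part, on a plain 2-D grid: it uses the standard 3×3 kernels and computes the gradient magnitude. Replicate padding at the borders is a choice made here, not taken from the paper, and how the EAFM fuses the edge map with its attention branches is not reproduced.

```python
import math

# Fixed 3x3 Sobel kernels: horizontal (Gx) and vertical (Gy) gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude sqrt(Gx^2 + Gy^2) of a 2-D grid.

    Borders use replicate padding (indices are clamped), so a flat region
    gives zero response everywhere, including at the image edges.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = gy = 0.0
            for di in range(3):
                for dj in range(3):
                    r = min(max(i + di - 1, 0), h - 1)
                    c = min(max(j + dj - 1, 0), w - 1)
                    gx += SOBEL_X[di][dj] * img[r][c]
                    gy += SOBEL_Y[di][dj] * img[r][c]
            out[i][j] = math.hypot(gx, gy)
    return out


# A vertical step edge: dark left half, bright right half.
step = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(step)
print(edges[0])  # [0.0, 40.0, 40.0, 0.0]
```

Because the kernels are fixed, a Sobel branch applied per channel as a frozen convolution adds no trainable parameters; the response is concentrated on the two columns either side of the step, which is the kind of boundary cue the module is meant to reinforce.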
4. Experiments
4.1. Datasets
4.2. Experimental Environment and Parameter Setup
4.3. Evaluation Metrics
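The tables report precision (P), recall (R), mAP@0.5, and mAP@0.5:0.95. The sketch below shows one common way these are computed for a single class: greedy IoU matching, cumulative precision and recall, and all-point interpolated AP. mAP@0.5 is the mean of per-class AP at an IoU threshold of 0.5, and mAP@0.5:0.95 additionally averages over thresholds 0.50, 0.55, …, 0.95. The matching rule and interpolation scheme are standard conventions chosen for illustration and may differ in detail from the evaluation code used for the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(dets, gts, thr):
    """Greedy matching, highest score first; each ground-truth box is used once.

    dets: list of (score, box); gts: list of boxes.
    Returns a true-positive flag for every detection, in input order.
    """
    used, flags = set(), [False] * len(dets)
    for i in sorted(range(len(dets)), key=lambda k: -dets[k][0]):
        best_j, best_iou = None, thr
        for j, g in enumerate(gts):
            overlap = iou(dets[i][1], g)
            if j not in used and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            used.add(best_j)
            flags[i] = True
    return flags


def average_precision(scores, flags, num_gt):
    """All-point interpolated AP: area under the precision envelope."""
    tp = fp = 0
    recall, precision = [], []
    for i in sorted(range(len(scores)), key=lambda k: -scores[k]):
        tp, fp = (tp + 1, fp) if flags[i] else (tp, fp + 1)
        recall.append(tp / num_gt)
        precision.append(tp / (tp + fp))
    mrec = [0.0] + recall + [1.0]
    mpre = [1.0] + precision + [0.0]
    for k in range(len(mpre) - 2, -1, -1):  # make precision non-increasing
        mpre[k] = max(mpre[k], mpre[k + 1])
    return sum((mrec[k] - mrec[k - 1]) * mpre[k] for k in range(1, len(mrec)))


def ap_at(dets, gts, thr):
    return average_precision([s for s, _ in dets], match(dets, gts, thr), len(gts))


THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]  # 0.50, 0.55, ..., 0.95

# Toy example: two ground-truth boxes, three detections (one false positive).
dets = [(0.9, (0, 0, 10, 10)), (0.8, (20, 20, 30, 30)), (0.7, (50, 50, 60, 60))]
gts = [(0, 0, 10, 10), (50, 50, 61, 60)]
ap50 = ap_at(dets, gts, 0.5)
ap50_95 = sum(ap_at(dets, gts, t) for t in THRESHOLDS) / len(THRESHOLDS)
print(round(ap50, 4), round(ap50_95, 4))  # 0.8333 0.8
```

The third detection overlaps its ground truth with IoU of about 0.91, so it counts as a true positive at every threshold up to 0.90 but not at 0.95; that single miss is what pulls mAP@0.5:0.95 below mAP@0.5, which is why the stricter metric rewards precise localization of small boxes.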
4.4. Ablation Experiments
4.4.1. Ablation Experiments of Different Improved Modules
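The ablation table pairs accuracy with cost. Converting FPS into per-image latency (1000 / FPS milliseconds) makes the trade-off of each configuration against the YOLOv12s baseline explicit. The values below are copied from the table; the FPS measurement setup (batch size, numeric precision, whether pre- and post-processing are included) is not given in this section, so the latencies are only as comparable as the reported FPS figures.

```python
# (mAP@0.5 %, Params M, GFLOPs, FPS) copied from the ablation table.
ROWS = {
    "YOLOv12s": (31.8, 9.23, 21.2, 180),
    "A": (34.0, 9.58, 28.8, 131),
    "B": (35.9, 9.90, 45.6, 126),
    "C": (34.7, 9.16, 32.6, 128),
    "D": (34.4, 9.58, 28.9, 124),
    "E": (34.6, 10.2, 35.1, 112),
    "F": (36.8, 9.48, 49.4, 121),
    "G": (37.3, 9.48, 49.5, 112),
    "H": (38.1, 11.8, 55.7, 79),
}


def latency_ms(fps):
    """Per-image latency in milliseconds implied by a throughput in FPS."""
    return 1000.0 / fps


def tradeoff(base, variant):
    """mAP@0.5 gain (points), added latency (ms) and GFLOPs ratio versus a base row."""
    b, v = ROWS[base], ROWS[variant]
    return {
        "map_gain": round(v[0] - b[0], 1),
        "extra_ms": round(latency_ms(v[3]) - latency_ms(b[3]), 2),
        "gflops_ratio": round(v[2] / b[2], 2),
    }


for name in ROWS:
    if name != "YOLOv12s":
        print(name, tradeoff("YOLOv12s", name))
# last line: H {'map_gain': 6.3, 'extra_ms': 7.1, 'gflops_ratio': 2.63}
```

Read this way, the full model H buys 6.3 points of mAP@0.5 for about 7.1 ms of extra latency per image, and the largest single latency step comes from adding PISF (G to H, about 3.7 ms).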
4.4.2. Sub-Component Ablation Within the EAFM
4.4.3. Experimental Results
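The per-class table reports AP@0.5 for the ten categories. Because mAP@0.5 is the unweighted mean of per-class AP, the reported totals can be checked directly. The sketch below recomputes the improvement row and the class means from the tabulated values.

```python
CLASSES = ["Pedestrian", "People", "Bicycle", "Car", "Van",
           "Truck", "Tricycle", "Awn-Tri", "Bus", "Motor"]
# Per-class AP@0.5 (%) copied from the per-class comparison table.
BASELINE = [27.4, 13.9, 9.22, 71.8, 37.3, 39.0, 16.7, 18.5, 56.2, 28.5]  # YOLOv12s
OURS = [36.8, 24.0, 14.5, 78.3, 43.3, 42.2, 22.5, 21.8, 59.9, 37.4]      # ECP-YOLO


def mean(values):
    return sum(values) / len(values)


# Improvement row: per-class difference in percentage points.
improvement = {c: round(o - b, 2) for c, o, b in zip(CLASSES, OURS, BASELINE)}
print(improvement)
print(round(mean(BASELINE), 2), round(mean(OURS), 2))  # 31.85 38.07
largest = max(improvement, key=improvement.get)
print(largest)  # People
```

The recomputed means agree with the reported 31.8 and 38.1 to within 0.1 points, presumably because the per-class entries are themselves rounded, and the largest per-class gains fall on the small-object categories People and Pedestrian.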
4.5. Model Comparison
4.6. Performance Analysis of ECP-YOLO Across Different Scenarios
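The scenario table is easier to read as per-scenario changes. The sketch below takes the P, R, and mAP@0.5 values of the YOLOv12 baseline and ECP-YOLO from that table and reports the difference in percentage points for each scenario.

```python
# (P, R, mAP@0.5) in % for the baseline and ECP-YOLO, copied from the scenario table.
SCENARIOS = {
    "Daylight":  ((47.7, 35.5, 35.8), (49.7, 41.4, 42.0)),
    "Night":     ((37.4, 36.0, 31.4), (38.8, 41.0, 37.6)),
    "Dense":     ((45.8, 35.3, 34.3), (50.5, 41.8, 41.3)),
    "Blur":      ((33.1, 16.2, 15.8), (38.1, 15.4, 16.8)),
    "Occlusion": ((49.0, 41.8, 40.3), (51.9, 49.0, 48.0)),
}


def deltas(base, ours):
    """Percentage-point change in P, R and mAP@0.5."""
    return tuple(round(o - b, 1) for b, o in zip(base, ours))


gains = {name: deltas(*pair) for name, pair in SCENARIOS.items()}
for name, (dp, dr, dmap) in gains.items():
    print(f"{name:<10} dP={dp:+.1f} dR={dr:+.1f} dmAP={dmap:+.1f}")
```

The mAP@0.5 gain is largest under occlusion (+7.7) and smallest under blur (+1.0), the only scenario where recall decreases (by 0.8 points), which suggests that edge-based cues help least when boundaries are themselves degraded.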
4.7. Generalization Experiment
5. Results and Visual Analysis
5.1. Result Visualization
5.2. Synergistic Analysis of Module Interactions
5.3. Heatmap Analysis
5.4. Grayscale Analysis
5.5. Cross-Domain Experiment
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
- Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524.
- Yang, J.; Liu, S.; Wu, J.; Su, X.; Hai, N.; Huang, X. Pinwheel-Shaped Convolution and Scale-Based Dynamic Loss for Infrared Small Target Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; AAAI: Washington, DC, USA, 2025; Volume 39, pp. 9202–9210.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; Neural Information Processing Systems Foundation, Inc. (NeurIPS): San Diego, CA, USA, 2024; Volume 37, pp. 107984–108011.
- Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725.
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 16965–16974.
- Lv, W.; Zhao, Y.; Chang, Q.; Huang, K.; Wang, G.; Liu, Y. RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer. arXiv 2024, arXiv:2407.17140.
- Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Ling, H.; Hu, Q.; Zheng, J.; Peng, T.; Wang, X.; Zhang, Y.; et al. VisDrone-SOT2019: The Vision Meets Drone Single Object Tracking Challenge Results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8.
- Zhang, H.; Xiao, P.; Yao, F.; Zhang, Q.; Gong, Y. Fusion of Multi-Scale Attention for Aerial Images Small-Target Detection Model Based on PARE-YOLO. Sci. Rep. 2025, 15, 4753.
- Li, S.; Chen, C. MFA-YOLO: A Multi-Feature Aggregation Approach for Small-Object Detection Method in Drone Imagery. Sci. Rep. 2026, 16, 2484.
- Chao, M.; Peng, C.; Yun, L.; Zhang, C.; Wang, H.; Chen, Z. A Lightweight Small Object Detection Model for UAV Images Based on Deep Semantic Integration. Sci. Rep. 2025, 15, 31888.
- Fan, Q.; Li, Y.; Deveci, M.; Zhong, K.; Kadry, S. LUD-YOLO: A Novel Lightweight Object Detection Network for Unmanned Aerial Vehicle. Inf. Sci. 2025, 686, 121366.
- Zhou, S.; Zhou, H.; Qian, L. A Multi-Scale Small Object Detection Algorithm SMA-YOLO for UAV Remote Sensing Images. Sci. Rep. 2025, 15, 9255.
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2117–2125.
- Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R. BiFormer: Vision Transformer with Bi-Level Routing Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 10323–10333.
- Yuan, D.; Chang, X.; Li, Z.; He, Z. Learning Adaptive Spatial-Temporal Context-Aware Correlation Filters for UAV Tracking. ACM Trans. Multimed. Comput. Commun. Appl. 2022, 18, 70.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 8759–8768.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 10781–10790.
- Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Han, K.; Wang, Y. Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Neural Information Processing Systems Foundation, Inc. (NeurIPS): San Diego, CA, USA, 2023; Volume 36, pp. 51094–51112.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–19.
- Shao, Y. Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration. arXiv 2024, arXiv:2411.09604.
- Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 370–386.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment: Practical Automated Data Augmentation with a Reduced Search Space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 702–703.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755.
- Li, W.; Li, A.; Kong, X.; Zhang, Y.; Li, Z. MF-YOLO: Multimodal Fusion for Remote Sensing Object Detection Based on YOLOv5s. In Proceedings of the 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Tianjin, China, 8–10 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 897–903.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Mittal, P.; Sharma, A.; Singh, R.; Dhull, V. Dilated Convolution Based RCNN Using Feature Fusion for Low-Altitude Aerial Objects. Expert Syst. Appl. 2022, 199, 117106.
- Zhang, G.; Peng, Y.; Li, J. YOLO-MARS: An Enhanced YOLOv8n for Small Object Detection in UAV Aerial Imagery. Sensors 2025, 25, 2534.
















Experimental environment.

| Parameter | Setup |
|---|---|
| OS | Windows 11 |
| CPU | AMD Ryzen 9 7945HX |
| GPU | NVIDIA GeForce RTX 3090 (24 GB) |
| Memory | DDR5 (32 GB) |
| Python | 3.9.21 |
| CUDA | 11.8 |
| PyTorch | 2.3.1 |

Ablation results of the improved modules. P2H denotes the high-resolution P2 detection head; mAP, P, and R are given in %.

| Methods | P2H | EAFM | PConv | CRB | PISF | mAP@0.5 | mAP@0.5:0.95 | P | R | Params (M) | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv12s | — | — | — | — | — | 31.8 | 18.6 | 45.1 | 33.4 | 9.23 | 21.2 | 180 |
| A | ✔ | — | — | — | — | 34.0 | 19.8 | 45.7 | 35.4 | 9.58 | 28.8 | 131 |
| B | ✔ | ✔ | — | — | — | 35.9 | 21.0 | 47.4 | 36.8 | 9.90 | 45.6 | 126 |
| C | ✔ | — | ✔ | — | — | 34.7 | 20.5 | 46.0 | 35.8 | 9.16 | 32.6 | 128 |
| D | ✔ | — | — | ✔ | — | 34.4 | 20.0 | 45.9 | 35.7 | 9.58 | 28.9 | 124 |
| E | ✔ | — | — | — | ✔ | 34.6 | 20.1 | 45.5 | 36.3 | 10.2 | 35.1 | 112 |
| F | ✔ | ✔ | ✔ | — | — | 36.8 | 21.7 | 48.6 | 37.7 | 9.48 | 49.4 | 121 |
| G | ✔ | ✔ | ✔ | ✔ | — | 37.3 | 21.9 | 48.8 | 38.3 | 9.48 | 49.5 | 112 |
| H | ✔ | ✔ | ✔ | ✔ | ✔ | 38.1 | 22.1 | 48.6 | 39.5 | 11.8 | 55.7 | 79 |

Sub-component ablation within the EAFM (mAP@0.5 in %).

| Configuration | Sobel | LGA | CBAM | SPP-Lite | mAP@0.5 |
|---|---|---|---|---|---|
| Full EAFM | ✔ | ✔ | ✔ | ✔ | 35.9 |
| w/o Sobel | ✗ | ✔ | ✔ | ✔ | 35.2 |
| w/o LGA | ✔ | ✗ | ✔ | ✔ | 34.8 |
| w/o CBAM | ✔ | ✔ | ✗ | ✔ | 35.4 |
| w/o SPP-Lite | ✔ | ✔ | ✔ | ✗ | 35.1 |
| w/o EAFM | ✗ | ✗ | ✗ | ✗ | 34.0 |

Per-class AP@0.5 (%) of the YOLOv12s baseline and ECP-YOLO (Ours). The last row gives the improvement in percentage points.

| Models | mAP@0.5 | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awn-Tri | Bus | Motor |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv12s | 31.8 | 27.4 | 13.9 | 9.22 | 71.8 | 37.3 | 39 | 16.7 | 18.5 | 56.2 | 28.5 |
| Ours | 38.1 | 36.8 | 24 | 14.5 | 78.3 | 43.3 | 42.2 | 22.5 | 21.8 | 59.9 | 37.4 |
| Improvement | 6.3 | 9.4 | 10.1 | 5.28 | 6.5 | 6 | 3.2 | 5.8 | 3.3 | 3.7 | 8.9 |

Comparison with other detectors (mAP, P, and R in %).

| Models | mAP@0.5 | mAP@0.5:0.95 | P | R | Params (M) | GFLOPs |
|---|---|---|---|---|---|---|
| SSD [28] | 24.1 | 10.7 | 21.3 | 35.4 | 13.3 | 22.8 |
| YOLOv8s | 32.0 | 18.1 | 43.5 | 34.3 | 11.2 | 28.7 |
| YOLOv8m | 35.1 | 20.2 | 47.5 | 36.9 | 25.9 | 79.3 |
| YOLOv10n | 33.9 | 19.3 | 45.0 | 34.3 | 2.69 | 8.2 |
| YOLOv10s | 31.6 | 18.0 | 43.5 | 33.6 | 8.0 | 24.8 |
| YOLOv10m | 33.9 | 19.5 | 46.5 | 36.0 | 16.0 | 64 |
| YOLOv11s | 32.3 | 18.2 | 44.4 | 34.5 | 9.4 | 23.5 |
| RT-DETR-R18 | 36.2 | 20.7 | 41.4 | 32.4 | 20.1 | 57 |
| DCRFF [29] | 35.0 | 23.4 | — | — | — | — |
| MF-YOLO | 34.8 | 21.1 | — | — | 9.0 | — |
| RetinaNet | 28.7 | 13.1 | — | — | 19.8 | — |
| MFA-YOLO | 36.0 | 20.7 | — | — | 7.5 | — |
| BPD-YOLO | 35.5 | 20.8 | — | — | 3.9 | 18.2 |
| YOLO-MARS | 40.9 | 23.4 | — | — | — | — |
| Ours | 38.1 | 22.1 | 48.6 | 39.5 | 11.8 | 55.7 |

Results of the YOLOv12 baseline and ECP-YOLO (Ours) across scenarios (all metrics in %).

| Scenarios | Model | P | R | mAP@0.5 | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awn-Tri | Bus | Motor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Daylight | YOLOv12 | 47.7 | 35.5 | 35.8 | 39.3 | 23.3 | 7.11 | 77.1 | 46.9 | 51.1 | 16.2 | 18.6 | 46.4 | 31.7 |
| Daylight | Ours | 49.7 | 41.4 | 42 | 50.7 | 37.7 | 11.7 | 82.6 | 53.7 | 54 | 21.7 | 20.9 | 47 | 39.8 |
| Night | YOLOv12 | 37.4 | 36 | 31.4 | 14 | 7.37 | 1.7 | 57.7 | 27.3 | 36.5 | 14.2 | 99.5 | 44.1 | 12 |
| Night | Ours | 38.8 | 41 | 37.6 | 27.7 | 11.9 | 1.9 | 68.7 | 37.7 | 39.7 | 18.1 | 99.5 | 50.3 | 24.8 |
| Dense | YOLOv12 | 45.8 | 35.3 | 34.3 | 33.5 | 17.6 | 9.98 | 76.8 | 44 | 28.9 | 17.1 | 20.3 | 63.4 | 31 |
| Dense | Ours | 50.5 | 41.8 | 41.3 | 43.5 | 30.9 | 14.5 | 83.6 | 51.8 | 32.8 | 24.3 | 23.6 | 66.7 | 41.5 |
| Blur | YOLOv12 | 33.1 | 16.2 | 15.8 | 10.3 | 9.08 | 1.68 | 51.1 | 21.6 | 16 | 4.59 | 7.02 | 25.1 | 11.2 |
| Blur | Ours | 38.1 | 15.4 | 16.8 | 12.2 | 12.7 | 2.23 | 54.1 | 24.8 | 16.3 | 4.36 | 7.67 | 22.3 | 11.7 |
| Occlusion | YOLOv12 | 49 | 41.8 | 40.3 | 43.6 | 31.6 | 13.1 | 82 | 42.2 | 54.4 | 25.2 | 21.8 | 61.5 | 27.4 |
| Occlusion | Ours | 51.9 | 49 | 48 | 56.5 | 44.7 | 19.4 | 87.3 | 48.1 | 59.8 | 34.6 | 22.7 | 69.2 | 37.3 |

Generalization experiment results (P, R, mAP@0.5, and per-class AP in %).

| Models | P | R | mAP@0.5 | Car | Truck | Bus | Params (M) | GFLOPs |
|---|---|---|---|---|---|---|---|---|
| YOLOv12 | 27 | 38.5 | 28.7 | 67.2 | 2.87 | 16 | 9.23 | 21.2 |
| Ours | 34.1 | 34.3 | 30.4 | 67.1 | 4.21 | 20.1 | 11.8 | 55.7 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Wang, Q.; Cang, M.; Chen, Y. ECP-YOLO: Integrating Edge-Aware Attention and Contextual Refinement for UAV Object Detection. Electronics 2026, 15, 2067. https://doi.org/10.3390/electronics15102067
Wang Q, Cang M, Chen Y. ECP-YOLO: Integrating Edge-Aware Attention and Contextual Refinement for UAV Object Detection. Electronics. 2026; 15(10):2067. https://doi.org/10.3390/electronics15102067
Chicago/Turabian Style: Wang, Qi, Mingming Cang, and Yongji Chen. 2026. "ECP-YOLO: Integrating Edge-Aware Attention and Contextual Refinement for UAV Object Detection" Electronics 15, no. 10: 2067. https://doi.org/10.3390/electronics15102067
APA Style: Wang, Q., Cang, M., & Chen, Y. (2026). ECP-YOLO: Integrating Edge-Aware Attention and Contextual Refinement for UAV Object Detection. Electronics, 15(10), 2067. https://doi.org/10.3390/electronics15102067

