A Remote Sensing Image Object Detection Model Based on Improved YOLOv11
Abstract
1. Introduction
- (1) The trade-off between high resolution and computational efficiency: downsampling very high-resolution remote sensing images to a smaller input size discards fine detail, while feeding them to the network at full resolution consumes excessive hardware resources. Cropping is a common compromise, but objects straddling crop boundaries may be detected incompletely, making image slicing itself a challenging problem.
- (2) Interference from complex backgrounds and inter-class similarity: in remote sensing scenes, objects and backgrounds can share similar material reflectance (e.g., camouflaged vehicles blending with desert sand), while objects of the same class can look very different across viewpoints (e.g., flat-roofed and peak-roofed houses both labeled "building"). The resulting overlap in feature space hinders the model from learning discriminative features and raises the difficulty of detection.
- (3) Arbitrary object orientation and dense arrangements: the overhead perspective of remote sensing imagery means objects (such as port containers or tilted aircraft) appear at arbitrary orientations. Horizontal bounding boxes therefore enclose substantial background noise, and densely packed objects (such as small boats in harbors) are prone to missed detections.
- (4) Significant variations in object scale: object sizes in remote sensing images span multiple orders of magnitude (e.g., 10 m-class oil tankers versus 1 m-class cars). A single architecture typically struggles to serve all scales at once: large objects rely on a large receptive field and global context, whereas small objects require attention to local detail.
- (5) Long-tailed data distribution: in satellite remote sensing detection datasets, frequent categories dominate while small or rare objects are underrepresented, producing a distinctly long-tailed distribution. This imbalance hampers generalization: models perform well on head categories but poorly on rare or small targets, so detection accuracy for small objects suffers in particular.
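The slicing issue in challenge (1) is typically handled by tiling the image with overlapping windows, so that an object cut by one tile boundary appears whole in a neighboring tile. A minimal sketch of such a tiler follows; the tile size and overlap are illustrative defaults, not values prescribed by this paper.

```python
def tile_coords(img_w, img_h, tile=1024, overlap=200):
    """Yield (x, y) top-left corners of overlapping tiles covering an image.

    The overlap keeps objects near tile borders fully visible in at least
    one tile. 1024 px tiles with ~200 px overlap are common choices for
    DOTA-style slicing; images smaller than the tile would need padding.
    """
    stride = tile - overlap
    xs = list(range(0, max(img_w - tile, 0) + 1, stride))
    ys = list(range(0, max(img_h - tile, 0) + 1, stride))
    # Ensure the right and bottom edges are covered by a final tile.
    if xs[-1] + tile < img_w:
        xs.append(img_w - tile)
    if ys[-1] + tile < img_h:
        ys.append(img_h - tile)
    return [(x, y) for y in ys for x in xs]
```

Detections from the tiles are then mapped back to full-image coordinates and merged (e.g., with non-maximum suppression) to remove duplicates from the overlap regions.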
2. Related Work
2.1. Application and Development of Deep Learning in Remote Sensing Object Detection
2.2. Reasons for Choosing YOLOv11
3. Materials and Methods
3.1. Architecture of the Proposed Object Detection Model
3.2. Frequency Domain—Space Feature Extraction Fusion Module
3.2.1. WaveletConv
3.2.2. Freq-SpaFEFM
3.3. Deformable Attention Global Feature and Local Feature Fusion Module (DAGLF)
3.3.1. Deformable Attention Module
3.3.2. Deformable Attention Global Feature and Local Feature Fusion Module
3.4. Adaptive Threshold Focal Loss
4. Results
4.1. Description of the Experimental Environment and Dataset
4.1.1. Experiment Environment Settings
4.1.2. Evaluation Metrics
4.1.3. Experimental Datasets
- The DOTAv1 dataset [32] is a large-scale remote sensing dataset specifically designed for aerial image object detection, released by the Wuhan University team in 2018. It contains 2806 aerial images, covering 15 categories with 188,282 instances.
- The SIMD dataset [33] was proposed in 2020 by a research team from the National University of Sciences and Technology (NUST), Pakistan, primarily for vehicle detection tasks. It contains 5000 high-resolution remote sensing images (1024 × 768 pixels) annotated with 45,096 target instances, with a high proportion of small and medium-sized targets (normalized width and height both below 0.4).
- The DIOR dataset [34] was introduced in 2020 as a comprehensive benchmark for optical remote sensing image object detection. It includes 23,463 images and 190,288 annotated instances, covering 20 object categories (such as airplanes, bridges, ports, vehicles, etc.).
4.1.4. Analysis of Experimental Results
5. Discussion
5.1. Comparative Experiments
5.2. Ablation Experiments
5.3. Visual Analytics
5.3.1. Visual Analysis of the Freq-SpaFEFM Module
5.3.2. Visual Analysis for the DAGLF Module
6. Conclusions
- The Freq-SpaFEFM module effectively integrated time–frequency analysis with spatial-domain feature extraction. By adopting a multi-branch architecture that separately processes small and large targets, the module enhanced the model’s ability to detect multi-scale and densely packed objects while suppressing complex background interference.
- The DAGLF module enabled the organic fusion of local details with global contextual information. Through the incorporation of a deformable attention mechanism, the model adaptively focused on relevant target regions, significantly improving performance on large-scale and fine-grained object detection tasks.
- The ATFL loss function dynamically adjusted loss weights to make the model focus more on hard-to-classify objects. This was especially effective in addressing the long-tailed distribution commonly found in remote sensing data, thereby significantly improving the detection accuracy of underrepresented classes and small-sample targets.
- Experimental results on three public remote sensing datasets (DOTAv1, SIMD, and DIOR) demonstrated that YOLO11-FSDAT outperformed current mainstream methods, including YOLOv5n, YOLOv8n, YOLOv10n, YOLOv11n, YOLOv12n, and RT-DETR, in detection accuracy. On DOTAv1, for instance, the proposed model achieved an mAP50 of 75.22%, a 4.11 percentage-point improvement over the baseline YOLOv11n, validating its superior performance and generalization ability in complex remote sensing scenarios. The model also exhibited stronger precision and robustness when detecting small, dense, and scale-varying targets in challenging environments.
- The model maintained high detection accuracy while also offering fast inference speed and a lightweight architecture, making it well-suited for real-time deployment and practical applications in diverse remote sensing contexts.
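To make the ATFL idea above concrete, the following sketch starts from the standard focal loss of Lin et al. and adds an illustrative adaptive focusing exponent: samples whose predicted probability falls below a threshold are treated as hard and penalized more strongly. The threshold and exponent values here are placeholders for illustration, not the paper's exact ATFL formulation.

```python
import math

def focal_loss(p, target, gamma=2.0, alpha=0.25):
    """Classic binary focal loss (Lin et al.): down-weights easy examples
    so training focuses on hard, rare ones."""
    p_t = p if target == 1 else 1.0 - p
    a_t = alpha if target == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

def adaptive_gamma(p_t, threshold=0.5, g_easy=2.0, g_hard=3.0):
    """Illustrative 'adaptive threshold' twist: predictions below the
    threshold count as hard and receive a larger focusing exponent.
    All constants here are hypothetical, not taken from the paper."""
    return g_hard if p_t < threshold else g_easy

# A confidently correct prediction contributes far less loss than a
# hard one, which also gets the larger adaptive exponent.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.2, 1, gamma=adaptive_gamma(0.2))
```

Under such a scheme, gradient signal concentrates on tail-class and small-object samples, which is the mechanism the conclusions credit for the improved accuracy on underrepresented classes.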
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Hou, L.; Li, F. Lightweight Remote Sensing Image Detection Based on Improved EAF-YOLO. J. Shenyang Univ. Technol. 2025, 44, 7–12. [Google Scholar]
- Cai, Q.; Wang, J.; Liang, H. Remote Sensing Image Object Detection Based on Hybrid Attention and Dynamic Sampling. Comput. Syst. Appl. 2025, 34, 171–179. [Google Scholar]
- Cheng, G.; Han, J.; Guo, L.; Qian, X.; Zhou, P.; Yao, X.; Hu, X. Object detection in remote sensing imagery using a discriminatively trained mixture model. ISPRS J. Photogramm. Remote Sens. 2013, 85, 32–43. [Google Scholar] [CrossRef]
- Qiu, S.; Wen, G.; Fan, Y. Using Layered Object Representation to Detect Partially Visible Airplanes in Remote Sensing Images. In Proceedings of the 2016 International Conference on Progress in Informatics and Computing (PIC), Shanghai, China, 23–25 December 2016; pp. 196–200. [Google Scholar]
- Cheng, G.; Zhou, P.; Han, J. Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Ma, H.; Liu, Y.; Ren, Y.; Yu, J. Detection of Collapsed Buildings in Post-Earthquake Remote Sensing Images Based on the Improved YOLOv3. Remote Sens. 2020, 12, 44. [Google Scholar] [CrossRef]
- Luo, S.; Yu, J.; Xi, Y.; Liao, X. Aircraft Target Detection in Remote Sensing Images Based on Improved YOLOv5. IEEE Access 2022, 10, 5184–5192. [Google Scholar] [CrossRef]
- Ahmed, M.; El-Sheimy, N.; Leung, H.; Moussa, A. Enhancing Object Detection in Remote Sensing: A Hybrid YOLOv7 and Transformer Approach with Automatic Model Selection. Remote Sens. 2024, 16, 51. [Google Scholar] [CrossRef]
- Zhao, D.; Shao, F.; Liu, Q.; Yang, L.; Zhang, H.; Zhang, Z. A Small Object Detection Method for Drone-Captured Images Based on Improved YOLOv7. Remote Sens. 2024, 16, 1002. [Google Scholar] [CrossRef]
- Zhu, R.; Jin, H.; Han, Y.; He, Q.; Mu, H. Aircraft Target Detection in Remote Sensing Images Based on Improved YOLOv7-Tiny Network. IEEE Access 2025, 13, 48904–48922. [Google Scholar] [CrossRef]
- Yi, H.; Liu, B.; Zhao, B.; Liu, E. Small Object Detection Algorithm Based on Improved YOLOv8 for Remote Sensing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 1734–1747. [Google Scholar] [CrossRef]
- Nie, H.; Pang, H.; Ma, M.; Zheng, R. A Lightweight Remote Sensing Small Target Image Detection Algorithm Based on Improved YOLOv8. Sensors 2024, 24, 2952. [Google Scholar] [CrossRef] [PubMed]
- Hwang, D.; Kim, J.-J.; Moon, S.; Wang, S. Image Augmentation Approaches for Building Dimension Estimation in Street View Images Using Object Detection and Instance Segmentation Based on Deep Learning. Appl. Sci. 2025, 15, 2525. [Google Scholar] [CrossRef]
- Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974. [Google Scholar]
- Li, B.; Fang, J.; Zhao, Y. RTDETR-Refa: A real-time detection method for multi-breed classification of cattle. J. Real-Time Image Process. 2025, 22, 38. [Google Scholar] [CrossRef]
- Shi, Y.; He, Q.; Mao, C.; Ge, Y.; Du, H.; Wang, H. The Object Detection in Remote Sensing Images Based on Improved YOLO11 with LDConv. In Proceedings of the 2024 6th International Conference on Robotics, Intelligent Control and Artificial Intelligence (RICAI), Nanjing, China, 6–8 December 2024; pp. 605–609. [Google Scholar] [CrossRef]
- Zhao, H.; Jia, L.; Wang, Y.; Yan, F. Autonomous UAV Detection of Ochotona curzoniae Burrows with Enhanced YOLOv11. Drones 2025, 9, 340. [Google Scholar] [CrossRef]
- Agrawal, A.; Papreja, N.; Singh, A.A.; Bhurani, K.; Minocha, S. YoloNet: Hybrid YOLO11 and EfficientNet for Lumpy Skin Disease Detection. In Proceedings of the 2025 3rd International Conference on Device Intelligence, Computing and Communication Technologies (DICCT), Dehradun, India, 21–22 March 2025; pp. 63–67. [Google Scholar] [CrossRef]
- Huang, Z.; Zhao, Y.; Liu, Y. Adaptive YOLOv8-based target detection algorithm for remote sensing images. Commun. Inf. Technol. 2025, 3, 129–133. [Google Scholar]
- Liu, X.; Gong, W.; Shang, L.; Li, X.; Gong, Z. Remote Sensing Image Target Detection and Recognition Based on YOLOv5. Remote Sens. 2023, 15, 4459. [Google Scholar] [CrossRef]
- Wang, X.; Yi, J.; Guo, J.; Song, Y.; Lyu, J.; Xu, J.; Yan, W.; Zhao, J.; Cai, Q.; Min, H. A Review of Image Super-Resolution Approaches Based on Deep Learning and Applications in Remote Sensing. Remote Sens. 2022, 14, 5423. [Google Scholar] [CrossRef]
- Putty, A.; Annappa, B.; Prajwal, R.; Perumal, S.P. Semantic Segmentation of Remotely Sensed Images using Multisource Data: An Experimental Analysis. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kamand, India, 24–28 June 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Qu, B.; Li, X.; Tao, D.; Lu, X. Deep semantic understanding of high resolution remote sensing image. In Proceedings of the 2016 International Conference on Computer, Information and Telecommunication Systems (CITS), Kunming, China, 6–8 July 2016; pp. 1–5. [Google Scholar] [CrossRef]
- Sun, H.; Luo, Z.; Ren, D.; Du, B.; Chang, L.; Wan, J. Unsupervised multi-branch network with high-frequency enhancement for image dehazing. Pattern Recognit. 2024, 156, 110763. [Google Scholar] [CrossRef]
- Xia, Z.; Pan, X.; Song, S.; Li, L.E.; Huang, G. Vision Transformer with Deformable Attention. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 21–24 June 2022; pp. 4784–4793. [Google Scholar]
- Xu, Y.; Pan, Y.; Wu, Z.; Wei, Z.; Zhan, T. Channel Self-Attention Based Multiscale Spatial-Frequency Domain Network for Oriented Object Detection in Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5650015. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–777. [Google Scholar]
- Finder, S.E.; Amoyal, R.; Treister, E.; Freifeld, O. Wavelet Convolutions for Large Receptive Fields. In Computer Vision–ECCV 2024. ECCV 2024. Lecture Notes in Computer Science; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Eds.; Springer: Cham, Switzerland, 2025; Volume 15112. [Google Scholar]
- Yang, B.; Zhang, X.; Zhang, J.; Luo, J.; Zhou, M.; Pi, Y. EFLNet: Enhancing Feature Learning Network for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 590651. [Google Scholar] [CrossRef]
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar]
- Haroon, M.; Shahzad, M.; Fraz, M.M. Multisized Object Detection Using Spaceborne Optical Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3032–3046. [Google Scholar] [CrossRef]
- Fatima, S.A.; Kumar, A.; Pratap, A.; Raoof, S.S. Object Recognition and Detection in Remote Sensing Images: A Comparative Study. In Proceedings of the 2020 International Conference on Artificial Intelligence and Signal Processing (AISP), Amaravati, India, 10–12 January 2020; pp. 1–5. [Google Scholar]
Class # | DOTAv1 Class | Number | SIMD Class | Number | DIOR Class | Number
---|---|---|---|---|---|---
1 | small vehicle | 60,804 | car | 20,504 | airplane | 10,104 |
2 | large vehicle | 45,499 | truck | 2802 | airport | 1327 |
3 | plane | 21,724 | van | 5732 | baseball field | 5817 |
4 | storage tank | 14,667 | long vehicle | 1622 | basketball court | 3225 |
5 | ship | 80,761 | bus | 1991 | bridge | 3967 |
6 | harbor | 17,919 | airliner | 968 | chimney | 1681 |
7 | ground track field | 1159 | propeller aircraft | 195 | dam | 1049 |
8 | soccer ball field | 1211 | trainer aircraft | 631 | expressway–service area | 2165 |
9 | tennis court | 6352 | chartered aircraft | 640 | expressway–toll station | 1298 |
10 | swimming pool | 4183 | fighter aircraft | 49 | golf field | 1086
11 | baseball diamond | 1231 | others | 785 | ground track field | 2318
12 | roundabout | 1102 | stair truck | 443 | harbor | 5509 |
13 | basketball court | 1420 | pushback truck | 209 | overpass | 3114 |
14 | bridge | 4622 | helicopter | 60 | ship | 62,400 |
15 | helicopter | 1326 | boat | 8672 | stadium | 1268 |
16 | | | | | storage tank | 26,414
17 | | | | | tennis court | 12,266
18 | | | | | train station | 1011
19 | | | | | vehicle | 40,370
20 | | | | | windmill | 5363
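The class counts above make the long-tailed distribution from the introduction tangible: in DOTAv1, the most frequent class (ship, 80,761 instances) outnumbers the rarest (roundabout, 1102) by roughly 73:1. A quick check using the counts copied from the table:

```python
# DOTAv1 instance counts per class, copied from the table above.
dota_counts = {
    "ship": 80761, "small vehicle": 60804, "large vehicle": 45499,
    "plane": 21724, "harbor": 17919, "storage tank": 14667,
    "tennis court": 6352, "bridge": 4622, "swimming pool": 4183,
    "basketball court": 1420, "helicopter": 1326, "baseball diamond": 1231,
    "soccer ball field": 1211, "ground track field": 1159, "roundabout": 1102,
}

imbalance = max(dota_counts.values()) / min(dota_counts.values())
print(f"DOTAv1 head/tail imbalance ratio: {imbalance:.1f}:1")  # → 73.3:1
```

Imbalance of this magnitude is exactly the regime that the loss re-weighting discussed in Section 3.4 targets.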
Methods | YOLOv5n | YOLOv8n | YOLOv10n | YOLOv11n | YOLOv12n | RT-DETR | Ours |
---|---|---|---|---|---|---|---|
small vehicle | 71.56 | 70.95 | 74.66 | 69.64 | 74.82 | 71.18 | 75.08 |
large vehicle | 87.52 | 86.78 | 88.02 | 87.59 | 88.53 | 85.05 | 88.74 |
plane | 92.61 | 92.30 | 92.65 | 92.36 | 93.20 | 92.64 | 93.51 |
storage tank | 76.39 | 76.24 | 78.93 | 77.80 | 78.44 | 78.89 | 78.78 |
ship | 90.57 | 90.41 | 91.12 | 90.91 | 90.65 | 89.64 | 91.49 |
harbor | 84.00 | 84.54 | 83.01 | 82.59 | 84.86 | 80.37 | 84.12 |
ground track field | 54.94 | 56.73 | 57.90 | 57.29 | 62.01 | 53.59 | 69.61 |
soccer ball field | 58.34 | 58.19 | 63.85 | 59.19 | 59.67 | 55.76 | 62.88 |
tennis court | 94.91 | 94.23 | 94.34 | 94.16 | 94.34 | 92.41 | 94.80 |
swimming pool | 62.01 | 60.30 | 66.31 | 64.85 | 67.77 | 63.33 | 69.15 |
baseball diamond | 74.27 | 72.15 | 75.45 | 77.42 | 77.11 | 79.08 | 79.60 |
roundabout | 55.82 | 57.32 | 58.80 | 59.52 | 59.04 | 63.99 | 61.74 |
basketball court | 65.20 | 65.80 | 63.70 | 61.41 | 63.56 | 59.75 | 66.08 |
bridge | 46.24 | 47.78 | 49.14 | 45.61 | 49.96 | 49.41 | 53.39 |
helicopter | 47.53 | 51.30 | 59.72 | 46.34 | 63.25 | 48.46 | 59.30 |
mAP50 | 70.79 | 71.00 | 73.17 | 71.11 | 73.82 | 70.90 | 75.22 |
Class | AP (%) | Precision (%) | Recall (%) | F1 Measure |
---|---|---|---|---|
small vehicle | 75.08 | 75.82 | 67.25 | 71.28 |
large vehicle | 88.74 | 86.21 | 83.89 | 85.03 |
plane | 93.51 | 93.65 | 87.96 | 90.72 |
storage tank | 78.78 | 91.99 | 62.55 | 74.47 |
ship | 91.49 | 91.68 | 87.43 | 89.50 |
harbor | 84.12 | 83.01 | 81.55 | 82.27 |
ground track field | 69.61 | 79.30 | 58.23 | 67.15 |
soccer ball field | 62.88 | 74.21 | 56.55 | 64.19 |
tennis court | 94.80 | 94.99 | 91.07 | 92.99 |
swimming pool | 69.15 | 69.20 | 74.05 | 71.54 |
baseball diamond | 79.60 | 85.31 | 72.07 | 78.13 |
roundabout | 61.74 | 80.34 | 45.49 | 58.09 |
basketball court | 66.08 | 74.40 | 56.76 | 64.39 |
bridge | 53.39 | 71.33 | 47.43 | 56.98 |
helicopter | 59.30 | 51.46 | 57.46 | 54.29 |
Models | DOTAv1 mAP50 (%) | DOTAv1 mAP50-95 (%) | DOTAv1 Precision (%) | SIMD mAP50 (%) | SIMD mAP50-95 (%) | SIMD Precision (%) | DIOR mAP50 (%) | DIOR mAP50-95 (%) | DIOR Precision (%) | GFLOPs
---|---|---|---|---|---|---|---|---|---|---
YOLOv5n | 70.79 | 47.61 | 77.58 | 79.87 | 63.41 | 76.11 | 84.94 | 60.81 | 87.07 | 5.8 |
YOLOv8n | 71.00 | 48.12 | 76.56 | 80.23 | 63.53 | 76.61 | 86.27 | 62.71 | 88.91 | 6.8 |
YOLOv10n | 73.17 | 49.80 | 78.24 | 79.65 | 64.52 | 76.41 | 86.62 | 64.54 | 89.06 | 8.3 |
YOLOv11n | 71.11 | 48.71 | 77.27 | 78.97 | 64.24 | 76.48 | 87.02 | 64.52 | 89.25 | 6.3 |
YOLOv12n | 73.82 | 50.47 | 79.18 | 80.39 | 66.09 | 75.37 | 87.22 | 65.25 | 89.44 | 6.3 |
RT-DETR | 70.90 | 46.50 | 74.98 | 75.49 | 61.34 | 77.54 | 85.38 | 62.90 | 86.21 | 103.5 |
Ours | 75.22 | 50.70 | 80.19 | 82.79 | 66.97 | 77.18 | 88.01 | 65.24 | 90.34 | 6.3 |
Models | mAP50 (%) | mAP50-95 (%) | Precision (%) | Recall (%) | GFLOPs |
---|---|---|---|---|---|
YOLOv11n (Baseline) | 71.35 | 48.85 | 76.64 | 66.64 | 6.3 |
Baseline + A | 72.76 | 49.72 | 77.35 | 67.94 | 6.4 |
Baseline + B | 72.44 | 49.74 | 77.99 | 67.15 | 6.2 |
Baseline + C | 72.84 | 48.89 | 77.83 | 68.50 | 6.3 |
Baseline + A + B | 74.20 | 50.20 | 79.93 | 68.65 | 6.3 |
Baseline + A + C | 73.45 | 49.27 | 77.45 | 70.16 | 6.4 |
Baseline + B + C | 73.73 | 49.79 | 78.26 | 68.71 | 6.2 |
Baseline + A + B + C | 75.22 | 50.70 | 80.19 | 68.65 | 6.3 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, A.; Fu, Z.; Zhao, Y.; Chen, H. A Remote Sensing Image Object Detection Model Based on Improved YOLOv11. Electronics 2025, 14, 2607. https://doi.org/10.3390/electronics14132607