

Object Detection in Remote Sensing Images Based on Artificial Intelligence

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: 29 May 2026

Special Issue Editors


Guest Editor
School of Astronautics, Harbin Institute of Technology (HIT), Harbin 150001, China
Interests: optical remote sensing imaging; automatic target detection and recognition; image interpretation application; artificial intelligence

Guest Editor
School of Astronautics, Harbin Institute of Technology (HIT), Harbin 150001, China
Interests: optical remote sensing imagery; automatic target detection and recognition; new system imaging; image acquisition and processing

Guest Editor
Department of Electronic Engineering, Tsinghua University (THU), Beijing 100084, China
Interests: optical remote sensing imagery; multimodal remote sensing; data fusion; foundation models; object detection; semantic segmentation

Special Issue Information

Dear Colleagues,

The rapid advancement of remote sensing technologies, including high-resolution satellites, unmanned aerial vehicles (UAVs), and aerial sensors, has generated an unprecedented volume of geospatial data. Extracting meaningful information from this vast resource is critical for applications such as environmental monitoring, urban planning, disaster management, and agriculture. Object detection in remote sensing images (RSIs) plays a pivotal role in automating the identification and localization of objects (e.g., vehicles, buildings, ships, aircraft) within complex, large-scale scenes. However, the unique challenges of RSIs—such as varying scales, arbitrary orientations, dense arrangements, occlusions, and diverse background clutter—significantly hinder the performance of traditional computer vision methods.

Recent breakthroughs in artificial intelligence (AI), particularly deep learning (DL), have revolutionized object detection in RSIs. Techniques like convolutional neural networks (CNNs), transformer-based architectures, and hybrid models have demonstrated remarkable capabilities in addressing domain-specific challenges, enabling higher accuracy, robustness, and efficiency. Despite these advances, critical gaps remain, including the need for lightweight models for edge deployment, generalization across heterogeneous datasets, interpretability of AI decisions, and handling of low-resolution or weakly annotated data. Furthermore, emerging trends such as multimodal data fusion and few-shot learning demand deeper exploration.

This Special Issue seeks to compile cutting-edge research on AI-driven object detection in RSIs, emphasizing novel algorithms, benchmark datasets, and real-world applications. By fostering interdisciplinary collaboration, we aim to accelerate progress in this field, bridging the gap between theoretical innovation and practical implementation to meet the growing demands of the global remote sensing community.

Topics of interest include, but are not limited to, the following:

(1) Advanced deep learning architectures for RSI object detection.
(2) Robust target detection methods under complex conditions such as dense target arrangement, occlusion, and background interference.
(3) Multimodal data-fusion detection models combining optical, LiDAR, SAR, hyperspectral, multispectral, and infrared data.
(4) Weakly supervised, semi-supervised, or unsupervised object detection frameworks for scenarios with scarce annotations.
(5) Lightweight detection models for satellites, UAVs, and other edge devices.
(6) New datasets and benchmarks for specific detection tasks.

Dr. Jianming Hu
Dr. Xiyang Zhi
Dr. Yong-Qiang Mao
Dr. Longfei Ren
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • remote sensing images
  • object detection
  • artificial intelligence
  • multimodal data fusion
  • frontier applications

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research

28 pages, 5678 KB  
Article
FKIFM-DETR: A Multi-Domain Fusion-Based Transformer Framework for Small-Target Detection in UAV Remote Sensing Imagery
by Fan Yang, Long Chen, Xiaoguang Wang, Yang Zhang, Hongyu Li, Min He and Li Shen
Remote Sens. 2026, 18(5), 700; https://doi.org/10.3390/rs18050700 - 26 Feb 2026
Abstract
Unmanned Aerial Vehicle (UAV) remote sensing has become essential for real-time earth observation applications, including precision agriculture, traffic monitoring, and disaster response. However, small-target detection in UAV aerial imagery still faces critical challenges: extreme scale variation due to variable flight altitudes, background interference from complex terrain, and insufficient pixel information for tiny objects. To address these issues, this work proposes FKIFM-DETR, a real-time transformer-based detection framework leveraging multi-domain information fusion. First, a Spatial-Frequency Fusion Module (SFM) is designed to integrate spatial and frequency-domain features for capturing fine-grained target details while suppressing background noise; second, a High–Low Frequency Block (HL-Block) is introduced to separately process high-frequency local details and low-frequency global context, balancing detail retention and semantic awareness; finally, a Channel Feature Recalibration-Enhanced Feature Pyramid Network (SPCR-FPN) is employed to strengthen the interaction between shallow spatial features and deep semantic features. On the VisDrone2019 dataset, FKIFM-DETR achieves 6.3% and 5.3% improvements in mAP@0.5 and mAP@0.5:0.95 over the RT-DETR baseline, respectively; evaluations on TinyPerson and HIT-UAV datasets further demonstrate its cross-scenario applicability. These results demonstrate the potential of FKIFM-DETR for practical UAV remote sensing applications such as crowd surveillance, vehicle tracking, and emergency rescue.
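The core idea behind spatial/frequency-domain fusion can be illustrated with a minimal sketch. This is a generic FFT high-pass blend, not the paper's SFM; the `cutoff` and `alpha` parameters are illustrative assumptions:

```python
import numpy as np

def frequency_highpass(feat: np.ndarray, cutoff: float = 0.25) -> np.ndarray:
    """Zero out low frequencies of a 2-D feature map and return the
    high-frequency residual (fine detail such as small-object edges)."""
    f = np.fft.fftshift(np.fft.fft2(feat))
    h, w = feat.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - h / 2, xx - w / 2)   # distance from the DC bin
    mask = dist > cutoff * min(h, w)          # keep only high frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

def fuse(feat: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend the spatial map with its high-frequency component,
    re-emphasizing fine detail that small targets depend on."""
    return feat + alpha * frequency_highpass(feat)
```

A flat (constant) map has all its energy at the DC bin, so its high-pass residual is zero and fusion leaves it unchanged; detail-rich regions are amplified instead.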

26 pages, 11755 KB  
Article
SAMKD: A Hybrid Lightweight Algorithm Based on Selective Activation and Masked Knowledge Distillation for Multimodal Object Detection
by Ruitao Lu, Zhanhong Zhuo, Siyu Wang, Jiwei Fan, Tong Shen and Xiaogang Yang
Remote Sens. 2026, 18(3), 450; https://doi.org/10.3390/rs18030450 - 1 Feb 2026
Abstract
Multimodal object detection is currently a research hotspot in computer vision. However, fusing visible and infrared modalities inevitably increases computational complexity, making most high-performance detection models difficult to deploy on resource-constrained UAV edge devices. Although pruning and knowledge distillation are widely used for model compression, applying them independently often leads to an unstable accuracy–efficiency trade-off. This paper therefore proposes a hybrid lightweight algorithm named SAMKD, which combines selective activation pruning with masked knowledge distillation in a staged manner to improve efficiency while maintaining detection performance. Specifically, the selective activation network pruning model (SAPM) first reduces redundant computation by dynamically adjusting network weights and the activation state of input data to generate a lightweight student network. The mask binary classification knowledge distillation (MBKD) strategy is then introduced to compensate for the accuracy degradation caused by pruning, guiding the student network to recover missing representation patterns under masked feature learning. Moreover, MBKD reformulates classification logits into multiple foreground–background binary mappings, effectively alleviating the severe foreground–background imbalance common in UAV aerial imagery. The paper also constructs a multimodal UAV aerial-imagery object detection dataset, M2UD-18K, comprising nine target categories and over 18,000 image pairs. Extensive experiments show that SAMKD performs well on both the self-constructed M2UD-18K dataset and the public DroneVehicle dataset, achieving a favorable trade-off between detection accuracy and detection speed.
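The distillation half of such a pipeline builds on standard teacher–student machinery. As a hedged illustration (this is generic temperature-scaled distillation, not SAMKD's MBKD strategy), the soft-target loss can be sketched as:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T produces softer targets."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 as is conventional in distillation."""
    p = softmax(teacher_logits, T)       # soft teacher targets
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

The loss vanishes when the pruned student exactly reproduces the teacher's logits and grows as their class distributions diverge, which is what lets distillation recover accuracy lost to pruning.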

20 pages, 5981 KB  
Article
A Multimodal Visual–Textual Framework for Detection and Counting of Diseased Trees Caused by Invasive Species in Complex Forest Scenes
by Rui Zhang, Zhibo Chen, Guangyu Huo, Xiaoyu Zhang, Wenda Luo and Liping Mu
Remote Sens. 2025, 17(24), 3971; https://doi.org/10.3390/rs17243971 - 9 Dec 2025
Abstract
With the large-scale invasion of alien species, forest ecosystems face severe challenges and tree health is increasingly threatened. Accurately detecting and counting trees affected by invasive species has therefore become a critical issue in forest conservation and resource management. Traditional detection methods typically rely on a single image modality, lack linguistic or semantic guidance, and can model only the specific diseased-tree conditions seen during training, making it difficult to differentiate and generalize across multiple diseased-tree types and limiting their practicality. To address these challenges, we propose an end-to-end multimodal diseased-tree detection model. In the visual encoder, we introduce rotational positional encoding to enhance the model's ability to perceive detailed tree structures, enabling more accurate extraction of diseased-tree features, especially in images of complex environments. We further introduce a cross-attention mechanism between the image and text modalities so that the model can deeply fuse visual and verbal information, improving detection accuracy by understanding the semantics of the disease. The method also generalizes well, enabling effective recognition from textual descriptions even when training samples are unavailable. Our model achieves the best results on the Larch Casebearer dataset and the Pests and Diseases Tree dataset, verifying the effectiveness and generalizability of the method.

26 pages, 102536 KB  
Article
SPOD-YOLO: A Modular Approach for Small and Oriented Aircraft Detection in Satellite Remote Sensing Imagery
by Jiajian Chen, Pengyu Guo, Yong Liu, Lu Cao, Dechao Ran, Kai Wang, Wei Hu and Liyang Wan
Remote Sens. 2025, 17(24), 3963; https://doi.org/10.3390/rs17243963 - 8 Dec 2025
Abstract
The accurate detection of small, densely packed, and arbitrarily oriented aircraft in high-resolution remote sensing imagery remains highly challenging due to significant variations in object scale, orientation, and background complexity. Existing detection frameworks often struggle with insufficient representation of small objects, unstable rotated-bounding-box regression, and an inability to adapt to complex backgrounds. To address these limitations, we propose SPOD-YOLO, a novel detection framework specifically designed for small aircraft in remote sensing images. The method builds on YOLOv11, combined with the feature attention mechanism of the Swin Transformer, with targeted improvements in cross-scale feature modelling, dynamic convolutional adaptation, and rotational geometry optimization. Additionally, we construct a new satellite remote sensing dataset featuring densely packed small aircraft with rotated-bounding-box annotations, providing a more realistic and challenging evaluation setting. Extensive experiments on MAR20, UCAS-AOD, and the constructed dataset demonstrate that our method achieves consistent performance gains over state-of-the-art approaches: SPOD-YOLO achieves a 4.54% increase in mAP50 and an 11.78% gain in mAP50:95 with only 3.77 million parameters on the constructed dataset. These results validate the effectiveness and robustness of our approach in complex remote sensing scenarios, offering a practical advance for small-object detection in aerospace imagery.

21 pages, 7001 KB  
Article
CGNet: Remote Sensing Instance Segmentation Method Using Contrastive Language–Image Pretraining and Gated Recurrent Units
by Hui Zhang, Zhao Tian, Zhong Chen, Tianhang Liu, Xueru Xu, Junsong Leng and Xinyuan Qi
Remote Sens. 2025, 17(19), 3305; https://doi.org/10.3390/rs17193305 - 26 Sep 2025
Abstract
Instance segmentation in remote sensing imagery is a significant application area within computer vision, holding considerable value in fields such as land planning and aerospace. Targets in remote sensing images are often small, the contours of different target categories can be remarkably similar, and the background is complex, with substantial noise interference; the network must therefore exploit background and internal instance information more effectively. To fully adapt to these characteristics of remote sensing images, we propose CGNet, a network combining an enhanced backbone with a contour–mask branch. The network employs gated recurrent units to iterate the contour and mask branches and adopts an attention head for branch fusion. Additionally, to address the common problems of missed and false detections, a supervised backbone network using contrastive pretraining for feature supplementation is introduced. The proposed method is experimentally validated on the NWPU VHR-10 and SSDD datasets, achieving average precision of 68.1% and 67.4%, respectively, 0.9% and 3.2% higher than the second-best methods.

25 pages, 4796 KB  
Article
Vision-Language Guided Semantic Diffusion Sampling for Small Object Detection in Remote Sensing Imagery
by Jian Ma, Mingming Bian, Fan Fan, Hui Kuang, Lei Liu, Zhibing Wang, Ting Li and Running Zhang
Remote Sens. 2025, 17(18), 3203; https://doi.org/10.3390/rs17183203 - 17 Sep 2025
Abstract
Synthetic aperture radar (SAR), with its all-weather and all-day active imaging capability, has become indispensable for geoscientific analysis and socio-economic applications. Despite advances in deep learning–based object detection, the rapid and accurate detection of small objects in SAR imagery remains a major challenge due to their extremely limited pixel representation, blurred boundaries in dense distributions, and the imbalance of positive–negative samples during training. Recently, vision–language models such as Contrastive Language-Image Pre-Training (CLIP) have attracted widespread research interest for their powerful cross-modal semantic modeling capabilities. Nevertheless, their potential to guide precise localization and detection of small objects in SAR imagery has not yet been fully exploited. To overcome these limitations, we propose the CLIP-Driven Adaptive Tiny Object Detection Diffusion Network (CDATOD-Diff). This framework introduces a CLIP image–text encoding-guided dynamic sampling strategy that leverages cross-modal semantic priors to alleviate the scarcity of effective positive samples. Furthermore, a generative diffusion-based module reformulates the sampling process through iterative denoising, enhancing contextual awareness. To address regression instability, we design a Balanced Corner–IoU (BC-IoU) loss, which decouples corner localization from scale variation and reduces sensitivity to minor positional errors, thereby stabilizing bounding box predictions. Extensive experiments conducted on multiple SAR and optical remote sensing datasets demonstrate that CDATOD-Diff achieves state-of-the-art performance, delivering significant improvements in detection robustness and localization accuracy under challenging small-object scenarios with complex backgrounds and dense distributions.
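The motivation for a corner-aware loss is that, for tiny objects, a localization error of a pixel or two collapses the IoU, while the same error barely affects a large box. A minimal numeric illustration (this is plain axis-aligned IoU, not the paper's BC-IoU; the box sizes are illustrative):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

# A 2-pixel shift barely moves the IoU of a 100x100 box ...
big = iou((0, 0, 100, 100), (2, 2, 102, 102))    # ≈ 0.92
# ... but devastates a 4x4 box, the regime of SAR small objects.
small = iou((0, 0, 4, 4), (2, 2, 6, 6))          # ≈ 0.14
```

This scale sensitivity is why an IoU-style regression signal alone is unstable for tiny targets and why decoupling corner error from object scale helps.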
