
Artificial Intelligence Remote Sensing for Earth Observation

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: closed (15 September 2025) | Viewed by 18,878

Special Issue Editors


Guest Editor
School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710129, China
Interests: remote sensing; image processing; visual language model

Guest Editor
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi’an 710071, China
Interests: deep learning; object detection and tracking; reinforcement learning; hyperspectral image processing

Guest Editor
Department of Aerospace and Geodesy, Technical University of Munich, 85521 Munich, Germany
Interests: remote sensing; image segmentation; visual language model

Guest Editor
School of Computer Science, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
Interests: remote sensing; image processing; machine learning

Special Issue Information

Dear Colleagues,

Remote sensing imaging captures electromagnetic radiation across various wavelengths, producing multimodal images rich in information. Consequently, remote sensing images have a wide range of applications in Earth observation, including environmental monitoring, agriculture, urban planning, and geological exploration. The development of artificial intelligence (AI) presents both opportunities and challenges for remote sensing-based Earth observation. Over the past decade, remote sensing image processing techniques have advanced significantly, driven by deep learning.

In recent years, the field of AI has seen new developments. The remarkable success of ChatGPT has sparked a renewed wave of interest in AI, and advances in visual language models (VLMs) have pushed this enthusiasm to new heights. As in previous waves, remote sensing is embracing these advances and reaching a new level. Technological progress has enabled us to design more efficient and lightweight AI models for specific remote sensing tasks, and even to move beyond traditional discriminative models, using a generative paradigm to solve problems that were previously impossible to model. However, the application of new techniques such as Mamba, test-time training (TTT), and VLMs in remote sensing remains relatively limited. Moreover, applying these techniques requires overcoming challenges unique to remote sensing, such as modality gaps and resolution differences. Therefore, more effort should be devoted to exploiting advanced AI techniques, e.g., CLIP, VLMs, Mamba, and large foundation models, to facilitate the wide application of remote sensing images.

For this Special Issue, we encourage submissions that utilise advanced AI techniques to address remote sensing image processing tasks. This includes both traditional tasks such as image segmentation and fusion, and emerging tasks such as remote sensing-based visual question answering (VQA) and AI for scientific applications.

This Special Issue welcomes high-quality submissions that provide the community with the most recent advancements in remote sensing for Earth observation, including but not limited to the following:

  • Spatial and spectral remote sensing image super-resolution;
  • Remote sensing image segmentation/classification;
  • Multimodal remote sensing image fusion;
  • Remote sensing object detection;
  • Contrastive language and remote sensing image pretraining;
  • Remote sensing image-based visual language model for Earth observation;
  • Other topics on applications of remote sensing for Earth observation.

Prof. Dr. Haokui Zhang
Prof. Dr. Jie Feng
Dr. Xizhe Xue
Dr. Chen Ding
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image super-resolution
  • image segmentation
  • image classification
  • multimodal fusion
  • language–image contrastive learning
  • visual language model

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (12 papers)


Research

21 pages, 7029 KB  
Article
Cross-View Geo-Localization via 3D Gaussian Splatting-Based Novel View Synthesis
by Xiaokun Ding, Xuanyu Zhang, Shangzhen Song, Bo Li, Le Hui and Yuchao Dai
Remote Sens. 2025, 17(22), 3673; https://doi.org/10.3390/rs17223673 - 8 Nov 2025
Abstract
Cross-view geo-localization allows an agent to determine its own position by retrieving the same scene from images taken from dramatically different perspectives. However, image matching and retrieval face significant challenges due to substantial viewpoint differences, unknown orientations, and considerable geometric distribution disparities between cross-view images. To this end, we propose a cross-view geo-localization framework based on novel view synthesis that generates pseudo aerial-view images from given street-view scenes to reduce the view discrepancies, thereby improving the performance of cross-view geo-localization. Specifically, we first employ 3D Gaussian splatting to generate new aerial images from the street-view image sequence, where COLMAP is used to obtain initial camera poses and sparse point clouds. To identify optimal matching viewpoints from reconstructed 3D scenes, we design an effective camera pose estimation strategy. By increasing the tilt angle between the photographic axis and the horizontal plane, the geometric consistency between the newly generated aerial images and the real ones can be improved. After that, DINOv2 is employed to design a simple yet efficient mixed feature enhancement module, followed by the InfoNCE loss for cross-view geo-localization. Experimental results on the KITTI dataset demonstrate that our approach can significantly improve cross-view matching accuracy under large viewpoint disparities and achieve state-of-the-art localization performance.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
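
The InfoNCE objective used above for cross-view matching can be summarised in a few lines. The sketch below is a hedged illustration, not the authors' code: the embedding names, the convention that matching street/aerial pairs share a batch index, and the temperature value are our assumptions.

```python
# Minimal sketch of a symmetric InfoNCE loss for cross-view retrieval.
import torch
import torch.nn.functional as F

def info_nce(streetview_emb: torch.Tensor, aerial_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Contrastive loss: matching street/aerial pairs share an index."""
    s = F.normalize(streetview_emb, dim=-1)   # (B, D)
    a = F.normalize(aerial_emb, dim=-1)       # (B, D)
    logits = s @ a.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(s.size(0), device=s.device)
    # Symmetric loss: street->aerial and aerial->street retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
```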

34 pages, 7677 KB  
Article
JSPSR: Joint Spatial Propagation Super-Resolution Networks for Enhancement of Bare-Earth Digital Elevation Models from Global Data
by Xiandong Cai and Matthew D. Wilson
Remote Sens. 2025, 17(21), 3591; https://doi.org/10.3390/rs17213591 - 30 Oct 2025
Viewed by 361
Abstract
(1) Background: Digital Elevation Models (DEMs) encompass digital bare earth surface representations that are essential for spatial data analysis, such as hydrological and geological modelling, as well as for other applications, such as agriculture and environmental management. However, available bare-earth DEMs can have limited coverage or accessibility. Moreover, the majority of available global DEMs have lower spatial resolutions (∼30–90 m) and contain errors introduced by surface features such as buildings and vegetation. (2) Methods: This research presents an innovative method to convert global DEMs to bare-earth DEMs while enhancing their spatial resolution, as measured by the improved vertical accuracy of each pixel, combined with reduced pixel size. We propose the Joint Spatial Propagation Super-Resolution network (JSPSR), which integrates Guided Image Filtering (GIF) and Spatial Propagation Network (SPN). By leveraging guidance features extracted from remote sensing images with or without auxiliary spatial data, our method can correct elevation errors and enhance the spatial resolution of DEMs. We developed a dataset for real-world bare-earth DEM Super-Resolution (SR) problems in low-relief areas utilising open-access data. Experiments were conducted on the dataset using JSPSR and other methods to predict 3 m and 8 m spatial resolution DEMs from 30 m spatial resolution Copernicus GLO-30 DEMs. (3) Results: JSPSR improved prediction accuracy by 71.74% on Root Mean Squared Error (RMSE) and reconstruction quality by 22.9% on Peak Signal-to-Noise Ratio (PSNR) compared to bicubic interpolated GLO-30 DEMs, and achieves 56.03% and 13.8% improvements on the same metrics over a baseline Single Image Super-Resolution (SISR) method. Overall RMSE was 1.06 m at 8 m spatial resolution and 1.1 m at 3 m, compared to 3.8 m for GLO-30, 1.8 m for FABDEM and 1.3 m for FathomDEM, at either resolution. (4) Conclusions: JSPSR outperforms other methods in bare-earth DEM super-resolution tasks, with improved elevation accuracy compared to other state-of-the-art globally available datasets.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
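
For readers unfamiliar with the reported metrics, the following is a minimal sketch of RMSE and PSNR computation for DEM super-resolution; the array names and the use of the reference DEM's dynamic range as the peak signal are assumptions, not necessarily the authors' evaluation protocol.

```python
# Hedged sketch: RMSE (metres) and PSNR (dB) between predicted and reference DEMs.
import numpy as np

def dem_metrics(pred_dem: np.ndarray, ref_dem: np.ndarray) -> tuple[float, float]:
    """Return (RMSE, PSNR) for a predicted DEM against a reference DEM."""
    err = pred_dem - ref_dem
    rmse = float(np.sqrt(np.mean(err ** 2)))
    peak = float(ref_dem.max() - ref_dem.min())  # dynamic range as peak signal
    psnr = 20.0 * np.log10(peak / rmse)
    return rmse, psnr

rmse, psnr = dem_metrics(np.random.rand(64, 64) * 100, np.random.rand(64, 64) * 100)
```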

28 pages, 16418 KB  
Article
Hybrid-SegUFormer: A Hybrid Multi-Scale Network with Self-Distillation for Robust Landslide InSAR Deformation Detection
by Wenyi Zhao, Jiahao Zhang, Jianao Cai and Dongping Ming
Remote Sens. 2025, 17(21), 3514; https://doi.org/10.3390/rs17213514 - 23 Oct 2025
Viewed by 395
Abstract
Landslide deformation monitoring via InSAR is crucial for assessing hazard risk. Quick and accurate detection of active deformation zones is essential for early warning and mitigation planning. While the application of deep learning has substantially improved detection efficiency, several challenges still persist, such as poor multi-scale perception, blurred boundaries, and limited model generalization. This study proposes Hybrid-SegUFormer to address these limitations. The model integrates the SegFormer encoder’s efficient feature extraction with the U-Net decoder’s superior boundary restoration. It introduces a multi-scale fusion decoding mechanism to enhance context perception structurally and incorporates a self-distillation strategy to significantly improve generalization capability. Hybrid-SegUFormer achieves strong detection performance (98.79% accuracy, 80.05% F1-score) while demonstrating superior multi-scale adaptability (IoU degradation of only 6.99–8.83%) and strong cross-regional generalization capability. The synergistic integration of its core modules enables an optimal balance between precision and recall, making it particularly effective for complex landslide detection tasks. This study provides a new approach for intelligent interpretation of InSAR deformation in complex mountainous areas.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
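
A self-distillation strategy of the kind described can be sketched as a soft-label consistency term between a deeper "teacher" prediction and a shallower "student" head. The temperature, stop-gradient choice, and KL formulation below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a self-distillation loss between two segmentation heads.
import torch
import torch.nn.functional as F

def self_distill_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student class maps."""
    t = F.softmax(teacher_logits.detach() / temperature, dim=1)  # stop-gradient teacher
    log_s = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(log_s, t, reduction="batchmean") * temperature ** 2

loss = self_distill_loss(torch.randn(2, 2, 64, 64), torch.randn(2, 2, 64, 64))
```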

20 pages, 7578 KB  
Article
Cross Attention Based Dual-Modality Collaboration for Hyperspectral Image and LiDAR Data Classification
by Khanzada Muzammil Hussain, Keyun Zhao, Yang Zhou, Aamir Ali and Ying Li
Remote Sens. 2025, 17(16), 2836; https://doi.org/10.3390/rs17162836 - 15 Aug 2025
Viewed by 1163
Abstract
Advancements in satellite sensor technology have enabled access to diverse remote sensing (RS) data from multiple platforms. Hyperspectral Image (HSI) data offers rich spectral detail for material identification, while LiDAR captures high-resolution 3D structural information, making the two modalities naturally complementary. By fusing HSI and LiDAR, we can mitigate the limitations of each and improve tasks like land cover classification, vegetation analysis, and terrain mapping through more robust spectral–spatial feature representation. However, traditional multi-scale feature fusion models often struggle with aligning features effectively, which can lead to redundant outputs and diminished spatial clarity. To address these issues, we propose the Cross Attention Bridge for HSI and LiDAR (CAB-HL), a novel dual-path framework that employs a multi-stage cross-attention mechanism to guide the interaction between spectral and spatial features. In CAB-HL, features from each modality are refined across three progressive stages using cross-attention modules, which enhance contextual alignment while preserving the distinctive characteristics of each modality. These fused representations are subsequently integrated and passed through a lightweight classification head. Extensive experiments on three benchmark RS datasets demonstrate that CAB-HL consistently outperforms existing state-of-the-art models, confirming its ability to learn deep joint representations for multimodal classification tasks.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
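
One stage of cross-attention between HSI and LiDAR tokens, in the spirit of the bridge described above, might look as follows; the use of nn.MultiheadAttention, the residual connections, and all shapes are assumptions rather than the authors' design.

```python
# Hedged sketch of one cross-attention stage between two modalities.
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.hsi_from_lidar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lidar_from_hsi = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hsi: torch.Tensor, lidar: torch.Tensor):
        # Each modality queries the other, then keeps a residual of itself.
        h, _ = self.hsi_from_lidar(query=hsi, key=lidar, value=lidar)
        l, _ = self.lidar_from_hsi(query=lidar, key=hsi, value=hsi)
        return hsi + h, lidar + l

bridge = CrossAttentionBridge()
fused_hsi, fused_lidar = bridge(torch.randn(2, 49, 64), torch.randn(2, 49, 64))
```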

20 pages, 4929 KB  
Article
Remote Sensing Image-Based Building Change Detection: A Case Study of the Qinling Mountains in China
by Lei Fu, Yunfeng Zhang, Keyun Zhao, Lulu Zhang, Ying Li, Changjing Shang and Qiang Shen
Remote Sens. 2025, 17(13), 2249; https://doi.org/10.3390/rs17132249 - 30 Jun 2025
Cited by 1 | Viewed by 811
Abstract
With the widespread application of deep learning in Earth observation, remote sensing image-based building change detection has achieved numerous groundbreaking advancements. However, differences across time periods caused by temporal variations in land cover, as well as the complex spatial structures in remote sensing scenes, significantly constrain the performance of change detection. To address these challenges, a change detection algorithm based on spatio-spectral information aggregation is proposed, which consists of two key modules: the Cross-Scale Heterogeneous Convolution module (CSHConv) and the Spatio-Spectral Information Fusion module (SSIF). CSHConv mitigates information loss caused by scale heterogeneity, thereby enhancing the effective utilization of multi-scale features. Meanwhile, SSIF models spatial and spectral information jointly, capturing interactions across different spatial scales and spectral domains. This investigation is illustrated with a case study on the real-world dataset QL-CD (Qinling change detection), acquired in the Qinling region of China. The work includes the construction of QL-CD, which comprises 12,724 pairs of images captured by the Gaofen-1 satellite. Experimental results demonstrate that the proposed approach outperforms a wide range of state-of-the-art algorithms.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
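
As a generic reference point for bi-temporal change detection (a stand-in, not the paper's CSHConv/SSIF modules), a shared-weight encoder with feature differencing can be sketched as:

```python
# Illustrative sketch of siamese change detection on bi-temporal images.
import torch
import torch.nn as nn

class SiameseChangeDetector(nn.Module):
    def __init__(self, in_ch: int = 3, feat: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(feat, 1, 1)  # per-pixel change logit

    def forward(self, t1: torch.Tensor, t2: torch.Tensor) -> torch.Tensor:
        f1, f2 = self.encoder(t1), self.encoder(t2)   # shared weights
        return self.head(torch.abs(f1 - f2))          # feature difference

model = SiameseChangeDetector()
change_map = model(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))
```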

20 pages, 21844 KB  
Article
DWTMA-Net: Discrete Wavelet Transform and Multi-Dimensional Attention Network for Remote Sensing Image Dehazing
by Xin Guan, Runxu He, Le Wang, Hao Zhou, Yun Liu and Hailing Xiong
Remote Sens. 2025, 17(12), 2033; https://doi.org/10.3390/rs17122033 - 12 Jun 2025
Cited by 1 | Viewed by 1851
Abstract
Haze caused by atmospheric scattering often leads to color distortion, reduced contrast, and diminished clarity, which significantly degrade the quality of remote sensing images. To address these issues, we propose a novel network called DWTMA-Net that integrates discrete wavelet transform with multi-dimensional attention, aiming to restore image information in both the frequency and spatial domains to enhance overall image quality. Specifically, we design a wavelet transform-based downsampling module that effectively fuses frequency and spatial features. The input first passes through a discrete wavelet block to extract frequency-domain information. These features are then fed into a multi-dimensional attention block, which incorporates pixel attention, Fourier frequency-domain attention, and channel attention. This combination allows the network to capture both global and local characteristics while enhancing deep feature representations through dimensional expansion, thereby improving spatial-domain feature extraction. Experimental results on the SateHaze1k, HRSD, and HazyDet datasets demonstrate the effectiveness of the proposed method in handling remote sensing images with varying haze levels and drone-view scenarios. By recovering both frequency and spatial details, our model achieves significant improvements in dehazing performance compared to existing state-of-the-art approaches.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
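
The wavelet-based downsampling idea can be illustrated with a hand-rolled one-level Haar DWT that splits a feature map into one low-frequency and three high-frequency sub-bands at half resolution; this is a sketch under that assumption, not the authors' module.

```python
# Minimal one-level 2D Haar DWT as a downsampling step.
import torch

def haar_dwt(x: torch.Tensor):
    """x: (B, C, H, W) with even H, W -> (LL, LH, HL, HH), each (B, C, H/2, W/2)."""
    a = x[..., 0::2, 0::2]  # top-left of each 2x2 block
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2   # low-frequency approximation
    lh = (a - b + c - d) / 2   # horizontal detail
    hl = (a + b - c - d) / 2   # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

ll, lh, hl, hh = haar_dwt(torch.randn(1, 16, 64, 64))
```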

35 pages, 7637 KB  
Article
Enhancing Real-Time Aerial Image Object Detection with High-Frequency Feature Learning and Context-Aware Fusion
by Xin Ge, Liping Qi, Qingsen Yan, Jinqiu Sun, Yu Zhu and Yanning Zhang
Remote Sens. 2025, 17(12), 1994; https://doi.org/10.3390/rs17121994 - 9 Jun 2025
Viewed by 2251
Abstract
Aerial image object detection faces significant challenges due to notable scale variations, numerous small objects, complex backgrounds, illumination variability, motion blur, and densely overlapping objects, placing stringent demands on both accuracy and real-time performance. Although Transformer-based real-time detection methods have achieved remarkable performance by effectively modeling global context, they typically emphasize non-local feature interactions while insufficiently utilizing high-frequency local details, which are crucial for detecting small objects in aerial images. To address these limitations, we propose a novel VMC-DETR framework designed to enhance the extraction and utilization of high-frequency texture features in aerial images. Specifically, our approach integrates three innovative modules: (1) the VHeat C2f module, which employs a frequency-domain heat conduction mechanism to fine-tune feature representations and significantly enhance high-frequency detail extraction; (2) the Multi-scale Feature Aggregation and Distribution Module (MFADM), which utilizes large convolution kernels of different sizes to robustly capture effective high-frequency features; and (3) the Context Attention Guided Fusion Module (CAGFM), which ensures precise and effective fusion of high-frequency contextual information across scales, substantially improving the detection accuracy of small objects. Extensive experiments and ablation studies on three public aerial image datasets validate that our proposed VMC-DETR framework effectively balances accuracy and computational efficiency, consistently outperforming state-of-the-art methods.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
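
A module that aggregates features with large kernels of several sizes, loosely in the spirit of the MFADM described above, can be sketched as follows; the kernel set, depthwise design, and fusion layer are assumptions.

```python
# Hedged sketch of multi-scale aggregation with large depthwise kernels.
import torch
import torch.nn as nn

class MultiScaleAggregation(nn.Module):
    def __init__(self, ch: int = 64, kernels=(3, 7, 11)):
        super().__init__()
        # Depthwise convs keep the large kernels computationally cheap.
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch) for k in kernels)
        self.fuse = nn.Conv2d(ch, ch, 1)  # pointwise fusion across branches

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(sum(b(x) for b in self.branches) + x)

agg = MultiScaleAggregation()
y = agg(torch.randn(1, 64, 32, 32))
```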

21 pages, 1426 KB  
Article
Adaptive Conditional Reasoning for Remote Sensing Visual Question Answering
by Yiqun Gao, Zongwen Bai, Meili Zhou, Bolin Jia, Peiqi Gao and Rui Zhu
Remote Sens. 2025, 17(8), 1338; https://doi.org/10.3390/rs17081338 - 9 Apr 2025
Viewed by 1121
Abstract
Remote Sensing Visual Question Answering (RS-VQA) is a research task that combines remote sensing image processing and natural language understanding. The increasing complexity and diversity of question types in RS-VQA pose significant challenges for unified multimodal reasoning within a single model architecture. Therefore, we propose the Adaptive Conditional Reasoning (ACR) network, a novel framework that dynamically tailors reasoning pathways to question semantics through type-aware feature fusion. The ACR module selectively applies different reasoning strategies depending on whether the question is open-ended or closed-ended, thereby tailoring the reasoning process to the specific nature of the question. In order to enhance the multimodal fusion process for different types of questions, the ACR model further integrates visual and textual features by leveraging type-guided cross-attention. Meanwhile, we use a Dual-Reconstruction Feature Enhancer that mitigates spatial and channel redundancy in remote sensing images via spatial and channel reconstruction convolution, enhancing discriminative feature extraction for key regions. Experimental results demonstrate that our method achieves 78.5% overall accuracy on the EarthVQA dataset, showcasing the effectiveness of adaptive reasoning in remote sensing applications.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
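
The core idea of type-conditioned reasoning, routing fused features to different answer heads depending on question type, can be sketched as below; the gating flag and head designs are stand-ins, not the paper's ACR module.

```python
# Illustrative sketch: route fused features to type-specific answer heads.
import torch
import torch.nn as nn

class ConditionalReasoner(nn.Module):
    def __init__(self, dim: int = 256, n_answers: int = 100):
        super().__init__()
        self.open_head = nn.Linear(dim, n_answers)  # free-form answers
        self.closed_head = nn.Linear(dim, 2)        # yes/no answers

    def forward(self, fused: torch.Tensor, is_open: torch.Tensor):
        # is_open: (B,) boolean flags derived from the question type.
        open_logits = self.open_head(fused)
        closed_logits = self.closed_head(fused)
        return [o if flag else c
                for o, c, flag in zip(open_logits, closed_logits, is_open)]

model = ConditionalReasoner()
out = model(torch.randn(2, 256), torch.tensor([True, False]))
```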

22 pages, 8743 KB  
Article
A Lightweight and Adaptive Image Inference Strategy for Earth Observation on LEO Satellites
by Bo Wang, Yuhang Fang, Dongyan Huang, Zelin Lu and Jiaqi Lv
Remote Sens. 2025, 17(7), 1175; https://doi.org/10.3390/rs17071175 - 26 Mar 2025
Viewed by 1126
Abstract
Low Earth Orbit (LEO) satellites equipped with image inference capabilities (LEO-IISat) offer significant potential for Earth Observation (EO) missions. However, the dual challenges of limited computational capacity and unbalanced energy supply present significant obstacles. This paper introduces the Accuracy-Energy Efficiency (AEE) index to quantify inference accuracy per unit of energy consumption and to evaluate the inference performance of LEO-IISat. It also proposes a lightweight and adaptive image inference strategy utilizing the Markov Decision Process (MDP) and Deep Q Network (DQN), which dynamically optimizes model selection to balance accuracy and energy efficiency under varying conditions. Simulations demonstrate a 31.3% improvement in inference performance compared to a fixed model strategy at the same energy consumption, achieving a maximum inference accuracy of 91.8% and an average inference accuracy of 89.1%. Compared to MDP-Policy Gradient and MDP-Q Learning strategies, the proposed strategy improves the AEE by 12.2% and 6.09%, respectively.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
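
An accuracy-per-energy index and budget-aware model selection can be illustrated with the toy sketch below; the AEE formula shown, the model table, and the greedy policy (a simple stand-in for the paper's MDP/DQN strategy) are all assumptions.

```python
# Toy sketch: accuracy-per-energy index and greedy on-board model selection.
def aee(accuracy: float, energy_joules: float) -> float:
    """Inference accuracy obtained per joule of energy consumed."""
    return accuracy / energy_joules

# Hypothetical candidate models: (name, accuracy, energy per image in J).
models = [("tiny", 0.83, 0.4), ("small", 0.89, 1.1), ("large", 0.92, 3.0)]

def pick_model(energy_budget: float):
    """Choose the most accurate model affordable under the current budget."""
    feasible = [m for m in models if m[2] <= energy_budget]
    return max(feasible, key=lambda m: m[1]) if feasible else None

print(pick_model(1.5))  # -> ('small', 0.89, 1.1)
print([f"{name}: AEE={aee(acc, e):.2f}" for name, acc, e in models])
```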

20 pages, 707 KB  
Article
Remote Sensing Cross-Modal Text-Image Retrieval Based on Attention Correction and Filtering
by Xiaoyu Yang, Chao Li, Zhiming Wang, Hao Xie, Junyi Mao and Guangqiang Yin
Remote Sens. 2025, 17(3), 503; https://doi.org/10.3390/rs17030503 - 31 Jan 2025
Cited by 3 | Viewed by 2692
Abstract
Remote sensing cross-modal text-image retrieval constitutes a pivotal component of multi-modal retrieval in remote sensing, central to which is the process of learning integrated visual and textual representations. Prior research predominantly emphasized the overarching characteristics of remote sensing images or employed attention mechanisms for meticulous alignment. However, these investigations, to some degree, overlooked the intricacies inherent in the textual descriptions accompanying remote sensing images. In this paper, we introduce a novel cross-modal retrieval model, specifically tailored for remote sensing image-text, leveraging attention correction and filtering mechanisms. The proposed model is architected around four primary components: an image feature extraction module, a text feature extraction module, an attention correction module, and an attention filtering module. Within the image feature extraction module, the Visual Graph Neural Network (VIG) serves as the principal encoder, augmented by a multi-tiered node feature fusion mechanism. This ensures a comprehensive understanding of remote sensing images. For text feature extraction, both the Bidirectional Gated Recurrent Unit (BGRU) and the Graph Attention Network (GAT) are employed as encoders, furnishing the model with an enriched understanding of the associated text. The attention correction segment minimizes potential misalignments in image-text pairings, specifically by modulating attention weightings in cases where there is a unique correlation between visual area attributes and textual descriptors. Concurrently, the attention filtering segment diminishes the influence of extraneous visual sectors and terms in the image-text matching process, thereby enhancing the precision of cross-modal retrieval. Extensive experimentation carried out on both the RSICD and RSITMD datasets yielded commendable results, attesting to the superior efficacy of the proposed methodology in the domain of remote sensing cross-modal text-image retrieval.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
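
Attention filtering, suppressing weakly attended regions or terms before matching, can be sketched as a quantile threshold on alignment weights; the threshold rule below is an assumption, not the paper's exact criterion.

```python
# Hedged sketch: zero out weak attention weights, then renormalize.
import torch

def filter_attention(attn: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """attn: (B, n_regions) alignment weights between a caption and image regions.
    Keeps roughly the top keep_ratio fraction of weights per row."""
    thresh = torch.quantile(attn, 1.0 - keep_ratio, dim=-1, keepdim=True)
    filtered = torch.where(attn >= thresh, attn, torch.zeros_like(attn))
    return filtered / filtered.sum(dim=-1, keepdim=True).clamp_min(1e-8)

w = filter_attention(torch.rand(2, 36))
```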

20 pages, 2388 KB  
Article
The Spectrum Difference Enhanced Network for Hyperspectral Anomaly Detection
by Shaohua Liu, Huibo Guo, Shiwen Gao and Wuxia Zhang
Remote Sens. 2024, 16(23), 4518; https://doi.org/10.3390/rs16234518 - 2 Dec 2024
Cited by 1 | Viewed by 1789
Abstract
Most deep learning-based hyperspectral anomaly detection (HAD) methods focus on modeling or reconstructing the hyperspectral background to obtain residual maps from the original hyperspectral images. However, these methods typically do not pay enough attention to the spectral similarity in complex environments, resulting in inadequate distinction between background and anomalies. Moreover, some anomalies and background regions are different objects but are sometimes recognized as objects with the same spectrum. To address the issues mentioned above, this paper proposes a Spectrum Difference Enhanced Network (SDENet) for HAD, which employs variational mapping and Transformer to amplify spectrum differences. The proposed network is based on the encoder–decoder structure, which contains a CSWin-Transformer encoder, Variational Mapping Module (VMModule), and CSWin-Transformer decoder. First, the CSWin-Transformer encoder and decoder are designed to supplement image information by extracting deep and semantic features, where a cross-shaped window self-attention mechanism is designed to provide strong modeling capability with minimal computational cost. Second, in order to enhance the spectral difference characteristics between anomalies and background, a random-sampling VMModule is presented for feature space transformation. Finally, all fully connected mapping operations are replaced with convolutional layers to reduce the model parameters and computational load. The effectiveness of the proposed SDENet is verified on three datasets, and experimental results show that it achieves better detection accuracy and lower model complexity compared with existing methods.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
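
A variational mapping step of the sort described, mapping features to a Gaussian and drawing random samples via the reparameterization trick, can be sketched as follows; layer names and dimensions are assumptions.

```python
# Hedged sketch of a variational mapping with reparameterized sampling.
import torch
import torch.nn as nn

class VariationalMapping(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.mu = nn.Linear(dim, dim)       # mean of the latent Gaussian
        self.logvar = nn.Linear(dim, dim)   # log-variance of the latent Gaussian

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.mu(x), self.logvar(x)
        eps = torch.randn_like(mu)                 # random sampling
        return mu + eps * torch.exp(0.5 * logvar)  # reparameterized sample

z = VariationalMapping()(torch.randn(4, 128))
```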

23 pages, 6153 KB  
Article
An Enhanced Shuffle Attention with Context Decoupling Head with Wise IoU Loss for SAR Ship Detection
by Yunshan Tang, Yue Zhang, Jiarong Xiao, Yue Cao and Zhongjun Yu
Remote Sens. 2024, 16(22), 4128; https://doi.org/10.3390/rs16224128 - 5 Nov 2024
Cited by 5 | Viewed by 3367
Abstract
Synthetic Aperture Radar (SAR) imagery is widely utilized in military and civilian applications. Recent deep learning advancements have led to improved ship detection algorithms, enhancing accuracy and speed over traditional Constant False-Alarm Rate (CFAR) methods. However, challenges remain with complex backgrounds and multi-scale ship targets amidst significant interference. This paper introduces a novel method that features a context-based decoupled head, leveraging positioning and semantic information, and incorporates shuffle attention to enhance feature map interpretation. Additionally, we propose a new loss function with a dynamic non-monotonic focus mechanism to tackle these issues. Experimental results on the HRSID and SAR-Ship-Dataset demonstrate that our approach significantly improves detection performance over the original YOLOv5 algorithm and other existing methods.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
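
For context, a plain IoU-based box regression loss is sketched below; the paper's Wise-IoU adds a dynamic non-monotonic focusing weight on top of a term like this, and the corner box format is an assumption.

```python
# Minimal sketch of an IoU-based bounding box regression loss.
import torch

def iou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2); returns mean 1 - IoU."""
    lt = torch.max(pred[:, :2], target[:, :2])   # intersection top-left
    rb = torch.min(pred[:, 2:], target[:, 2:])   # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter).clamp(min=1e-8)
    return (1.0 - iou).mean()

loss = iou_loss(torch.tensor([[0., 0., 10., 10.]]), torch.tensor([[1., 1., 11., 11.]]))
```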
