Search Results (2,110)

Search Parameters:
Keywords = remote art

23 pages, 18509 KB  
Article
MSRNet: Mamba-Based Self-Refinement Framework for Remote Sensing Change Detection
by Haoxuan Sun, Xiaogang Yang, Ruitao Lu, Jing Zhang, Bo Li and Tao Zhang
Remote Sens. 2026, 18(7), 1042; https://doi.org/10.3390/rs18071042 - 30 Mar 2026
Abstract
Accurate change detection (CD) in very high-resolution (VHR, <1 m) optical remote sensing images remains challenging, as it requires effective modeling of long-range bi-temporal dependencies and robustness against label noise in complex urban environments. Existing deep learning-based CD methods either rely on convolutional operations with limited receptive fields or employ global attention mechanisms with high computational cost, making it difficult to simultaneously achieve efficient global context modeling and fine-grained structural sensitivity. To address these challenges, we propose a Mamba-based self-refinement framework for remote sensing change detection (MSRNet). Specifically, we introduce an attention-enhanced oblique state space module (AOSS) to model spatio-temporal dependencies with linear complexity while preserving fine-grained structural information. The four-branch attention fusion module (FBAM) further enhances cross-dimensional feature interaction to improve the discriminative capability of differential representations. In addition, a self-refinement module (SRM) incorporates a momentum encoder to generate high-quality pseudo-labels, mitigating annotation noise and enabling learning from latent changes. Extensive experiments on two benchmark VHR datasets, LEVIR-CD and WHU-CD, demonstrate that MSRNet achieves state-of-the-art performance in both accuracy and computational efficiency. Full article
(This article belongs to the Section AI Remote Sensing)
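The self-refinement module above rests on a momentum encoder. As a minimal sketch of that one ingredient, assuming the standard exponential-moving-average (EMA) teacher update used by momentum encoders in general (the momentum value and the pseudo-label pipeline are illustrative, not taken from the paper):

```python
import copy
import torch

@torch.no_grad()
def ema_update(student: torch.nn.Module, teacher: torch.nn.Module, m: float = 0.999) -> None:
    # Teacher weights drift slowly toward the student: theta_t <- m*theta_t + (1-m)*theta_s
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(m).add_(p_s, alpha=1.0 - m)

# Usage: the momentum (teacher) encoder starts as a frozen copy of the online encoder;
# its slowly varying outputs are what make the generated pseudo-labels stable.
student = torch.nn.Conv2d(3, 8, kernel_size=3)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
ema_update(student, teacher)  # call once per optimization step
```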
34 pages, 20615 KB  
Article
Unsupervised Change Detection in Heterogeneous Remote Sensing Images via Dynamic Mask Guidance
by Paixin Xie, Gao Chen, Qingfeng Zhou, Xiaoyan Li and Jingwen Yan
Remote Sens. 2026, 18(7), 1022; https://doi.org/10.3390/rs18071022 - 29 Mar 2026
Abstract
Unsupervised change detection (CD) in heterogeneous remote sensing images is intrinsically difficult due to severe sensor-specific discrepancies. In the absence of ground truth, these discrepancies result in ambiguous optimization objectives that make it difficult for models to distinguish true land-cover changes from modality-driven pseudo-changes. To address these challenges, we propose MaskUCD, a novel unsupervised framework that reformulates heterogeneous CD as a dynamic mask-driven constraint scheduling problem. Fundamentally distinct from conventional strategies that enforce selective feature alignment, MaskUCD employs a spatially adaptive optimization mechanism. Specifically, the iteratively refined mask serves as a geometric reference to guide optimization. It enforces strict feature alignment in mask-unchanged regions to suppress modality-induced discrepancies, while simultaneously promoting feature divergence in mask-changed regions to emphasize semantic inconsistencies. In this way, explicit optimization objectives are established, together with an intrinsic interpretability constraint that guides the CD process. This strategy treats the mask as a structural guide for representation learning rather than a ground-truth reference, thereby avoiding error accumulation caused by directly using inaccurate masks as supervisory signals. To facilitate this optimization, we design a specialized asymmetric autoencoder with a hybrid encoder architecture, utilizing multi-scale frequency analysis and global context modeling to enhance feature representation capabilities. Consequently, this design enables the generation of refined and semantically consistent masks, which provide increasingly precise structural guidance, yielding converged and discriminative difference maps. Extensive experiments demonstrate that MaskUCD achieves state-of-the-art performance and superior robustness compared to existing advanced methods. Full article
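One way to read the mask-driven constraint scheduling is as a spatially masked contrastive objective: align features where the current mask says "unchanged", push them apart where it says "changed". A hypothetical sketch (the function name, hinge margin, and equal weighting are assumptions, not the paper's loss):

```python
import torch
import torch.nn.functional as F

def mask_guided_loss(f1: torch.Tensor, f2: torch.Tensor,
                     mask: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """f1, f2: (B, C, H, W) features from the two modalities.
    mask: (B, 1, H, W) iteratively refined change mask in {0, 1}, 1 = changed."""
    d = torch.norm(f1 - f2, dim=1, keepdim=True)            # per-pixel feature distance
    # strict alignment in mask-unchanged regions suppresses modality discrepancies
    align = ((1 - mask) * d).sum() / ((1 - mask).sum() + 1e-6)
    # hinge-style divergence in mask-changed regions emphasizes semantic inconsistency
    diverge = (mask * F.relu(margin - d)).sum() / (mask.sum() + 1e-6)
    return align + diverge
```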
31 pages, 11688 KB  
Article
RShDet: An Adaptive Spectral-Aware Network for Remote Sensing Object Detection Under Haze Corruption
by Wei Zhang, Yuantao Wang, Haowei Yang and Xuerui Mao
Remote Sens. 2026, 18(7), 1020; https://doi.org/10.3390/rs18071020 - 29 Mar 2026
Abstract
Remote sensing (RS) object detection faces intrinsic challenges arising from the overhead imaging paradigm and the diversity of climatic conditions. In particular, atmospheric phenomena such as clouds and haze cause severe visual degradation, making reliable object detection difficult. However, most existing detectors are developed under clear-weather conditions, which limits their generalization capability in realistic haze-degraded RS scenarios. To alleviate this issue, an adaptive spectral-aware network for RS object detection under haze interference is proposed, termed RShDet, which is designed to handle both high-altitude RS imagery and low-altitude Unmanned Aerial Vehicle (UAV) scenarios. Firstly, the Object-Centered Dynamic Enhancement (OCDE) module dynamically adjusts the spatial positions of key-value pairs through query-agnostic offsets, enabling the network to emphasize object-relevant regions while suppressing haze-induced background interference. Secondly, the Dynamic Multi-Spectral Perception and Filtering (DSPF) module introduces a multi-spectral attention mechanism that adaptively selects informative frequency components, thereby enhancing discriminative feature representations in hazy environments. Thirdly, the Frequency-Domain Multi-Feature Fusion (FDMF) module employs learnable weights to complementarily integrate amplitude and phase information in the frequency domain, enabling effective cross-task feature interaction between the enhancement and detection branches. Extensive experiments demonstrate that RShDet consistently achieves superior detection performance under hazy conditions across both synthetic and real-world benchmarks. Specifically, it achieves improvements of 2.4% mAP50 on Hazy-DOTA, 1.9% mAP on HazyDet, and 2.33% mAP on the real-world foggy dataset RTTS, surpassing existing state-of-the-art methods. Full article
(This article belongs to the Special Issue Advances in Remote Sensing Image Target Detection and Recognition)
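The FDMF idea of weighting amplitude and phase separately in the frequency domain can be illustrated with torch.fft. The sketch below mixes two branches with scalar learnable weights; the paper's parameterization is presumably richer, and the naive linear phase mix here ignores angle wrap-around:

```python
import torch
import torch.nn as nn

class FreqFusion(nn.Module):
    """Illustrative amplitude/phase fusion of an enhancement-branch and a
    detection-branch feature map (module name and weights are assumptions)."""
    def __init__(self):
        super().__init__()
        self.wa = nn.Parameter(torch.tensor(0.5))  # amplitude mixing weight
        self.wp = nn.Parameter(torch.tensor(0.5))  # phase mixing weight

    def forward(self, x_enh: torch.Tensor, x_det: torch.Tensor) -> torch.Tensor:
        Fe, Fd = torch.fft.fft2(x_enh), torch.fft.fft2(x_det)
        amp = self.wa * Fe.abs() + (1 - self.wa) * Fd.abs()
        pha = self.wp * Fe.angle() + (1 - self.wp) * Fd.angle()
        fused = torch.polar(amp, pha)              # amp * exp(i * pha)
        return torch.fft.ifft2(fused).real
```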
42 pages, 6313 KB  
Article
When Lie Groups Meet Hyperspectral Images: Equivariant Manifold Network for Few-Shot HSI Classification
by Haolong Ban, Junchao Feng, Zejin Liu, Yue Jiang, Zhenxing Wang, Jialiang Liu, Yaowen Hu and Yuanshan Lin
Sensors 2026, 26(7), 2117; https://doi.org/10.3390/s26072117 - 29 Mar 2026
Abstract
Hyperspectral imagery (HSI) offers rich spectral signatures and fine-grained spatial structures for remote sensing, but practical HSI classification is often constrained by scarce labels and complex geometric disturbances, including translation, rotation, scaling, and shear. Existing deep models are typically developed under Euclidean assumptions and rely on data-hungry training pipelines, which makes them brittle in the few-shot regime. To address this challenge, we propose EMNet, a Lie-group-based Equivariant Manifold Network for few-shot HSI classification that explicitly encodes geometric invariance and improves discriminative accuracy. EMNet couples an SE(2)-based Equivariance-Guided Module (EGM) to enforce equivariance to translations and rotations with an affine Lie-group-based Characteristic Filtering Convolution (CFC) that models scaling and shearing on the feature manifold while adaptively suppressing redundant responses. Extensive experiments on WHU-Hi-HongHu, Houston2013, and Indian Pines demonstrate state-of-the-art performance with competitive complexity, achieving OAs of 95.77% (50 samples/class), 97.37% (50 samples/class), and 96.09% (5% labeled samples), respectively, and yielding up to +3.34% OA, +6.01% AA, and +4.14% Kappa over the strong DGPF-RENet baseline. Under a stricter 25-samples-per-class protocol with 10 repeated random hold-out splits, EMNet consistently improves the mean accuracy while exhibiting lower variance, indicating better stability to sampling uncertainty. On the city-scale Xiongan New Area dataset with extreme long-tail imbalance (1580 × 3750 pixels, 256 bands, and 5.925 M labeled pixels), EMNet further boosts OA from 85.89% to 93.77% under the 1% labeled-sample protocol, highlighting robust generalization for large-area mapping. Beyond point estimates, we report mean ± SD/SE across repeated splits and provide rigorous statistical validation by computing Yule’s Q statistic for class-wise behavior similarity, performing the Friedman test with Nemenyi post hoc comparisons for multi-method ranking significance, and presenting 95% confidence intervals together with Cohen’s d effect sizes to quantify practical improvement. Full article
(This article belongs to the Special Issue Hyperspectral Sensing: Imaging and Applications)
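The statistical validation the abstract describes (Friedman test across repeated splits, Cohen's d effect sizes) is reproducible with standard scientific Python. A minimal sketch on synthetic accuracies, with every number fabricated purely for illustration:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# accuracies of three hypothetical methods over 10 repeated random splits
rng = np.random.default_rng(0)
acc_a = rng.normal(0.94, 0.01, 10)
acc_b = rng.normal(0.95, 0.01, 10)
acc_c = rng.normal(0.96, 0.01, 10)

# Friedman test: are the methods' rankings across splits significantly different?
stat, p = friedmanchisquare(acc_a, acc_b, acc_c)
print(f"Friedman chi2 = {stat:.3f}, p = {p:.4f}")

def cohens_d(x: np.ndarray, y: np.ndarray) -> float:
    """Pooled-SD effect size between two sets of run accuracies."""
    nx, ny = len(x), len(y)
    pooled = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled

print(f"Cohen's d (c vs a) = {cohens_d(acc_c, acc_a):.2f}")
```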
35 pages, 51980 KB  
Article
Structurally Consistent and Grounding-Aware Stagewise Reasoning for Referring Remote Sensing Image Segmentation
by Shan Dong, Jianlin Xie, Liang Chen, He Chen, Baogui Qi and Yunqiu Ge
Remote Sens. 2026, 18(7), 1015; https://doi.org/10.3390/rs18071015 - 28 Mar 2026
Abstract
Referring Remote Sensing Image Segmentation (RRSIS) is a representative multimodal understanding task for remote sensing, which segments designated targets from remote sensing images according to free-form natural language descriptions. However, complex remote sensing characteristics, such as cluttered backgrounds, large-scale variations, small scattered targets and repetitive textures, lead to unstable visual grounding and further spatial grounding drift, resulting in inaccurate segmentation results. Existing approaches typically perform implicit visual–linguistic fusion across encoding and decoding stages, entangling spatial grounding with mask refinement. This tightly coupled formulation lacks explicit structural constraints and is prone to cross-modal ambiguity, especially in complex remote sensing layouts. To address these limitations, we propose a Structurally consistent and Grounding-aware Stagewise Reasoning Framework (SGSRF) that follows a grounding-first, segmentation-second paradigm. The framework decomposes inference into three cascaded stages with progressively imposed structural constraints. First, Cross-modal Consistency Refinement (CCR) lays the foundation for stable spatial grounding by enhancing visual–textual structural alignment via CLIP-based features and Structural Consistency Regularization (SCR), producing well-aligned multimodal representations and reliable grounding cues. Second, Grounding-aware Prompt Generation (GPG) bridges grounding and segmentation by converting aligned representations into complementary sparse and dense prompts, which serve as explicit grounding guidance for the segmentation model. Third, Grounding Modulated Segmentation (GMS) leverages the Segment Anything Model (SAM) to generate fine-grained mask predictions under the joint guidance of prompts and grounding cues, improving spatial grounding stability and robustness to background interference and scale variation. Extensive experiments on three remote sensing benchmarks, namely RefSegRS, RRSIS-D, and RISBench, demonstrate that SGSRF achieves state-of-the-art performance. The proposed stagewise paradigm integrates structural alignment, explicit grounding, and prompt-driven segmentation into a unified framework, providing a practical and robust solution for RRSIS in real-world Earth observation applications. Full article
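As a rough picture of how a grounding map can yield complementary sparse and dense prompts: GPG itself is learned, so this numpy sketch with a hypothetical heatmap_to_prompts helper only conveys the sparse/dense split, not the paper's mechanism:

```python
import numpy as np

def heatmap_to_prompts(heatmap: np.ndarray, thresh: float = 0.5):
    """heatmap: (H, W) grounding confidence in [0, 1]. Returns a sparse point
    prompt (peak location), a sparse box prompt, and a dense soft-mask prompt."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    point = (int(x), int(y))                      # sparse prompt: strongest location
    ys, xs = np.where(heatmap > thresh)
    box = None
    if len(xs) > 0:                               # sparse prompt: bounding box of confident region
        box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    dense = heatmap.astype(np.float32)            # dense prompt: the soft mask itself
    return point, box, dense
```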
18 pages, 11374 KB  
Article
CSGL-Former: Cross-Stripes Global–Local Fusion Transformer for Remote Sensing Image Dehazing
by Shuyi Feng, Xiran Zhang, Jie Yuan and Youwen Zhu
Sensors 2026, 26(7), 2102; https://doi.org/10.3390/s26072102 - 28 Mar 2026
Abstract
Remote sensing (RS) images are often degraded by atmospheric haze, which compromises both visual interpretation and downstream applications. To address this, we introduce CSGL-Former, a novel Cross-Stripes Global–Local Fusion Transformer for RS image dehazing. Our model efficiently captures anisotropic long-range dependencies using cross-stripes attention (CSA) and aggregates hierarchical global semantics via a Multi-Layer Global Aggregation (MLGA) module. In the decoder, global context is adaptively blended with fine-grained local features to restore intricate textures. Finally, inspired by the atmospheric scattering model, a soft reconstruction head restores the clear image by predicting spatially varying affine parameters, strictly preserving content fidelity while effectively removing haze. Trained end-to-end, CSGL-Former demonstrates a compelling balance of accuracy and efficiency. Extensive experiments on the RRSHID and SateHaze1K benchmarks show that our model achieves state-of-the-art or highly competitive performance against representative baselines. Ablation studies further validate the effectiveness of each proposed component. Full article
(This article belongs to the Special Issue Advanced Pattern Recognition: Intelligent Sensing and Imaging)
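The soft reconstruction head is grounded in the atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)), which rearranges to a per-pixel affine map J = K * I + B with K = 1/t and B = -A(1 - t)/t. A sketch of such a head, assuming layer sizes and an identity-biased initialization that are not from the paper:

```python
import torch
import torch.nn as nn

class SoftReconstructionHead(nn.Module):
    """Predicts spatially varying affine parameters (K, B) from decoder
    features and restores J = K * hazy + B."""
    def __init__(self, feat_ch: int):
        super().__init__()
        self.to_k = nn.Conv2d(feat_ch, 3, kernel_size=3, padding=1)
        self.to_b = nn.Conv2d(feat_ch, 3, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor, hazy: torch.Tensor) -> torch.Tensor:
        k = 1.0 + self.to_k(feats)   # K ~ 1/t, initialized near the identity
        b = self.to_b(feats)         # B absorbs the airlight term
        return k * hazy + b
```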
21 pages, 922 KB  
Article
DBCF-Net: A Dual-Branch Cross-Scale Fusion Network for Heterogeneous Satellite–UAV Change Detection
by Yan Ren, Ruiyong Li, Pengbo Zhai and Xinyu Chen
Remote Sens. 2026, 18(7), 1009; https://doi.org/10.3390/rs18071009 - 27 Mar 2026
Abstract
Heterogeneous change detection (HCD) using satellite and Unmanned Aerial Vehicle (UAV) imagery is a pivotal task in remote sensing and Earth observation. However, the effective utilization of such multi-source data is significantly hindered by extreme spatial resolution disparities and distinct radiometric characteristics. Existing deep learning methods, often based on weight-sharing Siamese architectures, struggle to bridge these domain gaps, leading to spectral pseudo-changes and blurred detection boundaries. To address these challenges, we propose a novel Dual-Branch Cross-Scale Fusion Network (DBCF-Net) specifically tailored for heterogeneous satellite–UAV change detection. We introduce a Difference-Aware Attention Module (DAAM) to explicitly align cross-modal feature spaces and suppress domain-related noise through a hybrid local–global attention mechanism. Furthermore, an Adaptive Gated Fusion Module (AGFM) is designed to dynamically weight multi-scale interactions, ensuring the preservation of high-frequency spatial details from UAV imagery while maintaining the semantic consistency of satellite data. Extensive experiments on the Heterogeneous Satellite–UAV Dataset (HSUD) demonstrate that DBCF-Net achieves state-of-the-art performance, reaching an F1-score of 88.75% and an IoU of 80.58%. This study provides a robust technical framework for heterogeneous sensor fusion and high-precision monitoring in complex remote sensing scenarios. Full article
(This article belongs to the Section Remote Sensing Image Processing)
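The core of adaptive gated fusion is a learned per-pixel weight that decides how much of each stream to keep. A generic sketch in the spirit of the AGFM (the class name, single-conv gate, and channel sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Blends a high-detail UAV stream with a semantically stable satellite
    stream via a sigmoid gate computed from both."""
    def __init__(self, ch: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, kernel_size=1), nn.Sigmoid())

    def forward(self, uav_feat: torch.Tensor, sat_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([uav_feat, sat_feat], dim=1))  # per-pixel, per-channel weight
        return g * uav_feat + (1 - g) * sat_feat
```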
21 pages, 11455 KB  
Article
Cross-Scale Spectral Calibration for Spatiotemporal Fusion of Remote Sensing Images
by Yishuo Tian, Xiaorong Xue, Jingtong Yang, Wen Zhang, Bingyan Lu, Xin Zhao and Wancheng Wang
Sensors 2026, 26(7), 2090; https://doi.org/10.3390/s26072090 - 27 Mar 2026
Abstract
Spatiotemporal fusion aims to generate remote sensing images with both high spatial and high temporal resolution by integrating multi-source observations. However, significant spectral inconsistencies often arise when fusing images acquired at different spatial scales, which severely degrade the radiometric fidelity and temporal reliability of the fused results. Most existing methods focus on enhancing spatial details or temporal consistency, while the cross-scale spectral discrepancy between coarse- and fine-resolution images has not been sufficiently addressed. To tackle this issue, we propose a cross-scale spectral calibration framework for spatiotemporal fusion (XSC-Net), which explicitly models and corrects spectral responses across different spatial scales. The proposed method introduces a spatial feature refinement block to enhance spatially discriminative structures and a hierarchical spectral refinement block to adaptively calibrate channel-wise spectral representations. By jointly exploiting spatial and spectral correlations, the proposed framework effectively suppresses spectral distortion while preserving fine spatial details. Extensive experiments on the public CIA and LGC datasets indicate that XSC-Net compares favorably with state-of-the-art methods, demonstrating superior performance over established baselines. Furthermore, ablation studies verify the efficacy and contribution of the proposed architectural components. Full article
(This article belongs to the Section Remote Sensors)
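Channel-wise spectral calibration belongs to the same family as squeeze-and-excitation recalibration: compute a global statistic per band, then rescale each channel. A generic stand-in, not the paper's hierarchical spectral refinement block:

```python
import torch
import torch.nn as nn

class SpectralCalibration(nn.Module):
    """SE-style channel recalibration as a minimal illustration of
    adaptively calibrating channel-wise spectral responses."""
    def __init__(self, ch: int, r: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                          # global statistic per band
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(x)                                 # rescale each channel
```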
25 pages, 42196 KB  
Article
Frequency–Spatial Domain Jointly Guided Perceptual Network for Infrared Small Target Detection
by Yeteng Han, Minrui Ye, Bohan Liu, Jie Li, Chaoxian Jia, Wennan Cui and Tao Zhang
Remote Sens. 2026, 18(7), 1000; https://doi.org/10.3390/rs18071000 - 26 Mar 2026
Abstract
Infrared small target detection is a critical task in remote sensing. However, it remains highly challenging due to low contrast, heavy background clutter, and large variations in target scale. Traditional convolutional networks are inadequate for joint modeling, as they cannot effectively capture both fine structural details and global contextual dependencies. To address these issues, we propose FSGPNet, a frequency–spatial domain jointly guided perceptual network that explicitly exploits complementary representations in both the frequency and spatial domains. Specifically, a Frequency–Spatial Enhancement Module (FSEM) is introduced to strengthen target details while suppressing background interference through high-frequency enhancement and Perona–Malik diffusion. To enhance global context modeling, we propose a Multi-Scale Global Perception (MSGP) module that integrates non-local attention with multi-scale dilated convolutions, enabling robust background modeling. Furthermore, a Gabor Transformer Attention Module (GTAM) is designed to achieve selective frequency–spatial feature aggregation via self-attention over multi-directional and multi-scale Gabor responses, effectively highlighting discriminative structures of various small targets. Extensive experiments are conducted on two benchmark datasets (IRSTD-1K and NUDT-SIRST) that cover typical remote sensing infrared scenarios. Quantitative and qualitative results demonstrate that FSGPNet consistently outperforms state-of-the-art methods across multiple evaluation metrics. These findings validate the effectiveness and robustness of the proposed FSGPNet for detecting small infrared targets in remote sensing applications. Full article
(This article belongs to the Special Issue Deep Learning-Based Small-Target Detection in Remote Sensing)
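Perona–Malik diffusion, which the FSEM uses for background suppression, is a classic edge-preserving smoother. A textbook implementation with common default parameters (the paper's settings are not stated here):

```python
import numpy as np

def perona_malik(img: np.ndarray, n_iter: int = 20,
                 kappa: float = 30.0, lam: float = 0.2) -> np.ndarray:
    """Anisotropic diffusion: smooths clutter while preserving edges.
    lam <= 0.25 keeps the explicit scheme stable."""
    u = img.astype(np.float64).copy()
    for _ in range(n_iter):
        p = np.pad(u, 1, mode='edge')      # zero-flux boundary via edge padding
        dn = p[:-2, 1:-1] - u              # north, south, east, west differences
        ds = p[2:, 1:-1] - u
        de = p[1:-1, 2:] - u
        dw = p[1:-1, :-2] - u
        # edge-stopping conductance g(d) = exp(-(d/kappa)^2) per direction
        u += lam * sum(np.exp(-(d / kappa) ** 2) * d for d in (dn, ds, de, dw))
    return u
```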
33 pages, 172200 KB  
Article
HDCGAN+: A Low-Illumination UAV Remote Sensing Image Enhancement and Evaluation Method Based on WPID
by Kelly Chen Ke, Min Sun, Xinyi Wang, Dong Liu and Hanjun Yang
Remote Sens. 2026, 18(7), 999; https://doi.org/10.3390/rs18070999 - 26 Mar 2026
Abstract
Remote sensing images acquired by UAVs under nighttime or low-illumination conditions suffer from insufficient illumination, leading to degraded image quality, detail loss, and noise, which restrict their application in public security and disaster emergency scenarios. Although existing machine learning-based enhancement methods can recover part of the missing information, they often cause color distortion and texture inconsistency. This study proposes an improved low-illumination image enhancement method based on a Weakly Paired Image Dataset (WPID), combining the Hierarchical Deep Convolutional Generative Adversarial Network (HDCGAN) with a low-rank image fusion strategy to enhance the quality of low-illumination UAV remote sensing images. First, YCbCr color channel separation is applied to preserve color information from visible images. Then, a Low-Rank Representation Fusion Network (LRRNet) is employed to perform structure-aware fusion between thermal infrared (TIR) and visible images, thereby enabling effective preservation of structural details and realistic color appearance. Furthermore, a weakly paired training mechanism is incorporated into HDCGAN to enhance detail restoration and structural fidelity. To achieve objective evaluation, a structural consistency assessment framework is constructed based on semantic segmentation results from the Segment Anything Model (SAM). Experimental results demonstrate that the proposed method outperforms state-of-the-art approaches in both visual quality and application-oriented evaluation metrics. Full article
(This article belongs to the Section Remote Sensing Image Processing)
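The YCbCr separation step is a standard color-space conversion: the Y channel carries luminance (what enhancement operates on) while Cb/Cr carry the color to be preserved. The BT.601 full-range formulas, for reference:

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """BT.601 RGB -> YCbCr. rgb: float array in [0, 1], shape (H, W, 3).
    Y is luminance; Cb/Cr are the chroma channels kept for color fidelity."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 0.5 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)
```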
25 pages, 3612 KB  
Article
CrtNet: A Cross-Model Residual Transformer Network for Structure-Guided Remote Sensing Scene Classification
by Chaoran Chen, Tianyuan Zhu, Tao Cui, Dalin Li, Adriano Tavares, Yanchun Liang and Yanheng Liu
Electronics 2026, 15(7), 1366; https://doi.org/10.3390/electronics15071366 - 25 Mar 2026
Abstract
Accurate remote sensing scene classification is essential for large-scale Earth observation but remains challenging due to significant inter-class similarity and complex spatial layouts in medium- and low-resolution imagery. Conventional convolutional neural networks (CNNs) effectively capture local structural patterns but struggle to model long-range semantic dependencies, whereas Vision Transformers excel at global context modeling yet often show reduced sensitivity to fine-grained spatial structures. To address these limitations, we propose CrtNet, a structure-aware Cross-Model Residual Transformer Network that establishes a dual-stream collaborative architecture integrating convolutional structural representations with Transformer-based semantic modeling through gated residual cross-model interactions. In this framework, a convolutional branch first extracts stable local structural features with strong spatial inductive biases. These features are continuously injected into the Transformer encoding process via residual cross-model connections, enabling persistent structural guidance during global attention modeling. In addition, a sample-adaptive dynamic gating mechanism is introduced to flexibly balance structural and semantic features during prediction. Extensive experiments conducted on two public remote sensing benchmarks, EuroSAT and UCM, demonstrate that CrtNet consistently outperforms representative CNN-based, Transformer-based, and hybrid state-of-the-art models, particularly in visually ambiguous scene categories. Full article
(This article belongs to the Special Issue Computer Vision and Machine Learning: Real-World Applications)
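The gated residual cross-model interaction can be pictured as a residual injection of CNN features into the Transformer token stream, scaled by a sample-adaptive gate. A hypothetical sketch (module name, pooling choice, and dimensions are assumptions):

```python
import torch
import torch.nn as nn

class GatedResidualInjection(nn.Module):
    """Adds projected CNN structural tokens to Transformer tokens with a
    per-sample sigmoid gate, so structural guidance strength adapts."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, tokens: torch.Tensor, cnn_tokens: torch.Tensor) -> torch.Tensor:
        # tokens, cnn_tokens: (B, N, dim); gate from the sample's mean CNN token
        g = self.gate(cnn_tokens.mean(dim=1, keepdim=True))   # (B, 1, 1)
        return tokens + g * self.proj(cnn_tokens)
```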
27 pages, 8177 KB  
Article
DINOv3-PEFT: A Dual-Branch Collaborative Network with Parameter-Efficient Fine-Tuning for Precise Road Segmentation in SAR Imagery
by Debao Chen, Wanlin Yang, Ye Yuan and Juntao Gu
Remote Sens. 2026, 18(7), 973; https://doi.org/10.3390/rs18070973 - 24 Mar 2026
Abstract
Extracting road networks from Synthetic Aperture Radar (SAR) data represents a core challenge in remote sensing scene analysis, particularly for applications in traffic monitoring and emergency management. The task is complicated by several inherent limitations: speckle noise degrades image quality, geometric distortions arise from the side-looking acquisition geometry, and roads often exhibit weak radiometric separation from surrounding terrain. Traditional processing pipelines and recent single-branch deep learning frameworks have shown insufficient performance when global contextual reasoning and fine-scale spatial detail must both be addressed. This work presents DINOv3-PEFT, a parameter-efficient dual-encoder network designed specifically for SAR road segmentation. The architecture employs two complementary processing streams tailored to SAR characteristics: one stream utilizes adapter-based fine-tuning applied to pre-trained DINOv3 weights (kept frozen), which captures long-distance spatial relationships crucial for maintaining network connectivity despite speckle corruption. The second stream, based on convolutional operations, focuses on extracting localized geometric features that preserve the narrow, elongated structure and sharp boundaries typical of road infrastructure. Feature fusion occurs through the Topological-Geometric Feature Integration (TGFI) Module, which synthesizes multi-scale representations hierarchically. This mechanism proves effective at bridging fragmented road segments and recovering geometric accuracy in scenarios with heavy shadow casting or signal interference. Performance evaluation on the GF-3 satellite dataset across four spatial resolutions (1 m, 3 m, 5 m, and 10 m) demonstrates the proposed method achieves an 82.61% F1-score, a 76.51% IoU, and a 98.08% overall accuracy, all averaged across the four resolutions. When benchmarked against six state-of-the-art methods, DINOv3-PEFT demonstrates substantial improvements in road class segmentation quality and topological connectivity preservation, supporting its robustness for operational SAR road mapping tasks. Full article
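Adapter-based fine-tuning on a frozen backbone follows a well-known PEFT pattern: only a small bottleneck branch trains. A generic sketch of such an adapter (the bottleneck width, placement, and zero-init are common conventions, not necessarily the paper's):

```python
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.
    Inserted after a frozen backbone block; only these layers receive gradients."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # residual branch starts at zero -> identity at init
        nn.init.zeros_(self.up.bias)

    def forward(self, x):                # x: (B, N, dim) tokens from the frozen backbone
        return x + self.up(F.gelu(self.down(x)))
```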
28 pages, 22901 KB  
Article
IAMS (Interior-Anchored Mean-Shift) Algorithm for Supervoxel Segmentation of Airborne LiDAR Roof Points
by Hanyu Zhou, Liang Zhang, Zhiyue Zhang, Haiqiong Yang, Xiongfei Tang, Hongchao Ma and Chunjing Yao
Remote Sens. 2026, 18(6), 965; https://doi.org/10.3390/rs18060965 - 23 Mar 2026
Abstract
Accurate building roof classification from airborne LiDAR point clouds is fundamental to reliable three-dimensional (3D) urban reconstruction. While supervoxel-based methods offer efficiency and resilience to uneven point density, their performance is critically undermined by cross-boundary segmentation errors—a direct consequence of random seed initialization that merges geometrically similar yet semantically distinct objects. To address this root cause, this study proposes Interior-Anchored Mean-Shift (IAMS), a novel supervoxel segmentation framework that rethinks seed placement as a geometry-aware interior localization problem. By integrating local geometric consistency, point density, and spatial correlation into a unified kernel density estimator, supplemented by density-adaptive voxel weighting and a semi-variogram-driven bandwidth, IAMS reliably anchors seeds within object interiors, yielding highly homogeneous supervoxels without post-processing. Extensive experiments on three diverse airborne LiDAR datasets demonstrated that IAMS consistently outperformed state-of-the-art baselines. On the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen benchmark, our approach improved roof classification completeness, correctness, and quality by up to 7.1% (per-object) over the conventional Voxel Cloud Connectivity Segmentation (VCCS) algorithm while being significantly faster than recent boundary-preserving alternatives. Critically, IAMS maintains robust performance under challenging conditions, including sparse sampling and dense vegetation occlusion, making it a practical solution for real-world urban remote sensing. Full article
(This article belongs to the Section Urban Remote Sensing)
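The interior-anchoring idea builds on plain mean shift: a seed iteratively climbs toward the local density mode. A basic Gaussian-kernel version for 3-D points; the paper's kernel additionally weights geometric consistency and derives the bandwidth from a semi-variogram:

```python
import numpy as np

def mean_shift_seed(points: np.ndarray, start: np.ndarray,
                    bandwidth: float = 1.0, n_iter: int = 30,
                    tol: float = 1e-4) -> np.ndarray:
    """points: (N, 3) LiDAR coordinates; start: (3,) initial seed.
    Moves the seed to the nearest kernel-density mode (an object interior)."""
    x = start.astype(np.float64).copy()
    for _ in range(n_iter):
        w = np.exp(-np.sum((points - x) ** 2, axis=1) / (2 * bandwidth ** 2))
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()   # weighted mean = shift step
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x
```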
18 pages, 6071 KB  
Article
DFENet: A Novel Dual-Path Feature Extraction Network for Semantic Segmentation of Remote Sensing Images
by Li Cao, Zishang Liu, Yan Wang and Run Gao
J. Imaging 2026, 12(3), 141; https://doi.org/10.3390/jimaging12030141 - 23 Mar 2026
Abstract
Semantic segmentation of remote sensing images (RSIs) is a fundamental task in geoscience research. However, designing efficient feature fusion modules remains challenging for existing dual-branch or multi-branch architectures. Furthermore, existing deep learning-based architectures predominantly concentrate on spatial feature modeling and context capturing while inherently neglecting the exploration and utilization of critical frequency-domain features, which is crucial for addressing issues of semantic confusion and blurred boundaries in complex remote sensing scenes. To address the challenges of feature fusion and the lack of frequency-domain information, we propose a novel dual-path feature extraction network (DFENet) in this paper. Specifically, a dual-path module (DPM) is developed in DFENet to extract global and local features, respectively. In the global path, after applying the channel splitting strategy, four feature extraction strategies are innovatively integrated to extract global features from different granularities. According to the strategy of supplementing frequency-domain information, a frequency-domain feature extraction block (FFEB) dominated by discrete wavelet transform (DWT) is designed to effectively capture both high- and low-frequency components. Experimental results show that our method outperforms existing state-of-the-art methods in terms of segmentation performance, achieving a mean intersection over union (mIoU) of 83.09% on the ISPRS Vaihingen dataset and 86.05% on the ISPRS Potsdam dataset. Full article
(This article belongs to the Section Image and Video Processing)
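The high/low-frequency split the FFEB relies on comes from the discrete wavelet transform. A single-level 2-D Haar DWT with PyWavelets shows the decomposition (the random array is a stand-in for a feature map; the FFEB itself is a learned block around this transform):

```python
import numpy as np
import pywt

# cA holds the low-frequency approximation; (cH, cV, cD) hold the
# horizontal / vertical / diagonal high-frequency detail coefficients.
img = np.random.rand(256, 256).astype(np.float32)
cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')

# Reconstruct only the low-frequency content; the residual is the
# complementary high-frequency part (edges, boundaries, fine texture).
low = pywt.idwt2((cA, (None, None, None)), 'haar')
high = img - low
```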
20 pages, 8955 KB  
Article
Language-Guided Contrastive Learning and Difference Enhancement for Semantic Change Detection in Remote Sensing Images
by Yongli Hu, Lintian Ren, Huajie Jiang, Kan Guo, Tengfei Liu, Junbin Gao, Yanfeng Sun and Baocai Yin
Remote Sens. 2026, 18(6), 964; https://doi.org/10.3390/rs18060964 - 23 Mar 2026
Abstract
Semantic change detection (SCD) in remote sensing images aims not only to localize changed regions but also to identify their specific “from–to” semantic transitions. This task remains challenging due to the inherent semantic ambiguity of spectral changes and the presence of pseudo-change noise. While recent vision–language models have shown promise in remote sensing, existing approaches like RemoteCLIP predominantly focus on static scene classification, lacking the ability to explicitly model dynamic temporal transitions. Other adaptations of foundation models (e.g., AdaptVFMs-RSCD) often rely on heavy backbones, incurring prohibitive computational costs. To address these limitations, this paper proposes LGDENet, a lightweight, end-to-end framework that unifies Language-Guided Temporal Contrastive Learning with a noise-robust difference enhancement mechanism. Specifically, we construct a temporal transition prompt learning strategy that aligns visual difference features with textual descriptions of dynamic processes, thereby resolving directional semantic ambiguities. Furthermore, we introduce a Difference Enhancement Module (DEM) that leverages the channel–spatial decoupling property of depthwise separable convolutions to adaptively isolate and suppress irrelevant variations (e.g., registration errors) before feature fusion. Experiments on the SECOND and Landsat-SCD datasets demonstrate that LGDENet achieves state-of-the-art performance, yielding a semantic F1 score (Fscd) of 87.90% and 88.71%, respectively. Moreover, with a modest parameter count of 33.45 M, it offers a superior trade-off between accuracy and efficiency compared to heavy foundation model-based approaches. Full article
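The channel–spatial decoupling the DEM builds on is the depthwise separable convolution: a per-channel spatial filter followed by a 1x1 cross-channel mix. The generic building block, not the paper's exact module:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv handles spatial structure within each channel;
    pointwise conv handles cross-channel interaction. Decoupling the two
    is what lets a module treat spatial and channel variation separately."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```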