MDPI - Publisher of Open Access Journals

23 pages, 7575 KB

Open Access Article

Pixel’s Neighbors Are Noteworthy: Localized Vision–Language Attention for Remote Sensing Semantic Segmentation

by Cheng Zeng, Sheng Tao, Xiaowei Tan, Zhifeng Xiao and Lei Hu

Remote Sens. 2026, 18(11), 1708; https://doi.org/10.3390/rs18111708 - 26 May 2026

In recent years, vision–language models (VLMs) have been introduced into remote sensing semantic segmentation to provide richer semantic representations through visual–textual alignment. However, most existing VLM-based segmentation methods focus on global semantic alignment while neglecting pixel-level local neighborhood features, which are crucial for reliably understanding remote sensing imagery with high spatial resolution, complex structures, and strong spatial continuity. To address this issue, we propose LoVLANet (Localized Vision–Language Attention Network), a novel vision–language segmentation framework that integrates language-driven global semantics with local spatial context. LoVLANet consists of a text encoder, a visual encoder, and a segmentation decoder. Specifically, the text encoder is inherited from RemoteCLIP to preserve domain-adapted vision–language alignment. The visual encoder is built upon a Vision Transformer (ViT). To enhance local dependency modeling, we propose a Neighborhood Key–Key Encoder. It leverages a Gaussian-weighted neighborhood matrix for spatial correlation and uses key–key similarity to emphasize intrinsic semantic similarity over query-driven features, thereby preserving spatial consistency. Finally, the segmentation decoder fuses multi-scale visual features and aligns the image–text representations to generate accurate pixel-level segmentation results. Experiments on RGB remote sensing benchmarks, including LoveDA and GID, show that LoVLANet achieves competitive segmentation performance under the adopted experimental settings, with improved mIoU and clearer boundary delineation in qualitative visualizations. These results suggest the effectiveness of explicitly modeling local neighborhood relationships in VLM-based segmentation for supervised remote sensing scene understanding.

(This article belongs to the Section Remote Sensing Image Processing)
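
The abstract above describes attention driven by key–key similarity and weighted by a Gaussian neighborhood matrix. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the multiplicative combination of softmax scores with the Gaussian weights, and the `sigma` parameter are all assumptions made for illustration.

```python
import math

def gaussian_neighborhood(h, w, sigma):
    """Return an (h*w) x (h*w) matrix of Gaussian weights on grid distance."""
    coords = [(r, c) for r in range(h) for c in range(w)]
    return [[math.exp(-((r1 - r2) ** 2 + (c1 - c2) ** 2) / (2.0 * sigma ** 2))
             for (r2, c2) in coords] for (r1, c1) in coords]

def key_key_attention(keys, values, h, w, sigma=1.0):
    """Attend with key-key similarity, reweighted by spatial proximity (assumed form)."""
    d = len(keys[0])
    g = gaussian_neighborhood(h, w, sigma)
    out, weights = [], []
    for i, ki in enumerate(keys):
        # Scaled dot product between keys; no query vectors are involved.
        logits = [sum(a * b for a, b in zip(ki, kj)) / math.sqrt(d) for kj in keys]
        m = max(logits)
        # Softmax numerator multiplied by the Gaussian neighborhood weight.
        scores = [math.exp(l - m) * g[i][j] for j, l in enumerate(logits)]
        z = sum(scores)
        row = [s / z for s in scores]
        weights.append(row)
        out.append([sum(row[j] * values[j][k] for j in range(len(values)))
                    for k in range(len(values[0]))])
    return out, weights

# Four tokens on a 2x2 grid with identical keys: attention then depends on distance only.
keys = [[1.0, 0.0] for _ in range(4)]
values = [[1.0], [2.0], [3.0], [4.0]]
out, weights = key_key_attention(keys, values, h=2, w=2, sigma=1.0)
```

With identical keys, each token attends most strongly to itself and least to the diagonal neighbor, which shows how the Gaussian term favors spatially close pixels.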

24 pages, 62422 KB

Open Access Article

GDBNet: A Three-Branch Semantic Segmentation Network Integrating CNN and Transformer for Land Cover Classification in Ski Resorts

by Zhiwei Yi, Lingjia Gu, Ruifei Zhu, Junwei Tian and He Mi

Remote Sens. 2026, 18(10), 1666; https://doi.org/10.3390/rs18101666 - 21 May 2026

Viewed by 105

Abstract

As a critical component of ice-snow tourism, land cover classification for ski resorts is crucial to ice-snow resource management. However, there is currently a scarcity of datasets and methods capable of high-precision mapping for such fine-grained scenarios. Although Transformers with long-sequence interactions and convolutional neural networks (CNNs) have emerged as mainstream solutions, their performance remains limited on high-resolution remote sensing data characterized by small datasets and high heterogeneity. Targeting land cover classification in ski resort areas, this study proposes a triple-branch segmentation framework integrating CNNs and Transformers to extract global, detail and boundary features (GDBNet), and constructs the first high-resolution ski resort land cover dataset (LULC_SKI), with a resolution of 0.75 m, using the JiLin-1 satellite constellation. The framework employs a backbone combining SegFormer with dual CNN branches. SegFormer captures global semantic context, while dual ResNet-18 branches extract local semantics and edge details, respectively. The neck integrates two specialized feature interaction modules, the proposed Pixel-Guided Feature Attention (PG-AFM) and Boundary-Guided Feature Attention (BG-AFM), which synergistically fuse these heterogeneous feature representations for enhanced multi-scale modeling. For the segmentation head, a multi-task learning approach supervises both semantic and edge outputs. LULC_SKI covers seven representative ski resorts in Jilin Province, China, comprising 10,000 multi-seasonal images annotated with six land cover classes: roads, vegetation, built-up areas, ski runs, water bodies, and cropland. Experiments demonstrate that GDBNet achieves 85.44% mIoU and 91.84% mF1 on LULC_SKI, outperforming other advanced models with particularly significant improvements for linear objects like roads and ski runs. Extensive experimental comparisons show that GDBNet delivers consistently excellent performance on both the iSAID and LoveDA datasets, underscoring the superiority of our proposed method. Ablation studies validate the effectiveness of the triple-branch architecture, attention modules, and multi-task supervision. This work proposes a modular framework for land cover classification in complex ski resort scenarios.

(This article belongs to the Special Issue Signal Processing, Image Processing and Fusion Techniques in Remote Sensing)

27 pages, 3616 KB

Open Access Article

LiteRoadSegNet: A Lightweight Road Segmentation Framework with Semantic–Topological Contrastive Learning in High-Resolution Remote Sensing Imagery

by Tao Wu, Yu Peng, Jianxin Qin, Yiliang Wan and Yaling Hu

Remote Sens. 2026, 18(10), 1664; https://doi.org/10.3390/rs18101664 - 21 May 2026

Viewed by 125

Abstract

Deploying deep learning models for high-resolution remote sensing image segmentation remains challenging in resource-constrained scenarios due to the high computational cost of dense prediction and the structural vulnerability of thin objects such as roads. To address these challenges, we propose LiteRoadSegNet, a lightweight and deployment-oriented segmentation framework that achieves a favorable balance among efficiency, accuracy, and structural preservation. The proposed model adopts a compact encoder–decoder architecture composed of a lightweight hierarchical vision transformer and a streamlined decoder, enabling efficient multi-scale feature representation under limited computational budgets. To enhance structural consistency without increasing inference overhead, we further design a low-cost semantic–topological dual-branch contrastive learning scheme which enhances feature discriminability and preserves road connectivity during training. In addition, to improve deployment robustness in cross-region scenarios, we incorporate a lightweight test-time adaptation strategy based on Adaptive Batch Normalization (AdaBN) and sliding-window inference. This strategy enables seamless adaptation to unlabeled target domains without requiring model retraining. Extensive experiments demonstrate that LiteRoadSegNet achieves competitive segmentation performance and superior topology preservation while maintaining a small model footprint and high inference efficiency, making it well suited for large-scale remote sensing applications under resource-constrained environments.

(This article belongs to the Special Issue Lightweight Artificial-Intelligence Techniques for Remote-Sensing Image Processing)
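
The test-time adaptation mentioned above relies on Adaptive Batch Normalization (AdaBN), whose core idea is to keep the learned affine parameters and replace the normalization statistics with ones computed on unlabeled target-domain data. The sketch below illustrates that idea on a single batch of feature vectors. It is an assumption-laden illustration rather than the paper's pipeline: the helper names are hypothetical, and a practical version would accumulate statistics over many target batches.

```python
import math

def batch_stats(features):
    """Per-channel mean and biased variance over a list of feature vectors."""
    n = len(features)
    channels = len(features[0])
    mean = [sum(f[c] for f in features) / n for c in range(channels)]
    var = [sum((f[c] - mean[c]) ** 2 for f in features) / n for c in range(channels)]
    return mean, var

def batch_norm(features, mean, var, gamma, beta, eps=1e-5):
    """Standard batch-norm transform with explicit statistics."""
    return [[gamma[c] * (f[c] - mean[c]) / math.sqrt(var[c] + eps) + beta[c]
             for c in range(len(f))] for f in features]

def adabn(features_target, gamma, beta):
    """AdaBN step: reuse learned gamma/beta, swap in target-domain statistics."""
    mean_t, var_t = batch_stats(features_target)
    return batch_norm(features_target, mean_t, var_t, gamma, beta)

# Target-domain activations whose scale differs from what the source model saw.
target = [[10.0, -3.0], [12.0, -1.0], [14.0, -5.0], [16.0, -3.0]]
adapted = adabn(target, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

No gradient step or label is needed, which is why this kind of adaptation adds little cost at deployment time.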

29 pages, 3512 KB

Open Access Article

BGE-ICMER: Bare-Ground-Echo-Based Iterative Correction of Multi-Echo Reflectance for Hyperspectral LiDAR

by Xinyi Pan, Binhui Wang, Jiahang Wan, Shalei Song and Shuo Shi

Remote Sens. 2026, 18(10), 1648; https://doi.org/10.3390/rs18101648 - 20 May 2026

Viewed by 205

Abstract

Full-waveform hyperspectral LiDAR offers a new approach for precise forest ecological monitoring by simultaneously acquiring the three-dimensional structure and continuous spectral information of targets. However, uncertainty in the backscattering cross-section and the inseparability of the reflectance coefficient lead to systematic underestimation of multi-echo reflectance retrieved using traditional methods. This limitation significantly hinders quantitative applications. The existing multi-echo reflectance correction using neighborhood single-echo reflectance (MCNS) method provides an effective solution by establishing proportional models between similar targets, laying an important foundation for the extraction of multi-echo reflectance. However, its applicability in complex forest scenes is limited due to its dependence on specific vegetation single-echo samples. To address this, an iterative correction method based on ground reflectance baseline, namely Bare-Ground-Echo-Based Iterative Correction of Multi-Echo Reflectance for Hyperspectral LiDAR (BGE-ICMER), is proposed. Using ground single-echo reflectance as a stable baseline, a multi-target energy distribution model is constructed based on energy conservation, and backscattering cross-section proportions for each echo are iteratively solved to recover true reflectance. Validation using a high-fidelity dataset generated by the Large-Scale remote sensing data and image Simulation framework (LESS) confirmed the effectiveness of the proposed method. This dataset encompasses three typical tree species with vegetation layers ranging from two to four, incorporates micro-topographic ground surfaces and ten spectral channels from 500 to 1000 nm, thereby capturing the structural and spectral complexity of real forests. The results showed that coefficients of determination (R²) between the corrected and true reflectance exceeded 0.9560, with an RMSE below 0.0418 and MAE below 0.0360. The average relative error was reduced from 26.66% to 10.07%, representing a 62.22% improvement in accuracy. Even in the most challenging scenarios with four-layer vegetation occlusion within this dataset, no significant error accumulation occurred. These results demonstrate the robustness and effectiveness of the proposed method for multi-echo reflectance extraction. This study lays a foundation for more accurate forest biochemical attribute assessment and enables the vertical characterization of multiple targets using high-resolution spectral reflectance.

(This article belongs to the Special Issue Vegetation Biophysical Variables and Remote Sensing Applications (Second Edition))
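
The abstract above reports R², RMSE, MAE, and a relative improvement in average error. As a reading aid, the sketch below gives the textbook definitions of these metrics and checks the reported improvement from the two rounded error values. The function names are illustrative, and the paper may compute its figures from unrounded data, so small rounding differences are expected.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_improvement(before, after):
    """Percentage reduction of an error value relative to its starting point."""
    return (before - after) / before * 100.0

# Average relative error before and after correction, as stated in the abstract.
improvement = relative_improvement(26.66, 10.07)  # about 62.2 percent
```

From the rounded inputs the reduction comes out at roughly 62.2%, which agrees with the reported 62.22% to within rounding.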

21 pages, 6797 KB

Open Access Article

MEF-TransUNet: A Newly Developed Remote Sensing Detection Model for Micro Water Body Targets

by Yongkang Yu, Sijia Li, Xingming Zheng, Kai Li and Jianhua Ren

Remote Sens. 2026, 18(10), 1611; https://doi.org/10.3390/rs18101611 - 17 May 2026

Viewed by 240

Abstract

Micro water bodies are essential to regional ecosystems but are difficult to extract from high-resolution remote sensing images due to fragmentation and building shadows. To address edge breakage and high false-alarm rates in existing semantic segmentation models, this study proposes MEF-TransUNet, an improved TransUNet-based model for fine micro water body extraction. The model integrates a multi-scale edge-guided attention module (MEGA), a high–low-frequency decomposition fusion module (HLFD), and a convolutional block attention module (CBAM). Specifically, MEGA extracts edge priors using a Laplacian pyramid to repair topological breaks in slender water bodies. HLFD uses frequency-domain decoupling to suppress high-frequency background noise and reduce confusion between water bodies and shadows. CBAM enhances channel and spatial feature attention. Experiments using PlanetScope images from the Songhuajiang River Basin in Daqing City, Heilongjiang Province, China, showed that MEF-TransUNet achieves a precision of 91.74%, an F1-score of 90.07%, a recall of 90.22%, and a B-IoU of 43.88%. For the GID dataset, the model attains a precision of 91.85%, an F1-score of 91.48%, a recall of 92.01%, and a B-IoU of 55.42%. Its overall performance clearly exceeds that of DeepLabV3+, SegFormer, U-Net, AttenUNet, and UNet++, enabling accurate micro water body localization, high output purity, and reduced manual correction costs, thus supporting fine water resource management in complex surface environments.

(This article belongs to the Special Issue Global Monitoring of Inland Water Using Remote Sensing and Artificial Intelligence (Second Edition))

25 pages, 9029 KB

Open Access Article

GC2F-Net: A Global Category-Center Prior-Guided Spatial-Frequency Collaborative Network for Remote Sensing Semantic Segmentation

by Teng Li, Laide Guo, Junchang Xin, Hongfei Yu and Bowen Li

Remote Sens. 2026, 18(10), 1600; https://doi.org/10.3390/rs18101600 - 16 May 2026

Viewed by 222

Abstract

Semantic segmentation of high-resolution remote sensing images constitutes an important foundation for urban mapping and land-cover interpretation. However, objects in remote sensing scenes usually exhibit large-scale variations, significant intra-class differences, and complex background interference. Due to these factors, existing methods for complex high-resolution scenes still suffer from insufficient global semantic modeling, boundary blurring, and small-object omission. To address the above challenges, this paper proposes a Global Category-Center Prior-Guided Spatial-Frequency Collaborative Network (GC2F-Net). Specifically, ResNet-50 is adopted as the encoder, and a Global Category-Center Module is utilized to generate a global category-center prior based on deep features, which is then combined with a Fourier Global Enhancement Module to enhance deep features in the frequency domain. During the decoding stage, a Local Category-Aware Frequency Attention Module is employed to progressively refine feature representations under the guidance of the global category-center prior, thereby achieving collaborative improvement in global semantic consistency and local detail recovery. Experimental results demonstrate that GC2F-Net achieves robust and competitive segmentation performance on multiple public remote sensing semantic segmentation datasets. The proposed method provides an effective spatial-frequency collaborative modeling paradigm for the semantic segmentation of high-resolution remote sensing images.

27 pages, 12820 KB

Open Access Article

Positive-Guided Local Supervision for Robust Road Extraction from Remote Sensing Imagery

by Hao He, Shuyang Wang, Lei Huang, Xiaohu Fan, Yongfei Li and Dongfang Yang

Remote Sens. 2026, 18(10), 1589; https://doi.org/10.3390/rs18101589 - 15 May 2026

Viewed by 149

Abstract

Road extraction from high-resolution remote sensing imagery is fundamental to numerous practical applications, yet still faces notable challenges caused by label noise, particularly the underlabeling of rural roads within training datasets. End-to-end dense prediction networks deliver high efficiency and strong global context capture capability, yet they are highly vulnerable to such label noise. In contrast, patch-based methods achieve better robustness but sacrifice global reasoning ability and computational efficiency. This paper proposes a novel training strategy named Positive-guided Local Supervision (PLS), which integrates the strengths of the two aforementioned paradigms. PLS preserves the full end-to-end forward pass to leverage global context, while restricting loss computation to local patches centered on reliably annotated road pixels (positive samples) via a standard dense segmentation loss. By isolating the model from misleading gradients generated in underlabeled regions, PLS effectively mitigates the negative impact of underlabeling without compromising computational efficiency and prediction quality. We evaluate the proposed PLS on two datasets: the public DeepGlobe benchmark and a newly constructed challenging dataset, namely China Four Provinces (CH4P). CH4P includes 13,498 high-resolution images of rural China and suffers from severe underlabeling inherited from public web maps. Extensive quantitative evaluations on both datasets validate that our PLS strategy surpasses conventional end-to-end baselines and competitive state-of-the-art methods under both noisy original labels and manually refined annotations. On the refined DeepGlobe-mini-test and CH4P-mini-test subsets, PLS obtains prominent absolute IoU improvements of 0.127 and 0.104 over baseline models, respectively, showing distinct superiority in handling severe real-world underlabeling. Qualitative visualizations and cross-dataset generalization tests further demonstrate that PLS can effectively retrieve road segments omitted in raw annotations, delivers strong robustness against practical label noise, and introduces no extra computational burden in the inference stage.

(This article belongs to the Special Issue Road Extraction and Distress Assessment by Spaceborne, Airborne and Terrestrial Platforms (Second Edition))
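
The PLS abstract describes restricting the loss to local patches centered on annotated road pixels while keeping the full forward pass. The following sketch shows one way such a mask could be built and applied. It is not the authors' code: the square window, the `half` parameter, and the use of binary cross-entropy as the "standard dense segmentation loss" are assumptions.

```python
import math

def positive_patch_mask(label, half):
    """Mark every pixel inside a (2*half+1)-sized square around each positive label."""
    h, w = len(label), len(label[0])
    mask = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if label[r][c] == 1:
                for rr in range(max(0, r - half), min(h, r + half + 1)):
                    for cc in range(max(0, c - half), min(w, c + half + 1)):
                        mask[rr][cc] = 1
    return mask

def masked_bce(prob, label, mask, eps=1e-7):
    """Binary cross-entropy averaged over masked pixels only."""
    total, count = 0.0, 0
    for r in range(len(label)):
        for c in range(len(label[0])):
            if mask[r][c]:
                p = min(max(prob[r][c], eps), 1.0 - eps)
                y = label[r][c]
                total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
                count += 1
    return total / max(count, 1)

# A 5x5 label map with one annotated road pixel in the center.
label = [[0] * 5 for _ in range(5)]
label[2][2] = 1
mask = positive_patch_mask(label, half=1)   # 3x3 supervised window
prob = [[0.5] * 5 for _ in range(5)]        # full-image prediction
loss = masked_bce(prob, label, mask)
```

Pixels outside the mask contribute nothing to the loss, so a confident road prediction in an unlabeled region is not penalized.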

32 pages, 14314 KB

Open Access Review

Benchmark Datasets for Satellite Image Time Series Classification: A Review

by Anming Zhang, Zheng Zhang, Keli Shi and Ping Tang

Remote Sens. 2026, 18(10), 1581; https://doi.org/10.3390/rs18101581 - 15 May 2026

Viewed by 335

Abstract

Recent advances in satellite missions, particularly the Landsat, Sentinel, and Gaofen series, have led to the rapid accumulation of high-quality remote sensing data with frequent revisits. As these data have become more widely available, Satellite Image Time Series (SITS) have become an important tool for monitoring Earth surface dynamics. SITS now supports a wide range of applications, including precision agriculture, Land Use/Cover Change (LULCC) monitoring, environmental management, and disaster response. This growth has also promoted the development of advanced SITS classification datasets. However, existing reviews have mainly focused on SITS classification algorithms or specific applications, while systematic comparisons of public SITS benchmark datasets remain limited. This lack of synthesis makes it difficult for researchers to navigate fragmented resources and select datasets that match specific scientific or operational tasks. To address this gap, this paper provides a comprehensive review and analysis of 29 publicly available medium-to-high-resolution SITS classification benchmark datasets released between 2017 and 2025. These datasets are intended for training, testing, and validating land-cover classification algorithms, rather than for direct use as operational map products. We conduct a detailed statistical and comparative analysis of these datasets, focusing on their key characteristics across spectral, temporal, and spatial dimensions, as well as their labeling systems. In addition, this review summarizes the SITS classification algorithms that have been developed and benchmarked using these datasets. Finally, we identify the main challenges in constructing and applying SITS classification datasets and discuss future research directions, particularly in data reconstruction, multimodal fusion, change analysis, and advanced model architectures. This survey provides the research community with a systematic overview of SITS classification benchmark datasets and aims to support continued progress in this rapidly developing field.

27 pages, 6893 KB

Open Access Article

LoRA-Based Deep Learning for High-Fidelity Satellite Image Super-Resolution in Big Data Remote Sensing

by Noha Rashad Mahmoud, Hussam Elbehiery, Basheer Abdel Fattah Youssef and Hanaa Bayomi Ali Mobarz

Computers 2026, 15(5), 313; https://doi.org/10.3390/computers15050313 - 14 May 2026

Viewed by 290

Abstract

High-resolution satellite imagery is pivotal for accurate analysis in remote sensing applications, including land-use monitoring, urban planning, and environmental assessment. However, obtaining such data is often costly and limited. Consequently, super-resolution techniques, such as deep learning models and fine-tuning strategies like LoRA, offer a promising alternative, especially given the diversity and large scale of satellite datasets. While deep learning-based super-resolution models have been very promising recently, their effectiveness, efficiency, and scalability across heterogeneous satellite scenes are not well studied. This work studies the performance of representative deep learning super-resolution frameworks, including the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN), Swin Transformer for Image Restoration (SwinIR), and latent diffusion models (LDM), under unified experimental conditions using the WorldStrat dataset. The main goal is to establish whether parameter-efficient adaptation strategies can boost reconstruction quality while reducing computational and training costs. Toward this goal, we investigate hybrid sequential pipelines, ensemble averaging, and Low-Rank Adaptation (LoRA)–based fine-tuning. The experiments indicate that the multi-model pipelines achieve only marginal performance gains while incurring substantial increases in computational complexity. LoRA-based fine-tuning, by contrast, improves reconstruction accuracy and quality across all model families while using only a small percentage of trainable parameters, and it outperforms the multi-model methods in both efficiency and performance. The presented results confirm that LoRA is an effective and accessible technique for high-fidelity satellite-based super-resolution image synthesis. The manuscript identifies LoRA as one of the enabling technologies advancing the state of the art in deep learning-based super-resolution for large-scale satellite-based image synthesis.

(This article belongs to the Special Issue Machine Learning: Techniques, Industry Applications, Code Sharing, and Future Trends)
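
The claim above that LoRA trains only a small percentage of parameters follows from its low-rank update: a frozen weight matrix W is augmented by a product B·A of rank r, scaled by alpha/r. The sketch below illustrates that general LoRA formulation for a single linear layer. It is not tied to the ESRGAN, SwinIR, or LDM setups in the study, and the layer sizes are hypothetical.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def lora_forward(w, a, b, x, alpha):
    """Compute W x + (alpha / r) * B (A x); W is frozen, A and B are trainable."""
    r = len(a)                          # rank = number of rows of A
    base = matvec(w, x)                 # frozen path
    delta = matvec(b, matvec(a, x))     # low-rank update path
    scale = alpha / r
    return [y + scale * d for y, d in zip(base, delta)]

def trainable_fraction(d_out, d_in, r):
    """Share of LoRA parameters relative to the full d_out x d_in weight."""
    return r * (d_in + d_out) / (d_out * d_in)

# B starts at zero in the usual initialization, so the adapted layer equals the base layer.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]            # shape r x d_in with r = 1
b_zero = [[0.0], [0.0]]     # shape d_out x r
y0 = lora_forward(w, a, b_zero, [2.0, 3.0], alpha=2.0)   # [2.0, 3.0]

# Hypothetical 1024 x 1024 layer with rank 8.
frac = trainable_fraction(1024, 1024, 8)                 # 0.015625
```

For the hypothetical 1024 × 1024 layer, rank 8 trains about 1.6% of the parameters of the full matrix, which is the kind of saving the abstract refers to.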

27 pages, 29964 KB

Open Access Article

TriFusion-CD: Tri-Source Fusion for Robust Remote Sensing Change Detection Under Pseudo-Change Interference

by Jinbo Wang, Qiancheng Yu, Ruiqing Zhang and Nan Xiao

Remote Sens. 2026, 18(10), 1572; https://doi.org/10.3390/rs18101572 - 14 May 2026

Viewed by 243

Abstract

Remote sensing change detection (RSCD) is often disturbed by nuisance appearance variations, which can introduce pseudo-changes and degrade the reliability of predicted change masks. Robust change localization therefore requires that such spurious responses be suppressed while the structural integrity of change regions in complex, high-resolution scenes is maintained. We propose TriFusion-CD, a tri-branch framework that fuses complementary sources of information for reliable change localization. The first branch uses MobileSAM to provide global semantic guidance that promotes spatially coherent predictions. The second branch adopts the CLIP-ResNet50 image encoder with a change-aware enhancement module to extract detail-sensitive change features. The third branch performs frequency decomposition and interacts frequency features with CLIP text embeddings via cross-attention, producing a structural–semantic prior to suppress appearance-induced pseudo-changes. We further design a Semantic Attention Fusion Module (SAFM) to inject MobileSAM semantics into CLIP change features through cross-attention with learnable residual scaling. In addition, an Attention-Modulated Decoder (AMD) translates the fused guidance into multi-scale attention maps and performs progressive top-down refinement, extracting more spatially complete change regions. On the challenging SYSU-CD, JL1-CD, and CDD datasets, which exhibit diverse change patterns and frequent appearance-induced pseudo-changes, TriFusion-CD achieves 72.48% IoU/84.04% F1 on SYSU-CD, 66.04% IoU/79.54% F1 on JL1-CD, and 96.41% IoU/98.17% F1 on CDD, demonstrating strong performance.
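
The IoU and F1 pairs reported above are linked by a fixed identity when both are computed from the same binary confusion matrix: F1 = 2·IoU / (1 + IoU). The short sketch below applies that identity to the three reported IoU values. That the paper computes both metrics on the change class from one confusion matrix is an assumption, although the numbers agree to within rounding.

```python
def f1_from_iou(iou):
    """F1 = 2TP / (2TP + FP + FN) rewritten in terms of IoU = TP / (TP + FP + FN)."""
    return 2.0 * iou / (1.0 + iou)

def iou_from_f1(f1):
    """Inverse of the identity above."""
    return f1 / (2.0 - f1)

# IoU values (as fractions) reported for SYSU-CD, JL1-CD, and CDD.
reported_iou = {"SYSU-CD": 0.7248, "JL1-CD": 0.6604, "CDD": 0.9641}
implied_f1 = {name: 100.0 * f1_from_iou(v) for name, v in reported_iou.items()}
```

The implied F1 values land within about 0.01 percentage points of the reported 84.04%, 79.54%, and 98.17%.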

37 pages, 4167 KB

Open Access Article

EGMamba-Net: Edge-Guided Global–Local Mamba Network with Region-Adaptive Routing for Salient Object Detection in Optical Remote Sensing Images

by Fubin Zhang, Zichi Zhang and Feihu Zhang

Remote Sens. 2026, 18(10), 1568; https://doi.org/10.3390/rs18101568 - 14 May 2026

Viewed by 314

Abstract

Salient object detection in optical remote sensing images remains challenging due to complex backgrounds, blurred boundaries, small objects, unstable foreground–background contrast, and dense object distributions. Existing convolution-based methods are effective at modeling local structures, but they are limited in capturing long-range dependencies, whereas Transformer-based approaches usually incur substantial computational cost when handling high-resolution remote sensing imagery. To address these issues, this paper proposes EGMamba-Net, an edge-guided global–local collaborative network for salient object detection in optical remote sensing images. Specifically, a hybrid global–local backbone is first constructed to preserve shallow texture, edge, and geometric details while introducing Mamba-based global modeling in deeper stages for efficient long-range dependency representation. An Edge Prior Enhancement Module (EPEM) is then designed to explicitly extract boundary priors from shallow features and refine feature representations through edge-guided modulation. To alleviate the representation conflict between global semantics and local details, a Global–Local Interaction Module (GLIM) is further developed, where convolutional local modeling and Mamba-based global modeling interact through cross-gating for complementary feature learning. Moreover, a Region-Adaptive Routing Decoder (RARD) is introduced to dynamically assign different refinement paths according to regional saliency response, boundary intensity, and contextual complexity, thereby improving the recovery of small, low-contrast, and densely distributed objects. In addition, a Difficulty-Aware Joint Loss (DAJL) is designed to enhance optimization on boundary regions and hard samples, improving robustness under challenging conditions. Extensive experiments on the ORSSD, EORSSD, and ORSI-4199 datasets demonstrate the superiority of the proposed method. In particular, on the more challenging EORSSD dataset, EGMamba-Net achieves 0.9389 S-measure, 0.8972 max F-measure, and 0.0066 MAE. Compared with the representative remote-sensing method DAF-Net, it improves S-measure and max F-measure by 0.0223 and 0.0358, respectively, indicating stronger capability in background suppression, structural preservation, and boundary recovery.

(This article belongs to the Section Remote Sensing Image Processing)

15 pages, 3297 KB

Open Access Article

A Weakly Supervised Multi-Scale Cross-Modal Information Fusion Method for Wildfire Detection

by Dawei Wen, Zhoujiang Peng and Yuan Tian

Computers 2026, 15(5), 311; https://doi.org/10.3390/computers15050311 - 14 May 2026

Viewed by 190

Abstract

In recent years, wildfires have occurred with increasing frequency. Pixel-level annotation of high-resolution remote sensing wildfire imagery is costly and labor-intensive. Therefore, there is an urgent need for a weakly supervised wildfire detection method that balances detection accuracy and annotation efficiency. To address the key limitations of existing weakly supervised approaches based on class activation maps (CAMs), including imprecise delineation of fire boundaries, insufficient utilization of cross-modal information, and limited capability in modeling temporal characteristics, this paper proposes a dual-branch multi-scale feature fusion framework for weakly supervised wildfire detection. The proposed framework consists of a multispectral branch and a shortwave infrared (SWIR) temporal branch, which are designed to capture the spatial structural information of fire regions and the temporal variation of thermal anomalies, respectively. Attention-guided feature fusion modules are introduced at each network stage to enable complementary integration of cross-modal information. In addition, a multi-scale CAM-weighted fusion strategy is designed to jointly enhance region localization accuracy and semantic discrimination capability. Experimental evaluations are conducted on a high-resolution wildfire dataset covering 29 regions and consisting of 2206 images. The results demonstrate that the proposed method achieves an IoU of 58.7% and an F1-score of 73.5%, outperforming the state-of-the-art methods by 4.6% and 3.2%, respectively. Ablation and comparative experiments further verify that the dual-branch architecture and feature fusion strategy significantly improve fire localization accuracy and effectively reduce the missed detection rate.

19 pages, 3469 KB

Open Access Article

An MEM-DMD-Enabled Ghost Imaging System Enhanced by a Hybrid CNN-GAN for High-Resolution Imaging Under Scattering Media

by Zeenat Akhter, Rehmat Iqbal, Giedrius Janusas, Sigita Urbaite and Arvydas Palevicius

Micromachines 2026, 17(5), 598; https://doi.org/10.3390/mi17050598 - 14 May 2026

Viewed by 223

Abstract

This paper presents a Micro-Electro-Mechanical Systems digital micromirror device (MEMS-DMD)-enabled ghost imaging (GI) framework for high-resolution imaging under scattering conditions. Unlike conventional ghost imaging systems that rely on fixed illumination patterns, the proposed approach exploits the high-speed programmability of a DMD to implement adaptive illumination strategies, enabling dynamic selection of informative patterns during data acquisition. This hardware-enabled pattern selection strategy improves sampling efficiency and reconstruction stability under the modeled fog conditions considered here. A hybrid convolutional neural network–generative adversarial network (CNN–GAN) model is employed as an inversion tool to reconstruct high-quality images from compressed bucket measurements. The proposed system achieves substantial improvements in reconstruction quality, with gains of 23–40% in PSNR and 18–26% in SSIM compared to traditional ghost imaging methods, while reducing the number of required measurements by up to 60%. Additional performance gains are achieved through adaptive pattern selection enabled by the MEMS-DMD. The results demonstrate that integrating programmable MEMS hardware with learning-based reconstruction provides an effective solution for imaging under scattering conditions, with potential applications in remote sensing, environmental monitoring, and surveillance.
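
For readers unfamiliar with bucket measurements, the sketch below shows the conventional correlation-based ghost imaging estimate, G(x) = ⟨B·P(x)⟩ − ⟨B⟩⟨P(x)⟩, which serves as the kind of traditional baseline the abstract compares against. It is not the article's CNN–GAN reconstruction or its adaptive pattern selection. The function names, the 1-D toy object, and the random binary patterns are assumptions made for illustration only.

```python
import random


def bucket_measurements(obj, patterns):
    """Simulate the single-pixel (bucket) detector: for each illumination
    pattern, sum the light transmitted by the object."""
    return [sum(p * o for p, o in zip(pattern, obj)) for pattern in patterns]


def correlate(patterns, buckets):
    """Conventional ghost imaging estimate:
    G(x) = <B * P(x)> - <B> * <P(x)>, averaged over all patterns."""
    m = len(patterns)
    n = len(patterns[0])
    mean_b = sum(buckets) / m
    estimate = []
    for x in range(n):
        mean_p = sum(pattern[x] for pattern in patterns) / m
        mean_bp = sum(b * pattern[x] for b, pattern in zip(buckets, patterns)) / m
        estimate.append(mean_bp - mean_b * mean_p)
    return estimate


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    obj = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # hypothetical 1-D object
    patterns = [[rng.randint(0, 1) for _ in obj] for _ in range(2000)]
    estimate = correlate(patterns, bucket_measurements(obj, patterns))
    print([round(v, 2) for v in estimate])
```

With enough random patterns, the estimate is larger at pixels where the object transmits light. Reducing the number of patterns needed for a usable image is the problem that learned reconstruction and adaptive pattern selection aim to address.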

(This article belongs to the Special Issue MEMS Ultrasonic Transducers, 2nd Edition)

23 pages, 8187 KB

Open Access Article

DCFENet: A Dual-Branch Collaborative Feature Enhancement Network for Farmland Boundary Detection

by Mengyao Lan, Bangjun Huang and Peng Wu

Agronomy 2026, 16(10), 964; https://doi.org/10.3390/agronomy16100964 - 12 May 2026

Viewed by 231

Abstract

Farmland resources are fundamental to human survival and play a vital role in ensuring global food security. However, farmland boundary detection remains a significant technical challenge due to the low proportion of boundary pixels, multi-scale variations, and weak boundary continuity. To address these issues, this study proposes DCFENet, a dual-branch collaborative feature enhancement network. Specifically, a multi-scale feature fusion attention module, TA-ASPP (Task-Aware Atrous Spatial Pyramid Pooling), is designed; it enhances the network’s perception of farmland boundary features by integrating multi-scale dilated convolutions with skeleton-aware attention. In addition, a dual-branch decoding structure is proposed to enhance boundary localization and global topology modeling through boundary-aware gating and cross-branch feature fusion, thereby improving boundary continuity. Furthermore, a collaborative constraint mechanism is proposed for dual-branch decoding, which supervises the two decoders using boundary loss and skeleton loss, thereby enhancing structural consistency and topology preservation. Experimental results demonstrate that DCFENet achieves precision, recall, and boundary IoU of 74.5%, 68.1%, and 77.4%, respectively, representing improvements of 26.8%, 36.3%, and 13.2% over ResNet18_UNet. It also outperforms mainstream methods such as UNet, EdgeNAT, and EDTER. In terms of computational efficiency, DCFENet has 26.43 M parameters and 37.43 G FLOPs, with a memory usage of 1.03 GB and an inference speed of 97.97 FPS, achieving a good balance between accuracy and efficiency. The results demonstrate the efficiency and accuracy of DCFENet in extracting farmland boundaries from high-resolution remote sensing images, providing technical support for farmland management and the advancement of precision and digital agriculture.

(This article belongs to the Special Issue Remote Sensing and GIS in Sustainable and Precision Agriculture)

25 pages, 12587 KB

Open Access Article

A Spectral Variability and Class-Constrained Diffusion Model for Unsupervised Hyperspectral Unmixing

by Mingwei Wang, Kaiyuan Yang, Jingyan Lu, Wei Liu and Tian Zeng

Remote Sens. 2026, 18(10), 1483; https://doi.org/10.3390/rs18101483 - 9 May 2026

Viewed by 193

Abstract

Hyperspectral remote sensing is increasingly utilized due to its high spectral resolution and broad observational capabilities, and hyperspectral unmixing aims to decompose mixed pixels into their constituent endmembers with corresponding classes. The core research directions in this area include how to construct a proprietary spectral library and how to optimize the corresponding abundance maps. However, due to the influence of complex terrain and variable illumination conditions, hyperspectral images (HSI) exhibit significant spectral variability (SV), which undermines the performance of traditional unmixing methods. In this paper, we propose an SV and class-constrained diffusion model (SVCDM) for unsupervised hyperspectral unmixing that integrates endmember extraction and abundance optimization. Specifically, a Dirichlet-based variational autoencoder is employed to construct a spectral library from the original HSI with a class constraint and prior distribution, which subsequently guide a conditional diffusion model to learn the distribution. During the reverse process, the endmembers are iteratively updated at each time step, enhancing diversity while ensuring class consistency. Subsequently, the endmember matrix is combined with the original HSI to optimize the abundance maps under the linear mixing assumption. The proposed SVCDM effectively mitigates the impact of SV induced by imaging characteristics. Experimental results demonstrate that the SVCDM achieves a root mean square error (RMSE) of 0.0371 for abundance maps on a synthetic dataset and a spectral angle mapper (SAM) value of 0.0309 for endmembers on the Samson dataset, outperforming existing state-of-the-art hyperspectral unmixing methods on both synthetic and real datasets.
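
To make the linear mixing assumption and the two reported metrics concrete, the sketch below mixes a pixel from endmember spectra and then computes RMSE and the spectral angle. It is a minimal illustration under the standard textbook definitions, not the SVCDM pipeline. All function names and numeric values are hypothetical.

```python
import math


def linear_mix(endmembers, abundances):
    """Linear mixing model: pixel[b] = sum_k abundances[k] * endmembers[k][b]."""
    bands = len(endmembers[0])
    return [
        sum(a * spectrum[b] for a, spectrum in zip(abundances, endmembers))
        for b in range(bands)
    ]


def rmse(estimated, reference):
    """Root mean square error between two equal-length sequences."""
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


def spectral_angle(a, b):
    """Spectral angle (radians) between two spectra; 0 means identical shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


if __name__ == "__main__":
    # Two hypothetical endmember spectra over three bands.
    endmembers = [[0.9, 0.5, 0.1], [0.1, 0.4, 0.8]]
    true_abundances = [0.3, 0.7]       # non-negative and summing to one
    estimated_abundances = [0.35, 0.65]
    pixel = linear_mix(endmembers, true_abundances)
    print(pixel)
    print(rmse(estimated_abundances, true_abundances))
    print(spectral_angle(endmembers[0], endmembers[1]))
```

The spectral angle ignores overall brightness, which is one reason it is commonly used to compare extracted endmembers against reference spectra under varying illumination.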

(This article belongs to the Special Issue Artificial Intelligence in Hyperspectral Remote Sensing Data Analysis)
